Blog of Sara Jakša

How to Get Tumblr Posts with the API

I need to create a project for my Business Intelligence course, which involves making some sort of prototype. It needs to include some kind of model, anything from OLAP cubes, to text mining and sentiment analysis, to a dashboard for data visualization.

I eventually decided to make an application that discovers a person's MBTI type from the words they use. There is some research showing that people use different words depending on their Big Five personality results, so the project has some merit.

Originally I wanted to use Twitter, because it is one of the three big social networks (I think they are Facebook, Twitter and LinkedIn), but Twitter requires a phone number to use its API, and I am not going to pay for a phone just to be able to use their data. I also did not see much MBTI discussion on Facebook or LinkedIn.

On the other hand, I knew of quite a few blogs on Tumblr that I could use in my research, so it seems to be a good alternative. It might not give results that are as accurate, but we will see.

So far I have figured out how to find posts by tag, get each post, and get from a post to the description of the blog that published it. This should help me find, in a systematic way, which blogs have an MBTI code written in their description. For now, my rule is that if a description contains only one MBTI code, the blog belongs to that type.

Here is the code for that:

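    import re

    import pytumblr

    # Regex that extracts the blog host (e.g. "someblog.tumblr.com") from a post URL.
    # The dots are escaped so they only match a literal ".", and "-" is allowed in blog names.
    tumblr_url = r"[\w-]+\.tumblr\.com"

    # Authenticate via OAuth: replace the placeholders with your own credentials
    client = pytumblr.TumblrRestClient(
        "<consumer_key>",
        "<consumer_secret>",
        "<oauth_token>",
        "<oauth_secret>",
    )

    # Find one post tagged MBTI (the API returns a list of post dictionaries)
    posts = client.tagged("MBTI", limit=1)
    if not posts:
        raise SystemExit("No posts found for this tag")
    post = posts[0]

    # Get information out of the post.
    # Only text posts have "title" and "body", so .get() avoids a KeyError on other post types.
    post_id = post["id"]
    title = post.get("title")
    body = post.get("body")
    tags = post["tags"]
    post_url = post["post_url"]

    # Now find the blog URL from the post URL.
    # Blogs with a custom domain will not match the regex, so fall back to the blog name.
    match = re.search(tumblr_url, post_url)
    user_url = match.group() if match else post["blog_name"]

    # Get the information about the blog
    blog_info = client.blog_info(user_url)

    # Get the blog description and number of posts
    blog_description = blog_info["blog"]["description"]
    number_of_posts = blog_info["blog"]["posts"]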

    print(post_url)
    print(title)

    print(user_url)
    print(blog_description)
    print(number_of_posts)
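The rule I described earlier (a blog whose description mentions only one MBTI code belongs to that type) is not in the code yet. Here is a minimal sketch of it. It assumes that "only one code" means one distinct code (so a description that says "INFJ" twice still counts), and that codes appear as standalone four-letter words. The names `MBTI_PATTERN`, `find_mbti_codes` and `blog_type` are my own illustrative helpers, not part of pytumblr.

```python
import re
from typing import Optional, Set

# The 16 MBTI codes: one letter from each of the pairs I/E, N/S, T/F, J/P.
# \b keeps the match to standalone words, so "INTJ-T" still yields "INTJ".
MBTI_PATTERN = re.compile(r"\b[IE][NS][TF][JP]\b", re.IGNORECASE)


def find_mbti_codes(description: str) -> Set[str]:
    """Return the distinct MBTI codes mentioned in a blog description, upper-cased."""
    return {code.upper() for code in MBTI_PATTERN.findall(description)}


def blog_type(description: str) -> Optional[str]:
    """Return the type if exactly one distinct MBTI code appears, otherwise None."""
    codes = find_mbti_codes(description)
    if len(codes) == 1:
        return codes.pop()
    return None


print(blog_type("INFJ, 23, she/her. I reblog a lot of books."))
print(blog_type("Typing fictional characters: INTP, ENFP, ISTJ"))
```

With the script above, `blog_type(blog_description)` would give the type for the blog that was just fetched, or `None` when the description mentions no code or several different ones (as a blog that types fictional characters probably would).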