Thursday, August 28, 2014

Get various information from Facebook Graph API

Extracting information from datasets has always fascinated me. And Facebook is such a vast reservoir of data that it's practically a crime for enthusiasts to leave it untapped.

I've been looking for ways to get various questions answered from Facebook data.

For example: who has liked my posts the most? Whose posts have I liked the most? Comments, photos, everything can be explored to find this sort of information. There are plenty of apps for these purposes, but as a developer you should be able to do it yourself, or at least I think so.

I'm going to explain this one: "Who liked your last N feed posts the most?"

In this post, I've explained how to connect to the Graph API. Keep in mind that Facebook has deprecated FQL from version 2.0 of the API onward, so being an expert in FQL is not going to help much in the future.

Facebook exposes several root nodes in the Graph API; they can be found in this link. We are going to use the 'user/feed' node for our purpose. Of course, Facebook is not going to return all the data in one request, nor should you expect it to. It returns data in a paged format. Facebook uses three types of pagination: cursor-based, time-based, and offset-based. Our node uses time-based pagination.
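
As a rough illustration of what paging means in practice, the sketch below walks a paged response by following its 'paging' -> 'next' links until none is left. The names iter_pages, fetch and max_pages are my own, chosen for illustration; they are not part of the Graph API or facepy. The only thing assumed about the response is the shape used throughout this post: a 'data' list plus an optional 'paging' dictionary with a 'next' URL. The default fetcher uses only the standard library so the sketch has no extra dependencies.

```python
import json
from urllib.request import urlopen


def iter_pages(first_page, fetch=None, max_pages=None):
    """Yield each page of a paged response, following 'paging' -> 'next' links.

    `fetch` is a hypothetical hook that takes a URL and returns decoded JSON;
    by default it performs a plain HTTP GET.
    """
    if fetch is None:
        fetch = lambda url: json.loads(urlopen(url).read().decode('utf-8'))
    page = first_page
    seen = 0
    while page is not None:
        yield page
        seen += 1
        if max_pages is not None and seen >= max_pages:
            return
        next_url = page.get('paging', {}).get('next')
        # No 'next' link means this was the last page.
        page = fetch(next_url) if next_url else None
```

Passing max_pages gives the same effect as a manual counter: the walk stops after that many pages even if more are available.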

Let's look at this code chunk:
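import requests
from facepy import GraphAPI
from pymongo import MongoClient
from bson.son import SON

# Placeholder: substitute your own OAuth access token here.
graph = GraphAPI('YOUR_ACCESS_TOKEN')

# Connect to a local MongoDB instance; likes are stored in likes.like_table.
client = MongoClient('localhost', 27017)
db = client.likes
like_table = db.like_table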

We're going to use MongoDB a little. Given the scope of this post, I'm not going to describe its installation or usage. This link has very good resources on it.
feeds = graph.get('friends_or_self_id/feed?limit=1')
To find this id, we can simply issue a request in the browser to 'http://graph.facebook.com/username'. We could also append parameters like until and since to specify the time period we are concerned with. However, I had trouble getting them to work, so I've opted for a counter instead.
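
For reference, here is a minimal sketch of how since and until could be appended to the feed path. It assumes both parameters take Unix timestamps, which is how the Graph API documentation describes time-based pagination; I have not verified it against the live API. The helper name feed_path is hypothetical, and the datetimes are treated as naive UTC values.

```python
import calendar
from datetime import datetime
from urllib.parse import urlencode


def feed_path(user_id, since=None, until=None, limit=1):
    """Build a 'user_id/feed' path with optional since/until bounds.

    `since` and `until` are naive UTC datetimes, converted to Unix timestamps
    (assumption: the API accepts timestamps for both parameters).
    """
    params = {'limit': limit}
    if since is not None:
        params['since'] = calendar.timegm(since.timetuple())
    if until is not None:
        params['until'] = calendar.timegm(until.timetuple())
    return '{0}/feed?{1}'.format(user_id, urlencode(params))
```

The resulting string could be passed to graph.get in place of the hard-coded path. The loop that follows sticks with the counter approach instead.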
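counter = 1  # number of feed posts processed so far
while True:
    try:
        # A post without likes has no 'likes' key; skip it instead of aborting.
        likes = feeds['data'][0].get('likes')
        while likes:
            for m in likes['data']:
                # insert_one replaces insert, which was removed in PyMongo 4.
                like_table.insert_one(m)
            try:
                # Follow the next page of likes for this post, if there is one.
                likes = requests.get(likes['paging']['next']).json()
            except KeyError:
                print("keyerror in likes pagination")
                break
        if counter == 30:
            break
        counter = counter + 1
        # With limit=1, the next page holds the next older post.
        feeds = requests.get(feeds['paging']['next']).json()
    except (KeyError, IndexError):
        print("keyerror happened")
        break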
This retrieves the last 30 posts from that user's timeline. Next, let's find which ids appear most often. This is where MongoDB comes in handy.
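# Group likes by user id, count them, and sort by count (then id), descending.
results = like_table.aggregate([
    {'$group': {
        '_id': '$id',
        'count': {'$sum': 1}}
    },
    {"$sort": SON([("count", -1), ("_id", -1)])}
])

# PyMongo 3+ returns a cursor here; on PyMongo 2.x iterate results['result'] instead.
for k in results:
    print(k['_id'])
    print(k['count'])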
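
To make it clearer what the aggregation computes, here is a rough in-memory equivalent using only the standard library. It assumes each like is a dictionary with an 'id' key, as in the documents stored above; the function name count_likers is my own. This is only a sketch for small result sets, not a replacement for the MongoDB pipeline.

```python
from collections import Counter


def count_likers(likes):
    """Return (liker_id, count) pairs, most frequent first.

    Ties are broken by id in descending order, mirroring the $sort stage.
    """
    counts = Counter(like['id'] for like in likes)
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
```

Calling count_likers on the list of like dictionaries gathered from the feed gives the same ranking without a database, at the cost of holding everything in memory.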
This is just one simple example of the endless possibilities Facebook offers. I'm a big fan of it simply because of its database. People can practically be 'profiled'.
The code is also hosted on GitHub.