Getting data from the Facebook Graph API + a script


After the last few weeks spent on tweaking the tools for data-grabbing on Sina Weibo and Twitter, we’re now moving on to Facebook. Or going back to it, as this was what I was on before focusing on the microblogs.

Python script to get data from the Facebook Graph API:

Facebook has a very interesting API, which it dubs its Graph API. It basically models all entities of Facebook, be they users (people), events, pages or groups, but also things like your links or even Facebook messages (“mail”), as nodes in Mark Zuckerberg’s wacky vision of organising the world’s social information as points and arrows, or rather vertices and edges in graph-theory terms. It’s people and things and their connections, sitting in a database and rendered as the website’s interface, on your mobile device, or eventually as something more (a sky map of Facebook, anyone?).

Basically, just like with Twitter’s and Sina Weibo’s open APIs, you can use Facebook’s API to access the same information you would be able to see as a normal user, and perhaps retrieve it to your local storage for future use, for comparisons over time or for other analyses that are more practical to do on your local server. Only this time, you can automate the process and do it without involving web crawlers, which are not tailor-made for the job; the API provides a single standard format, JSON.

Without even logging in, you can already get information out of searches on the Graph API (see Openbook’s experiment). For instance, you can run a search on Hong Kong. If you knew my Facebook username, you could point to the Graph API and get my basic info: full name, username, locale and gender. None of this is very special: it’s the same data that you could get from a Google search without logging in to Facebook.
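To make the two public lookups above a bit more concrete, here is a minimal sketch that only builds the request URLs and picks the basic fields out of a response. It sends nothing over the network. The `type=post` parameter, the helper names and the sample response are my own assumptions for illustration, and current versions of the Graph API may well require an access token even for these calls.

```python
import json
from urllib.parse import quote, urlencode

GRAPH_BASE = "https://graph.facebook.com"


def search_url(query, object_type="post"):
    """Build a Graph API search URL (assumed parameters: q and type)."""
    params = urlencode({"q": query, "type": object_type})
    return f"{GRAPH_BASE}/search?{params}"


def profile_url(username):
    """Build the URL of a user's node from a username."""
    return f"{GRAPH_BASE}/{quote(username, safe='')}"


def basic_info(response_text):
    """Keep only the basic public fields of a JSON profile response."""
    data = json.loads(response_text)
    wanted = ("name", "username", "locale", "gender")
    return {key: data[key] for key in wanted if key in data}


# Hypothetical response body, made up for this example.
sample_response = (
    '{"id": "0", "name": "Jane Doe", "username": "jane.doe", '
    '"locale": "en_GB", "gender": "female"}'
)

print(search_url("Hong Kong"))
print(profile_url("jane.doe"))
print(basic_info(sample_response))
```

Fetching the URLs themselves (with `urllib.request` or any HTTP client) is left out on purpose, so the sketch runs offline.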

Things become powerful once you get an access token (associated with a dummy app that you presumably created) by following the instructions in the documentation. The token allows you to navigate the same data as a logged-in user. This is where it gets interesting, because you can then navigate data like membership of groups (how about co-membership?) or, let’s say, the popularity of a link that was shared across Facebook by different users. That’s very useful for social network analysis.
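As an illustration of the co-membership idea, here is a small sketch that counts how many groups each pair of users shares. The group and user names are invented, and the input format (a dict mapping a group to its list of members) is simply an assumption about what you would have after pulling member lists from the API.

```python
from collections import Counter
from itertools import combinations


def co_membership(groups):
    """Count, for every pair of users, how many groups they have in common."""
    pairs = Counter()
    for members in groups.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(members)), 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up data standing in for member lists retrieved with an access token.
groups = {
    "group_a": ["alice", "bob", "carol"],
    "group_b": ["bob", "carol"],
    "group_c": ["alice", "dave"],
}

shared = co_membership(groups)
print(shared[("bob", "carol")])  # 2
print(shared.most_common(3))
```

The resulting pair counts can be read as weighted edges between users, which is the kind of input most network analysis tools expect.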

In any case, I started writing a Python script to try to handle the different use cases of the Graph API and either output the results to CSV or store them in a PostgreSQL database. I am posting here a first almost-vanilla version of this on our GitHub:
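This is not the script itself, but a minimal sketch of what the CSV output step could look like for a list of decoded JSON objects. The field names and sample objects are assumptions made up for the example.

```python
import csv
import io


def rows_to_csv(objects, fields, out):
    """Write a list of dicts as CSV, keeping only the chosen fields."""
    writer = csv.DictWriter(
        out,
        fieldnames=fields,
        extrasaction="ignore",  # silently drop keys we did not ask for
        restval="",             # leave missing fields empty
    )
    writer.writeheader()
    for obj in objects:
        writer.writerow(obj)


# Hypothetical objects, shaped like decoded Graph API nodes.
sample = [
    {"id": "1", "name": "Example Page", "category": "Community", "extra": "x"},
    {"id": "2", "name": "Another, with comma"},
]

buf = io.StringIO()
rows_to_csv(sample, ["id", "name", "category"], buf)
print(buf.getvalue())
```

Writing to a real file works the same way: open it with `newline=""` and pass the file object instead of the `StringIO` buffer.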

What’s popular on Sina Weibo in Hong Kong


We’ve been working on a system that archives tweets from users of Sina Weibo (the biggest Twitter clone in China) living in Hong Kong. We have been following close to 60,000 users and compiled the contents of their latest statuses, along with available metadata. This is a screenshot of the report page that we are building. For instance, you would be able to see which entries have been the most retweeted/reposted. Unsurprisingly, the vast majority of the stuff being reposted is non-political, and rather focused on entertainment and memes (horoscopes, lucky forwards, etc.).

One of the most esoteric postings, and the most popular overall in the past two weeks, was a tweet on NASA’s big, big surprise announcement to be made last week. It turned out (just) to be that NASA had found the youngest black hole ever discovered in our vicinity. No big deal, and well short of the end of the world that Chinese netizens were expecting, Hong Kong ones included.

The gender imbalance on the Chinese Weibosphere

Gender divide on Sina Weibo in Hong Kong (n=53,821)
3 Sina Weibo users out of 4 in Hong Kong are women

Gender divide on Sina Weibo across the network (n=539,274)
57% of Sina Weibo users (China+World) are women, just like it is on Twitter, actually

“Sina Weibo users living in HK: 40,268 women女, but only 13,553 men男? Around the world: 307,916女, 231,358男. Is the API or my sample wrong?”

– From my personal Twitter

There might be more men than women in China, but that is just not the case on the Sina Weibo online social network. Just like location, gender is a required field when you sign up for a Sina Weibo account, and while doing a simple database query the other day, I found this interesting statistic: 3 Weibo users out of 4 in Hong Kong were women, while the proportion of women on Weibo across the world is 57%, which puts women 14 percentage points ahead of men (43%).
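The kind of query involved is a plain count grouped by gender. Here is a sketch of it using an in-memory SQLite database as a stand-in for our PostgreSQL one; the table name, column names and the handful of rows are all invented for the example.

```python
import sqlite3


def gender_counts(conn, location):
    """Return {gender: number of users} for one location."""
    cur = conn.execute(
        "SELECT gender, COUNT(*) FROM users "
        "WHERE location = ? GROUP BY gender ORDER BY gender",
        (location,),
    )
    return dict(cur.fetchall())


# Hypothetical schema and toy rows, not our real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, location TEXT, gender TEXT)"
)
conn.executemany(
    "INSERT INTO users (location, gender) VALUES (?, ?)",
    [("HK", "f"), ("HK", "f"), ("HK", "f"), ("HK", "m"), ("Beijing", "m")],
)

counts = gender_counts(conn, "HK")
print(counts)  # {'f': 3, 'm': 1}
```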

We collected data on the users using Sina’s public search API, so these are active users over the course of about a month. While the Hong Kong sample seemed mysteriously skewed, the China+World sample is a lot closer to what you would expect, at 57% women versus 43% men, which is consistent with data for Twitter.

The China+World sample consisted of a big Hong Kong sample (the focus of our main research) and of the 400 or more most popular users by followers in each province/city of China and in world regions (a lot more for big Chinese cities like Beijing, Shanghai, Chongqing, etc.). Even if we didn’t count the Hong Kong sample of about 54,000 users, the gender imbalance is still noteworthy: 267,648女Women (55.13%), 217,805男Men (44.87%).
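The figures quoted in this post can be rechecked with a few lines of arithmetic. The sketch below uses only the counts given above; the helper name `share` is mine.

```python
def share(part, total):
    """Percentage of total represented by part, rounded to two decimals."""
    return round(100 * part / total, 2)


# Counts quoted in this post.
hk_women, hk_men = 40268, 13553
all_women, all_men = 307916, 231358

hk_total = hk_women + hk_men      # 53,821 users in Hong Kong
all_total = all_women + all_men   # 539,274 users across the network

# China+World without the Hong Kong sample.
rest_women = all_women - hk_women  # 267,648
rest_men = all_men - hk_men        # 217,805
rest_total = rest_women + rest_men

print(share(hk_women, hk_total))      # 74.82
print(share(all_women, all_total))    # 57.1
print(share(rest_women, rest_total))  # 55.13
print(share(rest_men, rest_total))    # 44.87
```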

David McCandless of Information is Beautiful did a nice graphic of the gender imbalance in Western online social media.