We are now on GitHub!

Naturally, when it comes to sharing code, nothing beats GitHub, a web-based hosting service for code using the popular Git revision control system. In plain language, this means that the code I write, mostly for the JMSC these days, will be available on GitHub.

To start things, I put some of the Python scripts that I wrote for our online social media research project: http://github.com/JMSCHKU/Social/

For those who are unfamiliar with GitHub: it has become ubiquitous wherever I look for source code (mostly to compile into usable programs). Many people already know SourceForge, the open-source code repository started at the turn of the century; GitHub's innovation over SF.net is decentralized version control.

For starters,


Python code to get tweets through the Twitter API

I’m just posting the current version of a series of Python scripts that I’ve been using for the past week or so to fetch data through the Twitter API using OAuth. Since the beginning of September, Twitter has required developers to authenticate through OAuth to access certain of its functions, including the very useful statuses/user_timeline. This function returns the tweets (statuses) from a user’s timeline, up to 3,200 of them, paginated 200 per page.
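The pagination logic can be sketched as follows. This is a minimal illustration, not the code from my scripts: `fetch_page` is a hypothetical stand-in for the authenticated statuses/user_timeline call, and the demo below feeds it a fake in-memory store instead of hitting Twitter.

```python
def fetch_timeline(fetch_page, per_page=200, max_tweets=3200):
    """Collect up to max_tweets statuses, one page of per_page at a time.

    fetch_page(page, count) stands in for the authenticated
    statuses/user_timeline request; it should return a list of tweets.
    """
    tweets = []
    page = 1
    while len(tweets) < max_tweets:
        batch = fetch_page(page, per_page)
        if not batch:               # empty page: nothing more to fetch
            break
        tweets.extend(batch)
        if len(batch) < per_page:   # short page: we reached the end
            break
        page += 1
    return tweets[:max_tweets]

# Demo with a fake fetcher holding 450 tweets (pages of 200, 200, 50).
fake_store = ["tweet %d" % i for i in range(450)]

def fake_fetch(page, count):
    start = (page - 1) * count
    return fake_store[start:start + count]

print(len(fetch_timeline(fake_fetch)))  # 450
```

With 1,000 users and up to 3,200 tweets each, that is up to 16 pages per user, which is why rate limiting matters so much below.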

I particularly optimized the script so that you don’t run into the limit of 350 authenticated requests to the API and miss some tweets. You can adjust the sleep times if you want, but it’s been pretty reliable for me so far: I’m making up to 16 calls for each of the 1,000 users in my list. The sleep times were also tuned so as not to bombard Twitter, and a failed request is retried up to a certain number of times (changeable in the code), after which the program stops, writes a complaint to the CSV output and exits (this applies particularly to options 4 and 5, which I spent more time developing).
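The sleep-and-retry idea looks roughly like this. Again a hedged sketch, not the actual code: `request` is a hypothetical stand-in for one API call, and the retry count and sleep times are placeholders for the values that are changeable in the script.

```python
import time

def fetch_with_retry(request, max_retries=3, base_sleep=1.0):
    """Retry a flaky request, pausing between attempts.

    request() stands in for one API call; it returns data or raises on
    failure. Returns None once max_retries attempts have failed, so the
    caller can write a complaint to the CSV output and exit.
    """
    for attempt in range(max_retries):
        try:
            return request()
        except Exception:
            # Sleep a little longer after each failure, to avoid
            # hammering the API while it is unhappy.
            time.sleep(base_sleep * (attempt + 1))
    return None

# Demo: a request that fails twice, then succeeds on the third call.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("timeout")
    return "ok"

print(fetch_with_retry(flaky, base_sleep=0.01))  # ok
```

The same wrapper can sit around every paged request, so one transient failure doesn’t cost you a whole user’s timeline.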

Here’s the link. This function is “option 5” in my scripts bundled up in twitter.oauth.py.

Option 4 does the same search but by user_id, which may not be as easy to find (you have to look in the Atom feed URL). You need to replace some values in the script, such as the app ID and tokens. Option 3 fetches user info by screen name and stores it in a database (options 4 and 5 only write to a CSV).
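For the database side of option 3, the storage step can be sketched with Python’s built-in sqlite3 module. The table name and columns here are hypothetical illustrations, not the schema used in twitter.oauth.py.

```python
import sqlite3

# In-memory database just for the demo; the real script would use a file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY,"
    " screen_name TEXT, followers INTEGER)"
)

def store_user(conn, user):
    """Insert or update one user record fetched from the API."""
    conn.execute(
        "INSERT OR REPLACE INTO users (user_id, screen_name, followers)"
        " VALUES (?, ?, ?)",
        (user["id"], user["screen_name"], user["followers_count"]),
    )
    conn.commit()

store_user(conn, {"id": 42, "screen_name": "JMSCHKU", "followers_count": 1000})
row = conn.execute("SELECT screen_name FROM users WHERE user_id = 42").fetchone()
print(row[0])  # JMSCHKU
```

Using parameterized queries (the `?` placeholders) rather than string formatting keeps screen names with odd characters from breaking the SQL.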

The script works on UNIX; it hasn’t been tested on Windows, and probably won’t run there without some minor changes to the file I/O.