A new tweet-gathering tool + sample with #liuxiaobo and 刘晓波 as search words

We’ve developed a simple tool that uses the Twitter API to collect tweets into a local database at the JMSC. We first selected a list of 1000 China-based tweeters, removed those with private profiles (whose tweets cannot be retrieved), and, starting a few weeks ago, downloaded each user’s 3200 most recent tweets (the maximum the API allows). For some users that reached back as far as 2008, and in most cases at least well into 2009. Since this week, I have also been collecting new tweets continuously, four times a day.
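To give a rough idea of what the backfill step involves, here is a minimal sketch in Python. It is not our actual code. It assumes the timeline is read page by page, newest first, and that each request can ask for tweets no newer than a given ID. The names `collect_timeline` and `fetch_page`, and the page size of 200, are assumptions made for illustration; `fetch_page` stands in for whatever function really calls the Twitter API.

```python
from typing import Callable, Dict, List, Optional

MAX_TWEETS = 3200  # per-user cap mentioned in the post
PAGE_SIZE = 200    # assumed number of tweets requested per call

Tweet = Dict[str, object]
# Hypothetical signature: fetch_page(user, max_id, count) returns up to
# `count` tweets by `user`, newest first, with id <= max_id (or the
# newest tweets when max_id is None).
FetchPage = Callable[[str, Optional[int], int], List[Tweet]]


def collect_timeline(
    user: str,
    fetch_page: FetchPage,
    limit: int = MAX_TWEETS,
    page_size: int = PAGE_SIZE,
) -> List[Tweet]:
    """Walk backwards through a user's timeline until `limit` is reached
    or no older tweets are returned."""
    collected: List[Tweet] = []
    max_id: Optional[int] = None
    while len(collected) < limit:
        wanted = min(page_size, limit - len(collected))
        page = fetch_page(user, max_id, wanted)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # Ask next time only for tweets strictly older than the oldest seen.
        max_id = min(int(t["id"]) for t in page) - 1
    return collected[:limit]
```

Passing the fetching function in as a parameter keeps the paging logic separate from the network code, so the loop can be tried out against a fake data source before pointing it at the real API.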

The tool has a web interface, but it is not yet ready for public release because of load issues and other unresolved questions. Twitter is notoriously bad at providing meaningful search because of its large volume of users. But since we are tracking only about 1000 tweeters, most of them drawn from a list of “influential” tweeters, we hope to make more sense of this particular (and authoritative) slice of the Chinese Twittersphere. We hope eventually to have data on a more “popular” Hong Kong Twittersphere, the focus of our research; that is harder, because the problem there is how to select a sample of the general public.

For now, I can provide CSV files of individual search queries if you e-mail me at cedsam@hku.hk. The files will look like the following sample CSV files, which use #liuxiaobo and 刘晓波 as search words.
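For readers wondering how such a file could be produced, here is a small sketch in Python of filtering stored tweets by search words and writing the matches to CSV. It is only an illustration: the column names in `FIELDS` and the functions `matches` and `export_matches` are assumptions of mine, not the layout of the files we send out, and the matching is a plain case-insensitive substring test.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Assumed column layout, for illustration only.
FIELDS: List[str] = ["id", "screen_name", "created_at", "text"]


def matches(text: str, terms: Iterable[str]) -> bool:
    """True if any search word occurs in the text, ignoring case."""
    lowered = text.casefold()
    return any(term.casefold() in lowered for term in terms)


def export_matches(
    tweets: Iterable[Dict[str, object]],
    terms: List[str],
    out: TextIO,
) -> int:
    """Write tweets containing any of `terms` to `out` as CSV.

    Returns the number of rows written (excluding the header).
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for tweet in tweets:
        if matches(str(tweet.get("text", "")), terms):
            writer.writerow(tweet)
            written += 1
    return written
```

To write a file instead of an in-memory buffer, one could open it with `open("liuxiaobo.csv", "w", encoding="utf-8", newline="")` and call `export_matches(tweets, ["#liuxiaobo", "刘晓波"], f)`; the file name here is just an example. The explicit UTF-8 encoding matters because the search words and tweets include Chinese text.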
