Searching our Sina and QQ Weibo archive

Screenshot at 2012-01-19 17:15:45

We built a search engine for our Sina Weibo archive a while ago and, as of yesterday, it also covers the QQ Weibo archive. We use Lucene as the indexer (for quick full-text searches) and store all linked information in our standard database. Unlike the real search engines on the Sina and QQ Weibo websites, we don’t currently implement any relevance weighting: the results are simply everything we have, ordered by publication date.
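
To make the "index plus database, no weighting" idea concrete, here is a toy sketch in Python. It is not our actual code: the `TinyIndex` class and its whitespace tokenizer are hypothetical stand-ins for Lucene and the database, and showing the newest posts first is an assumption about the sort direction.

```python
from collections import defaultdict
from datetime import datetime


class TinyIndex:
    """Toy stand-in for the Lucene index plus the database of post records."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of post ids (the "index")
        self.records = {}                 # post id -> full record (the "database")

    def add(self, post_id, text, published):
        self.records[post_id] = {"id": post_id, "text": text, "published": published}
        # Naive whitespace tokenizer; a real analyzer would segment Chinese text.
        for term in text.lower().split():
            self.postings[term].add(post_id)

    def search(self, term):
        # Step 1: the index tells us which posts contain the term.
        ids = self.postings.get(term.lower(), set())
        # Step 2: fetch the full records, then sort by date only (no relevance score).
        hits = [self.records[i] for i in ids]
        return sorted(hits, key=lambda r: r["published"], reverse=True)


index = TinyIndex()
index.add(1, "han han blog post", datetime(2012, 1, 17, 9, 0))
index.add(2, "weather in beijing", datetime(2012, 1, 18, 9, 0))
index.add(3, "han han responds", datetime(2012, 1, 19, 9, 0))

for hit in index.search("han"):
    print(hit["published"], hit["text"])
```

Every matching post comes back, however weak the match, which is why a popular keyword can return a very long list.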

We index every four hours, so a new post takes at least 30 minutes to show up in search, and at most around 4 hours 30 minutes. There’s paging, too. Because we’re not Google, expect queries to normally take up to 1 minute to run (more if there’s lots of activity on the server). Searching by region / province on the Sina search is also uber-slow.
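
The delay figures can be worked out with a little arithmetic. The sketch below assumes that an indexing run starts every four hours and that its results become searchable about 30 minutes later; the 30-minute processing time is inferred from the minimum delay quoted above, and the midnight starting point and function name are made up for illustration.

```python
from datetime import datetime, timedelta

INDEX_INTERVAL = timedelta(hours=4)      # from the post: we index every four hours
PROCESSING_TIME = timedelta(minutes=30)  # assumption, inferred from the minimum delay


def searchable_at(posted, first_run):
    """Estimate when a post published at `posted` becomes searchable.

    `first_run` is the start time of any past indexing run (hypothetical here).
    """
    # Count whole intervals elapsed, rounding up to the next run that starts
    # at or after the post was published.
    runs, remainder = divmod(posted - first_run, INDEX_INTERVAL)
    if remainder:
        runs += 1
    next_run = first_run + runs * INDEX_INTERVAL
    return next_run + PROCESSING_TIME


first_run = datetime(2012, 1, 19, 0, 0)

# Best case: posted right as a run starts.
lucky = datetime(2012, 1, 19, 4, 0)
print(searchable_at(lucky, first_run) - lucky)

# Worst case: posted just after a run started, so it waits for the next one.
unlucky = datetime(2012, 1, 19, 0, 1)
print(searchable_at(unlucky, first_run) - unlucky)
```

Under these assumptions the first post waits 30 minutes and the second waits 4 hours 29 minutes, which matches the range given above.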

Cool feature: you can link directly to searches! For instance, if you were interested in racing celebrity Han Han (韩寒), who has been under fire recently, you could use links such as these:
http://research.jmsc.hku.hk/social/search.py/qqweibo/?q=韩寒
http://research.jmsc.hku.hk/social/search.py/sinaweibo/?q=韩寒
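
If you want to generate such links from a script, a minimal Python sketch follows. The URL pattern and the `q` parameter come from the links above; the `search_link` helper is made up for illustration. Browsers percent-encode the Chinese characters for you, but in a script it is safer to do it explicitly.

```python
from urllib.parse import urlencode

BASE_URL = "http://research.jmsc.hku.hk/social/search.py"
ARCHIVES = ("sinaweibo", "qqweibo")  # the two path segments used in the links above


def search_link(archive, query):
    """Build a direct link to a search on one of the two archives."""
    if archive not in ARCHIVES:
        raise ValueError(f"unknown archive: {archive!r}")
    # urlencode percent-encodes the query as UTF-8 by default.
    return f"{BASE_URL}/{archive}/?{urlencode({'q': query})}"


print(search_link("sinaweibo", "韩寒"))
# http://research.jmsc.hku.hk/social/search.py/sinaweibo/?q=%E9%9F%A9%E5%AF%92
```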

Another cool feature: Google Translate! Write your search query in your own language, and behind the scenes, we’ll try to send it to the Google Translate API. You’ll know whether it worked when you get your results.

4 responses to “Searching our Sina and QQ Weibo archive”

  1. mr.周 says:

    Great feature, thanks. I have a question though.

    Search with a date-range doesn’t respond, so I seem to be doing something wrong. For example, [2012-01-01[00:00]] as start date should work right?

  2. Great resource.

    Thanks for making it available.

  3. mr.周 says:

    Just saw your reply, it worked this morning, thanks.
