We built a search engine a while ago for the Sina Weibo archive and, since yesterday, it also covers the QQ Weibo archive. We use Lucene as the indexer (for quick full-text searches) and store all linked information in our standard database. Unlike the real search engines on the Sina and QQ Weibo websites, we don’t currently implement any relevance weighting: the results are simply everything we have, ordered by publication date.
We index every four hours, so there’s a delay of at least 30 minutes and at most around 4 hours 30 minutes. There’s paging, too. Because we’re not Google, be aware that queries normally take up to 1 minute to run (more if there’s a lot of activity on the server). Searching by region / province on the Sina search is also uber-slow.
Cool feature: you can link directly to searches! For instance, if you were interested in racing celebrity Han Han (韩寒), who has been under fire recently, you could use links such as these:
http://research.jmsc.hku.hk/social/search.py/qqweibo/?q=韩寒
http://research.jmsc.hku.hk/social/search.py/sinaweibo/?q=韩寒
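If you want to generate such links from a script, here is a minimal Python sketch. The base URL, the two archive names and the `q` parameter come from the links above. The helper name `build_search_url` is made up for illustration, and it assumes the server accepts a UTF-8 percent-encoded query, which is what browsers normally send.

```python
from urllib.parse import urlencode

# Base URL and archive names are taken from the example links in this post.
BASE_URL = "http://research.jmsc.hku.hk/social/search.py"
ARCHIVES = ("sinaweibo", "qqweibo")


def build_search_url(archive, query):
    """Return a direct link to a search (hypothetical helper, not part of the site)."""
    if archive not in ARCHIVES:
        raise ValueError("archive must be one of: " + ", ".join(ARCHIVES))
    # urlencode percent-encodes non-ASCII text as UTF-8 by default (assumed acceptable to the server).
    return BASE_URL + "/" + archive + "/?" + urlencode({"q": query})


if __name__ == "__main__":
    for name in ARCHIVES:
        print(build_search_url(name, "韩寒"))
```

Pasting the raw characters into the browser address bar, as in the links above, should be equivalent, since the browser does the encoding for you.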
Another cool feature: Google Translate! Write your search query in your own language and, behind the scenes, we’ll try to send it to the Google Translate API. You’ll know whether it worked when you get your results.
Great feature, thanks. I have a question though.
Searching with a date range doesn’t respond, so I seem to be doing something wrong. For example, [2012-01-01[00:00]] as the start date should work, right?
Try without the square brackets. I think it works, unless there was a temporary bug. 🙂
Great resource.
Thanks for making it available.
Just saw your reply, it worked this morning, thanks.