What an inconsistent API…

The Sina Weibo API does not always return what you expect. The web interface is one way to access the data posted on Weibo, but the API (application programming interface) is what programmers and applications talk to. If a post is gone (for whatever reason) from the Weibo website, is it really gone from Weibo’s databases as well, and gone if you try to access it through the API?

Take the example of post #3351528048407216, made by a user called 北京徐晓, an author living in Beijing. It was posted on Monday night, just past midnight.

The microblogger wrote: “太好玩了,党报《光明日报》网站发表文章,抨击骆家辉轻车简从的背后,是资本主义及西方价值观的渗透,是美国的“新殖民主义”、“文化殖民主义”的体现。恼羞成怒挂不住了不如直接说,何必这么牛头不对马嘴的瞎拽呢?” Google Translate renders this as “Too much fun, the party newspaper “Guangming Daily” Web site published an article criticizing Locke pomp behind the Western values of capitalism and the infiltration of America’s “new colonialism”, “cultural colonialism” is all about. Angry embarrassing as a direct say, irrelevant of the blind so why pull it?”

The article (our snapshot) was one of the most popular in the last 24 hours. Traces of the post cannot be found on the user’s timeline (see screenshot): there is now a gap between 00:37 and 00:48, whereas the post was made at 00:46 on August 29th.

The following screenshot shows how it now appears on the page of one of the users (we have counted 27,536 posts in our archive so far) who reposted the post in the meantime:

The message on the website is “該微博已被刪除”, which is “This Weibo has already been deleted” (example here). It’s different from the message “此微博已被原作者刪除”, which is “This Weibo has been deleted by the user” (example here), and which may also appear on your timeline when a post you reposted was deleted by the original user (but your message remains intact).

So what happens if you take the ID of the post (3351528048407216) and query it against the API (link, may not work if you are not logged in to Weibo)? The post is still accessible from a programmer’s standpoint.

EDIT 2012-02-02: It seems that Sina has changed the error message for deleted posts. On the normal website, self-deleted and presumably system-deleted posts are now indistinguishable. But if you look at deleted posts through the API (using the statuses/show function), they definitely are not: a self-deleted post says “weibo does not exist” and a system-deleted post says “permission denied”. We have just started investigating different deleted posts through a fully automated method.
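To make the distinction concrete, here is a minimal Python sketch of how such an automated check might sort statuses/show responses. It is not our actual tool. The two error strings are the ones quoted above; everything else is an assumption for illustration: that errors arrive as a JSON object with an "error" field, that the post ID is passed as an "id" parameter, and that the caller supplies the endpoint URL and credentials. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Error strings quoted in the post; matched as case-insensitive substrings.
SELF_DELETED = "weibo does not exist"
SYSTEM_DELETED = "permission denied"


def classify_status(response: dict) -> str:
    """Classify a parsed statuses/show response.

    Assumption: an error response is a JSON object with an "error"
    string field; a response without that field is a live post.
    """
    error = str(response.get("error", "")).lower()
    if not error:
        return "accessible"
    if SELF_DELETED in error:
        return "self-deleted"
    if SYSTEM_DELETED in error:
        return "system-deleted"
    return "unknown"


def fetch_status(endpoint: str, post_id: str, params: dict) -> dict:
    """Fetch one post as parsed JSON.

    `endpoint` and `params` (credentials etc.) are supplied by the caller;
    the "id" parameter name is an assumption, not taken from the post.
    """
    query = urllib.parse.urlencode({"id": post_id, **params})
    try:
        with urllib.request.urlopen(f"{endpoint}?{query}", timeout=10) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # Assumption: error responses carry a JSON body alongside a
        # non-2xx HTTP status, so we parse the body instead of failing.
        return json.loads(exc.read().decode("utf-8"))
```

For example, `classify_status({"error": "permission denied"})` returns `"system-deleted"`, while a response with no "error" field is reported as `"accessible"`. Unrecognised error strings fall through to `"unknown"` instead of being guessed at, which matters if Sina changes its messages again.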

