To make sharing content easier, I’ve decided to move my personal work blog to Tumblr. Thus, The Rice Cooker has now become The Electric Rice Cooker.
Screenshot of completed buildings map in Hong Kong (2005-2011)
The following is a map of completed buildings in Hong Kong from 2005 to 2011, according to data from the Buildings Department as processed by us (errors may occur): http://opengov.jmsc.hku.hk/datamap/#completed . It is still a work in progress, so there may be bugs as well.
Using a text processing tool on Linux, we extracted the text of section 5.6 from each of the Buildings Department’s monthly PDF digests (here’s one of the 80-something published between 2005 and 2011).
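For the curious, here is a minimal Python sketch of how one section could be pulled out of the extracted text. It is not the script we actually ran. It assumes that each section starts with a heading line of the form "Table 5.6 ..." and runs until the next "Table N.N" heading; the function name extract_section and the sample text are made up for illustration.

```python
import re

# Assumed heading format: a line starting with "Table <number>.<number>".
HEADING = re.compile(r"^\s*Table\s+(\d+\.\d+)\b")


def extract_section(text, section="5.6"):
    """Return the non-empty lines found under the given section heading."""
    rows = []
    inside = False
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading either opens the wanted section or closes it.
            inside = match.group(1) == section
            continue
        if inside and line.strip():
            rows.append(line.strip())
    return rows


# Made-up sample standing in for the text of one monthly digest.
sample = (
    "Table 5.5 Some other table\n"
    "row A\n"
    "Table 5.6 Completed buildings\n"
    "row B1\n"
    "\n"
    "row B2\n"
    "Table 5.7 Yet another table\n"
    "row C\n"
)
section_rows = extract_section(sample)
```

Running this over each digest and concatenating the results would give one long list of raw rows, which still needs cleaning afterwards.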
The data was cleaned with Google Refine to the best of our abilities, and mapped with Google Fusion Tables according to the town planning units. We couldn’t map by address because the addresses were often messy, and those in the New Territories often referred to their lot number only.
There is still a lot of semi-open data (semi-open because it is not originally in a raw text format) provided by the Hong Kong government that could use a bit of repackaging. We’ll get back to you soon on this.
Unfortunately, we noticed that we’re not set up to handle external requests to our search tool and the other tools mentioned in the previous posts. If you need data, please contact Dr. King-wa Fu directly: kwfu@hku.hk
Last week, I started working with data from Buildings Department, concerning building permits.
Despite the PDF documents being “protected” (preventing copying when opening with Acrobat), you can use a common utility for Linux called lesspipe, a pre-processor for less, that can process many file types into readable text.
Readable does not necessarily mean structured. The lesspipe output is by no means usable as is (it looks like this after separating the sections and aggregating across different PDF files). With the fantastic Google Refine tool, however, you can try your best to parse the data, clean the different fields manually and then even perform geocoding inside the tool (with “Add column by fetching URLs”).
After the cleaning was done (it took a few hours last Thursday, and a few more hours today), I did an export in TSV, and sent it to Google Fusion Tables. I customized the map visualisation with the “month” field, and here is the result:
2005-2011 data for “Table 5.2 Buildings for which building authority has issued demolition consent” from Hong Kong Buildings Department’s monthly digests (alpha)
This is not even close to our final product yet, because the Google Maps JavaScript API V3 now lets you add layers from Fusion Tables data. Effectively, it means that you can build Web applications with different kinds of filters (in pull-down menus, etc.) that dynamically change how the data is displayed. The example above only shows the single view specified inside Fusion Tables by the owner of the table (me). You could possibly take the ID of the table (3546150) and make your own visualisation.
For now, the data hasn’t been vetted after refining (maybe the govt will provide us with raw data?), so I would recommend treating its validity with great caution. It should be largely correct, but some data points may not have been geocoded properly, if at all. For this particular data, corrigenda to the Buildings Department monthly digests are not yet taken into account.
Here is another Google Refine + Google Fusion Tables trick on Hong Kong government data:
Map for data from “Short Term Tenancy (STT) Tender Forecast” from Hong Kong Lands Department (alpha)
This is the Short Term Tenancy (STT) Tender Forecast from the Lands Department. It lists the sites to be offered on short term tenancy, for a few years, for uses such as car parks. The color code on this custom map is based on the area of each site in square meters (from purple for 0-1000 sqm to red for 5000+).
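The color coding itself was set up inside Fusion Tables, but the idea can be sketched in a few lines of Python. Only the two ends of the scale (purple for 0-1000 sqm, red for 5000+) come from the map; the intermediate breakpoints and colors below are placeholders I made up, and I am assuming each bucket includes its lower bound and excludes its upper bound.

```python
# Upper bounds (exclusive) in square meters, with the color for each bucket.
# Only "purple" (0-1000) and "red" (5000+) come from the actual map;
# the two middle buckets are hypothetical.
BUCKETS = [
    (1000, "purple"),
    (2500, "blue"),    # hypothetical intermediate bucket
    (5000, "yellow"),  # hypothetical intermediate bucket
]


def color_for_area(sqm):
    """Pick a marker color from a site's area in square meters."""
    if sqm < 0:
        raise ValueError("area cannot be negative")
    for upper_bound, color in BUCKETS:
        if sqm < upper_bound:
            return color
    return "red"  # 5000 sqm and above
```

For example, color_for_area(800) falls in the first bucket and color_for_area(12000) in the last one.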
Ever looked for an automated archive of Chinese news websites? For months, we’ve been collecting screenshots and HTML snapshots of up to 20 websites based in China or covering China. We now have a webpage for it.
http://research.jmsc.hku.hk/social/chinanews/index.py/listScreenshots/
The screenshots are classified by news source and presented through a minimalistic (if not just minimal) interface, organised by day and grouped by month. For instance, you could go to the QQ News archive, the archive for February 2011, or a particular link to today.
There’s also a version for accessing navigable HTML pages when available.
I’ll start filtering weibos that contain links to photo.weibo.com and event.weibo.com. It seems like most of these links go to spam-like galleries. Bad, bad, mega-bad. The amount of spam is just ridiculous today.
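A rough Python sketch of such a filter, not the code running on our server: it assumes the full (unshortened) URL appears in the text of the weibo, and the names SPAM_LINK, has_spam_link and drop_spam are made up for this example.

```python
import re

# Matches http(s) links whose host starts with photo.weibo.com or
# event.weibo.com. The lookahead stops "photo.weibo.community" from matching.
SPAM_LINK = re.compile(
    r"https?://(?:photo|event)\.weibo\.com(?![A-Za-z0-9-])",
    re.IGNORECASE,
)


def has_spam_link(text):
    """True if the weibo text contains a link to one of the two hosts."""
    return SPAM_LINK.search(text) is not None


def drop_spam(weibos):
    """Keep only the weibos that do not link to the two hosts."""
    return [text for text in weibos if not has_spam_link(text)]


# Made-up sample posts.
posts = [
    "看图 http://photo.weibo.com/123/albums",
    "新闻 http://news.sina.com.cn/x",
    "活动 http://event.weibo.com/456",
    "no link at all",
]
kept = drop_spam(posts)
```

If the links in the collected weibos are shortened, they would have to be expanded first, which this sketch does not attempt.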