Importing Hong Kong boundary shapefiles into a database

For current and, mostly, future projects, I needed digital geographic boundaries of Hong Kong, by district and by the finer subdivisions used by Hong Kong's census bureau.

First of all, I checked out the Hong Kong 2001 Population Census MAP CD-ROM from our HKU Library, by special arrangement. The CD can be bought from the Census bureau for HK$840, but I am not sure about its usage license. As far as we know, there’s no geographic data for the 2006 census, unless the boundaries stayed the same (I didn’t verify this).

On the CD-ROM, you find a bunch of files which I think you could install (I didn’t need to, so I didn’t try to find out). What’s useful to us is contained in two folders, one called SHP and the other E00. They both contain regular flat files, in two of the most popular digital cartography formats. The Readme lists the following layers:

(i) Coastline (COAST)
(ii) District Council District Boundary (DC)
(iii) District Council District (Land Area) Boundary (DC_LAND)
(iv) Constituency Area Boundary (DCCA)
(v) Large Tertiary Planning Unit Group Boundary (TPU_LARGE)
(vi) Small Tertiary Planning Unit Group Boundary (TPU_SMALL)
(vii) Tertiary Planning Unit Boundary (TPU)
(viii) Small Street Block Group Boundary (TPUSB_SMALL)
(ix) New Town Boundary (DIST_NT)

You get files for these boundaries in both formats. I am going to use SHP, or Shapefile, ESRI’s format, in which the .shp file holds the primary geographic data. E00 is the ArcInfo interchange (export) format, which also comes from ESRI rather than from MapInfo. We don’t have ArcGIS or any ESRI products in our lab, so we will use a PostgreSQL database with PostGIS, everyone’s favourite geographic database extension.

I use a utility command called shp2pgsql to turn the data in the shapefile into a table with a geometry column. There are also “shp2…” converters for MySQL and for exporting as an image.

I follow the PostGIS documentation, and here’s a slightly modified command that I use (not all options are necessary):

# shp2pgsql -c -I -W big5-hkscs dc_land.shp dc_land > dc_land.sql
# psql -d jmsc -f dc_land.sql
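To repeat the same two steps for every layer listed in the Readme, a small Python sketch like the one below could generate the commands. It only builds and prints the command lines and does not run anything. The flags and the jmsc database name come from the command above. The lowercase file names for the other layers are an assumption based on dc_land.shp, and the function names are made up for illustration.

```python
import shlex

# Layer names from the Readme. Lowercase .shp file names are an assumption,
# extrapolated from the dc_land.shp example; check them against the SHP folder.
LAYERS = [
    "coast", "dc", "dc_land", "dcca", "tpu_large",
    "tpu_small", "tpu", "tpusb_small", "dist_nt",
]


def shp2pgsql_command(layer, encoding="big5-hkscs"):
    """Argument list for converting <layer>.shp into SQL for a table named <layer>."""
    return ["shp2pgsql", "-c", "-I", "-W", encoding, f"{layer}.shp", layer]


def psql_command(layer, database="jmsc"):
    """Argument list for loading <layer>.sql into the given database."""
    return ["psql", "-d", database, "-f", f"{layer}.sql"]


def import_script(layers=LAYERS):
    """Return the shell lines that convert and then load each layer."""
    lines = []
    for layer in layers:
        convert = shlex.join(shp2pgsql_command(layer))
        lines.append(f"{convert} > {shlex.quote(layer + '.sql')}")
        lines.append(shlex.join(psql_command(layer)))
    return "\n".join(lines)


print(import_script())
```

The printed lines can be pasted into a shell or saved as a script, which keeps the actual import step easy to inspect before running it.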

The -W option is required if you want to keep the Chinese characters in the attribute columns, which in this case, with the district land boundaries, give us the district names in Chinese. The right encoding to use is big5-hkscs, which is Big5 plus the Hong Kong Supplementary Character Set, an extra set of Chinese characters used in Hong Kong, many of them for written Cantonese. I first used big5 and it gave me an error when trying to process the Sham Shui Po row (because 埗, the “Po”, is one of those supplementary characters).
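The difference between the two encodings can be checked without touching the database. The sketch below uses Python's built-in codecs (big5 and big5hkscs), not shp2pgsql itself, so it only illustrates the character-set issue. The helper name can_decode is made up for this example.

```python
def can_decode(raw: bytes, encoding: str) -> bool:
    """Return True if the bytes decode cleanly with the given encoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# "Sham Shui Po" as it would be stored in a Big5-HKSCS attribute column.
raw = "深水埗".encode("big5hkscs")

for encoding in ("big5hkscs", "big5"):
    print(encoding, can_decode(raw, encoding))
```

The first two characters are plain Big5. It is the last one, 埗, that only decodes when the HKSCS extension is available.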

Once the boundaries are in the database, you can mash them up with other data (like census data) and export the result for visualization apps, for example as KML for Google Earth using libkml.
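As a rough idea of what the KML side looks like, here is a minimal sketch that wraps one polygon ring in a KML document. It uses Python's standard library instead of libkml, the function name is made up, and the coordinates are an invented square, not a real district boundary. KML expects longitude and latitude in WGS84, so if the shapefiles use a projected grid, the geometry would need reprojecting first (for example with PostGIS's ST_Transform).

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def polygon_to_kml(name, ring):
    """Wrap one closed ring of (lon, lat) pairs in a minimal KML document."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace

    def tag(local):
        return f"{{{KML_NS}}}{local}"

    kml = ET.Element(tag("kml"))
    document = ET.SubElement(kml, tag("Document"))
    placemark = ET.SubElement(document, tag("Placemark"))
    ET.SubElement(placemark, tag("name")).text = name
    polygon = ET.SubElement(placemark, tag("Polygon"))
    boundary = ET.SubElement(polygon, tag("outerBoundaryIs"))
    linear_ring = ET.SubElement(boundary, tag("LinearRing"))
    # KML coordinates are "lon,lat" tuples separated by whitespace.
    ET.SubElement(linear_ring, tag("coordinates")).text = " ".join(
        f"{lon},{lat}" for lon, lat in ring
    )
    return ET.tostring(kml, encoding="unicode")


# Invented square for illustration only; the first and last points match
# so that the ring is closed.
SAMPLE_RING = [
    (114.15, 22.32), (114.17, 22.32), (114.17, 22.34),
    (114.15, 22.34), (114.15, 22.32),
]

print(polygon_to_kml("Sample area", SAMPLE_RING))
```

In practice the ring would come from the geometry column of an imported table, and the name from one of its attribute columns.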
