Feeding WordPress articles to a Google Map

Soon after the release of the China Media Map last week, we implemented a new feature of the map : geo-tagged events. Unfortunately, there isn’t yet a content management system running behind the map, and just a plain database that isn’t easy to update for our editors.

What we decided upon to feed the map with points of interest ended up being the same WordPress blog powering the CMP site. I used two third-party WordPress plugins, wp-json and WP Geo.

WP Geo was chosen for its user-friendly interface that my colleague David Bandurski uses to geotag articles, with the latitude, longitude, zoom level, etc., stored as numbers (rather than in true spatial representation) in specific WordPress meta tables. wp-json in turn extends the feed system by providing a feed of latest entries. I modified the script a little (my version of wp-json.php) so that it also generates the fields stored by WP Geo, as well as to feed by the category specified in the URL. You should change the “~” that we use in place of the default “category” string used to indicate category archives). For better management of articles made specifically for the map, I created special categories starting with “geo-” and whose articles are not displayed on the regular website. Here’s an example of the geo-events JSON feed.

Screenshot-China Media Map - Events

From the map, you would asynchronously get the JSON file from cmp.hku.hk. But being on a different server and domain name, you would necessarily encounter the client-side cross-domain security problem. The solution? Use a proxy script. I wrote one in Python (proxy.py) that goes as follows:

#!/usr/bin/env python
#-*- coding: utf-8
import sys
import pg
import mod_python
import json
import pycurl
import StringIO

def cmp_jsonfeed(req, p):
	prefix = "http://cmp.hku.hk/~/"
	suffix = "/feed/json/"
	url = prefix + p + suffix
	c = pycurl.Curl()
	b = StringIO.StringIO()
	c.setopt(pycurl.URL, url)
	#c.setopt(pycurl.HTTPHEADER, ["Content-Type: text/javascript"])
	c.setopt(pycurl.WRITEFUNCTION, b.write)
	c.setopt(pycurl.FOLLOWLOCATION, 1)
	c.setopt(pycurl.MAXREDIRS, 5)
	c.perform()
	return b.getvalue()

I tried the best I could to be secure and not allow just everyone to use my script to make proxy requests of anything. Argument “p” sticks itself between specified prefix and suffix. The following link gets the geo-events JSON feed, brings it to the same domain where you’re making the asynchronous javascript call from, thereby bypassing the cross-domain problem:

http://jmsc.no-ip.org/maps/cmp/proxy.py/cmp_jsonfeed?p=geo-events

The rest is just a matter of parsing the JSON feed in JavaScript and creating the markers and other spaces to fill with the data.


Python script for automated geocoding with Google’s Geocoding Web Service

I have a bunch of known addresses or locations as strings (varchars) in my database and I want to store as points in a spatial database. And I also don’t want to query the Google Maps API every time I need to place them on a map. What do I do? I geocode them using the Google Maps API Geocoding Service, and more specifically the Web service directly accessible through a simple URL. Demonstration with two of the supported formats (JSON and CSV):

http://maps.google.com/maps/geo?q=China&output=json&sensor=false

http://maps.google.com/maps/geo?q=China&output=csv&sensor=false

Note: my script uses the Geocoding V2 Web Service, but could be easily adapted to take advantage of V3’s extra features.

Here’s my geocode.py script:

import sys
import pg
import httplib
import time
import string
import urllib
import urllib2

if len(sys.argv) > 1:
	q = ""
	for i in range(1,len(sys.argv)):
		q += " " + string.lower(sys.argv[i])
	q = string.strip(q)
	pgconn = pg.connect('YOUR_DB', '127.0.0.1', 5432, None, None, 'YOUR_DBUSER', 'YOUR_PASSWORD')
	key = 'YOUR_GOOGLE_KEY'
	#path = "/maps/geo?q=%(q)s&gl=ca&sensor=false&output=csv&key=%(key)s" % {'q' : urllib.urlencode(q), 'key' : key}
	china_bounds = '18.0,73.45|53.55,134.8'
	values = {'q' : q, 'key' : key, 'sensor' : 'false', 'output' : 'csv', 'region' : 'cn', 'bounds' : china_bounds}
	data = urllib.urlencode(values)
	headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"}
	conn = httplib.HTTPConnection("maps.google.com")
	try:
		conn.request("GET", "/maps/geo?" + data)
		#print data
	except (Exception):
		sys.exit(sys.exc_info())
	r = conn.getresponse()
	if r.status == 200:
		a = r.read().split(',')
		acc = a[1]
		lat = a[2]
		lng = a[3]
		print '%(status)s: %(latlng)s' % {'status' : r.status, 'latlng' : lng + ',' + lat}
		if lat == "0" and lng == "0":
			sys.exit("Location not found for "%s" " % q)
		wkt = "POINT(" + lng + " " + lat + ")"
		sel = pgconn.query("SELECT * FROM google_geocoding WHERE query = '%(q)s' " % { 'q': q, 'wkt': wkt})
		res = sel.dictresult()
		if sel.ntuples() > 0:
			pgconn.query("UPDATE google_geocoding SET point = ST_GeomFromText('%(wkt)s',4326), fetchdate = NOW() WHERE query = '%(q)s' " % { 'q': q, 'wkt': wkt})
			print "UPDATE: writing over last fetchdate of " + res[0]['fetchdate']
		else:
			pgconn.query("INSERT INTO google_geocoding (query, fetchdate, point) VALUES ('%(q)s', NOW(), ST_GeomFromText('%(wkt)s',4326)) " % { 'q': q, 'wkt': wkt})
			print "INSERT: a new row was added"
	else:
		print 'error: %s' % r.status
else:
	sys.exit("usage: getPoint.py [query]")

This Python script takes a single query (potentially with spaces) as an argument and sends it to the Google Geocoding Web Service. It gets back the result, parses it, and puts it in a database table called google_geocoding, which I use later. It has four columns: a unique id, the query string, the point geometry column, and the timestamp of its last update (I made my table unique on the query string).

In a practical use of this script, on the command-line, I would read from a file containing the strings to query line by line (in a loop), and then send it to the python script. Here’s one improvised:

for i in `cat text_file_with_strings_to_query.txt`
	do echo $i
	python ./geocode.py $i
done

Using MapServer to feed tiles to Google Maps

While working on the CMP‘s China Media Map, I had the chance to use MapServer, an Open Source platform for publishing spatial data and interactive mapping applications on the Web.

We start with a need: display big datasets of geo-tagged data in an efficient manner for the Web. What do you do? We have over 4000 data points corresponding to individual news agencies in China. Almost each of these points (containing an address) were geotagged using Google’s geocoding service (method shown in another entry). All these points are modeled in a Postgis-enabled PostgreSQL database. We use Google Maps as our display platform. How can we display the points?

One of the early versions, which I knew wouldn’t work in a production environment, was to generate all points as markers. Of course, when you are in the lowest zoom level, at the country-level (China, here), 4000 points risks crashing any more or less recent and decent computer. The alternative is to give users a preview of those 4000+ points, without creating the marker object. What do you do? You check this video out first:

This was shown at Google I/O 2009, and the part of interest is at about 26:31 into the video. This is where the presenter talks about tiny markers. How does Google do it? They don’t actually go into details in the video.

MapServer is the tool you need to use. One of its many uses is to help you create your own custom layer. As you might know, Google Maps works by feeding you 256x256px squares of their map data. These tiles are generated by Google’s in-house map tiles server, which gives the following:

http://mt0.google.com/vt/lyrs=m@124&hl=en&x=836&y=446&z=10&s=Ga

But it is possible to produce your own, with data coming from your own spatial database. Google has an outline of how it’s possible to integrate these easily with their version 3 of the Google Maps API (still in Labs).

You may use MapServer, which has binaries available the major platforms (Windows, Mac, Linux). I use Linux, the Ubuntu distro, so all I had to do was set up the UbuntuGIS Personal Package Archive (PPA) in Unstable to get the latest version of all the GIS software you need.

Use a configuration file such as the one I have: news.map

MAP
	DEBUG	5
	CONFIG "MS_ERRORFILE" "/home/csam/geo/mapserver/error.log"
	FONTSET "/home/csam/geo/mapserver/fontset.txt"
	IMAGETYPE	PNG
	#EXTENT	105 15 132 54
	EXTENT	-40.900557 -99.293024 68.194131 174.885971
	#EXTENT	-141.018073 41.676949 -52.582296 89.999427
	SIZE	256 256
	IMAGECOLOR	255 255 255
	TRANSPARENT	ON
	OUTPUTFORMAT
		NAME png
		DRIVER "GD/PNG"
	MIMETYPE "image/png"
		IMAGEMODE RGB
	EXTENSION "png"
	FORMATOPTION "INTERLACE=OFF"
	END
	PROJECTION	"init=epsg:4326"
	END
	WEB
	METADATA
		WMS_SRS "EPSG:4326 EPSG:900913"
	END
	END
	LAYER
		NAME		news_overview
	CONNECTIONTYPE POSTGIS
	CONNECTION "YOUR_OWN_CONNECTION_STRING"
	DATA	"point FROM (SELECT g.gid, g.point, na.name AS placename FROM google_geocoding AS g LEFT JOIN news_agencies AS na ON g.query = lower(na.address) WHERE na.new_id IS NOT NULL UNION SELECT g.gid, g.point, na.name AS placename FROM google_geocoding AS g LEFT JOIN news_agencies AS na ON g.query = lower(na.name) WHERE na.address IS NULL AND na.new_id IS NOT NULL) AS subquery USING UNIQUE gid USING srid=4326 "
	STATUS	ON
	TYPE	POINT
	CLASS
		NAME	"Points"
		STYLE	
		SIZE	3
		COLOR   255 0 0 
		END
	END
	END
	LAYER
		NAME		news
	CONNECTIONTYPE POSTGIS
	CONNECTION "YOUR_OWN_CONNECTION_STRING"
	DATA	"point FROM (SELECT g.gid, g.point, na.name AS placename FROM google_geocoding AS g LEFT JOIN news_agencies AS na ON g.query = lower(na.address) WHERE na.new_id IS NOT NULL UNION SELECT g.gid, g.point, na.name AS placename FROM google_geocoding AS g LEFT JOIN news_agencies AS na ON g.query = lower(na.name) WHERE na.address IS NULL AND na.new_id IS NOT NULL) AS subquery USING UNIQUE gid USING srid=4326 "
	STATUS	ON
	TYPE	ANNOTATION
	CLASS
		NAME	"pimple"
		STYLE
		SYMBOL	"pimple"
		SIZE	10
		COLOR   0 0 0 
		END
			LABEL
				PARTIALS FALSE
				FORCE TRUE
			END
		TEXT ' '
	END
	END
	SYMBOL
		NAME 'pimple'
	TYPE PIXMAP
	IMAGE "google-pimple.png"
	TRANSPARENT 0
	END

	LAYER
		NAME		labels
	CONNECTIONTYPE POSTGIS
	CONNECTION "YOUR_OWN_CONNECTION_STRING"
	DATA	"point FROM (SELECT g.gid, g.point, ST_AsText(g.point) AS labl FROM google_geocoding AS g LEFT JOIN news_agencies AS na ON g.query = lower(na.address) WHERE na.new_id IS NOT NULL ) AS subquery USING UNIQUE gid "
	STATUS	ON
	TYPE	ANNOTATION
	MAXSCALE 100000000000 
	LABELITEM labl
	CLASS 
		LABEL 
		ANGLE auto 
		SIZE 10 
		COLOR 192 0 0
		TYPE truetype 
		FONT arial
		END
	END 
	METADATA
		WMS_SRS "EPSG:4326 EPSG:900913"
	END
	END
END

Point to the mapserv CGI with the appropriate arguments. map is the path to your mapscript configuration file, and mode is tile for this purpose, and tilemode will be gmap for Google Maps (could be Bing Maps and OpenStreetMap too).

This is a link to a tile on the China Media Map:
http://jmsc.no-ip.org/cgi-bin/mapserv?map=/home/csam/geo/mapserver/news.map&layers=news&mode=tile&tilemode=gmap&tile=3+1+2

Both the MapServer and PostGIS documentation sites give you examples of how to write the configuration file. The open source geospatial consortium also has a page for showing you what PostGIS is, and how to use MapServer with it for the Web.