Python script for automated geocoding with Google’s Geocoding Web Service

I have a bunch of known addresses or locations as strings (varchars) in my database and I want to store as points in a spatial database. And I also don’t want to query the Google Maps API every time I need to place them on a map. What do I do? I geocode them using the Google Maps API Geocoding Service, and more specifically the Web service directly accessible through a simple URL. Demonstration with two of the supported formats (JSON and CSV):

http://maps.google.com/maps/geo?q=China&output=json&sensor=false

http://maps.google.com/maps/geo?q=China&output=csv&sensor=false

Note: my script uses the Geocoding V2 Web Service, but could be easily adapted to take advantage of V3’s extra features.

Here’s my geocode.py script:

import sys
import pg
import httplib
import time
import string
import urllib
import urllib2

if len(sys.argv) > 1:
	q = ""
	for i in range(1,len(sys.argv)):
		q += " " + string.lower(sys.argv[i])
	q = string.strip(q)
	pgconn = pg.connect('YOUR_DB', '127.0.0.1', 5432, None, None, 'YOUR_DBUSER', 'YOUR_PASSWORD')
	key = 'YOUR_GOOGLE_KEY'
	#path = "/maps/geo?q=%(q)s&gl=ca&sensor=false&output=csv&key=%(key)s" % {'q' : urllib.urlencode(q), 'key' : key}
	china_bounds = '18.0,73.45|53.55,134.8'
	values = {'q' : q, 'key' : key, 'sensor' : 'false', 'output' : 'csv', 'region' : 'cn', 'bounds' : china_bounds}
	data = urllib.urlencode(values)
	headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"}
	conn = httplib.HTTPConnection("maps.google.com")
	try:
		conn.request("GET", "/maps/geo?" + data)
		#print data
	except (Exception):
		sys.exit(sys.exc_info())
	r = conn.getresponse()
	if r.status == 200:
		a = r.read().split(',')
		acc = a[1]
		lat = a[2]
		lng = a[3]
		print '%(status)s: %(latlng)s' % {'status' : r.status, 'latlng' : lng + ',' + lat}
		if lat == "0" and lng == "0":
			sys.exit("Location not found for "%s" " % q)
		wkt = "POINT(" + lng + " " + lat + ")"
		sel = pgconn.query("SELECT * FROM google_geocoding WHERE query = '%(q)s' " % { 'q': q, 'wkt': wkt})
		res = sel.dictresult()
		if sel.ntuples() > 0:
			pgconn.query("UPDATE google_geocoding SET point = ST_GeomFromText('%(wkt)s',4326), fetchdate = NOW() WHERE query = '%(q)s' " % { 'q': q, 'wkt': wkt})
			print "UPDATE: writing over last fetchdate of " + res[0]['fetchdate']
		else:
			pgconn.query("INSERT INTO google_geocoding (query, fetchdate, point) VALUES ('%(q)s', NOW(), ST_GeomFromText('%(wkt)s',4326)) " % { 'q': q, 'wkt': wkt})
			print "INSERT: a new row was added"
	else:
		print 'error: %s' % r.status
else:
	sys.exit("usage: getPoint.py [query]")

This Python script takes a single query (potentially with spaces) as an argument and sends it to the Google Geocoding Web Service. It gets back the result, parses it, and puts it in a database table called google_geocoding, which I use later. It has four columns: a unique id, the query string, the point geometry column, and the timestamp of its last update (I made my table unique on the query string).

In a practical use of this script, on the command-line, I would read from a file containing the strings to query line by line (in a loop), and then send it to the python script. Here’s one improvised:

for i in `cat text_file_with_strings_to_query.txt`
	do echo $i
	python ./geocode.py $i
done

Leave a Reply