I have a bunch of known addresses or locations as strings (varchars) in my database and I want to store as points in a spatial database. And I also don’t want to query the Google Maps API every time I need to place them on a map. What do I do? I geocode them using the Google Maps API Geocoding Service, and more specifically the Web service directly accessible through a simple URL. Demonstration with two of the supported formats (JSON and CSV):
http://maps.google.com/maps/geo?q=China&output=json&sensor=false
http://maps.google.com/maps/geo?q=China&output=csv&sensor=false
Note: my script uses the Geocoding V2 Web Service, but could be easily adapted to take advantage of V3’s extra features.
Here’s my geocode.py script:
import sys import pg import httplib import time import string import urllib import urllib2 if len(sys.argv) > 1: q = "" for i in range(1,len(sys.argv)): q += " " + string.lower(sys.argv[i]) q = string.strip(q) pgconn = pg.connect('YOUR_DB', '127.0.0.1', 5432, None, None, 'YOUR_DBUSER', 'YOUR_PASSWORD') key = 'YOUR_GOOGLE_KEY' #path = "/maps/geo?q=%(q)s&gl=ca&sensor=false&output=csv&key=%(key)s" % {'q' : urllib.urlencode(q), 'key' : key} china_bounds = '18.0,73.45|53.55,134.8' values = {'q' : q, 'key' : key, 'sensor' : 'false', 'output' : 'csv', 'region' : 'cn', 'bounds' : china_bounds} data = urllib.urlencode(values) headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain"} conn = httplib.HTTPConnection("maps.google.com") try: conn.request("GET", "/maps/geo?" + data) #print data except (Exception): sys.exit(sys.exc_info()) r = conn.getresponse() if r.status == 200: a = r.read().split(',') acc = a[1] lat = a[2] lng = a[3] print '%(status)s: %(latlng)s' % {'status' : r.status, 'latlng' : lng + ',' + lat} if lat == "0" and lng == "0": sys.exit("Location not found for "%s" " % q) wkt = "POINT(" + lng + " " + lat + ")" sel = pgconn.query("SELECT * FROM google_geocoding WHERE query = '%(q)s' " % { 'q': q, 'wkt': wkt}) res = sel.dictresult() if sel.ntuples() > 0: pgconn.query("UPDATE google_geocoding SET point = ST_GeomFromText('%(wkt)s',4326), fetchdate = NOW() WHERE query = '%(q)s' " % { 'q': q, 'wkt': wkt}) print "UPDATE: writing over last fetchdate of " + res[0]['fetchdate'] else: pgconn.query("INSERT INTO google_geocoding (query, fetchdate, point) VALUES ('%(q)s', NOW(), ST_GeomFromText('%(wkt)s',4326)) " % { 'q': q, 'wkt': wkt}) print "INSERT: a new row was added" else: print 'error: %s' % r.status else: sys.exit("usage: getPoint.py [query]")
This Python script takes a single query (potentially with spaces) as an argument and sends it to the Google Geocoding Web Service. It gets back the result, parses it, and puts it in a database table called google_geocoding, which I use later. It has four columns: a unique id, the query string, the point geometry column, and the timestamp of its last update (I made my table unique on the query string).
In a practical use of this script, on the command-line, I would read from a file containing the strings to query line by line (in a loop), and then send it to the python script. Here’s one improvised:
for i in `cat text_file_with_strings_to_query.txt` do echo $i python ./geocode.py $i done
Leave a Reply
You must be logged in to post a comment.