Location Data

To begin you will need 2 sources of data, US zip code data, and population statistics.
You can often find this data for free, but you get what you pay for.

Several places to get zip data from

* www.zip-code-database.org
* zipcodedownload.com
* www.zip-codes.com
* http://databases.about.com/od/access/a/zipcodedatabase.htm

You should have at least 60-80K zip codes, and check for regular updates as well.
Try to get something that has city names, county names and full state names as well, I
currently use zipcodedownload.com

Populations statistics are sometimes included with your zip code data, but I generally
put it together from the US census ZCTA data

* http://www.census.gov/geo/ZCTA/zcta.html

The 2000 data set is fine for our purpose.

Load all of these into a database, like mysql for manipulation. You will be generating a zipcode table
and a zcta table, lots of mysql clients offer an import csv feature, or use mysql's Load INFILE.

The schema's I use are
zipcodes

    +------------+---------------+------+-----+---------+----------------+
    | Field      | Type          | Null | Key | Default | Extra          |
    +------------+---------------+------+-----+---------+----------------+
    | id         | int(11)       | NO   | PRI | NULL    | auto_increment |
    | ZIPCode    | char(5)       | YES  | MUL | NULL    |                |
    | ZIPType    | char(1)       | YES  |     | NULL    |                |
    | CityName   | varchar(64)   | YES  |     | NULL    |                |
    | CityType   | char(1)       | YES  |     | NULL    |                |
    | CountyName | varchar(64)   | YES  |     | NULL    |                |
    | CountyFIPS | char(5)       | YES  |     | NULL    |                |
    | StateName  | varchar(64)   | YES  |     | NULL    |                |
    | StateAbbr  | char(2)       | YES  |     | NULL    |                |
    | StateFIPS  | char(2)       | YES  |     | NULL    |                |
    | population | int(11)       | YES  |     | NULL    |                |
    | MSACode    | char(4)       | YES  |     | NULL    |                |
    | AreaCode   | varchar(16)   | YES  |     | NULL    |                |
    | TimeZone   | varchar(16)   | YES  |     | NULL    |                |
    | UTC        | decimal(3,1)  | YES  |     | NULL    |                |
    | DST        | char(1)       | YES  |     | NULL    |                |
    | Latitude   | decimal(9,6)  | YES  |     | NULL    |                |
    | Longitude  | decimal(9,6)  | YES  |     | NULL    |                |
    | density    | decimal(10,0) | YES  |     | NULL    |                |
    +------------+---------------+------+-----+---------+----------------+
    

And
zcta5

    +------------+--------------+------+-----+---------+-------+
    | Field      | Type         | Null | Key | Default | Extra |
    +------------+--------------+------+-----+---------+-------+
    | st         | varchar(255) | YES  |     | NULL    |       |
    | zip        | varchar(5)   | YES  | MUL | NULL    |       |
    | zcat       | varchar(255) | YES  |     | NULL    |       |
    | population | int(11)      | YES  |     | NULL    |       |
    | field5     | int(11)      | YES  |     | NULL    |       |
    | landmass     | int(11)      | YES  |     | NULL    |       |
    | field7     | int(11)      | YES  |     | NULL    |       |
    | field8     | float        | YES  |     | NULL    |       |
    | field9     | float        | YES  |     | NULL    |       |
    | field10    | float        | YES  |     | NULL    |       |
    | field11    | varchar(255) | YES  |     | NULL    |       |
    +------------+--------------+------+-----+---------+-------+
    

There are 2 fields in the zipcodes table that I generate from the zcta5 data, population and density. Population isn't really required
but it's good to have.

First population

update zipcodes a, zcta5 b set a.population = b.population where b.zip= a.ZIPCode;

You can then populate the density field of the zipcodes table with

update zipcodes a, zcta5 b set a.density = ( (b.population * 100000) / b.field6) where a.ZIPCode = b.zip

Multiplying by 100000 simply gives us an easier variable to work with.

Now extract the data to a csv file, for uploading to SOLR.

    select id, ZIPCode, CityName, CountyName,
             StateName, StateAbbr, ifnull(population,"e;"e;),
             concat( CityName,"e; "e;,StateName ), concat(CountyName, "e; "e;,StateName),
             concat(CityName,"e; "e;, CountyName), ifnull(density, "e;"e;),
             Latitude, Longitude
    into OUTFILE '/tmp/solr-zips.txt'
    FIELDS TERMINATED BY ','
    OPTIONALLY ENCLOSED BY '"e;'
    from zipcodes
    

Next we'll configure SOLR.