Background

This service estimates latitude and longitude coordinates from address strings. For more information on how the program works and how you can implement it yourself, see https://github.com/cole-brokamp/geocoder

The program implements a Ruby interface that parses US street addresses and performs a fuzzy lookup against an SQLite3 database. It returns the best matches found, with geographic coordinates estimated by address interpolation based on 2015 TIGER/Line files. It attempts to fill in missing information, and it recognizes standard and common non-standard postal abbreviations, ordinal versus cardinal numbers, and more.
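
To illustrate what address interpolation means, here is a minimal Python sketch of linear interpolation along one straight street segment. It is not the geocoder's own code: the function name, its parameters, and the coordinates in the usage note are hypothetical, and a real implementation also has to handle odd and even street sides and segments with more than two points.

```python
def interpolate_address(house_number, from_number, to_number, start, end):
    """Estimate (lat, lon) for a house number on a straight street segment.

    start and end are (lat, lon) tuples for the segment endpoints that
    correspond to from_number and to_number respectively.
    """
    if to_number == from_number:
        # Degenerate range: place every address at the segment start.
        fraction = 0.0
    else:
        fraction = (house_number - from_number) / (to_number - from_number)
    # Clamp so that numbers outside the range stay on the segment.
    fraction = max(0.0, min(1.0, fraction))
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return lat, lon
```

For example, with a made-up segment covering house numbers 100 to 200, interpolate_address(150, 100, 200, (39.0, -84.0), (39.2, -84.2)) places the address at the midpoint of the segment.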

Perhaps the most useful feature to users at CCHMC is that the process is HIPAA compliant. Geocoding is completed on a server inside the CCHMC network without using internet access.

Input

The tips below will help you get the best results from the offline geocoder. For more information on how the program works and how you can implement it yourself, including as a Docker image, see https://github.com/cole-brokamp/geocoder.

Tips for Getting the Best Results

  • omit apartment numbers and any “second address line”
  • ZIP+4 codes are ignored, but if they must be included, separate the two parts with a dash (e.g. 37209-0000 instead of 372090000)
  • capitalization does not affect results
  • separate the different address components with a space
  • abbreviations may be used (e.g. St. instead of Street or OH instead of Ohio)
  • use Arabic numerals instead of written numbers (e.g. 13 instead of thirteen)
  • spelling should be as accurate as possible, but the program performs “fuzzy matching”, so an exact match is not necessary
  • address strings with out-of-order items could return NA (e.g. 3333 Burnet Ave Cincinnati 45229 OH)
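
Two of the tips above are easy to apply automatically before submitting addresses. The following Python sketch is one possible way to do so; the function name clean_address is hypothetical and not part of the geocoder. It only collapses whitespace and adds the dash to a trailing nine-digit ZIP code. It does not remove apartment numbers, correct spelling, or reorder address components.

```python
import re


def clean_address(raw):
    """Apply two of the input tips to a single address string."""
    # Separate components with single spaces (collapses tabs and repeats).
    cleaned = re.sub(r"\s+", " ", raw).strip()
    # Add the dash to a trailing 9-digit ZIP code (372090000 -> 37209-0000).
    cleaned = re.sub(r"\b(\d{5})(\d{4})$", r"\1-\2", cleaned)
    return cleaned
```

Addresses that already use a five-digit ZIP code or a dashed ZIP+4 code pass through unchanged apart from the whitespace cleanup.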

Output

The following fields are included in the geocoded output.

Results Usable for Further Analysis

We strongly recommend using only addresses geocoded with a precision of range for further analysis or mapping.

Results from other methods are included in the output when range is not available, but they should generally not be used because they may be wildly inaccurate or imprecise. In order of decreasing accuracy, the following are the possible values for precision in the output file:

  • range: interpolated based on address ranges from street segments
  • street: center of the matched street
  • intersection: intersection of two streets
  • zip: centroid of the matched zip code
  • city: centroid of the matched city

If results geocoded with methods other than range must be used, also consider using the score field in the output file to filter out low-quality results.
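
The recommendation above can be applied with a few lines of Python. This is a sketch under stated assumptions: the output is assumed to be a CSV file with columns named precision and score, the other column names and all sample rows are made up for illustration, and the score is assumed to be a number where higher means a better match. The function name usable_rows and the 0.5 threshold are hypothetical; choose a cutoff that suits your own data.

```python
import csv
import io


def usable_rows(rows, min_score=None):
    """Keep rows with precision 'range'; if min_score is given, also keep
    other rows whose score is at least min_score."""
    kept = []
    for row in rows:
        if row["precision"] == "range":
            kept.append(row)
        elif min_score is not None and row["score"] not in ("", "NA"):
            if float(row["score"]) >= min_score:
                kept.append(row)
    return kept


# Made-up sample output; real files may use different columns and values.
sample = io.StringIO(
    "address,lat,lon,score,precision\n"
    "3333 Burnet Ave Cincinnati OH 45229,39.14,-84.50,0.9,range\n"
    "Main St Cincinnati OH,39.10,-84.51,0.6,street\n"
    "Cincinnati OH,39.10,-84.51,0.2,city\n"
)
rows = list(csv.DictReader(sample))

recommended = usable_rows(rows)               # range results only
relaxed = usable_rows(rows, min_score=0.5)    # range plus well-scored others
```

Calling usable_rows without min_score follows the strict recommendation; passing min_score is the fallback for cases where results from other methods must be used.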

Citation

If you use any geocoding results in a scientific publication, please cite this software. The DOI and example citations are available at http://doi.org/10.5281/zenodo.344621.