This all started over at Flowing Data where Nathan used Python to hack an existing SVG file to colour the counties of the US according to data read in from a file of county unemployment levels. Dave Smith of Revolution Computing issued a challenge: do this in R. I'm always up for a challenge so I found a shapefile of the US counties with the FIPS county codes, and then using the rgdal and sp packages had a plot up in about twenty minutes. The next day Dave posted the challenge results.
Today on the R-Sig-Geo mailing list there has been a bit of traffic on using R or GIS for cartography. Various people arguing one way or the other about how what constitutes 'publication quality' and various opinions bouncing around. So I thought I'd revisit the challenge - this time using open-source GIS software.
Problem one is getting the unemployment data and map data - the unemployment data is linked as a CSV file from the Flowing Data blog entry, and shapefiles of US counties are downloadable after a quick google search.
There are various variations of the US counties depending on the time they were defined, whether Alaska and Hawaii are in their proper place or tucked neatly around the 'lower 48' to make mapping easier, and whether the data has the right FIPS codes. I found a shapefile with county data from 2000, full FIPS codes, and with AK and HI slotted in.
The next step is to create a geographic data set with the unemployment data as attributes. But first I had to fiddle with the CSV file. It had the FIPS codes in two parts - one for the state and one for the county. The full FIPS code is five digits, made up from the two parts with leading zeroes. I didn't want to use R or Python for this, so I loaded the CSV into Calc, the spreadsheet component of OpenOffice.
The first thing I did was to shift all the data down one row and add some headers. The only important ones are the two parts of the FIPS codes and the unemployment percentage.
To create the full FIPS code I have to concatenate the two parts (in columns B and C) with the leading zeroes - I put this formula into J2 and copy it down the column:
then I add the word 'FIPS' as the header in J1 for this column.
Now I have a spreadsheet with a FIPS and an UNEMPLOYMENT column, and a shapefile with a FIPS attribute. How do I join them?
A couple of ways spring to mind:
- Load everything into PostGIS and do a join or create a view
- Create a shapefile with the Unemployment data in the DBF
In the Tools menu of Qgis I found the Data Management options and the Join Attributes function. Exactly what I needed! I set my Join Data to be the DBF of the unemployment data and FIPS to be the field to join with. Hit OK, and in under a minute I had a shapefile with the DBF data in place.
Here's how it looks in Qgis:
Qgis also has a map composer where you can do basic page layout for cartography. Here's one rendering of the unemployment map:
I'd call these publication-quality - at least they're a better quality than many maps I've seen published (wacky handwriting font excepted).
This has taken me about 45 minutes while waiting for the rain to stop so I can go home. Any gvSIG orArcGIS people out there want to have a go?