In the world of data and computers, the tools of data analysis have always evolved towards more graphics and away from text. One could say that graphs are hot and tables are not.
The work of data analysis always involves describing the data. Early on, during data collection and cleaning, the tasks involve repeated checking using frequency counts, means and ranges. Over and over again, simple tables such as the table "AIDS Rates by States" in Figure 1 are used to search for outliers, look for inconsistencies, and make initial and progress reports. This allows research data preparation to proceed. As this reporting continues, one gains confidence in the accuracy and consistency of the data set from frequent listings and cross-tabulations. Simple univariate statistics from the analysis of dependent variables or scales and demographic variables often make the most useful, understandable and valuable presentations in the final report.
Figure 1: AIDS Rates per 100,000 by State or District
The AIDS data set has two variables, State and Rate. A bar chart lets you quickly see which state has the highest rate and how big the difference is from one state to another. However, a bar chart treats the independent variable, State, as if it were purely categorical, when it is not. The state names carry more information than the fact that New York and New Jersey are different values. There are spatial, political and geographical differences as well. With a bar chart, we miss out on visualizing the geographic patterns of the disease.
Certainly, data with a geographic variable call for a map. Where are the states that have a higher incidence of AIDS? Are there geographic patterns in the spread of the infection? Are there political factors in the clustering of AIDS cases? Figure 2 shows the simplest default map that SAS produced from these data.
SAS's PROC GMAP produces four types of thematic maps: choropleth, block, prism and surface. Choropleth maps show relative values of a dependent variable using different colors. Usually we would use a color scale, for example going from white to red where white represents a low value, progressively darker shades of pink represent higher values, and the darkest red shows the highest value in the range.
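The white-to-red scale described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how PROC GMAP assigns its colors; the function name `shade` and the linear interpolation are assumptions made for the example.

```python
def shade(value, low, high):
    """Map a value in [low, high] to a hex color between white and red.

    Hypothetical helper for illustration only: low values come out white,
    middle values pink, and the highest value pure red.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    # Clamp so out-of-range values still get an end-of-scale color.
    t = min(max((value - low) / (high - low), 0.0), 1.0)
    # White is (255, 255, 255) and red is (255, 0, 0), so the red channel
    # stays at full strength while green and blue fade together.
    fade = round(255 * (1.0 - t))
    return f"#ff{fade:02x}{fade:02x}"


lowest = shade(0.0, 0.0, 100.0)     # white end of the scale
highest = shade(100.0, 0.0, 100.0)  # red end of the scale
```

In practice a choropleth usually groups the values into a handful of levels rather than shading continuously, so that the legend stays readable.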
Figure 2: Block map of AIDS rates by state.
Figure 3: A prism map.
Figure 4: Block map pivoted to view New York State.
SAS's GMAP procedure requires a map data set and a response data set. A map data set is a SAS data set that defines the boundaries of unit areas, such as states or counties. It must contain two numeric variables, X and Y, along with an ID variable that identifies the unit area each X, Y pair helps define. In other words, the unit Alabama has enough X, Y pairs to define a reasonable boundary of the state.
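To make the layout of a map data set concrete, here is a minimal Python sketch of the same structure: one row per boundary point, each carrying an ID plus an X and a Y. The unit names and coordinates are invented for illustration and do not come from any SAS map data set.

```python
from collections import defaultdict

# Hypothetical miniature map data set: (ID, X, Y), one row per boundary point.
map_rows = [
    ("unit_a", 0.0, 0.0),
    ("unit_a", 1.0, 0.0),
    ("unit_a", 1.0, 1.0),
    ("unit_b", 1.0, 0.0),
    ("unit_b", 2.0, 0.0),
    ("unit_b", 2.0, 1.0),
    ("unit_b", 1.0, 1.0),
]


def boundaries(rows):
    """Group (id, x, y) rows into an ordered list of points per unit area."""
    outline = defaultdict(list)
    for unit_id, x, y in rows:
        # Row order matters: the points are joined in sequence to draw the outline.
        outline[unit_id].append((x, y))
    return dict(outline)


polygons = boundaries(map_rows)
```

Each entry in `polygons` is the outline of one unit area, which is all a mapping routine needs in order to draw it.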
For these maps, I prepared the response data set in SPSS and then used the software DBMS/COPY to make the required SAS data set. Response data is also called attribute data -- that is, the data values attributed to the states on the map. Given the map data set, all you need to produce these maps is a response data set with one variable, State, and a second, Rate. The variable State in the response data set must be able to be matched against the values of the variable State from the map data set.
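Because the two data sets are joined on State, it is worth checking that every value matches before drawing the map. The Python sketch below shows one way to do that check; the function name `unmatched`, the unit names and the rate values are all made up for the example and are not the figures from Figure 1.

```python
# Hypothetical map data set rows (boundary points) and response data set rows.
map_rows = [
    {"state": "A", "x": 0.0, "y": 0.0},
    {"state": "A", "x": 1.0, "y": 0.0},
    {"state": "B", "x": 1.0, "y": 0.0},
    {"state": "B", "x": 2.0, "y": 1.0},
]
response_rows = [
    {"state": "A", "rate": 1.0},  # made-up value
    {"state": "C", "rate": 2.0},  # made-up value; "C" has no boundary rows
]


def unmatched(map_rows, response_rows, id_var="state"):
    """Return (response IDs missing from the map, map IDs with no response)."""
    map_ids = {row[id_var] for row in map_rows}
    response_ids = {row[id_var] for row in response_rows}
    return sorted(response_ids - map_ids), sorted(map_ids - response_ids)


no_boundary, no_data = unmatched(map_rows, response_rows)
```

A response value with no boundary cannot be drawn, and a unit area with no response value will appear on the map without a rate. Mismatches often come from coding differences, such as state names in one data set and numeric codes in the other.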
The SAS maps used above are available in the SAS System for Windows. Both packages are available at ACF's Tisch Lab in LC 8. SAS maps of the U.S. and the world are included with the software.
We can also create maps if we have files that define the boundaries. ACF's Statistics and Mapping Lab in Tisch LC7 has a digitizer board where your hard copies of maps can be automated to create thematic map graphs. Refer to SAS Technical Report A-107, "Creating Your Own SAS/GRAPH Map Data Sets with a Digitizer." You can automate a scanned image to create a map suitable for the SAS GMAP procedure by using your computer's mouse to trace geographical boundaries.
Researchers no longer have to be jealous of the statistical maps in the Times. For a complete description of PROC GMAP, see the GMAP procedure description in "SAS/GRAPH Software: Reference, Version 6," part of the SAS reference set.
Posted October 5, 1998