Connect Summer 1998  Statistics and Social Sciences


GIS on the World Wide Web

Antonio Lopez

[Ed: Links to web pages and/or e-mail addresses which have become inactive since the publication of this article have been enclosed in curly brackets { }. Replacement links have been provided where possible.]

A Geographic Information System (GIS) can be defined as a computerized database system for the capture, storage, retrieval, analysis and display of tabular and spatial data. GIS is used by many disciplines, including geography, urban planning, engineering, landscape architecture, environmental sciences and sociology. It provides individuals from varied disciplines with a set of tools to improve the efficiency and effectiveness of working with map information, both spatial features and their non-graphic tabular attributes.

The origins of GIS date back to the 1960s at Harvard University, with the successful development of SYMAP. This product was raster-based and had the capability to shade slope maps. In the 1970s, Harvard's computer graphics lab produced Odyssey, a primitive GIS with polygon overlay functions. These two products were the first to be identified as having GIS functionality. It was the combination of computer-aided design (CAD) technology supporting a variety of design and drafting tasks, and database management system technology, for the manipulation of digital tabular data, that led to the development of GIS.

The Internet is another valuable tool for accessing information. The Internet was originally developed about 30 years ago by the United States Department of Defense as an experimental network for military research. Universities and government agencies soon demanded access to this technology as well. In the 1980s, the National Science Foundation developed a network called NSFNET that connected universities via telephone lines for scholarly research. In 1987, with growing network traffic and maintenance costs, a management and upgrade contract was awarded to Merit Network Inc. in partnership with IBM and MCI. The old network was replaced with faster computers and telephone lines. Today, Internet access is available to anyone with a modem and a personal computer. In fact, it has become a ubiquitous resource.

With increased use of both GIS and the Internet, the marriage of these two technologies was inevitable. The incredible potential for the use of geographic data on the Web has led GIS vendors to develop new software that brings GIS functionality to the desktop of Internet and intranet users. The information superhighway is becoming a valuable medium for sharing spatial data. Government agencies including the U.S. Environmental Protection Agency, the U.S. Geological Survey and the Census Bureau are in the process of providing direct access to their geographic databases via the Web. With the use of new web-based GIS technologies, geographic information is being deployed and accessed via the Internet and organizational intranets by a variety of users for diverse applications.

Vector and Raster Data Models

A GIS can be thought of as a layered model of the real world, where each layer represents a specific theme. There are two primary data models in a GIS: the vector and raster environments. The vector data model separates the surface of the earth into discrete objects consisting of points, lines and polygons. Each layer is separated by object type and by theme. A point layer stores events, such as automobile accidents, that occur at a particular point in space. A line layer consists of linear features, such as a road network, where road conditions are stored in a table that is related to individual road segments. A polygon layer consists of areas, such as census tracts, which could make up a theme called demographics. Furthermore, vector-based GIS technology has polygon processing capabilities that support automated spatial analysis functions between separate layers. These functions include creating buffers around geographic objects, analyzing attribute characteristics within a given radius of a point, and determining the proximity of one object to another.
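
The radius and proximity functions described above can be illustrated with a minimal Python sketch. It assumes a small point layer held in memory, with coordinates in planar map units; the accident records and the function names within_radius and nearest are hypothetical, invented for illustration, and are not taken from any GIS package.

```python
import math

# Hypothetical point layer: accident locations in planar map units.
accidents = [
    {"id": 1, "x": 120.0, "y": 80.0},
    {"id": 2, "x": 400.0, "y": 410.0},
    {"id": 3, "x": 150.0, "y": 95.0},
]

def within_radius(points, cx, cy, radius):
    """Return the points lying no farther than `radius` from (cx, cy)."""
    return [p for p in points
            if math.hypot(p["x"] - cx, p["y"] - cy) <= radius]

def nearest(points, cx, cy):
    """Return the single point closest to (cx, cy)."""
    return min(points, key=lambda p: math.hypot(p["x"] - cx, p["y"] - cy))

# Circular buffer of radius 60 around the location (100, 100).
nearby = within_radius(accidents, 100.0, 100.0, 60.0)
closest = nearest(accidents, 100.0, 100.0)
```

A production GIS would perform the same test against indexed geometry rather than scanning every point, but the distance comparison is the same idea.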

The raster data model, on the other hand, divides the earth's surface into a grid consisting of individual cells with an associated value. A raster layer can be an aerial photograph or a satellite image. Both are commonly used as base layers upon which the geometry and geographic coordinates of other layers are built.

Raster layers can also represent individual attributes. In this case, the value of the individual cell indicates the value of the attribute it represents. Furthermore, each raster layer only represents one theme. Topography and soils for an area would be stored in separate layers where each cell has a value for elevation or soil type. Operations on multiple raster layers involve the retrieval and processing of the data in corresponding cells positioned in different layers. In order to find all cells with an elevation greater than 1,000 feet and having a sandy soil type, for example, each cell in the elevation layer and each corresponding cell in the soils layer would be identified and output to a new combined layer (Figure 1).
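
The elevation and soil query just described can be sketched in a few lines of Python. The two 3-by-3 layers below are invented sample data, and the function name overlay is hypothetical; the sketch only assumes that both layers share the same grid dimensions.

```python
# Hypothetical sample layers on the same 3x3 grid.
elevation = [            # feet
    [900, 1050, 1200],
    [980, 1100, 990],
    [1300, 950, 1010],
]
soils = [                # soil type per cell
    ["clay", "sandy", "sandy"],
    ["sandy", "clay", "sandy"],
    ["sandy", "sandy", "loam"],
]

def overlay(elev, soil, min_elev=1000, soil_type="sandy"):
    """Build a new layer: 1 where both conditions hold in the
    corresponding cells of the two input layers, otherwise 0."""
    return [
        [1 if e > min_elev and s == soil_type else 0
         for e, s in zip(elev_row, soil_row)]
        for elev_row, soil_row in zip(elev, soil)
    ]

combined = overlay(elevation, soils)
```

Each output cell depends only on the cells at the same row and column in the input layers, which is what makes this kind of operation straightforward in the raster model.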


Figure 1: A raster-based map of combined layers, with darker cells representing areas of higher construction costs. Each line from point A to point B represents a different solution to the least-cost problem.

Calculating the least-cost path is an example of a possible application of the raster data structure. In this procedure, each cell in a layer has an associated cost to traverse it. An example of layers in the raster model could be geology, vegetation, slope, aspect, soils and land use. These layers would be mathematically combined to create a final cost surface. A cost surface consists of a grid of cells containing the summation of the cost of each corresponding cell from all the layers in the model.
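
As a minimal sketch of the summation described above, the following Python function adds corresponding cells across layers. The three 2-by-2 cost layers are invented values, and the name cost_surface is hypothetical; a real model might also weight each layer, which this sketch does not do.

```python
def cost_surface(layers):
    """Sum corresponding cells across all layers (same grid size assumed)."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [sum(layer[r][c] for layer in layers) for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical per-cell costs for three themes.
slope_cost = [[1, 3], [2, 5]]
soil_cost = [[2, 2], [1, 4]]
land_use_cost = [[0, 1], [3, 1]]

surface = cost_surface([slope_cost, soil_cost, land_use_cost])
```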

The next step in calculating the least-cost path requires defining the cells of origin and destination. Once this is done, the software runs an algorithm which derives the unique least-cost path (Figure 1). This is particularly useful when planning a road or a pipeline. Finding the least-cost path is a common function available in commercial GIS packages supporting the raster data structure. Two such packages available at ACF are ARC/INFO and GRASS. ARC/INFO is produced by Environmental Systems Research Institute (ESRI) and is the most commonly used high-end GIS package available. GRASS is public domain software developed by the U.S. Army Corps of Engineers.
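
One way to sketch a least-cost search over a cost surface is Dijkstra's shortest-path algorithm, shown below in Python. This is an illustration under stated assumptions, not the routine used by ARC/INFO or GRASS: moves are limited to the four edge-adjacent neighbors, and the cost of a move is assumed to be the average of the two cell costs. The function name least_cost_path and the sample surface are hypothetical.

```python
import heapq

def least_cost_path(cost, start, goal):
    """Dijkstra search on a grid; returns (path, total_cost)."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}      # lowest known cost to reach each cell
    prev = {}                # predecessor of each cell on its best path
    heap = [(0.0, start)]
    while heap:
        dist, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if dist > best[cell]:
            continue         # stale queue entry
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                # Assumed step cost: mean of the two cell costs.
                new_dist = dist + (cost[r][c] + cost[nr][nc]) / 2
                if new_dist < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_dist
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (new_dist, (nr, nc)))
    # Walk back from the destination to the origin.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, best[goal]

# Hypothetical surface: the two expensive cells push the route to the right.
surface = [
    [1, 1, 1],
    [9, 9, 1],
    [1, 1, 1],
]
path, total = least_cost_path(surface, (0, 0), (2, 0))
```

In this example the direct route through the expensive cell would cost 10, so the search returns the longer route around it at a total cost of 6.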

Dr. Yakov Smotritsky's Work with Raster Data

Dr. Yakov Smotritsky has developed a unique algorithm that improves the least-cost path solutions of ARC/INFO and GRASS. The ARC/INFO and GRASS algorithms calculate the lowest cost of moving from the center of a cell to the center of neighboring cells. This approach, however, is not computationally or geometrically optimal. Dr. Smotritsky's solution, on the other hand, traverses a cell at the shortest angle possible to reach the next cell. When the sum of traversing each cell in the path is calculated, an improved least-cost path is derived. As a result, the improved path reduces the total cost of constructing a road or pipeline.

Dr. Smotritsky's solution, EARL (Environmentally Acceptable Route Location), has been developed with funds from a National Science Foundation grant. It has been moved to the Web with the help of the Statistics and GIS group at ACF.

Putting Interactive Route Calculation Online

The Interactive Route Calculation website (Figure 2) will soon be online through the Social Sciences website. It is intended to demonstrate and evaluate route calculation techniques employed by ARC/INFO, GRASS, and EARL. The route calculation is performed using various raster layers from the Spearfish dataset. This dataset was developed for researchers by the U.S. Army Construction Engineering Research Laboratory, and represents a geographic region in South Dakota. The Interactive Route Calculation website allows the user to select raster layers from a list, choose the start and end points, and select the method of calculation (i.e., ARC/INFO, GRASS, or EARL algorithms). The calculations can be performed simultaneously and compared.

Once the calculations are complete, the solutions are displayed in a graphic report window that also shows the total cost of each solution as a number. From these numbers, one can determine which solution yields the lowest cost. In addition, the cost associated with specific cells can be changed independently to respond to specific needs, and a new path will then be calculated almost immediately. This is useful if factors other than lowest cost should be considered when planning a route. For example, if a planned road traverses an Indian reservation that had not previously been noted, the planner can set the cost of the cells in the reservation artificially high, to force the program to find an alternate route around the area.
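
The cost override described above amounts to replacing the values of selected cells before the route is recalculated. The Python sketch below shows one way to do this; the function name block_cells, the penalty value, and the sample surface are all assumptions made for illustration, not details of the website's implementation.

```python
def block_cells(cost, cells, penalty=1_000_000):
    """Return a copy of the cost surface with the listed (row, col)
    cells set to a prohibitively high cost; the input is not modified."""
    blocked = [row[:] for row in cost]
    for r, c in cells:
        blocked[r][c] = penalty
    return blocked

# Hypothetical cost surface and a protected area covering two cells.
surface = [
    [1, 1, 1],
    [1, 2, 1],
    [1, 1, 1],
]
protected_area = [(1, 1), (1, 2)]
adjusted = block_cells(surface, protected_area)
```

Running any least-cost routine on the adjusted surface would then steer the route away from the protected cells unless no other route exists.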

GIS Industry Goes Online

There are many factors that need to be taken into consideration when developing a GIS website. When preparing geographic data for the Web, the developer can use one of two primary approaches, depending on the user's needs. In the first approach, the developer can use graphics software, such as Adobe Illustrator, to create a map that can then be put on the Web. Using a graphics software package as a cartographic tool to create a raster-based image is similar to creating a hard-copy map. The web developer anticipates the user's actions, such as zooming in or out of an area. As a result, the developer has full control of the resulting map. Maps created in this way do not have a database behind them.

The second method is more involved and allows the system to create the map dynamically in response to the user's request. Objects on the map are linked to a database that can be queried by the user. Mapping engines draw on data held by spatial engines, which plug directly into relational databases. Both mapping and spatial engines play a role in developing a GIS website.
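
To illustrate the idea of map objects backed by a queryable database, here is a minimal Python sketch using the standard sqlite3 module with an in-memory database. The table tracts, its columns, the sample rows, and the function features_in_view are all hypothetical; a real web GIS would use a spatial engine rather than plain coordinate columns.

```python
import sqlite3

# In-memory database standing in for the server-side relational store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracts ("
    "id INTEGER PRIMARY KEY, name TEXT, population INTEGER, x REAL, y REAL)"
)
conn.executemany(
    "INSERT INTO tracts (name, population, x, y) VALUES (?, ?, ?, ?)",
    [
        ("Tract A", 4200, 10.0, 20.0),
        ("Tract B", 1500, 55.0, 60.0),
        ("Tract C", 8100, 30.0, 25.0),
    ],
)

def features_in_view(conn, xmin, ymin, xmax, ymax, min_population=0):
    """Return (name, population) for features inside the requested map
    window that also satisfy the user's attribute condition."""
    rows = conn.execute(
        "SELECT name, population FROM tracts "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? AND population >= ? "
        "ORDER BY name",
        (xmin, xmax, ymin, ymax, min_population),
    )
    return rows.fetchall()

visible = features_in_view(conn, 0, 0, 40, 40, min_population=4000)
```

The map drawn for the user would then contain only the returned features, so each request can produce a different map from the same data.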

Spatial Engines

When large amounts of data need to be accessed on the Web, response times can be tiresomely long. In response to the demands of users, a new generation of GIS technology has emerged. Today, spatial engines and object-oriented GIS components improve the availability of geographic data through the Web. Spatial engines plug directly into relational databases, so that geospatial data is stored alongside other data in the relational environment. The engine spatially indexes the data and generates topology in real time to support spatial queries in Structured Query Language (SQL). Spatial engines such as ESRI's Spatial Database Engine (SDE) store long-integer-format spatial data as Binary Large Objects (BLOBs) in a relational database. Spatial indices are created before the data is used the first time. These indices allow for very fast retrievals.
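
The benefit of building a spatial index ahead of time can be sketched with a simple grid index in Python. This is a generic illustration, not a description of how SDE or any other spatial engine is implemented; the class GridIndex, its methods, and the sample features are hypothetical.

```python
from collections import defaultdict

class GridIndex:
    """Bucket point features by grid cell so that a window query only
    inspects the buckets overlapping the window, not every feature."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.buckets = defaultdict(list)

    def _key(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, feature_id, x, y):
        self.buckets[self._key(x, y)].append((feature_id, x, y))

    def query(self, xmin, ymin, xmax, ymax):
        """Return the sorted ids of features inside the window."""
        kx0, ky0 = self._key(xmin, ymin)
        kx1, ky1 = self._key(xmax, ymax)
        hits = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                for fid, x, y in self.buckets.get((kx, ky), ()):
                    # Exact test, since a bucket may extend past the window.
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(fid)
        return sorted(hits)

# The index is built once, before any query is served.
index = GridIndex(cell_size=100)
index.insert("a", 20, 30)
index.insert("b", 250, 40)
index.insert("c", 90, 110)
found = index.query(0, 0, 100, 120)
```

The cost of building the index is paid once, while every later query touches only a few buckets, which is why pre-built indices give fast retrievals.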

SDE provides an application programming interface (API), based on the C programming language, for building spatial queries. It supports Oracle, Informix, Sybase, IBM DB2 and Microsoft SQL Server. SpatialWare, another example of a spatial engine, was developed by Blue Bell and Unisys, and was purchased by MapInfo in October 1997. SpatialWare is based on object-oriented technology, and stores data as a spatial abstract data type in an Oracle server. Unlike SDE, SpatialWare stores the genuine geometry instead of a series of vertices for a geographic object.

Despite these recent advances in spatial engines, challenges still remain. For example, spatial engines cannot effectively handle raster-based geographic data.

Mapping Engines

Spatial engines need software to serve the spatial data out of a relational database. Unfortunately, most existing GIS packages cannot be used to do this directly. Mapping engines, however, can visualize data in a relational database, from SQL search results, and support spatial analysis. A map engine based on a spatially enabled relational database can be developed using Object Linking and Embedding Custom Controls (OCX) designed for GIS, such as ESRI's MapObjects.

Various operations, such as address matching, map composition, spatial queries and database links, can be performed with MapObjects because it is a programming environment. In other words, it is possible to create a system in which complex spatial operations are performed at the user's request, with the results sent out as a map. MapInfo's equivalent to MapObjects is called MapX. Windows NT has proven to be an optimal platform for map servers. Windows NT custom controls can be easily developed and deployed inside industry-standard tools such as Visual Basic.

The Future of GIS Web Technology

Internet GIS applications enable users to overcome proprietary GIS database retrieval issues. Soon, spatial engines will drive web-based GIS. As a result, researchers will be able to use GIS for a variety of applications. The Internet will bring affordable geospatial-enabled technology to academic life.


Antonio Lopez was a graduate student in the Wagner School of Public Service at the time of this article's publication.
{aql0710@is.nyu.edu}

Posted May 18, 1998. Revised May 24, 2004.