Simple Geometry Language


This page describes some incomplete work undertaken in 2003 to define a standard, XML-based language for describing geographic regions, i.e., geometric regions on the Earth's surface. For brevity, we'll call any such language a geometry language. A geometry language defines a set of possible shapes and standard representations and encodings of those shapes, and also addresses the handling of cartographic quantities (Earth datums, projections, and coordinate systems), either by mandating standard quantities or by providing standard declaration mechanisms.

The motivation for a standard geometry language is rooted in the observation that every system, service, or effort that has had to deal with geographic regions has ended up defining its own geometry language. These languages have broadly similar capabilities, yet each has enough idiosyncrasies to bedevil interoperability. It is instructive to compare and contrast the geometry languages embedded in specifications such as:

(A number of additional geometry languages are derived from one or more of the above.) A standard geometry language would facilitate interoperability across different systems, particularly among consumers of geographic regions such as renderers and spatial indexers.

From the perspective of distributed geospatial digital libraries and distributed gazetteer services, which use geometry only for the limited purposes of representing object footprints and query regions and performing spatial comparisons between the two, a geometry language must satisfy three requirements:

  1. The language must support enough possible shapes—and complex enough shapes—so that spatial matching over those shapes yields acceptable search precision. For gazetteers a sufficient set of shapes is not known, but necessary shapes include points for point features such as water wells, polylines for linear features such as rivers, and at least simple polygons for areal features.
  2. The spatial reference system (SRS) in which shapes are defined (i.e., the Earth datum and coordinate system) must not be mandated by the language, but must be declarable in a standard way. Mandating a particular SRS forces language users to translate SRSs, which can be mathematically complex and can introduce unintended consequences such as formation of aggregate shapes.
  3. The language must provide a lingua franca that virtually all geometry producers and consumers can operate on; in practice, due to simplicity of implementation, ease of mappability, and general widespread support, the lingua franca is latitude/longitude-aligned minimum bounding rectangles, or bounding boxes for short.
    i. Notwithstanding requirement 2 above, to support interoperability, bounding boxes must be defined in a standard SRS, e.g., WGS84 latitude/longitude coordinates. (It is reasonably easy to compute such bounding boxes from commonly-used cylindrical and polar projections.)
    ii. In principle, bounding boxes are deterministically computable from primary shapes; nevertheless, bounding boxes must explicitly accompany all primary shapes in instance documents. To fail in this regard is to place the burden of computing bounding boxes on the very geometry consumers that are incapable of doing so: those that rely on bounding boxes because they're incapable of operating on more complex shapes.
    iii. Bounding boxes must be defined in a manner that supports geodetic continuity, that is, in a manner that recognizes that the Earth is, topologically, a sphere. In particular, there must be no discontinuity that bounding boxes are forbidden to cross; in many geometry languages, the ±180° meridian is such a discontinuity.

The Open GIS Consortium's Geography Markup Language (GML), version 3.0, is one well-known attempt to define a standard geometry language. It is a comprehensive specification having many desirable characteristics, but it suffers from two defects that are shared by many of the aforementioned geometry languages. First, in balancing the concerns of consumers of the language, who generally prefer uniformity and simplicity, against those of producers, who generally prefer expressiveness and flexibility, GML weighs heavily in favor of producers. It defines a great many possible shapes and shape-related options. The effect of this imbalance is that, in practice, consumers cannot and do not accept more than an idiosyncratic fraction of the entire GML language. The second defect is that GML does not meet any of the conditions of requirement 3 above.

The XML schema below represents a first effort at defining a geometry language that addresses these concerns. The language is a profile of GML, that is, a subset and logical restriction of GML such that any instance document that adheres to the language below also adheres to GML and can be interpreted by any GML consumer.

This deliberately simple geometry language supports just three possible shapes: points, polylines ("linestrings" in GML parlance), and simple (i.e., self-intersection-free and hole-free) polygons. Each shape is represented in the language by both an XML schema type (e.g., PolygonType) and an XML element (e.g., <Polygon>). However, the intention of the language is that only schema type AbstractFeatureType be referenced by application schemas; this usage forces a bounding box to be associated with every shape in instance documents. SRSs can be declared using the srsName attribute.

ADL-geometry.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:gml="http://www.opengis.net/gml"
  targetNamespace="http://www.opengis.net/gml"
  elementFormDefault="qualified">

<element name="coordinates" type="string"/>

<!-- needed only by ADL-geometry-extended.xsd -->
<element name="radius">
  <complexType>
    <simpleContent>
      <extension base="double">
        <attribute name="uom" type="anyURI" use="required"/>
      </extension>
    </simpleContent>
  </complexType>
</element>

<complexType name="AbstractGeometryType" abstract="true">
  <attribute name="srsName" type="anyURI"/>
</complexType>

<element name="_Geometry" type="gml:AbstractGeometryType"/>

<complexType name="PointType">
  <complexContent>
    <extension base="gml:AbstractGeometryType">
      <sequence>
        <element ref="gml:coordinates"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<element name="Point" type="gml:PointType"
  substitutionGroup="gml:_Geometry"/>

<complexType name="LineStringType">
  <complexContent>
    <extension base="gml:AbstractGeometryType">
      <sequence>
        <element ref="gml:coordinates"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<element name="LineString" type="gml:LineStringType"
  substitutionGroup="gml:_Geometry"/>

<complexType name="PolygonType">
  <complexContent>
    <extension base="gml:AbstractGeometryType">
      <sequence>
        <element name="exterior">
          <complexType>
            <sequence>
              <element name="LinearRing">
                <complexType>
                  <sequence>
                    <element ref="gml:coordinates"/>
                  </sequence>
                </complexType>
              </element>
            </sequence>
          </complexType>
        </element>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<element name="Polygon" type="gml:PolygonType"
  substitutionGroup="gml:_Geometry"/>

<complexType name="AbstractFeatureType" abstract="true">
  <sequence>
    <element name="boundedBy">
      <complexType>
        <sequence>
          <element name="Envelope">
            <complexType>
              <sequence>
                <element ref="gml:coordinates"/>
              </sequence>
            </complexType>
          </element>
        </sequence>
      </complexType>
    </element>
    <element name="location">
      <complexType>
        <sequence>
          <element ref="gml:_Geometry"/>
        </sequence>
      </complexType>
    </element>
  </sequence>
</complexType>

</schema>

The above geometry language, expressed as a profile of GML, has a number of nice properties, not the least of which is that it weeds out 99% of the 600-plus-page GML specification. However, there are a number of serious deficiencies which are still unresolved:

  • To satisfy requirement 3(i), the bounding box SRS must be standardized to, for example, WGS84 latitude/longitude coordinates. If the above language were an independent specification, such a requirement could be stated as part of the specification itself; an explicit declaration of the SRS need not be present in instance documents or even in the schema. But to avoid ambiguity as a profile of GML, all SRSs must be made explicit, and GML makes this possible by allowing an srsName attribute to be placed on the <Envelope> element. Unfortunately, at the time of this work, there appeared to be no standard means of referring to SRSs.
  • In GML, an <Envelope> element "defines an extent using a pair of positions defining opposite corners," that is, using a pair of minimum and maximum coordinate values. A consequence of being defined this way, as opposed to being defined in terms of explicitly-labeled east and west boundaries, is that it is not possible to describe a bounding box that crosses the ±180° meridian (or other discontinuity).
    [Figure 1a] If east/west bounding coordinates are mapped to minimum/maximum coordinates according to their values, then a bounding box such as Russia's will be misinterpreted (its east bounding coordinate, being less than its west, will be considered the minimum coordinate value), with the result that the GML envelope will describe the longitudinal complement of the desired bounding box. Always mapping the west (east) bounding coordinate to the minimum (maximum) coordinate value, even when west is numerically greater than east, would solve the problem (this is effectively equivalent to explicitly labeling the east and west boundaries), but the GML specification gives no indication that this is admissible or that SRSs may employ such modular arithmetic.
    [Figure 1b] It seems that the only unambiguous and correct method of encoding a bounding box that crosses the ±180° meridian is to convert the bounding box to a whole-world band. But this loss of shape fidelity results in many false positives by spatial search engines and is unacceptable.
  • The language is both clumsy and misleading. It is clumsy because, to satisfy requirement 3(ii), application schemas that use the language must restrict themselves to the AbstractFeatureType schema type. Thus, instead of being able to say
    <element name="my-element" type="gml:PolygonType"/>
    the application schema must say
    <element name="my-element">
      <complexType>
        <complexContent>
          <extension base="gml:AbstractFeatureType"/>
        </complexContent>
      </complexType>
    </element>
    But notice that, in the above, the ability to restrict the shapes <my-element> can take on to, say, polygons has been lost. The language is also misleading because declarations such as PolygonType and <Polygon> are publicly visible, and authors of application schemas will naturally assume that they can be directly referenced. An alternative approach would be to abandon AbstractFeatureType altogether, and use GML's <metaDataProperty> element for storing associated bounding boxes. In this approach, an application could say
    <element name="my-element" type="gml:PolygonType"/>
    and an instance document would look like
    <my-element>
      <gml:metaDataProperty>
        <gml:GenericMetaData>
          <gml:boundedBy>
            <gml:Envelope>
              <gml:coordinates>...</gml:coordinates>
            </gml:Envelope>
          </gml:boundedBy>
        </gml:GenericMetaData>
      </gml:metaDataProperty>
      <gml:exterior>
        ...
      </gml:exterior>
    </my-element>
  • [Figure 2] Whether the language should support aggregate shapes (i.e., sets of disjoint shapes treated as first-order shapes), and if so, which kinds, is an open question. On the one hand, aggregate shapes are desirable because they can offer vastly greater fidelity to true region shapes: consider the footprint of the United States described as an aggregate of three shapes (contiguous 48 states; Alaska; Hawaii) versus as a convex hull or bounding box of those shapes. On the other hand, aggregate shapes bring concomitantly large increases in interface and implementation complexity. Then again, if the language were to support aggregates, consumers would always have the option of falling back to bounding boxes.
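
To make the ±180° meridian problem described in the second item above concrete, the fragments below sketch three candidate <Envelope> encodings of a bounding box for Russia. The coordinate values are rough, illustrative approximations (west 19.6, east -169.0, south 41.2, north 81.9), and the longitude,latitude ordering inside <gml:coordinates> is an assumption made for this example; neither is taken from the GML specification or from this work.

```xml
<!-- (a) Corners assigned by numeric value: the east coordinate (-169.0)
     becomes the minimum, so the envelope describes the longitudinal
     complement of the intended box. -->
<gml:Envelope xmlns:gml="http://www.opengis.net/gml">
  <gml:coordinates>-169.0,41.2 19.6,81.9</gml:coordinates>
</gml:Envelope>

<!-- (b) West always written first, even though it is numerically greater
     than east. This conveys the intended box, but nothing in GML
     indicates that consumers will read it this way. -->
<gml:Envelope xmlns:gml="http://www.opengis.net/gml">
  <gml:coordinates>19.6,41.2 -169.0,81.9</gml:coordinates>
</gml:Envelope>

<!-- (c) Whole-world band: unambiguous, but the longitudinal extent
     is lost entirely. -->
<gml:Envelope xmlns:gml="http://www.opengis.net/gml">
  <gml:coordinates>-180.0,41.2 180.0,81.9</gml:coordinates>
</gml:Envelope>
```

Encoding (b) is the only one of the three that preserves the shape, which is why the lack of any sanction for it in GML is a real obstacle to requirement 3(iii).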

Finally, below is an extension to the above geometry language that adds a disk shape (defined by center and radius) and several convenience declarations. As an extension, it is necessarily incompatible with GML.

ADL-geometry-extended.xsd
<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
  xmlns:adlgml="tag:alexandria.ucsb.edu,2003:geometry"
  xmlns:gml="http://www.opengis.net/gml"
  targetNamespace="tag:alexandria.ucsb.edu,2003:geometry"
  elementFormDefault="qualified">

<import namespace="http://www.opengis.net/gml"
  schemaLocation="ADL-geometry.xsd"/>

<complexType name="DiskType">
  <complexContent>
    <extension base="gml:AbstractGeometryType">
      <sequence>
        <element ref="gml:coordinates"/>
        <element ref="gml:radius"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<element name="Disk" type="adlgml:DiskType"
  substitutionGroup="gml:_Geometry"/>

<complexType name="FeatureType">
  <complexContent>
    <extension base="gml:AbstractFeatureType"/>
  </complexContent>
</complexType>

<element name="Feature" type="adlgml:FeatureType"/>
<element name="Footprint" type="adlgml:FeatureType"/>
<element name="QueryRegion" type="adlgml:FeatureType"/>

</schema>
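
As a rough illustration of how the two schemas above fit together, here is a sketch of an instance document that uses the <Footprint> element. The coordinate values, the longitude,latitude ordering, and the srsName URI are hypothetical placeholders (as noted earlier, no standard means of referring to SRSs was identified), so the example shows structure only.

```xml
<?xml version="1.0" encoding="UTF-8"?>

<adlgml:Footprint
  xmlns:adlgml="tag:alexandria.ucsb.edu,2003:geometry"
  xmlns:gml="http://www.opengis.net/gml">

  <!-- Mandatory bounding box: opposite corners of the polygon below
       (assumed order: longitude,latitude). -->
  <gml:boundedBy>
    <gml:Envelope>
      <gml:coordinates>-120.5,34.0 -119.5,35.0</gml:coordinates>
    </gml:Envelope>
  </gml:boundedBy>

  <!-- Primary shape: a simple polygon whose ring closes on its first
       vertex. The srsName value is a made-up placeholder URI. -->
  <gml:location>
    <gml:Polygon srsName="urn:example:srs:placeholder">
      <gml:exterior>
        <gml:LinearRing>
          <gml:coordinates>-120.5,34.0 -119.5,34.0 -119.5,35.0 -120.5,35.0 -120.5,34.0</gml:coordinates>
        </gml:LinearRing>
      </gml:exterior>
    </gml:Polygon>
  </gml:location>

</adlgml:Footprint>
```

Replacing <gml:Polygon> with <gml:Point>, <gml:LineString>, or <adlgml:Disk> changes the shape without altering the enclosing structure, since all of these belong to the gml:_Geometry substitution group; the <gml:boundedBy> element remains required in every case.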


Greg Janée
Created: 2004-08-25
Last modified: 2008-02-28 10:58