The ADL Gazetteer Protocol
Linda L. Hill
Center for Global Georeferencing Research
Version 1.2
This document describes a protocol for accessing general-purpose
gazetteer services.
A gazetteer is a dictionary of geographic placenames.
Gazetteers have traditionally appeared as back-of-the-book indexes in
atlases; as place encyclopedias, such as the Columbia Gazetteer of the
World; as thesauri, such as the Getty
Thesaurus of Geographic Names; and as toponymic authority files,
such as NIMA's GEOnet Names
Server and the U.S. Geological Survey's Geographic Names Information
System. In an atlas, a gazetteer provides an alphabetical list of
the placenames that appear in the atlas, and it maps those names to
page numbers and map grid locations. Place encyclopedias often
include descriptive information for locations, as does the Getty
Thesaurus, and sometimes include latitude/longitude coordinates as
well. Toponymic authority files focus on distinguishing official
placenames from variant names, and they associate names with
coordinate locations primarily for disambiguation purposes. Other
toponymic reference works publish scholarly information about the
origins of geographic names.
A digital gazetteer builds on these traditional gazetteers. It
maps geographic placenames (the names of natural features such as
mountains and lakes and the names of human constructs such as cities
and states) to coordinate-based geographic locations. The services it
provides are largely oriented around searching: answering "Where
is...?" queries given all or a portion of a geographic name ("Where is
the place named 'Santa Barbara'?") and "What's there?" queries which
return all places, or all places of a specified class, within a given
region ("What schools are in Santa Barbara County?"). Digital
gazetteers augment traditional gazetteers by providing bidirectional
mappings among placenames, map locations, and classifications. And
they expand on the notion of named geographic features to include
virtually any category of feature that can be geolocated (e.g.,
weather events such as hurricanes), any type of name or label for a
place (e.g., postal codes and UTM grid names), and names with only
local or specialized scope (e.g., research study areas). Descriptive
information and associated data (e.g., population and elevation) can
also be included in digital gazetteers. The ADL gazetteer protocol
builds on this generalized concept of what a gazetteer is.
This document first semi-formally defines an abstract model of a
gazetteer. That model is then used as the basis for defining a set of
services (i.e., a set of network-invokable functions), several report
formats, and a query language.
A caveat: the gazetteer protocol described herein provides
relatively low-level services. The services are intended to be simple
enough that they can be implemented by all gazetteers, yet powerful
enough to be useful to clients both in their own right and for
combining into higher-level services. To get a sense of the level of
this specification, consider the common gazetteer functionality of
finding places by entering qualified placenames, as in "find 'Santa
Barbara, California'". The ADL gazetteer protocol does not provide
such high-level functionality, but it does provide sufficient building
blocks for achieving that functionality. Specifically, the protocol
supports 1) finding a place named "California" belonging to class
"states"; 2) disambiguation in the case of multiple returns; and 3)
finding a place named "Santa Barbara" that is contained within the
place named "California".
In this section we semi-formally define an abstract model of a
gazetteer. The ADL gazetteer protocol is built on (i.e., is written
against) this model.
A gazetteer is a set of gazetteer entries. There is no
intrinsic structure to a gazetteer beyond simple containment of
gazetteer entries, although relationships between entries may be
explicitly represented by the gazetteer (see below).
A gazetteer entry describes a single, conceptual
geographic place by:
- an identifier,
- zero or more codes,
and several key attributes of the place:
- a place status,
- one or more names,
- one or more footprints, and
- zero or more classes.
There should be a one-to-one correspondence between gazetteer
entries and conceptual places (i.e., two gazetteer entries should not
describe the same place) but, strictly speaking, this is not required
or enforced by the gazetteer protocol.
A gazetteer entry's identifier and its zero or more codes are all
strings that unambiguously identify the entry or place. The
identifier identifies the entry within the gazetteer; it need
not be universally unique. A code identifies the place
within a specified code scheme, namespace, or system. For example,
the state of California is identified by FIPS 5-2 code
"06".
The place status is the status of the place's existence,
and may be former (the place itself no longer exists),
current (the place exists), or proposed (the place
does not exist, but its creation is anticipated). For example, the
place status of the now-nonexistent country of Yugoslavia would be
former.
A name is a complete, unqualified name for the place. For
example, the name of the city of Los Angeles is "Los Angeles", not
"Los Angeles, California". A gazetteer entry can have more than one
name, in which case the names may denote alternative names for the
place (e.g., the city "Köln" is also known as "Cologne") or
varying names over time (e.g., the country "Thailand" was formerly
known as "Siam").
A footprint is an approximation, expressed in
latitude/longitude coordinates, of the subset of the Earth's surface
occupied by the place. Note that a footprint need not be contiguous.
For example, a footprint for the state of Hawaii might consist of a
union of disjoint polygons, one per island. A gazetteer entry can
have more than one footprint, though the semantics of this are
undefined by the protocol.
A class classifies the place with respect to a set of
terms. More specifically, a class is the association of the place
with a term drawn from a simple vocabulary of terms or thesaurus (a
vocabulary augmented with inter-term relationships). A gazetteer
entry may belong to multiple classes, and even to multiple classes
from the same thesaurus. Note that if a gazetteer consists of a
single class of places (consider "The Knopf Gazetteer of Cemeteries of
the Southwest"), its entries will not be considered to be classified
for the purposes of the protocol unless each entry carries the
classification for searching and reporting purposes.
Certain attribute values of a gazetteer entry (namely, each name,
each footprint, and each class) are further qualified using two
qualifiers. The primary qualifier, a boolean, indicates if
the attribute is the preferred or official value. For example, a
gazetteer entry for the city Köln may mark the name "Köln"
as primary but not "Cologne". The status qualifier indicates
the validity of the attribute value using the same terms
(former, current, proposed) as the entry's
place status attribute. The place status attribute and the status
qualifier on attribute values should not be confused; the former
refers to the place as a whole, the latter to just the attribute
value. For example, a gazetteer entry for the country Thailand may
have the place status current but qualify the name "Siam" as
former.
For each gazetteer entry, the following conditions on qualifiers
must hold:
- Exactly one name must be marked as primary.
- Exactly one footprint must be marked as primary.
- If the entry has been classified, at least one class must be
marked as primary.
Finally, a gazetteer may be augmented with inter-entry
relationships. A relationship is a named, directed, binary
association between gazetteer entries. For example, a gazetteer might
support a capital-of relationship which relates capital
cities and administrative areas: the city of Sacramento is the capital
of the state of California, and so forth. (Note that the ADL
gazetteer protocol defines the necessary structures to support
relationships in general, but it does not define any particular
relationships, just as it does not define any particular
classification scheme.)
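The model above can be restated as a small data structure. The following Python sketch is illustrative only and is not part of the protocol: the names `Qualified`, `Entry`, and `qualifier_violations`, the example identifier, and the use of a plain string as a stand-in for footprint geometry are all assumptions made for this example. Codes and relationships are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List

STATUSES = {"former", "current", "proposed"}


@dataclass
class Qualified:
    """A name, footprint, or class value together with its two qualifiers."""
    value: str              # footprint geometry is simplified to a string here
    primary: bool = False   # is this the preferred or official value?
    status: str = "current"


@dataclass
class Entry:
    """One gazetteer entry (codes and relationships omitted)."""
    identifier: str
    place_status: str
    names: List[Qualified]
    footprints: List[Qualified]
    classes: List[Qualified] = field(default_factory=list)


def qualifier_violations(entry: Entry) -> List[str]:
    """Return the qualifier conditions the entry violates (empty if none)."""
    problems = []
    if entry.place_status not in STATUSES:
        problems.append("place status must be former, current, or proposed")
    if sum(n.primary for n in entry.names) != 1:
        problems.append("exactly one name must be primary")
    if sum(f.primary for f in entry.footprints) != 1:
        problems.append("exactly one footprint must be primary")
    if entry.classes and not any(c.primary for c in entry.classes):
        problems.append("at least one class must be primary")
    return problems


# The Thailand example from the text: the place is current, the name "Siam" is former.
thailand = Entry(
    identifier="example-1",  # hypothetical identifier
    place_status="current",
    names=[Qualified("Thailand", primary=True),
           Qualified("Siam", status="former")],
    footprints=[Qualified("(geometry omitted)", primary=True)],
)
print(qualifier_violations(thailand))
```

Keeping the place status on the entry and the status qualifier on each attribute value makes the distinction drawn above explicit in the types.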
Functionally speaking, the ADL gazetteer protocol consists of the
following three independent, stateless services. Each service follows
the classical model of function invocation: zero or more arguments are
passed to the service, the service executes synchronously, and a
result and/or an error indication is returned. Support for the
get-capabilities service is mandatory; the other services
are optional. Clients should anticipate that gazetteers may apply
different access control policies to different services.
- capabilities description <- get-capabilities()
Returns a description of the overall capabilities of the
gazetteer (the services and query types the gazetteer supports, the
thesauri the gazetteer uses, etc.). See Capabilities below.
- reports <- query(query, {"standard"|"extended"} [, geometry language])
Returns reports for the gazetteer entries selected by a query.
query is a query expressed in the gazetteer query language;
see Query language below. Either
standard or extended reports may be returned; see Reports below. The geometry language used in the
reports may optionally be requested. The geometry language(s) and the
subset of the query language that the gazetteer supports are described
in the gazetteer's capabilities; see Capabilities below. Clients should
anticipate that a gazetteer may return an error indication in response
to a nominally supported query due to implementation limitations.
Also, a gazetteer may return both reports and an error indication, as
when an internal result limit is reached during otherwise successful
query processing.
- reports <- download({"standard"|"extended"} [, geometry language])
Similar to the query service, the
download service returns standard or extended reports for
every entry in the gazetteer.
An XML-over-HTTP implementation of the services is described next.
In this formulation, a gazetteer service is invoked by submitting an
HTTP POST request to a URL representing the gazetteer's common access
point for all services. The format and discovery of this URL are
outside the scope of the protocol.
Both service requests and service responses must have MIME content
type text/xml and consist of a single
<gazetteer-service> element in namespace
"http://www.alexandria.ucsb.edu/gazetteer". The
version attribute of this element indicates the version
of the gazetteer protocol used by the client (in requests) or the
gazetteer implementation (in responses).
In a service request, the <gazetteer-service>
element must contain a single subelement expressing the request.
Subelement <S-request>
corresponds to service S above, e.g., subelement
<get-capabilities-request> corresponds to the
get-capabilities service. Arguments to the request, if
any, are encoded as subelements of the request subelement.
In a service response, the <gazetteer-service>
element must contain a single subelement containing the response.
Similar to requests, subelement
<S-response> corresponds to
service S. Each response subelement contains optional,
service-specific, "successful" content (e.g., reports in the case of
the query service) and an optional
<error> subelement that describes a service
processing error by an implementation-specific code and/or text
description. An implementation may return both successful content
and an error, such as when a query is successfully processed
and results are successfully returned, but the number of results
returned is limited due to an implementation constraint.
Gazetteer implementations should generally return HTTP status code
200 (OK), and should use HTTP error codes only for low-level errors
such as syntactically malformed requests and authentication problems.
Higher-level errors should be returned using the mechanism described
above.
gazetteer-service.xsd
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
elementFormDefault="qualified">
<include schemaLocation="gazetteer-capabilities.xsd"/>
<include schemaLocation="gazetteer-query.xsd"/>
<include schemaLocation="gazetteer-standard-report.xsd"/>
<include schemaLocation="gazetteer-types.xsd"/>
<element name="gazetteer-service">
<complexType>
<choice>
<element ref="gaz:get-capabilities-request"/>
<element ref="gaz:get-capabilities-response"/>
<element ref="gaz:query-request"/>
<element ref="gaz:query-response"/>
<element ref="gaz:download-request"/>
<element ref="gaz:download-response"/>
</choice>
<attribute name="version" type="string" use="required"/>
</complexType>
</element>
<element name="get-capabilities-request">
<complexType/>
</element>
<element name="get-capabilities-response">
<complexType>
<sequence>
<element ref="gaz:gazetteer-capabilities"
minOccurs="0"/>
<element ref="gaz:error" minOccurs="0"/>
</sequence>
</complexType>
</element>
<element name="query-request">
<complexType>
<sequence>
<element ref="gaz:gazetteer-query"/>
<element name="report-format"
type="gaz:report-format-type"/>
<element name="geometry-language" type="anyURI"
minOccurs="0"/>
</sequence>
</complexType>
</element>
<element name="query-response">
<complexType>
<sequence>
<choice minOccurs="0">
<element name="standard-reports">
<complexType>
<sequence>
<element ref="gaz:gazetteer-standard-report"
minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
<element name="extended-reports">
<complexType>
<sequence>
<any processContents="lax" minOccurs="0"
maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
</choice>
<element ref="gaz:error" minOccurs="0"/>
</sequence>
</complexType>
</element>
<element name="download-request">
<complexType>
<sequence>
<element name="report-format"
type="gaz:report-format-type"/>
<element name="geometry-language" type="anyURI"
minOccurs="0"/>
</sequence>
</complexType>
</element>
<element name="download-response">
<complexType>
<sequence>
<choice minOccurs="0">
<element name="standard-reports">
<complexType>
<sequence>
<element ref="gaz:gazetteer-standard-report"
minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
<element name="extended-reports">
<complexType>
<sequence>
<any processContents="lax" minOccurs="0"
maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
</choice>
<element ref="gaz:error" minOccurs="0"/>
</sequence>
</complexType>
</element>
<element name="error">
<complexType>
<sequence>
<element name="code" type="string" minOccurs="0"/>
<element name="description" type="string"
minOccurs="0"/>
</sequence>
</complexType>
</element>
</schema>
An example of a service request is shown below. The request asks a
gazetteer for standard reports for all populated places whose names
contain the phrase "las vegas".
<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-service
xmlns="http://www.alexandria.ucsb.edu/gazetteer"
version="1.2">
<query-request>
<gazetteer-query>
<and>
<name-query operator="contains-phrase"
text="las vegas"/>
<class-query thesaurus="ADL Feature Type Thesaurus"
term="populated places"/>
</and>
</gazetteer-query>
<report-format>standard</report-format>
</query-request>
</gazetteer-service>
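A request document like the one above could be assembled programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` and `urllib.request` modules; the function names `build_query_request` and `post_request` are inventions for this example, and any access-point URL passed to `post_request` is hypothetical, since the protocol leaves the format and discovery of that URL out of scope.

```python
import urllib.request
import xml.etree.ElementTree as ET

GAZ_NS = "http://www.alexandria.ucsb.edu/gazetteer"


def build_query_request(name_text, thesaurus, term, report_format="standard"):
    """Build a <gazetteer-service> query request as UTF-8 bytes."""
    ET.register_namespace("", GAZ_NS)  # serialize with a default namespace

    def q(tag):
        return f"{{{GAZ_NS}}}{tag}"

    root = ET.Element(q("gazetteer-service"), {"version": "1.2"})
    request = ET.SubElement(root, q("query-request"))
    query = ET.SubElement(request, q("gazetteer-query"))
    conjunction = ET.SubElement(query, q("and"))
    ET.SubElement(conjunction, q("name-query"),
                  {"operator": "contains-phrase", "text": name_text})
    ET.SubElement(conjunction, q("class-query"),
                  {"thesaurus": thesaurus, "term": term})
    ET.SubElement(request, q("report-format")).text = report_format
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def post_request(url, body):
    """POST a request body to a gazetteer access point (URL is caller-supplied)."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()


body = build_query_request("las vegas", "ADL Feature Type Thesaurus",
                           "populated places")
print(body.decode("utf-8"))
```

Only `build_query_request` is exercised here; `post_request` is shown to illustrate the `text/xml` content type and is not called because no real access point is assumed.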
A possible successful response to the above request is shown below.
The response contains a single standard report for a place named "Las
Vegas", formerly known as "Sin City". Success is indicated by the
absence of an <error> subelement.
<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-service
xmlns="http://www.alexandria.ucsb.edu/gazetteer"
xmlns:gml="http://www.opengis.net/gml"
xmlns:xlink="http://www.w3.org/1999/xlink"
version="1.2">
<query-response>
<standard-reports>
<gazetteer-standard-report>
<identifier>1001652</identifier>
<codes>
<code scheme="FIPS 55-3">40000</code>
</codes>
<place-status>current</place-status>
<display-name>Las Vegas, Nevada</display-name>
<names>
<name primary="true">Las Vegas</name>
<name status="former">Sin City</name>
</names>
<bounding-box>
<gml:coord>
<gml:X>-115.25</gml:X>
<gml:Y>36.15</gml:Y>
</gml:coord>
<gml:coord>
<gml:X>-115.12</gml:X>
<gml:Y>36.25</gml:Y>
</gml:coord>
</bounding-box>
<footprints>
<footprint-reference xlink:href="http://..."
geometry-type="Polygon" num-points="4632"
primary="true"/>
</footprints>
<classes>
<class thesaurus="ADL Feature Type Thesaurus"
primary="true">populated places</class>
</classes>
</gazetteer-standard-report>
</standard-reports>
</query-response>
</gazetteer-service>
Finally, here's a possible error response to the above request:
<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-service
xmlns="http://www.alexandria.ucsb.edu/gazetteer"
version="1.2">
<query-response>
<error>
<code>-908</code>
<description>Database connection failure.</description>
</error>
</query-response>
</gazetteer-service>
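Because a response may carry reports, an error, or both, a client should look for each independently rather than treating them as alternatives. The following Python sketch shows one way to do so with the standard `xml.etree.ElementTree` module; the function name `parse_query_response` and the choice to extract only the identifier and display name are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

NS = {"gaz": "http://www.alexandria.ucsb.edu/gazetteer"}


def parse_query_response(xml_bytes):
    """Return (reports, error) from a query response document.

    reports is a list of dicts (possibly empty); error is None or a
    (code, description) tuple. Both may be present at the same time.
    """
    root = ET.fromstring(xml_bytes)
    response = root.find("gaz:query-response", NS)
    if response is None:
        raise ValueError("document does not contain a query-response element")

    reports = []
    path = "gaz:standard-reports/gaz:gazetteer-standard-report"
    for report in response.iterfind(path, NS):
        reports.append({
            "identifier": report.findtext("gaz:identifier", namespaces=NS),
            "display_name": report.findtext("gaz:display-name", namespaces=NS),
        })

    error = None
    err = response.find("gaz:error", NS)
    if err is not None:
        error = (err.findtext("gaz:code", namespaces=NS),
                 err.findtext("gaz:description", namespaces=NS))
    return reports, error


# The error response shown above.
sample = b"""<gazetteer-service
    xmlns="http://www.alexandria.ucsb.edu/gazetteer" version="1.2">
  <query-response>
    <error>
      <code>-908</code>
      <description>Database connection failure.</description>
    </error>
  </query-response>
</gazetteer-service>"""

reports, error = parse_query_response(sample)
print(reports, error)
```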
The ADL gazetteer protocol is defined in terms of the relatively
simple abstract model given in Gazetteer
model above. In practice, however, gazetteer implementations will
typically be able to represent more elaborate information about
geographic places and model more complex relationships between and
among gazetteer entries and attributes. To allow clients to take
advantage of such information in a structured manner, the gazetteer
protocol defines two transfer formats for gazetteer entries: the
standard report and the extended report.
The extended report of a gazetteer entry is a
gazetteer-specific format; its actual structure is undefined by the
gazetteer protocol. The intention is that all of the information a
gazetteer possesses about an entry be representable by the format. If
a gazetteer supports extended reports, the report format must be
defined by an XML schema; see Capabilities
below.
The standard report of a gazetteer entry corresponds to
the abstract gazetteer model. An XML schema for the report format is
listed below. The schema defines element
<gazetteer-standard-report> in namespace
"http://www.alexandria.ucsb.edu/gazetteer". Subelements
<identifier>, <codes>,
<place-status>, <names>,
<footprints>, <classes>, and
<relationships> and element attributes
primary and status correspond directly to
the model.
For the convenience of gazetteer clients, the standard report
includes two additional required elements and one additional required
attribute. Element <display-name> is the entry's
primary name as it is commonly displayed, typically including
qualifications. For example, the display name for the city of Las
Vegas might be "Las Vegas, Clark County, Nevada". Element
<bounding-box> is the bounding box (i.e., the
smallest enclosing graticule-aligned rectangle) of the entry's primary
footprint. And in the <relationship> element, the
target-name attribute is the target gazetteer entry's
primary name. In a slight extension to the abstract gazetteer model,
the <relationship> element's
target-identifier attribute may be omitted, thereby
allowing a gazetteer entry to have a relationship to a place not
represented in the gazetteer.
Each footprint in a standard report may be described either
directly using a <footprint> element or indirectly
using a <footprint-reference> element. In the
direct case the footprint is defined as a single subelement (the
"footprint-defining element") of the <footprint>
element. In the indirect case, the footprint-defining element is
indirectly referred to by a URL, and the optional
geometry-type and num-points attributes can
be used to give clients an indication of the size and type of the
footprint. Attribute geometry-type, if present, must be
the unqualified XML name of the footprint-defining element and
num-points must be the number of points in the
geometry.
In both of the above cases, the possible footprint-defining
elements may be drawn from the Open GIS Consortium's Geography Markup
Language (GML), version 2, or from another geometry language
supported by the gazetteer; see Capabilities, below. Support for GML is
mandatory. GML's footprint-defining elements
(<gml:Box> and elements in class
gml:_Geometry) are defined in terms of an abstract
Cartesian coordinate system, but we mandate here that the coordinate
system must be the WGS84 latitude/longitude coordinate system.
Specifically, the first (X) coordinate must be longitude in signed
decimal degrees east of the Greenwich meridian and the second (Y)
coordinate must be latitude in signed decimal degrees north of the
equator. Longitudes must be in the range [-180,180] except in a
<gml:Box> element, where exactly one of the
longitudinal coordinates may be outside this range to indicate that
the box crosses the ±180 meridian.
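The longitude rule for boxes can be checked mechanically. The Python sketch below is one possible reading of that rule, not a normative definition; the function names `box_crosses_antimeridian` and `wrap_longitude` are assumptions made for this example.

```python
def in_range(lon):
    """True if a longitude lies in the closed interval [-180, 180]."""
    return -180.0 <= lon <= 180.0


def box_crosses_antimeridian(lon1, lon2):
    """Apply the box rule to the two longitudes of a <gml:Box>.

    Returns True if exactly one longitude is out of range (the box crosses
    the +/-180 meridian), False if both are in range, and raises ValueError
    if both are out of range, which the rule does not allow.
    """
    outside = [lon for lon in (lon1, lon2) if not in_range(lon)]
    if len(outside) == 2:
        raise ValueError("at most one box longitude may lie outside [-180, 180]")
    return len(outside) == 1


def wrap_longitude(lon):
    """Map an out-of-range longitude to its equivalent in [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


# A box spanning 170 degrees east to 170 degrees west, written as 170..190.
print(box_crosses_antimeridian(170.0, 190.0))
print(wrap_longitude(190.0))
```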
gazetteer-standard-report.xsd
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
xmlns:gml="http://www.opengis.net/gml"
xmlns:xlink="http://www.w3.org/1999/xlink"
targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
elementFormDefault="qualified">
<include schemaLocation="gazetteer-types.xsd"/>
<import namespace="http://www.opengis.net/gml"
schemaLocation="geometry.xsd"/>
<import namespace="http://www.w3.org/1999/xlink"
schemaLocation="xlinks.xsd"/>
<attributeGroup name="qualifiers">
<attribute name="primary" type="boolean" default="false"/>
<attribute name="status" type="gaz:status-type"
default="current"/>
</attributeGroup>
<element name="gazetteer-standard-report">
<complexType>
<sequence>
<element name="identifier" type="string"/>
<element name="codes" minOccurs="0">
<complexType>
<sequence>
<element name="code" minOccurs="0"
maxOccurs="unbounded">
<complexType>
<simpleContent>
<extension base="string">
<attribute name="scheme" type="string"
use="required"/>
</extension>
</simpleContent>
</complexType>
</element>
</sequence>
</complexType>
</element>
<element name="place-status" type="gaz:status-type"/>
<element name="display-name" type="string"/>
<element name="names">
<complexType>
<sequence>
<element name="name" maxOccurs="unbounded">
<complexType>
<simpleContent>
<extension base="string">
<attributeGroup ref="gaz:qualifiers"/>
</extension>
</simpleContent>
</complexType>
</element>
</sequence>
</complexType>
</element>
<element name="bounding-box" type="gml:BoxType"/>
<element name="footprints">
<complexType>
<choice maxOccurs="unbounded">
<element name="footprint">
<complexType>
<choice>
<element ref="gml:_Geometry"/>
<element ref="gml:Box"/>
<element name="other-footprint">
<complexType>
<sequence>
<any processContents="lax"/>
</sequence>
</complexType>
</element>
</choice>
<attributeGroup ref="gaz:qualifiers"/>
</complexType>
</element>
<element name="footprint-reference">
<complexType>
<attributeGroup ref="xlink:locatorLink"/>
<attribute name="geometry-type">
<simpleType>
<restriction base="string">
<enumeration value="Box"/>
<enumeration value="Point"/>
<enumeration value="LineString"/>
<enumeration value="Polygon"/>
<enumeration value="MultiPoint"/>
<enumeration
value="MultiLineString"/>
<enumeration value="MultiPolygon"/>
<enumeration value="other"/>
</restriction>
</simpleType>
</attribute>
<attribute name="num-points"
type="positiveInteger"/>
<attributeGroup ref="gaz:qualifiers"/>
</complexType>
</element>
</choice>
</complexType>
</element>
<element name="classes" minOccurs="0">
<complexType>
<sequence>
<element name="class" minOccurs="0"
maxOccurs="unbounded">
<complexType>
<simpleContent>
<extension base="string">
<attribute name="thesaurus"
type="string" use="required"/>
<attributeGroup ref="gaz:qualifiers"/>
</extension>
</simpleContent>
</complexType>
</element>
</sequence>
</complexType>
</element>
<element name="relationships" minOccurs="0">
<complexType>
<sequence>
<element name="relationship" minOccurs="0"
maxOccurs="unbounded">
<complexType>
<attribute name="relation" type="string"
use="required"/>
<attribute name="target-name" type="string"
use="required"/>
<attribute name="target-identifier"
type="string"/>
</complexType>
</element>
</sequence>
</complexType>
</element>
</sequence>
</complexType>
</element>
</schema>
Here's an example of a standard report with an indirect
footprint:
<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-standard-report
xmlns="http://www.alexandria.ucsb.edu/gazetteer"
xmlns:gml="http://www.opengis.net/gml"
xmlns:xlink="http://www.w3.org/1999/xlink">
<identifier>1001652</identifier>
<codes>
<code scheme="FIPS 55-3">40000</code>
</codes>
<place-status>current</place-status>
<display-name>Las Vegas, Nevada</display-name>
<names>
<name primary="true">Las Vegas</name>
<name status="former">Sin City</name>
</names>
<bounding-box>
<gml:coord>
<gml:X>-115.25</gml:X>
<gml:Y>36.15</gml:Y>
</gml:coord>
<gml:coord>
<gml:X>-115.12</gml:X>
<gml:Y>36.25</gml:Y>
</gml:coord>
</bounding-box>
<footprints>
<footprint-reference xlink:href="http://..."
geometry-type="Polygon" num-points="4632"
primary="true"/>
</footprints>
<classes>
<class thesaurus="ADL Feature Type Thesaurus"
primary="true">cities</class>
</classes>
<relationships>
<relationship relation="principal-city-of"
target-name="Nevada" target-identifier="1241232"/>
</relationships>
</gazetteer-standard-report>
The footprint corresponding to the above example might look
something like this:
<?xml version="1.0" encoding="UTF-8"?>
<Polygon xmlns="http://www.opengis.net/gml">
<outerBoundaryIs>
<LinearRing>
<coordinates>-115.12,36.25 -115.17,...</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
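The relationship between a footprint and its bounding box can be illustrated with a short computation. The Python sketch below parses a coordinate string using the GML 2 default separators (a comma within a tuple, whitespace between tuples) and takes the minimum and maximum of each axis. The function names and the five-point ring are hypothetical, chosen for this example only. The naive min/max approach also assumes the footprint does not cross the ±180 meridian.

```python
def parse_coordinates(text):
    """Parse 'x,y x,y ...' into a list of (longitude, latitude) tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]


def bounding_box(points):
    """Return ((min_x, min_y), (max_x, max_y)) for a list of points.

    Assumes the geometry does not cross the +/-180 meridian.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys)), (max(xs), max(ys))


# Hypothetical ring; a real footprint would have many more points.
ring = "-115.25,36.15 -115.12,36.15 -115.12,36.25 -115.25,36.25 -115.25,36.15"
print(bounding_box(parse_coordinates(ring)))
```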
The query service, described under Services above, returns all gazetteer entries
that satisfy one or more constraints placed against entry attributes.
The constraints are expressed in a query language.
The gazetteer query language consists of boolean combinations
(and, or, and and-not) of seven types of queries. Support for any
given type of query is optional. The query types are as follows:
identifier-query identifier
- Returns the gazetteer entry identified by identifier.
code-query [scheme] code
- Returns the gazetteer entry identified by code code. If
scheme is given, it indicates the code's scheme, and matching
occurs only against like codes; otherwise, matching occurs against all
codes. A code query in which the scheme is unsupported or
unrecognized by the gazetteer must not be treated as erroneous, but
should simply yield zero results.
place-status-query status
- Returns all gazetteer entries whose place status matches
status, which must be former,
current, or proposed.
name-query operator text
- Returns all gazetteer entries having at least one name that matches
text according to text-matching operator operator.
If a gazetteer supports name queries, it must support the following
operator:
equals
- A gazetteer entry name matches text if it equals
text, ignoring insignificant differences in whitespace.
Other text-matching operators gazetteers are encouraged to support
include:
contains-all-words
- A gazetteer entry name matches text if it contains all of
the words in text. For example, entry name "San Luis Obispo"
matches text "obispo luis" under this operator.
contains-any-words
- A gazetteer entry name matches text if it contains any of
the words in text. For example, entry name "Hope Ranch"
matches text "hope" under this operator.
contains-phrase
- A gazetteer entry name matches text if it contains all of
the words in text in the same consecutive order. For
example, entry name "Black Forest Drive" matches text "forest drive"
under this operator, but entry names "Forest Lake Drive" and "Drive
Forest" do not.
matches-pattern
- A gazetteer entry name matches text if it matches
text when the latter is treated as a regular expression.
Specifically, an asterisk ("*") in text matches
zero or more characters and a question mark ("?") matches
any single character. Note that a gazetteer implementation may limit
the regular expressions it accepts. For example, a gazetteer may
support right truncation only (i.e., it may accept asterisks only at
the end of text).
The semantics of all of the above operators have deliberately been
left somewhat fuzzy to accommodate differing implementations.
Specifically, exactly what constitutes a word is left undefined, and
it is unspecified whether the gazetteer implementation employs word
stemming or other fuzzy word matching techniques. In any case, the
above operators should be case-insensitive.
footprint-query operator {polygon|box|identifier}
- Returns all gazetteer entries having a footprint that matches a
query region according to spatial operator operator. (If a
gazetteer entry has multiple footprints, it is unspecified by the
protocol which footprint(s) are used for matching.) The query region
may take any of the three forms listed next; note that support for any
given form is optional.
- polygon
- A simple polygon with geodesic edges, defined in WGS84
latitude/longitude coordinates.
- box
- A rectangle whose edges are aligned with the WGS84
latitude/longitude graticule.
- identifier
- One of the footprints of the gazetteer entry identified by
identifier (which footprint is unspecified).
If a gazetteer supports footprint queries, it must support the
following operator:
within
- A gazetteer entry footprint matches the query region if the
footprint is a subset of the region.
Other spatial operators gazetteers are encouraged to support
include:
contains
- A gazetteer entry footprint matches the query region if the
footprint is a superset of the region.
overlaps
- A gazetteer entry footprint matches the query region if the
footprint intersects the region.
A gazetteer implementation may limit the query regions it accepts.
For example, an implementation may disallow polygons that enclose a
pole. Also, an implementation may support matching on footprint
bounding boxes only.
class-query thesaurus term
- Returns all gazetteer entries belonging to class term, or
any subclass of term recursively (if the gazetteer supports
subclasses or thesaurus relationships), where term is a term
drawn from a thesaurus or simple vocabulary associated with the
gazetteer. For example, if class "capital cities" is a subclass
(i.e., a specialization) of class "cities", then a class query of
"cities" will return all cities (capital and not) whereas a query of
"capital cities" will return only capital cities.
relationship-query relation target-identifier
- Returns all gazetteer entries having relationship relation
to a target gazetteer entry identified by target-identifier.
Note that a gazetteer must not consider a relationship query with an
inappropriate target to be malformed or erroneous. For example,
suppose a gazetteer supports the capital-of relationship,
but only for target gazetteer entries that are countries. A
relationship query in which the target is a cemetery is not to be
considered malformed, but should simply yield zero results.
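Returning to the text-matching operators of name-query, the Python sketch below gives one possible interpretation of each. Since the protocol deliberately leaves the definition of a word open, the choices made here (splitting on whitespace, lowercasing, no stemming) are assumptions of this example, as are the function names.

```python
import re


def words(s):
    """Split into lowercase, whitespace-delimited words (an assumption)."""
    return s.lower().split()


def equals(name, text):
    # Comparing word lists ignores case and differences in whitespace.
    return words(name) == words(text)


def contains_all_words(name, text):
    return set(words(text)) <= set(words(name))


def contains_any_words(name, text):
    return bool(set(words(text)) & set(words(name)))


def contains_phrase(name, text):
    n, t = words(name), words(text)
    return any(n[i:i + len(t)] == t for i in range(len(n) - len(t) + 1))


def matches_pattern(name, text):
    # Translate "*" and "?" wildcards into an anchored regular expression.
    regex = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in text)
    return re.fullmatch(regex, name, re.IGNORECASE | re.DOTALL) is not None


print(contains_all_words("San Luis Obispo", "obispo luis"))
print(contains_phrase("Black Forest Drive", "forest drive"))
print(contains_phrase("Forest Lake Drive", "forest drive"))
print(matches_pattern("Santa Barbara", "santa*"))
```

A real implementation may differ on every one of these points (tokenization, stemming, permitted wildcard positions) and still conform to the protocol.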
Clients should be aware that a gazetteer implementation may not be
able to search over all attribute values of a gazetteer entry. For
example, an implementation may be able to search over primary names
only.
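As a further illustration, the spatial operators of footprint-query can be sketched for the simplest case the protocol permits, matching on bounding boxes only. In the Python sketch below, the `Box` type, the function names, and the query region are assumptions made for this example, and boxes are assumed not to cross the ±180 meridian.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A graticule-aligned box in WGS84 longitude/latitude degrees."""
    west: float
    south: float
    east: float
    north: float


def within(footprint: Box, region: Box) -> bool:
    """The footprint is a subset of the query region."""
    return (region.west <= footprint.west and footprint.east <= region.east
            and region.south <= footprint.south
            and footprint.north <= region.north)


def contains(footprint: Box, region: Box) -> bool:
    """The footprint is a superset of the query region."""
    return within(region, footprint)


def overlaps(footprint: Box, region: Box) -> bool:
    """The footprint intersects the query region."""
    return (footprint.west <= region.east and region.west <= footprint.east
            and footprint.south <= region.north
            and region.south <= footprint.north)


las_vegas = Box(-115.25, 36.15, -115.12, 36.25)   # bounding box from the example report
query_region = Box(-120.0, 35.0, -114.0, 42.0)    # hypothetical query box

print(within(las_vegas, query_region))
print(contains(las_vegas, query_region))
print(overlaps(las_vegas, query_region))
```

Note that each operator is stated from the footprint's point of view, so `contains` is simply `within` with the arguments swapped.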
An XML schema for the gazetteer query language is listed below.
The schema defines element <gazetteer-query> in
namespace "http://www.alexandria.ucsb.edu/gazetteer".
Subelements <identifier-query>,
<code-query>,
<place-status-query>,
<name-query>, <footprint-query>,
<class-query>, and
<relationship-query> correspond to the query types
described above. The elements <and>,
<or>, and <and-not> support
boolean combinations of queries.
Query regions in footprint queries may be specified using the Open GIS Consortium's Geography Markup
Language (GML), version 2, or another geometry language supported
by the gazetteer; see Capabilities, below.
Support for GML is mandatory. GML defines the
<gml:Box> and <gml:Polygon>
elements in terms of an abstract Cartesian coordinate system, but we
mandate here that the coordinate system must be the WGS84
latitude/longitude coordinate system. Specifically, the first (X)
coordinate must be longitude in signed decimal degrees east of the
Greenwich meridian and the second (Y) coordinate must be latitude in
signed decimal degrees north of the equator. Longitudes must be in
the range [-180,180] except in a <gml:Box> element,
where exactly one of the longitudinal coordinates may be outside this
range to indicate that the box crosses the ±180 meridian.
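The coordinate conventions above can be illustrated with a footprint query whose box crosses the ±180 meridian. This is a sketch only: it assumes that an out-of-range longitude of 190 is read as continuing eastward past 180 (that is, as equivalent to -170), so the box runs from 170 degrees east across the meridian, between latitudes 20 and 10 degrees south. Each coordinate pair is written longitude first, then latitude.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-query
    xmlns="http://www.alexandria.ucsb.edu/gazetteer"
    xmlns:gml="http://www.opengis.net/gml">
  <footprint-query operator="overlaps">
    <gml:Box>
      <!-- longitude,latitude pairs; exactly one longitude (190)
           lies outside [-180,180] to signal the meridian crossing -->
      <gml:coordinates>170,-20 190,-10</gml:coordinates>
    </gml:Box>
  </footprint-query>
</gazetteer-query>
```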
gazetteer-query.xsd
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
xmlns:gml="http://www.opengis.net/gml"
targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
elementFormDefault="qualified">
<include schemaLocation="gazetteer-types.xsd"/>
<import namespace="http://www.opengis.net/gml"
schemaLocation="geometry.xsd"/>
<element name="gazetteer-query">
<complexType>
<sequence>
<group ref="gaz:query"/>
</sequence>
</complexType>
</element>
<group name="query">
<choice>
<element ref="gaz:identifier-query"/>
<element ref="gaz:code-query"/>
<element ref="gaz:place-status-query"/>
<element ref="gaz:name-query"/>
<element ref="gaz:footprint-query"/>
<element ref="gaz:class-query"/>
<element ref="gaz:relationship-query"/>
<element ref="gaz:and"/>
<element ref="gaz:or"/>
<element ref="gaz:and-not"/>
</choice>
</group>
<element name="identifier-query">
<complexType>
<attribute name="identifier" type="string"
use="required"/>
</complexType>
</element>
<element name="code-query">
<complexType>
<attribute name="scheme" type="string"/>
<attribute name="code" type="string" use="required"/>
</complexType>
</element>
<element name="place-status-query">
<complexType>
<attribute name="status" type="gaz:status-type"
use="required"/>
</complexType>
</element>
<element name="name-query">
<complexType>
<attribute name="operator" use="required">
<simpleType>
<restriction base="string">
<enumeration value="contains-all-words"/>
<enumeration value="contains-any-words"/>
<enumeration value="contains-phrase"/>
<enumeration value="equals"/>
<enumeration value="matches-pattern"/>
</restriction>
</simpleType>
</attribute>
<attribute name="text" type="string" use="required"/>
</complexType>
</element>
<element name="footprint-query">
<complexType>
<choice>
<element ref="gml:Box"/>
<element ref="gml:Polygon"/>
<element name="identifier" type="string"/>
<element name="other-region">
<complexType>
<sequence>
<any processContents="lax"/>
</sequence>
</complexType>
</element>
</choice>
<attribute name="operator" use="required">
<simpleType>
<restriction base="string">
<enumeration value="contains"/>
<enumeration value="overlaps"/>
<enumeration value="within"/>
</restriction>
</simpleType>
</attribute>
</complexType>
</element>
<element name="class-query">
<complexType>
<attribute name="thesaurus" type="string"
use="required"/>
<attribute name="term" type="string" use="required"/>
</complexType>
</element>
<element name="relationship-query">
<complexType>
<attribute name="relation" type="string"
use="required"/>
<attribute name="target-identifier" type="string"
use="required"/>
</complexType>
</element>
<element name="and">
<complexType>
<sequence>
<group ref="gaz:query" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
<element name="or">
<complexType>
<sequence>
<group ref="gaz:query" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
<element name="and-not">
<complexType>
<sequence>
<group ref="gaz:query" minOccurs="2" maxOccurs="2"/>
</sequence>
</complexType>
</element>
</schema>
An example of a gazetteer query is shown below. This example
requests all currently existing places whose names contain the phrase
"santa barbara" and that overlap a given spatial region, and that are
neither populated places nor cemeteries. A place named "Santa Barbara
County Hospital" might match such a query.
<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-query
xmlns="http://www.alexandria.ucsb.edu/gazetteer"
xmlns:gml="http://www.opengis.net/gml">
<and-not>
<and>
<place-status-query status="current"/>
<name-query operator="contains-phrase"
text="santa barbara"/>
<footprint-query operator="overlaps">
<gml:Box>
<gml:coordinates>-140,30 110,35</gml:coordinates>
</gml:Box>
</footprint-query>
</and>
<or>
<class-query thesaurus="ADL Feature Type Thesaurus"
term="populated places"/>
<class-query thesaurus="ADL Feature Type Thesaurus"
term="cemeteries"/>
</or>
</and-not>
</gazetteer-query>
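A footprint query may also name its region by the identifier of another gazetteer entry rather than by explicit geometry. The following sketch expresses a "What's there?" query of the kind "What schools are in Santa Barbara County?". The identifier "example:santa-barbara-county" is a hypothetical placeholder, the term "schools" is assumed to exist in the thesaurus, and the within operator is assumed here to select entries whose footprints lie inside the footprint of the identified entry.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-query
    xmlns="http://www.alexandria.ucsb.edu/gazetteer">
  <and>
    <!-- Restrict results to one class of place (assumed term) -->
    <class-query thesaurus="ADL Feature Type Thesaurus"
                 term="schools"/>
    <!-- Use another entry's footprint as the query region;
         the identifier below is a placeholder -->
    <footprint-query operator="within">
      <identifier>example:santa-barbara-county</identifier>
    </footprint-query>
  </and>
</gazetteer-query>
```

A client would typically check the <footprint-query-operands> capability before sending such a query, since support for identifier operands is optional.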
The get-capabilities service described under Services above returns a description of a
gazetteer's overall capabilities. An XML schema for the description
is listed below. The schema defines element
<gazetteer-capabilities> in namespace
"http://www.alexandria.ucsb.edu/gazetteer". Within this
element are the following subelements:
<version>
- The version of the gazetteer protocol the gazetteer supports.
<name>
- The gazetteer's name, if it has one.
<description>
- Optionally, a human-readable description of the gazetteer. It is
suggested that the description include: the scope and purpose of the
gazetteer; details of the gazetteer's interpretation and
implementation of the protocol; appropriate usage guidelines; and
rights and liability clauses.
<ADL-collection-metadata>
- Optionally, the URL of the ADL
collection metadata for the gazetteer, which gives synoptic and
statistical views of the gazetteer's content.
<extended-report-schema>
- If the gazetteer supports extended reports, the URL of the
reports' XML schema.
<code-schemes>
- The code schemes the gazetteer supports. Each code scheme is
described by a name and, optionally, a URL that leads to a description
of the scheme.
<thesauri>
- The thesauri (or simple vocabularies) the gazetteer uses to
classify its entries. Each thesaurus is described by a name and the
URL of its ADL
Thesaurus Protocol interface.
<relationships>
- The names of the relationships the gazetteer is capable of
representing.
<other-geometry-languages>
- The geometry languages the gazetteer supports (other than GML, which is
required). Each language is described by an XML namespace.
<services>
- The services the gazetteer supports.
<maximum-query-results>
- If present, the maximum number of results the gazetteer returns in
response to a query; if absent or zero, query results are not
specifically limited in number (though they may still be limited by
other metrics, such as query processing time).
<query-types>
- The types of queries the gazetteer supports.
<name-query-operators>
- If the gazetteer supports name queries, the text-matching
operators the gazetteer supports.
<footprint-query-operators> and
<footprint-query-operands>
- If the gazetteer supports footprint queries, the spatial operators
and geometry types the gazetteer supports.
gazetteer-capabilities.xsd
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
xmlns:xlink="http://www.w3.org/1999/xlink"
targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
elementFormDefault="qualified">
<import namespace="http://www.w3.org/1999/xlink"
schemaLocation="xlinks.xsd"/>
<element name="gazetteer-capabilities">
<complexType>
<sequence>
<element name="version" type="string"/>
<element name="name" type="string" minOccurs="0"/>
<element name="description" type="string"
minOccurs="0"/>
<element name="ADL-collection-metadata" minOccurs="0">
<complexType>
<attributeGroup ref="xlink:locatorLink"/>
</complexType>
</element>
<element name="extended-report-schema" minOccurs="0">
<complexType>
<attributeGroup ref="xlink:locatorLink"/>
</complexType>
</element>
<element name="code-schemes" minOccurs="0">
<complexType>
<sequence>
<element name="scheme" minOccurs="0"
maxOccurs="unbounded">
<complexType>
<attribute name="name" type="string"
use="required"/>
<attributeGroup ref="xlink:simpleLink"/>
</complexType>
</element>
</sequence>
</complexType>
</element>
<element name="thesauri" minOccurs="0">
<complexType>
<sequence>
<element name="thesaurus" minOccurs="0"
maxOccurs="unbounded">
<complexType>
<attribute name="name" type="string"
use="required"/>
<attributeGroup ref="xlink:locatorLink"/>
</complexType>
</element>
</sequence>
</complexType>
</element>
<element name="relationships" minOccurs="0">
<complexType>
<sequence>
<element name="relationship" type="string"
minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
</element>
<element name="other-geometry-languages"
minOccurs="0">
<complexType>
<sequence>
<element name="geometry-language" minOccurs="0"
maxOccurs="unbounded">
<complexType>
<attribute name="namespace" type="anyURI"/>
</complexType>
</element>
</sequence>
</complexType>
</element>
<element name="services">
<complexType>
<attribute name="get-capabilities" type="boolean"
fixed="true"/>
<attribute name="query" type="boolean"
default="false"/>
<attribute name="download" type="boolean"
default="false"/>
</complexType>
</element>
<element name="maximum-query-results"
type="nonNegativeInteger" minOccurs="0"/>
<element name="query-types" minOccurs="0">
<complexType>
<attribute name="identifier" type="boolean"
default="false"/>
<attribute name="place-status" type="boolean"
default="false"/>
<attribute name="name" type="boolean"
default="false"/>
<attribute name="footprint" type="boolean"
default="false"/>
<attribute name="class" type="boolean"
default="false"/>
<attribute name="relationship" type="boolean"
default="false"/>
</complexType>
</element>
<element name="name-query-operators" minOccurs="0">
<complexType>
<attribute name="contains-all-words"
type="boolean" default="false"/>
<attribute name="contains-any-words"
type="boolean" default="false"/>
<attribute name="contains-phrase"
type="boolean" default="false"/>
<attribute name="equals" type="boolean"
fixed="true"/>
<attribute name="matches-pattern"
type="boolean" default="false"/>
</complexType>
</element>
<element name="footprint-query-operators"
minOccurs="0">
<complexType>
<attribute name="contains" type="boolean"
default="false"/>
<attribute name="overlaps" type="boolean"
default="false"/>
<attribute name="within" type="boolean"
fixed="true"/>
</complexType>
</element>
<element name="footprint-query-operands"
minOccurs="0">
<complexType>
<attribute name="box" type="boolean"
default="false"/>
<attribute name="identifier" type="boolean"
default="false"/>
<attribute name="polygon" type="boolean"
default="false"/>
</complexType>
</element>
</sequence>
</complexType>
</element>
</schema>
An example of a gazetteer capabilities description is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-capabilities
xmlns="http://www.alexandria.ucsb.edu/gazetteer"
xmlns:xlink="http://www.w3.org/1999/xlink">
<version>1.2</version>
<name>ADL Gazetteer</name>
<description>This gazetteer...</description>
<ADL-collection-metadata xlink:href="http://..."/>
<extended-report-schema xlink:href="http://..."/>
<code-schemes>
<scheme name="FIPS 55-3"
xlink:href="http://www.itl.nist.gov/fipspubs/fip55-3.htm"/>
</code-schemes>
<thesauri>
<thesaurus name="ADL Feature Type Thesaurus"
xlink:href="http://www.alexandria.ucsb.edu/..."/>
</thesauri>
<relationships>
<relationship>adjacent-to</relationship>
<relationship>capital-of</relationship>
</relationships>
<other-geometry-languages>
<geometry-language
namespace="http://www.esri.com/ArcXML"/>
</other-geometry-languages>
<services query="true"/>
<maximum-query-results>100</maximum-query-results>
<query-types identifier="true" name="true" footprint="true"
class="true"/>
<name-query-operators contains-all-words="true"
contains-any-words="true" contains-phrase="true"/>
<footprint-query-operators contains="true"/>
<footprint-query-operands box="true" identifier="true"/>
</gazetteer-capabilities>
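For contrast, the following is a sketch of what appears to be the smallest description permitted by the schema as listed above, in which only the <version> and <services> subelements are required. Because the get-capabilities attribute is fixed at true and the query and download attributes default to false, an empty <services> element would describe a gazetteer that offers the get-capabilities service and nothing else.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-capabilities
    xmlns="http://www.alexandria.ucsb.edu/gazetteer">
  <version>1.2</version>
  <!-- All attributes omitted: get-capabilities is fixed at true,
       query and download take their default of false -->
  <services/>
</gazetteer-capabilities>
```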
The gazetteer protocol is formally defined by a main XML schema and
four subschemas, the last of which, gazetteer-types.xsd, is displayed
below.
Applications that use this protocol can and should reference only
the main schema, as it implicitly includes the others. The canonical
URL prefix at which all protocol schemas reside is
"http://www.alexandria.ucsb.edu/gazetteer/protocol/" for
the latest version of the gazetteer protocol and
"http://www.alexandria.ucsb.edu/gazetteer/protocol/N/"
for version N specifically.
gazetteer-types.xsd
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:gaz="http://www.alexandria.ucsb.edu/gazetteer"
targetNamespace="http://www.alexandria.ucsb.edu/gazetteer"
elementFormDefault="qualified">
<simpleType name="report-format-type">
<restriction base="string">
<enumeration value="standard"/>
<enumeration value="extended"/>
</restriction>
</simpleType>
<simpleType name="status-type">
<restriction base="string">
<enumeration value="former"/>
<enumeration value="current"/>
<enumeration value="proposed"/>
</restriction>
</simpleType>
</schema>
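The status-type values defined above are the ones accepted by the status attribute of a place status query. As a brief sketch, the following query requests places named exactly "santa barbara" that are either former or proposed, that is, not currently existing. The name text is chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<gazetteer-query
    xmlns="http://www.alexandria.ucsb.edu/gazetteer">
  <and>
    <name-query operator="equals" text="santa barbara"/>
    <or>
      <!-- Each status value must be one of the status-type
           enumeration: former, current, or proposed -->
      <place-status-query status="former"/>
      <place-status-query status="proposed"/>
    </or>
  </and>
</gazetteer-query>
```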
- 1.2
-
Gazetteer model: added the code and place status
attributes. Replaced the boolean historical qualifier with
the tri-valued status qualifier. Added a note on one-to-one
correspondence between gazetteer entries and conceptual places.
Services: removed the update services
(add-entry, relate-entries, and
remove-entry). Changed the success indicator from a nil
<error> subelement to the absence of an
<error> subelement.
Reports: added the <codes>,
<place-status>, and
<display-name> subelements. Replaced the
historical attribute with the tri-valued
status attribute. Renamed the
<relationship> subelement's name and
identifier attributes to relation and
target-identifier, respectively, and added the
target-name attribute. Relaxed the requirement that the
target of a relationship must be another gazetteer entry; it may now
be just a name. Added a note on the interpretation of out-of-range
longitudinal coordinates.
Query language: added two query types,
<code-query> and
<place-status-query>. Renamed the
<relationship-query> attributes to
relation and target-identifier. Added a
note on the interpretation of out-of-range longitudinal
coordinates.
Capabilities: added subelements
<name>,
<ADL-collection-metadata>,
<code-schemes>, and
<maximum-query-results>. Added a
place-status attribute to the
<query-types> subelement. Removed the
add-entry, relate-entries, and
remove-entry attributes from the
<services> subelement.
Schemas: new section.
Other: numerous documentation changes and
clarifications throughout.
- 1.1
- Swapped the interpretation of the GML first (X) and second (Y)
coordinates. Added a
<description> subelement to
the <gazetteer-capabilities> element.
- 1.0a
- Clarified the meaning of a gazetteer entry having more than one
footprint. Other, minor changes.
- 1.0
- Original version.
Greg Janée
Created: 2003-09-19
Last modified: 2004-10-18 12:45