Identifiers as History


The NGDA archive data model specifies that archival object identifiers must be absolute, fragment-less URIs. Here is one approach to creating such identifiers.

Consider a long-term archive that obtains objects from external providers, assigns identifiers to those objects at the time of ingest, and thereafter maintains the associations between identifiers and objects in perpetuity. In that context, the basic requirements on identifiers are as follows:

  • Identifiers must be universally unique.
  • Identifiers must remain universally unique. Maintaining that uniqueness should require no ongoing effort.

Additional desirable characteristics of identifiers:

  • Minting identifiers should be trivial. In particular, no database or other centralized online system should be required to mint an identifier, or to satisfy the above requirements regarding uniqueness.
  • It should be easy to map in both directions between archive-assigned identifiers and provider identifiers (as those provider identifiers stood at ingest time).

In light of the above requirements and desirable characteristics, here is a syntax for identifiers and a method for minting them:

  • Identifiers use the "tag" URI scheme.
  • The tagging entity identifies the archive. In NGDA's case, the tagging entity is "ngda.org,2005".
  • The entity-specific portion of the identifier begins with "oid:".
  • Following that is, recursively, a "tag" URI that identifies the provider and object.
    • In this nested identifier, the tagging entity identifies the provider.
    • The date in the tagging entity should, if possible, reflect the approximate date of ingest. (Per the rules of "tag" URIs, the date may be any point in time at which the provider owned its domain name. Thus this rule will be impossible to satisfy only if the provider does not exist at the time of ingest.)
    • The entity-specific portion of the identifier is the provider's identifier for the object at the time of ingest.
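
As a rough sketch, the minting rules above fit in a few lines of Python. The function names "tag_uri" and "mint", and the loose tagging-entity check, are illustrative assumptions, not part of NGDA's specification. The sketch also assumes the provider's identifier already uses only characters permitted in the entity-specific part of a "tag" URI (RFC 4151), so it performs no escaping.

```python
import re

# Tagging entity NGDA uses, per the text above.
NGDA_TAGGING_ENTITY = "ngda.org,2005"

# Loose check of an RFC 4151 tagging entity: an authority name (DNS name
# or e-mail address), a comma, and a date of the form YYYY, YYYY-MM, or
# YYYY-MM-DD. It does not fully validate DNS names or calendar dates.
_TAGGING_ENTITY = re.compile(r"^[A-Za-z0-9.@-]+,\d{4}(-\d{2}(-\d{2})?)?$")


def tag_uri(tagging_entity: str, specific: str) -> str:
    """Build "tag:<taggingEntity>:<specific>" (no escaping of `specific`)."""
    if not _TAGGING_ENTITY.match(tagging_entity):
        raise ValueError(f"invalid tagging entity: {tagging_entity!r}")
    return f"tag:{tagging_entity}:{specific}"


def mint(provider_domain: str, ingest_date: str, provider_id: str,
         archive_entity: str = NGDA_TAGGING_ENTITY) -> str:
    """Mint an archive identifier from the provider's own identifier.

    `ingest_date` should approximate the date of ingest and must be a date
    on which the provider owned `provider_domain`.
    """
    # Nested "tag" URI: the provider is the tagging entity.
    inner = tag_uri(f"{provider_domain},{ingest_date}", provider_id)
    # Outer "tag" URI: the archive is the tagging entity, then "oid:".
    return tag_uri(archive_entity, "oid:" + inner)
```

Calling mint("gis.ca.gov", "2006", "doqq/c32114e4ne") returns "tag:ngda.org,2005:oid:tag:gis.ca.gov,2006:doqq/c32114e4ne". Because the result depends only on the arguments, no database or central service is consulted, which is the "trivial minting" property asked for earlier.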

Here's an example of such an identifier:

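tag:ngda.org,2005:oid:tag:gis.ca.gov,2006:doqq/c32114e4ne

where:

  • "ngda.org,2005" identifies the archive;
  • "gis.ca.gov,2006" identifies the provider;
  • "doqq/c32114e4ne" is the provider's object identifier.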

Such identifiers are, in effect, a compact representation of a piece of history. The above identifier identifies an object (a DOQQ image, in this hypothetical case) that NGDA ingested from CaSIL (gis.ca.gov) in 2006; at the time of ingest, CaSIL identified the object as "doqq/c32114e4ne".
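
Going the other way, that history can be read back out mechanically. The sketch below is a hypothetical "parse" function (not part of any NGDA software). It assumes the nested provider identifier carries its own "tag:" prefix, as the rules above require, and that neither tagging entity contains a colon, which holds for RFC 4151 authority names and dates. The provider's identifier may itself contain colons; everything after the nested tagging entity is kept intact.

```python
from typing import NamedTuple


class ArchiveId(NamedTuple):
    archive_entity: str   # e.g. "ngda.org,2005"
    provider_entity: str  # e.g. "gis.ca.gov,2006"
    provider_id: str      # provider's identifier at the time of ingest


def parse(identifier: str) -> ArchiveId:
    """Split an archive identifier back into its recorded history."""
    outer_prefix, inner_prefix = "tag:", "oid:tag:"
    if not identifier.startswith(outer_prefix):
        raise ValueError("not a tag URI")
    # Tagging entities contain no colon, so the first colon ends the archive's.
    archive_entity, _, rest = identifier[len(outer_prefix):].partition(":")
    if not rest.startswith(inner_prefix):
        raise ValueError('expected "oid:" followed by a nested tag URI')
    # The first colon after the nested tagging entity starts the provider's id.
    provider_entity, sep, provider_id = rest[len(inner_prefix):].partition(":")
    if not sep:
        raise ValueError("nested tag URI has no specific part")
    return ArchiveId(archive_entity, provider_entity, provider_id)
```

Splitting the returned provider_entity at its comma then yields the provider's domain ("gis.ca.gov") and the approximate ingest date ("2006").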


Greg Janée
Last modified: 2006-02-23 21:24