Simple Ingest Protocol (SIP)
Draft 0.1
This is a simple protocol that supports submission of records to a
library and asynchronous notification of the acceptance or rejection
of those records.
An ingest server is a web service, operating on behalf of
a library, that accepts library content from distributed submitters
in the form of discrete records.
Our notion of "library" is very broad here, and includes any kind
of database, repository, service, etc., that stores and exerts
curatorial power over discrete records. A record may be any XML
document that has a format acceptable to the library and that has a
library-assigned identifier.
A submitter sends to an ingest server an ingest request
containing a record and, optionally, contact and identification
information related to the record and its source. In return, the
submitter (synchronously) receives an ingest disposition.
The disposition may be accepted, indicating that the record was
added to the library, in which case the identifier assigned by the
library is returned; rejected, in which case a failure reason may be
returned; or provisionally accepted, in which case the ultimate
disposition will be sent asynchronously (i.e., at some future time)
to a notification recipient identified by the submitter.
We first define the XML document formats utilized by the protocol.
For brevity, namespace declarations have been elided in the definitions
below, but all XML elements should be understood to reside in
namespace "http://www.alexandria.ucsb.edu". sip.xsd [TBD] is an XML
schema that defines the XML formats; sip.dtd
[TBD] is a roughly equivalent XML DTD.
<ingest-properties>
Ingest server properties. <notification-style>
is the server's notification style: if "synchronous",
ultimate record dispositions are always returned synchronously; if
"asynchronous", ultimate record dispositions are returned
synchronously or asynchronously depending on the errors encountered in
the record and the processing required. <formats>
lists one or more record formats accepted by the server. Each format
is expressed as the URL of the format's XML schema.
<!ELEMENT ingest-properties (notification-style,
formats)>
<!ELEMENT notification-style (#PCDATA)>
<!-- "synchronous" or "asynchronous" -->
<!ELEMENT formats (format+)>
<!ELEMENT format (#PCDATA)>
For example:
<ingest-properties>
<notification-style>synchronous</notification-style>
<formats>
<format>http://.../myschema.xsd</format>
</formats>
</ingest-properties>
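A submitter will typically read these properties before sending anything. The following is a minimal sketch, in Python, of parsing such a document with the standard library; the function name parse_properties is hypothetical, and namespace declarations are elided as in the examples above.

```python
import xml.etree.ElementTree as ET

# The example document from above, with the elided URL left as-is.
PROPERTIES_XML = """
<ingest-properties>
  <notification-style>synchronous</notification-style>
  <formats>
    <format>http://.../myschema.xsd</format>
  </formats>
</ingest-properties>
"""

def parse_properties(xml_text):
    """Return (notification_style, [format URLs]) from an <ingest-properties> document."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "ingest-properties":
        raise ValueError("not an ingest-properties document: %r" % root.tag)
    style = (root.findtext("notification-style") or "").strip()
    if style not in ("synchronous", "asynchronous"):
        raise ValueError("unknown notification style: %r" % style)
    # The DTD requires one or more <format> elements.
    formats = [(f.text or "").strip() for f in root.findall("formats/format")]
    if not formats:
        raise ValueError("at least one <format> is required")
    return style, formats

style, formats = parse_properties(PROPERTIES_XML)
print(style, formats)
```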
<ingest-request>
A request to ingest a single record into the library. The record,
enclosed within the <record> element, may be any
XML content, subject only to the restrictions that it appear as a
single XML element and that it adhere to one of the formats supported
by the ingest server. <source> describes the
source of the record for identification and contact purposes. Within
<source>, <submitter> describes,
by a name and email address, the record's source institution, project,
or person; <sequence> names the sequence of records
the submitted record is a member of; and
<source-identifier> is the source's identifier for
the record. All elements within <source>, and
<source> itself, are optional.
<notification-recipient>, if present, is the URL of
the notification recipient for the request.
<!ELEMENT ingest-request (source?,
notification-recipient?, record)>
<!ELEMENT source (submitter?, sequence?,
source-identifier?)>
<!ELEMENT submitter (name, email-address)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email-address (#PCDATA)>
<!ELEMENT sequence (#PCDATA)>
<!ELEMENT source-identifier (#PCDATA)>
<!ELEMENT notification-recipient (#PCDATA)>
<!ELEMENT record ANY>
<!-- any single element -->
For example:
<ingest-request>
<source>
<submitter>
<name>Stanford DL Project</name>
<email-address>...@stanford.edu</email-address>
</submitter>
<sequence>Campus Buildings</sequence>
<source-identifier>145</source-identifier>
</source>
<notification-recipient>http://...</notification-recipient>
<record>
<ADL_gazetteer_entry xmlns="...">
...
</ADL_gazetteer_entry>
</record>
</ingest-request>
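One way a submitter might assemble such a request is sketched below in Python. The helper build_ingest_request and its parameters are hypothetical; the element order follows the DTD above, and namespace declarations are again elided.

```python
import xml.etree.ElementTree as ET

def build_ingest_request(record, submitter=None, sequence=None,
                         source_identifier=None, notification_recipient=None):
    """Wrap a single record element in an <ingest-request> element.

    submitter, if given, is a (name, email_address) pair.
    """
    request = ET.Element("ingest-request")
    # <source> and everything inside it is optional; emit only what was supplied.
    if submitter is not None or sequence is not None or source_identifier is not None:
        source = ET.SubElement(request, "source")
        if submitter is not None:
            name, email_address = submitter
            sub = ET.SubElement(source, "submitter")
            ET.SubElement(sub, "name").text = name
            ET.SubElement(sub, "email-address").text = email_address
        if sequence is not None:
            ET.SubElement(source, "sequence").text = sequence
        if source_identifier is not None:
            ET.SubElement(source, "source-identifier").text = source_identifier
    if notification_recipient is not None:
        ET.SubElement(request, "notification-recipient").text = notification_recipient
    # The record must appear as exactly one element inside <record>.
    ET.SubElement(request, "record").append(record)
    return request

record = ET.Element("ADL_gazetteer_entry")  # record body elided, as in the example
request = build_ingest_request(
    record,
    submitter=("Stanford DL Project", "...@stanford.edu"),
    sequence="Campus Buildings",
    source_identifier="145",
)
print(ET.tostring(request, encoding="unicode"))
```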
<ingest-disposition>
The disposition of an ingest request. <source>
repeats the source information from the original request, to the
extent present. If the record was accepted,
<assigned-identifier> is the record's identifier as
assigned by the library. If the record was rejected,
<reason> says why.
<!ELEMENT ingest-disposition (source?, (accepted |
provisionally-accepted | rejected))>
<!ELEMENT accepted (assigned-identifier)>
<!ELEMENT assigned-identifier (#PCDATA)>
<!ELEMENT provisionally-accepted EMPTY>
<!ELEMENT rejected (reason?)>
<!ELEMENT reason (#PCDATA)>
For example:
<ingest-disposition>
<source>
<submitter>
<name>Stanford DL Project</name>
<email-address>...@stanford.edu</email-address>
</submitter>
<sequence>Campus Buildings</sequence>
<source-identifier>145</source-identifier>
</source>
<rejected>
<reason>Data value out of range...</reason>
</rejected>
</ingest-disposition>
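A receiver has to distinguish the three possible outcomes. The sketch below, in Python, shows one way to do so; parse_disposition is a hypothetical helper, and namespace declarations are elided as above.

```python
import xml.etree.ElementTree as ET

OUTCOMES = ("accepted", "provisionally-accepted", "rejected")

def parse_disposition(xml_text):
    """Return (status, detail) for an <ingest-disposition> document.

    detail is the assigned identifier for "accepted", the optional reason
    (possibly None) for "rejected", and None for "provisionally-accepted".
    """
    root = ET.fromstring(xml_text.strip())
    if root.tag != "ingest-disposition":
        raise ValueError("not an ingest-disposition document: %r" % root.tag)
    outcomes = [child for child in root if child.tag in OUTCOMES]
    if len(outcomes) != 1:
        raise ValueError("expected exactly one outcome element")
    outcome = outcomes[0]
    if outcome.tag == "accepted":
        identifier = outcome.findtext("assigned-identifier")
        if identifier is None:
            raise ValueError("<accepted> requires <assigned-identifier>")
        return "accepted", identifier.strip()
    if outcome.tag == "rejected":
        reason = outcome.findtext("reason")
        return "rejected", (reason.strip() if reason is not None else None)
    return "provisionally-accepted", None

status, detail = parse_disposition(
    "<ingest-disposition><rejected>"
    "<reason>Data value out of range...</reason>"
    "</rejected></ingest-disposition>"
)
print(status, detail)
```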
All protocol operations are stateless. An ingest server provides
the following two operations.
ingest-properties <- get-properties()
Returns the server's properties.
ingest-disposition <- ingest(ingest-request)
Accepts and processes an ingest request. The return document
indicates the disposition of the request; if the disposition is
provisionally accepted, the ultimate disposition will be
delivered at a future time to the notification recipient specified in the
request.
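To make the ingest operation concrete, here is a toy in-memory sketch in Python. Everything in it is an assumption made for illustration: the class ToyIngestServer, the validate callback, and the "toy:N" identifier scheme are not part of the protocol. The sketch always answers synchronously, so it never returns a provisionally accepted disposition.

```python
import itertools
import xml.etree.ElementTree as ET

class ToyIngestServer:
    """In-memory stand-in for an ingest server (hypothetical, for illustration)."""

    def __init__(self, validate):
        self._validate = validate        # callable: record element -> error string or None
        self._library = {}               # assigned identifier -> record element
        self._ids = itertools.count(1)

    def ingest(self, request_xml):
        """Process an <ingest-request> and return an <ingest-disposition> as text."""
        request = ET.fromstring(request_xml)
        disposition = ET.Element("ingest-disposition")
        source = request.find("source")
        if source is not None:
            disposition.append(source)   # repeat the source information, if present
        record = request.find("record")
        if record is None or len(record) != 1:
            error = "request must contain exactly one record element"
        else:
            error = self._validate(record[0])
        if error is None:
            identifier = "toy:%d" % next(self._ids)
            self._library[identifier] = record[0]
            accepted = ET.SubElement(disposition, "accepted")
            ET.SubElement(accepted, "assigned-identifier").text = identifier
        else:
            rejected = ET.SubElement(disposition, "rejected")
            ET.SubElement(rejected, "reason").text = error
        return ET.tostring(disposition, encoding="unicode")

# Accept only records whose root element is <ok>; anything else is rejected.
server = ToyIngestServer(lambda rec: None if rec.tag == "ok" else "unsupported format")
print(server.ingest("<ingest-request><record><ok/></record></ingest-request>"))
```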
A notification recipient provides the following operation.
notify(ingest-disposition)
Accepts an ingest disposition. The disposition must not be
provisionally accepted.
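A notification recipient might enforce that restriction as sketched below in Python. The callback parameters on_accepted and on_rejected are hypothetical; the protocol defines only the notify operation and its input document.

```python
import xml.etree.ElementTree as ET

def notify(disposition_xml, on_accepted, on_rejected):
    """Dispatch an ultimate <ingest-disposition> to the matching callback."""
    root = ET.fromstring(disposition_xml)
    if root.tag != "ingest-disposition":
        raise ValueError("not an ingest-disposition document: %r" % root.tag)
    # Only ultimate dispositions may be delivered to a notification recipient.
    if root.find("provisionally-accepted") is not None:
        raise ValueError("notify() requires an ultimate disposition")
    # The source identifier, when present, lets the submitter match the
    # notification to the record it originally sent.
    source_id = root.findtext("source/source-identifier")
    accepted = root.find("accepted")
    if accepted is not None:
        return on_accepted(source_id, accepted.findtext("assigned-identifier"))
    rejected = root.find("rejected")
    if rejected is not None:
        return on_rejected(source_id, rejected.findtext("reason"))
    raise ValueError("disposition has no outcome element")

events = []
notify(
    "<ingest-disposition><source><source-identifier>145</source-identifier></source>"
    "<rejected><reason>Data value out of range...</reason></rejected></ingest-disposition>",
    on_accepted=lambda sid, ident: events.append(("accepted", sid, ident)),
    on_rejected=lambda sid, reason: events.append(("rejected", sid, reason)),
)
print(events)
```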
The SOAP binding of this
protocol is largely defined by the above XML formats. We need only
note that documents are passed using document-style encoding, and that
notification recipient URLs must use either the "http" or the
"smtp" protocol, corresponding to the respective
well-known SOAP transport mechanisms.
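In document-style encoding, the protocol document is carried directly as the child of the SOAP body. The Python sketch below illustrates this under two assumptions that this draft does not state: that SOAP 1.1 is used (hence the envelope namespace shown), and that a helper named wrap_in_soap does the wrapping. The record content is a placeholder.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope (assumed version)
SIP_NS = "http://www.alexandria.ucsb.edu"              # namespace given in this document

# Use readable prefixes when serializing.
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("sip", SIP_NS)

def wrap_in_soap(document):
    """Place a protocol document directly inside a SOAP Body (document style)."""
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    body.append(document)
    return envelope

request = ET.Element("{%s}ingest-request" % SIP_NS)
record = ET.SubElement(request, "{%s}record" % SIP_NS)
ET.SubElement(record, "placeholder")  # hypothetical record content

envelope = wrap_in_soap(request)
print(ET.tostring(envelope, encoding="unicode"))
```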
Greg Janée
Created: 2003-02-01
Last modified: 2004-10-18 10:13