Provenance

Collection of Papers

Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness.

The goal of PROV is to enable the wide publication and interchange of provenance on the Web and other information systems. PROV enables one to represent and interchange provenance information using widely available formats such as RDF and XML.

At its core is a conceptual data model (PROV-DM), which defines a common vocabulary used to describe provenance.

To help developers and users express valid provenance, a set of constraints (PROV-Constraints) is defined, which can be used to implement provenance validators. This is complemented by a formal semantics (PROV-SEM). Finally, to further support the interchange of provenance, additional specifications are provided for protocols to locate and access provenance (PROV-AQ), connect bundles of provenance descriptions (PROV-Links), represent dictionary-style collections (PROV-Dictionary) and define how to interoperate with the widely used Dublin Core vocabulary (PROV-DC).
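To make the idea of a validator concrete, here is a minimal sketch of one ordering check in the spirit of PROV-Constraints: an entity should not be used before it is generated. It assumes explicit timestamps on every record and covers only this single rule; the class names, the function name and the `ex:` identifiers are hypothetical, and a real validator works on event ordering more generally.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Generation:
    entity: str
    activity: str
    time: Optional[datetime] = None


@dataclass(frozen=True)
class Usage:
    activity: str
    entity: str
    time: Optional[datetime] = None


def usage_before_generation(generations: List[Generation],
                            usages: List[Usage]) -> List[str]:
    """Return the ids of entities recorded as used before they were generated."""
    generated_at = {g.entity: g.time for g in generations if g.time is not None}
    violations = []
    for u in usages:
        gen_time = generated_at.get(u.entity)
        # Records without timestamps are skipped rather than flagged.
        if gen_time is not None and u.time is not None and u.time < gen_time:
            violations.append(u.entity)
    return violations


generations = [Generation("ex:report", "ex:writing", datetime(2024, 1, 2))]
usages = [
    Usage("ex:review", "ex:report", datetime(2024, 1, 3)),   # after generation
    Usage("ex:publish", "ex:report", datetime(2024, 1, 1)),  # before generation
]
print(usage_before_generation(generations, usages))
```

Only the second usage is reported, because its timestamp precedes the generation of `ex:report`.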

JSON-LD allows a semantic structure to be overlaid over a JSON structure, thereby enabling the conversion of JSON serializations into linked data.
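The overlay can be illustrated with a small sketch. A plain JSON record is paired with a `@context` that maps its keys to IRIs; the record itself is unchanged. The record, the choice of Dublin Core terms and the `expand_keys` helper are assumptions made for illustration, and the helper is a deliberately rough stand-in for the real JSON-LD expansion algorithm, which does considerably more (prefix resolution, value objects, typing).

```python
import json

# Hypothetical record, as a lightweight Web application might already emit it.
record = {"id": "ex:report", "title": "Annual report", "author": "ex:alice"}

# The context maps plain keys onto IRIs. The Dublin Core term IRIs are real;
# the mapping chosen here is only illustrative.
context = {
    "id": "@id",
    "title": "http://purl.org/dc/terms/title",
    "author": {"@id": "http://purl.org/dc/terms/creator", "@type": "@id"},
}

# The JSON-LD document is the original JSON plus the context.
json_ld_document = {"@context": context, **record}


def expand_keys(document):
    """Rough key expansion: replace each key by the term its context maps it to."""
    ctx = document["@context"]
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        mapping = ctx.get(key, key)
        term = mapping["@id"] if isinstance(mapping, dict) else mapping
        expanded[term] = value
    return expanded


print(json.dumps(expand_keys(json_ld_document), indent=2))
```

An application that ignores `@context` still sees the familiar keys, while a linked-data consumer can interpret the same document through the mapped IRIs.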

PROV-JSONLD is a serialization of PROV that is compatible with PROV-DM and addresses all four of our key requirements:

  • Lightweight: A serialization MUST support lightweight Web applications.
  • Natural: A serialization MUST look natural to its targeted community of users.
  • Semantic: A serialization MUST allow for semantic markup and integration with linked data applications.
  • Efficient: A serialization MUST be efficiently processable.

PROV Data Model

The PROV data model (PROV-DM) is a generic data model for provenance. It allows domain-specific and application-specific representations of provenance to be translated into a common model and interchanged between systems.

It distinguishes core structures from extended structures: core structures form the essence of provenance information, and are commonly found in various domain-specific vocabularies that deal with provenance or similar kinds of information. Extended structures enhance and refine core structures with more expressive capabilities to cater for more advanced uses of provenance.

Provenance describes the use and production of entities by activities, which may be influenced in various ways by agents.

An entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary. An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.
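As a rough sketch of these two definitions, entities and activities can be modelled as small records, with an activity using some entities and generating others. The class names, the `to_statements` helper and the `ex:` identifiers are hypothetical; the printed lines only imitate the style of the PROV-N notation and are not produced by a conforming serializer.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Entity:
    id: str


@dataclass(frozen=True)
class Activity:
    id: str
    start: Optional[datetime] = None  # an activity occurs over a period of time
    end: Optional[datetime] = None


def fmt_time(t):
    """Render a timestamp, or '-' when it is not recorded."""
    return t.isoformat() if t is not None else "-"


def to_statements(activity, used, generated):
    """Describe one activity together with the entities it used and generated."""
    lines = [f"entity({e.id})" for e in list(used) + list(generated)]
    lines.append(
        f"activity({activity.id}, {fmt_time(activity.start)}, {fmt_time(activity.end)})"
    )
    lines += [f"used({activity.id}, {e.id}, -)" for e in used]
    lines += [f"wasGeneratedBy({e.id}, {activity.id}, -)" for e in generated]
    return lines


notes = Entity("ex:notes")
report = Entity("ex:report")
writing = Activity("ex:writing", datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 17, 0))

statements = to_statements(writing, used=[notes], generated=[report])
print("\n".join(statements))
```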

Communication is the exchange of some unspecified entity by two activities, one activity using some entity generated by the other.
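Read this way, communication can be derived from usage and generation records: if one activity used an entity that another activity generated, the first was informed by the second. The sketch below assumes the records are given as plain tuples of identifiers; the function name and the `ex:` identifiers are hypothetical.

```python
def infer_communication(used, generated_by):
    """Infer (informed, informant) activity pairs.

    used:         iterable of (activity, entity) pairs
    generated_by: iterable of (entity, activity) pairs
    """
    # Index the generating activities by entity.
    generators = {}
    for entity, activity in generated_by:
        generators.setdefault(entity, set()).add(activity)

    informed_by = set()
    for activity, entity in used:
        for generator in generators.get(entity, ()):
            informed_by.add((activity, generator))
    return informed_by


used = {("ex:review", "ex:report"), ("ex:writing", "ex:notes")}
generated_by = {("ex:report", "ex:writing")}

print(infer_communication(used, generated_by))
```

Here `ex:review` is informed by `ex:writing` through `ex:report`. Nothing is inferred for `ex:notes`, since no generating activity is recorded for it.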

A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.
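A common use of derivation records is to trace what an entity was ultimately based on. The sketch below follows recorded derivation links backwards and collects every source it reaches. This is a query convenience over the recorded links, not an inference defined by PROV, and the mapping layout, function name and `ex:` identifiers are assumptions.

```python
def derivation_sources(entity, was_derived_from):
    """Collect all entities reachable from `entity` via recorded derivation links.

    was_derived_from: dict mapping an entity id to the ids it was derived from.
    """
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for source in was_derived_from.get(current, []):
            if source not in seen:  # also guards against accidental cycles
                seen.add(source)
                stack.append(source)
    return seen


was_derived_from = {
    "ex:chart_v2": ["ex:chart_v1"],  # an update resulting in a new entity
    "ex:chart_v1": ["ex:dataset"],   # construction from a pre-existing entity
}

print(sorted(derivation_sources("ex:chart_v2", was_derived_from)))
```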

An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity. Attribution is the ascribing of an entity to an agent. An activity association is an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity. Delegation is the assignment of authority and responsibility to an agent (by itself or by another agent) to carry out a specific activity.
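The three agent relations can be combined to answer the question "who bears responsibility for this activity?". The sketch below starts from the agents associated with an activity and follows delegation links upwards. It simplifies delegation by ignoring the activity a delegation may be scoped to; the dictionaries, the function name and the `ex:` identifiers are hypothetical.

```python
def responsible_agents(activity, was_associated_with, acted_on_behalf_of):
    """Return the agents associated with `activity`, followed by the agents
    on whose behalf they acted, in the order they are discovered."""
    result = []
    queue = list(was_associated_with.get(activity, []))
    while queue:
        agent = queue.pop(0)
        if agent in result:
            continue
        result.append(agent)
        # Follow delegation: the delegate acted on behalf of these agents.
        queue.extend(acted_on_behalf_of.get(agent, []))
    return result


was_attributed_to = {"ex:report": ["ex:alice"]}     # attribution: entity -> agents
was_associated_with = {"ex:writing": ["ex:alice"]}  # association: activity -> agents
acted_on_behalf_of = {"ex:alice": ["ex:acme"]}      # delegation: delegate -> responsible

print(was_attributed_to["ex:report"])
print(responsible_agents("ex:writing", was_associated_with, acted_on_behalf_of))
```

The report is attributed to `ex:alice` alone, while responsibility for the writing activity extends to `ex:acme` through delegation.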