The Open Provenance Model
Introduction
Interest in provenance is also growing in the “e-science community”, where provenance is perceived as a crucial component of workflow systems.
The Open Provenance Model is a model for provenance that meets the following requirements:
- To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model.
- To allow developers to build and share tools that operate on such a provenance model.
- To define the model in a precise, technology-agnostic manner.
- To support a digital representation of provenance for any “thing”, whether produced by computer systems or not.
- To define a core set of rules that identify the valid inferences that can be made on provenance graphs.
Entities
Our primary concern is to be able to represent how “things” came to be as they are, whether they are digital data such as simulation results, physical objects such as cars, or immaterial entities such as decisions.
Hence, from the perspective of provenance, we introduce the concept of an artifact as an immutable piece of state; likewise, we introduce the concept of a process as actions resulting in new artifacts. A process usually takes place in some context, which enables or facilitates its execution.
Definition 1 (Artifact) Immutable piece of state, which may have a physical embodiment in a physical object, or a digital representation in a computer system.
Definition 2 (Process) Action or series of actions performed on or caused by artifacts, and resulting in new artifacts.
Definition 3 (Agent) Contextual entity acting as a catalyst of a process, enabling, facilitating, controlling, or affecting its execution.
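As an illustration of Definitions 1 to 3, the three kinds of entity could be recorded as immutable records. This is only a minimal sketch: the names `NodeKind`, `Node` and `ident`, and the sample identifiers, are assumptions made for the example and are not part of the model.

```python
from dataclasses import dataclass
from enum import Enum


class NodeKind(Enum):
    """The three kinds of entity introduced in Definitions 1 to 3."""
    ARTIFACT = "artifact"
    PROCESS = "process"
    AGENT = "agent"


@dataclass(frozen=True)  # frozen: a recorded node cannot be modified afterwards
class Node:
    kind: NodeKind
    ident: str  # hypothetical identifier, chosen for this sketch only


# Hypothetical entities: a simulation result, the run that produced it,
# and the scientist who controlled the run.
simulation_result = Node(NodeKind.ARTIFACT, "a1")
simulation_run = Node(NodeKind.PROCESS, "p1")
scientist = Node(NodeKind.AGENT, "ag1")
```

Freezing the record mirrors the requirement that an artifact is an immutable piece of state: a change of state would be represented by a new artifact rather than by updating an existing one.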
Dependencies
A provenance graph aims to capture the causal dependencies between the abovementioned entities. Therefore, a provenance graph is defined as a directed graph, whose nodes are artifacts, processes and agents, and whose edges belong to one of the following categories.
Definition 4 (Causal Relationship) A causal relationship is represented by an arc and denotes the presence of a causal dependency between the source of the arc (the effect) and the destination of the arc (the cause). Five causal relationships are recognized: a process used an artifact, an artifact was generated by a process, a process was triggered by a process, an artifact was derived from an artifact, and a process was controlled by an agent.
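The five causal relationships of Definition 4 can be summarised as a table that maps each edge label to the kind of its source (the effect) and of its destination (the cause). The sketch below is illustrative only: the labels `wasTriggeredBy` and `wasDerivedFrom` are formed by analogy with `used` and `wasGeneratedBy`, and the names `EDGE_SIGNATURES` and `is_valid_edge` are assumptions made for the example.

```python
# label -> (kind of the source/effect, kind of the destination/cause)
EDGE_SIGNATURES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
    "wasControlledBy": ("process", "agent"),
}


def is_valid_edge(label, effect_kind, cause_kind):
    """Check that an edge connects node kinds allowed for its label."""
    return EDGE_SIGNATURES.get(label) == (effect_kind, cause_kind)


print(is_valid_edge("used", "process", "artifact"))   # True
print(is_valid_edge("used", "artifact", "process"))   # False
```

A check of this kind rejects, for instance, a `used` edge whose source is an artifact, since only a process can use an artifact.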
Definition 5 (Artifact Used by a Process) In a graph, connecting a process to an artifact by a used edge is intended to indicate that the process required the availability of the artifact to complete its execution.
Definition 6 (Artifacts Generated by Processes) In a graph, connecting an artifact to a process by a wasGeneratedBy edge is intended to mean that the process was required to initiate its execution for the artifact to be generated.
Definition 7 (Process Triggered by Process) A connection of a process P2 to a process P1 by a “was triggered by” edge indicates that the start of process P1 was required for P2 to be able to complete.
Definition 8 (Artifact Derived from Artifact) An edge “was derived from” between two artifacts A1 and A2 indicates that artifact A1 may have been used by a process that derived A2.
Definition 9 (Process Controlled by Agent) The assertion of an edge “was controlled by” between a process P and an agent Ag indicates that the start and end of process P were controlled by agent Ag.
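To make the edge definitions concrete, a small provenance graph can be written down as a list of (effect, label, cause) triples, following the arc direction of Definition 4. The scenario (baking a cake) and all node names are hypothetical, as is the helper `direct_causes`; the sketch only looks up directly asserted edges and does not attempt any inference.

```python
from collections import defaultdict

# Each edge is (effect, label, cause): the arc points from effect to cause.
edges = [
    ("bake", "used", "flour"),            # process used artifact
    ("bake", "used", "eggs"),             # process used artifact
    ("cake", "wasGeneratedBy", "bake"),   # artifact generated by process
    ("cake", "wasDerivedFrom", "flour"),  # artifact derived from artifact
    ("bake", "wasControlledBy", "john"),  # process controlled by agent
]

# Index the edges by their source so causes can be looked up per node.
causes_of = defaultdict(list)
for effect, label, cause in edges:
    causes_of[effect].append((label, cause))


def direct_causes(node):
    """Return the (label, cause) pairs asserted directly for `node`."""
    return list(causes_of.get(node, []))


print(direct_causes("cake"))
# [('wasGeneratedBy', 'bake'), ('wasDerivedFrom', 'flour')]
```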
Roles
A role is an annotation on used, wasGeneratedBy and wasControlledBy.
Definition 10 (Role) A role designates an artifact’s or agent’s function in a process.
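A role can be pictured as an optional annotation stored on an edge and accepted only for the three edge labels listed above. The following is a sketch under assumptions: the `Edge` record, the `ROLE_BEARING` set, and the division example with its role names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Edge labels that may be annotated with a role.
ROLE_BEARING = {"used", "wasGeneratedBy", "wasControlledBy"}


@dataclass(frozen=True)
class Edge:
    effect: str                  # source of the arc
    label: str                   # kind of causal relationship
    cause: str                   # destination of the arc
    role: Optional[str] = None   # function of the artifact or agent in the process

    def __post_init__(self):
        # Reject a role on an edge label that does not carry roles.
        if self.role is not None and self.label not in ROLE_BEARING:
            raise ValueError(f"{self.label} edges do not carry a role")


# Hypothetical division process: the roles tell the two inputs apart.
e1 = Edge("divide", "used", "a1", role="dividend")
e2 = Edge("divide", "used", "a2", role="divisor")
```

Without the roles, the two `used` edges would be indistinguishable, although swapping the inputs of a division changes its result.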