11 Paths and Locators

11.1 Overview

The openEHR architecture includes a path mechanism that enables any node within a top-level structure to be specified from the top of the structure using a "semantic" (i.e. archetype-based), Xpath-compatible path. The availability of such paths radically changes the querying possibilities for health information, and is one of the major distinguishing features of openEHR.

Technically, the combination of a path and a Version identifier such as OBJECT_VERSION_ID forms a "globally qualified node reference" which can be expressed using LOCATABLE_REF. It can also be expressed in portable URI form as a DV_EHR_URI, known as a "globally qualified node locator". Either representation enables any openEHR data node to be referred to from anywhere. This section describes the syntax and semantics of paths, and of the URI form of reference. In the following, the term "archetype path" means a path extracted from an archetype, while "data path" means one that identifies an item in data. They are no different formally, and this terminology is only used to indicate where they are used.

11.2 Paths

11.2.1 Basic Syntax

Paths in openEHR are defined in an Xpath¹-compatible syntax which is a superset of the path syntax described in the Archetype Definition Language (ADL). The syntax is designed to be easily mappable to Xpath expressions, for use with openEHR-based XML.

The data path syntax used in locator expressions follows the general pattern of a path consisting of segments each consisting of an attribute name², and separated by the slash (`/') character, i.e.:
attribute_name / attribute_name / ... / attribute_name
 
Paths select the object which is the value of the final attribute name in the path, when going from some starting point in the tree and following attribute names given in the path. The starting point is indicated by the initial part of the path, and can be specified in two ways:

relative path: path starts with an attribute name, and the starting point is the current point in the tree (given by some previous operation or knowledge);
absolute path: path starts with a `/'; the starting point is the top of the structure.

In addition, the "//" notation from Xpath can be used to define a path pattern:

path pattern: path starts with or contains the symbol `//' and is taken to be a pattern which can match any number of path segments in the data; the pattern is matched if an actual path can be found anywhere in the structure for which part of the path matches the path section before the `//' symbol, and a later section matches the section appearing after the `//'.
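
The three kinds of path described above can be told apart from their text alone. The following is a minimal Python sketch of that classification, not part of the openEHR specifications; the function name classify_path is hypothetical, and the sketch assumes that `//' never occurs inside a predicate string.

```python
def classify_path(path: str) -> str:
    """Classify a path as 'pattern', 'absolute' or 'relative'.

    Assumption: '//' only ever appears as the pattern symbol,
    never inside a quoted value in a predicate.
    """
    if "//" in path:
        # starts with or contains '//': a path pattern
        return "pattern"
    if path.startswith("/"):
        # starts at the top of the structure
        return "absolute"
    # starts with an attribute name: relative to the current point
    return "relative"


kinds = [classify_path(p) for p in ("/data/events", "data/events", "/data//magnitude")]
```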

11.2.2 Predicate Expressions

Overview

Paths specified solely with attribute names are limited in two ways. Firstly, they can only locate objects in structures in which there are no containers such as lists or sets. However, in any realistic data, including most openEHR data, list, set and hash structures are common. Additional syntax is needed to match a particular object from among the siblings referred to by a container attribute. This takes the form of a predicate expression enclosed in brackets (`[]') after the relevant attribute in a segment, i.e.:
attribute_name [predicate expression]
 
The general form of a path then resembles the following:
attribute_name / attribute_name [predicate expression] / ...
 
Here, predicate expressions are used optionally on those attributes defined in the reference model to be of a container type (i.e. having a cardinality of > 1). If a predicate expression is not used on a container attribute, the whole container is selected. Note that predicate expressions are often possible even on single-valued attributes, and technically can be used (e.g. if generic path-processing software can't tell the difference) but are not required.

The second limitation of basic paths is that they cannot locate objects based on other conditions, such as the object having a child node with a particular value. To address this, predicate expressions can be used to select an object on the basis of other conditions relative to the object, by including boolean expressions including paths, operators, values and parentheses. The syntax of predicate expressions used in openEHR is a subset of the Xpath syntax for predicates with a small number of short-cuts.
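
Because a predicate may itself contain a path (e.g. name/value), a path cannot be split into segments simply by cutting at every slash. The following Python sketch shows one way to do the split; the names split_segments and _parse_segment are hypothetical, and the sketch assumes that brackets never occur inside quoted values.

```python
from typing import List, Optional, Tuple


def _parse_segment(segment: str) -> Tuple[str, Optional[str]]:
    """Split 'attr[predicate]' into (attr, predicate); predicate is None if absent."""
    name, bracket, rest = segment.partition("[")
    if not bracket:
        return (name.strip(), None)
    # drop the closing ']' of the predicate
    return (name.strip(), rest.rstrip()[:-1].strip())


def split_segments(path: str) -> List[Tuple[str, Optional[str]]]:
    """Split a path into segments, ignoring slashes that occur inside [...]."""
    segments = []
    buffer = ""
    depth = 0
    for ch in path + "/":          # trailing '/' flushes the last segment
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            if buffer:
                segments.append(_parse_segment(buffer))
            buffer = ""
        else:
            buffer += ch
    return segments


parts = split_segments("/data/events[at0001 and name/value='standing']/data")
```

Here the slash in name/value stays inside the predicate of the events segment rather than starting a new segment.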

Archetype path Predicate

The most important predicate uses the archetype_node_id value (inherited from LOCATABLE) to limit the items returned from a container, such as to certain ELEMENTs within a CLUSTER. The shortcut form allows the archetype code to be included on its own as the predicate, e.g. [at0003]. This shortcut corresponds to using an archetype path against the runtime data. A typical archetype-derived path is the following (applied to an Observation instance):
/data/events[at0003]/data/items[at0025]/value/magnitude
 
This path refers to the magnitude of a 1-minute Apgar total in an Observation containing a full Apgar result structure. In this path, the [atNNNN] predicates are a shortcut for [@archetype_node_id = 'atNNNN'] in standard Xpath. Note that while an archetype path is always unique in an archetype, it can correspond to more than one item in runtime data, due to the repeated use of the same archetype node within a container.
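
The shortcut expansion can be done as a purely textual rewrite. The following Python sketch illustrates this under the assumption that archetype node codes have the form "at" followed by digits, optionally with dotted specialisation parts (such as at0002.1); the function name expand_at_shortcuts is hypothetical.

```python
import re

# [at0003] or [at0002.1]: a predicate consisting only of an archetype node code
_AT_CODE = re.compile(r"\[(at\d+(?:\.\d+)*)\]")


def expand_at_shortcuts(path: str) -> str:
    """Rewrite every [atNNNN] predicate to its standard Xpath form."""
    return _AT_CODE.sub(r"[@archetype_node_id='\1']", path)


xpath = expand_at_shortcuts("/data/events[at0003]/data/items[at0025]/value/magnitude")
```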

Name-based Predicate

In order to create a guaranteed unique data path, predicates can also include the name value (also inherited from LOCATABLE) as well as the archetype_node_id value. The standard Xpath form of this expression is exemplified by the following:
/data/events[at0001 and name/value='standing']
 
Since the combination of an archetype node identifier and a name value is very common in archetyped databases, a shortcut is also available for the name/value expression, which is to simply include the value after a comma as follows:
/data/events[at0001, 'standing']
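
The comma shortcut can likewise be expanded textually into the longer name/value form. The following is a small Python sketch of that rewrite; the function name expand_name_shortcut is hypothetical, and the sketch assumes name values are written in straight single quotes and contain no single quote themselves.

```python
import re

# [identifier, 'name']: an archetype identifier or node code, a comma, then a quoted name
_NAME_SHORTCUT = re.compile(r"\[\s*([^\s,\[\]]+)\s*,\s*'([^']*)'\s*\]")


def expand_name_shortcut(path: str) -> str:
    """Rewrite [id, 'name'] predicates to [id and name/value='name']."""
    return _NAME_SHORTCUT.sub(r"[\1 and name/value='\2']", path)


expanded = expand_name_shortcut("/data/events[at0001, 'standing']")
```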
 
Other Predicates

Other predicates can be used, based on the value of other attributes such as ELEMENT.name or EVENT.time. Combinations of the archetype_node_id and other such values are commonly used in querying, such as the following path fragment (applied to an OBSERVATION instance):
/data/events[at0007 AND time >= "24-06-2005 09:30:00"]
 
This path would choose Events in Observation.data whose archetype_node_id meaning is "summary event" (at0007 in some archetype) and which occurred at or after the given time. The following example would choose an Evaluation containing a diagnosis (at0002.1) of "other bacterial intestinal infections" (ICD10 code A04):
/data/items[at0002.1
    AND value/defining_code/terminology_id/value = "ICD10AM"
    AND value/defining_code/code_string = "A04"]
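
To make the effect of such a combined predicate concrete, the following Python sketch applies the equivalent of the time-based example above to a list of events held as plain dictionaries. The data, the dictionary layout and the function name select_events are all invented for illustration, and the time literal is read as day-month-year (24 June 2005).

```python
from datetime import datetime

# Hypothetical event data; only the two attributes used in the predicate are shown.
events = [
    {"archetype_node_id": "at0007", "time": datetime(2005, 6, 24, 9, 0)},
    {"archetype_node_id": "at0007", "time": datetime(2005, 6, 24, 10, 15)},
    {"archetype_node_id": "at0002", "time": datetime(2005, 6, 25, 8, 0)},
]


def select_events(candidates, node_id, not_before):
    """Keep events whose archetype_node_id matches and whose time is >= not_before."""
    return [
        event
        for event in candidates
        if event["archetype_node_id"] == node_id and event["time"] >= not_before
    ]


selected = select_events(events, "at0007", datetime(2005, 6, 24, 9, 30))
```

Only the second event satisfies both conditions: the first is too early, and the third has a different archetype node identifier.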
 
11.2.3 Paths within Top-level Structures

Paths within top-level structures strictly adhere to attribute and function names in the relevant parts of the reference model. Predicate expressions are needed to distinguish multiple siblings at various points in paths into these structures, but particularly at archetype "chaining" points. A chaining point is where one archetype takes over from another, as illustrated in FIGURE 30. Chaining points in Compositions occur between the Composition and a Section structure, potentially between a Section structure and other sub-Section structures (constrained by a different Section archetype), and between either Compositions or Section structures, and Entries. Chaining might also occur inside an Entry, if archetyping is used on lower-level structures such as Item_lists etc.

Most chaining points correspond to container types such as List<T> etc., e.g. COMPOSITION.content is defined to be a List<CONTENT_ITEM>, meaning that in real data, the content of a Composition could be a List of Section structures. To distinguish between such sibling structures, predicate expressions are used, based on the archetype_id. At the root point of an archetype in data (e.g. top of a Section structure), the archetype_id carries the identifier of the archetype used to create that structure, in the same manner as any interior point in an archetyped structure has an archetype_node_id attribute carrying archetype node identifier values. The chaining point between Sections and Entries works in the same manner, and since multiple Entries can occur under a single Section, archetype_id predicates are also used to distinguish them.

The same shorthand is used for archetype_id predicate expressions as for archetype_node_ids, i.e. [xxxxx] can be used instead of [@archetype_id = "xxxxx"].

The following paths are examples of referring to items within a Composition:
/content[openEHR-EHR-SECTION.vital_signs.v1 and name/value='Vital signs']/
    items[openEHR-EHR-OBSERVATION.heart_rate-pulse.v1 and name/value='Pulse']/
    data/events[at0003 and name/value='Any event']/data/items[at1005]

/content[openEHR-EHR-SECTION.vital_signs.v1 and name/value='Vital signs']/
    items[openEHR-EHR-OBSERVATION.blood_pressure.v1 and name/value='Blood pressure']/
    data/events[at0006 and name/value='any event']/data/items[at0004]

/content[openEHR-EHR-SECTION.vital_signs.v1, 'Vital signs']/
    items[openEHR-EHR-OBSERVATION.blood_pressure.v1, 'Blood pressure']/
    data/events[at0006, 'any event']/data/items[at0005]
 
Paths within the other top level types follow the same general approach, i.e. are created by following the required attributes down the hierarchy.
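
Paths of this kind can be assembled mechanically while walking down a structure. The following Python sketch builds a path in the comma-shortcut form from a list of steps; the function name build_path and the (attribute, identifier, name) step layout are hypothetical, and name values containing a single quote are not handled.

```python
def build_path(steps):
    """Build an absolute path from (attribute, identifier, name) steps.

    identifier is an archetype_id or archetype node code, or None for no predicate;
    name is the name value, or None to omit the name part of the predicate.
    """
    parts = []
    for attribute, identifier, name in steps:
        if identifier is None:
            parts.append(attribute)
        elif name is None:
            parts.append(f"{attribute}[{identifier}]")
        else:
            parts.append(f"{attribute}[{identifier}, '{name}']")
    return "/" + "/".join(parts)


path = build_path([
    ("content", "openEHR-EHR-SECTION.vital_signs.v1", "Vital signs"),
    ("items", "openEHR-EHR-OBSERVATION.blood_pressure.v1", "Blood pressure"),
    ("data", None, None),
    ("events", "at0006", "any event"),
    ("data", None, None),
    ("items", "at0005", None),
])
```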

11.2.4 Data Paths and Uniqueness

Archetype paths are not guaranteed to uniquely identify items in data, due to the fact that one archetype node may correspond to multiple instances in the data. However it is often necessary to be able to construct a unique path to an item in real data. This can be done by using attributes other than archetype_node_id in path predicates. Consider as an example the following OBSERVATION archetype:
OBSERVATION[at0000] matches {                               -- blood pressure measurement
    data matches {
        HISTORY matches {
            events {1..*} matches {
                EVENT[at0006] {0..1} matches {              -- any event
                    name matches {DV_TEXT matches {...}}
                    data matches {
                        ITEM_LIST[at0003] matches {         -- systemic arterial BP
                            count matches {2..*}
                            items matches {
                                ELEMENT[at0004] matches {   -- systolic BP
                                    name matches {DV_TEXT matches {...}}
                                    value matches {magnitude matches {...}}
                                }
                                ELEMENT[at0005] matches {   -- diastolic BP
                                    name matches {DV_TEXT matches {...}}
                                    value matches {magnitude matches {...}}
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
 

 
The following path extracted from the archetype refers to the systolic blood pressure magnitude:
/data/events[at0006]/data/items[at0004]/value/magnitude
 
The codes [atnnnn] at each node of the archetype become the archetype_node_ids found in each node in the data.

Now consider an OBSERVATION instance (expressed here in dADL format), in which a history of two blood pressures has been recorded using this archetype:
<   -- OBSERVATION - blood pressure measurement
    archetype_node_id = <[openEHR-EHR-OBSERVATION.blood_pressure.v1]>
    name = <value = <"BP measurement">>
    data = <                                        -- HISTORY
        archetype_node_id = <[at0001]>
        origin = <2005-12-03T09:22:00>
        events = <                                  -- List <EVENT>
            [1] = <                                 -- EVENT
                archetype_node_id = <[at0006]>
                name = <value = <"sitting">>
                time = <2005-12-03T09:22:00>
                data = <                            -- ITEM_LIST
                    archetype_node_id = <[at0003]>
                    items = <                       -- List<ELEMENT>
                        [1] = <
                            name = <value = <"systolic">>
                            archetype_node_id = <[at0004]>
                            value = <magnitude = <120.0> ...>
                        >
                        [2] = <
                            name = <value = <"diastolic">>
                            archetype_node_id = <[at0005]>
                            value = <magnitude = <80.0> ...>
                        >
                    >
                >
            >
            [2] = <                                 -- EVENT
                archetype_node_id = <[at0006]>
                name = <value = <"standing">>
                time = <2005-12-03T09:27:00>
                data = <                            -- ITEM_LIST
                    archetype_node_id = <[at0003]>
                    items = <                       -- List<ELEMENT>
                        [1] = <
                            name = <value = <"systolic">>
                            archetype_node_id = <[at0004]>
                            value = <magnitude = <105.0> ...>
                        >
                        [2] = <
                            name = <value = <"diastolic">>
                            archetype_node_id = <[at0005]>
                            value = <magnitude = <70.0> ...>
                        >
                    >
                >
            >
        >
    >
>
 
[Note: in the above example, name values are shown as if they were all DV_TEXTs, whereas in reality in openEHR they are more likely to be DV_CODED_TEXT instances; either is allowed by the archetype. This has been done to reduce the size of the example, and makes no difference to the paths shown below].

The archetype path mentioned above matches both systolic pressures in the recording. In many querying situations, this may be exactly what is desired. However, to uniquely match each of the systolic pressure nodes, paths need to be created that are based not only on the archetype_node_id but also on another attribute. In the case above, the name attribute provides uniqueness. Guaranteed unique paths to the systolic and diastolic pressures of each event (sitting and standing measurements) are available using the following expressions (identical in Xpath)³:
/data/events[1]/data/items[1]/value/magnitude
/data/events[1]/data/items[2]/value/magnitude
/data/events[2]/data/items[1]/value/magnitude
/data/events[2]/data/items[2]/value/magnitude
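
Positional paths of this kind are straightforward to resolve against in-memory data. The following Python sketch does so for a cut-down copy of the blood pressure example, held as nested dictionaries and lists; the data layout and the function name resolve_positional are assumptions made for illustration. Note that positions in the path are 1-based, as in Xpath, while Python lists are 0-based.

```python
import re

# Cut-down, hypothetical in-memory copy of the example instance (magnitudes only).
observation = {
    "data": {
        "events": [
            {"data": {"items": [{"value": {"magnitude": 120.0}},
                                {"value": {"magnitude": 80.0}}]}},
            {"data": {"items": [{"value": {"magnitude": 105.0}},
                                {"value": {"magnitude": 70.0}}]}},
        ]
    }
}

# attribute name, optionally followed by a numeric position in brackets
_SEGMENT = re.compile(r"^(\w+)(?:\[(\d+)\])?$")


def resolve_positional(root, path):
    """Follow a path whose only predicates are 1-based positions."""
    node = root
    for segment in path.strip("/").split("/"):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment}")
        name, position = match.groups()
        node = node[name]
        if position is not None:
            node = node[int(position) - 1]   # convert 1-based to 0-based
    return node


sitting_systolic = resolve_positional(observation, "/data/events[1]/data/items[1]/value/magnitude")
standing_diastolic = resolve_positional(observation, "/data/events[2]/data/items[2]/value/magnitude")
```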
 
More expressive unique paths based on archetype paths are also possible, as follows:
/data/events[at0006, 'sitting']/data/items[at0004]/value/magnitude
/data/events[at0006, 'sitting']/data/items[at0005]/value/magnitude
/data/events[at0006, 'standing']/data/items[at0004]/value/magnitude
/data/events[at0006, 'standing']/data/items[at0005]/value/magnitude
 
Each of these paths has an Xpath equivalent of the following form:
/data/events[@archetype_node_id='at0006' and name/value='standing']
    /data/items[@archetype_node_id='at0004']
    /value/magnitude
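
The difference between an archetype path and a name-qualified data path can be demonstrated on the example data. In the following Python sketch the instance is modelled as nested dictionaries, with name simplified to a plain string rather than a DV_TEXT; the helper names element, event, select and systolic_magnitudes are all hypothetical.

```python
def element(code, name, magnitude):
    return {"archetype_node_id": code, "name": name, "value": {"magnitude": magnitude}}


def event(name, systolic, diastolic):
    return {
        "archetype_node_id": "at0006",
        "name": name,
        "data": {
            "archetype_node_id": "at0003",
            "items": [element("at0004", "systolic", systolic),
                      element("at0005", "diastolic", diastolic)],
        },
    }


observation = {
    "data": {
        "archetype_node_id": "at0001",
        "events": [event("sitting", 120.0, 80.0), event("standing", 105.0, 70.0)],
    }
}


def select(nodes, attribute, node_id=None, name=None):
    """Follow one attribute from every node, then filter the children found."""
    found = []
    for node in nodes:
        child = node[attribute]
        children = child if isinstance(child, list) else [child]
        for candidate in children:
            if node_id is not None and candidate.get("archetype_node_id") != node_id:
                continue
            if name is not None and candidate.get("name") != name:
                continue
            found.append(candidate)
    return found


def systolic_magnitudes(root, event_name=None):
    """Evaluate /data/events[at0006 (, name)]/data/items[at0004]/value/magnitude."""
    nodes = select([root], "data")
    nodes = select(nodes, "events", "at0006", event_name)
    nodes = select(nodes, "data")
    nodes = select(nodes, "items", "at0004")
    nodes = select(nodes, "value")
    return select(nodes, "magnitude")


all_systolic = systolic_magnitudes(observation)                 # archetype path
standing_systolic = systolic_magnitudes(observation, "standing")  # name-qualified path
```

The archetype path alone yields both systolic values, while adding the name value to the events predicate narrows the result to a single node.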
 
As a general rule, one or more other attribute values in the runtime data will uniquely identify any node in openEHR data. To make construction of unique paths easier, the value of the name attribute (inherited from the LOCATABLE class), is required to be unique with respect to the name values of sibling nodes. This has two consequences as follows:

a guaranteed unique path can always be constructed to any data item in openEHR data using a combination of archetype_node_id and name values (as shown in the example paths above);
the name value may be systematically defined to be a copy of one or more other attribute values. For example, in an EVENT object, name could clearly be a string copy of the time attribute.
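
The sibling uniqueness rule on name values is simple to check in software. The following Python sketch reports name values that occur more than once among a set of sibling nodes; the function name duplicate_sibling_names is hypothetical, and nodes are again modelled as dictionaries with name simplified to a plain string.

```python
from collections import Counter


def duplicate_sibling_names(siblings):
    """Return the name values used by more than one sibling, in sorted order."""
    counts = Counter(node["name"] for node in siblings)
    return sorted(name for name, count in counts.items() if count > 1)


valid = duplicate_sibling_names([{"name": "sitting"}, {"name": "standing"}])
invalid = duplicate_sibling_names([{"name": "sitting"}, {"name": "sitting"}, {"name": "standing"}])
```

An empty result means a unique path can be built for every sibling from its archetype_node_id and name.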

11.3 EHR URIs

There are two broad categories of URIs that can be used with any resource: direct references, and queries. The first kind are usually generated by the system containing the referred-to item, and passed to other systems as definitive references, while the second are queries from the requesting system in the form of a URI.

11.3.1 EHR Reference URIs

To create a reference to a node in an EHR in the form of a URI (uniform resource identifier), three elements are needed: the path within a top-level structure, a reference to a top-level structure within an EHR, and a reference to an EHR. These can be combined to form a URI in an "ehr" scheme-space, obeying the following syntax:
ehr://ehr_locator/top_level_structure_locator/path_inside_top_level_structure
 
In this way, any object in any openEHR EHR is addressable via a URI. Within ehr-space, URL-style references to particular servers, hosts etc. are not used, because they are not reliable in the long term. Instead, logical identifiers for EHRs and/or subjects are used, ensuring that URIs remain correct for the lifetime of the resources to which they refer. The openEHR data type DV_EHR_URI is designed to carry URIs of this form, enabling URIs to be constructed for use within LINKs and elsewhere in the openEHR EHR.
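
The three-part syntax can be illustrated with a small helper that joins the parts. The following Python sketch is not an openEHR API; the function name make_ehr_uri is hypothetical, the identifiers are example values, and no percent-encoding of special characters is attempted.

```python
def make_ehr_uri(ehr_locator, top_level_locator, path=""):
    """Join the three elements of an ehr:// reference URI."""
    uri = f"ehr://{ehr_locator}/{top_level_locator}"
    if path:
        # accept the inner path with or without its leading '/'
        uri += "/" + path.lstrip("/")
    return uri


uri = make_ehr_uri(
    "1234567",
    "87284370-2D4B-4e3d-A3F3-F303D2F4F34B@latest_trunk_version",
    "/content[openEHR-EHR-SECTION.vital_signs.v1]",
)
```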

An ehr:// URI implies the availability of a name resolution mechanism in ehr-space, similar to the DNS, which provides such services for http-, ftp- and other well-known URI schemes. Until such services are established, ad hoc means of dealing with ehr:// URIs are likely to be used, as well as more traditional http:// style references. The subsections below describe how URIs of both kinds can be constructed.

EHR Location

In ehr-space, a direct locator for an EHR is an EHR identifier, as distinct from a subject or patient identifier or query. Normally the copy in the `local system' is the one required, and most of the time it may be the only one in existence. In this case, the required EHR can be identified simply by an unqualified identifier, giving a URI of the form:
ehr://1234567/
 
However, due to copying / synchronising of the EHR for one subject among multiple EHR systems, a given EHR identifier may exist at more than one location. It is not guaranteed that each such EHR is a completely identical copy of the others, since partial copying is allowed. Therefore, in an environment where EHR copies exist, and there is a need to identify exactly which EHR instance is required, a system identifier is also required, giving a URI of the form:
ehr://1234567@rmh.nhs.net/
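
Both forms of EHR locator can be read with plain string handling. The following Python sketch extracts the EHR identifier and the optional system identifier from an ehr:// URI; the function name split_ehr_location is hypothetical, and the sketch assumes the EHR identifier itself contains no `@' or `/'.

```python
def split_ehr_location(uri):
    """Return (ehr_id, system_id) from an ehr:// URI; system_id is None if absent."""
    prefix = "ehr://"
    if not uri.startswith(prefix):
        raise ValueError("not an ehr:// URI")
    # the EHR locator is everything up to the first '/' after the scheme
    locator = uri[len(prefix):].split("/", 1)[0]
    ehr_id, at_sign, system_id = locator.partition("@")
    return ehr_id, (system_id if at_sign else None)


local_copy = split_ehr_location("ehr://1234567/")
specific_copy = split_ehr_location("ehr://1234567@rmh.nhs.net/")
```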
 
Top-level Structure Locator

There are two logical ways to identify a top-level structure in an openEHR EHR. The first is via the combination of the identifier of the required top-level object and the version time (i.e. `system' or `commit' time). The former can be done in a number of ways, including via the use of the uid of the relevant VERSIONED_OBJECT, or via archetype identifiers, or names. This would lead to URIs like the following:
ehr://1234567/87284370-2D4B-4e3d-A3F3-F303D2F4F34B@latest_trunk_version -- a VO Guid
ehr://1234567/87284370-2D4B-4e3d-A3F3-F303D2F4F34B@2005-08-02T04:30:00 -- using time
 
The second way to identify a top-level structure is by using an exact Version identifier, i.e. the standard openEHR Version identifier, which takes the form versioned_object_uid::creating_system_id::version_tree_id. This leads to URIs like the following:
ehr://1234567/87284370-2D4B-4e3d-A3F3-F303D2F4F34B::rmh.nhs.net::2

ehr://1234567/87284370-2D4B-4e3d-A3F3-F303D2F4F34B::F7C5C7B7-75DB-4b39-9A1E-C0BA9BFDBDEC::2
 
The first URI identifies a top-level item whose version identifier is 87284370-2D4B-4e3d-A3F3-F303D2F4F34B::rmh.nhs.net::2, i.e. the second trunk version of the Versioned Object identified by the Guid, created at an EHR system identified by rmh.nhs.net. The second is the same, but another Guid is used to identify the creating system as well. Note that the mention of a system in the version identifier does not imply that the requested EHR is at that system, only that the top-level object being sought was created at that system.

If no Version identifier is mentioned, `latest_trunk_version' is always assumed, as per the following:
ehr://1234567/87284370-2D4B-4e3d-A3F3-F303D2F4F34B
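
The different forms of top-level structure locator can be distinguished by their separators. The following Python sketch classifies a locator and applies the `latest_trunk_version' default; the function name parse_top_level_locator and the dictionary keys are hypothetical, and an exact Version identifier is recognised only by the presence of `::' and is kept whole rather than decomposed.

```python
def parse_top_level_locator(locator):
    """Classify a top-level structure locator taken from an ehr:// URI."""
    if "::" in locator:
        # exact Version identifier: keep as-is
        return {"kind": "version_id", "version_id": locator}
    uid, at_sign, qualifier = locator.partition("@")
    return {
        "kind": "versioned_object",
        "uid": uid,
        # no qualifier given: latest_trunk_version is assumed
        "version": qualifier if at_sign else "latest_trunk_version",
    }


default = parse_top_level_locator("87284370-2D4B-4e3d-A3F3-F303D2F4F34B")
by_time = parse_top_level_locator("87284370-2D4B-4e3d-A3F3-F303D2F4F34B@2005-08-02T04:30:00")
```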
 
Item URIs

With the addition of path expressions as described earlier, URIs can be constructed that refer to the finest grained items in the openEHR EHR, such as the following:
ehr://1234567/87284370-2D4B-4e3d-A3F3-F303D2F4F34B@latest_trunk_version/
    content[openEHR-EHR-SECTION.vital_signs.v1]/
    items[openEHR-EHR-OBSERVATION.heart_rate-pulse.v1]/data/
    events[at0006, 'any event']/data/items[at0004]
 
Relative URIs

URIs can also be constructed relative to the current EHR, in which case they do not mention the EHR id, as in the following example:
ehr:///87284370-2D4B-4e3d-A3F3-F303D2F4F34B@latest_version/
    content[openEHR-EHR-SECTION.vital_signs.v1]/
    items[openEHR-EHR-OBSERVATION.blood_pressure.v1]/
    data/events[at0006, 'any event']/data/items[at0004]
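
A relative URI of this form can be turned into a fully qualified one once the current EHR is known. The following Python sketch shows the idea; the function name resolve_relative is hypothetical, and the sketch assumes a relative URI is recognised solely by the empty EHR locator, i.e. the `ehr:///' prefix.

```python
def resolve_relative(uri, current_ehr_id):
    """Insert the current EHR id into a relative ehr:/// URI; leave other URIs unchanged."""
    prefix = "ehr:///"
    if uri.startswith(prefix):
        return f"ehr://{current_ehr_id}/{uri[len(prefix):]}"
    return uri


absolute = resolve_relative(
    "ehr:///87284370-2D4B-4e3d-A3F3-F303D2F4F34B@latest_version/content",
    "1234567",
)
```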
 
¹See W3C Xpath 1.0 specification, 1999. Available at http://www.w3.org/TR/xpath.

²In all openEHR documentation, the term "attribute" is used in the object-oriented sense of "property of an object", not in the XML sense of named values appearing within a tag. The syntax described here should not be considered to necessarily have a literal mapping to XML instance, but rather to have a logical mapping to object-oriented data structures.

³The notation attr[1] is an Xpath shorthand for attr[position() = 1].

openEHR Foundation
http://www.openEHR.org