Machine-readable model representations of openEHR

Introduction

One of the key needs in any open computing environment is a computable representation of its own models, for a number of purposes, including reasoning about them, performing validation and consistency checking, building software and generating documentation. This is particularly true of openEHR, where a further need is to be able to validate archetypes and templates with respect to the reference model, and also to validate runtime instance data against operational templates and the reference model.

A number of computable representations of the openEHR published models have been available in the past. Unfortunately the standards and tools available for model representation in ICT in general remain problematic. Solutions for machine-readable models in openEHR are therefore still being sought.

Requirements

The key requirements in this area are as follows:

  • a primary computable expression of the reference model as published in the specification documents
    • functionally, the formalism and tooling need to support static class models, including generic classes, multiple inheritance, class invariants, pre-conditions and post-conditions (i.e. 'design by contract' or DbC elements)
    • ideally this expression would be defined either
      • within a UML tool, and exported in a usable form; or
      • within some other editor that makes it easy to maintain.
    • this expression should be the basis for published documentation of the specifications;
    • this expression should also support machine validation / compilation to ensure the model is valid.
  • one or more data serialisation forms, derived from the primary expression
    • at least XML needs to be supported, but other formats such as JSON and the openEHR dADL may also be supported.
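
The idea of deriving several serialisation forms from one primary expression can be sketched as follows. This is a minimal illustration in Python, not an openEHR tool: the `DvQuantity` dataclass stands in for the primary expression, and the JSON keys and XML element names are assumptions made for this example rather than the published openEHR XML schema.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict, fields
from typing import Optional


@dataclass
class DvQuantity:
    """Stand-in for one reference model class (illustrative only)."""
    magnitude: float
    units: str
    precision: Optional[int] = None


def to_json(obj) -> str:
    # Derive the JSON form from the class definition; unset optional fields are omitted.
    data = {k: v for k, v in asdict(obj).items() if v is not None}
    return json.dumps(data, sort_keys=True)


def to_xml(obj) -> str:
    # Derive the XML form from the same definition: one child element per field.
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        value = getattr(obj, f.name)
        if value is not None:
            ET.SubElement(root, f.name).text = str(value)
    return ET.tostring(root, encoding="unicode")


q = DvQuantity(magnitude=120.0, units="mm[Hg]")
print(to_json(q))
print(to_xml(q))
```

Because both functions read the field list from the class itself, a change to the class definition is picked up by every serialisation without further editing, which is the property the requirement asks for.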

Current state of the art - UML, XMI, XML-schema

One would think that the ICT industry would have computable representations of models sorted out by now, but it is not yet the case. UML 2.x and its associated serialisation format XMI 2.x should in theory mean that complete, interoperable machine expressions of object models would be available in all tools. It will probably eventually come to pass, but a number of problems have prevented easy success so far:

  • the UML 2.x specifications are exceedingly complex (see OMG site for specifications; UML 2.1.2 'superstructure' = 738pp; UML 2.1.2 'infrastructure' = 224pp);
  • the XMI 2.x specification is correspondingly complex, which seems to have so far prevented reliable tool interoperability;
  • the Design By Contract (DbC) concept is supported via the use of OCL 2.0 constraints in class models, although there are semantic holes, e.g. the lack of properly defined semantics for constraints under inheritance;
  • for some, XML-schema represents a way of expressing object models, but in fact is not semantically suitable for this purpose. It can be and is used (including in openEHR) as a derivative serialisation representation, but from the point of view of specification, it is deficient in having non-object-oriented inheritance semantics, no generic classes, no representation of non-data members, and only marginal support for design by contract.

Previous experience with various UML tools (up till mid-2007) highlighted the following problems:

  • poor support for OCL and design by contract
  • very poor support for generic (i.e. template) classes, particularly on XSD generation, where the results were wildly wrong in some tools
  • in some tools, it was impossible to define an abstract formal model. The only option was to select a programming language profile such as Java or C# and thus get locked into the limitations of those languages (messy type systems, weak inheritance semantics, language-specific notion of types such as Array<T>, List<T>, etc.).

We have not conducted a more recent review, so we do not have a clear picture of tool support or of reliable interoperability based on UML 2.1.2 as of 2009. It does not appear, however, that XMI is supported or implemented consistently across vendors.

Current Situation in openEHR

As of early 2009, the situation is as follows.

Specifications

The specification documents (PDF, based on manually edited FrameMaker documents) are the authoritative expression, but of course are not computable. This is far from an ideal state of affairs, but given the current stability of the specifications it is in fact sustainable. The reason for this situation is as follows:

  • as of around mid-2007, when the last investigation was made into UML tools, none of them a) properly (or at all) supported DbC or b) could publish textual or graphical renditions of the models at anything like the quality achievable in the hand-built documentation. Since one of our greatest priorities is clarity, so far it has seemed better to remain with the specification documents as the primary authoritative source.

The semantics expressed in the specifications include the following elements for each class in the reference model:

  • name of class, may be generic, e.g. OBSERVATION, DV_INTERVAL<T->DV_ORDERED>
  • inheritance, may be multiple, but only one branch is assumed to correspond to substitutability
  • description
  • use - description of intended uses
  • misuse - possible misuses of the class 
  • list of abstract features, constants, functions and/or attributes; for each:
    • formal, typed signature of feature
    • pre-conditions, where applicable
    • post-conditions, where applicable
    • description
    • existence marker (duplicates invariants)
  • invariant list, including statements expressing
    • existence of attributes
    • cardinality of container attributes
    • other logical constraints
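
The elements listed above map naturally onto a small data structure. The following Python sketch is one possible shape, not a format defined by openEHR: the names `FeatureSpec`, `ClassSpec` and `existence_invariants`, the "1..1" / "0..1" existence notation and the Eiffel-style "/= Void" invariant text are all assumptions made for illustration. It also shows why the existence marker is said to duplicate invariants: the existence invariants can be derived mechanically from the markers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureSpec:
    signature: str                      # formal, typed signature, e.g. "magnitude: Double"
    description: str = ""
    preconditions: List[str] = field(default_factory=list)
    postconditions: List[str] = field(default_factory=list)
    existence: str = "0..1"             # assumed notation; "1..1" means mandatory


@dataclass
class ClassSpec:
    name: str                           # may be generic
    ancestors: List[str] = field(default_factory=list)
    description: str = ""
    use: str = ""
    misuse: str = ""
    features: List[FeatureSpec] = field(default_factory=list)
    invariants: List[str] = field(default_factory=list)


def existence_invariants(spec: ClassSpec) -> List[str]:
    """Derive the invariants that the existence markers duplicate."""
    result = []
    for f in spec.features:
        if f.existence.startswith("1"):
            attr = f.signature.split(":")[0].strip()
            result.append(f"{attr} /= Void")
    return result


spec = ClassSpec(
    name="DV_QUANTITY",
    ancestors=["DV_AMOUNT"],
    features=[
        FeatureSpec("magnitude: Double", existence="1..1"),
        FeatureSpec("units: String", existence="1..1"),
        FeatureSpec("precision: Integer"),
    ],
)
print(existence_invariants(spec))
```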

Because the specifications date from 2001, the only suitable syntax on which to base many of these elements at the time was that of the Eiffel programming language. To the consternation of some, Eiffel has remained the underpinning semantic model. In practice it has proved extremely useful, for several reasons. Eiffel implements everything in UML and has DbC elements fully integrated, and it can be compiled straight into software as a means of validating the models. On the downside, no direct mechanism has been developed for exporting Eiffel classes to, say, XML or another documentary form that could be used in the specifications, nor does the Eiffel tooling offer a UML diagramming mode suitable for publishing. Eiffel has also not turned out to be a mainstream language, and so will be unsuitable as the formal primary expression of the models in the future unless a reliable XMI export can be developed. Lastly, it uses its own first-order predicate logic language for contract statements, which is not the same as the OCL used in UML (it has to be said that OCL is both uglier and less powerful, but there seems little doubt that it will be the standard of the future).

It has also turned out that the use of DbC features in the specifications is of direct benefit to programmers, and is in fact indispensable, because a significant proportion of the model semantics is expressed in these elements, which are used directly in programming.
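
For readers unfamiliar with design by contract, the three kinds of element (pre-conditions, post-conditions and class invariants) can be imitated with plain assertions. The sketch below is in Python rather than Eiffel, and the `Interval` class, its `extend_to` routine and the contracts shown are made up for illustration; they are not copied from the openEHR specifications.

```python
class Interval:
    """Closed interval with its contracts written as plain assertions."""

    def __init__(self, lower: float, upper: float):
        self.lower = lower
        self.upper = upper
        self._check_invariant()

    def _check_invariant(self) -> None:
        # Class invariant: must hold after creation and after every public call.
        assert self.lower <= self.upper, "invariant violated: lower <= upper"

    def extend_to(self, value: float) -> None:
        # Pre-condition: the caller must supply a value.
        assert value is not None, "pre-condition violated: value is not None"
        old_lower, old_upper = self.lower, self.upper
        self.lower = min(self.lower, value)
        self.upper = max(self.upper, value)
        # Post-conditions: the value is now inside, and the interval never shrinks.
        assert self.lower <= value <= self.upper
        assert self.lower <= old_lower and self.upper >= old_upper
        self._check_invariant()


iv = Interval(1, 5)
iv.extend_to(7)
print(iv.lower, iv.upper)
```

In Eiffel these checks are part of the class text and are inherited by descendants; in this Python imitation they are ordinary statements, and they are skipped entirely when the interpreter runs with the `-O` option.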

UML Representation

There is a UML form of the specification online for Release 1.0.1 (viewable form; tool-based machine-readable format), produced with the MagicDraw 9.0 tool. A lot of work on the tool's publisher output was required to generate the form found on the website today. MagicDraw, like many other tools, strangely generates as its textual output a more or less simple recursive visit map of the static models. This is nearly useless to developers, because it contains reverse links and a lot of useless link labels and graphical junk, and does not associate class invariants, pre-conditions or post-conditions properly with classes. MagicDraw 9.x did generate XMI 1.0, which in theory should be the interoperable format, but the output was buggy and has not proved usable. The main problem, however, was that in moving to version 10 of MagicDraw the vendor changed the publication engine, making obsolete the XSLT publishing converter developed within openEHR, while still offering similarly poor publishing output.

Additional note: An experiment using the MagicDraw models to generate code etc using Eclipse-based tools is described at the page "Experimental generation of code and documentation from UML" //Erik Sundvall

Reference Model Checking for Archetypes

Some implementation work has been done on an 'RM checker' for archetypes. In the reference parser (used in the Ocean Archetype Editor and the openEHR ADL Workbench), a custom class model (Eiffel classes) and associated dADL expression of the openEHR RM are initially being used. This approach was taken in the interests of expediency, due to both the non-availability and complexity of any XMI 2.x expression of the reference model. The 'basic meta-model' used here correctly expresses the semantics needed to perform archetype validation, and is currently in use in the latest specialisation build of the ADL Workbench. It has the advantage of being easily readable and modifiable in a normal text editor; here is an example of the text:

	--
	--------------------- rm.data_types.quantity ------------------
	--

	["DV_INTERVAL"] = <
		name = <"DV_INTERVAL">
		ancestors = </primitive_types["Interval"], /class_definitions["DATA_VALUE"]>
		is_generic = <True>
		generic_parameters = <
			["T"] = <
				name = <"T">
				conforms_to_type = </class_definitions["DV_ORDERED"]>
			>
		>
		properties = <
			["lower"] = (BMM_SINGLE_PROPERTY_OPEN) <
				name = <"lower">
				type = </class_definitions["DV_INTERVAL"]/generic_parameters["T"]>
			>
			["upper"] = (BMM_SINGLE_PROPERTY_OPEN) <
				name = <"upper">
				type = </class_definitions["DV_INTERVAL"]/generic_parameters["T"]>
			>
		>
	>

	["REFERENCE_RANGE"] = <...>

	["DV_ORDERED"] = <
		name = <"DV_ORDERED">
		is_abstract = <True>
		ancestors = </primitive_types["Ordered"], /class_definitions["DATA_VALUE"]>
		properties = <
			["normal_status"] = (BMM_SINGLE_PROPERTY) <
				name = <"normal_status">
				type = </class_definitions["CODE_PHRASE"]>
			>
			["normal_range"] = (BMM_GENERIC_PROPERTY) <
				name = <"normal_range">
				type = <
					root_type = </class_definitions["DV_INTERVAL"]>
					generic_parameters = <
						["T"] = </class_definitions["DV_INTERVAL"]/generic_parameters["T"]>
					>
				>
			>
			["other_reference_ranges"] = (BMM_GENERIC_PROPERTY) <
				name = <"other_reference_ranges">
				type = <
					root_type = </class_definitions["REFERENCE_RANGE"]>
					generic_parameters = <
						["T"] = </class_definitions["REFERENCE_RANGE"]/generic_parameters["T"]>
					>
				>
			>
		>
	>

	["DV_QUANTIFIED"] = <...>

	["DV_ORDINAL"] = <...>

	["DV_AMOUNT"] = <...>

	["DV_ABSOLUTE_QUANTITY"] = <
		name = <"DV_ABSOLUTE_QUANTITY">
		is_abstract = <True>
		ancestors = </class_definitions["DV_QUANTIFIED"], ...>
		properties = <
			["accuracy"] = (BMM_SINGLE_PROPERTY) <
				name = <"accuracy">
				type = </class_definitions["DV_AMOUNT"]>
			>
		>
	>

	["DV_QUANTITY"] = <
		name = <"DV_QUANTITY">
		ancestors = </class_definitions["DV_AMOUNT"], ...>
		properties = <
			["magnitude"] = (BMM_SINGLE_PROPERTY) <
				name = <"magnitude">
				type = </primitive_types["Double"]>
				is_mandatory = <True>
			>
			["units"] = (BMM_SINGLE_PROPERTY) <
				name = <"units">
				type = </primitive_types["String"]>
				is_mandatory = <True>
			>
			["precision"] = (BMM_SINGLE_PROPERTY) <
				name = <"precision">
				type = </primitive_types["Integer"]>
			>
		>
	>

This approach is of course not intended as a normative or final one in openEHR, and does not meet the ultimate goal of having a single-source machine-readable reference model. It does, however, provide an interim solution that is maintainable (given that the openEHR RM is now very stable), and that provides RM checking and visualisation for archetypes and templates (e.g. differential and flat views of archetypes with RM visualisation in the ADL Workbench).
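
To give a feel for what RM checking involves, here is a minimal Python sketch that holds part of the excerpt above in memory and answers two questions an archetype validator needs to ask: does one class conform to another, and do the attributes an archetype constrains exist on the RM class? It is not the ADL Workbench implementation. The dictionary layout and the function names are assumptions, a property without `is_mandatory` is assumed to be optional, generic parameters are ignored, and classes or ancestors elided ("...") in the excerpt are stubbed or left out rather than filled in.

```python
from typing import Dict, List, Set

# Reduced in-memory model: class name -> ancestors and properties.
RM: Dict[str, dict] = {
    "DV_ORDERED": {
        "ancestors": ["Ordered", "DATA_VALUE"],
        "properties": {
            "normal_status": {"type": "CODE_PHRASE", "mandatory": False},
            "normal_range": {"type": "DV_INTERVAL", "mandatory": False},
            "other_reference_ranges": {"type": "REFERENCE_RANGE", "mandatory": False},
        },
    },
    "DV_AMOUNT": {"ancestors": [], "properties": {}},  # stub: elided in the excerpt
    "DV_QUANTITY": {
        "ancestors": ["DV_AMOUNT"],  # further ancestors elided in the excerpt
        "properties": {
            "magnitude": {"type": "Double", "mandatory": True},
            "units": {"type": "String", "mandatory": True},
            "precision": {"type": "Integer", "mandatory": False},
        },
    },
}


def all_ancestors(cls: str) -> Set[str]:
    """Transitive closure of the ancestors relation; unknown classes have none."""
    found: Set[str] = set()
    stack = list(RM.get(cls, {}).get("ancestors", []))
    while stack:
        name = stack.pop()
        if name not in found:
            found.add(name)
            stack.extend(RM.get(name, {}).get("ancestors", []))
    return found


def conforms_to(cls: str, target: str) -> bool:
    return cls == target or target in all_ancestors(cls)


def all_properties(cls: str) -> Dict[str, dict]:
    """Inherited properties first, then the class's own definitions."""
    props: Dict[str, dict] = {}
    for name in all_ancestors(cls):
        props.update(RM.get(name, {}).get("properties", {}))
    props.update(RM.get(cls, {}).get("properties", {}))
    return props


def check_attributes(cls: str, attr_names: List[str]) -> List[str]:
    """Report attributes constrained by an archetype that the RM class lacks."""
    if cls not in RM:
        return [f"unknown RM class '{cls}'"]
    known = all_properties(cls)
    return [f"{cls} has no attribute '{a}'" for a in attr_names if a not in known]


print(check_attributes("DV_QUANTITY", ["magnitude", "unit"]))
print(conforms_to("DV_ORDERED", "DATA_VALUE"))
```

Because the excerpt elides several ancestors, properties inherited through them are not visible in this reduced model; a complete model would simply have more entries in the same structure.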

Both the basic meta-model class model and the dADL textual expression of the openEHR RM can easily be modified to be compatible with a longer-term solution. One approach may be to keep such a simplified, readable expression available, but either to validate it against, or to generate it from, an XMI file, e.g. using an XSLT script. The problem with relying totally on an XMI file representation is that a) XMI files are complex to the point of impenetrability and b) they are editable only by UML tools. It is not currently clear whether all UML tools would generate the same XMI, or even what the transform between an on-screen UML diagram (with numerous hidden elements) and the XMI is for any particular tool. In other words, the quality of an actual XMI file generated by a tool is difficult to ascertain.
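
The "validate" option could amount to a consistency check between the hand-maintained expression and whatever is derived from the UML tool. The Python sketch below assumes that both have already been loaded into the same simple shape (class name to property name to type name); parsing dADL or XMI is deliberately left out, and the `Model` alias, the `model_differences` function and the sample data are hypothetical.

```python
from typing import Dict, List

# Each model maps class name -> {property name -> type name}.
Model = Dict[str, Dict[str, str]]


def model_differences(readable: Model, generated: Model) -> List[str]:
    """List every class or property on which the two expressions disagree."""
    diffs: List[str] = []
    for cls in sorted(set(readable) | set(generated)):
        if cls not in readable:
            diffs.append(f"{cls}: only in generated model")
        elif cls not in generated:
            diffs.append(f"{cls}: only in readable model")
        else:
            a, b = readable[cls], generated[cls]
            for prop in sorted(set(a) | set(b)):
                if a.get(prop) != b.get(prop):
                    diffs.append(f"{cls}.{prop}: {a.get(prop)} vs {b.get(prop)}")
    return diffs


readable = {"DV_QUANTITY": {"magnitude": "Double", "units": "String", "precision": "Integer"}}
generated = {"DV_QUANTITY": {"magnitude": "Double", "units": "String"}}
for line in model_differences(readable, generated):
    print(line)
```

An empty result would indicate that the two expressions agree at this level of detail; it says nothing about invariants or other semantics that this simple shape does not capture.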

The Future for openEHR