openEHR Terminology

Introduction

The openEHR Terminology is a simple terminology that includes all the terms defined in the openEHR terminology specification. The specification distinguishes two kinds of vocabulary: code sets, in which the codes stand for themselves (e.g. ISO 3166 and ISO 639 codes, IANA MIME types etc.), and term sets, in which each term has a numeric code and a description that can be translated into multiple languages.
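The difference between the two kinds of vocabulary can be sketched as two small data structures. This is a minimal Python sketch under our own assumptions, not the structure used by the Java project or the Archetype Editor. The class names `CodeSet` and `TermSet`, their fields, and the term set name and code 101 in the example are all hypothetical. A code set only needs membership tests, whereas a term set keeps one description per code per language.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodeSet:
    """A code set: each code stands for itself (e.g. ISO 3166 'AU')."""
    name: str
    codes: frozenset[str]

    def has_code(self, code: str) -> bool:
        return code in self.codes


@dataclass
class TermSet:
    """A term set: numeric codes whose descriptions are kept per language."""
    name: str
    # descriptions[language][code] -> description text in that language
    descriptions: dict[str, dict[int, str]] = field(default_factory=dict)

    def add_term(self, language: str, code: int, description: str) -> None:
        self.descriptions.setdefault(language, {})[code] = description

    def description(self, language: str, code: int) -> str | None:
        # None signals that the term has not been translated into `language`
        return self.descriptions.get(language, {}).get(code)


# Illustrative data only: the term set name and code 101 are hypothetical.
countries = CodeSet("countries", frozenset({"AU", "NZ"}))
terms = TermSet("example_term_set")
terms.add_term("en", 101, "unknown")
terms.add_term("de", 101, "unbekannt")
print(countries.has_code("AU"), terms.description("de", 101), terms.description("fr", 101))
# prints: True unbekannt None
```

Keeping all languages of a term set in one object mirrors the specification's view of a term set; how the languages are split across files is a separate storage decision.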

Current Status

The openEHR terminology is used by the Java project, by the Archetype Editor and by various other tools. However, different tools use different source files, a situation we are trying to rectify. The two representations are as follows:

  • The 'java terminology' - the Java project tools now use the terminology files in the knowledge2 SVN repository. These files are designed as follows:
    • structured as two files, one for code sets and one for the openEHR term sets
    • each file pair covers one language only, so translating the terminology means copying and renaming the files and translating their contents.
  • The 'AE terminology' - the Archetype Editor uses a differently structured file, which is based on this XML schema. The characteristics of this file are:
    • one file for all translations
    • includes significant amounts of UI 'terms' specific to the Archetype Editor UI, i.e. the file contents are not limited to the openEHR terminology.

Issues

The currently known content issues with the above situation are:

  • The AE terminology file contains numerous Archetype Editor GUI elements which do not belong in the openEHR Terminology. In its current state, it is not an appropriate file to publish as the openEHR terminology.
  • Managing translations in the AE terminology is clumsy, because a) every time a new language is added, the original file has to be modified, b) the file keeps growing in size and c) it is not easy to see whether a given translation is complete, because the file has to be cut up to compare the language sections with each other.
  • The codes in the openEHR 'media types' code set are not literally used as codes: they are just placeholders for IANA types such as 'text/plain', whose real definitions are at http://www.iana.org/assignments/media-types/index.html. However, no standard computable file of these types appears to be available.
  • Computable files for ISO 639 (language names) should similarly not be provided by openEHR, although it is still not clear where they should come from. The official Library of Congress ISO 639 page is here; however, it does not provide a computable file.
  • Computable files for ISO 3166 (country codes) are actually available here, in TXT and XML format.
  • Unit types: TBD
  • Currently there is no defined process for obtaining a new code.
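When each language lives in its own file, as in the Java terminology, checking whether a translation is complete reduces to comparing two sets of codes rather than cutting up one large file. The sketch below assumes each language's term set has already been loaded into a Python dict mapping numeric code to description (parsing the files is not shown). The function name `translation_gaps` and the sample codes and descriptions are hypothetical.

```python
from __future__ import annotations


def translation_gaps(
    reference: dict[int, str], translation: dict[int, str]
) -> tuple[list[int], list[int]]:
    """Compare one translated term set against the reference language.

    Returns (missing, extra):
      missing: codes in the reference that are absent or blank in the translation
      extra:   codes in the translation that the reference does not define
    """
    missing = sorted(
        code for code in reference if not translation.get(code, "").strip()
    )
    extra = sorted(set(translation) - set(reference))
    return missing, extra


# Hypothetical codes and descriptions, for illustration only.
english = {101: "first term", 102: "second term", 103: "third term"}
german = {101: "erster Begriff", 103: " ", 104: "vierter Begriff"}
print(translation_gaps(english, german))
# prints: ([102, 103], [104])
```

Treating a blank description as missing is a design choice; a stricter check might also flag descriptions that are identical to the reference language.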

Recommendations

Short Term

Based on the problems above, the following recommendations have been made by an initial analysis group (Rong Chen, Heath Frankel, Sebastian Garde, Thomas Beale):

  • Adopt the Java files as the basis for going forward.
  • Adopt the design approach that openEHR will need to create its own 'code set' files for external code sets, in a standard format, based on lists and pages available online from IANA, ISO etc.
    • Define an XML 'adjunct format' for this type of file
    • Remove the IANA codes from the existing 'external_terminologies' file, and create a separate file which is derived from the iana.org link above.
    • Remove the ISO 639 codes from the existing 'external_terminologies' file and create a new adjunct format file derived from the above Library of Congress ISO 639-1 page
    • Remove the ISO 3166 codes from the existing 'external_terminologies' file, and obtain and convert the ISO 3166 file to a separate file in the adjunct format
  • Units: TBD
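The adjunct format has not been defined yet, so the following is only a guess at what a minimal one could look like: a root `codeset` element carrying the openEHR identifier, the issuing body and the external identifier, with one `code` element per code. The element and attribute names, the function names, and the identifiers `countries` and `ISO_3166-1` used in the example are assumptions for illustration. Converting the IANA or ISO source lists into a plain list of codes is not shown.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_code_set_xml(
    openehr_id: str, issuer: str, external_id: str, codes: list[str]
) -> str:
    """Serialise an external code set into a hypothetical adjunct XML format."""
    root = ET.Element(
        "codeset",
        {"openehr_id": openehr_id, "issuer": issuer, "external_id": external_id},
    )
    # Sorted, de-duplicated codes keep the generated file stable between runs
    for code in sorted(set(codes)):
        ET.SubElement(root, "code", {"value": code})
    return ET.tostring(root, encoding="unicode")


def parse_code_set_xml(text: str) -> tuple[str | None, list[str | None]]:
    """Read back the openEHR identifier and the code values from such a file."""
    root = ET.fromstring(text)
    return root.get("openehr_id"), [c.get("value") for c in root.findall("code")]


# Hypothetical identifiers; "AU" and "NZ" are ISO 3166-1 alpha-2 codes.
xml_text = build_code_set_xml("countries", "ISO", "ISO_3166-1", ["NZ", "AU", "AU"])
print(xml_text)
print(parse_code_set_xml(xml_text))
```

Because the codes stand for themselves, the file needs no translations; the same structure would serve the IANA media types and ISO 639 code sets.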

Obtaining a New Code

TBD

Future

A long-term solution will most likely involve discussions with IHTSDO to determine how much of these code sets they already cover.

Secondly, we should consider a SNOMED CT-style representation, that is, three separate tables (concepts, descriptions and relationships) as described in the IHTSDO TIG section on RF2. This approach would enable each separate vocabulary in openEHR to be managed as a small hierarchy of its own. To use actual SNOMED CT codes, we would need to obtain an openEHR SNOMED CT extension.
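As a rough illustration of the RF2-style approach, the sketch below models the three tables as Python records and shows how one openEHR vocabulary could be walked as a small IS-A hierarchy, with translations held as additional descriptions. It is a simplification under our own assumptions: real RF2 files carry further columns (such as effective time and module) and identify the IS-A relationship type by a concept id, whereas here `IS_A` is a plain hypothetical label and all identifiers are made up.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical relationship type label; in RF2 the type is itself a concept id.
IS_A = "is_a"


@dataclass(frozen=True)
class Concept:
    id: str
    active: bool = True


@dataclass(frozen=True)
class Description:
    id: str
    concept_id: str
    language: str
    term: str


@dataclass(frozen=True)
class Relationship:
    source_id: str       # the more specific concept
    destination_id: str  # the more general concept
    type_id: str


def children(
    parent_id: str, concepts: list[Concept], relationships: list[Relationship]
) -> list[str]:
    """Active concepts directly below parent_id in the IS-A hierarchy."""
    active = {c.id for c in concepts if c.active}
    return sorted(
        r.source_id
        for r in relationships
        if r.type_id == IS_A
        and r.destination_id == parent_id
        and r.source_id in active
    )


def term_for(
    concept_id: str, language: str, descriptions: list[Description]
) -> str | None:
    """First description of a concept in the given language, if any."""
    for d in descriptions:
        if d.concept_id == concept_id and d.language == language:
            return d.term
    return None


# Hypothetical identifiers: one small openEHR vocabulary as its own hierarchy.
concepts = [Concept("V1"), Concept("T1"), Concept("T2"), Concept("T3", active=False)]
descriptions = [
    Description("D1", "V1", "en", "example vocabulary"),
    Description("D2", "T1", "en", "first term"),
    Description("D3", "T1", "de", "erster Begriff"),
    Description("D4", "T2", "en", "second term"),
]
relationships = [
    Relationship("T1", "V1", IS_A),
    Relationship("T2", "V1", IS_A),
    Relationship("T3", "V1", IS_A),  # inactive, so excluded from children()
]
print(children("V1", concepts, relationships), term_for("T1", "de", descriptions))
# prints: ['T1', 'T2'] erster Begriff
```

Adding a language in this model means adding description rows rather than editing or copying existing files, which addresses the translation problems listed under Issues.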