Tuple Proposal

This page makes two proposals for improving ADL in a generic way. Together they would remove the 'custom syntax' currently used to efficiently represent constraints on Coded text, Quantity and Ordinals.

Coded text constraints - 'code' as a primitive type

In openEHR and CIMI we have debated the question of the so-called 'special' types in ADL. I think that the issues are not the same for all of them - it just depends on how we see them. I'll just talk about one here - the coded text one.

In 'standard' ADL you can write:

        CODED_TEXT matches {
            terminology_id matches {
                TERMINOLOGY_ID matches {
                    value matches {"local"}
                }
            }
            code matches {"at0111", "at0012", "at0013", "at0014", "at0015"}
        }

In the current custom syntax, this becomes:

        CODED_TEXT matches {[local::at0111, at0012, at0013, at0014, at0015]}

Reducing RM dependence, increasing archetype re-usability

This is not just a question of more or less efficient syntax. In the first example, you are constraining an explicit RM object of type CODED_TEXT, with some specific internal structure coming from its RM. If I want to do the same for openEHR, the ADL code is different, even though the class definitions are very similar.

In the second example, the RM type CODED_TEXT is assumed, but no explicit RM definition of internal structure is assumed; instead, AOM has an internal model of that which has a 'terminology id' concept, and a 'code' concept. Clearly this can be mapped to the relevant bits of CODED_TEXT, DV_CODED_TEXT, CS, etc.

So the point here is that an archetype fragment containing a coded term constraint (probably the single most common constraint there is) can be represented in a semi-RM-independent way, increasing the direct re-usability (= minimising conversion effort) of the archetype text in e.g. 13606, HL7, other situations.

Put another way, if we build a minimal 'coded term' concept into the AOM, as we do today in openEHR, we are helping to reduce the Data type mismatch problem in e-health (a bit).

Enabling tools to know about coded elements

In the first version above, the class CODED_TEXT is like any other non-terminal type, and an archetype tool that doesn't have special code in it won't know it is really what we health informatics people know as a 'coded term', one of our most basic data types. I think this is important because of the ubiquity of coded terms in data and therefore archetypes - tools that can't recognise them obfuscate what should be clear.

Conclusions

One conclusion that could be drawn from the above is that we should simply consider a 'coded term' as a kind of leaf data type in ADL and AOM. We already have multi-part dates, times and URI as leaf types, which provide the same two advantages for those types: why not for terms? If we did accept that, I would only be proposing a minimal model, e.g. in ADL a coded term would be a lexical element of the form

[terminology_id::code]

or

[terminology_id(version)::code]

or

[terminology_id::code_phrase]

Some more sophisticated variant could be devised to allow constraining of the terminology id, if that were needed.

In the above, code could always be

code|text text text|

If we included that in the leaf types, let's call it TERMINOLOGY_CODE, then it becomes trivial to create an AOM leaf constrainer type C_TERMINOLOGY_CODE, just like C_DATE, C_DURATION etc.

We are in the semantically enabled age; health may need to be the domain that makes terminology codes as normal as Integers or Booleans.

A structural model of this would be something like:

class TERMINOLOGY_CODE
    value: String {1}

    terminology_id: String {1}          -- extract terminology id
    terminology_release: String {0..1}  -- extract terminology release, if available in 'value'
    code_string: String {1}             -- extract code / code string
    text: String {0..1}                 -- extract text of code, if available in 'value'
end

This is essentially the same approach we use for representing the ISO8601-based date/time types in openEHR - a String 'value' and a collection of extractor routines.
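
As a rough illustration of the 'String value plus extractor routines' idea, here is a minimal Python sketch. The regular expression, the class name TerminologyCode and the sample release string are assumptions made for this example only; the lexical forms it accepts are the ones listed above, with the optional |text| suffix.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed lexical form: [terminology_id(release)::code_string|text|]
# where "(release)" and "|text|" are both optional.
_CODE_PATTERN = re.compile(
    r"^\[(?P<tid>[^\(\):\[\]|]+)"    # terminology id
    r"(?:\((?P<rel>[^\)]+)\))?"      # optional (release)
    r"::(?P<code>[^|\]]+)"           # code / code string
    r"(?:\|(?P<text>[^|]*)\|)?\]$"   # optional |text|
)


@dataclass(frozen=True)
class TerminologyCode:
    """Holds only the lexical 'value'; every other property is extracted from it."""
    value: str

    def __post_init__(self) -> None:
        if _CODE_PATTERN.match(self.value) is None:
            raise ValueError(f"not a valid terminology code: {self.value!r}")

    def _part(self, name: str) -> Optional[str]:
        return _CODE_PATTERN.match(self.value).group(name)

    @property
    def terminology_id(self) -> str:
        return self._part("tid")

    @property
    def terminology_release(self) -> Optional[str]:
        return self._part("rel")

    @property
    def code_string(self) -> str:
        return self._part("code")

    @property
    def text(self) -> Optional[str]:
        return self._part("text")


simple = TerminologyCode("[local::at0111]")
full = TerminologyCode("[snomed_ct(2013_01)::386725007|Body temperature|]")
```

A constrainer type such as C_TERMINOLOGY_CODE could then be built on this in the same way as the date/time constrainers, by checking the extracted parts against the allowed terminology id and code list.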

A new ADL Syntax proposal: Tuple constraints

The second kind of 'special' type in ADL we use in openEHR is the C_DV_QUANTITY constraint. To revisit the underlying problem:

In realistic data, it is not uncommon to need to constrain object properties in a covarying way. A simple example is the need to state range constraints on a temperature, represented as a Quantity type, for both Centigrade and Fahrenheit scales. The default way to do this in ADL is:

value matches {
    QUANTITY [at0014] matches {
        property matches {[at4]} -- bind to SCT temperature
        units matches {[at5]} -- bind to SCT/UCUM 'deg F'
        magnitude matches {|32.0..212.0|}
    }
    QUANTITY [at0015] matches {
        property matches {[at4]} -- bind to SCT temperature
        units matches {[at6]} -- bind to SCT/UCUM 'deg C'
        magnitude matches {|0.0..100.0|}
    }
}

What we logically want to do is to state a single constraint on a QUANTITY that sets the magnitude range constraint dependent on the units constraint. Note that we are forced to include at-codes for the two Quantity nodes, to satisfy the path uniqueness rule.

This could be done using rules of the form:

.../value/units = "deg F" implies magnitude matches {|32.0..212.0|}
.../value/units = "deg C" implies magnitude matches {|0.0..100.0|}

However, this seems obscure for what is logically a very simple kind of constraint.

In openEHR we use the C_DV_QUANTITY type to provide a better structured constraint for this. But let's ignore that for now.

A solution I thought of and dismissed some years ago is as follows:

An extension to the ADL syntax would allow tuple-based constraints of the following form:

value matches {
    QUANTITY matches {
        property matches {[at4]} -- at4 bind-> SCT 'temperature'
        [units, magnitude] matches {
            [{[at5]}, {|32.0..212.0|}], -- at5 bind-> UCUM or SCT °F
            [{[at6]}, {|0.0..100.0|}]   -- at6 bind-> UCUM or SCT °C
        }
    }
}

In the above, the {} surrounding each leaf level constraint are needed because although such constraints are typically atomic, as above, they may also take other standard ADL forms such as a list of strings, list of integers etc. In the latter case, the ',' characters from such lists would otherwise be conflated with the ',' separator of the distinct constraints in the tuple. Use of {} is also logically justified: each such entity is indeed a 'constraint' in the ADL sense, and all constraints are delimited by {}.

The above defines constraints on units and magnitude together, as tuples like [{"deg F"}, {|32.0..212.0|}]. It also removes the need for at-codes on the QUANTITY nodes, since now we just have one object. This would work for tuples of 3 or more co-varying properties, and is mathematically clean. It would be some work for compilers to digest it, but that's unavoidable.
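
To make the intended semantics concrete, here is a minimal Python sketch of how a tool might evaluate such a tuple constraint: a data object conforms if at least one tuple has every member constraint satisfied by the corresponding attribute value. The names TupleConstraint, one_of and interval are hypothetical, and plain unit strings are used in place of the bound at-codes to keep the example short.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Sequence

# A leaf constraint is modelled here as a predicate on a single value.
Constraint = Callable[[Any], bool]


def one_of(*allowed: Any) -> Constraint:
    """Value-set constraint, e.g. {"deg F"}."""
    return lambda v: v in allowed


def interval(lower: float, upper: float) -> Constraint:
    """Closed interval constraint, e.g. {|32.0..212.0|}."""
    return lambda v: lower <= v <= upper


@dataclass
class TupleConstraint:
    attributes: Sequence[str]                    # e.g. ("units", "magnitude")
    alternatives: Sequence[Sequence[Constraint]]  # one row per allowed tuple

    def __post_init__(self) -> None:
        for alt in self.alternatives:
            if len(alt) != len(self.attributes):
                raise ValueError("each tuple needs one constraint per attribute")

    def matches(self, obj: Mapping[str, Any]) -> bool:
        values = [obj[name] for name in self.attributes]
        # Conforms if ANY tuple has ALL of its member constraints satisfied.
        return any(
            all(check(v) for check, v in zip(alt, values))
            for alt in self.alternatives
        )


temperature = TupleConstraint(
    attributes=("units", "magnitude"),
    alternatives=[
        (one_of("deg F"), interval(32.0, 212.0)),
        (one_of("deg C"), interval(0.0, 100.0)),
    ],
)

ok = temperature.matches({"units": "deg F", "magnitude": 98.6})
too_hot = temperature.matches({"units": "deg C", "magnitude": 150.0})
```

Because the members are just a list, the same check works unchanged for tuples of three or more co-varying properties.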

We can look at the ORDINAL data type constraint in the same light. First, here is a typical ordinal constraint (a scale of +, ++, +++) written with just standard ADL:

        ordinal_attr matches {
            ORDINAL [at0004] matches {
                value matches {|0|}
                symbol matches {
                    CODED_TEXT matches {
                        terminology_id matches {"local"}
                        code matches {"at0001"}        -- +
                    }
                }
            }
            ORDINAL [at0005] matches {
                value matches {|1|}
                symbol matches {
                    CODED_TEXT matches {
                        terminology_id matches {"local"}
                        code matches {"at0002"}        -- ++
                    }
                }
            }
            ORDINAL [at0006] matches {
                value matches {|2|}
                symbol matches {
                    CODED_TEXT matches {
                        terminology_id matches {"local"}
                        code matches {"at0003"}        -- +++
                    }
                }
            }
        }

Today, in ADL we use the more efficient syntax, which is not only smaller, but removes the need for the three at-codes above:

        ordinal_attr matches {
            0|[local::at0001],     -- +
            1|[local::at0002],     -- ++
            2|[local::at0003];     -- +++
            0     -- assumed value
        }

This hides the ORDINAL type altogether, and has the same advantages as the CODED_TEXT argument I described above. On the other hand, the above syntax is for Ordinals only, and requires something special in the AOM to support it. But we could also solve the problem using the tuple approach above:

        ordinal_attr matches {
            ORDINAL matches {
                [symbol, value] matches {
                    [{CODED_TEXT matches {code matches {"at0001"} terminology_id matches {"local"}}}, {0}], -- +
                    [{CODED_TEXT matches {code matches {"at0002"} terminology_id matches {"local"}}}, {1}], -- ++
                    [{CODED_TEXT matches {code matches {"at0003"} terminology_id matches {"local"}}}, {2}]  -- +++
                }
            }
        }

Now, this is still ugly, but if we adopted the idea of the coded term type as a primitive (like Date, etc), then we could do this:

        ordinal_attr matches {
            ORDINAL matches {
                [symbol, value] matches {
                    [{[at1]}, {0}], -- +
                    [{[at2]}, {1}], -- ++
                    [{[at3]}, {2}]  -- +++
                }
            }
        }

The above seems very clear. It relies only on the AOM supporting an 'inbuilt terminology code' data type and tuple constraints, so any special RM type support can now be removed. This Tuple-based solution is more generic and would appear to solve more problems at once than the current custom syntax, which is dependent upon RM types. Coupled with an inbuilt primitive code type, it makes for very concise ADL structures.

Now, one question in all this is: who cares about ADL, when all we really care about is the AOM? Notice that the Tuple-structured ADL clearly does imply a different kind of AOM structure, and that structure will then appear in any other serialisation you may want to use, including XML, JSON etc. So here I am really just using ADL for clarity of explanation, not definitionally (which is in fact one of the main reasons we need abstract syntaxes like ADL).
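
As a purely illustrative sketch of that point, the temperature tuple constraint might surface in a JSON serialisation roughly as below. Every field name here ("attribute_tuples", "members", "tuples", "codes", "interval") is invented for this example and is not taken from any AOM specification; the only claim is that the tuple structure itself has to be carried through.

```python
import json

# Hypothetical in-memory form of the QUANTITY tuple constraint.
# All keys are illustrative assumptions, not AOM attribute names.
quantity_constraint = {
    "rm_type_name": "QUANTITY",
    "attribute_tuples": [
        {
            "members": ["units", "magnitude"],
            "tuples": [
                [{"codes": ["at5"]}, {"interval": {"lower": 32.0, "upper": 212.0}}],
                [{"codes": ["at6"]}, {"interval": {"lower": 0.0, "upper": 100.0}}],
            ],
        }
    ],
}

# Serialise and read back; the tuple structure survives the round trip.
serialised = json.dumps(quantity_constraint, indent=2)
restored = json.loads(serialised)
```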