Null Flavours and Boolean data in openEHR

The Use of Null Flavor in openEHR

In the openEHR Reference Model, the low-level class ELEMENT has attributes 'value' and 'null_flavour'. The latter attribute is taken from HL7 (although used in a different way) and is used to mark a 'lack of data'. Its use in openEHR was inspired by a) the need to mark missing data in health information and b) the use of 'data quality markers' in SCADA control systems, which show on the screen when a measured value from the field is out of date or wrong due to a technical failure to obtain the current value. In the development of openEHR it was thought that some kind of data quality marker should be available for a similar reason: to indicate technical incapacity to obtain data.

The Problem of 'Boolean' or two-valued data

A general design problem in health information is when to use Boolean data values. There are many situations where a naive analysis might suggest using a Boolean, i.e. a DV_BOOLEAN in openEHR-speak, for a data field such as gender, or for the response to a question like 'have you eaten in the last 12 hours?'. In both cases, the set of possible values is larger than simply yes/no or some equivalent. Gender could typically have values from a small set of codes like:

male
female
unable to determine
intersex
...

Similarly, yes/no questions in A&E might not be answered due to the patient being unconscious - which is a 'normal' happening in A&E. A 'don't know' answer might be perfectly sensible for many questions asked of patients.

In the HL7v3 modelling approach, Null Flavour is used to indicate 'missing' data, such as to represent situations like 'asked but not answered'.

The openEHR Approach

In openEHR, we see the case where a physician asks a question and it is not answered - e.g. the patient is dazed or becomes unconscious - as a normal medical situation. There is no technical incapacity of the physician to obtain information - he or she is in effect making a normal observation. So in modelling the information obtained in such situations, we should ensure that the value set for questions like 'have you eaten in the last 12 hours?' includes yes/no/don't know, and possibly also things like maybe/most likely/unlikely etc. In an A&E (ED) situation, the possible responses to almost any question are likely to include 'no answer', due to unconsciousness.

In general, the value set should include values for any possible patient response - the data then correctly show that the patient was asked, but responded with some kind of 'don't know' or did not respond at all. Situations where the information could technically not be obtained, e.g. the physician was talking to the patient using an internet chat tool and the communication dropped out, or a response was technically impossible for some other reason, e.g. faulty equipment, should be marked with a null flavour. In general, null flavour is used sparingly in openEHR, and is not used for representing typical (if not necessarily common) clinical events that can be observed perfectly well by the clinician.
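The distinction can be sketched in a few lines of Python. This is only an illustration, not the Reference Model itself: the `Element` class is a simplified stand-in for ELEMENT, the rule that exactly one of value and null flavour is set is an assumption of the sketch, and the choice of code 253 for the dropped connection is illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Null flavour codes taken from the openEHR table on this page.
NULL_FLAVOUR_CODES = {271, 253, 272, 273}


@dataclass(frozen=True)
class Element:
    """Simplified, hypothetical stand-in for the RM class ELEMENT."""
    name: str
    value: Optional[str] = None
    null_flavour: Optional[int] = None

    def __post_init__(self):
        # Assumption of this sketch: exactly one of value / null_flavour is set.
        if (self.value is None) == (self.null_flavour is None):
            raise ValueError("set exactly one of value or null_flavour")
        if self.null_flavour is not None and self.null_flavour not in NULL_FLAVOUR_CODES:
            raise ValueError(f"unknown null flavour code: {self.null_flavour}")

    def is_null(self) -> bool:
        return self.value is None


# The patient was asked and did not know: a normal observation, so a value.
asked = Element("eaten in last 12 hours", value="don't know")

# The chat connection dropped before an answer arrived: a technical failure,
# so no value is recorded and a null flavour marks the gap.
dropped = Element("eaten in last 12 hours", null_flavour=253)
```

Here `asked.is_null()` is false even though the answer is not definitive, while `dropped.is_null()` is true.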

The openEHR Null Flavours are currently (with HL7 mappings):

openEHR code	Rubric	Description	HL7_NullFlavor
271	"no information"	No information provided; nothing can be inferred as to the reason why, including whether there might be a possible applicable value or not.	NI
253	"unknown"	A possible value exists but is not provided.	UNK
272	"masked"	The value has not been provided due to privacy settings.	MSK
273	"not applicable"	No valid value exists for this data item.	NA
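As a rough illustration, the table above can be held as a small lookup structure. The codes, rubrics and HL7 mnemonics come from the table; the names `NullFlavour`, `to_hl7` and `from_hl7` are hypothetical helpers invented for this sketch.

```python
from typing import NamedTuple, Optional


class NullFlavour(NamedTuple):
    code: int      # openEHR code
    rubric: str    # openEHR rubric
    hl7: str       # HL7 NullFlavor mnemonic


# Rows copied from the table above.
OPENEHR_NULL_FLAVOURS = (
    NullFlavour(271, "no information", "NI"),
    NullFlavour(253, "unknown", "UNK"),
    NullFlavour(272, "masked", "MSK"),
    NullFlavour(273, "not applicable", "NA"),
)

BY_CODE = {nf.code: nf for nf in OPENEHR_NULL_FLAVOURS}
BY_HL7 = {nf.hl7: nf for nf in OPENEHR_NULL_FLAVOURS}


def to_hl7(code: int) -> str:
    """Return the HL7 mnemonic mapped to an openEHR null flavour code."""
    return BY_CODE[code].hl7


def from_hl7(mnemonic: str) -> Optional[NullFlavour]:
    """Return the openEHR null flavour for an HL7 mnemonic, or None
    when the table lists no direct equivalent."""
    return BY_HL7.get(mnemonic)
```

For example, `to_hl7(253)` gives `"UNK"`, while `from_hl7("ASKU")` gives `None` because the table has no openEHR null flavour for it.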

What this means for Archetyping

The implication of the above approach for templates is that the underlying archetypes should fully model the range of responses for questions and other data elements that may initially seem to be Boolean in nature.
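A minimal sketch of what is lost when such a question is modelled as a Boolean, assuming a hypothetical four-member value set. A real archetype would define coded terms rather than a Python enum; the names `EatenAnswer`, `parse_answer` and `as_boolean` are invented for illustration.

```python
from enum import Enum
from typing import Optional


class EatenAnswer(Enum):
    """Hypothetical value set for 'have you eaten in the last 12 hours?'."""
    YES = "yes"
    NO = "no"
    DONT_KNOW = "don't know"
    NO_RESPONSE = "no response"   # e.g. the patient is unconscious


def parse_answer(text: str) -> EatenAnswer:
    """Map recorded text onto the value set; anything outside it raises ValueError."""
    return EatenAnswer(text.strip().lower())


def as_boolean(answer: EatenAnswer) -> Optional[bool]:
    """What a Boolean field could have stored: only yes and no survive."""
    return {EatenAnswer.YES: True, EatenAnswer.NO: False}.get(answer)
```

Both `DONT_KNOW` and `NO_RESPONSE` collapse to `None` in `as_boolean`, so a Boolean field cannot tell them apart, whereas the value set keeps what the patient actually said or did.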

Other approaches

This analysis is not the only possible view of affairs, but it does take care of the need to know what the patient or clinician said, even if it was not definitive. Complicated null flavour approaches tend to mix up such situations with the situation where data were unavailable for a technical reason.

HL7 Approach

Level	Code mnemonic	Rubric	Description	openEHR equivalent
1	NI	no information	The value is exceptional (missing, incomplete, improper). No information as to the reason for being an exceptional value is provided. This is the most general exceptional value. It is also the default exceptional value.	(same in openEHR)
2	INV	invalid	The value as represented in the instance is not an element in the constrained value domain of a variable.	Q: This seems very similar to OTH. A: OTH is the most common specialisation of this, but is not the only case of this
3	OTH	other	The actual value is not an element in the constrained value domain of a variable. (e.g., concept not provided by required code system).	Q: This seems to be saying that a value has been recorded in violation of the model? A: No, the converse: a value has not been recorded because the value is in violation of the model. Usual cases are: (a) the model says it's an integer, but the correct value is not an integer, nor should it be rounded (i.e. a modelling error); (b) the model supplied a list of enumerated values, and the actual value is not in the enumeration, which may be either a modelling error or a problem in terminologies such as SNOMED.
4	NINF	negative infinity	Negative infinity of numbers.	Modelled in Interval<T> and DV_INTERVAL<T> classes. Note: this concept has nothing to do with null flavours or missing data. Respectfully, I disagree: you have a special boolean flag that says that the data is missing, and assigns an interpretation of that fact. I'm not saying that the HL7 nullFlavor is the right way to model it, just pointing out that it is to do with missing data
4	PINF	positive infinity	Positive infinity of numbers.	Modelled in Interval<T> and DV_INTERVAL<T> classes. Note: this concept has nothing to do with null flavours or missing data. ditto above
3	UNC	unencoded	No attempt has been made to encode the information correctly but the raw source information is represented (usually in originalText).	Not currently handled explicitly in openEHR. Q: This content can appear at any level of granularity, and usually appears at Composition level. How does HL7 handle this? A: in this case, original text would be a reference to a portion of the text in the composition. It doesn't really add much: the difference between no value, an UNC and an originalText and no value and an originalText is... not much. The vanishingly small use case for this is to distinguish between "I didn't try coding this in snomed" (1st case) and "I tried to code this in snomed, but couldn't" (2nd case) Q2: if one were to use this code wouldn't one then need a place to say more, e.g. which terminology was attempted etc? A2: yes, you do - actually a valueset, really, which may cross terminologies.
	DER	derived	An actual value may exist, but it must be derived from the provided information (usually an expression is provided directly).	This can happen anywhere in the data; openEHR doesn't see it as missing data / null data. The archetype shows what is derived and what is not, e.g. Apgar total from input values. Maybe; but in some cases the value is derived from other values (which may not yet be known) using an expression, and in other cases it's given directly. I guess that in OpenEHR you would just model that directly, but it seems to me to be confusing the process of getting the value with the semantics of the values themselves.
2	UNK	unknown	A proper value is applicable, but not known.	(same in openEHR)
3	ASKU	asked but unknown	Information was sought but not found (e.g., patient was asked but didn't know)	In openEHR, this is a legitimate response of the patient, and is not a case of missing data. See discussion above. I really don't understand that comment; the patient responded that they didn't know, so the data is missing
4	NAV	temporarily unavailable	Information is not available at this time but it is expected that it will be available later.	Q: What information systems can predict the future and reliably set a value like this? What use would it serve? If the data are in fact available later, they will be recorded. Well, the answer is that there are some; like some of the other nullFlavors, the use case is small, but still valid. It's definitely questionnaire related. I don't think it matters to an information system, but a user might make use of this subtlety somehow.
3	QS	sufficient quantity	The specific quantity is not known, but is known to be non-zero and is not specified because it makes up the bulk of the material. 'Add 10mg of ingredient X, 50mg of ingredient Y, and sufficient quantity of water to 100mL.' The null flavor would be used to express the quantity of water.	In openEHR, this is not missing data. It just happens to be quantitative data that is expressed in narrative form. Modelled in archetypes by allowing a narrative alternative to a quantity. Fine, but narrative is not the same. There are systems that can figure out how to profitably use QS as a notion. Narrative is much wider.
3	NASK	not asked	This information has not been sought (e.g., patient was not asked)	Q: is this a missing data concept, when by definition no data is expected to be there? In any case it can only apply in specific questionnaire-like situations. Should be part of the design of archetypes / templates for questionnaires. My response is the same as above; the patient wasn't asked. When does this make a difference? Probably not to a system, but maybe to a user. But why should I have to worry about this for every item in a template? Generically, I may not ask, or the patient may refuse to answer, any question. Let's just build that into the infrastructure; it seems to me that these are reasonable things to have in openEHR nullFlavor, even though they may not be widely used.
3	TRC	trace	The content is greater than zero, but too small to be quantified.	This is a laboratory concept, and has nothing to do with missing data; labs routinely specify some amounts as 'trace'. The value is not in itself computable, but is usually part of an ordered set, e.g. trace, +, ++ etc. Usually modelled in openEHR using DV_ORDINALs, e.g. for urinalysis. Yes, no, maybe. In the lab, you may report a value, or TRC. TRC is used outside ordinals - though not as often, I agree.
2	MSK	masked	There is information on this item available but it has not been provided by the sender due to security, privacy or other reasons. There may be an alternate mechanism for gaining access to this information. Note: using this null flavor does provide information that may be a breach of confidentiality, even though no detail data is provided. Its primary purpose is for those circumstances where it is necessary to inform the receiver that the information does exist without providing any detail.	(same in openEHR)
2	NA	not applicable	No proper value is applicable in this context (e.g., last menstrual period for a male).	(same in openEHR)

This is the HL7 nullFlavor table as of Dec 2007. (Grahame)
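The leading number on each row of the table appears to be a nesting depth, so that, for example, OTH is a specialisation of INV (as the comment on INV says). Under that assumption, which is not stated in the table itself, the parent of each code can be recovered from the row order, as in the sketch below. DER is left out because its row carries no number, and `build_parents` and `ancestors` are hypothetical helper names.

```python
# (depth, mnemonic) pairs in table order. DER is omitted: its row has no number.
ROWS = [
    (1, "NI"), (2, "INV"), (3, "OTH"), (4, "NINF"), (4, "PINF"), (3, "UNC"),
    (2, "UNK"), (3, "ASKU"), (4, "NAV"), (3, "QS"), (3, "NASK"), (3, "TRC"),
    (2, "MSK"), (2, "NA"),
]


def build_parents(rows):
    """Assumes the leading number is a nesting depth; returns {code: parent or None}."""
    parents = {}
    stack = []  # stack[i] holds the most recent code seen at depth i + 1
    for depth, code in rows:
        del stack[depth - 1:]          # drop codes at this depth or deeper
        parents[code] = stack[-1] if stack else None
        stack.append(code)
    return parents


PARENTS = build_parents(ROWS)


def ancestors(code):
    """List the more general codes above `code`, nearest first."""
    chain = []
    while PARENTS[code] is not None:
        code = PARENTS[code]
        chain.append(code)
    return chain
```

Read this way, `ancestors("NAV")` is `["ASKU", "UNK", "NI"]`, and the four codes that openEHR shares with HL7 (NI, UNK, MSK, NA) all sit at depth 1 or 2.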

It would seem that some of these are not appropriate in the context of openEHR. I would think that openEHR has handled the concepts of OTH, NINF and PINF differently, and needn't support them.

I'd be interested to know how openEHR models the concepts of UNC, DER, QS and TRC.