Issue Details

Key: SPEC-260
Type: Change Request
Status: Closed
Resolution: Fixed
Priority: Major
Reporter: Thomas Beale
Votes: 1
Watchers: 0

Specification

Correct the regex published for the ARCHETYPE_ID type

Created: 03/Mar/08 08:44 AM   Updated: 31/Dec/08 10:06 AM  Due: 22/Aug/08
Component/s: openehr.rm.support
Affects Version/s: None
Fix Version/s: Release 1.0.2

Time Tracking:
Not Specified

Raised By: Eric Browne, John Arnett and Peter Gummer
Change Description:
The proposed solution is given below and has the following features:
- the characters #, ( and ) are not allowed
- no part (i.e. any section between '-' or '.' delimiters) can be only 1 character long
- no name part can start with a digit
- upper case is allowed in the alphabetic parts of the id
- the version part of the id is limited to numbers (i.e. no letters)
- a leading '0' is not allowed in the version identifier


The grammar published in the Support IM becomes:

# -------- production rules --------
archetype_id: qualified_rm_entity '.' domain_concept '.' version_id
qualified_rm_entity: rm_originator '-' rm_name '-' rm_entity
rm_originator: V_ALPHANUMERIC_NAME
rm_name: V_ALPHANUMERIC_NAME
rm_entity: V_ALPHANUMERIC_NAME
domain_concept: concept_name { '-' specialisation }*
concept_name: V_ALPHANUMERIC_NAME
specialisation: V_ALPHANUMERIC_NAME
version_id: 'v' V_NONZERO_DIGIT [ V_NUMBER ]
# -------- lexical patterns --------
V_ALPHANUMERIC_NAME: [a-zA-Z][a-zA-Z0-9_]+
V_NONZERO_DIGIT: [1-9]
V_NUMBER: [0-9]+
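
As an illustration only, the production rules above can be transcribed into a single pattern with one named group per grammar rule. This is a minimal Python sketch, not part of the specification; the function name parse_archetype_id and the choice to return a dictionary are assumptions made for the example.

```python
import re

# Lexical patterns transcribed from the grammar above.
NAME = r"[a-zA-Z][a-zA-Z0-9_]+"   # V_ALPHANUMERIC_NAME (at least 2 characters)
VERSION = r"v[1-9][0-9]*"         # 'v' V_NONZERO_DIGIT [ V_NUMBER ]

# One named group per production rule of interest.
ARCHETYPE_ID = re.compile(
    rf"(?P<rm_originator>{NAME})-(?P<rm_name>{NAME})-(?P<rm_entity>{NAME})"
    rf"\.(?P<domain_concept>{NAME}(?:-{NAME})*)"
    rf"\.(?P<version_id>{VERSION})"
)

def parse_archetype_id(text):
    """Hypothetical helper: return the id's parts, or None if it does not conform."""
    match = ARCHETYPE_ID.fullmatch(text)
    return match.groupdict() if match else None
```

Because fullmatch is used, trailing text such as 'draft' after the version number causes the whole identifier to be rejected.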

The PERL regular expression equivalent of the above is as follows:
[a-zA-Z]\w+(-[a-zA-Z]\w+){2}\.[a-zA-Z]\w+(-[a-zA-Z]\w+)*\.v[1-9]\d*

The classic regular expression equivalent of this is generated from the above with the following substitutions:
\w -> [a-zA-Z0-9_]
\d -> [0-9]
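
The substitution can be checked mechanically. The following Python sketch derives the classic form from the PERL form above and uses it to validate identifiers; the function name is_valid is an assumption for the example. Note that in Python, \w and \d match Unicode characters unless the re.ASCII flag is given, so the flag is passed to keep both forms equivalent.

```python
import re

# The PERL form, copied from above.
PERL_FORM = r"[a-zA-Z]\w+(-[a-zA-Z]\w+){2}\.[a-zA-Z]\w+(-[a-zA-Z]\w+)*\.v[1-9]\d*"

# Apply the two substitutions listed above to obtain the classic form.
CLASSIC_FORM = PERL_FORM.replace(r"\w", "[a-zA-Z0-9_]").replace(r"\d", "[0-9]")

def is_valid(archetype_id, pattern=CLASSIC_FORM):
    """Hypothetical helper: True if the whole string matches the pattern."""
    # re.ASCII restricts \w and \d to [a-zA-Z0-9_] and [0-9] respectively.
    return re.fullmatch(pattern, archetype_id, flags=re.ASCII) is not None
```

The pattern as published is not anchored, so a whole-string match (fullmatch here) is needed to reject identifiers such as those ending in '.v1draft'.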


The explanatory text is improved as follows:
- move explanatory material next to grammar up to a dedicated subsection under section 4.2.2 Composite Identifiers;
- improve the text under section 4.2.2 generally
- add a warning that the non-conforming .v1draft form of the identifier may still need to be supported.
Impact Analysis:
The impact should be minimal, since the grammar and regex changes above conform to all known archetypes other than deprecated ones with '.v1draft' at the end. A warning about these identifiers has been added to the spec, and a separate guideline on fixing this situation will be published on openEHR.
Analyst: Eric Browne, Peter Gummer and Thomas Beale


Description
There is currently a problem with the published regex for the ARCHETYPE_ID class in the openEHR support information model. It does not deal correctly with case, version ids, or special characters. In addition, the explanatory text in section 4.2.2 does not correspond to the current reality of openEHR identifiers.

Comments
Eric Browne added a comment - 10/Jul/08 06:45 PM

1. The spec should reflect the requirements.
There may be implications for existing archetypes if the specs are tightened, but only if and when implementations are tightened accordingly.

2. The requirement is to have a single-part positive integer version number. There should be a "version" function to return the numeric component of the "version_id".

3. There is a problem with the "specialisation" function for multi-specialised archetypes. What happens to the '-'?
e.g. if we have
  "openehr-composition-SECTION.physical_examination-postnatal-dermatology.v1" , then the specialisation would be :
   "postnatal-dermatology" and not "-postnatal-dermatology" , nor "-dermatology". This is probably not the intention.
Suggest this be addressed in parallel with changes to ??? to support specialisation.

4. Need statement about case-sensitivity.

5. none of the regex patterns in SPEC-260 match the Support IM Rev 1.6.0. In particular the following need addressing:-

  a. support for uppercase [a-z][A-Z]
  b. correct handling of '_'.
  c. correct handling of zero to many (*) and one to many (+).
  d. version_id should not allow "v0". Therefore "version_id" regex should be "v[1-9][0-9]*" or possibly "[vV][1-9][0-9]*"
  e. handling of non-alphanumerics ()/%$#& in the EBNF - what are requirements?
  f. V_NAME vs NAME and V_NUMBER vs NUMBER in EBNF.NOTES
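
To make points 2 and 3 concrete, here is one possible reading sketched in Python. It is not the behaviour defined by the specification: the function names version_number and specialisations are hypothetical, the input is assumed to be an already validated identifier, and returning the specialisation parts as a list (so that the '-' delimiters never appear in the result) is only one of the options the question in point 3 leaves open.

```python
def version_number(archetype_id):
    """Return the numeric component of version_id, e.g. 1 for '...v1'."""
    # version_id is the section after the last '.', e.g. "v1".
    version_id = archetype_id.rsplit(".", 1)[1]
    return int(version_id[1:])  # drop the leading 'v'

def specialisations(archetype_id):
    """Return the specialisation parts of domain_concept, without delimiters."""
    # domain_concept is the middle of the three '.'-separated sections.
    domain_concept = archetype_id.split(".")[1]
    concept_name, *parts = domain_concept.split("-")
    return parts
```

Joining the returned list with '-' gives "postnatal-dermatology" for the example in point 3.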

Thomas Beale added a comment - 18/Jul/08 10:50 PM
Accepted by 6 members of ARB: TB, RC, TC, SH, EB, DK

Thomas Beale added a comment - 20/Jul/08 12:26 PM
First attempt at an improved outcome based on all analysis so far.

Thomas Beale added a comment - 20/Jul/08 12:42 PM
Add in Eric B's non-zero version id point

Thomas Beale added a comment - 20/Jul/08 01:30 PM
Added PERL regex form.

Thomas Beale added a comment - 24/Jul/08 12:26 PM
Further improvements on regex and grammar

Thomas Beale added a comment - 29/Jul/08 05:21 PM
Update version id matching part to match a single part number only

Thomas Beale added a comment - 15/Aug/08 03:07 PM
Update change description to reflect recent discussions with ARB.

Thomas Beale added a comment - 14/Sep/08 10:32 PM
Reduce problem statement to essentials relating to the specifications rather than implementations of the spec.

Thomas Beale added a comment - 16/Sep/08 06:28 PM
Approved by JA, RC, SH, DK, TB, EB

Thomas Beale added a comment - 16/Sep/08 06:29 PM
Passed with the votes of RC, SH, JA, EB, TB