r3wp [groups: 83 posts: 189283]
World: r3wp

[Parse] Discussion of PARSE dialect

BrianH
4-Nov-2005
[726]
For instance, how many fields does that data you posted have? Are 
they separated by | or is it a length thing?
Graham
4-Nov-2005
[727]
I don't know .. I am just looking at sample data and trying to reverse 
engineer the format as I don't have time to read 100s of pages of 
specs.
BrianH
4-Nov-2005
[728]
What does the ^ mean in context?
Graham
4-Nov-2005
[729x2]
but each OBX record is one blood result
It seems to be a delimiter to divide a record into parts
BrianH
4-Nov-2005
[731]
Really? By the format it looks like they are using | for that.
Graham
4-Nov-2005
[732x3]
so, | separates fields, and ^ subdivides a field
OBR|1|3CHI|05-556701-MHA-0^VDL|MHA^MASTER HAEM PANEL^L|R|200511021006|200511021006|""|""|||||200511021006||10761^CHIU&G|||10761^CHIU&G|10761^CHIU&G|3CHI^chiu|200511021152|||F
OBX|1|ST|Hb^ Hb:^L||135|g/L|120 - 155|N|||F
OBX|2|ST|pV^ PCV:^L||0.397||0.340 - 0.470|N|||F
OBX|3|ST|mV^ MCV:^L||95|fL|81 - 97|N|||F
OBX|4|ST|mh^ MCH:^L||32.3|pg|26.5 - 33.0|N|||F
OBX|5|ST|pl^ Platelets:^L||224|x 10*9/L|150 - 450|N|||F
OBX|6|ST|es^ ESR:^L||23|mm/hr|1 - 27|N|||F
OBX|7|ST|wc^ WCC:^L||7.7|x 10*9/L|3.8 - 10.0|N|||F
OBX|8|ST|Nt^ Neutrophils:^L||4.5|x10*9/L|1.9 - 7.1|N|||F
OBX|9|ST|Ly^ Lymphocytes:^L||2.6|x10*9/L|0.6 - 3.6|N|||F
OBX|10|ST|Mo^ Monocytes:^L||0.5|x10*9/L|0.2 - 1.0|N|||F
OBX|11|ST|Eo^ Eosinophils:^L||0.1|x10*9/L|< 0.6|N|||F
OBX|12|ST|Ba^ Basophils:^L||0.05|x10*9/L|0.00 - 0.10|N|||F

OBX|13|FT|bf^Comments^L||COMMENT: RBC parameters normochromic normocytic.|||N|||F
NTE|1|L|CC Drs: MALIK, CHIU.
I've omitted the MSH MSA and PID lines which identify the patient.
BrianH
4-Nov-2005
[735x2]
So the first field specifies the record format, the number of fields 
and such. The other fields are data.
Thanks for that by the way, I'd rather not know.
Graham
4-Nov-2005
[737x2]
no, I think the numbers indicate a sequence
so, there are 13 OBX records for the OBR result.
BrianH
4-Nov-2005
[739]
I mean OBX is the record type, and OBX records have 11 additional 
fields to them.
Graham
4-Nov-2005
[740x3]
Yes, I think so.
I presume ST stands for sub test.
The HL7 org want to move to using XML instead ...
BrianH
4-Nov-2005
[743]
Do you want to do a full rule-based parse here, or will simple parse 
do?
data: read/lines %data
foreach rec data [
    rec: parse/all rec "|"
    switch rec/1 [
        "OBX" [ ... do stuff...
sqlab
4-Nov-2005
[744]
OBX is the segment type.
Segments are separated by #"^M".
An OBX segment can have up to 24 fields according to version 2.4; 
empty fields at the end of a segment need not be transferred.

Fields are normally delimited by #"|", but all delimiters except 
the segment delimiter can be defined for each message. 

Fields can be divided by #"^^" into components, components can be 
divided into subcomponents, etc.
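
A minimal sketch of the hierarchy described above (segments, then fields, then components), assuming REBOL 2, where PARSE accepts a string of delimiter characters. It uses two OBX lines from the sample data, joined here with #"^M" purely for illustration:

```rebol
; Two segments from the sample above, joined with the carriage return
; that is named as the segment delimiter. "^^" is REBOL's escape for a
; literal caret inside a string.
message: rejoin [
    "OBX|1|ST|Hb^^ Hb:^^L||135|g/L|120 - 155|N|||F" "^M"
    "OBX|2|ST|pV^^ PCV:^^L||0.397||0.340 - 0.470|N|||F"
]

foreach segment parse/all message "^M" [
    fields: parse/all segment "|"            ; segment -> fields
    components: parse/all fields/4 "^^"      ; field 4 -> components
    print [fields/1 fields/2 mold components]
]
```

The /all refinement matters here: without it, spaces would also be treated as delimiters and a value such as "120 - 155" would be split apart.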
Graham
4-Nov-2005
[745x5]
I'm working on a full rule based parse.
sqlab, you've been doing this stuff for years.
Ok, my parser is able to get all the data out of all the records 
now in the test result above.
pipe: charset "|" 
nonpipe: complement charset "|"
caret: charset "^^"
non-caret: complement caret
digits: charset [ #"0" - #"9" ]

labsupplier: 
hl7level: 
datetime:
patient:
labno:
labno2: none


nte-rule: [ "NTE" pipe digits pipe 1 skip pipe copy notes to newline (append txt notes) ]
oru-rule: [ "ORU" pipe copy labno some digits pipe ]
datetime-rule: [ copy datetime some digits ]

msh-rule: [ "MSH" pipe some nonpipe pipe copy labsupplier some nonpipe
	pipe some nonpipe pipe copy hl7level some nonpipe
	2 skip datetime-rule 2 skip oru-rule thru newline
]

msa-rule: [ "MSA" pipe 3 skip copy labno2 some digits thru newline ]

pid-rule: [ "PID" 2 skip some nonpipe pipe some digits pipe some nonpipe
	pipe copy patient some nonpipe
	2 pipe copy dob some digits pipe copy gender [ #"F" | #"M" ] thru newline
]
obr-rule: [ "OBR" pipe 1 digits pipe copy drcode some nonpipe
	pipe some nonpipe pipe copy panelcode some nonpipe
	pipe 1 skip pipe copy bleeddate some digits
	pipe copy reportdate some digits
	pipe 2 skip pipe 2 skip
	5 pipe copy bleeddate some digits 2 pipe copy requestdr some nonpipe
	3 pipe copy nzcouncilcode some nonpipe thru newline
]

txt: copy ""
cnt: 1

obx-rule: [ "OBX" pipe copy cntr some digits (
		if cnt <> to-integer cntr [ print "halted as out of sequence" halt ]
		cnt: cnt + 1
	)
	pipe [ st-rule | ft-rule ]
]

ft-rule: [ "FT" pipe some non-caret any caret copy comm some non-caret
	any caret 3 skip
	copy comments some nonpipe (repend txt [ comm " " comments newline ])
	thru newline
]

st-rule: [ "ST" pipe some non-caret any caret copy testtype some non-caret
	any caret 3 skip copy testresult some nonpipe
	pipe [ pipe | copy units some nonpipe pipe ]
	copy range some nonpipe thru newline
	( repend txt [ testtype " " testresult " " units " " range newline ] )
]
	
record-rule: [
	( cnt: 1 txt: copy "" )
	msh-rule
	msa-rule
	pid-rule
	obr-rule
	obx-rule
	[ some obx-rule ]
	nte-rule
]

parse read %hl7data.txt record-rule
print [ labsupplier hl7level datetime patient 
labno labno2 dob gender panelcode bleeddate reportdate 
requestdr nzcouncilcode newline txt
]
That's my rough working parser.
sqlab
4-Nov-2005
[750]
I have seen far too many exceptions to the rules in real messages.
So I just parse the message into an internal structure like 

mssg: [MSH [ field1 field2 ..] PID [field1 field2 .. ..]  OBR [.. 
..]  .. ] etc

Then I can access the data either with a path, mssg/OBX/3 for example, 
or with SET.

I use checking as an optional second step.
Graham
4-Nov-2005
[751x4]
I used to parse HL7 messages differently ... splitting them into 
fields as well.  But this time I thought I'd try a rule based approach.
I admit it is likely to be easier to do it your way.
I then have to map the results so that different laboratories' fields 
can be made equivalent.
And the data can then be codified.
sqlab
4-Nov-2005
[755]
What do you mean by 

"different laboratories' fields can be made equivalent" and "the data 
can be codified"?
Graham
4-Nov-2005
[756]
Rather than storing the HL7 result as free text, I want to store each 
sub test in a database.
So, a Hb result will be stored as a Hb record.

Another laboratory might call that "haemoglobin", so I need to map 
these two together.
sqlab
4-Nov-2005
[757]
Do you want to do your mapping in the database or with Rebol?
Graham
4-Nov-2005
[758x2]
in the database ...
Here's my new parser using your approach.
hl7msg: make object! [
	msh: []
	msa: []
	pid: []
	obr: []
	obx: []
	nte: []
]

datafile: %hl7data.txt

parse-hl7msg: func [datafile [string!]
	/local segment segbl v
] [
	hl7: make hl7msg []
	trim/head/tail datafile
	append datafile {^/}
	line-rule: [copy segment to "^/" 1 skip (
			segbl: parse/all segment "|"
			either segbl/1 = "OBX" [
				insert/only tail hl7/obx skip segbl 1
			] [
				v: to-word segbl/1
				insert hl7/:v skip segbl 1
			]
		)
	]
	parse/all datafile [ some line-rule ]
	hl7
]

test: parse-hl7msg read datafile
BrianH
4-Nov-2005
[760]
Again, if you are just matching a single character or a fixed string, 
it is better and faster to just match it instead of matching a charset 
of that character. You don't need the caret, non-caret, pipe and 
nonpipe charsets you have above - the strings "^^" and "|" will do 
just as well.
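
A minimal sketch of what matching the literal strings looks like, assuming REBOL 2 and one OBX line from the sample data. The words seq, vtype and testname are hypothetical names chosen for this illustration, not taken from the parser above:

```rebol
segment: "OBX|1|ST|Hb^^ Hb:^^L||135|g/L|120 - 155|N|||F"

; Literal strings in the rule; TO and THRU replace the
; "some nonpipe" / "some non-caret" loops over complemented charsets.
parse/all segment [
    "OBX" "|"
    copy seq to "|" "|"                ; sequence number
    copy vtype to "|" "|"              ; value type
    thru "^^" copy testname to "^^"    ; second component of the identifier
    to end
]

print [seq vtype testname]
```

One thing to keep in mind with this style: COPY combined with TO sets the word to NONE when the field is empty, so optional fields need a check before use.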
Anton
5-Nov-2005
[761x7]
Graham, I agree with BrianH. It should speed up your parse, and make 
it easier to read because you can use TO and THRU again.
	caret: #"^^"
etc
(and using a character instead of a string will save a tiny bit of 
memory too, I think)
I have a strange issue of my own:
var: 123 parse/all "a" [copy var "b" | (?? var)] ; ---> var: none

var: 123 parse/all "a" [[copy var "b"] | (?? var)] ; ---> var: 123

var: 123 rule: [copy var "b"] parse/all "a" [rule | (?? var)] ; ---> var: 123
I want to know what happens when COPY fails to match the input.
In the first case, it modifies VAR, changing it from 123 to NONE.
In the second case, it leaves VAR alone.
Oh! I think I know what's happening !
Yep, understand it now. It's like this:
	var: 1 parse "" [copy var "a" |]
	;== true
	var ;== none
BrianH
5-Nov-2005
[768]
Anton, I used to use a character rather than a string too, because 
of the memory issue. But it turned out to be slower that way. I think 
parse only matches on strings, and single characters have to be converted 
to one-character strings before they can be passed to the matcher. 
At least that would explain the speed discrepancy.
Romano
5-Nov-2005
[769x2]
Anton, to me that is wrong behaviour in parse.
I also have some doubts about the correctness of this:
var: 123 probe parse/all "" [copy var ""] var; false == 123
sqlab
7-Nov-2005
[771]
Graham: two points of trouble
1; you lose the order of the segments with your approach, 

e.g. NTE can be at different positions in the message, and set-id is 
not always used in the same way.

2; are you sure that the values of different laboratories can be 
compared? 
are they using the same test kits, do they do ring tests ?
JaimeVargas
7-Nov-2005
[772]
I would agree with Romano. Anton, please report this inconsistent 
behaviour to RAMBO.
Graham
7-Nov-2005
[773x2]
sqlab, doesn't my last parse account for the possibility of an NTE segment 
being anywhere in the message?
currently I only have data from one laboratory - I'll have to contact 
some others to get theirs to see what their messages are like.
sqlab
8-Nov-2005
[775]
Yes, you get the NTEs, but you do not retain the information about which 
OBR or which OBX they comment on, or whether they even follow the PID.


That's why I use just one block for all segments retaining the order.
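
A minimal sketch of the single ordered block idea, assuming REBOL 2 and newline-separated segments as in Graham's data file. The function name parse-ordered is hypothetical, and this is only one possible layout, not necessarily the structure sqlab actually uses:

```rebol
; Hypothetical helper: returns [name fields name fields ...] in message order.
parse-ordered: func [message [string!] /local result fields] [
    result: copy []
    foreach segment parse/all message "^/" [
        if not empty? segment [
            fields: parse/all segment "|"
            append result to-word fields/1        ; segment type as a word
            append/only result next fields        ; remaining fields as one block
        ]
    ]
    result
]

; Three segments from the sample earlier in the thread ("^^" is a literal caret).
msg: parse-ordered {OBX|12|ST|Ba^^ Basophils:^^L||0.05|x10*9/L|0.00 - 0.10|N|||F
OBX|13|FT|bf^^Comments^^L||COMMENT: RBC parameters normochromic normocytic.|||N|||F
NTE|1|L|CC Drs: MALIK, CHIU.}

; Walk the pairs in order: the NTE is still the entry right after OBX 13.
foreach [name fields] msg [print [name fields/1]]
```

Because the block is flat, a path such as msg/OBX only finds the first OBX entry; walking the block with FOREACH is what preserves the link between an NTE and the segment it follows.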