r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Maxim
16-May-2009
[3727]
and just see if the line starts with one of the headers.
Steeve
16-May-2009
[3728]
what does the content look like ?
Can't you just post an example Graham ?
Maxim
16-May-2009
[3729]
parse text "^/"
Graham
16-May-2009
[3730x2]
CC:
Patient complains of sore throat.

HPI:
ONSET: Sudden, TIMING: Constant, DURATION: 3 days

INTENSITY: Moderate, QUALITY: Burning, MODIFYING FACTORS: head position

CURRENT MEDICATIONS:
TYLENOL W/ CODEINE NO. 3 300MG;30MG 1-2 po q 4-6 hrs prn "pain"
cyclobenzaprine Oral Tablet 10 MG 1 tab po TID prn "muscle spasm"

MEDICAL HISTORY:
Rheumatic heart disease, unspec. 391.9
Eczema, atopic dermatitis 691.8
dyslipidemia

ALLERGIES:
Penicillin - allergy: Allergy
Penicillin - allergy: Allergy
Penicillin - anaphylactic reaction
lovastatin - allergy: allergic
macrodantin - 1 po BID

SURGERIES:


HOSPITALIZATIONS:


FAMILY HISTORY:


SOCIAL HISTORY:


ROS:


VITALS:


EXAMINATION:
General: Appears non-toxic

HEENT: TONSILS hypertrophic, and erythematous. MOUTH buccal mucosa, 
moist. PHARYNX indurated, and angry. NOSE turbinates, with no obstuction.

Neck: NECK Supple, with no lymphadenopathy, thyromegaly, or masses.
CVS: HEART RRR s M
Chest: ANTERIOR LUNGS clear bilat


ASSESSMENTS:
391.9 Rheumatic heart disease, unspec.


TREATMENT:


PROCEDURES:


IMMUNIZATIONS:


IMAGING:


LABORATORY:


EDUCATION:
None.

REFERRALS:
Non contributory.

FOLLOWUP:


SUPERBILL:
That was sent to me today as an example
Steeve
16-May-2009
[3732]
Hmm...
Maxim
16-May-2009
[3733x3]
implementing the latter solution... this is easier
here you go  :-)


data: {CC:
Patient complains of sore throat.

HPI:
ONSET: Sudden, TIMING: Constant, DURATION: 3 days

INTENSITY: Moderate, QUALITY: Burning, MODIFYING FACTORS: head position

CURRENT MEDICATIONS:
TYLENOL W/ CODEINE NO. 3 300MG;30MG 1-2 po q 4-6 hrs prn "pain"
cyclobenzaprine Oral Tablet 10 MG 1 tab po TID prn "muscle spasm"

MEDICAL HISTORY:
Rheumatic heart disease, unspec. 391.9
Eczema, atopic dermatitis 691.8
dyslipidemia

ALLERGIES:
Penicillin - allergy: Allergy
Penicillin - allergy: Allergy
Penicillin - anaphylactic reaction
lovastatin - allergy: allergic
macrodantin - 1 po BID

SURGERIES:
}

data: parse/all data "^/"


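header-lbl: [
	"CC" | "HPI" | "ONSET" | "INTENSITY" | "CURRENT MEDICATIONS"
	| "MEDICAL HISTORY" | "ALLERGIES" | "SURGERIES"
]

spec: []
foreach line data [
	; a line that starts with a known header opens a new field
	unless parse/all line [
		copy hdr [header-lbl ":"]
		here:
		(
			; "CURRENT MEDICATIONS:" becomes current-medications:
			append spec to-set-word head remove back tail replace/all hdr " " "-"
			append spec copy/part here tail line
		)
		to end ; consume the rest of the line so PARSE returns true
	][
		; any other line is appended to the most recent field
		if string? item: last spec [
			append item line
		]
	]
]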

probe context spec
ok for you?
Steeve
16-May-2009
[3736]
Assuming SRC: contains the source text, it seems to work too:

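header-char: complement charset "^/:"   ; any character except newline and colon
EOL2: rejoin [newline newline]           ; a blank line ends a section
parse/all src [
	some [
		; header: turn spaces into dashes so it can be loaded as a word
		some [pos: #" " (change pos #"-") | header-char]
		; replace the newline after "header:" with an opening brace
		#":" pos: newline (change/part pos " {" 1)
		; close the string at the next blank line (or at the end)
		[to EOL2 | to end] pos: (change pos "} ") skip skip
	]
]
probe construct to block! src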
Graham
16-May-2009
[3737x2]
Yes ... but I'm going to have to study Steeve's
to see why it doesn't work yet
Steeve
16-May-2009
[3739]
it will not work if you have CRLF instead of newlines in the source.
Is that the case ?
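
The CRLF point is easy to reproduce. Below is a minimal sketch in Python (not the REBOL code from the thread; `normalize_newlines` and the sample text are made up for illustration) showing that a blank-line split finds nothing until the line endings are normalized:

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical sample with Windows line endings.
sample = "CC:\r\nSore throat.\r\n\r\nHPI:\r\nThree days.\r\n"

# With CRLF there is no "\n\n" pair, so the text is not split at all.
unsplit = sample.split("\n\n")

# After normalizing, the blank line separates the two sections.
clean = normalize_newlines(sample)
sections = [s for s in clean.split("\n\n") if s.strip()]
```

Normalizing once, before any parsing, keeps the blank-line delimiter rule unchanged.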
Graham
16-May-2009
[3740]
I just copied it from here.
Steeve
16-May-2009
[3741]
i mean for your source data, not for my code
Graham
16-May-2009
[3742]
that's what I meant .. I just copied the source data from here.
Steeve
16-May-2009
[3743x2]
ok, it works for me
i retry
Graham
16-May-2009
[3745x3]
working now.
Actually yours appears to be the better solution because you don't specify the headers
and just pick it up from the formatting of the text
Steeve
16-May-2009
[3748]
yep
Graham
16-May-2009
[3749]
well, I'm impressed :)
Steeve
16-May-2009
[3750]
you should not
Graham
16-May-2009
[3751]
sadly I am.
Graham
17-May-2009
[3752]
the parser dies when there is something like "2.5mg" in the text 
with an invalid decimal error.
Steeve
17-May-2009
[3753x3]
should not, give the data please
There is no reason, the content is enclosed in a string before being loaded.
If it fails, it's because the whole grammar has changed
probably blank lines are inserted in the content (where they should not be)
Graham
17-May-2009
[3756]
{CC:
This is the presenting complaint.


HPI:
Developed over a few days

CURRENT MEDICATIONS:
METHOTREXATE SODIUM EQ 2.5MG BASE once weekly
METHOTREXATE SODIUM EQ 2.5MG BASE once weekly
Plaquenil 200 mg two daily
Prednisone 5 mg od
Salazopyrin EN 500 mg  two bd with food
Ultram Oral Tablet 50 MG qid prn
}
Steeve
17-May-2009
[3757x4]
ok i test that
at first sight, i can say there are too many blank lines
Right, i added skipping of useless newlines.

parse/all src [
	some [
		any newline
		some [pos: #" " (change pos #"-") | header-char]
		#":" pos: newline (change/part pos " {" 1)
		[to EOL2 | to end] pos: (change pos "} ") skip skip
	]
]

Could you figure it ?
Anticipated fails:

- if blank lines are inserted in the content (because blank lines 
should only be used as delimiters between headers).
- if header names can't be converted to words.
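
Both anticipated failures can be reproduced with a small sketch. This is Python rather than REBOL, and `split_sections` is a hypothetical helper written for illustration, not a port of the rule above. It assumes sections are separated by blank lines, that the first line of each section is a header ending in a colon, and that a usable name is a letter followed by letters, digits or dashes.

```python
import re


def split_sections(text: str) -> dict:
    """Split blank-line-delimited sections into {header-word: content}."""
    result = {}
    # One or more blank lines separate sections.
    for block in re.split(r"\n{2,}", text.strip()):
        first, _, rest = block.partition("\n")
        if not first.endswith(":"):
            # A blank line inside the content ends up here.
            raise ValueError(f"not a header: {first!r}")
        name = first[:-1].strip().replace(" ", "-").lower()
        if not re.fullmatch(r"[a-z][a-z0-9-]*", name):
            # The header cannot be turned into a plain word.
            raise ValueError(f"unusable header name: {first!r}")
        result[name] = rest
    return result


good = "CC:\nSore throat.\n\nCURRENT MEDICATIONS:\nPrednisone 5 mg od\n"
# The blank line inside HPI makes "INTENSITY: Moderate" look like a new section.
bad = "HPI:\nONSET: Sudden\n\nINTENSITY: Moderate\n"
```

Calling `split_sections(good)` returns the two fields, while `split_sections(bad)` raises `ValueError` on the line after the stray blank line.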
Maxim
17-May-2009
[3761]
afaik... my solution works flawlessly.  we could easily extend the 
header info so it recognises headers without naming them explicitly.
Steeve
17-May-2009
[3762]
In fact i could easily extend my solution to prevent those errors 
and throw safe errors if the parsing failed.
It takes 5 minutes to do.

But adding such exceptions or other sub-rules is so easy that i don't 
see the interest in preventing those cases.

It's my philosophy when i write parsing rules.

They are so easy to extend, there is no reason to anticipate those 
cases by guessing what is in the mind of the final user.
We have to extend the grammar ? 
Ok, give me 5 minutes.
Graham
17-May-2009
[3763x2]
The thing is that the user can type what they want ... so have to 
be prepared for anything.
All I ask is that they type the headers in correctly.
Steeve
17-May-2009
[3765x2]
I'm not a magician, i can't figure out all the cases if the given specifications 
are incomplete.

Everybody has a job to do, it's not mine to work on wrong specifications.
If you can't prevent them from inserting blank lines in the content, then 
Maxim's solution should be used instead.
With a list of authorized headers.
Graham
17-May-2009
[3767]
It's free text ... no way can I prevent users from doing this.
Steeve
17-May-2009
[3768x2]
So you can't use automatic recognition of unspecified headers. Easy 
to figure.
if headers are not distinguishable from free text, there is no solution
Graham
17-May-2009
[3770]
Not if I use Max's method .. but the headers can be obtained from 
the original object specifications.
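
Graham's combination (Maxim's line-by-line method, with the header list taken from the object definition) could look roughly like the following. This is a Python sketch rather than REBOL; `parse_note`, the `fields` list and the sample note are hypothetical, and it assumes each header sits alone on its own line.

```python
def parse_note(text: str, fields: list) -> dict:
    """Line-by-line parse; only names listed in `fields` start a section."""
    # Map the printed label to the field name,
    # e.g. "CURRENT MEDICATIONS:" -> "current-medications".
    labels = {f.replace("-", " ").upper() + ":": f for f in fields}
    result, current = {}, None
    for line in text.splitlines():
        label = line.strip()
        if label in labels:
            # A known header opens a new section.
            current = labels[label]
            result[current] = []
        elif current is not None:
            # Anything else, blank lines included, belongs to the open section.
            result[current].append(line)
    return {name: "\n".join(body).strip() for name, body in result.items()}


fields = ["cc", "hpi", "allergies"]  # hypothetical object field names
note = (
    "CC:\nSore throat.\n\n"
    "HPI:\nONSET: Sudden\n\nINTENSITY: Moderate\n\n"
    "ALLERGIES:\nPenicillin - allergy: Allergy\n"
)
parsed = parse_note(note, fields)
```

Because every line is checked against the known labels, the headers may appear in any order, and a blank line or a colon inside the free text does not start a new field.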
Steeve
17-May-2009
[3771]
do so
Maxim
17-May-2009
[3772]
the header-lbl rule in my example could be changed so it matches 
up to the first colon, but then, there is a flaw in that the text 
can also include something that LOOKS like a header and then you 
can have a stray value in the object...


in the original example data you posted... this would be hard to 
tackle...

Penicillin - allergy:
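
The stray-value problem is easy to see with a generic "anything up to the first colon" rule. A minimal Python sketch (the `GENERIC_HEADER` pattern is an assumption made for illustration, not the rule from the thread), run on lines from the ALLERGIES section of the sample data:

```python
import re

# Hypothetical generic rule: any text up to the first colon is a header.
GENERIC_HEADER = re.compile(r"^([^:\n]+):")

lines = [
    "ALLERGIES:",
    "Penicillin - allergy: Allergy",
    "lovastatin - allergy: allergic",
    "macrodantin - 1 po BID",
]

# Collect everything the generic rule would treat as a header.
detected = [m.group(1) for line in lines if (m := GENERIC_HEADER.match(line))]
```

Here `detected` contains "Penicillin - allergy" and "lovastatin - allergy" next to the real "ALLERGIES" header, so two content lines would become fields of their own.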
Graham
17-May-2009
[3773x2]
That was my original way of doing things.
I built the rule from the object and then parsed the data .. but 
my way relied on the headers being in the correct order.
Maxim
17-May-2009
[3775]
I started on steeve's course and had similar new-line issues, which 
is why I decided to parse line by line.
Steeve
17-May-2009
[3776]
can't the headers be prefixed ? it would be so easy to treat...