World: r3wp
[Parse] Discussion of PARSE dialect
Graham 16-May-2009 [3689] | Ok, let me rephrase that .. sure it's possible, but I can imagine it would be quite complicated |
Maxim 16-May-2009 [3690x2] | now was that a question of the "can you give me the solution" kind? |
actually it can be done quite simply... depends on the headers themselves... | |
Graham 16-May-2009 [3692] | It's a little complicated because the headers can have spaces in them. |
Maxim 16-May-2009 [3693x2] | spaces add no complication to the system, as long as the headers can be identified without doubt. |
so the rule is: headers start on a new line and stop at the first ":", and all the rest is content? | |
Graham 16-May-2009 [3695] | Now, suppose you have a rule copy text [ to "a:" | to "b:" .... ]. If b: occurs before a: in the text, then the copied text will include a header. |
Maxim 16-May-2009 [3696] | forget to and thru... they are not proper parsing. |
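A minimal sketch of the pitfall Graham describes, assuming Rebol 2 PARSE semantics. The input string and the names text, chunk, hdr-rule and found are made up for illustration and do not come from the discussion.

```rebol
REBOL []

; Made-up input: the "b:" header comes before the "a:" header
text: "b: second^/a: first"

; PARSE tries alternatives left to right, so `to "a:"` succeeds as soon as
; "a:" exists anywhere ahead, and `to "b:"` is never tried.
parse/all text [copy chunk [to "a:" | to "b:"] to end]
probe chunk    ; chunk begins with the "b:" header instead of stopping before it

; One possible way around it: advance one character at a time and test
; every header at each position, so the earliest header is seen first.
hdr-rule: ["a:" | "b:"]
found: copy []
parse/all text [any [pos: hdr-rule (append found index? pos) | skip]]
probe found    ; header positions, in input order
```

The second rule is only one option. It does not check that a header starts on a new line, which the real rule would also need to do.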
Graham 16-May-2009 [3697] | yes, headers start on a newline and terminate in ":" |
Maxim 16-May-2009 [3698] | and there can be no ":" within the content? |
Graham 16-May-2009 [3699x2] | No, there can be a ":" in the content |
but you know what the headers are ... so that's not a big problem. | |
Maxim 16-May-2009 [3701x2] | ok, so they are explicit... then it's very easy. |
can you give the names of some of the headers... or an example.... so far it looks like a really simple rule to me. | |
Graham 16-May-2009 [3703] | eg. "social history:" |
Maxim 16-May-2009 [3704x2] | and you want the output in neat blocks I guess. |
give me 1 minute | |
Graham 16-May-2009 [3706x3] | so I guess we can use masks for each possible header |
^/social history: | |
or apply the rule recursively until it is false | |
Maxim 16-May-2009 [3709] | I can assume it starts at a header? |
Graham 16-May-2009 [3710x2] | might be leading newlines |
or white spaces | |
Maxim 16-May-2009 [3712] | ok, but no content or stray letters? |
Graham 16-May-2009 [3713x2] | shouldn't be yet. |
So, I am trying to create an object from a semi structured document where the object elements are in any order or missing. | |
Maxim 16-May-2009 [3715x3] | almost done... |
ok, so we replace the spaces in the headers by "-" and create an object out of all the code... | |
all the content... rather | |
Graham 16-May-2009 [3718] | I guess I can do it without using parse .. just replace all the headers with a mark that allows me to split off all the sections, and then I can match the sections with all the section headers. |
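A rough sketch of the kind of non-PARSE approach Graham has in mind, assuming Rebol 2. Instead of inserting a mark it records header positions with FIND and slices between them, which is a swapped-in variation, not Graham's actual code. The header list, the sample text and the names marks and sections are hypothetical.

```rebol
REBOL []

; Hypothetical header list and sample text, for illustration only
headers: ["CC:" "HPI:" "ALLERGIES:"]
text: {CC: sore throat
HPI: sudden onset
ALLERGIES: penicillin}

; Record where each known header occurs; missing headers are simply skipped
marks: copy []
foreach h headers [
    if pos: find text h [repend marks [index? pos h]]
]
sort/skip marks 2    ; order the [index header] pairs by position

; Slice the text between consecutive header positions
sections: copy []
while [not tail? marks] [
    start: marks/1 + length? marks/2
    stop: either tail? skip marks 2 [1 + length? text] [marks/3]
    repend sections [marks/2 trim copy/part at text start stop - start]
    marks: skip marks 2
]
probe sections
```

FIND only reports the first occurrence and does not require the header to start a line, so a header string that also appears inside the content would be misread. That is the same ambiguity the thread is wrestling with.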
Maxim 16-May-2009 [3719] | I'm almost done... I like these little parse tests.. It keeps my mind sharp on using parse ;-) |
Graham 16-May-2009 [3720] | But I don't need parse! :) |
Steeve 16-May-2009 [3721] | are you asleep ? :-) |
Maxim 16-May-2009 [3722] | it's working but it's skipping the first tag for some reason. |
Graham 16-May-2009 [3723] | Huh? just dozing ... |
Maxim 16-May-2009 [3724x2] | aaahh there is no newline on the start of the text hehehe |
Graham, obviously the simplest solution is to read/lines. | |
Graham 16-May-2009 [3726] | read/lines doesn't work on text in memory AFAIK |
Maxim 16-May-2009 [3727] | and just see if the line starts with one of the headers. |
Steeve 16-May-2009 [3728] | What does the content look like? Can't you just post an example, Graham? |
Maxim 16-May-2009 [3729] | parse text "^/" |
Graham 16-May-2009 [3730x2] | CC: Patient complains of sore throat. HPI: ONSET: Sudden, TIMING: Constant, DURATION: 3 days INTENSITY: Moderate, QUALITY: Burning, MODIFYING FACTORS: head position CURRENT MEDICATIONS: TYLENOL W/ CODEINE NO. 3 300MG;30MG 1-2 po q 4-6 hrs prn "pain" cyclobenzaprine Oral Tablet 10 MG 1 tab po TID prn "muscle spasm" MEDICAL HISTORY: Rheumatic heart disease, unspec. 391.9 Eczema, atopic dermatitis 691.8 dyslipidemia ALLERGIES: Penicillin - allergy: Allergy Penicillin - allergy: Allergy Penicillin - anaphylactic reaction lovastatin - allergy: allergic macrodantin - 1 po BID SURGERIES: HOSPITALIZATIONS: FAMILY HISTORY: SOCIAL HISTORY: ROS: VITALS: EXAMINATION: General: Appears non-toxic HEENT: TONSILS hypertrophic, and erythematous. MOUTH buccal mucosa, moist. PHARYNX indurated, and angry. NOSE turbinates, with no obstuction. Neck: NECK Supple, with no lymphadenopathy, thyromegaly, or masses. CVS: HEART RRR s M Chest: ANTERIOR LUNGS clear bilat ASSESSMENTS: 391.9 Rheumatic heart disease, unspec. TREATMENT: PROCEDURES: IMMUNIZATIONS: IMAGING: LABORATORY: EDUCATION: None. REFERRALS: Non contributory. FOLLOWUP: SUPERBILL: |
That was sent to me today as an example | |
Steeve 16-May-2009 [3732] | Hmm... |
Maxim 16-May-2009 [3733x3] | implementing the latter solution... this is easier |
here you go :-)

data: {CC: Patient complains of sore throat.
HPI:
ONSET: Sudden, TIMING: Constant, DURATION: 3 days
INTENSITY: Moderate, QUALITY: Burning, MODIFYING FACTORS: head position
CURRENT MEDICATIONS: TYLENOL W/ CODEINE NO. 3 300MG;30MG 1-2 po q 4-6 hrs prn "pain" cyclobenzaprine Oral Tablet 10 MG 1 tab po TID prn "muscle spasm"
MEDICAL HISTORY: Rheumatic heart disease, unspec. 391.9 Eczema, atopic dermatitis 691.8 dyslipidemia
ALLERGIES: Penicillin - allergy: Allergy Penicillin - allergy: Allergy Penicillin - anaphylactic reaction lovastatin - allergy: allergic macrodantin - 1 po BID
SURGERIES:
}

; split the text into lines
data: parse/all data "^/"

header-lbl: ["CC" | "HPI" | "ONSET" | "INTENSITY" | "CURRENT MEDICATIONS" | "MEDICAL HISTORY" | "ALLERGIES" | "SURGERIES"]

spec: []
foreach line data [
    ; a line that starts with a known header opens a new field;
    ; any other line is appended to the previous field's content
    unless parse/all line [
        copy hdr [header-lbl ":"] here:
        (
            ; "MEDICAL HISTORY:" becomes the set-word MEDICAL-HISTORY:
            append spec to-set-word head remove back tail replace/all hdr " " "-"
            append spec copy/part here tail line
        )
        to end  ; consume the rest, otherwise a header line with content is also treated as a continuation
    ][
        if string? item: last spec [
            append item line
        ]
    ]
]
probe context spec | |
ok for you? | |
Steeve 16-May-2009 [3736] | Assuming SRC: contains the source text, it seems to work too:

header-char: complement charset "^/:"
EOL2: rejoin [newline newline]
parse/all src [
    some [
        some [pos: #" " (change pos #"-") | header-char] #":"
        pos: newline (change/part pos " {" 1)
        [to EOL2 | to end] pos: (change pos "} ") skip skip
    ]
]
probe construct to block! src |
Graham 16-May-2009 [3737x2] | Yes ... but I'm going to have to study Steeve's |
to see why it doesn't work yet | |