World: r3wp

[Parse] Discussion of PARSE dialect

Steeve
16-May-2009
[3739]
It will not work if you have CRLF line endings instead of plain newlines in the source.
Is that the case?
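A minimal REBOL 2 sketch of that fix, assuming the text sits in a string named `src` (the sample value is hypothetical): convert every CRLF pair, and any stray lone CR, into a plain newline before running the parse rule.

```rebol
REBOL [Title: "Normalize line endings before parsing (sketch)"]

; Hypothetical sample: text pasted with Windows CRLF line endings
src: "CC:^M^/This is the presenting complaint.^M^/^M^/HPI:^M^/Developed over a few days"

; CRLF pairs become a single newline first, then any lone CR left over,
; so rules that match NEWLINE see one consistent line terminator
replace/all src crlf "^/"
replace/all src "^M" "^/"

probe src
```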
Graham
16-May-2009
[3740]
I just copied it from here.
Steeve
16-May-2009
[3741]
I mean your source data, not my code.
Graham
16-May-2009
[3742]
that's what I meant .. I just copied the source data from here.
Steeve
16-May-2009
[3743x2]
OK, it works for me.
I'll try again.
Graham
16-May-2009
[3745x3]
working now.
Actually, yours appears to be the better solution because you don't 
specify the headers
and just pick them up from the formatting of the text.
Steeve
16-May-2009
[3748]
yep
Graham
16-May-2009
[3749]
well, I'm impressed :)
Steeve
16-May-2009
[3750]
you should not
Graham
16-May-2009
[3751]
sadly I am.
Graham
17-May-2009
[3752]
The parser dies with an invalid decimal error when there is something 
like "2.5mg" in the text.
Steeve
17-May-2009
[3753x3]
That shouldn't happen; give me the data, please.
There is no reason for it: the content is enclosed in a string before being 
loaded.
If it fails, it's because the whole grammar has changed;
probably blank lines are inserted in the content (where they should 
not be).
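Steeve's point can be checked in a REBOL 2 console: loading the raw text trips over `2.5MG` (the invalid decimal error Graham reported), while the same text wrapped in braces loads as one plain string. A small sketch with a sample line taken from the data below:

```rebol
REBOL [Title: "Why the content is wrapped in braces before LOAD (sketch)"]

; Raw free text: 2.5MG is not a valid REBOL value, so LOAD raises a syntax
; error and ATTEMPT turns that error into NONE
raw: attempt [load "METHOTREXATE SODIUM EQ 2.5MG BASE once weekly"]

; The same text wrapped in braces is a single string! value, so LOAD succeeds
wrapped: load "{METHOTREXATE SODIUM EQ 2.5MG BASE once weekly}"

probe raw
probe wrapped
```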
Graham
17-May-2009
[3756]
{CC:
This is the presenting complaint.


HPI:
Developed over a few days

CURRENT MEDICATIONS:
METHOTREXATE SODIUM EQ 2.5MG BASE once weekly
METHOTREXATE SODIUM EQ 2.5MG BASE once weekly
Plaquenil 200 mg two daily
Prednisone 5 mg od
Salazopyrin EN 500 mg  two bd with food
Ultram Oral Tablet 50 MG qid prn
}
Steeve
17-May-2009
[3757x4]
OK, I'll test that.
At first sight, I can say there are too many blank lines.
Right, I added skipping of useless newlines.

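parse/all src [
	; header-char and EOL2 come from the earlier version of this rule (not shown here)
	some [
		any newline	; skip leftover blank lines before a header
		; header name: spaces become hyphens so it can be loaded as a word
		some [pos: #" " (change pos #"-") | header-char]
		; replace the newline after the colon with " {" to open the content string
		#":" pos: newline (change/part pos " {" 1)
		; close the string with "} " at the blank-line delimiter (or at the end)
		[to EOL2 | to end] pos: (change pos "} ") skip skip
	]
]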

Could you try it?
Anticipated failures:

- if blank lines are inserted in the content (blank lines should only 
be used as delimiters between headers);
- if header names can't be converted to words.
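To try the rule end to end, here is a hypothetical REBOL 2 harness around it. The definitions of `header-char` and `EOL2` are assumptions standing in for the ones from the earlier version of the rule, and building an object with `construct load` is likewise an assumption about how the rewritten string is used afterwards (the thread only says the content is loaded).

```rebol
REBOL [Title: "Header rule run end to end (sketch)"]

; ASSUMPTION: hypothetical stand-ins for the earlier definitions
header-char: charset [#"A" - #"Z" #"a" - #"z"]
EOL2: "^/^/"                          ; a blank line ends a section

src: copy {CC:
This is the presenting complaint.


HPI:
Developed over a few days

CURRENT MEDICATIONS:
Plaquenil 200 mg two daily
Prednisone 5 mg od}

parse/all src [
	some [
		any newline
		some [pos: #" " (change pos #"-") | header-char]
		#":" pos: newline (change/part pos " {" 1)
		[to EOL2 | to end] pos: (change pos "} ") skip skip
	]
]

; SRC now reads: CC: {...} HPI: {...} CURRENT-MEDICATIONS: {...}
; LOAD yields set-words and strings; CONSTRUCT builds an object without evaluating them
note: construct load src

probe note
```

If a blank line appears inside a section, the brace string closes early and the following lines are no longer wrapped in a string before LOAD, which is the first anticipated failure listed above.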
Maxim
17-May-2009
[3761]
AFAIK... my solution works flawlessly. We could easily extend the 
header info so it recognises headers without naming them explicitly.
Steeve
17-May-2009
[3762]
In fact, I could easily extend my solution to prevent those errors 
and throw safe errors if the parsing failed.
It would take 5 minutes.

But adding such exceptions or other sub-rules is so easy that I don't 
see the point of preventing those cases in advance.

That's my philosophy when I write parsing rules.

They are so easy to extend that there is no reason to anticipate those 
cases by guessing what is in the mind of the final user.
We have to extend the grammar? 
OK, give me 5 minutes.
Graham
17-May-2009
[3763x2]
The thing is that the user can type what they want ... so I have to 
be prepared for anything.
All I ask is that they type the headers in correctly.
Steeve
17-May-2009
[3765x2]
I'm not a magician; I can't anticipate all the cases if the given specifications 
are incomplete.

Everybody has a job to do; it's not mine to work from wrong specifications.
If you can't prevent them from inserting blank lines in the content, then 
Maxim's solution should be used instead,
with a list of authorized headers.
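Maxim's code is not reproduced in this excerpt, so the following is only a hypothetical REBOL 2 sketch of the idea: split the text into lines and treat a line as a header only when it matches one of a known list of headers, in any order. The names `known-headers`, `header-rule` and `parse-note` are illustrative, and the list is hard-coded here although Graham notes it could come from the object specification.

```rebol
REBOL [Title: "Line by line with authorized headers (sketch)"]

; Hypothetical list of authorized headers
known-headers: ["CC" "HPI" "CURRENT MEDICATIONS"]

; Build the alternation rule ["CC" | "HPI" | "CURRENT MEDICATIONS"]
header-rule: copy []
foreach h known-headers [append header-rule h  append header-rule '|]
remove back tail header-rule

parse-note: func [
	"Return a block of header/body string pairs"
	src [string!]
	/local lines line result current hdr
][
	; Split the text into lines (empty copies may come back as NONE, so default to "")
	lines: copy []
	parse/all src [
		any [copy line to newline newline (append lines any [line copy ""])]
		copy line to end (if line [append lines line])
	]
	result: copy []
	current: none
	foreach line lines [
		either parse/all line [copy hdr header-rule #":" any #" "] [
			; A known header starts a new section, wherever it appears
			current: copy ""
			repend result [hdr current]
		][
			; Any other line, including look-alikes such as "Penicillin - allergy:",
			; goes into the body of the current section
			if current [append append current line newline]
		]
	]
	foreach [hdr body] result [trim body]
	result
]
```

Because unknown "Something:" lines are just body text, a header look-alike in the free text cannot create a stray value, at the cost of maintaining the header list.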
Graham
17-May-2009
[3767]
It's free text ... no way can I prevent users from doing this.
Steeve
17-May-2009
[3768x2]
So you can't use automatic recognition of unspecified headers. Easy 
to figure.
If headers are not distinguishable from free text, there is no solution.
Graham
17-May-2009
[3770]
Not if I use Max's method .. but the headers can be obtained from 
the original object specifications.
Steeve
17-May-2009
[3771]
do so
Maxim
17-May-2009
[3772]
The header-lbl rule in my example could be changed so it matches 
up to the first colon, but then there is a flaw: the text 
can also include something that LOOKS like a header, and then you 
can end up with a stray value in the object...


In the original example data you posted, this would be hard to 
tackle:

Penicillin - allergy:
Graham
17-May-2009
[3773x2]
That was my original way of doing things.
I built the rule from the object and then parsed the data .. but 
my way relied on the headers being in the correct order.
Maxim
17-May-2009
[3775]
I started down Steeve's path and had similar newline issues, which 
is why I decided to parse line by line.
Steeve
17-May-2009
[3776x3]
Can't the headers be prefixed? It would be so easy to handle...
Parsing line by line is not the solution (nor the problem) here.

Anything you can do line by line can be folded into a single parsing flow. 
It's just a matter of your skill with parse.
I've seen many people proposing to parse line by line in many topics 
here.
I don't get it. 
It's slower and wastes memory for nothing.

They seem to be afraid of using any/some parsing loops; I don't 
understand why.
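To illustrate the claim that per-line logic fits in one parsing flow, here is a hypothetical REBOL 2 sketch that recognizes a known list of headers in a single PARSE pass, without first building a block of lines. The names `known-headers`, `start-section` and `add-line` are illustrative, not from the thread.

```rebol
REBOL [Title: "Known headers in a single PARSE pass (sketch)"]

known-headers: ["CC" "HPI" "CURRENT MEDICATIONS"]     ; hypothetical list
header-rule: copy []
foreach h known-headers [append header-rule h  append header-rule '|]
remove back tail header-rule

result: copy []     ; header/body string pairs
current: none       ; body string of the section being filled

start-section: func [name [string!]] [
	current: copy ""
	repend result [name current]
]
add-line: func [text [string! none!]] [
	; empty copies may be NONE, so treat them as empty lines
	if current [append append current any [text ""] newline]
]

src: {CC:
This is the presenting complaint.

HPI:
Developed over a few days}

parse/all src [
	any [
		; a known header alone on its line opens a new section
		copy hdr header-rule #":" any #" " newline (start-section hdr)
		; any other line is body text for the current section
		| copy line to newline newline (add-line line)
	]
	; the last line has no trailing newline
	[copy hdr header-rule #":" any #" " end (start-section hdr)
	| copy line to end (add-line line)]
]
foreach [name body] result [trim body]
```

Here every iteration of ANY starts at the beginning of a line, so the header alternative is implicitly anchored to line starts, which is the context a line-by-line approach gets for free.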
Maxim
17-May-2009
[3779]
It's just MUCH easier to do it line by line because the context 
of the parse isn't the same. A parse rule going astray in multi-line 
text doesn't react the same way as on a single line, which has a context of 
"this has a header" | "this doesn't".


I'm not saying my solution can't be done using only one parse, only 
that the rules are that much simpler. In my first tests, handling 
the first and last headers needed special treatment, ultimately forcing 
me to add new rules and generally making the whole much more complex.
Steeve
17-May-2009
[3780]
I never had to cut data into lines when parsing, and I never will 
have to.
Maxim
17-May-2009
[3781x2]
Steeve, I did a 4000-line parse rule... outperforming C code. But 
I'm pragmatic. If the rules are going to be 50% smaller and 100% 
bug free, then that's the better solution.
I find parse is very well suited to very complex systems. Strangely, 
the more complex the rules, the better they are at being parsed.
Steeve
17-May-2009
[3783]
I don't get your point; I've done a lot of parsing scripts too. 

I've never seen that it could be bug free or smaller using parse line by 
line.
It's just wasting time and memory.
Maxim
17-May-2009
[3784]
It took me about 30 seconds to solve it with lines. With a single 
parse rule, after 15 minutes I was still trying to corner a simple detail 
that meant rewriting the whole rule set, or adding a new rule, just 
for one specific situation. Had I started with another rule setup, 
I'd have run into another nagging situation (like the one yours has stumbled 
upon).


My time is worth more than 2 milliseconds of my computer 
consuming 1/4 watt of electricity. Using 500 more bytes of RAM, which 
is recycled, also isn't worth considering.

Like I said, I'm pragmatic, that's all there is to it.
Steeve
17-May-2009
[3785]
My...

The problem in the method I proposed has nothing to do with the line 
by line approach.

Can't you figure that out? It's only because I try to recognise headers 
without knowing them.


I can rewrite your solution without your line by line approach 
in 5 minutes (you didn't do yours in 30 secs, btw).
It will be smaller and faster than yours.
But I don't see the point; I thought anyone could figure that out.
Maxim
17-May-2009
[3786x2]
Sorry... I should have been more precise:


solved <> writing down the code. Solving, writing it down, testing, 
cleaning and submitting it did take me a bit more time.
On the other hand, I do see that parsing in the REBOL scene seems 
to be a source of bragging rights. It's a complex system reserved 
for a select few who have spent time and effort learning how to come 
to grips with it.


Looking at working rules makes it seem simple, but the deeper knowledge 
of how it works... really isn't.
Graham
17-May-2009
[3788]
Given time and repeated use, parse should be able to be learnt by 
most programmers .. but many of us use it infrequently, and so don't 
retain the skills we might have learnt.