World: r3wp

[Parse] Discussion of PARSE dialect

Graham 16-May-2009 [3740]	I just copied it from here.
Steeve 16-May-2009 [3741]	i mean for your source data, not for my code
Graham 16-May-2009 [3742]	that's what I meant .. I just copied the source data from here.
Steeve 16-May-2009 [3743x2]	ok, it works for me
Steeve 16-May-2009 [3743x2]	i retry
Graham 16-May-2009 [3745x3]	working now.
	Actually yours appears to be the better solution because you don't specify the headers
	and just pick it up from the formatting of the text
Steeve 16-May-2009 [3748]	yep
Graham 16-May-2009 [3749]	well, I'm impressed :)
Steeve 16-May-2009 [3750]	you should not
Graham 16-May-2009 [3751]	sadly I am.
Graham 17-May-2009 [3752]	the parser dies with an invalid decimal error when there is something like "2.5mg" in the text.
Steeve 17-May-2009 [3753x3]	should not, give the data please
	There is no reason, the content is enclosed in a string before being loaded. If it fails, it's because the whole grammar has changed
	probably blank lines are inserted in the content (where they should not be)
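
A minimal sketch of why the enclosing string matters, assuming REBOL 2 and made-up sample text (the word `dose` is only for illustration). Bare text is scanned as REBOL values, so `2.5MG` is read as a malformed decimal; once the content sits inside braces it is loaded as a single string! and never scanned.

```rebol
REBOL [Title: "Load sketch"]

; Bare text: LOAD tries to scan 2.5MG as a decimal and raises a syntax error.
print either error? try [load "dose: 2.5MG"] ["load failed"] ["loaded"]

; Same text wrapped in braces: LOAD sees a set-word followed by a string!.
probe load "dose: {2.5MG}"
```

If a blank line ends the braced section early, the rest of the content falls outside the braces and is scanned as code again, which would explain the error reported above.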
Graham 17-May-2009 [3756]	{CC: This is the presenting complaint. HPI: Developed over a few days CURRENT MEDICATIONS: METHOTREXATE SODIUM EQ 2.5MG BASE once weekly METHOTREXATE SODIUM EQ 2.5MG BASE once weekly Plaquenil 200 mg two daily Prednisone 5 mg od Salazopyrin EN 500 mg two bd with food Ultram Oral Tablet 50 MG qid prn }
Steeve 17-May-2009 [3757x4]	ok i test that
	at first sight, i can say there are too many blank lines
	Right, i added skipping of useless newlines. parse/all src [ some [ any newline some [pos: #" " (change pos #"-") | header-char] #":" pos: newline (change/part pos " {" 1) [to EOL2 | to end] pos: (change pos "} ") skip skip ] ] Can you figure it out?
	Anticipated failures: - if blank lines are inserted in the content (because blank lines should only be used as delimiters between headers). - if header names can't be converted to words.
Maxim 17-May-2009 [3761]	afaik... my solution works flawlessly. we could easily extend the header info so it recognises headers without naming them explicitly.
Steeve 17-May-2009 [3762]	In fact i could easily extend my solution to prevent those errors and throw safe errors if the parsing failed. It takes 5 minutes to do. But adding such exceptions or other sub-rules is so easy that i don't see the point of preventing those cases up front. It's my philosophy when i write parsing rules. They are so easy to extend, there is no reason to anticipate those cases by guessing what is in the mind of the final user. We have to extend the grammar? Ok, give me 5 minutes.
Graham 17-May-2009 [3763x2]	The thing is that the user can type what they want ... so have to be prepared for anything.
Graham 17-May-2009 [3763x2]	All I ask is that they type the headers in correctly.
Steeve 17-May-2009 [3765x2]	I'm not a magician, i can't figure out all the cases if the given specifications are incomplete. Everybody has a job to do, it's not mine to work on wrong specifications.
Steeve 17-May-2009 [3765x2]	If you can't prevent them from inserting blank lines in the content, then Maxim's solution should be used instead. With a list of authorized headers.
Graham 17-May-2009 [3767]	It's free text ... no way can I prevent users from doing this.
Steeve 17-May-2009 [3768x2]	So you can't use automatic recognition of unspecified headers. Easy to figure.
Steeve 17-May-2009 [3768x2]	if headers are not distinguishable from free text, there is no solution
Graham 17-May-2009 [3770]	Not if I use Max's method .. but the headers can be obtained from the original object specifications.
Steeve 17-May-2009 [3771]	do so
Maxim 17-May-2009 [3772]	the header-lbl rule in my example could be changed so it matches up to the first colon, but then, there is a flaw in that the text can also include something that LOOKS like a header and then you can have a stray value in the object... in the original example data you posted... this would be hard to tackle... Penicillin - allergy:
Graham 17-May-2009 [3773x2]	That was my original way of doing things.
Graham 17-May-2009 [3773x2]	I built the rule from the object and then parsed the data .. but my way relied on the headers being in the correct order.
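
A rough sketch of the "list of authorized headers, line by line" idea discussed above, assuming REBOL 2. It is not Maxim's actual code, which is not shown here. The `headers` block and the `split-notes` function are hypothetical names; in practice the header list would be derived from the object specification. Headers may appear in any order, and lines that do not start with a known header, blank lines included, are treated as free text.

```rebol
REBOL [Title: "Known-header sketch"]

; Hypothetical header list (assumption); would come from the object spec.
headers: ["CC" "HPI" "CURRENT MEDICATIONS"]

split-notes: func [
    "Return a flat block of header/text pairs."
    text [string!]
    /local result rest name section
][
    result: copy []
    ; parse/all with a string rule splits the text on newlines (REBOL 2)
    foreach line parse/all text "^/" [
        name: none
        foreach h headers [
            ; a header line is a known header immediately followed by a colon
            if parse/all line [h #":" copy rest to end] [name: h break]
        ]
        either name [
            ; a known header opens a new section
            repend result [name trim any [rest copy ""]]
        ][
            ; anything else continues the current section; blank lines are skipped
            if all [not empty? result not empty? trim line] [
                section: last result
                if not empty? section [append section " "]
                append section line
            ]
        ]
    ]
    result
]

probe split-notes {CC: This is the presenting complaint.
HPI: Developed over a few days

CURRENT MEDICATIONS: Plaquenil 200 mg two daily
Prednisone 5 mg od}
```

Because only listed headers open a section, text that merely looks like a header (such as "Penicillin - allergy:") stays in the free text. The string-rule splitter may treat double quotes specially, so free text containing quotes may need a different way of cutting lines.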
Maxim 17-May-2009 [3775]	I started on steeve's course and had similar new-line issues, which is why I decided to parse line by line.
Steeve 17-May-2009 [3776x3]	can't the headers be prefixed? it would be so easy to handle...
	Parsing line by line is not the solution (nor the problem) here. All you can do line by line can be rolled into a single parsing flow. It's just a matter of your skills in using parse.
	i saw many people proposing to parse line by line in many topics here. I don't get it. It's slower and wastes memory for nothing. They seem to be afraid of using any/some parsing loops, i don't understand why.
Maxim 17-May-2009 [3779]	it's just MUCH easier doing it line by line because the context of the parse isn't the same. a parse rule going astray in multi-line doesn't react the same as for a single line which has a context of "this has a header" | "this doesn't". I'm not saying my solution can't be done using only one parse, only that the rules are that much simpler. in my first tests, handling the first and last headers needed special treatment, ultimately forcing me to add new rules, and generally making the whole much more complex.
Steeve 17-May-2009 [3780]	i never had to cut data into lines when parsing, and i will never have to
Maxim 17-May-2009 [3781x2]	steeve I did a 4000 line parse rule... outperforming C code. but I'm pragmatic. if the rules are going to be 50% smaller, and 100% bug free. then that's the better solution.
Maxim 17-May-2009 [3781x2]	I find parse is very suited to very complex systems. strangely, the more complex the rules, the better they are at being parsed.
Steeve 17-May-2009 [3783]	i don't get your point, i've done a lot of parsing scripts too. Never saw that it could be bug free or smaller using parse line by line. It's just wasting time and memory.
Maxim 17-May-2009 [3784]	it took me about 30 seconds to solve it with lines. with a single parse rule, after 15m I was still trying to corner a simple detail that meant rewriting the whole rules, or adding a new rule, just for one specific situation. Had I started with another rule setup, I'd have encountered another nagging situation (like the one yours has stumbled upon). my time / hour is worth more than 2 milliseconds of my computer consuming 1/4 watt of electricity. Using 500 bytes more of ram that is recycled also isn't worth consideration. like I said, I'm pragmatic, that's all there is to it.
Steeve 17-May-2009 [3785]	My... The problem in the method i proposed has nothing to do with the line by line approach. Can't you figure that out? It's only because i try to recognise headers without knowing them. I can rewrite your solution without using your line by line approach in 5 minutes (you didn't do yours in 30 secs btw). It will be smaller and faster than yours. But i don't see the point, i thought anyone could figure that out.
Maxim 17-May-2009 [3786x2]	sorry... I should have been more precise: solved <> writing down the code. it did take me a bit more time writing it down than solving, testing, cleaning and submitting it.
Maxim 17-May-2009 [3786x2]	on the other hand, I do see that parsing in the rebol scene seems to be the cause for bragging rights. it's a complex system reserved for a select few who have spent time and effort learning how to come to grips with it. looking at working rules makes it seem simple, but the deeper knowledge of how it works ... really isn't.
Graham 17-May-2009 [3788]	Given time and repeated use, parse should be able to be learnt by most programmers .. but many of us use it infrequently, and so don't retain the skills we might have learnt.
Steeve 17-May-2009 [3789]	You may be right on that point. I think many rebolers who are knowledgeable in most practices don't use parse at its full power. Whereas parse is the most powerful feature in Rebol to my mind.