World: r3wp

[Parse] Discussion of PARSE dialect

Graham
29-Jan-2010
[4861]
only eats one char instead of two ... so that's a 50% improvement
BrianH
29-Jan-2010
[4862x2]
The worst was when someone "fixed" #10 to make it compatible with 
R2's buggy behavior. Bad fixes get marked as a problem.
Check out #666 for R3's official policy on bug-for-bug compatibility 
:)
Graham
29-Jan-2010
[4864]
at least it should not introduce new bugs
BrianH
29-Jan-2010
[4865]
Agreed (and the policy agrees too).
Graham
29-Jan-2010
[4866]
I looked for a previous report on this bug but couldn't find it .. 
4 pages of bugs with parse in them.  I wonder if they can be filtered 
to only show active bugs
BrianH
29-Jan-2010
[4867]
Bring it up in the !CureCode group.
Graham
7-Feb-2010
[4868x2]
I want to extract all the dates (dd-mmm-yy, dd mmm yyyy, d mmmmmmm yy)


extract-dates: func [ txt 
	/local months dates days month year
][
	dates: copy []
	months: copy []
	digit: charset [ #"0" - #"9" ]
	digits: [ some digit ]
	foreach mon system/locale/months [
		repend months [ mon '|  copy/part mon 3 '| ]
	]
	remove back tail months
	parse txt [
		some [
			to 1 2 digits copy days 1 2 digit [ #" " | #"-" ]
			copy month months
			[ #" " | #"-" ]
			copy year [ 4 digits | 2 digits ]
			( repend dates rejoin [ days "-" month "-" year ] ) |
			thru 1 2 digits ??
		]
	]
	dates
]


extract-dates "asdf sdfsf  11 Jan 2008 12-January-10 fasdfsaf asdf as 11 2 3 3  13-Feb-08 asdfasf "
not working ...
Steeve
7-Feb-2010
[4870]
R2 or R3 ?
In any case, the first rule may fail.
you can't do "TO 1 2 digits"
BrianH
7-Feb-2010
[4871]
TO and THRU have limited argument syntax, and don't support full 
rules. Both R2 and R3 support literal value arguments (that don't 
count as rules). R3 also supports a block of literal values delimited 
by |, and those values are less limited.
Steeve
7-Feb-2010
[4872x2]
Something weird !
Using a simple charset with TO or THRU should work.
But it fails here with R3.

>> digits: charset "134567890"
>> parse "azaz 34" [to digits ??]
end!: "azaz 34"
Oh my !!!!!
It fails with R2 now too...
Graham
7-Feb-2010
[4874]
R2 & R3 ... I tried
nondigit: complement digit nondigits: [ some nondigit ]

some [
	any nondigits 1 2 ....
]

but it gets stuck on the year
BrianH
7-Feb-2010
[4875]
Steeve, that's a bug that I reported yesterday.
Graham
7-Feb-2010
[4876]
I was using r3 as it's easier to trace the parse ... but perhaps 
i shouldn't!
Steeve
7-Feb-2010
[4877]
Maybe I'm wrong, I can't remember if TO or THRU ever worked with 
charsets.
Alzheimer's is catching up with me...
Graham
7-Feb-2010
[4878]
XRatio is right .. parse is too difficult!
Steeve
7-Feb-2010
[4879]
hehe
Gabriele
7-Feb-2010
[4880]
to/thru never worked with charsets. that's why we always have those 
complements... :)
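
The complement idiom mentioned here can be sketched as follows. This is a minimal illustration, not code from the thread: the names non-digit and num and the sample string are made up, and it assumes PARSE/ALL behaves as it does elsewhere in this discussion.

```rebol
digit: charset [#"0" - #"9"]
non-digit: complement digit

; instead of [to digit], skip every character that is not a digit
parse/all "azaz 34" [
    any non-digit
    copy num some digit
    to end
]
probe num   ; the digits that follow the skipped text
```

Skipping with the complemented charset gives the effect one would expect from TO with a charset, without relying on TO accepting one.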
BrianH
7-Feb-2010
[4881]
Oh crap. Well, it was reported as a bug, and it's staying that way 
until Carl says otherwise :)
Gabriele
7-Feb-2010
[4882]
given that to and thru do "more" in R3, it probably is not bad to 
consider it a bug. (maybe it should be considered a bug in R2 as 
well, given that FIND does work with charsets...)
BrianH
7-Feb-2010
[4883]
Carl seems to think that he can add TO or THRU QUOTE value to block 
parsing too.
Graham
7-Feb-2010
[4884x3]
this works 


extract-dates: func [ txt 
	/local months dates days month year
][
	dates: copy []
	months: copy []
	digit: charset [ #"0" - #"9" ]
	digits: [ some digit ]
	nondigit: complement digit
	nondigits: [ some nondigit ]
	foreach mon system/locale/months [
		repend months [ mon '|  copy/part mon 3 '| ]
	]
	separator: [ #" " | #"-" ] 
	remove back tail months

	date-rule: [
		copy days 1 2 digit separator copy month months separator copy year digits (
			?? days ?? month ?? year
			append dates ajoin [ days "-" month "-" year ]
		)
	]
	parse txt [
		some [
			any nondigits [ date-rule | any digits ]
		]
	]
	dates
]
extract-dates "asdf sdfsf 1 11 Jan 2008 12-January-10 fasdfsaf asdf as 11 2 3 3  13-Feb-08 asdfasf "
days: "11"
month: "Jan"
year: "2008"
days: "12"
month: "January"
year: "10"
days: "13"
month: "Feb"
year: "08"
== ["11-Jan-2008" "12-January-10" "13-Feb-08"]
ahh... correction, it works under R3 and locks up in R2 :(
Graham
8-Feb-2010
[4887]
and finally a parse rule that works under r2 and r3

	parse/all txt [
		some [
			[ end | any nondigits ] [ date-rule | some digits  ] 
		]
	]
Sunanda
13-Apr-2010
[4888]
Parse help needed here:

  http://stackoverflow.com/questions/2631125/change-part-doesnt-work-as-expected-with-parse
Ladislav
13-Apr-2010
[4889x2]
His style looks strange
(looks like he never read Parse doc)
Sunanda
13-Apr-2010
[4891]
He does ask a lot of simpler questions :)
Ladislav
13-Apr-2010
[4892x3]
I am against using change on parse input (never did it)
That operation is too slow to be serious
(I mean seriously usable)
Henrik
13-Apr-2010
[4895]
I can understand why you would want to, though, as an advanced search/replace 
tool.
Ladislav
13-Apr-2010
[4896]
no way, you certainly cannot talk me into that
Steeve
13-Apr-2010
[4897]
Classical...
ending: (ending: change/part start "mystring" ending) :ending
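
For context, the idiom in the line above can be expanded into a complete rule. This is only a sketch: the sample text, the matched word and the replacement are made up for illustration, and the words start and ending simply follow the names used in the line above.

```rebol
text: copy "hello world"
parse/all text [
    any [
        ; remember where the match starts and where it ends
        start: "world" ending:
        ; replace the matched span; CHANGE returns the position just past the new text
        (ending: change/part start "REBOL" ending)
        ; move the parse position there, so parsing resumes after the replacement
        :ending
        | skip
    ]
]
probe text
```

The important step is resetting the position with :ending. Without it, PARSE would continue from the old offset, which no longer lines up with the modified string when the replacement has a different length.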
Ladislav
13-Apr-2010
[4898]
yes, that is his trouble
Steeve
13-Apr-2010
[4899x2]
Ladislav, on short strings parse replacements are faster than anything 
else
especially within R3
Ladislav
13-Apr-2010
[4901]
Your statement cannot be verified, since you did not specify what 
you mean by "short strings"
Steeve
13-Apr-2010
[4902]
It's simple to understand: it's faster until it's not anymore, depending 
on the use case. Do your own tests
Ladislav
13-Apr-2010
[4903]
Yes, "it's faster than anything else, until it's not" is a perfect 
statement, and you got my agreement :-p
Steeve
13-Apr-2010
[4904]
:)
Henrik
13-Apr-2010
[4905]
a short string is one that is not long. :-)
Maxim
13-Apr-2010
[4906]
Ladislav, Remark changes the input on the fly to implement function 
html unfolding, and doing that improved speed by 50 times compared 
with traditional series manipulations.

so yes it's seriously usable   ;-P
Ladislav
13-Apr-2010
[4907]
Now, I can make a bold statement: for any method other than the 
PARSE and CHANGE/PART combo, it holds that it is faster than that 
combo, until it's not :-p
Maxim
13-Apr-2010
[4908]
it's not a single change/part which is the issue, it's managing the 
stack, allocating all those blocks over and over... the sheer speed 
of the parse loop blows away all the other looped/recursive algorithms 
in my usage so far.
Ladislav
13-Apr-2010
[4909]
Nevertheless, I pointed him to http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse#Modifying_the_input_series
BudzinskiC
14-Apr-2010
[4910]
And here I thought yesterday, wow I finally understood Parse and 
gosh it's awesome. And now I read change/part, which I used, is not 
the way to do things unless it is. I am confused! Generally, but 
also now specifically.