r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

Rebolek
24-May-2007
[1783]
OK
BrianH
24-May-2007
[1784x2]
Still, you might want to apply rewrite rules to your generated parse 
rules - that code seems a little sloppy.
Peephole fixing?
Rebolek
24-May-2007
[1786]
rewrite rules?
Oldes
24-May-2007
[1787]
so that you won't have [some "a" "a"] but just [some "a"]
Rebolek
24-May-2007
[1788]
Well, I'm not exactly sure if that's possible, I have to do some 
tests
BrianH
24-May-2007
[1789x2]
By rewrite rules, I mean something like what Gabriele came up with 
for the rebcode assembler a while ago. Since I helped refine his 
work, I may still have a copy somewhere. I'll take a look.
I'm not sure I helped with this one, now that I think of it. It's 
one of his literate programming projects, here:
http://www.colellachiara.com/soft/Misc/
Look for rewrite.*
Gabriele
24-May-2007
[1791]
that one is different from the one in rebcode, but the principle 
is about the same.
Rebolek
24-May-2007
[1792x2]
OK I'll check it, thanks
is it possible to convert bitset! back to something readable?
Geomol
24-May-2007
[1794]
Define readable! ;-) Maybe you could use a combination of to-string, 
to-binary, debase and things like that.
Rebolek
24-May-2007
[1795]
if I do (a: charset "abc"), I also want to do (decharset a) to get 
"abc" :) that's readable ;)
Volker
24-May-2007
[1796]
loop thru all possible chars, print matches ;)
Rebolek
24-May-2007
[1797]
yes, but that's really not the fastest way :)
Geomol
24-May-2007
[1798]
Rebolek, use my hokus-pokus function:

hokus-pokus: func [
	value
	/local a out
][
	either bitset? value [
		a: enbase/base to-binary value 2
		out: copy ""
		forall a [
			if a/1 = #"1" [append out to-char (index? a) - 4]
		]
		out
	][
		42
	]
]

>> a: charset "abc"
>> hokus-pokus a
== "abc"
BrianH
24-May-2007
[1799x3]
bitset-to-string: func [b [bitset!] /local s] [
    s: copy ""
    repeat x 256 [
        x: to-char x - 1
        if find b x [append s x]
    ]
    s
]
; Sorry, error...
bitset-to-string: func [b [bitset!] /local s x c] [
    s: copy ""
    repeat x 256 [
        c: to-char x - 1
        if find b c [append s c]
    ]
    s
]
That should be pretty fast, and it doesn't involve huge binary temporaries.
Gregg
24-May-2007
[1802]
http://www.codeconscious.com/rebol/scripts/bitsets.r
BrianH
24-May-2007
[1803x3]
To compare those, it looks like

    repeat x 256 [
        c: to-char x - 1
        if find b c [append s c]
    ]

would be faster than

    for i 0 255 1 [

        if parse/all to-string test-char: to-char i reduce [ bitset ] [
            append result test-char
        ]
    ]


because of the reduce, the to-string, the parse, and the use of the 
mezzanine for instead of the native repeat.
Yours is more flexible, though.
Lots of other interesting stuff on that site.
Gregg
24-May-2007
[1806]
Yes, Brett has built a lot of very cool stuff. Haven't seen him around 
for a while though.
Rebolek
26-May-2007
[1807]
So, this is my first attempt to do regular expressions in REBOL. 
Type on your console:
do http://bolek.techno.cz/reb/regex.r

Some things are missing, and it can sometimes run in an endless loop 
when it shouldn't, so please be benevolent :)

But at least the email regex can be translated and parsed successfully.
Henrik
26-May-2007
[1808]
it's quite small
Oldes
26-May-2007
[1809x2]
nice :)... but you should print regex, not regexp, in the example (or 
rename the function)
and... it would be good to have a function that just returns the 
translated Rebol parse block
Graham
26-May-2007
[1811]
regex rules look quite complicated!
Rebolek
26-May-2007
[1812x3]
Graham yes, they are :)
Oldes: Google gives about 12 million results for regex vs. 8 million 
for regexp, so that's the reason, but I can rename it (it was called 
regexp yesterday ;)
And yes, a function returning just the parse rules will be done; this 
is just a work in progress
Oldes
26-May-2007
[1815x2]
I don't mind what you call the function... it's just that the printed 
name is different from the one used, which is confusing: the first 
thing I did was copy and paste what you printed, and I got an error 
that the regexp function does not exist
and anyway... 12 or 8 million Google results is not a big difference 
if your page is not listed in the first 20 pages :)
Rebolek
26-May-2007
[1817]
oh, I see now what you mean... that print is from the test function, 
I'll change it. As for the Google score, I used it just to compare which 
name is better; I in no way expect my page to be up there :) It's 
just what I do when I'm not sure how a word is spelled (svatba vs. 
svadba and so on ;): I put both terms into Google, and the one with 
the higher score is probably the right one ;)
Oldes
26-May-2007
[1818x2]
you can use http://www.googlefight.com/ or make a Rebol version... 
it's quite easy
since you were talking about bitsets... it reminded me that it would 
be good to have some common parse rules available in Rebol3...
Rebolek
26-May-2007
[1820x2]
I had googlefight in Krabot, if you remember :)
like characters and digits?
Oldes
26-May-2007
[1822]
I mean... I have to write spaces: charset " ^/^-^M" and so on 
in so many places in my code
Rebolek
26-May-2007
[1823]
and whitespaces, yes
Oldes
26-May-2007
[1824x2]
it could work like stylize in VID
but maybe it's not so important...
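A minimal sketch of Oldes's idea, assuming a single shared context is enough. The name common-charsets and its words are hypothetical, not part of REBOL 2 or REBOL 3; rule blocks are bound to the context so that words like digit refer to the shared charsets.

```rebol
; Hypothetical sketch: one shared context for the charsets that otherwise
; get redefined by hand in every script. COMMON-CHARSETS and its words
; are illustrative names, not part of REBOL.
common-charsets: context [
    digit:  charset [#"0" - #"9"]
    alpha:  charset [#"a" - #"z" #"A" - #"Z"]
    alnum:  union alpha digit
    spaces: charset " ^/^-^M"   ; space, newline, tab, carriage return
]

; Bind a rule block to the context so its words refer to the shared charsets.
number-rule: bind [some digit] common-charsets
print parse "12345" number-rule   ; the whole string is made of digits
```

Binding keeps rule blocks free of path notation; making the set extensible per script, the way stylize extends VID styles, would be a further step.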
Rebolek
26-May-2007
[1826]
The file I posted contains a function, REGSET, that converts a small 
bit of regex into a bitset; its syntax seems easier than charset's 
(charset [#"a" - #"z" #"0" - #"9"] vs. regset "a-z0-9")
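For comparison, a rough sketch of how a REGSET-style converter could work. This is not the REGSET from regex.r: regset-sketch is a hypothetical name, and it only understands single characters and x-y ranges (no escapes, no ~ negation). It rewrites the regex-style string into the block form that charset already accepts.

```rebol
; Hypothetical sketch (not the REGSET from regex.r): build a bitset from
; a regex-like class body such as "a-z0-9". "x-y" adds a range, any other
; character is added on its own. Escapes and ~ negation are not handled.
regset-sketch: func [spec [string!] /local items] [
    items: copy []
    while [not tail? spec] [
        either all [(length? spec) >= 3  #"-" = second spec] [
            ; "x-y": add the range as x - y, the form CHARSET expects
            append items reduce [first spec '- third spec]
            spec: skip spec 3
        ][
            ; any other character (including a trailing "-") is added as-is
            append items first spec
            spec: next spec
        ]
    ]
    charset items
]

b: regset-sketch "a-z0-9"
```

Here regset-sketch "a-z0-9" should build the same bitset as charset [#"a" - #"z" #"0" - #"9"].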
Gregg
26-May-2007
[1827x2]
Very nice Boleslav! Which regex engine/syntax are you aiming to be 
compatible with (if any)?
Charset syntax is probably that way because it's a dialect, and Carl 
wanted a string as input to be easy, without escapes and such; just 
my guess.
Graham, the best book I know of on regexes is Jeff Friedl's 'Mastering 
Regular Expressions'. He has an email-validating regex (i.e. one that 
matches the RFC 822 spec for an email address) which is almost 5K, 
IIRC.
Rebolek
26-May-2007
[1829]
Thanks Gregg. I started with some examples from http://regular-expressions.info, 
just to see if I could do it, so after fixing bugs and adding some 
features that are still missing, I'll see what comes next, if anything. 
There are already some incompatibilities: the caret ^ cannot be used 
in REBOL strings the way it's used in regex, so I'm using tilde ~ instead.
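The caret problem comes from REBOL's string syntax, where ^ starts an escape sequence (^/ is a newline, ^- a tab). A literal caret has to be doubled, as this small example shows; the word neg-class is just an illustrative name.

```rebol
; In REBOL string literals ^ begins an escape sequence,
; so a literal caret must be written as ^^.
neg-class: "[^^abc]"      ; holds the six characters [ ^ a b c ]
print length? neg-class   ; 6
print pick neg-class 2    ; ^
```

Doubling works, but it makes regex strings harder to read and copy, which is a reasonable argument for a substitute character such as ~.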
BrianH
26-May-2007
[1830x3]
There are several different regex dialects. Are you following one 
of those, or making another?
(Running commentary as I read your code)
You should wrap your code in a context.