[Parse] Discussion of PARSE dialect

Gregg
28-Apr-2006
[933x3]
The singular/plural argument seems easy, but isn't (IMO); DIGITS 
could be done as SOME DIGIT, and you could argue that things like 
2 DIGITS reads better, though 1 DIGITS does not. You could double-define 
it, but that gets ugly too. So, what about DIG? That doesn't imply 
any singularity, though it's a bit terse, and not a full word (or, 
rather, the wrong full word).
I'm all for proposing some basics though. Worst case, you can override 
them, which is no more work than we do today.
space/spc
whitespace/wsp
alpha
digit(s)
alpha-num	; should digit be num?
ctl/control
non-US-ASCII/high-ASCII
quoted-string
escaped-char    ; what is the escape though; REBOL ^, C \, etc.?

What other standard sets would we want?
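
A minimal sketch of what such a starter set might look like, assuming REBOL 2 string parsing. The names follow the list above; spc and wsp are used rather than space so the built-in space value is not overridden. None of this is an agreed standard.

```rebol
; Hypothetical starter set of character classes (names are assumptions)
spc:        charset " "
wsp:        charset " ^-^/^M"                  ; space, tab, newline, carriage return
alpha:      charset [#"A" - #"Z" #"a" - #"z"]
digit:      charset [#"0" - #"9"]
alpha-num:  union alpha digit
control:    charset [#"^(00)" - #"^(1F)" #"^(7F)"]
high-ascii: charset [#"^(80)" - #"^(FF)"]

; DIGITS expressed as SOME DIGIT, as suggested above
digits: [some digit]

print parse/all "2006" digits
print parse/all "a1 b2" [some [alpha-num | wsp]]
```

Both calls should print true. Anyone who disagrees with a definition can simply rebind the word before using it in a rule.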
Sunanda
28-Apr-2006
[936]
(I was sure I'd posted this just after Oldes' message... but it 
ain't there now. Maybe it's in the wrong group.)
Andrew has a nice starter set:

http://www.rebol.org/cgi-bin/cgiwrap/rebol/view-script.r?script=common-parse-values.r

And I know he has extended that list considerably to include things 
like email addresses and URLs.
Gregg
28-Apr-2006
[937x2]
It would be great (again, IMO), if we had parse rules for REBOL datatypes. 
For those that want the power of block parsing, with the ability 
to load strings that aren't valid REBOL, it would be very handy.
Good starter set! I forgot about that. Thanks Sunanda.
Graham
28-Apr-2006
[939x2]
the problem I find with block parsing is the rigid interpretation 
of datatypes.
So, if REBOL gets the datatype wrong (and real world data is dirty), 
you're screwed.
Gregg
28-Apr-2006
[941]
That's the tradeoff. :\
Graham
28-Apr-2006
[942x3]
real world data is dirty ..
Maybe there should be no invalid datatypes .... everything can be 
converted to a datatype
if the parser thinks a datatype is invalid, well, let's call it an 
invalid! datatype!!
Gregg
28-Apr-2006
[945]
I think that's where string parsing comes in, and where having rules 
for REBOL datatypes would ease the pain.
Graham
28-Apr-2006
[946x3]
I do screen validation by datatypes ( for data input ).  If the user 
enters an invalid datatype ... ..
anyway, I think rebol should recognise all data ..
have a catchall for stuff it thinks is wrong
Oldes
30-Apr-2006
[949x2]
I agree with you Graham, I was mentioning this many times, that there 
could be something to handle datatype exceptions
About the spaces charset - most people do not know that we have one 
more space char - the non-breaking space:  >> to-char 160 <<
Volker
1-May-2006
[951x2]
How about another way: integrate datatypes in string-parser. Basically 
a  load/next and check for type.
Then we can write (note i parse a string): 
parse "1 a , #2" [ integer! word! "," issue! ]
'invalid! has a problem: it's easy to recognize where the wrong part 
starts, but harder to recognize where the wrong part ends.
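
A rough sketch of the "load/next and check for type" idea, assuming REBOL 2, where load/next returns a block holding the loaded value and the rest of the string. next-typed is a hypothetical helper name, not a built-in, and this is not how a typed string parser would actually be integrated.

```rebol
; next-typed is a hypothetical helper.
; Returns [value rest-of-string] when the next value in STR has the
; wanted type, or none when LOAD rejects the text or the type differs.
next-typed: func [str [string!] type [datatype!] /local res] [
    if error? try [res: load/next str] [return none]
    either type = type? first res [res] [none]
]

res: next-typed "1 a , #2" integer!
if res [
    probe first res     ; the loaded value
    probe second res    ; the remaining, still unparsed text
]
```

When load fails, the helper only knows where the bad part starts, which is exactly the difficulty with an 'invalid! type: nothing here says where the bad part ends.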
Oldes
1-May-2006
[953x2]
Is there any RTF (Rich Text Format) parser  for Rebol?
hm, maybe this one: http://www.codeconscious.com/rebol/scripts/rtf-tools.r
:-)
Ashley
24-May-2006
[955]
Quick question for the parse experts. If I have a parse rule that 
looks like this:

parse spec [
	any [
		set arg string! (...) | set arg tuple! (...) | ...
	]
]

How would I add a rule like:

	set arg paren! (reduce arg)


that always occurred prior to the any rule and evaluated parenthesized 
expressions (i.e. I want parenthesized expressions to be reduced 
to a REBOL value that can be handled by the remainder of the parse 
rule).
Tomc
25-May-2006
[956]
I only parse strings, not blocks, so this may be completely off, but 
I would try
parse spec [
	any[
		opt [here: set arg paren! (change :here reduce arg) :here]
		[	set arg string! (...) | 
			set arg tuple! (...) | ...
		]
	]
]
Anton
25-May-2006
[957]
(here/1: do arg)
Ashley
25-May-2006
[958]
Thanks both, works a treat.
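
A self-contained sketch of the combined suggestion, assuming REBOL 2 block parsing. The sample spec, the out block and the integer! alternative are made up for illustration. Note that it rewrites spec in place.

```rebol
; Sample input (an assumption): parens mixed with plain values
spec: [(1 + 2) "text" (join "a" "b") 1.2.3]
out: copy []

parse spec [
    any [
        ; evaluate a paren in place, then back up so the
        ; alternatives below see the resulting value
        opt [here: set arg paren! (here/1: do arg) :here]
        [
            set arg string!  (append out arg)
          | set arg integer! (append out arg)
          | set arg tuple!   (append out arg)
        ]
    ]
]

probe out    ; expected: [3 "text" "ab" 1.2.3]
```

If the original spec must be preserved, parse a copy of it instead.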
Graham
27-Jun-2006
[959]
My brain is still asleep.  How to go thru a document and add <strong> 
</strong> around every word that is in capitals and is more than 
a few characters long?
Pekr
27-Jun-2006
[960x3]
hmm, quite a challenge ...
somehow look up words, set mark: before each one, find its end (another 
space), check whether the first character is a capital or not, change 
at that position, :mark at the end ...
but don't ask me for code, it would take me a few hours to get 
somewhere, if at all :-)
Graham
27-Jun-2006
[963]
pattern search on capitals, mark, copy to space, mark, count length 
of copy, if long, insert at mark2, and then at mark1, continue ??
Gordon
27-Jun-2006
[964]
I agree - a bit much to ask.  A more specific question would get 
a more specific answer :)

Something like:

file: read filename2parse
newfile: copy ""
foreach word file [
   if Is-Capitals word [
      newfile: join newfile ["<strong> " word " </strong> "]
   ]
]

The Is-Capitals function would have to be defined:
Is-Capitals: func [Word2Check] [
   ; some code here
]
Graham
27-Jun-2006
[965x2]
that won't work because file is just text and not a block.
but my brain is gradually waking up now ... all I need to do is get 
dressed!
Pekr
27-Jun-2006
[967]
:-)
Volker
27-Jun-2006
[968]
;thinking loud:
capitals: charset [#"A" - #"Z"]
capital: [5 capitals any capitals]
Henrik
27-Jun-2006
[969]
can you do this in one pass?
Gordon
27-Jun-2006
[970]
Yes. "Newfile" would have to be "parsed" into words

something like:

Newfile: parse file none

or 

file: parse file {separator character}
Graham
27-Jun-2006
[971x3]
trouble is, parse doesn't split only on " " even if that is what you 
specify ... so, you might lose other characters.
I think this can be done in one pass.
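
A possible one-pass version, assuming REBOL 2 and that "more than a few characters" means five or more capitals in a row. It edits the string in place while parsing. strong-caps is a made-up name, and the rule does not check word boundaries.

```rebol
capitals: charset [#"A" - #"Z"]

; strong-caps is a hypothetical helper: wraps every run of 5 or more
; capital letters in <strong> tags, modifying TEXT in place.
strong-caps: func [text [string!] /local start stop] [
    parse/all text [
        any [
            start: 5 capitals any capitals stop: (
                stop: insert stop "</strong>"   ; position after the closing tag
                insert start "<strong>"         ; shifts everything 8 chars right
                stop: skip stop 8               ; compensate for that shift
            ) :stop                             ; resume parsing after the closing tag
          | skip
        ]
    ]
    text
]

print strong-caps copy "see the README file"
```

Inserting the closing tag first keeps the bookkeeping simple: only the opening tag's eight characters need to be added to the resume position.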
Pekr
27-Jun-2006
[974]
I would not rely on parse helpers such as the parse string delimiter, 
but use full parse/all if you need a precise result ...
BrianH
27-Jun-2006
[975]
Yes, give me a minute...
JaimeVargas
27-Jun-2006
[976x2]
capitalize-word: func [
    s [string!]
    /local len
][
    either 5 < len: length? s [
        s: rejoin ["<strong>" uppercase s/1 next s </strong>]
    ][
        s
    ]
]

capitalize-text: func [
    s [string!]
    /local result word-rule alpha non-alpha w c
][
    result: copy {}
    alpha: charset [#"A" - #"Z" #"a" - #"z"]
    non-alpha: complement alpha

    word-rule: [copy w [some alpha] (insert tail result capitalize-word w)]
    other-rule: [copy c non-alpha (insert tail result c)]

    parse/all s [some [word-rule | other-rule] end]
    result
]
>> capitalize-text {The result changes according to formating.}
== {The <strong>Result</strong> <strong>Changes</strong> <strong>According</strong> to <strong>Formating</strong>.}
Graham
27-Jun-2006
[978x2]
Not quite the problem I was stating!
search for a series of capitalised words and strong them
JaimeVargas
27-Jun-2006
[980]
Ah. Very easy modification.
Graham
27-Jun-2006
[981x2]
bolden-word: func [
    s [string!]
    /local len
][
    either 5 < len: length? s [
        s: rejoin ["<strong>" s </strong>]
    ][
        s
    ]
 ]

enhance-text: func [
    s [string!]
    /local result word-rule alpha non-alpha w c
][
    result: copy {}
    alpha: charset [#"A" - #"Z"]
    non-alpha: complement alpha

    word-rule: [copy w [some alpha] (insert tail result bolden-word w)]
    other-rule: [copy c non-alpha (insert tail result c)]
    parse/all s [some [word-rule | other-rule] end]
    result
]
Thanks Jaime.