World: r3wp

[Parse] Discussion of PARSE dialect

Steeve
24-May-2010
[5001]
some space 
is even better to my mind
Terry
24-May-2010
[5002]
>> spam: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in 
DNS blacklist SpamHaus SBL-XBL..."

== {[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist 
SpamHaus SBL-XBL...}
>> replace/all spam "]" "" replace/all spam "[" ""

== {15/May/2010 17:59:56 IP address 190.101.1.10 found in DNS blacklist 
SpamHaus SBL-XBL...}
>> blk: parse/all spam " "

== ["15/May/2010" "17:59:56" "IP" "address" "190.101.1.10" "found" 
"in" "DNS" "blacklist" "SpamHaus" "SBL-XBL..."]
>> date: blk/1
== "15/May/2010"
>> time: blk/2
== "17:59:56"
>> ip: blk/5
== "190.101.1.10"
Pekr
24-May-2010
[5003]
nice ....
Steeve
24-May-2010
[5004]
trim/with "[]" is better to remove things...
Terry
24-May-2010
[5005]
rebol and logs are like bread and butter
Pekr
24-May-2010
[5006]
yeah ... I used it often ... read/lines is your friend :-)
Steeve
24-May-2010
[5007]
yep nothing in the world can beat this combination
Terry
24-May-2010
[5008]
Logs are structured data, but Rebol shines even when parsing unstructured 
data
florin
24-May-2010
[5009x3]
Yep, this is my script to dig the IPs out of the log files. I was 
quite excited to see how natural it was to write this:
digits: charset "0123456789"

ip: [some digits "." some digits "." some digits "." some digits]

newList: [] 
existingList: []

files: []
foreach file read %. [
	if find file "security.log" [
		split: parse read file none
		foreach it split [
			parse it [[ip] (append newList it) ] 
		]
	]
]
Then I said: read only from the last read, and parse the date/time. 
I wanted to parse the date AND time at the same time: [15/May/2010 17:59:56]. 
But I hit a snag because of the space in between. I don't want the date 
and time separate, because Rebol can parse the string into a date-time 
easily. The space gave me trouble, and the brackets too.
Steeve
24-May-2010
[5012]
I would write the IP rule more strictly.
ip: [1 3 digits 3 [#"." 1 3 digits]]
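A quick way to compare the two rules, assuming a REBOL 2 console. The names loose-ip and strict-ip are made up for this sketch; the rule bodies are the ones posted above.

```rebol
digits: charset "0123456789"

; florin's original rule: any number of digits per group
loose-ip:  [some digits "." some digits "." some digits "." some digits]

; Steeve's stricter rule: one to three digits, then three ".ddd" groups
strict-ip: [1 3 digits 3 [#"." 1 3 digits]]

print parse/all "190.101.1.10" strict-ip   ; true
print parse/all "1234.5.6.7"   loose-ip    ; true
print parse/all "1234.5.6.7"   strict-ip   ; false
```

Note that the stricter rule only limits the digit count; it would still accept something like 999.999.999.999, so a range check on each group would be a separate step.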
florin
24-May-2010
[5013x3]
Using the charset did not work. So I did this: parse/all txt [thru 
"[" copy found to "]" ]. And said to myself, time to learn more and 
could not find resources.
g so helpful.
Thanks.
Pekr
24-May-2010
[5016]
What is the problem with parsing date and time at the same time? I 
don't quite understand: does the solution do what you need, or 
do you need further help?
Anton
24-May-2010
[5017]
lines: read/lines file
foreach line lines [ ... ]
florin
24-May-2010
[5018]
Pekr, I'm done. Thanks. Initially I could not figure out how to escape 
the space character.
Ladislav
24-May-2010
[5019]
The funny thing about it (which confused Pekr) is that you did not 
realize there was no need to escape the space character :-)
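A minimal sketch of that point, assuming REBOL 2 and the sample log line from earlier in the thread. The variable names line, stamp and parts are made up here. Under parse/all the space is ordinary input, so the rule florin posted copies it along with everything else between the brackets.

```rebol
line: {[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}

; Copy everything between "[" and "]"; nothing needs escaping.
parse/all line [thru "[" copy stamp to "]" to end]
print stamp                  ; 15/May/2010 17:59:56

; Split on the single space, as in Terry's example, then convert
; each piece. The conversions are left without expected output here.
parts: parse/all stamp " "
print to-date parts/1
print to-time parts/2
```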
Terry
24-May-2010
[5020]
for Graham : delimeters = delimiters
florin
25-May-2010
[5021x2]
I should've read the entire core manual before posting. Chapter 15 
addresses this clearly.
Processing text and rules seems so natural to rebol. I think I'm 
going to enjoy this.
Gregg
25-May-2010
[5023]
Once you get used to PARSE, it's very hard to use other languages. 
:-)
Fork
11-Jul-2010
[5024x2]
I've uploaded my old Whitespace interpreter that I implemented in 
PARSE to GitHub: http://github.com/hostilefork/whitespacers/raw/master/rebol/whitespace.r
What's interesting about it is the comparison that can be made to 
implementations in other languages.  I've collected implementations 
of whitespace interpreters in several other languages (by other people) 
in that project: http://github.com/hostilefork/whitespacers/
Anton
30-Jul-2010
[5026]
Ok, continuing the discussion from "Performance" group, I'd like 
to ask for some help with parsing rebol format files.

Basically, I'd like to be able to extract a block near the beginning 
or end of a file, while minimizing disk access.

The files to be parsed could be large, so I don't want to load the 
entire contents, but chunks at a time.

So my parse rule should be able to detect when the input has been 
exhausted and ask for another chunk.

(When extracting a block near the end of a file, I'll have to parse 
in reverse, but I'll try to implement that later.)
Oldes
30-Jul-2010
[5027]
And why don't you want to use the LOAD/NEXT that was advised?
Anton
30-Jul-2010
[5028x4]
Using LOAD/NEXT, I still have to use an O(n^2) algorithm. I'd now 
like to do my own parse, which can be O(n).
As far as I know, there is no way to instruct LOAD/NEXT to only read 
chunks from the file as necessary to load the next value.
Which is why, in that algorithm, I had to iteratively: load a chunk, 
append it and try LOAD/NEXT until it succeeded.
Which gives the algorithm O(n^2) performance.
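A rough count behind that figure, under the assumption that every failed LOAD/NEXT attempt rescans the whole accumulated buffer from its start. The symbols are hypothetical: n is the number of bytes up to the end of the value, c is the chunk size, and k = n/c is the number of chunks appended before LOAD/NEXT succeeds.

```latex
\sum_{i=1}^{k} i\,c \;=\; c\,\frac{k(k+1)}{2} \;\approx\; \frac{n^{2}}{2c}
\qquad \text{with } k = \frac{n}{c}
```

For a fixed chunk size this grows as O(n^2), whereas a single forward scan that never revisits earlier input touches each byte once, which is O(n).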
My question for this O(n) parse algorithm is:
What rebol syntax do I need to identify?
I suppose all I need is:
 - Comments (lines beginning with semi-colon ; )

 - Strings (single-line "" and multi-line {} ) watching out for escape 
 sequences eg. {^}}
 - Blocks
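The three items in that list could be sketched as PARSE rules roughly like this. It assumes REBOL 2 string parsing and well-formed input; every rule name is hypothetical, and it works on a string already in memory, so the chunk-by-chunk refilling Anton wants is not shown.

```rebol
str-char:   complement charset {"^^}       ; anything but " and ^
brace-char: complement charset "{}^^"      ; anything but { } and ^
other:      complement charset "[]^"{};"   ; not a delimiter we track

; A comment runs from ";" to the end of the line (or of the input).
comment-rule: [#";" [thru newline | to end]]

; Strings: a caret escapes the next character, e.g. ^" or ^}
string-rule:  [#"^"" any [#"^^" skip | str-char] #"^""]
braced-rule:  [#"{" any [#"^^" skip | braced-rule | brace-char] #"}"]

; A block may contain comments, strings and nested blocks.
block-rule:   [#"[" any [comment-rule | string-rule | braced-rule | block-rule | other] #"]"]

; Everything that may precede the block we are looking for.
skip-rule:    [comment-rule | string-rule | braced-rule | other]

text: {; header [not a block]
"a [string]" [first {nested ]} [inner]] [second]}

parse/all text [any skip-rule copy blk block-rule to end]
print blk    ; [first {nested ]} [inner]]
```

The brackets inside the comment, the quoted string and the braced string are skipped, because those constructs are consumed as whole units before the block rule gets a chance to see them.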
Oldes
30-Jul-2010
[5032]
yes
Anton
30-Jul-2010
[5033]
Have I missed anything ?
Oldes
30-Jul-2010
[5034x2]
Just that a comment may not be at the beginning of a line.
and you want to get just the first block? Skipping any content before 
it?
Anton
30-Jul-2010
[5036x2]
Ok, yes, it may be preceded by whitespace. That seems easy to deal 
with.
Well, I'd like the algorithm to be general enough to get any number 
of blocks in the file. So far I need just the first, second and last 
blocks.
Oldes
30-Jul-2010
[5038]
And you need to parse the Altme's *.set files or something else as 
well?
Anton
30-Jul-2010
[5039x3]
The *.set files are what I need it for, yes.
I imagine it could be useful in other similar situations, so I'd 
like it to be pretty general.

I suppose a bonus functionality is to be able to get nested blocks.

(And a super bonus will be to get any datatype at any level, but 
I won't bother doing that until I need it.)
Um, actually, I remember years ago I wanted to load the rebol header 
of many files, avoiding loading the whole of each file.
Oldes
30-Jul-2010
[5042]
Also char! must be supported:  #"["
Anton
30-Jul-2010
[5043x2]
So this algorithm could be used for that, too.
Must it ?

I think if I can parse single-line strings correctly, then a bracket 
inside won't cause a problem.

This means I'll be basically ignoring datatypes which allow strings 
in their syntax, and just jumping to the string part.
Oldes
30-Jul-2010
[5045]
you are right
Anton
30-Jul-2010
[5046x5]
I don't think there's any way to make any type with a literal bracket 
in it (except blocks, of course). (But I am worrying about that a 
bit.)
I tried to make some words with a single unmatched literal bracket, 
or literal string delimiter, but I failed so far. They don't load, 
so they won't be in well-formed rebol format files.
And then I tried issues, files, and I can't do it there, either.
One caveat:

Misidentifying as a block, types like (what are they called?) "inline 
types"?
eg.  #[none]

If I don't recognise it as none! (or maybe issue!) , then I might 
accidentally take it as a block.
Or how about this:
#[block! [hello]]

I would probably want to extract  [hello], but if I don't recognise 
this form, then I'd get  [block! [hello]]  instead.
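A small sketch of that caveat, assuming REBOL 2 and deliberately ignoring strings and comments to keep it short. The rule names are hypothetical. Treating "#" followed by a block as one unit keeps #[none] from being picked up as the first block.

```rebol
other:          complement charset "[]"
block-rule:     [#"[" any [block-rule | other] #"]"]

; "#" directly followed by a block is skipped as a whole.
construct-rule: [#"#" block-rule]

text: "#[none] [real block]"

; Without construct-rule the "#" is skipped as an ordinary character
; and the scan stops at the bracket of #[none].
parse/all text [any other copy blk block-rule to end]
print blk    ; [none]

; With construct-rule tried first, #[none] is consumed as a unit.
parse/all text [any [construct-rule | other] copy blk block-rule to end]
print blk    ; [real block]
```

This version skips the whole of #[block! [hello]] as well; extracting the inner [hello] from that form would need a separate rule.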