r3wp [groups: 83 posts: 189283]

World: r3wp

[Parse] Discussion of PARSE dialect

BrianH
19-Apr-2010
[4964x2]
Sometimes you don't want to put too much pressure on the GC, and 
sometimes you don't want to increase the total size of the pool too 
much, because that pool doesn't always get returned to the OS very 
quickly or at all. This is the motivation for additions like the 
/into option.
We'll see how many optimizations like that will need to be undone once 
we have to adjust for task safety :(
Maxim
19-Apr-2010
[4966x2]
the GC doesn't return the pool... only image data is ever returned 
AFAIK.
and the GC doesn't kick in too quickly or it would be really slow  
(just try recycle/torture to see ;-)

so when you're doing serious work it REALLY grows... although it 
stabilizes

for example, although stats often show 10MB... my OS tells me that 
it's actually using 24 MB. That will never shrink back down.
florin
24-May-2010
[4968]
Is there a place for the newbie questions on parsing?
Terry
24-May-2010
[4969]
You've come to the right place.
florin
24-May-2010
[4970x2]
I've created my very first script. The script loops through a list 
of email (Kerio) log files, extracts the IP addresses, compiles them 
in a list and adds them to a (Peerblock) list in order to limit incoming 
spam. I find rebol perfect for this.
So an entry in the log file starts like this: "[15/May/2010 17:59:56] 
IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
Terry
24-May-2010
[4972]
aye
florin
24-May-2010
[4973x3]
I improved the script by reading only the latest entries in the log, 
and I parse the date like this: parse/all txt [thru "[" copy found to "]"]
So I get the job done. This is the question: if I use parse/all so 
that spaces are not skipped automatically, how do I include the 
space in my parse rule?
A rule can be: "=", "," etc. How do I "escape" the space character so 
that I can include it in my rule?
Terry
24-May-2010
[4976x2]
I've always used the spaces as delimiters
parse k [thru "[" copy date to " "]
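For readers coming from other languages: both rules above, florin's `thru "[" copy found to "]"` and Terry's `thru "[" copy date to " "`, amount to "skip past the first marker, then capture up to the next delimiter". A rough Python analogue follows; the helper name `copy_between` is hypothetical and not part of REBOL or Python.

```python
from typing import Optional


def copy_between(text: str, start: str, end: str) -> Optional[str]:
    """Rough analogue of PARSE's [thru start copy found to end].

    Skips past the first occurrence of `start`, then returns everything up to
    (but not including) the next occurrence of `end`. Returns None when either
    marker is missing, i.e. when the PARSE rule could not set `found`.
    """
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)          # "thru" moves past the marker itself
    j = text.find(end, i)    # "to" stops just before the delimiter
    if j == -1:
        return None
    return text[i:j]


entry = "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist"
print(copy_between(entry, "[", "]"))  # 15/May/2010 17:59:56
print(copy_between(entry, "[", " "))  # 15/May/2010
```

Capturing up to " " stops at the space, so only the date comes back. Capturing up to "]" keeps the date, the space and the time together.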
NickA
24-May-2010
[4978]
I've used parse/all, and then used 'trim on the results.
florin
24-May-2010
[4979]
Yes, that is exactly what I did and it works. However, for the sake 
of learning, how do I use the space character as part of my rule?
Steeve
24-May-2010
[4980]
don't see your point, show us the annoying rule...
florin
24-May-2010
[4981x5]
Ok, I will. The point is that I want to include the space in my rule. 
Here's the example:
digits: charset "0123456789"

ip: [some digits "." some digits "." some digits "." some digits]
This finds the IP in the log entry. What if I have two ip addresses 
and I want to pick them at the same time: ip: [some digits "." some 
digits "." some digits "." some digits __space__ some digits ...etc]
And the IP addresses are separated by a space?
My question really is, how do I escape the space character as one 
would in regular expressions?
Steeve
24-May-2010
[4986]
you need parse/all
florin
24-May-2010
[4987]
correct, and then, how do you place the space in the rule: {} ?
Steeve
24-May-2010
[4988x2]
#" " or " " or { }
'space works too
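The question was framed in regular-expression terms, so here is the same idea in Python's `re` module, for comparison only. A space inside a pattern is an ordinary literal and needs no escaping, just as `#" "`, `" "` or `{ }` are ordinary literals in a PARSE/ALL rule. The sample line with two addresses is made up for illustration.

```python
import re

# One "some digits" run per octet, like the PARSE rule
# [some digits "." some digits "." some digits "." some digits].
IP = r"\d+\.\d+\.\d+\.\d+"

# The space between the two addresses is a plain " " literal;
# only regex metacharacters such as "." need a backslash.
TWO_IPS = re.compile("(" + IP + ") (" + IP + ")")

line = "from 190.101.1.10 10.0.0.1 rejected"   # hypothetical sample line
m = TWO_IPS.search(line)
if m:
    print(m.group(1), m.group(2))  # 190.101.1.10 10.0.0.1
```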
Pekr
24-May-2010
[4990]
parse/all is really your friend - no unwanted surprises ....
florin
24-May-2010
[4991]
That's it? 'space? That is easy.
Steeve
24-May-2010
[4992]
sorry, it only exists in R3
florin
24-May-2010
[4993x2]
Yes, parse/all is great, and this is why I want to include the space 
not as a delimiter but as a character in the rule. For example, sometimes 
I want to find two strings separated by a character.
So #" " should do. I will try it. Thank you.
Steeve
24-May-2010
[4995]
but you can make your own easily ;-)
space: #" "
PeterWood
24-May-2010
[4996]
>> a: "a b"
== "a b"
>> parse/all a ["a" " " "b"]
== true
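The session above relies on PARSE matching the whole input: the rule succeeds only if the literals "a", " " and "b" consume the string from start to end. Below is a minimal Python sketch of that behaviour for a rule block made only of string literals. The helper `parse_literals` is hypothetical, and real PARSE does far more.

```python
from typing import List


def parse_literals(text: str, rule: List[str]) -> bool:
    """Match each literal in turn; succeed only if the whole text is consumed.

    Mimics PARSE/ALL on a block of plain strings: spaces are not skipped,
    so a " " literal has to be written explicitly in the rule.
    Unlike PARSE, which compares strings case-insensitively unless /case
    is used, this sketch is case-sensitive.
    """
    pos = 0
    for literal in rule:
        if not text.startswith(literal, pos):
            return False
        pos += len(literal)
    return pos == len(text)


print(parse_literals("a b", ["a", " ", "b"]))  # True
print(parse_literals("a b", ["a", "b"]))       # False: the space is not skipped
```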
florin
24-May-2010
[4997]
My script works, but you know how it goes. Once a question creeps 
in the brain, it needs an answer. Thank you.
Pekr
24-May-2010
[4998]
I would use #" ", or define a space rule first: spaces: charset 
" ^-" (optionally including tab)
florin
24-May-2010
[4999]
Thank you all.
Steeve
24-May-2010
[5000x2]
some space 
is even better to my mind
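Pekr's `spaces: charset " ^-"` and Steeve's `some space` together mean "one or more spaces or tabs". In a regex that is a character class followed by `+`. The Python sketch below uses a made-up line in which two addresses are separated by mixed whitespace.

```python
import re

IP = r"\d+\.\d+\.\d+\.\d+"
# [ \t]+ plays the role of `some spaces` with spaces: charset " ^-"
# (^- is REBOL's escape for a tab character).
TWO_IPS = re.compile("(" + IP + ")[ \t]+(" + IP + ")")

line = "190.101.1.10 \t  10.0.0.1"   # hypothetical input
print(TWO_IPS.fullmatch(line).groups())  # ('190.101.1.10', '10.0.0.1')
```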
Terry
24-May-2010
[5002]
>> spam: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
== {[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}
>> replace/all spam "]" "" replace/all spam "[" ""
== {15/May/2010 17:59:56 IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}
>> blk: parse/all spam " "
== ["15/May/2010" "17:59:56" "IP" "address" "190.101.1.10" "found" "in" "DNS" "blacklist" "SpamHaus" "SBL-XBL..."]
>> date: blk/1
== "15/May/2010"
>> time: blk/2
== "17:59:56"
>> ip: blk/5
== "190.101.1.10"
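For comparison, here is Terry's approach in Python: strip the brackets, split on single spaces, then pick fields by position. This assumes every log entry has the same layout, which is also what the position-based REBOL version relies on. Note that REBOL block indexes are 1-based and Python's are 0-based.

```python
spam = ("[15/May/2010 17:59:56] IP address 190.101.1.10 found in "
        "DNS blacklist SpamHaus SBL-XBL...")

# Same two steps as the two replace/all calls.
cleaned = spam.replace("]", "").replace("[", "")
# parse/all spam " "  ~  split on single spaces
blk = cleaned.split(" ")

date_str = blk[0]   # blk/1 in REBOL
time_str = blk[1]   # blk/2
ip = blk[4]         # blk/5
print(date_str, time_str, ip)  # 15/May/2010 17:59:56 190.101.1.10
```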
Pekr
24-May-2010
[5003]
nice ....
Steeve
24-May-2010
[5004]
trim/with spam "[]" is better for removing those characters...
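`trim/with` removes every occurrence of the listed characters in one call, which replaces the two `replace/all` passes. This assumes the documented behaviour of TRIM's /with refinement. The closest Python idiom is `str.translate` with a deletion table, shown here as a comparison, not as how REBOL implements it.

```python
spam = "[15/May/2010 17:59:56] IP address 190.101.1.10 found"

# The third argument of str.maketrans lists characters to delete,
# similar in effect to trim/with spam "[]".
cleaned = spam.translate(str.maketrans("", "", "[]"))
print(cleaned)  # 15/May/2010 17:59:56 IP address 190.101.1.10 found
```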
Terry
24-May-2010
[5005]
rebol and logs are like bread and butter
Pekr
24-May-2010
[5006]
yeah ... I used it often ... read/lines is your friend :-)
Steeve
24-May-2010
[5007]
yep nothing in the world can beat this combination
Terry
24-May-2010
[5008]
Structured data like logs, but Rebol shines even when parsing unstructured 
data
florin
24-May-2010
[5009x3]
Yep, this is my script to dig the IPs out of the log files. I was 
quite excited to see how natural it was to write this:
digits: charset "0123456789"

ip: [some digits "." some digits "." some digits "." some digits]

newList: [] 
existingList: []

files: []
foreach file read %. [                          ; every entry in the current directory
	if find file "security.log" [               ; only the security logs
		split: parse read file none             ; break the text into words
		foreach it split [
			parse it [[ip] (append newList it)] ; keep words that start with an IP
		]
	]
]
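For comparison, here is the same filtering loop in Python. It splits each log's text into whitespace-separated tokens and keeps those that start with an IP-shaped run, as `parse it [[ip] (append newList it)]` does. To keep the sketch self-contained it takes the log contents as strings, and reading the "security.log" files from the current directory is only shown as a comment. The helper name `collect_ips` is hypothetical.

```python
import re
from typing import Iterable, List

# Same shape as the REBOL rule: four runs of digits separated by dots.
IP = re.compile(r"\d+\.\d+\.\d+\.\d+")


def collect_ips(log_texts: Iterable[str]) -> List[str]:
    new_list: List[str] = []
    for text in log_texts:
        # Roughly what `parse read file none` does: split into words.
        for token in text.split():
            # IP.match anchors at the token start, like the PARSE rule,
            # and (also like it) ignores whatever follows the address.
            if IP.match(token):
                new_list.append(token)
    return new_list


# In a real script the texts would come from files, e.g.
#   [open(p).read() for p in os.listdir(".") if "security.log" in p]
logs = ["[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist"]
print(collect_ips(logs))  # ['190.101.1.10']
```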
Then I said: read only what was added since the last read, and parse the date/time. 
I wanted to parse date AND time at the same time: [15/May/2010 17:59:56] 
But I hit a snag because of the space in between. I don't want date 
and time separated, because REBOL can easily parse the string into a date-time. 
The space gave me trouble, and the brackets too.
Steeve
24-May-2010
[5012]
I would write the ip rule more strictly.
ip: [1 3 digits 3 [#"." 1 3 digits]]
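Steeve's rule `1 3 digits 3 [#"." 1 3 digits]` limits each group to one to three digits and requires exactly three dot-separated groups after the first. The Python regex below expresses the same constraint, using `{1,3}` and `{3}` for the repeat counts. Like the PARSE rule, it checks shape only, so 999.1.1.1 still passes.

```python
import re

# 1 3 digits          ->  \d{1,3}
# 3 [#"." 1 3 digits] ->  (?:\.\d{1,3}){3}
STRICT_IP = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

for candidate in ["190.101.1.10", "1234.1.1.1", "1.2.3", "999.1.1.1"]:
    print(candidate, bool(STRICT_IP.fullmatch(candidate)))
```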
florin
24-May-2010
[5013]
Using the charset did not work. So I did this: parse/all txt [thru 
"[" copy found to "]"], and said to myself: time to learn more. But I 
could not find resources.