World: r3wp

[Parse] Discussion of PARSE dialect

florin
24-May-2010
[4971]
So an entry in the log file starts like this: "[15/May/2010 17:59:56] 
IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
Terry
24-May-2010
[4972]
aye
florin
24-May-2010
[4973x3]
Improve the script by reading only the latest entries in the log, 
and I parse the date like this: parse/all txt [thru "[" copy found 
to "]" ]
So I get the job done. This is the question: If I do parse/all so 
that spaces are not automatically skipped, how do I include the 
space in my parse rule?
A rule can be: "=," etc. How do I "escape" the space character so 
that I can include in my rule?
Terry
24-May-2010
[4976x2]
I've always used the spaces as delimeters
parse k [thru "[" copy date to " "]
NickA
24-May-2010
[4978]
I've used parse/all, and then used 'trim on the results.
florin
24-May-2010
[4979]
Yes, that is exactly what I did and it works. However, for the sake 
of learning, how do I use the space character as part of my rule?
Steeve
24-May-2010
[4980]
don't see your point, show us the annoying rule...
florin
24-May-2010
[4981x5]
Ok, I will. The point is that I want to include the space in my rule. 
Here's the example:
digits: charset "0123456789"

ip: [some digits "." some digits "." some digits "." some digits]
This finds the IP in the log entry. What if I have two ip addresses 
and I want to pick them at the same time: ip: [some digits "." some 
digits "." some digits "." some digits __space__ some digits ...etc]
And the IP addresses are separated by a space?
My question really is, how do I escape the space character as one 
would in regular expressions?
Steeve
24-May-2010
[4986]
you need parse/all
florin
24-May-2010
[4987]
correct, and then, how do you place the space in the rule: {} ?
Steeve
24-May-2010
[4988x2]
#" " or " " or { }
'space works too
Pekr
24-May-2010
[4990]
parse/all is really your friend - no unwanted surprises ....
florin
24-May-2010
[4991]
That's it? 'space? That is easy.
Steeve
24-May-2010
[4992]
sorry, it only exists in R3
florin
24-May-2010
[4993x2]
Yes, parse/all is great, and this is why I want to include the space 
not as a delimiter but as a character in the rule. As if, sometimes 
I want to find two strings separated by a character.
So #" " should do. I will try it. Thank you.
Steeve
24-May-2010
[4995]
but you can make your own easily ;-)
space: #" "
PeterWood
24-May-2010
[4996]
>> a: "a b"

== "a b"

>> parse/all a ["a" " " "b"]

== true
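A minimal sketch of the two-address case florin asked about, assuming REBOL 2 (where parse/all exists). The sample string and the names sp, first-ip and second-ip are made up for illustration; sp is used instead of space only to avoid clashing with the word R3 predefines.

```rebol
digits: charset "0123456789"
ip: [some digits "." some digits "." some digits "." some digits]
sp: #" "    ; literal space character (hypothetical name)

two-ips: "190.101.1.10 10.0.0.1"    ; hypothetical sample input

; With /all, whitespace is not skipped, so the space between
; the two addresses has to be matched explicitly by the rule.
ok: parse/all two-ips [copy first-ip ip sp copy second-ip ip]

print [ok first-ip second-ip]    ; prints: true 190.101.1.10 10.0.0.1
```

Writing " " or { } in place of sp should behave the same way here, as Steeve lists above.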
florin
24-May-2010
[4997]
My script works, but you know how it goes. Once a question creeps 
in the brain, it needs an answer. Thank you.
Pekr
24-May-2010
[4998]
I would use #" ", or define a space rule first: spaces: charset 
" ^-" (optionally including tab)
florin
24-May-2010
[4999]
Thank you all.
Steeve
24-May-2010
[5000x2]
some space 
is even better to my mind
Terry
24-May-2010
[5002]
>> spam: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in 
DNS blacklist SpamHaus SBL-XBL..."

== {[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist 
SpamHaus SBL-XBL...}
>> replace/all spam "]" "" replace/all spam "[" ""

== {15/May/2010 17:59:56 IP address 190.101.1.10 found in DNS blacklist 
SpamHaus SBL-XBL...}
>> blk: parse/all spam " "

== ["15/May/2010" "17:59:56" "IP" "address" "190.101.1.10" "found" 
"in" "DNS" "blacklist" "SpamHaus" "SBL-XBL..."]
>> date: blk/1
== "15/May/2010"
>> time: blk/2
== "17:59:56"
>> ip: blk/5
== "190.101.1.10"
Pekr
24-May-2010
[5003]
nice ....
Steeve
24-May-2010
[5004]
trim/with "[]" is better to remove things...
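A small sketch of the trim/with suggestion, assuming REBOL 2 and a shortened version of the log line quoted earlier. The copy is there because trim modifies the string it is given; the name cleaned is made up for illustration.

```rebol
; Shortened sample entry (the original line continues past "found")
spam: "[15/May/2010 17:59:56] IP address 190.101.1.10 found"

; trim/with removes every occurrence of the listed characters,
; so one call replaces the two replace/all calls.
cleaned: trim/with copy spam "[]"

print cleaned    ; prints: 15/May/2010 17:59:56 IP address 190.101.1.10 found
```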
Terry
24-May-2010
[5005]
rebol and logs are like bread and butter
Pekr
24-May-2010
[5006]
yeah ... I used it often ... read/lines is your friend :-)
Steeve
24-May-2010
[5007]
yep nothing in the world can beat this combination
Terry
24-May-2010
[5008]
Structured data like logs, but Rebol shines even when parsing unstructured 
data
florin
24-May-2010
[5009x3]
Yep, this is my script to dig the IPs inside the log files. I was 
quite excited to see how natural it was to write this:
digits: charset "0123456789"

ip: [some digits "." some digits "." some digits "." some digits]

newList: [] 
existingList: []

files: []
foreach file read %. [
	if find file "security.log" [
		split: parse read file none
		foreach it split [
			parse it [[ip] (append newList it) ] 
		]
	]
]
Then, I said, read only from the last read, and parse the date/time. 
I wanted to parse date AND time at the same time: "[15/May/2010 17:59:56]". 
But I hit a snag because of the space in between. I don't want date 
and time separate because rebol can parse the string into a date-time 
easily. The space gave me trouble, and the brackets too.
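One way the date and time could be captured together with a charset-based rule, sketched under the assumption of REBOL 2 and the dd/Mon/yyyy hh:mm:ss layout shown in the log entry. The names digit, letter, stamp and entry are made up for illustration, and the sample entry is shortened.

```rebol
digit:  charset "0123456789"
letter: charset [#"a" - #"z" #"A" - #"Z"]

; dd/Mon/yyyy hh:mm:ss, with the separating space written as a literal " "
stamp: [1 2 digit "/" 3 letter "/" 4 digit " " 2 digit ":" 2 digit ":" 2 digit]

; Shortened sample entry
entry: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist"

found: none
; The brackets are ordinary string literals in the rule;
; "to end" lets parse return true without matching the rest of the line.
ok: parse/all entry ["[" copy found stamp "]" to end]

print [ok found]    ; prints: true 15/May/2010 17:59:56
```

Neither the space nor the brackets need escaping; inside a rule they are just quoted strings.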
Steeve
24-May-2010
[5012]
I would write the ip rule more strictly.
ip: [1 3 digits 3 [#"." 1 3 digits]]
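A quick check of what the stricter rule accepts, assuming REBOL 2. The test strings and the names r1 to r4 are made up for illustration.

```rebol
digits: charset "0123456789"
ip: [1 3 digits 3 [#"." 1 3 digits]]

r1: parse/all "190.101.1.10" ip       ; four groups of 1 to 3 digits
r2: parse/all "1234.1.1.1" ip         ; first group has four digits
r3: parse/all "1.2.3" ip              ; only three groups
r4: parse/all "999.999.999.999" ip    ; digit count is fine, values are not checked

print [r1 r2 r3 r4]    ; prints: true false false true
```

The rule bounds the number of digits per group but not the 0 to 255 range, so the last string still passes.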
florin
24-May-2010
[5013x3]
Using the charset did not work. So I did this: parse/all txt [thru 
"[" copy found to "]" ]. And I said to myself, time to learn more, but 
I could not find resources.
g so helpful.
Thanks.
Pekr
24-May-2010
[5016]
What is the problem in parsing date and time at the same time? I 
don't quite understand: does the solution do what you need, or 
do you need further help?
Anton
24-May-2010
[5017]
lines: read/lines file
foreach line lines [ ... ]
florin
24-May-2010
[5018]
Pekr, I'm done. Thanks. Initially I could not figure out how to escape 
the space character.
Ladislav
24-May-2010
[5019]
the funny thing (that confused Pekr) about it is that you did not 
realize that there was no need to escape the space character :-)
Terry
24-May-2010
[5020]
for Graham : delimeters = delimiters