World: r3wp


[Parse] Discussion of PARSE dialect

Terry 24-May-2010 [4969]	You've come to the right place.
florin 24-May-2010 [4970x2]	I've created my very first script. The script loops through a list of email (Kerio) log files, extracts the IP addresses, compiles them in a list and adds them to a (Peerblock) list in order to limit incoming spam. I find rebol perfect for this.
florin 24-May-2010 [4970x2]	So an entry in the log file starts like this: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
Terry 24-May-2010 [4972]	aye
florin 24-May-2010 [4973x3]	Then I improved the script to read only the latest entries in the log, and I parse the date like this: parse/all txt [thru "[" copy found to "]" ]
	So I get the job done. This is the question: if I use parse/all so that spaces are not automatically skipped, how do I include the space in my parse rule?
	A rule can be: "=," etc. How do I "escape" the space character so that I can include it in my rule?
Terry 24-May-2010 [4976x2]	I've always used the spaces as delimiters
Terry 24-May-2010 [4976x2]	parse k [thru "[" copy date to " "]
NickA 24-May-2010 [4978]	I've used parse/all, and then used 'trim on the results.
florin 24-May-2010 [4979]	Yes, that is exactly what I did and it works. However, for the sake of learning, how do I use the space character as part of my rule?
Steeve 24-May-2010 [4980]	don't see your point, show us the annoying rule...
florin 24-May-2010 [4981x5]	Ok, I will. The point is that I want to include the space in my rule. Here's the example:
	digits: charset "0123456789" ip: [some digits "." some digits "." some digits "." some digits ]
	This finds the IP in the log entry. What if I have two ip addresses and I want to pick them at the same time: ip: [some digits "." some digits "." some digits "." some digits __space__ some digits ...etc]
	And the IP addresses are separated by a space?
	My question really is, how do I escape the space character as one would in regular expressions?
Steeve 24-May-2010 [4986]	you need parse/all
florin 24-May-2010 [4987]	correct, and then, how do you place the space in the rule: {} ?
Steeve 24-May-2010 [4988x2]	#" " or " " or { }
Steeve 24-May-2010 [4988x2]	'space works too
Pekr 24-May-2010 [4990]	parse/all is really your friend - no unwanted surprises ....
florin 24-May-2010 [4991]	That's it? 'space? That is easy.
Steeve 24-May-2010 [4992]	sorry , it only exists in R3
florin 24-May-2010 [4993x2]	Yes, parse/all is great, and this is why I want to include the space not as a delimiter but as a character in the rule. For example, sometimes I want to find two strings separated by a character.
florin 24-May-2010 [4993x2]	So #" " should do. I will try it. Thank you.
Steeve 24-May-2010 [4995]	but you can make your own easily ;-) space: #" "
PeterWood 24-May-2010 [4996]	>> a: "a b"
	== "a b"
	>> parse/all a ["a" " " "b"]
	== true
florin 24-May-2010 [4997]	My script works, but you know how it goes. Once a question creeps in the brain, it needs an answer. Thank you.
Pekr 24-May-2010 [4998]	I would use #" ", or define a space rule first: spaces: charset " ^-" (optionally including tab)
florin 24-May-2010 [4999]	Thank you all.
Steeve 24-May-2010 [5000x2]	some space is even better to my mind
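
The suggestions above can be combined into one small sketch of florin's "two IP addresses separated by a space" case. It assumes REBOL 2, where parse/all stops whitespace from being skipped. The rule name two-ips, the variables first-ip and second-ip, and the sample line are made up for illustration and are not from florin's script.

```rebol
REBOL []

digits: charset "0123456789"
ip: [some digits "." some digits "." some digits "." some digits]

; 'space is predefined only in R3 (per Steeve), so define it here for R2.
space: #" "

; Hypothetical rule: two IP addresses separated by one or more spaces.
two-ips: [copy first-ip ip some space copy second-ip ip]

; Made-up sample input, not taken from a real log.
line: "190.101.1.10 10.0.0.1"

; With /all the space must be matched explicitly by the rule.
if parse/all line two-ips [
    print [first-ip "and" second-ip]
]
```

Using some space rather than a single #" " means runs of several spaces between the two addresses are also accepted.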
Terry 24-May-2010 [5002]	>> spam: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
	== {[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}
	>> replace/all spam "]" "" replace/all spam "[" ""
	== {15/May/2010 17:59:56 IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}
	>> blk: parse/all spam " "
	== ["15/May/2010" "17:59:56" "IP" "address" "190.101.1.10" "found" "in" "DNS" "blacklist" "SpamHaus" "SBL-XBL..."]
	>> date: blk/1
	== "15/May/2010"
	>> time: blk/2
	== "17:59:56"
	>> ip: blk/5
	== "190.101.1.10"
Pekr 24-May-2010 [5003]	nice ....
Steeve 24-May-2010 [5004]	trim/with "[]" is better to remove things...
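
Steeve's message leaves out the series argument. As a minimal sketch (REBOL 2 assumed, with a shortened, made-up log line), trim/with takes the string first and the characters to remove second, and it modifies the string in place, so it can stand in for the two replace/all calls in Terry's session:

```rebol
REBOL []

; Shortened sample entry, for illustration only.
spam: "[15/May/2010 17:59:56] IP address 190.101.1.10 found"

; Remove every "[" and "]" from spam in place.
trim/with spam "[]"

; Split on spaces, as in Terry's example.
blk: parse/all spam " "
print [blk/1 blk/2 blk/5]
```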
Terry 24-May-2010 [5005]	rebol and logs are like bread and butter
Pekr 24-May-2010 [5006]	yeah ... I used it often ... read/lines is your friend :-)
Steeve 24-May-2010 [5007]	yep nothing in the world can beat this combination
Terry 24-May-2010 [5008]	Logs are structured data, but Rebol shines even when parsing unstructured data
florin 24-May-2010 [5009x3]	Yep, this is my script to dig the IPs out of the log files. I was quite excited to see how natural it was to write this:
	digits: charset "0123456789"
	ip: [some digits "." some digits "." some digits "." some digits]
	newList: []
	existingList: []
	files: []
	foreach file read %. [
	    if find file "security.log" [
	        split: parse read file none
	        foreach it split [
	            parse it [[ip] (append newList it)]
	        ]
	    ]
	]
	Then I said: read only from the last read, and parse the date/time. I wanted to parse date AND time at the same time: "[15/May/2010 17:59:56]". But I hit a snag because of the space in between. I don't want date and time separated, because rebol can parse the string into a date-time easily. The space gave me trouble, and the brackets too.
Steeve 24-May-2010 [5012]	I would write the ip rule more strictly. ip: [1 3 digits 3 [#"." 1 3 digits]]
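
A quick way to see what the stricter rule changes is to run it against a few candidate strings. This is only a sketch assuming REBOL 2; the candidate strings are invented for illustration. Note that the rule constrains the number of digits per group and the number of groups, not the numeric range, so something like 999.999.999.999 would still match.

```rebol
REBOL []

digits: charset "0123456789"

; One to three digits, then exactly three more groups of "." plus one to three digits.
ip: [1 3 digits 3 [#"." 1 3 digits]]

; Made-up test strings: a valid address, an over-long first group, a missing group.
foreach candidate ["190.101.1.10" "1234.1.1.1" "10.0.0"] [
    print [candidate parse/all candidate ip]
]
```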
florin 24-May-2010 [5013x3]	Using the charset did not work. So I did this: parse/all txt [thru "[" copy found to "]" ]. Then I told myself it was time to learn more, but I could not find resources.
	g so helpful.
	Thanks.
Pekr 24-May-2010 [5016]	What is the problem in parsing date and time at the same time? I don't quite understand whether your solution does what you need or whether you need further help.
Anton 24-May-2010 [5017]	lines: read/lines file foreach line lines [ ... ]
florin 24-May-2010 [5018]	Pekr, I'm done. Thanks. Initially I could not figure out how to escape the space character.