World: r3wp
[Parse] Discussion of PARSE dialect
Terry 24-May-2010 [4969] | You've come to the right place. |
florin 24-May-2010 [4970x2] | I've created my very first script. The script loops through a list of email (Kerio) log files, extracts the IP addresses, compiles them in a list and adds them to a (Peerblock) list in order to limit incoming spam. I find rebol perfect for this. |
So an entry in the log file starts like this: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..." | |
Terry 24-May-2010 [4972] | aye |
florin 24-May-2010 [4973x3] | I improved the script to read only the latest entries in the log, and I parse the date like this: parse/all txt [thru "[" copy found to "]" ] |
So I get the job done. This is the question: If I use parse/all so that spaces are not automatically skipped, how do I include the space in my parse rule? | |
A rule can be: "=," etc. How do I "escape" the space character so that I can include it in my rule? | |
Terry 24-May-2010 [4976x2] | I've always used the spaces as delimiters |
parse k [thru "[" copy date to " "] | |
NickA 24-May-2010 [4978] | I've used parse/all, and then used 'trim on the results. |
florin 24-May-2010 [4979] | Yes, that is exactly what I did and it works. However, for the sake of learning, how do I use the space character as part of my rule? |
Steeve 24-May-2010 [4980] | don't see your point, show us the annoying rule... |
florin 24-May-2010 [4981x5] | Ok, I will. The point is that I want to include the space in my rule. Here's the example: |
digits: charset "0123456789" ip: [some digits "." some digits "." some digits "." some digits ] | |
This finds the IP in the log entry. What if I have two ip addresses and I want to pick them at the same time: ip: [some digits "." some digits "." some digits "." some digits __space__ some digits ...etc] | |
And the IP addresses are separated by a space? | |
My question really is, how do I escape the space character as one would in regular expressions? | |
Steeve 24-May-2010 [4986] | you need parse/all |
florin 24-May-2010 [4987] | correct, and then, how do you place the space in the rule: {} ? |
Steeve 24-May-2010 [4988x2] | #" " or " " or { } |
'space works too | |
Pekr 24-May-2010 [4990] | parse/all is really your friend - no unwanted surprises .... |
florin 24-May-2010 [4991] | That's it? 'space? That is easy. |
Steeve 24-May-2010 [4992] | sorry, it only exists in R3 |
florin 24-May-2010 [4993x2] | Yes, parse/all is great, and this is why I want to include the space not as a delimiter but as a character in the rule. As if, sometimes I want to find two strings separated by a character. |
So #" " should do. I will try it. Thank you. | |
Steeve 24-May-2010 [4995] | but you can make your own easily ;-) space: #" " |
PeterWood 24-May-2010 [4996] | >> a: "a b" == "a b" >> parse/all a ["a" " " "b"] == true |
florin 24-May-2010 [4997] | My script works, but you know how it goes. Once a question creeps in the brain, it needs an answer. Thank you. |
Pekr 24-May-2010 [4998] | I would use #" ", or define a space rule first: spaces: charset " ^-" (optionally including tab) |
florin 24-May-2010 [4999] | Thank you all. |
Steeve 24-May-2010 [5000x2] | some space is even better to my mind |
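Editor's sketch of the answers above, assuming REBOL 2 (where parse/all is available). It matches two addresses separated by one or more spaces, using florin's ip rule and a locally defined space word. The sample string and the names ip-a, ip-b and ok are made up for illustration.

```rebol
REBOL [Title: "Literal space in a PARSE rule (sketch)"]

digits: charset "0123456789"
ip: [some digits "." some digits "." some digits "." some digits]

; Define the word ourselves, since (per the discussion) R2 may not predefine it
space: #" "

; Hypothetical sample input: two addresses separated by two spaces
line: "190.101.1.10  10.0.0.1"

; With /all, whitespace is not skipped, so the space must be matched explicitly
ok: parse/all line [copy ip-a ip some space copy ip-b ip]

print ok
print ip-a
print ip-b
```

Writing #" " or " " directly in the rule in place of the space word should behave the same way. "some space" accepts a run of spaces, while a single space in the rule accepts exactly one.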
Terry 24-May-2010 [5002] | >> spam: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
== {[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}
>> replace/all spam "]" "" replace/all spam "[" ""
== {15/May/2010 17:59:56 IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}
>> blk: parse/all spam " "
== ["15/May/2010" "17:59:56" "IP" "address" "190.101.1.10" "found" "in" "DNS" "blacklist" "SpamHaus" "SBL-XBL..."]
>> date: blk/1
== "15/May/2010"
>> time: blk/2
== "17:59:56"
>> ip: blk/5
== "190.101.1.10" |
Pekr 24-May-2010 [5003] | nice .... |
Steeve 24-May-2010 [5004] | trim/with "[]" is better to remove things... |
Terry 24-May-2010 [5005] | rebol and logs are like bread and butter |
Pekr 24-May-2010 [5006] | yeah ... I used it often ... read/lines is your friend :-) |
Steeve 24-May-2010 [5007] | yep nothing in the world can beat this combination |
Terry 24-May-2010 [5008] | Logs are structured data, but Rebol shines even when parsing unstructured data |
florin 24-May-2010 [5009x3] | Yep, this is my script to dig the IPs out of the log files. I was quite excited to see how natural it was to write this: |
digits: charset "0123456789"
ip: [some digits "." some digits "." some digits "." some digits ]
newList: []
existingList: []
files: []
foreach file read %. [
    if find file "security.log" [
        split: parse read file none
        foreach it split [
            parse it [[ip] (append newList it) ]
        ]
    ]
] | |
Then, I said, read only from the last read, and parse the date/time. I wanted to parse date AND time at the same time: "[15/May/2010 17:59:56]". But I hit a snag because of the space in between. I don't want date and time separated because rebol can parse the string into a date-time easily. The space gave me trouble, and the brackets too. | |
Steeve 24-May-2010 [5012] | I would write the ip rule more strictly. ip: [1 3 digits 3 [#"." 1 3 digits]] |
florin 24-May-2010 [5013x3] | Using the charset did not work. So I did this: parse/all txt [thru "[" copy found to "]" ]. I said to myself it was time to learn more, but I could not find resources. |
g so helpful. | |
Thanks. | |
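Editor's sketch, assuming REBOL 2: one parse/all rule that picks up the bracketed date and time together (space included) and then the address, using Steeve's stricter ip rule. The sample line is the log entry quoted earlier in the thread. The names line, stamp and addr are illustrative and not from florin's script.

```rebol
REBOL [Title: "Timestamp and IP from one log line (sketch)"]

digits: charset "0123456789"
ip: [1 3 digits 3 [#"." 1 3 digits]]   ; Steeve's stricter rule

line: {[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}

parse/all line [
    "[" copy stamp to "]" skip          ; date and time together, then step over "]"
    thru "IP address " copy addr ip     ; literal text containing spaces, then the address
    to end                              ; ignore the rest of the line
]

print stamp
print addr
```

Because /all turns off whitespace skipping, the space inside "IP address " and the one between date and time are treated like any other character. Converting stamp to a date! value is left out here, since the exact conversion depends on the interpreter version.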
Pekr 24-May-2010 [5016] | What is the problem with parsing date and time at the same time? I don't quite understand: does the solution do what you need, or do you need further help? |
Anton 24-May-2010 [5017] | lines: read/lines file foreach line lines [ ... ] |
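Editor's sketch, assuming REBOL 2 and log files in the current directory whose names contain "security.log" (as in florin's script). It combines Anton's read/lines loop with Steeve's ip rule and scans each line for addresses instead of splitting on whitespace first. The names new-list and addr are illustrative, and this has not been run against real Kerio logs.

```rebol
REBOL [Title: "Collect IPs line by line (sketch)"]

digits: charset "0123456789"
ip: [1 3 digits 3 [#"." 1 3 digits]]

new-list: copy []

foreach file read %. [
    if find file "security.log" [
        ; read/lines returns a block with one string per line
        foreach line read/lines file [
            parse/all line [
                ; try the ip rule at each position, otherwise advance one character
                any [copy addr ip (append new-list addr) | skip]
            ]
        ]
    ]
]

probe new-list
```

The "| skip" alternative is what lets the rule find an address anywhere in the line. A stricter version could anchor on the literal text "IP address " before the ip rule, which avoids matching the tail of a longer digit run.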
florin 24-May-2010 [5018] | Pekr, I'm done. Thanks. Initially I could not figure out how to escape the space character. |