World: r3wp
[Parse] Discussion of PARSE dialect
Ladislav 19-Apr-2010 [4961] | (I just tested, and your example is much slower than the code allocating and GC-ing the new string) |
Steeve 19-Apr-2010 [4962] | Yeah, it's true, it's slower. But if your app contains many loops with many copy/part calls at different locations, memory may grow insanely before the recycle. I saw that in many graphic apps with Rebol. |
Ladislav 19-Apr-2010 [4963] | "I saw that in many graphic apps with Rebol" - are you sure it was "before the recycle"? |
BrianH 19-Apr-2010 [4964x2] | Sometimes you don't want to put too much pressure on the GC, and sometimes you don't want to increase the total size of the pool too much, because that pool doesn't always get returned to the OS very quickly or at all. This is the motivation for additions like the /into option. |
We'll see how much optimizations like that need to be undone once we have to adjust for task safety :( | |
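Steeve's original snippet isn't preserved in this excerpt. As a rough sketch of the trade-off BrianH describes (assuming Rebol 2; `src`, `piece` and `buf` are hypothetical names), here is the difference between allocating a fresh string on each pass with `copy/part` and refilling one preallocated buffer. Per Ladislav's test, the reusing style is not necessarily faster; its point is to limit GC pressure and pool growth.

```rebol
; Hypothetical sample input; assumes Rebol 2 semantics
src: "15/May/2010 17:59:56 IP address 190.101.1.10"

; Allocating style: each pass creates a fresh string for the GC to reclaim
loop 3 [piece: copy/part src 11]

; Reusing style: one buffer is allocated up front and refilled on every pass
buf: make string! 64
loop 3 [
    clear buf                 ; empty the buffer, keeping the same string value
    insert/part buf src 11    ; copy the first 11 characters of src into it
]
print piece    ; 15/May/2010
print buf      ; 15/May/2010
```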
Maxim 19-Apr-2010 [4966x2] | the GC doesn't return the pool... only image data is ever returned AFAIK. |
and the GC doesn't kick in too quickly, or it would be really slow (just try recycle/torture to see ;-), so when you're doing serious work memory REALLY grows... although it does stabilize. For example, stats often shows 10 MB while my OS tells me it's actually using 24 MB, and that will never shrink back down. | |
florin 24-May-2010 [4968] | Is there a place for newbie questions on parsing? |
Terry 24-May-2010 [4969] | You've come to the right place. |
florin 24-May-2010 [4970x2] | I've created my very first script. The script loops through a list of email (Kerio) log files, extracts the IP addresses, compiles them into a list and adds them to a (PeerBlock) list in order to limit incoming spam. I find Rebol perfect for this. |
So an entry in the log file starts like this: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..." | |
Terry 24-May-2010 [4972] | aye |
florin 24-May-2010 [4973x3] | I improved the script by reading only the latest entries in the log, and I parse the date like this: parse/all txt [thru "[" copy found to "]" ] |
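A runnable version of that rule against the sample entry above, assuming Rebol 2 (`result` is an illustrative name). Without the trailing `to end`, `found` still gets set, but PARSE itself returns false because the input isn't fully consumed.

```rebol
txt: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
found: none
; THRU "[" skips past the opening bracket; COPY grabs everything up to "]".
; TO END consumes the rest, so the whole rule succeeds and PARSE returns true.
result: parse/all txt [thru "[" copy found to "]" to end]
print found    ; 15/May/2010 17:59:56
```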
So I get the job done. This is the question: if I use parse/all so that spaces are not skipped automatically, how do I include the space in my parse rule? | |
A rule can contain literals such as "=" or ",". How do I "escape" the space character so that I can include it in my rule? | |
Terry 24-May-2010 [4976x2] | I've always used the spaces as delimiters |
parse k [thru "[" copy date to " "] | |
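Extending Terry's idea, a sketch (Rebol 2 assumed; `log-date` and `log-time` are hypothetical names) that uses the space as a delimiter to pull the date and the time out separately:

```rebol
k: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
log-date: log-time: none
parse/all k [
    thru "[" copy log-date to " "    ; the date stops at the first space
    skip                             ; step over that space
    copy log-time to "]"             ; the time runs up to the closing bracket
    to end
]
print log-date    ; 15/May/2010
print log-time    ; 17:59:56
```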
NickA 24-May-2010 [4978] | I've used parse/all, and then used 'trim on the results. |
florin 24-May-2010 [4979] | Yes, that is exactly what I did and it works. However, for the sake of learning, how do I use the space character as part of my rule? |
Steeve 24-May-2010 [4980] | I don't see your point; show us the annoying rule... |
florin 24-May-2010 [4981x5] | Ok, I will. The point is that I want to include the space in my rule. Here's the example: |
digits: charset "0123456789" ip: [some digits "." some digits "." some digits "." some digits ] | |
This finds the IP in the log entry. What if I have two ip addresses and I want to pick them at the same time: ip: [some digits "." some digits "." some digits "." some digits __space__ some digits ...etc] | |
And the IP addresses are separated by a space? | |
My question really is, how do I escape the space character as one would in regular expressions? | |
Steeve 24-May-2010 [4986] | you need parse/all |
florin 24-May-2010 [4987] | correct, and then, how do you place the space in the rule: {} ? |
Steeve 24-May-2010 [4988x2] | #" " or " " or { } |
'space works too | |
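Putting florin's IP rule together with Steeve's `#" "` answer, a minimal sketch assuming Rebol 2 (the two-address string and the `pair`, `ip1`, `ip2`, `matched` names are hypothetical). With /all nothing is skipped implicitly, so the literal space in the rule is what matches the gap between the two addresses:

```rebol
digits: charset "0123456789"
ip: [some digits "." some digits "." some digits "." some digits]
pair: "190.101.1.10 10.0.0.1"    ; hypothetical input: two IPs separated by one space
ip1: ip2: none
; #" " is a char! literal that matches exactly one space
matched: parse/all pair [copy ip1 ip #" " copy ip2 ip]
print [matched ip1 ip2]    ; true 190.101.1.10 10.0.0.1
```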
Pekr 24-May-2010 [4990] | parse/all is really your friend - no unwanted surprises .... |
florin 24-May-2010 [4991] | That's it? 'space? That is easy. |
Steeve 24-May-2010 [4992] | sorry, it only exists in R3 |
florin 24-May-2010 [4993x2] | Yes, parse/all is great, and this is why I want to include the space not as a delimiter but as a character in the rule. For instance, sometimes I want to find two strings separated by a particular character. |
So #" " should do. I will try it. Thank you. | |
Steeve 24-May-2010 [4995] | but you can make your own easily ;-) space: #" " |
PeterWood 24-May-2010 [4996] |
>> a: "a b"
== "a b"
>> parse/all a ["a" " " "b"]
== true
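The literal forms mentioned above side by side, as a sketch assuming Rebol 2; `r1` to `r4` are illustrative names, and `space` is defined locally as Steeve suggests, which is harmless if your version already provides it:

```rebol
a: "a b"
space: #" "                       ; define SPACE yourself, as Steeve suggests
r1: parse/all a ["a" #" " "b"]    ; char! literal
r2: parse/all a ["a" " " "b"]     ; string literal
r3: parse/all a ["a" { } "b"]     ; braced string literal
r4: parse/all a ["a" space "b"]   ; word that refers to the char
print [r1 r2 r3 r4]    ; true true true true
```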
florin 24-May-2010 [4997] | My script works, but you know how it goes. Once a question creeps in the brain, it needs an answer. Thank you. |
Pekr 24-May-2010 [4998] | I would use #" ", or define a space rule first: spaces: charset " ^-" (optionally including tab) |
florin 24-May-2010 [4999] | Thank you all. |
Steeve 24-May-2010 [5000x2] | some space is even better to my mind |
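Pekr's charset combined with Steeve's `some`, sketched for Rebol 2 (the input string and the `entry`, `ip1`, `ip2`, `matched` names are hypothetical). The charset matches either a space or a tab, and `some` accepts a run of them of any length:

```rebol
digits: charset "0123456789"
spaces: charset " ^-"    ; a space and a tab; ^- is Rebol's escape for tab
ip: [some digits "." some digits "." some digits "." some digits]
entry: "190.101.1.10 ^- 10.0.0.1"    ; hypothetical: space, tab, space between the IPs
ip1: ip2: none
; SOME SPACES accepts any run of blanks, not just a single space
matched: parse/all entry [copy ip1 ip some spaces copy ip2 ip]
print [matched ip1 ip2]    ; true 190.101.1.10 10.0.0.1
```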
Terry 24-May-2010 [5002] |
>> spam: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
== {[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}
>> replace/all spam "]" "" replace/all spam "[" ""
== {15/May/2010 17:59:56 IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}
>> blk: parse/all spam " "
== ["15/May/2010" "17:59:56" "IP" "address" "190.101.1.10" "found" "in" "DNS" "blacklist" "SpamHaus" "SBL-XBL..."]
>> date: blk/1
== "15/May/2010"
>> time: blk/2
== "17:59:56"
>> ip: blk/5
== "190.101.1.10"
Pekr 24-May-2010 [5003] | nice .... |
Steeve 24-May-2010 [5004] | trim/with "[]" is a better way to remove those brackets... |
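Terry's session redone with Steeve's `trim/with` suggestion, as a Rebol 2 sketch (`log-date`, `log-time`, `log-ip` are hypothetical names):

```rebol
spam: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
; TRIM/WITH removes every "[" and "]" in one call. It modifies its argument,
; so it is applied to a copy here to leave SPAM unchanged.
blk: parse/all trim/with copy spam "[]" " "
log-date: blk/1    ; "15/May/2010"
log-time: blk/2    ; "17:59:56"
log-ip: blk/5      ; "190.101.1.10"
```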
Terry 24-May-2010 [5005] | rebol and logs are like bread and butter |
Pekr 24-May-2010 [5006] | yeah ... I used it often ... read/lines is your friend :-) |
Steeve 24-May-2010 [5007] | yep nothing in the world can beat this combination |
Terry 24-May-2010 [5008] | That's structured data like logs, but Rebol shines even when parsing unstructured data |
florin 24-May-2010 [5009x2] | Yep, this is my script to dig the IPs out of the log files. I was quite excited to see how natural it was to write this: |
digits: charset "0123456789"
ip: [some digits "." some digits "." some digits "." some digits]
newList: []
existingList: []
files: []
foreach file read %. [                  ; every entry in the current directory
    if find file "security.log" [       ; only the security logs
        words: parse read file none     ; split the log text on whitespace
        foreach it words [
            parse it [[ip] (append newList it)]    ; keep tokens that are IPs
        ]
    ]
]
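The core of florin's loop can be tried without any log files. This sketch assumes Rebol 2 (where `parse` with a `none` rule splits a string on whitespace) and uses a hypothetical stand-in for a real security.log: the sample entry repeated twice. The final `unique` is an addition that is not in the original script; it drops repeated addresses before they go to the block list.

```rebol
digits: charset "0123456789"
ip: [some digits "." some digits "." some digits "." some digits]
newList: copy []
; Hypothetical stand-in for one log file: the same entry on two lines
sample: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
log-text: rejoin [sample newline sample]
; PARSE with NONE splits the text on whitespace (Rebol 2 behavior)
foreach token parse log-text none [
    ; keep only tokens that match the IP rule from start to end
    if parse token ip [append newList token]
]
newList: unique newList    ; the same address appears on both lines
print mold newList    ; ["190.101.1.10"]
```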