World: r3wp

[Parse] Discussion of PARSE dialect

florin 24-May-2010 [4991]	That's it? 'space? That is easy.
Steeve 24-May-2010 [4992]	Sorry, it only exists in R3.
florin 24-May-2010 [4993x2]	Yes, parse/all is great, and this is why I want to include the space not as a delimiter but as a character in the rule. As if, sometimes I want to find two strings separated by a character.
florin 24-May-2010 [4993x2]	So #" " should do. I will try it. Thank you.
Steeve 24-May-2010 [4995]	but you can make your own easily ;-) space: #" "
PeterWood 24-May-2010 [4996]	>> a: "a b"
	== "a b"
	>> parse/all a ["a" " " "b"]
	== true
florin 24-May-2010 [4997]	My script works, but you know how it goes. Once a question creeps in the brain, it needs an answer. Thank you.
Pekr 24-May-2010 [4998]	I would use #" ", or define a whitespace rule first: spaces: charset " ^-" (this one also includes tab)
florin 24-May-2010 [4999]	Thank you all.
Steeve 24-May-2010 [5000x2]	some space is even better to my mind
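The suggestions so far can be tried side by side. This is a minimal Rebol 2 sketch; the words space and spaces follow the definitions proposed above, and the input strings are made up for illustration.

```rebol
; Rebol 2 sketch: a space as a character inside a parse/all rule
space:  #" "                 ; not predefined in R2, so define it
spaces: charset " ^-"        ; space or tab

parse/all "a b"    ["a" space "b"]          ; exactly one space      == true
parse/all "a   b"  ["a" some space "b"]     ; one or more spaces     == true
parse/all "a ^- b" ["a" some spaces "b"]    ; spaces and tabs mixed  == true
parse/all "ab"     ["a" some space "b"]     ; no space at all        == false
```

With /all, PARSE no longer treats whitespace as a delimiter, so the rule has to account for every space explicitly.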
Terry 24-May-2010 [5002]	>> spam: "[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL..."
	== {[15/May/2010 17:59:56] IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}
	>> replace/all spam "]" "" replace/all spam "[" ""
	== {15/May/2010 17:59:56 IP address 190.101.1.10 found in DNS blacklist SpamHaus SBL-XBL...}
	>> blk: parse/all spam " "
	== ["15/May/2010" "17:59:56" "IP" "address" "190.101.1.10" "found" "in" "DNS" "blacklist" "SpamHaus" "SBL-XBL..."]
	>> date: blk/1
	== "15/May/2010"
	>> time: blk/2
	== "17:59:56"
	>> ip: blk/5
	== "190.101.1.10"
Pekr 24-May-2010 [5003]	nice ....
Steeve 24-May-2010 [5004]	trim/with "[]" is better to remove things...
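For reference, TRIM/WITH takes the series first and the characters to remove second. A short Rebol 2 sketch on a shortened copy of the log line above:

```rebol
; trim/with removes every occurrence of the listed characters, in place
spam: "[15/May/2010 17:59:56] IP address 190.101.1.10 found"
trim/with spam "[]"
; spam is now "15/May/2010 17:59:56 IP address 190.101.1.10 found"
```

This replaces the two REPLACE/ALL calls with a single pass.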
Terry 24-May-2010 [5005]	rebol and logs are like bread and butter
Pekr 24-May-2010 [5006]	yeah ... I used it often ... read/lines is your friend :-)
Steeve 24-May-2010 [5007]	yep nothing in the world can beat this combination
Terry 24-May-2010 [5008]	Logs are structured data, but Rebol shines even when parsing unstructured data
florin 24-May-2010 [5009x3]	Yep, this is my script to dig the IPs out of the log files. I was quite excited to see how natural it was to write this:
	digits: charset "0123456789"
	ip: [some digits "." some digits "." some digits "." some digits]
	newList: []
	existingList: []
	files: []
	foreach file read %. [
	    if find file "security.log" [
	        split: parse read file none
	        foreach it split [
	            parse it [[ip] (append newList it)]
	        ]
	    ]
	]
	Then I said: read only from the last read, and parse the date/time. I wanted to parse date AND time at the same time: [15/May/2010 17:59:56] But I hit a snag because of the space in between. I don't want date and time separated, because Rebol can parse the string into a date-time easily. The space gave me trouble, and the brackets too.
Steeve 24-May-2010 [5012]	I would write the ip rule more strictly. ip: [1 3 digits 3 [#"." 1 3 digits]]
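A quick way to see what the stricter rule buys, as a Rebol 2 sketch with made-up test strings. Note that it only bounds the digit counts; it does not check that each group is in the range 0 to 255.

```rebol
digits: charset "0123456789"
ip: [1 3 digits 3 [#"." 1 3 digits]]    ; four groups of 1 to 3 digits

parse/all "190.101.1.10" ip    ; == true
parse/all "1234.1.1.1"   ip    ; == false, first group has four digits
parse/all "190.101.1"    ip    ; == false, only three groups
```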
florin 24-May-2010 [5013x3]	Using the charset did not work. So I did this: parse/all txt [thru "[" copy found to "]" ]. Then I said to myself it was time to learn more, but I could not find resources.
	g so helpful.
	Thanks.
Pekr 24-May-2010 [5016]	What is the problem in parsing date and time at the same time? I don't quite understand: does the solution do what you need, or do you need further help?
Anton 24-May-2010 [5017]	lines: read/lines file foreach line lines [ ... ]
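Combining the line-by-line loop with the stricter ip rule might look like the following Rebol 2 sketch. The function name collect-ips is hypothetical, and the sample lines stand in for the result of read/lines on a real log file.

```rebol
digits: charset "0123456789"
ip: [1 3 digits 3 [#"." 1 3 digits]]

; Hypothetical helper: gather every ip-shaped substring from a block of lines
collect-ips: func [lines [block!] /local found list] [
    list: copy []
    foreach line lines [
        ; try the ip rule at each position, otherwise move on one character
        parse/all line [any [copy found ip (append list found) | skip]]
    ]
    list
]

; With a real file this would be: collect-ips read/lines file
collect-ips ["IP address 190.101.1.10 found" "no address here"]
; == ["190.101.1.10"]
```

Scanning inside each line avoids splitting on whitespace first, so the brackets and the space in the timestamp never get in the way.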
florin 24-May-2010 [5018]	Pekr, I'm done. Thanks. Initially I could not figure out how to escape the space character.
Ladislav 24-May-2010 [5019]	The funny thing about it (which confused Pekr) is that you did not realize there was no need to escape the space character :-)
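For the date-and-time case discussed above, one possible approach is sketched here for Rebol 2. It assumes LOAD recognises 15/May/2010 as a date! and 17:59:56 as a time!; the words day, clock and stamp are illustrative only.

```rebol
txt: "[15/May/2010 17:59:56] IP address 190.101.1.10 found"

; copy everything between the brackets, space included
parse/all txt [thru "[" copy found to "]" to end]
; found is now "15/May/2010 17:59:56"

; LOAD on a string holding two values returns a block of both (assumption:
; the first loads as date!, the second as time!)
set [day clock] load found
stamp: day
stamp/time: clock      ; merge the time into the date value
```

No escaping is involved: under parse/all the space is just another character that TO "]" passes over.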
Terry 24-May-2010 [5020]	for Graham : delimeters = delimiters
florin 25-May-2010 [5021x2]	I should've read the entire core manual before posting. Chapter 15 addresses this clearly.
florin 25-May-2010 [5021x2]	Processing text and rules seems so natural to rebol. I think I'm going to enjoy this.
Gregg 25-May-2010 [5023]	Once you get used to PARSE, it's very hard to use other languages. :-)
Fork 11-Jul-2010 [5024x2]	I've uploaded my old Whitespace interpreter that I implemented in PARSE to GitHub: http://github.com/hostilefork/whitespacers/raw/master/rebol/whitespace.r
Fork 11-Jul-2010 [5024x2]	What's interesting about it is the comparison that can be made to implementations in other languages. I've collected implementations of whitespace interpreters in several other languages (by other people) in that project: http://github.com/hostilefork/whitespacers/
Anton 30-Jul-2010 [5026]	Ok, continuing the discussion from "Performance" group, I'd like to ask for some help with parsing rebol format files. Basically, I'd like to be able to extract a block near the beginning or end of a file, while minimizing disk access. The files to be parsed could be large, so I don't want to load the entire contents, but chunks at a time. So my parse rule should be able to detect when the input has been exhausted and ask for another chunk. (When extracting a block near the end of a file, I'll have to parse in reverse, but I'll try to implement that later.)
Oldes 30-Jul-2010 [5027]	And why don't you want to use LOAD/NEXT, as was advised?
Anton 30-Jul-2010 [5028x4]	Using LOAD/NEXT, I still have to use an O(n^2) algorithm. I'd now like to do my own parse, which can be O(n).
	As far as I know, there is no way to instruct LOAD/NEXT to only read chunks from the file as necessary to load the next value.
	Which is why, in that algorithm, I had to iteratively: load a chunk, append it and try LOAD/NEXT until it succeeded. Which gives the algorithm O(n^2) performance.
	My question for this O(n) parse algorithm is: What rebol syntax do I need to identify? I suppose all I need is:
	- Comments (lines beginning with semi-colon ; )
	- Strings (single-line "" and multi-line {} ), watching out for escape sequences, e.g. {^}}
	- Blocks
Oldes 30-Jul-2010 [5032]	yes
Anton 30-Jul-2010 [5033]	Have I missed anything ?
Oldes 30-Jul-2010 [5034x2]	So far, just that a comment may not start at the beginning of a line.
Oldes 30-Jul-2010 [5034x2]	and you want to get just the first block? Skipping any content before it?
Anton 30-Jul-2010 [5036x2]	Ok, yes, it may be preceded by whitespace. That seems easy to deal with.
Anton 30-Jul-2010 [5036x2]	Well, I'd like the algorithm to be general enough to get any number of blocks in the file. So far I need just the first, second and last blocks.
Oldes 30-Jul-2010 [5038]	And do you need to parse AltME's *.set files, or something else as well?
Anton 30-Jul-2010 [5039x2]	The *.set files are what I need it for, yes.
Anton 30-Jul-2010 [5039x2]	I imagine it could be useful in other similar situations, so I'd like it to be pretty general. I suppose a bonus functionality is to be able to get nested blocks. (And a super bonus will be to get any datatype at any level, but I won't bother doing that until I need it.)
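The syntax list discussed above (comments, both string forms with caret escapes, nested blocks) can be turned into parse rules roughly as follows. This is only a Rebol 2 sketch: every rule and function name is hypothetical, it works on text already in memory, and the chunk-at-a-time reading and reverse parsing that Anton asks about are not handled.

```rebol
; Character classes (all names are illustrative)
not-nl:     complement charset "^/"
str-char:   complement charset [#"^"" #"^^"]
brace-char: complement charset [#"{" #"}" #"^^"]
other:      complement charset [#"[" #"]" #"^"" #"{" #"}" #";"]

comment-rule: [#";" any not-nl]                                   ; ; to end of line
string-rule:  [#"^"" any [#"^^" skip | str-char] #"^""]           ; "..." with ^ escapes
brace-rule:   [#"{" any [#"^^" skip | brace-rule | brace-char] #"}"] ; {...}, nestable
block-rule:   [#"[" any item #"]"]                                ; [...], nestable
item:         [comment-rule | string-rule | brace-rule | block-rule | other]

; Return the source text of every top-level block, in order
find-blocks: func [text [string!] /local blocks blk] [
    blocks: copy []
    parse/all text [
        any [
            copy blk block-rule (append blocks blk)
            | comment-rule | string-rule | brace-rule | skip
        ]
    ]
    blocks
]

src: {REBOL [title: "x ] y"] ; a [comment]^/[1 [2] {a ] b}]}
foreach b find-blocks src [print b]
```

On this sample the brackets inside the string, the comment and the brace string are skipped, so two top-level blocks should be reported. Each character is visited a bounded number of times in the common case, which is the linear behaviour the discussion is after; extending it to refill the input when a rule runs out of text is left open.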