Parsing out strings

[1/7] from: ml:sproingle at: 29-Apr-2004 17:55

I am pretty new to REBOL and am having a bit of trouble getting my head around how to parse out strings from lines I am reading in from session logs. I am looking at "clear find" and "remove/part" and things like this, but I am getting very confused as to what to use when; I am more of a left$, mid$, right$ kind of guy.

Perhaps I can illustrate. Suppose the line I am parsing says:

    the user jsmith logged in at 4.30pm

Can you give me a clue what I would use to parse in each of these circumstances, where the desired text is:

1. "the user"
2. "the user jsmith"
3. "jsmith"
4. "jsmith logged in at 4.30pm"
5. "logged in at 4.30pm"

Obviously the jsmith part would change for each line, so the parsing would probably need to look for "the user " and then for the next space after that, which should come right after the name.

Anyone give me a clue?

Thanks
Stuart

[2/7] from: Gary:Jones:usap:gov at: 30-Apr-2004 11:13

From: ML ....

> Suppose the line I am parsing says:
> "the user jsmith logged in at 4.30pm"

<<quoted lines omitted: 6>>

> 4. "jsmith logged in at 4.30pm"
> 5. "logged in at 4.30pm"

Hi, Stuart,

You will get a hundred ways to do this. In order to illustrate the more general principles, I offer the following ways (watch for line wrap). The first is a bit "brute force":

    data: "the user jsmith logged in at 4.30pm"
    parse-bits: parse data none   ; split the line into a block of words
    probe rejoin [parse-bits/1 " " parse-bits/2]
    probe rejoin [parse-bits/1 " " parse-bits/2 " " parse-bits/3]
    probe parse-bits/3
    probe rejoin [parse-bits/3 " " parse-bits/4 " " parse-bits/5 " " parse-bits/6 " " parse-bits/7]
    probe rejoin [parse-bits/4 " " parse-bits/5 " " parse-bits/6 " " parse-bits/7]

This first method relies on the number of words in each log record remaining the same.

The second method is more rule based, and it copies sections based on the rules:

    data: "the user jsmith logged in at 4.30pm"
    parse/all data [
        copy p1 thru "the user"
        copy p2 to "logged"
        copy p3 to end
    ]
    trim p2   ; p2 is copied with its surrounding spaces
    print [p1 p2 p3]

And finally, a variation that demonstrates a slightly different way with more parse commands:

    data: "the user jsmith logged in at 4.30pm"
    parse/all data [
        copy p1 thru "the user"
        skip
        copy p2 to " "
        skip
        copy p3 to end
    ]
    print [p1 p2 p3]

Hope that helps get you started on the REBOL ways to parse.

--Scott Jones
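The third rule above can be wrapped so that it is applied line by line and reports lines that do not match. What follows is only a sketch under assumptions: the function name parse-login is hypothetical (it does not appear in the thread), and it assumes every login line begins with the literal text "the user ".

```rebol
; Hypothetical helper, not from the thread: returns a block [name rest]
; for a matching line, or none when the rule does not match.
parse-login: func [line [string!] /local name rest] [
    either parse/all line [
        "the user "          ; fixed prefix, assumed present on login lines
        copy name to " "     ; the user name runs up to the next space
        skip                 ; step over that space
        copy rest to end     ; everything that remains
    ][
        reduce [name rest]
    ][
        none
    ]
]

probe parse-login "the user jsmith logged in at 4.30pm"
probe parse-login "the sysadmin root mailed fred at 9.00am"
```

PARSE returns true only when the whole input matches the rule, so the first call should yield the name and the remainder, while the second line (which has no "the user " prefix) should yield none.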

[3/7] from: maximo:meteorstudios at: 29-Apr-2004 19:33

Using a left/right/mid mentality... use COPY and AT. So here we go, with:

str: "the user jsmith logged in at 4.30pm"

> 1. "the user"

copy/part str 8

> 2. "the user jsmith"

copy/part str 15

> 3. "jsmith"

copy/part at str 10 6

> 4. "jsmith logged in at 4.30pm"

copy at str 10

> 5. "logged in at 4.30pm"

copy at str 17

Now I'd say there is a good chance that there is a better "SOLUTION" to the problem, given a better problem description. If the ordering of the log file is extremely constant, or if some words always describe the data that follows them, then I'd rather do:

blk: parse/all str " "

>> user: third blk

== "jsmith"

>> priviledge: select blk "the"

== "user"

>> user: select blk priviledge

== "jsmith"

>> time: to-time replace select blk "at" "." ":"

== 16:30

>> action: pick intersect blk ["print" "mailed" "logged" "browsed"] 1

== "logged"

I used pick in case no matching action was found, in which case the word 'first would crash, whereas 'pick returns none instead.

>> sub-action: select blk action

== "in"

Here again, if the select fails, then select returns none.

Now if we replace the string by:

the sysadmin root mailed fred at 9.00am

then all the rules still hold, yet the copy/part code earlier becomes completely useless...

You see, in REBOL the problem is not in solving but in analysis: analysing the data patterns. By doing this properly (and knowing a few REBOL words), REBOL's expressive power can be fully harnessed.

Extending the above, the user name could be optional, like so:

str: "the sysadmin mailed fred at 9.00am"

The following expression would return a default user if none was given:

users: ["root" "mary" "john"]
blk: parse/all str " "
priviledge: select blk "the"
user: either (offset: find users select blk priviledge) [
    first offset
][
    select ["user" "guest" "sysadmin" "root"] priviledge
]

have fun :-)

HTH!!!!

-MAx
---
You can either be part of the problem or part of the solution, but in the end, being part of the problem is much more fun.
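To check the claim that the keyword-based rules hold for both sample lines, the SELECT lookups can be gathered into one function. This is a sketch only: the name describe-line and the choice of which three fields to return are assumptions made for illustration, not something from the thread.

```rebol
; Hypothetical helper: splits a log line on spaces and looks fields up
; by the word that precedes them, as in the message above.
describe-line: func [str [string!] /local blk role] [
    blk: parse/all str " "
    role: select blk "the"       ; word after "the", e.g. "user"
    reduce [
        role
        select blk role          ; word after the role, i.e. the name
        select blk "at"          ; word after "at", i.e. the time text
    ]
]

probe describe-line "the user jsmith logged in at 4.30pm"
probe describe-line "the sysadmin root mailed fred at 9.00am"
```

Because each field is found relative to a keyword rather than a character position, the same function should cope with both lines even though the words have different lengths.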

[4/7] from: greggirwin:mindspring at: 29-Apr-2004 21:42

Hi Stuart,

M> I am looking at "clear find" and "remove/part" and things like this
M> but I am getting very confused as to what to use when, I am more of
M> a left$, mid$, right$ kind of guy.

I specialized in VB for a loooong time, so I know where you're coming from. Let's start with some simple functions:

    left: func [s len][copy/part s len]
    right: func [s len][copy skip tail s negate len]
    mid: func [s start len][copy/part at s start len]
    set-mid: func [
        series [series!]
        new-data
        start [integer!]
        /part
            len [integer!]
    ][
        len: any [len length? new-data]
        head change/part at series start new-data len
    ]

I thought I would use these a lot (which is why I wrote them when I first found REBOL), but I never have. :\

These should all be obvious, though set-mid with /part is something you'll want to play with to make sure you understand it. e.g.

>> s: "the user jsmith logged in at 4.30pm"

== "the user jsmith logged in at 4.30pm"

>> set-mid s "A" 10

== "the user Asmith logged in at 4.30pm"

- versus -

>> s: "the user jsmith logged in at 4.30pm"

== "the user jsmith logged in at 4.30pm"

>> set-mid/part s "A" 10 6

== "the user A logged in at 4.30pm"

Now, FIND is something else you'll want to play with a bit (the console is great for this), because there are so many refinements. In your case, you might also confuse it with how InStr works in VB, so here's a replacement for that as well (just wrote it here, trying to emulate InStr; test it well):

    instr: func [
        str-1 "Search this"
        str-2 "look for this"
        /start
            pos [integer!] "start looking here"
        /case "be case sensitive"
        /local loc
    ][
        pos: any [pos 1]
        if any [none? str-1 none? str-2] [return none]
        loc: either case [
            find/case at str-1 pos str-2
        ][
            find at str-1 pos str-2
        ]
        either loc [index? loc][0]
    ]

Now that you have those, you probably won't need them, as I see you've gotten a couple of answers already. :)

PARSE is really great once you get used to it.

Happy REBOLing!

-- Gregg
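Since the original question asked about "clear find" and "remove/part", and about a user name that varies from line to line, here is a rough sketch of how FIND can supply the positions instead of hard-coded numbers. The variable names are made up for illustration, and the sketch assumes the line contains both "the user " and " logged".

```rebol
s: "the user jsmith logged in at 4.30pm"

; FIND/TAIL returns the position just past the match, or none.
name-start: find/tail s "the user "

if name-start [
    ; End of the name: the next space, or the tail if there is none.
    name-end: any [find name-start " " tail name-start]
    name: copy/part name-start name-end   ; COPY/PART accepts a position as limit
    rest: copy next name-end              ; text after the space
]

; CLEAR and REMOVE modify their argument, so work on a COPY of the line.
; CLEAR returns the (now empty) tail position, hence HEAD.
kept-front: head clear find copy s " logged"
kept-back: remove/part copy s 9

probe reduce [name rest kept-front kept-back]
```

With the sample line this should give "jsmith", "logged in at 4.30pm", "the user jsmith" and "jsmith logged in at 4.30pm". The first two keep working when the name changes length, whereas the 9 passed to REMOVE/PART only holds while the prefix stays "the user ".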

[5/7] from: ml::sproingle::com at: 30-Apr-2004 8:58

Thanks for that Scott, I hadn't read up on parse, but I will do now. Stuart

[6/7] from: ml:sproingle at: 30-Apr-2004 18:20

Thanks for the tutorial Max. Stuart

[7/7] from: maximo:meteorstudios at: 30-Apr-2004 19:01

On a related topic, I'd like to restate/comment on something said this week.

In REBOL, there are usually very different ways to achieve the same goals: not just in the functions to use, but in the raw in-code ideology of how to plug together the outputs of all of the fancy series handlers.

Some will optimise for code readability and sharability, others will concentrate on raw speed, and yet other solutions will tend to concentrate on code size and "denseness".

This is because in REBOL everything (well, almost) is an expression and reuses the SAME basic expression (coding) rules.

That's why I said that knowing what you want to do, and within what parameters, actually defines the best course of action.

Now go and learn all of those words in the dictionary. Also, ALWAYS remember to check out the refinements. Hey, even I revisit it now and then to refresh my memory.

have fun...

-MAx
---
You can either be part of the problem or part of the solution, but in the end, being part of the problem is much more fun.

Notes

Quoted lines have been omitted from some messages.