Parsing out strings
[1/7] from: ml:sproingle at: 29-Apr-2004 17:55
I am pretty new to REBOL and am having a bit of trouble getting my head around how to
parse strings out of lines I am reading in from session logs.
I am looking at "clear find" and "remove/part" and things like this, but I am getting
very confused as to what to use when. I am more of a left$, mid$, right$ kind of guy.
Perhaps I can illustrate.
Suppose the line I am parsing says:
the user jsmith logged in at 4.30pm
Can you give me a clue as to what I would use in each of these cases, where
the desired text is:
1. "the user"
2. "the user jsmith"
3. "jsmith"
4. "jsmith logged in at 4.30pm"
5. "logged in at 4.30pm"
Obviously the jsmith part would change for each line, so the parsing would probably need
to be based on looking for "the user " and then looking for the next space after that,
which should come right after the name.
Can anyone give me a clue?
Thanks
Stuart
[2/7] from: Gary:Jones:usap:gov at: 30-Apr-2004 11:13
From: ML
....
> Suppose the line I am parsing says:
> "the user jsmith logged in at 4.30pm"
<<quoted lines omitted: 6>>
> 4. "jsmith logged in at 4.30pm"
> 5. "logged in at 4.30pm"
Hi, Stuart,
You will get a hundred ways to do this. In order to illustrate the more general principles,
I offer the following ways (watch for line wrap). The first is a bit "brute force":
data: "the user jsmith logged in at 4.30pm"
parse-bits: parse data none
probe rejoin [parse-bits/1 " " parse-bits/2]
probe rejoin [parse-bits/1 " " parse-bits/2 " " parse-bits/3]
probe parse-bits/3
probe rejoin [parse-bits/3 " " parse-bits/4 " " parse-bits/5 " " parse-bits/6 " " parse-bits/7]
probe rejoin [parse-bits/4 " " parse-bits/5 " " parse-bits/6 " " parse-bits/7]
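A shorter variant of the same brute-force idea, as a sketch assuming REBOL 2 (where PARSE with NONE splits a string on whitespace): FORM joins the strings of a block with single spaces, so slicing the word block with COPY/PART and AT can replace the long REJOIN chains. The names BITS and R1 to R5 are only illustrative.

```rebol
REBOL [Title: "Word-split extraction with FORM (sketch)"]

; Assumes REBOL 2: PARSE with NONE splits on whitespace.
data: "the user jsmith logged in at 4.30pm"
bits: parse data none    ; a block of the seven words of the line

; FORM joins the strings of a block with single spaces,
; so slicing the block replaces the REJOIN chains.
r1: form copy/part bits 2    ; first two words
r2: form copy/part bits 3    ; first three words
r3: copy pick bits 3         ; the user name alone
r4: form at bits 3           ; from the user name to the end
r5: form at bits 4           ; everything after the user name

foreach r reduce [r1 r2 r3 r4 r5] [probe r]
```

Like the original, this still depends on every log line having its words in the same positions.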
This first method relies on every record in the log having the same number of words.
The second method is more rule based; it copies sections according to
the rules:
data: "the user jsmith logged in at 4.30pm"
parse/all data [
    copy p1 thru "the user"
    copy p2 to "logged"
    copy p3 to end
]
trim p2
print [p1 p2 p3]
and finally a variation that demonstrates a slightly different way with more parse commands:
data: "the user jsmith logged in at 4.30pm"
parse/all data [
    copy p1 thru "the user"
    skip
    copy p2 to " "
    skip
    copy p3 to end
]
print [p1 p2 p3]
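Since the question asked for five specific strings, here is a sketch (assuming REBOL 2 PARSE/ALL and exactly one space on each side of the name) showing that the three pieces from the last rule are enough to build all of them with REJOIN. The PIECES block is an illustrative name.

```rebol
REBOL [Title: "All five pieces from one PARSE pass (sketch)"]

; Assumes REBOL 2 PARSE/ALL semantics and single spaces around the name.
data: "the user jsmith logged in at 4.30pm"
parse/all data [
    copy p1 thru "the user"   ; "the user"
    skip                      ; the space after "user"
    copy p2 to " "            ; the name, up to the next space
    skip                      ; the space after the name
    copy p3 to end            ; the rest of the line
]
pieces: reduce [
    p1                        ; 1. "the user"
    rejoin [p1 " " p2]        ; 2. "the user jsmith"
    p2                        ; 3. "jsmith"
    rejoin [p2 " " p3]        ; 4. "jsmith logged in at 4.30pm"
    p3                        ; 5. "logged in at 4.30pm"
]
foreach p pieces [probe p]
```

If a line does not contain "the user", the rule fails, so in a real log loop you would want to check the result of PARSE before using P1 to P3.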
Hope that helps get you started on the REBOL ways to parse.
--Scott Jones
[3/7] from: maximo:meteorstudios at: 29-Apr-2004 19:33
if you want to keep the left/right/mid mentality...
use COPY and AT.
so here we go:
with
str: "the user jsmith logged in at 4.30pm"
> 1. "the user"
copy/part str 8
> 2. "the user jsmith"
copy/part str 15
> 3. "jsmith"
copy/part at str 10 6
> 4. "jsmith logged in at 4.30pm"
copy at str 10
> 5. "logged in at 4.30pm"
copy at str 17
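The character counts above (8, 15, 10, 17) only fit a six-letter name. A sketch of the same COPY/AT idea with the positions computed by FIND instead, assuming the line always contains "the user " followed by a name and a space; NAME-START and NAME-END are illustrative names. COPY/PART also accepts a series position as its range end, which is what makes this work.

```rebol
REBOL [Title: "COPY and AT with computed positions (sketch)"]

str: "the user jsmith logged in at 4.30pm"

; Locate the name instead of hard-coding character counts.
; Assumes "the user " is present; otherwise FIND/TAIL returns none.
name-start: find/tail str "the user "   ; series positioned at "jsmith ..."
name-end: find name-start " "           ; series positioned at " logged ..."

r1: copy/part str back name-start   ; 1. up to the space before the name
r2: copy/part str name-end          ; 2. up to the end of the name
r3: copy/part name-start name-end   ; 3. the name alone
r4: copy name-start                 ; 4. from the name to the end of the line
r5: copy next name-end              ; 5. after the space that follows the name
```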
now I'd say that there is a good chance that there is a better "SOLUTION" to the problem,
given a better problem description.
if the ordering of the log file is extremely constant or if some words always describe
the following data, then I'd rather do:
blk: parse/all str " "
>> user: third blk
== "jsmith"
>> priviledge: select blk "the"
== "user"
>> user: select blk priviledge
== "jsmith"
>> time: to-time replace select blk "at" "." ":"
== 16:30
>> action: pick intersect blk ["print" "mailed" "logged" "browsed"] 1
== "logged"
I used pick in case no matching action is found, in which case 'first would
crash, whereas 'pick returns none instead.
>> sub-action: select blk action
== "in"
here again, if the select fails, then select returns none.
now if we replace the string by:
the sysadmin root mailed fred at 9.00am
then all the rules still hold, yet the copy/part code from earlier becomes completely useless...
you see, in rebol the problem is not in the solving but in the analysis: analysing the data
patterns. By doing this properly (and knowing a few rebol words), rebol's expressive
power can be fully harnessed.
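To check the claim that the keyword rules survive a different line, here is a sketch that wraps them in a hypothetical LOG-FIELDS function (not part of REBOL) and applies it to both example lines. It assumes REBOL 2, where PARSE/ALL with a string rule splits on that character, and that TO-TIME accepts "4:30pm"-style strings, as in the console session above.

```rebol
REBOL [Title: "Keyword-driven log-line fields (sketch)"]

; Hypothetical helper: returns [privilege user action time].
log-fields: func [line [string!] /local blk priv when][
    blk: parse/all line " "
    priv: select blk "the"              ; word after "the", e.g. "user"
    when: select blk "at"               ; e.g. "4.30pm", or none
    reduce [
        priv
        select blk priv                 ; word after the privilege: the name
        pick intersect blk ["print" "mailed" "logged" "browsed"] 1
        either when [to-time replace copy when "." ":"][none]
    ]
]

probe log-fields "the user jsmith logged in at 4.30pm"
probe log-fields "the sysadmin root mailed fred at 9.00am"
```

COPY is used before REPLACE so the time string inside BLK is not modified in place.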
to extend the above, suppose the user name is optional, like so:
str: "the sysadmin mailed fred at 9.00am"
The following expression would return a default user if none was given:
users: ["root" "mary" "john"]
blk: parse/all str " "
priviledge: select blk "the"
user: either (offset: find users select blk priviledge) [
    first offset
][
    select ["user" "guest" "sysadmin" "root"] priviledge
]
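The same default-user expression, wrapped in a hypothetical USER-OF function (a sketch assuming REBOL 2 PARSE/ALL splitting) so it can be tried on a line with and without a name. Note that a name missing from USERS, such as "jsmith", also falls back to the default for its privilege.

```rebol
REBOL [Title: "User name with a per-privilege default (sketch)"]

users: ["root" "mary" "john"]                  ; known user names
defaults: ["user" "guest" "sysadmin" "root"]   ; privilege -> default user

; Hypothetical helper around the expression from the post.
user-of: func [str [string!] /local blk priv offset][
    blk: parse/all str " "
    priv: select blk "the"
    either offset: find users select blk priv [
        first offset                ; the word after the privilege is a known user
    ][
        select defaults priv        ; otherwise fall back to the default
    ]
]

probe user-of "the sysadmin mailed fred at 9.00am"   ; no name given
probe user-of "the user mary logged in at 4.30pm"    ; name given
```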
have fun :-)
HTH!!!!
-MAx
---
You can either be part of the problem or part of the solution, but in the end, being
part of the problem is much more fun.
[4/7] from: greggirwin:mindspring at: 29-Apr-2004 21:42
Hi Stuart,
M> I am looking at "clear find" and "remove/part" and things like this
M> but I am getting very confused as to what to use when, I am more of
M> a left$, mid$, right$ kind of guy.
I specialized in VB for a loooong time, so I know where you're coming
from. Let's start with some simple functions:
left: func [s len][copy/part s len]
right: func [s len][copy skip tail s negate len]
mid: func [s start len][copy/part at s start len]
set-mid: func [
    series [series!]
    new-data
    start [integer!]
    /part len [integer!]
][
    len: any [len length? new-data]
    head change/part at series start new-data len
]
I thought I would use these a lot (which is why I wrote them when I
first found REBOL), but I never have. :\
These should all be obvious, though set-mid with /part is something
you'll want to play with to make sure you understand it. e.g.
>> s: "the user jsmith logged in at 4.30pm"
== "the user jsmith logged in at 4.30pm"
>> set-mid s "A" 10
== "the user Asmith logged in at 4.30pm"
- versus -
>> s: "the user jsmith logged in at 4.30pm"
== "the user jsmith logged in at 4.30pm"
>> set-mid/part s "A" 10 6
== "the user A logged in at 4.30pm"
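For comparison with the original question, here is a sketch applying the LEFT, MID and RIGHT helpers defined above to the five requested strings. The counts are hard-coded for this particular 35-character line, so, like left$/mid$/right$ in VB, they would need recomputing for other lines.

```rebol
REBOL [Title: "LEFT$/MID$/RIGHT$-style helpers in use (sketch)"]

left:  func [s len][copy/part s len]
right: func [s len][copy skip tail s negate len]
mid:   func [s start len][copy/part at s start len]

s: "the user jsmith logged in at 4.30pm"
probe left s 8        ; 1. "the user"
probe left s 15       ; 2. "the user jsmith"
probe mid s 10 6      ; 3. "jsmith"
probe right s 26      ; 4. "jsmith logged in at 4.30pm"
probe right s 19      ; 5. "logged in at 4.30pm"
```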
Now, FIND is something else you'll want to play with a bit (the
console is great for this), because there are so many refinements. In
your case, you might also confuse it with how InStr works in VB, so
here's a replacement for that as well (just wrote it here, trying to
emulate InStr; test it well):
instr: func [
    str-1 "Search this"
    str-2 "look for this"
    /start pos [integer!] "start looking here"
    /case "be case sensitive"
    /local loc
][
    pos: any [pos 1]
    if any [none? str-1 none? str-2] [return none]
    loc: either case [
        find/case at str-1 pos str-2
    ][
        find at str-1 pos str-2
    ]
    either loc [index? loc][0]
]
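A sketch of the VB-style approach the original question described, using INSTR and MID together to find "the user " and then the next space. NAME-POS and SPACE-POS are illustrative names; the parentheses matter, because without them REBOL would try to add a number to the string "the user ".

```rebol
REBOL [Title: "VB-style name extraction with INSTR and MID (sketch)"]

mid: func [s start len][copy/part at s start len]
instr: func [str-1 str-2 /start pos [integer!] /case /local loc][
    pos: any [pos 1]
    if any [none? str-1 none? str-2] [return none]
    loc: either case [find/case at str-1 pos str-2][find at str-1 pos str-2]
    either loc [index? loc][0]
]

s: "the user jsmith logged in at 4.30pm"
name-pos: (instr s "the user ") + length? "the user "   ; 1-based start of the name
space-pos: instr/start s " " name-pos                  ; the space after the name
name: mid s name-pos space-pos - name-pos
probe name
```

As with InStr, a 0 result means "not found", so real code should check for that before doing the arithmetic.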
Now that you have those, you probably won't need them as I see you've
gotten a couple answers already. :) PARSE is really great once you get
used to it.
Happy REBOLing!
-- Gregg
[5/7] from: ml::sproingle::com at: 30-Apr-2004 8:58
Thanks for that Scott, I hadn't read up on parse, but I will do now.
Stuart
[6/7] from: ml:sproingle at: 30-Apr-2004 18:20
Thanks for the tutorial Max.
Stuart
[7/7] from: maximo:meteorstudios at: 30-Apr-2004 19:01
on a related topic, I'd like to restate/comment on something said this week:
in rebol, there are usually very different ways to achieve the same goal, not just
in which functions to use, but in the raw in-code ideology of how to plug together the outputs
of all the fancy series handlers.
Some will optimise for code readability and sharability, others will concentrate on raw
speed, and yet other solutions will concentrate on code size and "denseness".
this is because in rebol (almost) everything is an expression and reuses the
SAME basic expression (coding) rules.
So that's what I meant: knowing what you want to do, and within what parameters, is what actually
defines the best course of action.
now go and learn all of those words in the dictionary, and ALWAYS remember to
check out the refinements. hey, even I revisit it now and then to refresh my memory.
have fun...
-MAx
Notes
- Quoted lines have been omitted from some messages.