Processing files / Any AWK users?
[1/7] from: greggirwin::mindspring::com at: 18-Nov-2001 20:29
If I'm processing a file, by lines, thus:
foreach line read/lines/with file rec-sep [
It appears that foreach doesn't call read on each pass, but I thought I'd
ask to see if anyone knows, for certain. I know you can modify the block
that foreach uses and it will see the changes, but I wasn't sure what magic
occurs in this scenario.
The real question, I suppose, is: Should I cache the return value from 'read
before using it with foreach, or is that unnecessary?
As a side question, does anyone on the list use AWK or think that something
like it (i.e. an auto-driven rule/action file processor) would be useful?
--Gregg
[2/7] from: tomc:darkwing:uoregon at: 18-Nov-2001 21:00
On Sun, 18 Nov 2001, Gregg Irwin wrote:
> If I'm processing a file, by lines, thus:
> foreach line read/lines/with file rec-sep [
<<quoted lines omitted: 4>>
> The real question, I suppose, is: Should I cache the return value from 'read
> before using it with foreach, or is that unnecessary?
I do this a lot, and have worked on the assumption that it is best
(metric: [walltime disk-io]) to read once at the beginning and write once
at the end, and use as many in-memory buffers as I need in between.
> As a side question, does anyone on the list use AWK or think that something
> like it (i.e. an auto-driven rule/action file processor) would be useful?
>
What is parse not doing?
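A minimal sketch of the read-once, write-once pattern described above, assuming REBOL 2 semantics. The file names and the uppercase transformation are placeholders for illustration, not anything from the thread:

```rebol
REBOL []

in-file:  %input.txt     ; hypothetical input file
out-file: %output.txt    ; hypothetical output file

lines: read/lines in-file            ; one read: a block of strings
out: make block! length? lines       ; in-memory buffer for the results

foreach line lines [
    ; placeholder transformation; COPY leaves the original line untouched
    append out uppercase copy line
]

write/lines out-file out             ; one write at the end
```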
[3/7] from: brett:codeconscious at: 19-Nov-2001 16:04
Hi Gregg,
> If I'm processing a file, by lines, thus:
>
> foreach line read/lines/with file rec-sep [
>
> It appears that foreach doesn't call read on each pass, but I thought I'd
> ask to see if anyone knows, for certain. I know you can modify the block
> that foreach uses and it will see the changes, but I wasn't sure what magic
> occurs in this scenario.
>
> The real question, I suppose, is: Should I cache the return value from 'read
> before using it with foreach, or is that unnecessary?
No, foreach does not call read on each pass. Read returns a series, which is
then given to foreach as a parameter.
By cache I assume you mean something like:
the-lines: read/lines/with file rec-sep
foreach line the-lines [...]
You would only do this if you need to access those lines again in your
script. I doubt there is any memory effect other than the creation of a
single word "the-lines" and the prevention of the series being garbage
collected.
No opinion on the second question due to lack of knowledge :)
Brett.
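One way to check this for yourself, as a small sketch assuming REBOL 2. The get-lines function and the calls counter are hypothetical stand-ins for read/lines, used so that each evaluation can be counted:

```rebol
REBOL []

calls: 0

; Stand-in for read/lines: counts how often it is evaluated
get-lines: does [
    calls: calls + 1
    ["alpha" "beta" "gamma"]
]

; get-lines is evaluated while FOREACH collects its arguments,
; before the first pass over the body
foreach line get-lines [print line]

print ["get-lines was evaluated" calls "time(s)"]
```

If foreach re-evaluated its series argument on every pass, the counter would grow with the number of lines.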
[4/7] from: joel:neely:fedex at: 19-Nov-2001 6:47
Hi, Gregg,
(popping my head up from the project from the black lagoon for a mo')
Gregg Irwin wrote:
> If I'm processing a file, by lines, thus:
>
> foreach line read/lines/with file rec-sep [
>
> It appears that foreach doesn't call read on each pass, but I thought
> I'd ask to see if anyone knows, for certain. I know you can modify
> the block that foreach uses and it will see the changes, but I wasn't
> sure what magic occurs in this scenario.
>
There's no interaction at all between FOREACH and READ. The above code
is exactly equivalent to
tempfoo: read/lines/with file rec-sep
foreach line tempfoo [
(except that this latter case burns up a word name). Another way to
think of it is
foreach line (read/lines/with file rec-sep) [
to emphasize that READ/LINES just creates a block of strings. Then (on
completion) that block is used as the second arg to FOREACH.
> The real question, I suppose, is: Should I cache the return value
> from 'read before using it with foreach, or is that unnecessary?
>
Totally unnecessary (and actually wastes a few nanoseconds ;-) unless
you want to do something else with the saved block after FOREACH gets
through doing its thing (in which case you already know that the
answer is "Yes").
> As a side question, does anyone on the list use AWK or think that
> something like it (i.e. an auto-driven rule/action file processor)
> would be useful?
>
It would be marginally useful to me as a keystroke-saver, but a
brute-force equivalent in REBOL would (IMHO) only result in
replacing something vaguely like
awk-like-func: func [one-line [string!] ...] [
    if parse/all/case one-line [ ;...first parse rule...
    ][
        ;...corresponding action...
        exit
    ]
    if parse/all/case one-line [ ;...next parse rule...
    ][
        ;...corresponding action...
        exit
    ]
    ;...
    if parse/all/case one-line [ ;...last parse rule...
    ][
        ;...corresponding action...
        exit
    ]
    ;...error or inaction...
]
foreach myline read/lines myfile [awk-like-func myline]
with
do %awklib.r
awk-plan: [
    [ ;...local words for rules/actions]
    [ ;...first parse rule...] [ ;...corresponding action...]
    [ ;...next parse rule... ] [ ;...corresponding action...]
    ;...
    [ ;...last parse rule... ] [ ;...corresponding action...]
]
awk-main myfile awk-plan
which would be handy but not a huge win. I guess it just depends on
whether most of one's parsing/action tasks are line oriented or not.
However, I don't think AWK-PLAN would be too challenging to write.
-jn-
--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
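As a rough illustration of how a plan-driven driver along these lines might look, here is a minimal sketch assuming REBOL 2. It is not Joel's code: the body of awk-main, the global word record, and the sample plan are all hypothetical, and the locals block at the head of the plan is simply skipped:

```rebol
REBOL []

; Hypothetical sketch of AWK-MAIN.
; PLAN layout: [locals-block rule action rule action ...]
record: none    ; current line, global so that action blocks can see it

awk-main: func [
    source [file! block!] "File to read, or a block of strings for testing"
    plan [block!]
    /local lines
][
    lines: either block? source [source] [read/lines source]
    foreach line lines [
        record: line
        ; NEXT skips the locals block, which this sketch ignores
        foreach [rule action] next plan [
            if parse/all/case line rule [
                do action
                break    ; stop at the first matching rule
            ]
        ]
    ]
]

comments: 0
awk-main ["# header" "ok" "disk ERROR 5"] [
    []                                   ; no local words
    ["#" to end]         [comments: comments + 1]
    [to "ERROR" to end]  [print ["error in:" record]]
]
print ["comment lines:" comments]
```

The break after the first matching action mirrors the exit calls in the hand-written version: each line is handled by at most one rule.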
[5/7] from: greggirwin:mindspring at: 19-Nov-2001 11:34
Thanks Tom,
<< what is parse not doing? >>
Parse is *great*. I'm talking about a slightly more specialized tool that
wraps up the iteration and some parse aspects for simple file processing.
I may, indeed, find that there's a nicer way to do it with a dialect but my
initial ideas on that didn't pan out. That is to say, the rule/action
combination provided a more concise and clearer approach to me.
--Gregg
[6/7] from: greggirwin:mindspring at: 19-Nov-2001 11:34
Thanks Joel!
<<quoted lines omitted: 33>>
> which would be handy but not a huge win. I guess it just depends on
> whether most of one's parsing/action tasks are line oriented or not.
> However, I don't think AWK-PLAN would be too challenging to write.
Right. In order to be useful it should provide code savings or help to make
programs clearer and more self-documenting.
The first thing it does is save you writing two foreach loops, one for the
list of files and the other for the lines in each file.
From a processing efficiency standpoint, it parses each line once and then
evaluates each rule against that parsed representation. If you would
otherwise do this manually, it saves you a little more code.
It gives you a few "standard" rules that make it clear what certain actions
are used for:
begin [print "before any processing begins"]
end [print "after all processing is done"]
all [print "do this for every line in each file"]
I've thought about adding these as well:
begin-file [print "before we process the first line in each file"]
end-file [print "after we process the last line in each file"]
It does a little housekeeping for you and gives you shorthand references to
things like the current record, individual fields in a record, number of
lines read, field separator, record separator, etc.
That's the basic stuff, which works right now. Rules and actions are just
how you showed them (with rules being any singular entity: word!, block!,
paren!) though I hadn't thought about local word definitions.
The biggest omissions right now are probably regular expression support and
automatic conversion of numeric values in fields. I have a dictionary object
that could be plugged in to emulate associative arrays so that's not an
issue.
It's kind of handy in its current form (sort of a crude dialect, I guess), but
would need some work, IMO, to be a good general-purpose tool and provide
value above just using REBOL.
Thanks for the feedback!
--Gregg
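To make the begin/end/all rules and the housekeeping words described above more concrete, here is a small sketch assuming REBOL 2. It is not Gregg's implementation: run-rules, record, fields, nr, and fs are hypothetical names chosen for illustration, and the input is a block of strings rather than a list of files:

```rebol
REBOL []

fs: ","          ; field separator
nr: 0            ; number of lines read so far
record: none     ; current line
fields: none     ; current line split on fs

run-rules: func [lines [block!] plan [block!] /local act] [
    ; the plan is never evaluated as code, so the word ALL here is
    ; only a label, not the native function
    if act: select plan 'begin [do act]
    foreach line lines [
        nr: nr + 1
        record: line
        fields: parse/all line fs      ; split once per line
        if act: select plan 'all [do act]
    ]
    if act: select plan 'end [do act]
]

run-rules ["a,1" "b,2"] [
    begin [total: 0]
    all   [total: total + to-integer second fields]
    end   [print ["lines:" nr "total:" total]]
]
```

Splitting each line once and sharing the result through fields is the "parse each line once" saving mentioned in the message; pattern rules would be checked in the same loop.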
[7/7] from: greggirwin:mindspring at: 19-Nov-2001 11:34
Thanks Brett!
<< No foreach does not call read on each pass. Read returns a series which is
then given to foreach as a parameter. >>
That's what I was looking for.
--Gregg
Notes
- Quoted lines have been omitted from some messages.