Processing files / Any AWK users?

[1/7] from: greggirwin::mindspring::com at: 18-Nov-2001 20:29

If I'm processing a file, by lines, thus:

    foreach line read/lines/with file rec-sep [

It appears that foreach doesn't call read on each pass, but I thought I'd ask to see if anyone knows, for certain. I know you can modify the block that foreach uses and it will see the changes, but I wasn't sure what magic occurs in this scenario.

The real question, I suppose, is: Should I cache the return value from 'read before using it with foreach, or is that unnecessary?

As a side question, does anyone on the list use AWK or think that something like it (i.e. an auto-driven rule/action file processor) would be useful?

--Gregg

[2/7] from: tomc:darkwing:uoregon at: 18-Nov-2001 21:00

On Sun, 18 Nov 2001, Gregg Irwin wrote:

> If I'm processing a file, by lines, thus:
>     foreach line read/lines/with file rec-sep [

<<quoted lines omitted: 4>>

> The real question, I suppose, is: Should I cache the return value from 'read
> before using it with foreach, or is that unnecessary?

I do this a lot, and have worked with the assumption that it is best (metric: [walltime disk-io]) to read once at the beginning and write once at the end, and use as many in-memory buffers as I need in between.

> As a side question, does anyone on the list use AWK or think that something
> like it (i.e. an auto-driven rule/action file processor) would be useful?

what is parse not doing?

[3/7] from: brett:codeconscious at: 19-Nov-2001 16:04

Hi Gregg,

> If I'm processing a file, by lines, thus:
>
>     foreach line read/lines/with file rec-sep [
>
> It appears that foreach doesn't call read on each pass, but I thought I'd
> ask to see if anyone knows, for certain. I know you can modify the block
> that foreach uses and it will see the changes, but I wasn't sure what magic
> occurs in this scenario.
>
> The real question, I suppose, is: Should I cache the return value from 'read
> before using it with foreach, or is that unnecessary?

No, foreach does not call read on each pass. Read returns a series, which is then given to foreach as a parameter.

By cache I assume you mean something like:

    the-lines: read/lines/with file rec-sep
    foreach line the-lines [...]

You would only do this if you need to access those lines again in your script. I doubt there is any memory effect other than the creation of a single word "the-lines" and the prevention of the series being garbage collected.

No opinion on the second question due to lack of knowledge :)

Brett.
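Brett's point can be checked without touching a file. The following is a minimal sketch, assuming REBOL 2; `fake-read` and `calls` are hypothetical names standing in for `read/lines`, used only to count how often the data expression is evaluated.

```rebol
REBOL []

calls: 0

; Hypothetical stand-in for read/lines: returns a block of strings
; and records each time it is evaluated.
fake-read: does [
    calls: calls + 1
    copy ["alpha" "beta" "gamma"]
]

; FOREACH evaluates its second argument once, before the loop starts,
; and then walks the resulting block.
foreach line fake-read [print line]

print ["fake-read was evaluated" calls "time(s)"]
```

The counter ends at 1 even though the body runs three times, which is why storing the result in a word first only matters if the block is needed again afterwards.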

[4/7] from: joel:neely:fedex at: 19-Nov-2001 6:47

Hi, Gregg,

(popping my head up from the project from the black lagoon for a mo')

Gregg Irwin wrote:

> If I'm processing a file, by lines, thus:
>
>     foreach line read/lines/with file rec-sep [
>
> It appears that foreach doesn't call read on each pass, but I thought
> I'd ask to see if anyone knows, for certain. I know you can modify
> the block that foreach uses and it will see the changes, but I wasn't
> sure what magic occurs in this scenario.

There's no interaction at all between FOREACH and READ. The above code is exactly equivalent to

    tempfoo: read/lines/with file rec-sep
    foreach line tempfoo [

(except that this latter case burns up a word name). Another way to think of it is

    foreach line (read/lines/with file rec-sep) [

to emphasize that READ/LINES just creates a block of strings. Then (on completion) that block is used as the second arg to FOREACH.

> The real question, I suppose, is: Should I cache the return value
> from 'read before using it with foreach, or is that unnecessary?

Totally unnecessary (and actually wastes a few nanoseconds ;-) unless you want to do something else with the saved block after FOREACH gets through doing its thing (in which case you already know that the answer is "Yes").

> As a side question, does anyone on the list use AWK or think that
> something like it (i.e. an auto-driven rule/action file processor)
> would be useful?

It would be marginally useful to me as a keystroke-saver, but a brute-force equivalent in REBOL would (IMHO) only result in replacing something vaguely like

    awk-like-func: func [one-line [string!] ...] [
        if parse/all/case one-line [
            ;...first parse rule...
        ][
            ;...corresponding action...
            exit
        ]
        if parse/all/case one-line [
            ;...next parse rule...
        ][
            ;...corresponding action...
            exit
        ]
        ;...
        if parse/all/case one-line [
            ;...last parse rule...
        ][
            ;...corresponding action...
            exit
        ]
        ;...error or inaction...
    ]
    foreach myline read/lines myfile [awk-like-func myline]

with

    do %awklib.r
    awk-plan: [
        [ ;...local words for rules/actions]
        [ ;...first parse rule...] [ ;...corresponding action...]
        [ ;...next parse rule... ] [ ;...corresponding action...]
        ;...
        [ ;...last parse rule... ] [ ;...corresponding action...]
    ]
    awk-main myfile awk-plan

which would be handy but not a huge win. I guess it just depends on whether most of one's parsing/action tasks are line oriented or not.

However, I don't think AWK-PLAN would be too challenging to write.

-jn-

--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
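As a rough illustration of the rule/action idea, here is a minimal sketch, assuming REBOL 2 (where `parse/all` is available). It is not the `awk-main` or `%awklib.r` referred to in the message, neither of which appears in the thread. The plan layout (a flat block of alternating rule and action blocks, with no local-words block), the sample lines, and the words `comments`, `key` and `val` are all assumptions made for the example. It takes a block of lines rather than a file name so that it runs without any file on disk.

```rebol
REBOL []

; Hypothetical driver: PLAN is a flat block of alternating
; parse rules and action blocks.
awk-main: func [
    data [block!] "Lines to process, e.g. the result of read/lines"
    plan [block!] "Alternating rule and action blocks"
][
    foreach line data [
        foreach [rule action] plan [
            if parse/all line rule [
                do action
                break    ; first matching rule wins, like EXIT in the long form
            ]
        ]
    ]
]

comments: 0

awk-plan: [
    ; rule: line starts with "#"
    ["#" to end]
        [comments: comments + 1]
    ; rule: key=value, capturing both sides in global words
    [copy key to "=" skip copy val to end]
        [print [key "->" val]]
]

awk-main ["# settings" "name=rebol" "no match here"] awk-plan
print ["comment lines:" comments]
```

Actions here see only global words set by the parse rules; a fuller version would presumably also expose the current line and handle the local-words block from the plan sketched in the message.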

[5/7] from: greggirwin:mindspring at: 19-Nov-2001 11:34

Thanks Tom,

<< what is parse not doing? >>

Parse is *great*. I'm talking about a slightly more specialized tool that wraps up the iteration and some parse aspects for simple file processing. I may, indeed, find that there's a nicer way to do it with a dialect but my initial ideas on that didn't pan out. That is to say, the rule/action combination provided a more concise and clearer approach to me.

--Gregg

[6/7] from: greggirwin:mindspring at: 19-Nov-2001 11:34

Thanks Joel!

<BIG SNIP>

It would be marginally useful to me as a keystroke-saver, but a brute-force equivalent in REBOL would (IMHO) only result in replacing something vaguely like

    awk-like-func: func [one-line [string!] ...] [
        if parse/all/case one-line [
            ;...first parse rule...
        ][
            ;...corresponding action...
            exit
        ]
        if parse/all/case one-line [
            ;...next parse rule...
        ][
            ;...corresponding action...
            exit
        ]
        ;...
        if parse/all/case one-line [
            ;...last parse rule...
        ][
            ;...corresponding action...
            exit
        ]
        ;...error or inaction...
    ]
    foreach myline read/lines myfile [awk-like-func myline]

with

    do %awklib.r
    awk-plan: [
        [ ;...local words for rules/actions]
        [ ;...first parse rule...] [ ;...corresponding action...]
        [ ;...next parse rule... ] [ ;...corresponding action...]
        ;...
        [ ;...last parse rule... ] [ ;...corresponding action...]
    ]
    awk-main myfile awk-plan

which would be handy but not a huge win. I guess it just depends on whether most of one's parsing/action tasks are line oriented or not.

However, I don't think AWK-PLAN would be too challenging to write.

<END BIG SNIP>

Right. In order to be useful it should provide code savings or help to make programs clearer and more self-documenting.

The first thing it does is save you writing two foreach loops, one for the list of files and the other for the lines in each file.

From a processing efficiency standpoint, it parses each line once and then evaluates each rule against that parsed representation. If you do this manually, you save a little more code.

It gives you a few "standard" rules that make it clear what certain actions are used for:

    begin [print "before any processing begins"]
    end [print "after all processing is done"]
    all [print "do this for every line in each file"]

I've thought about adding these as well:

    begin-file [print "before we process the first line in each file"]
    end-file [print "after we process the last line in each file"]

It does a little housekeeping for you and gives you shorthand references to things like the current record, individual fields in a record, number of lines read, field separator, record separator, etc.

That's the basic stuff, which works right now. Rules and actions are just how you showed them (with rules being any singular entity: word!, block!, paren!) though I hadn't thought about local word definitions.

The biggest omissions right now are probably regular expression support and automatic conversion of numeric values in fields. I have a dictionary object that could be plugged in to emulate associative arrays so that's not an issue.

It's kind of handy in its current form (sort of a crude dialect I guess), but would need some work, IMO, to be a good general purpose tool and provide value above just using REBOL.

Thanks for the feedback!

--Gregg
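The begin / all / end rules and the per-line housekeeping described above might look roughly like this. It is only a sketch under stated assumptions, not Gregg's tool, whose code is not shown in the thread: REBOL 2 is assumed, and `run-plan`, the plan layout, the comma field separator, and the global words `record`, `fields`, `nr` and `total` are all hypothetical.

```rebol
REBOL []

; Hypothetical mini-driver; the plan layout and the global words
; RECORD, FIELDS and NR are assumptions made for this sketch.
run-plan: func [
    data [block!] "Lines to process"
    plan [block!] "Pairs of BEGIN / ALL / END and an action block"
    /local action
][
    nr: 0
    if action: select plan 'begin [do action]
    foreach line data [
        record: line                 ; current record
        fields: parse/all line ","   ; split on commas only
        nr: nr + 1                   ; number of lines read so far
        if action: select plan 'all [do action]
    ]
    if action: select plan 'end [do action]
]

plan: [
    begin [total: 0]
    all   [total: total + to-integer second fields]
    end   [print ["lines:" nr "total:" total]]
]

run-plan ["apples,3" "pears,4"] plan
```

Note the explicit `to-integer`: fields come back as strings, which is the missing automatic numeric conversion mentioned in the message.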

[7/7] from: greggirwin:mindspring at: 19-Nov-2001 11:34

Thanks Brett!

<< No foreach does not call read on each pass. Read returns a series which is then given to foreach as a parameter.>>

That's what I was looking for.

--Gregg

Notes

Quoted lines have been omitted from some messages.