PARSE question
[1/23] from: dvydra2::yahoo::com at: 30-Mar-2001 9:56
We are parsing large files as blocks. If there is an
error somewhere, how can we find out where the error
was? Can we get a count of items parsed correctly?
Thanks
dv
=====
please reply to: [david--vydra--net]
[2/23] from: petr:krenzelok:trz:cz at: 30-Mar-2001 20:32
----- Original Message -----
From: "David Vydra" <[dvydra2--yahoo--com]>
To: <[rebol-list--rebol--com]>
Sent: Friday, March 30, 2001 7:56 PM
Subject: [REBOL] PARSE question
> We are parsing large files as blocks. If there is an
> error somewhere, how can we find out where the error
> was? Can we get a count of items parsed correctly?
I am not sure I correctly understand what you are actually asking for, but
you can always "mark" your input to some word .... which is available in
context out of parse block too ... e.g.
parse something [some [pos1: some-stuff | pos2: other-stuff] pos3: to end]
print [index? pos1 index? pos2 index? pos3]
So I think that if an error occurs, then by printing your marking words you
can find out where in the string you got lost ...
-pekr-
[3/23] from: agem:crosswinds at: 31-Mar-2001 17:35
>We are parsing large files as blocks. If there is an
>error somewhere, how can we find out where the error
>was? Can we get a count of items parsed correctly?
>
yes
counter: 0
parse something [any [
    before-the-try:
    some parsing (counter: counter + 1)
]]
In 'before-the-try you get the parse position before the failed try,
in counter the count of successful tries.
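A minimal sketch combining the two suggestions above (a position mark plus a counter), applied to a block. The sample data and the names data, pos, count and ok? are made up for illustration, and the behaviour assumed is that of REBOL/Core 2.x.

```rebol
; Hypothetical sample data: the string at index 4 plays the "error".
data: [10 20 30 "oops" 40]
count: 0

; pos: is updated before every attempt, so after a failed attempt it
; still points at the item that could not be matched. The counter only
; advances after an item has matched.
ok?: parse data [any [pos: integer! (count: count + 1)]]

either ok? [
    print ["all" count "items parsed"]
][
    print ["parsed" count "items, stopped at index" index? pos]
    print ["offending item:" mold first pos]
]
```

With this sample data the rule stops at the string, so count ends up at 3 and index? pos at 4.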
[4/23] from: gchiu:compkarori at: 27-Jun-2001 13:29
If parse always returns a string
eg: parse {a="b"} [ thru {a="} copy test to {"} ]
and 'test is now a string containing "b"
shouldn't
parse {a=""} [ thru {a="} copy test to {"} ]
'test now be an empty string rather than type none! ?
--
Graham Chiu
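One possible workaround, sketched here on the assumption of REBOL/Core 2.x behaviour: mark both ends of the span with set-words and copy between the marks, instead of relying on COPY inside the rule. The names start, finish and test2 are illustrative only.

```rebol
; COPY inside the rule: per this thread, TEST is left as none when
; nothing lies between the quotes.
parse {a=""} [thru {a="} copy test to {"} to end]
print none? test

; Marking both ends and using COPY/PART always yields a string,
; possibly an empty one.
parse {a=""} [
    thru {a="} start: to {"} finish:
    (test2: copy/part start finish)
    to end
]
print mold test2
```

The second form costs two extra words per field, but the result can be passed to string functions without a none check.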
[5/23] from: brett:codeconscious at: 27-Jun-2001 12:18
Hi Graham,
> If parse always returns a string
USAGE:
PARSE input rules /all /case
If rules is a block parse returns true or false.
Just being pedantic :)
> eg: parse {a="b"} [ thru {a="} copy test to {"} ]
>
> and 'test is now a string containing "b"
>
> shouldn't
>
> parse {a=""} [ thru {a="} copy test to {"} ]
>
> 'test now be an empty string rather than type none! ?
>
I guess it depends on how you see it, either as "nothing to copy" (none!) or
"copied a string of zero length" (empty).
Also, I don't see that it makes a lot of difference - except for perhaps
some more if statements in some circumstances.
Brett
[6/23] from: joel:neely:fedex at: 26-Jun-2001 18:42
Hi, Brett,
I must agree with Graham on this one. A zero-length string
is a completely legitimate value, but most definitely is *not*
the same thing as NONE!
Brett Handley wrote:
> Hi Graham,
>
> > If parse always returns a string
>
I think he meant "sets variables to string! values using
COPY" instead of "returns"...
> > shouldn't
> >
<<quoted lines omitted: 6>>
> Also, I don't see that it makes a lot of difference - except
> for perhaps some more if statements in some circumstances.
Any time you wish to build a new string from the parsed
substrings, or print their content, you have a problem with
NONE! instead of an empty string. For example, given:
addr1: "John Doe/123 Lonely St/Suite 16/Los Angeles/CA/90210"
addr2: "Joe Doaks/321 Hilltop Ln//Green Mtn/MA/02187"
print-address-label: function [
    s [string!]
][
    addr-name addr-line1 addr-line2
    addr-city addr-state addr-ZIP
][
    parse/all s [
        copy addr-name to "/" skip
        copy addr-line1 to "/" skip
        copy addr-line2 to "/" skip
        copy addr-city to "/" skip
        copy addr-state to "/" skip
        copy addr-ZIP to end
    ]
    print [
        addr-name newline
        addr-line1 newline
        addr-line2 newline
        addr-city " " addr-state " " addr-ZIP newline
    ]
]
... we get ...
>> print-address-label addr1
John Doe
123 Lonely St
Suite 16
Los Angeles CA 90210
>> print-address-label addr2
Joe Doaks
321 Hilltop Ln
none
Green Mtn MA 02187
Clearly the address label should contain a blank line between
321 Hilltop Ln
and "Green Mtn MA 02187".
This is only a trivial example; I'm not suggesting this is the
best way to represent addresses or print labels! If the
example is too trite, just pretend that the syntax of the
address were much more complex, with various punctuation, etc.
I've done a tremendous amount of text-flogging over the years,
converting address lists, reformatting data files among a
variety of representations, etc. In such applications, it's
very common to parse out the pieces of one format and
immediately construct a new string or write/print using some
combination of the just-parsed text fields.
Yes, I know it's possible to follow the PARSE statement with
a list of "fix-the-empty-strings" statements, as in
parse/all s [
;...
]
addr-name: any [addr-name ""]
addr-line1: any [addr-line1 ""]
addr-line2: any [addr-line2 ""]
;...
print [
;...
]
(or even to make a fix-up function and call it on all of
the fields), but all that bother hardly seems in keeping
with "make easy things easy and hard things possible" IMHO.
-jn-
------------------------------------------------------------
Programming languages: compact, powerful, simple ...
Pick any two!
joel'dot'neely'at'fedex'dot'com
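The "fix-up function" mentioned above could look roughly like this. It is only a sketch: fix-empties is a hypothetical helper name, and the two addr- words are set by hand here to stand in for what the PARSE rule leaves behind.

```rebol
; Hypothetical helper: for each word in the block, replace a none
; value with a fresh empty string.
fix-empties: func [words [block!]] [
    foreach word words [
        if none? get word [set word copy ""]
    ]
]

addr-line1: "321 Hilltop Ln"
addr-line2: none          ; what COPY leaves behind for an empty field

fix-empties [addr-line1 addr-line2]
print mold addr-line2
```

Called right after the PARSE statement with the full list of field words, this replaces the run of any [... ""] assignments with a single line.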
[7/23] from: brett:codeconscious at: 27-Jun-2001 16:20
Hi Joel,
> I must agree with Graham on this one. A zero-length string
> is a completely legitimate value, but most definitely is *not*
> the same thing as NONE!
Your phrasing runs the risk of implying I stated otherwise - which is not
the case.
Though now you've piqued my interest - what is a value that is not completely
legitimate - illegitimate or partially legitimate? Are they something like
complex numbers - a real part and some other part? Just kidding.
> I think he meant "sets variables to string! values using
> COPY" instead of "returns"...
Yeah...
> >
> > > shouldn't
<<quoted lines omitted: 12>>
> substrings, or print their content, you have a problem with
> NONE! instead of an empty string. For example, given:
..
<snipped useful parse tutorial material>
..
> I've done a tremendous amount of text-flogging over the years,
> converting address lists, reformatting data files among a
<<quoted lines omitted: 4>>
> Yes, I know it's possible to follow the PARSE statement with
> a list of "fix-the-empty-strings" statements, as in
<snipped more useful tutorial code>
> (or even to make a fix-up function and call it on all of
> the fields), but all that bother hardly seems in keeping
> with "make easy things easy and hard things possible" IMHO.
Well you've fleshed out the circumstances I referred to. I agree that if
parse behaved the other way your example and similar code would be far
simpler.
It would be more elegant in the sense that when parsing strings one only
would need to think of strings "containing" substrings and strings of zero
length. So yes I guess it does make a difference, because currently parse
sort of "implies" none! is part of the string being parsed - which is
inconsistent with the knowledge that strings should only contain character
values.
It needs to be said now, that this is only relevant to parsing string!
values. When parse is applied to a block! value an empty string is a
completely legitimate value of a block. So are empty blocks. So parse still
needs to set a variable to NONE! during a COPY when parsing a block
otherwise we lose information.
So how many scripts would such a change break? Maybe not many, but it would
be a bit like y2k, you have to do a lot of hunting to determine the impact.
Would I support the change? - yep.
Brett.
[8/23] from: gchiu:compkarori at: 27-Jun-2001 18:19
On Tue, 26 Jun 2001 18:42:47 -0500
Joel Neely <[joel--neely--fedex--com]> wrote:
> I must agree with Graham on this one. A zero-length
> string
> is a completely legitimate value, but most definitely is
> *not*
> the same thing as NONE!
Trouble is I guess we are stuck with this behaviour as
changing it might cause problems with legacy code :-(
--
Graham Chiu
[9/23] from: gjones05:mail:orion at: 27-Jun-2001 5:52
Hi, Graham, Brett, Joel,
I don't seem to be able to muster the skills to articulate an argument one way
or the other, which is just as well, because you would present formidable
opponents. :-)
So, when I don't feel up to a good argument, I simply present a different way of
thinking about the "problems" at hand and will leave to each individual user
whether the approach is useful.
From: "Graham Chiu"
...
> parse {a="b"} [ thru {a="} copy test to {"} ]
... and ...
> parse {a=""} [ thru {a="} copy test to {"} ]
Pardon me if I am not using the terminology correctly, but to most simply
coerce the result to a specific type, here was how I saw the problem:
parse to-block {a="b"} ['a= set t string! to end] ; == true
t ; == "b"
parse to-block {a=""} ['a= set t string! to end] ; == true
t ; == ""
From: "Joel Neely"
...
> Any time you wish to build a new string from the parsed
> substrings, or print their content, you have a problem with
> NONE! instead of an empty string. For example, given:
...
I understand that your example was presented as a piece of an argument meant to
demonstrate a concrete side effect of the behavior in question. Here was how I
saw the problem:
addr1: "John Doe/123 Lonely St/Suite 16/Los Angeles/CA/90210"
addr2: "Joe Doaks/321 Hilltop Ln//Green Mtn/MA/02187"
print-address-label: function [
    s [string!]
][
    addr-name addr-line1 addr-line2
    addr-city addr-state addr-ZIP
][
    foreach [addr-name addr-line1 addr-line2
        addr-city addr-state addr-zip] parse/all s "/" [
        print [
            addr-name newline
            addr-line1 newline
            addr-line2 newline
            addr-city addr-state addr-ZIP newline
        ]
    ]
]
>> print-address-label addr1
John Doe
123 Lonely St
Suite 16
Los Angeles CA 90210
>> print-address-label addr2
Joe Doaks
321 Hilltop Ln

Green Mtn MA 02187
I guess a "blank" line is better than a "none" line for this trivial
("no-additional-logic-added") example.
Remember, Joel, (et al,) "It's turtles all the way down!"
(I had to look that one up, BTW. If I had ever heard it before, I had certainly
forgotten it. :-)
--Scott Jones
[10/23] from: agem:crosswinds at: 27-Jun-2001 14:37
RE: [REBOL] Re: parse question
[gchiu--compkarori--co--nz] wrote:
> On Tue, 26 Jun 2001 18:42:47 -0500
> Joel Neely <[joel--neely--fedex--com]> wrote:
<<quoted lines omitted: 6>>
> Trouble is I guess we are stuck with this behaviour as
> changing it might cause problems with legacy code :-(
could be some option like parse/empties ?
-Volker
[11/23] from: joel:neely:fedex at: 27-Jun-2001 2:55
Brett Handley wrote:
> Hi Joel,
>
> > I must agree with Graham on this one. A zero-length
> > string is a completely legitimate value, but most
> > definitely is *not* the same thing as NONE!
>
> Your phrasing runs the risk of implying I stated
> otherwise - which is not the case.
>
Ooops! Sorry! I intended it as a part of my "thinking
out loud" about "" vs. NONE! and not as an attempt to
describe your views.
> Though now you've piqued my interest - what is a value that
> is not completely legitimate - illegitimate or partially
> legitimate? Are they something like complex numbers - a real
> part and some other part? Just kidding.
>
Hmmm. I gotta stop writing emails late at night! ;-)
By analogy, -1 is a legitimate integer, but one would likely
have problems trying to use it to PICK data from a series.
However, most uses one makes of string data accept "" without
complaint. That was the sense I intended.
> Well you've fleshed out the circumstances I referred to. I
> agree that if parse behaved the other way your example and
<<quoted lines omitted: 6>>
> inconsistent with the knowledge that strings should only
> contain character values.
Agreed. And my view is that the second point you made has
a much higher priority than the first. It addresses both
consistency and learnability. (Not that you've ever heard
anything from me on those topics before... ;-)
Especially since PARSE really does know how to extract strings
of zero length in other settings:
>> addr2: "Joe Doaks/321 Hilltop Ln//Green Mtn/MA/02187"
== "Joe Doaks/321 Hilltop Ln//Green Mtn/MA/02187"
>> parse/all addr2 "/"
== ["Joe Doaks" "321 Hilltop Ln" "" "Green Mtn" "MA" "02187"]
> It needs to be said now, that this is only relevant to parsing
> string! values...
>
It seems that parse needs to be *able* to supply NONE! values
when appropriate, but I'm not sure I completely understand what
the phrase "when appropriate" might mean. Given that phone
data may or may not be available for all people, we could design
several possible representations of a data structure which
allows for the data-not-available case. For example:
When phone data is available (the easy case)
demo: [1234 "Ferd Burfel" #901-555-1212 127.0.0.1]
When it is not available, we can choose to
a) omit it
omed: [2345 "Joe Doaks" 127.255.255.255]
b) mark it with a special word
medo: [3456 "Jane Doe" none 255.255.255.255]
c) or mark it with a specific not-available value
odem: reduce medo
Assuming we're handling the simple case in the obvious way:
parse demo [
copy xID integer!
copy xName string!
copy xPhone issue!
copy xIP tuple!
]
... each of the not-available choices implies a different
strategy for block parsing.
parse omed [
copy xID integer!
copy xName string!
copy xPhone [issue! | none]
copy xIP tuple!
]
parse medo [
copy xID integer!
copy xName string!
copy xPhone [issue! | word!]
copy xIP tuple!
]
parse odem [
copy xID integer!
copy xName string!
copy xPhone [issue! | none!]
copy xIP tuple!
]
The programmer can choose which representation/parsing strategy
to use, because the block is probably a constructed artifact.
I think we're agreeing here; I'm just thinking out loud to
confirm that fact...
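A note for anyone trying the block rules above: in REBOL 2, as far as I can tell, COPY in a block rule captures a block holding the matched values, while SET captures the value itself. The sketch below reuses the demo block from this message; the names id-block and id-value are made up for illustration.

```rebol
demo: [1234 "Ferd Burfel" #901-555-1212 127.0.0.1]

; COPY captures the matched part of the input as a series.
parse demo [copy id-block integer! to end]

; SET captures the matched value itself.
parse demo [set id-value integer! to end]

print mold id-block
print mold id-value
```

Which of the two is wanted depends on whether the following code expects a series or a single value.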
> So how many scripts would such a change break? Maybe not many,
> but it would be a bit like y2k, you have to do a lot of
> hunting to determine the impact.
>
> Would I support the change? - yep.
>
Agreed. Especially if the change would help make the language
more attractive/understandable/usable to a larger audience.
-jn-
------------------------------------------------------------
Programming languages: compact, powerful, simple ...
Pick any two!
joel'dot'neely'at'fedex'dot'com
[12/23] from: joel:neely:fedex at: 27-Jun-2001 3:16
Hi, Scott,
GS Jones wrote:
> Hi, Graham, Brett, Joel,
>
> I don't seem to be able to muster the skills to articulate an
> argument one way or the other, which is just as well, because
> you would present formidable opponents. :-)
>
No "opponents" here...
We're all just turtles on this bus!
;-)
-jn-
------------------------------------------------------------
Programming languages: compact, powerful, simple ...
Pick any two!
joel'dot'neely'at'fedex'dot'com
[13/23] from: robbo1mark:aol at: 27-Jun-2001 10:13
Joel,
this is offtopic & offbeat, but I'm just curious,
you seem to mention "turtles" quite a lot! any reason?
were you a turtle in a previous life? are you a turtle
now? maybe you programmed in LOGO years ago I don't know?
just curious & having fun.
Mark Dickson
[14/23] from: brett:codeconscious at: 28-Jun-2001 1:09
> Hi, Graham, Brett, Joel,
>
> I don't seem to be able to muster the skills to articulate an argument one
> way or the other, which is just as well, because you would present
> formidable opponents. :-)
I'm the one in the corner looking dazed after thumping myself in the head a
few times...
Brett.
[15/23] from: joel:neely:fedex at: 27-Jun-2001 10:16
Hi, Mark,
[Robbo1Mark--aol--com] wrote:
> Joel,
>
> this is offtopic & offbeat, but I'm just curious,
> you seem to mention "turtles" quite a lot! any reason?
>
Old joke:
Prominent scientist gives a lecture on the space program.
Little old lady comes up afterwards to speak with him.
LOL: That was all very interesting, but it's nonsense.
Everybody knows that the earth is flat and rests on
the back of a giant turtle.
PS [humoring her]: Well, what does that turtle stand on?
LOL: Each of his feet is on the back of another giant turtle.
PS: Well, what do THEY stand on?
LOL: Now, young man, you can't trick me. From there, it's
turtles all the way down!
Somehow it seemed appropriate in the context of
Language X
implemented in
Language Y
compiled to
Assembler
implemented in
Microcode
which manipulates
Function blocks
implemented in
Solid state circuitry
dependent on
Quantum mechanics
... ;-)
-jn-
It's turtles all the way down!
joel'dot'neely'at'fedex'dot'com
[16/23] from: lmecir:mbox:vol:cz at: 27-Jun-2001 17:55
Hi Brett,
there are some "partially legitimate" values in Rebol, IMO :-) I would say
that about the values of ERROR! and UNSET! datatypes.
[17/23] from: robbo1mark:aol at: 27-Jun-2001 12:39
Joel,
Aha! now I see & understand!
I prefer Terrapins myself, much more cute IMHO.
I seem to remember a thread on this list ages & ages
ago about which animal O'Reilly should use for some future O'Reilly Safari book, which
would formally make REBOL "recognized officially" as a language.
Would it be a turtle? I can't remember which animal, if anything, was selected?
Anybody else got a better memory?
cheers,
Mark Dickson
PS If this debate wasn't settled then by all means let's start it all up again!
[18/23] from: gchiu:compkarori at: 28-Jun-2001 8:48
On Wed, 27 Jun 2001 10:16:20 -0500
Joel Neely <[joel--neely--fedex--com]> wrote:
> Old joke:
> Prominent scientist gives a lecture on the space
<<quoted lines omitted: 12>>
> it's
> turtles all the way down!
In "A Brief History of Time" by Stephen Hawking, 1988, page
1, the scientist in this tale is said to be Bertrand Russell
giving a public lecture on astronomy.
--
Graham Chiu
[19/23] from: joel:neely:fedex at: 27-Jun-2001 16:15
Hi, Graham,
You are a gentleman and a scholar!
Graham Chiu wrote:
> > Old joke:
> >
<<quoted lines omitted: 3>>
> 1, the scientist in this tale is said to be Bertrand Russell
> giving a public lecture on astronomy.
Since you are clearly so well-read, perhaps you'd know a hint
as to the origin of one of my favorite (but, alas, unattributed)
quotations:
"You can only learn that which you already *almost* know."
Thanks!
-jn-
--
It's turtles all the way down!
joel'dot'neely'at'fedex'dot'com
[20/23] from: gchiu:compkarori at: 28-Jun-2001 16:06
On Wed, 27 Jun 2001 16:15:38 -0500
Joel Neely <[joel--neely--fedex--com]> wrote:
> Since you are clearly so well-read, perhaps you'd know a
> hint
<<quoted lines omitted: 3>>
> "You can only learn that which you already *almost*
> know."
Sorry Joel, can't help! As for well read, it just so
happened that my sister mentioned last week that she had
borrowed this book from the library. I got my copy out in
case she needed it longer, and happened to re-read the first
page wherein lay the tale. Now, if it had been on page 2, I
wouldn't have seen it!
--
Graham Chiu
[21/23] from: robert:muench:robertmuench at: 29-Jun-2001 13:17
> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]On Behalf Of
<<quoted lines omitted: 8>>
> parse {a=""} [ thru {a="} copy test to {"} ]
> 'test now be an empty string rather than type none! ?
Hi, I see the whole string "" as the definition for the empty string and there
is nothing between " and ", therefore the none! return value is OK as you read
thru the first " and up-to the next ". Robert
[22/23] from: ingo:2b1 at: 3-Jul-2001 12:38
Hi Robert,
Once upon a time Robert M. Muench spoketh thus:
> > -----Original Message-----
> > From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]On Behalf Of
<<quoted lines omitted: 10>>
> is nothing between " and ", therefore the none! return value is OK as you read
> thru the first " and up-to the next ". Robert
We might discuss the "validity" of this approach at length, but from
a practical viewpoint returning "" instead of none is much preferable.
Returning none:
- If it doesn't matter to you whether the string is empty or not,
you have to manually check all values, and change them to ""
- if it matters you may check for none? values
Returning "":
- If it doesn't matter to you whether the string is empty or not,
you don't have to do anything
- if it matters you may check for empty? values
=> all in all, returning empty strings requires less programming effort
kind regards,
Ingo
[23/23] from: robert:muench:robertmuench at: 4-Jul-2001 8:56
> -----Original Message-----
> From: [rebol-bounce--rebol--com] [mailto:[rebol-bounce--rebol--com]]On Behalf Of
<<quoted lines omitted: 6>>
> ...
> => all in all returning empty strings requires less programming efforts
Hi, from the pragmatic POV I agree absolutely ;-)). Robert
Notes
- Quoted lines have been omitted from some messages.