ANN: xml-object.r , and...a question about REBOL's built-in parse-xml
[1/7] from: gavin::mckenzie::sympatico::ca at: 4-Oct-2001 22:45
Folks,
First, there's a new rev of xml-object.r (v 1.0.4) available at:
http://www3.sympatico.ca/gavin.mckenzie/
This release fixes an error with whitespace processing. And, I changed my
switch statements based on the recent switch/type? thread.
I've noticed some limitations in xml-object. If you have an element with an
attribute and a subelement with the same name, bad things happen. This
should really be considered poor form in XML. However, it is legitimately
possible to encounter an attribute and subelement of the same local-name
within different namespaces.
I could also improve my mixed-content processing somewhat...anyway, more
work to do.
Now...on to a question.
I've been building some REBOL-Server-Pages where I query against a
MS-SQL-Server and get back an XML result set. So, I've been using the
built-in REBOL parse-xml rather than my parse-xml+ script, and I noticed
something that became very frustrating: when parse-xml encounters an XML
declaration (<?xml version ...?>) it calls a function...
check-version: func [version][print ["XML Version:" version]]
...which has the nasty side effect of printing out "XML Version" with the
version number.
This message, of course, messes up my carefully crafted HTML page that is
produced from my REBOL server page.
Anyway, my quick fix was to use my parse-xml+ function, but given that the
XML I was processing was tightly controlled and straightforward I was
actually expecting to stick to the built-in function. Oh well, I suppose I
could hack the built-in xml-language object that parse-xml uses and stub out
the check-version function, but that seems beside the point.
Has anyone else used parse-xml and considered this a real problem? I do
hope that some future rev of REBOL will drop the check-version print.
Gavin.
[2/7] from: chris:langreiter at: 5-Oct-2001 9:09
Re: ANN: xml-object.r , and...a question about REBOL's built-in parse-xm
>
>Has anyone else used parse-xml and considered this a real problem? I do
>hope that some future rev of REBOL will drop the check-version print.
>
I hope so too. Otherwise you'll continue to find the line
xml-language/check-version: func [v][return]
in every single REBOL script dealing with XML in a parse-xmly way.
May I raise the question what the point of this print-out is or was?! Or
is it just a not-so-subtle display of RT's disregard of XML as data
exchange format (which I don't share, though I prefer native REBOL
exchange as well)?
BTW, Gavin, your xml-object script is a godsend. RT should include it in
future REBOL releases.
-- Chris
__
Vanilla NOW: http://www.langreiter.com/space/vanilla-download
[3/7] from: joel::neely::fedex::com at: 5-Oct-2001 2:00
Hi, Gavin,
Gavin F. McKenzie wrote:
> I've noticed some limitations in xml-object. If you have
> an element with an attribute and a subelement with the same
> name, bad things happen. This should really be considered
> poor form in XML..
>
Sorry, but I must emphatically disagree.
This is equivalent to saying that recursive function calls are
bad form. XML markup shows semantic structure, and it is
entirely legitimate that such structure be recursive in nature.
One of the first serious applications I wrote in REBOL (in fact
it was one of the main reasons I began using REBOL) was an XML-
based web site generator which combines content from individual
HTML files with an XML document that represents the structure
of the site. It generates per-page "navigation bars" from the
knowledge of where each page fits into the overall site, and
generates the final pages by inserting content and navigation
into templates. (Sorry for the long-winded background, but it
is the reason for the example below.)
A simplified version of the site file has content such as this
(with ellipses standing for details beside the current point):
<site docroot="/opt/netscape/suitespot/docs/devgroup/"
source="/export/home/sitedev/devgroup/" ... >
<page title="Home" file="index.html" ... >
<page title="Our Mission" file="mission.html" .../>
<page title="Our People" file="people.html" .../>
<page title="Visit Us" file="map.html" .../>
</page>
<page title="Projects" file="proj.html" ... >
<page title="Widgets" file="pr.3094.html" .../>
<page title="Frobs" file="pr.3128.html" .../>
<page title="Cruft" file="pr.3312.html" ... >
<page title="Biggie" file="pr.3467.html" ... >
<page title="ROI" file="roi.3467.html" .../>
<page title="Budget" file="bud.3467.html" .../>
</page>
</page>
...
</site>
It is entirely reasonable to have some pages with sub-pages and
others without. Pages are represented with PAGE elements whose
location (nested within other PAGEs or not) in the XML document
shows where they fit into the site structure. Since none of the
information about a page (attributes of the PAGE element) is
dependent on where the page is in the site, the site can be
re-structured simply by moving one or more PAGE elements to a
new place in the tree and re-running the generator (usually a
15- to 30-second effort).
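
To illustrate the point about nesting, here is a minimal sketch (not the generator itself) that walks a tree of nested PAGE elements and prints an indented outline of the titles. It assumes the three-slot element shape [name attributes contents] that parse-xml returns; the tree is hand-written so the snippet stands alone, and list-pages is a hypothetical helper name.

```rebol
REBOL [Title: "Recursive page walk sketch"]

; Hand-written tree in the [name attributes contents] shape; the titles are
; taken from the site file above, everything else is left out.
site: ["site" none [
    ["page" ["title" "Home"] [
        ["page" ["title" "Our Mission"] none]
        ["page" ["title" "Our People"] none]
    ]]
    ["page" ["title" "Projects"] none]
]]

; list-pages is a hypothetical helper, not part of REBOL or the generator.
list-pages: func [element [block!] indent [string!]] [
    ; print the title of a PAGE element at the current depth
    if element/1 = "page" [print join indent select element/2 "title"]
    ; descend only when the contents slot really holds a block
    if block? element/3 [
        foreach child element/3 [
            ; skip strings (whitespace or text) between child elements
            if block? child [list-pages child join indent "  "]
        ]
    ]
]

list-pages site ""
```

Moving a PAGE block elsewhere in the tree changes the outline without touching the walker, which is the property described above.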
Although the "recursion" is indirect, standard HTML allows the
nesting of tables and framesets. XHTML (essentially writing
HTML with XML notation conventions) should allow these as well.
> I could also improve my mixed-content processing somewhat...anyway, more
> work to do.
<<quoted lines omitted: 7>>
> This message, of course, messes up my carefully crafted HTML
> page that is produced from my REBOL server page.
Disabling that one function is easy. I've made other modifications
to xml-parser for other purposes as well.
Here's some sample XML ...
>> foo: {
{ <?xml version="2.5" ?>
{ <motor productID="375-2385">
{ <assembly productID="238-2356">
{ <assembly productID="795-5837"/>
{ <assembly productID="123-4567"/>
{ </assembly>
{ <assembly productID="987-6543"/>
{ </motor>
{ }
== {
<?xml version="2.5" ?>
<motor productID="375-2385">
<assembly productID="238-2356">
<assembly productID="795-5837"...
... which shows your problem when parsed.
>> parse-xml foo
XML Version: 2.5
== [document none [["motor" ["productID" "375-2385"] ["^/ "
["assembly" ["productID" "238-2356"] ["^/ "
["assembly" ["pro...
So, let's disable the offending function ...
>> xml-language: make xml-language [
[ check-version: func [version][]
[ ]
... and parse again.
>> parse-xml foo
== [document none [["motor" ["productID" "375-2385"] ["^/ "
["assembly" ["productID" "238-2356"] ["^/ "
["assembly" ["pro...
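
If replacing the function for the whole session is more than you want, a variation is to silence it only around one call and then put the original back. The following is a rough sketch under the same assumption as the transcript above, namely that xml-language is reachable as a global word; quiet-parse-xml is a hypothetical name, not something REBOL provides.

```rebol
REBOL [Title: "Quiet parse-xml sketch"]

; quiet-parse-xml is a hypothetical wrapper, not part of REBOL.
quiet-parse-xml: func [
    "Call parse-xml with check-version silenced, then restore it."
    xml [string!]
    /local saved result
][
    ; keep the original function value so it can be restored afterwards
    saved: get in xml-language 'check-version
    ; same stub as in the transcript above: accept the version, print nothing
    xml-language/check-version: func [version][]
    result: parse-xml xml
    ; restore the original (skipped if parse-xml raises an error)
    xml-language/check-version: :saved
    result
]
```

Usage would be quiet-parse-xml foo in place of parse-xml foo. The sketch does not guard against errors inside parse-xml, so a failed parse would leave the stub in place.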
HTH!
-jn-
--
The end of all our exploring will be to arrive where we started and
know the place for the first time.
-- T.S. Eliot
joel-dot-neely-FIX-PUNCTUATION-at-fedex-dot-com
[4/7] from: gavin:mckenzie:sympatico:ca at: 5-Oct-2001 8:28
Re: ANN: xml-object.r , and...a question about REBOL's built-in parse-x
Hi Joel,
I think we've misunderstood each other. I should have included an example
with my comments as clarification; sorry.
Nested structures with repeating names are absolutely ok.
What I was referring to is the following:
<foo bar="something">
<bar>something</bar>
</foo>
In that example foo has both a child element named 'bar' and an attribute
named 'bar'. While it is perfectly legal to do this, it is considered (by
many) to be poor form because it makes the representation of the XML in
objects and exposure into scripting engines (such as Active Scripting or
some other script engine) problematic.
The xml-object.r script doesn't handle the above case very well.
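
A rough sketch of why this is awkward for any object mapping (this is an illustration only, not the code in xml-object.r): if the attribute and the child element are both flattened onto the same object, they compete for a single word.

```rebol
REBOL [Title: "Name collision sketch"]

; Hypothetical flattening of <foo bar="..."><bar>...</bar></foo> onto one
; object. This is NOT how xml-object.r is written, only an illustration.
foo: make object! [
    bar: "from the attribute"
    bar: "from the child element"   ; same word, so it reuses the same field
]

; The object has a single field named bar, so one value has been lost.
probe foo/bar
```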
What you were referring to is doing something like:
<foo>
<bar>something</bar>
<bar>something</bar>
</bar>
</foo>
And of course, this is ok -- in fact it is extremely useful. The ability to
represent repeating nested or 'recursive' structures is a very important
capability, as you rightly point out. And, I'm happy to say, xml-object.r
handles it ok too.
Gavin.
[5/7] from: deryk::iitowns::com at: 5-Oct-2001 22:39
On Friday 05 October 2001 08:28, you wrote:
> What you were referring to is doing something like:
> <foo>
<<quoted lines omitted: 6>>
> capability, as you rightly point out. And, I'm happy to say, xml-object.r
> handles it ok too.
[[deryk--trek] deryk]$ xmllint lint
lint:5: error: Opening and ending tag mismatch: foo and bar
</bar>
^
lint:6: error: Extra content at the end of the document
</foo>
^
_almost_ ;)
[6/7] from: gavin:mckenzie:sympatico:ca at: 5-Oct-2001 12:19
Yeah...the dangers of hand-typing XML and not completely paying attention.
<foo>
<bar>
<bar>something</bar>
</bar>
</foo>
Better.
Gavin.
[7/7] from: joel::neely::fedex::com at: 5-Oct-2001 15:28
Re: ANN: xml-object.r , and...a question about REBOL's built-in parse-xml
Hi, Gavin,
Thanks for the clarification!
Gavin F. McKenzie wrote:
> What I was referring to is the following:
> <foo bar="something">
<<quoted lines omitted: 6>>
> engines (such as Active Scripting or some other script engine)
> problematic.
With my understanding fixed, I admit I disagree less ;-), but
still don't see this as much of an issue of concern. It seems to
me that the concept of attributes is that of name/value pairs
that are "parts of" an entity, whereas the concept of a child
entity is a subordination issue -- a different relationship.
Every time I've played with XML and XML parsing, those have
been represented distinctly (e.g. in Perl, a hash for the
name/value pairs in attributes and an array for contents;
in REBOL a block of name/value pairs for attributes and a
separate block of contents), so I wouldn't expect any real
implementation issues.
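
As a small sketch of that separation, here is the foo/bar example written by hand in the [name attributes contents] shape that parse-xml returned earlier in the thread (hand-written so nothing depends on the parser). The word names attr-value, child and child-text are only for illustration.

```rebol
REBOL [Title: "Attribute vs. child element sketch"]

; <foo bar="attribute value"><bar>child text</bar></foo> as nested blocks
foo: ["foo" ["bar" "attribute value"] [["bar" none ["child text"]]]]

; Attributes sit in the second slot as a flat name/value block ...
attr-value: select foo/2 "bar"

; ... and child elements sit in the third slot, so the names never meet.
child: first foo/3
child-text: first child/3

print ["attribute bar:" attr-value]   ; prints: attribute bar: attribute value
print ["element bar:" child-text]     ; prints: element bar: child text
```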
Just my $0.02...
-jn-
--
This sentence contradicts itself -- no actually it doesn't.
-- Doug Hofstadter
joel<dot>neely<at>fedex<dot>com
Notes
- Quoted lines have been omitted from some messages.