World: r3wp

[XML] xml related conversations

Hi, I'm using the scripts made by Gavin. The code is great, but libs 
for DOM implementation are out there and are free. I don't know why 
we don't use this natively in REBOL. I feel like I'm in the Stone Age 
here. Tell me if I'm wrong, but I feel like a caveman doing my parsing 
like this. If the code is free and the implementation is easy, why not 
have this in REBOL? Just because we could exceed 650k?
because it was not invented here :(
I think it's fair to say that Carl is not fond of XML:

(And, to be precise, neither am I....But there is a lot of it out 
there, and REBOL needs to work with it better)
I still believe the DOM can be implemented succinctly in Rebol, 
in a way that not only makes it easy for Rebollers to manipulate 
XML content, but makes Rebol a desirable tool to work with XML.
XML is not a silver bullet. REBOL blocks are much more powerful than 
XML, that is, if you're dealing with REBOL-only deployment. But when 
it comes to managing interoperability, things get a bit messy and confused.
I am with Chris here. XML may not be a silver bullet, but you can do 
nothing if the other party decides to use and communicate using XML 
- you either can handle it, or you can't - simple as that. You can argue 
with them about rebol and its blocks, they will not care :-)
XML as an interchange format is common, as Pekr says.....In many 
ways it is better than the CSV files that we used to use.

But XML as a sort of toy in-memory database that can be updated with 
APIs like DOM -- well that is a lurch into a strange direction, and 
not one I'd be happy to take.
I'd never say XML was a silver bullet -- I wouldn't use Rebol if 
I did -- but it is a pain not to be able to do simple manipulation, 
especially when there is a standard method laid out for doing so.
Sunanda - no one here talks about XML in-memory databases. XML databases 
are most of the time dirty tricks, as well as object ones ...
the thing is simple - you are either able to read, change, store XML 
files, or not, simple as that .... so what Chris means is - being 
able to read XML into a DOM-like structure, then do something with 
particular fields, store it back into XML ...
I've thought a little about how to implement this, I see four main 
considerations -- parsing, internal representation, accessors, rendering. 
 1) can reuse RT's or Gavin's code, 2) objects? nested blocks?  how 
should this look?  3) functions tied to the objects in (2), or a 
dialect?  4) appears to be the easy part...
2) and to a lesser extent 3) are key to progressing.
I don't think it is a priority that 2) be moldable as we can take 
advantage of Rebol's 'quirks'.
3) -- xml [doc: load %file.xml elmt: doc/get-element-by-id "foo" 
elmt/tag-name: "p" save %file.xml doc] -- just one example of how 
it might work...
Chris - somehow I don't understand what you are talking about here. 
You always have to parse first, no?
Gavin's code consists of two or so sections - first you parse into 
block representation, then you convert to object representation ...
Perhaps a hash for attributes and a block for contents. How would 
you represent namespaces?
Petr, it's best to know what format you're parsing to before you 
actually attempt to parse.  I'm making the assumption that the results 
of parse-xml, parse-xml+ and xml-to-object are unsuitable for manipulation.
oh, now I understand what you meant. I thought you were trying 
to somehow "parse XML without actually parsing it", my bad :-)
Chris, you assume wrong. They may be a little awkward but standard 
path and block manipulations can be used on the structures generated 
by those parsers, as long as you stick to the overall structural 
Hypothetically, we stick to Gavin's block format -- how much work 
will it be to implement, say 'get-tags-by-name' , 'get-element-by-id', 
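As a rough gauge of the effort, here is a minimal sketch of both lookups over nested blocks. It assumes each element is laid out as [name attributes contents], with a flat name/value block for attributes; Gavin's actual output may differ. The helper names attribute-of, get-elements-by-tag-name and get-element-by-id are hypothetical, not part of any existing script.

```rebol
REBOL [Title: "Element lookup over nested blocks (sketch)"]

; Assumed layout per element: [name attributes contents].
; attributes: flat block of name/value strings, or none.
; contents: block of strings and child elements, or none.
; The block? tests below treat anything that is not a block as absent.
doc: [
    "foobar" none [
        ["bar" ["id" "b1"] ["Some Text"]]
        ["bar" none none]
    ]
]

; Hypothetical helper: value of a named attribute, or none.
attribute-of: func [element [block!] name [string!]][
    if block? pick element 2 [
        foreach [key value] pick element 2 [
            if strict-equal? key name [return value]
        ]
    ]
    none
]

; Hypothetical helper: every element (at any depth) with a matching name.
get-elements-by-tag-name: func [element [block!] name [string!] /local result][
    result: copy []
    if strict-equal? name first element [append/only result element]
    if block? pick element 3 [
        foreach child pick element 3 [
            if block? child [
                append result get-elements-by-tag-name child name
            ]
        ]
    ]
    result
]

; Hypothetical helper: first element whose "id" attribute matches, or none.
get-element-by-id: func [element [block!] id [string!] /local found][
    if equal? id attribute-of element "id" [return element]
    if block? pick element 3 [
        foreach child pick element 3 [
            if all [block? child  found: get-element-by-id child id] [
                return found
            ]
        ]
    ]
    none
]

print length? get-elements-by-tag-name doc "bar"   ; prints 2
print first get-element-by-id doc "b1"             ; prints bar
```

Both functions are plain recursive walks, so the cost is mostly in agreeing on the block layout rather than in the lookups themselves.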
Maybe it would be good to look at Gabriele's Temple - he did only 
basic html parsing, but provided such code
in fact, I like his templating system and I don't want to allow any 
other kind of templating system, which does not respect my requirements. 
Temple is good here. Even Jaime likes it or so it seems :-)
It is certainly a Rebolish way to look at the XML data, I see a linear 
structure as being more manageable...
Maybe we could look at those, study the code and then start to talk 
of which way to go ...
what do you mean by linear structure? Block of blocks?
Block of objects.
Perhaps I don't understand Temple fully, but it doesn't so much manipulate 
an arbitrary XML file, rather pick and choose parts of a larger XML-based 
hmm, dunno how to explain it. It simply parses XML, creates a block 
of blocks structure. Then you have those functions like find-by-id, 
find-by-name, etc., which you can use to manipulate values ... then, 
once done, you generate XML. What I did not like is that it builds 
the structure from scratch, so e.g. with an html page, you lose 
nice formatting, comments etc. But others said you could have pointers 
from such nodes to the original doc and rebuild the doc properly ...
Objects aren't a good way to store XML values or even attributes. 
XML attribute names can be specified using characters that are difficult 
to use in REBOL words, like :, and you can't add and remove fields 
from objects at runtime. Hashes are better to store attributes, with 
keys and values of strings. Blocks are best to store element contents, 
with perhaps the none value to specify closed elements.
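A small sketch of that layout, assuming REBOL 2 and an element shape of [name attributes contents] (the shape and the variable names are illustrative only):

```rebol
REBOL [Title: "Hash attributes and block contents (sketch)"]

; Attribute names and values stay plain strings, so a name such as
; "xml:lang" needs no conversion to a REBOL word.
attributes: make hash! ["id" "intro" "xml:lang" "en"]

; Lookup by string key. Note: select searches names and values alike,
; so this sketch assumes no value duplicates an attribute name.
print select attributes "xml:lang"   ; prints en

; Entries can be added and removed at runtime.
append attributes ["class" "note"]
remove/part find attributes "id" 2

; Element as [name attributes contents]; none marks a closed element.
paragraph: reduce ["p" attributes reduce ["Some text" reduce ["br" none none]]]
print mold paragraph
```

The reduce calls matter: inside a literal block, none is just a word, and only after reduce does it become the none value that accessor code can test for.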
For what I had in mind, these fears are perhaps not appropriate. 
I'll try and compose a quick example...
You might want to support namespaces like this:

["tag without namespace" "namespace" #[hash! ["attribute name" "attribute namespace" "attribute value" ...]] ["text" ["tag" ...] ...]]
You might even be able to replace attribute value strings with REBOL 
values if you implement XML Schema typing.
You could then represent other XML data items using a word in the 
tag spot and then type-specific contents. For example:
[comment "comment text"]
This is a little convoluted, as I'm faking the end document object 
(which would be created by a parse rule):
Consider the XML document:
<?xml version="1.0"?>
<foobar><foo:bar>Some Text</foo:bar></foobar>
The document would look a little like:
node-prototype: context [
    node-name: tag-name: ""
    node-value: ""
    node-type: 0
    child-nodes: []
]

foobar: make node-prototype [
    node-name: tag-name: "foobar"
    node-type: 1
    child-nodes: copy []  ; each element gets its own child list
]

bar: make node-prototype [
    node-name: tag-name: "foo:bar"
    prefix: "foo" local-name: "bar"
    node-type: 1
    child-nodes: copy []
    parent-node: :foobar
]

append foobar/child-nodes bar

text: make node-prototype [
    node-name: #text
    node-value: "Some Text"
    parent-node: :bar
]

append bar/child-nodes text

document: context [
    ; return every node whose tag-name equals the requested name
    get-elements-by-tag-name: func [tag-name /local matches][
        matches: copy nodes
        remove-each element matches [
            not equal? tag-name element/tag-name
        ]
        matches
    ]
    nodes: reduce [foobar bar text]
]
Yes, it's big and bulky, but it is not intended for consumption by 
the user, any less than a View object is...
There are some typos there, but also a semblance of the document 
object working.
Using my structure, with empties for data not there:

["foobar" "" #[hash! []] [["bar" "foo" #[hash! []] ["Some Text"]]]]
or with the none value for data not there:
["foobar" none none [["bar" "foo" none ["Some Text"]]]]
There are advantages to either method.
If you have accessor functions premade for your structure, using 
the none value is better because it makes it easier to implement 
default values with any.
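To illustrate the point about any, here is a minimal sketch of accessors over the none-based structure. The accessor names (prefix-of, attributes-of, contents-of, qualified-name) are hypothetical, and the element layout [local-name prefix attributes contents] is read off the example above.

```rebol
REBOL [Title: "Accessors with defaults via any (sketch)"]

; Build the structure with reduce so that none becomes the none value
; rather than the word none.
doc: reduce [
    "foobar" none none reduce [
        reduce ["bar" "foo" none ["Some Text"]]
    ]
]

; any returns the first value that is neither none nor false,
; which supplies a default in a single expression.
prefix-of: func [element [block!]][any [second element ""]]
attributes-of: func [element [block!]][any [third element make hash! []]]
contents-of: func [element [block!]][any [fourth element copy []]]

; Join prefix and local name, omitting the colon when there is no prefix.
qualified-name: func [element [block!] /local prefix][
    prefix: prefix-of element
    either empty? prefix [first element][rejoin [prefix ":" first element]]
]

print qualified-name doc                     ; prints foobar
print qualified-name first contents-of doc   ; prints foo:bar
print length? attributes-of doc              ; prints 0
```

With empty strings and empty hashes stored in place of none, the same accessors would need explicit empty? checks instead.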
The strings would of course be unicode! when they finish implementing 
that data type.
Or UTF-8 now...
The contents of the string can be UTF-8 quite easily, although you 
will have to encode the higher characters yourself.
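Encoding by hand is not much code. A minimal sketch, assuming REBOL 2 (where a string is a sequence of 8-bit characters) and code points up to 65535 only; utf8-encode is a hypothetical helper name:

```rebol
REBOL [Title: "Manual UTF-8 encoding of a code point (sketch)"]

; Encode one code point (0 to 65535) as a string of UTF-8 bytes,
; using integer division and remainder instead of bit shifts.
utf8-encode: func [code [integer!] /local high mid low][
    case [
        code < 128 [
            ; one byte: plain ASCII
            to string! to char! code
        ]
        code < 2048 [
            ; two bytes: 110xxxxx 10xxxxxx
            high: 192 + to integer! (code / 64)
            low: 128 + (code // 64)
            rejoin ["" to char! high to char! low]
        ]
        true [
            ; three bytes: 1110xxxx 10xxxxxx 10xxxxxx
            high: 224 + to integer! (code / 4096)
            mid: 128 + ((to integer! (code / 64)) // 64)
            low: 128 + (code // 64)
            rejoin ["" to char! high to char! mid to char! low]
        ]
    ]
]

print to binary! utf8-encode 233    ; e-acute, expected bytes C3 A9
print to binary! utf8-encode 8364   ; euro sign, expected bytes E2 82 AC
```

The sketch does not reject surrogate code points or handle values above 65535, which a real encoder would need to do.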
The imported characters would be fine (their integrity can be checked 
by the parse rule) but local Rebol higher characters would need to 
be vetted before inserting them...
Remember that objects in REBOL have a lot more overhead than blocks, 
and that XML documents can get quite large. Unless you are using 
an event-driven parser, every bit of memory you can save is a good 
REBOL isn't an object-oriented language you know...