World: r3wp
[XML] xml related conversations
Christophe 7-Nov-2005 [266x2] | Geomol: you've done a great job with your rebxml. But we really need some kind of dialect to easily access nested data. Like XPath... I need to be able to say get-data [//*/bbb/ccc[@id='geek']] and get the info. I think XPath has a great notation for that (and it is a standard). So we have to find the format which best fits this dialect... |
I was fighting today to find the best internal data format. From the tests, object! seems the most performant with nested data structures, and hash! when the data is not nested. But the problem with object! is that we cannot have a repeated element in the structure, like: <aaa> <bbb>content</bbb> <bbb bbb_attrib="attrib1"></bbb> </aaa> because, of course, when evaluated, the last definition of bbb overrides the others. So we are trying to work with hash!. We got a small reduction in overhead compared to XML, but the processing time compared to block! seems to be 10 to 20% higher. I need some more tests on data retrieval in the structure to find the right combination. Any suggestion is welcome! | |
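The override problem with object! can be reproduced in a few lines. This is only a sketch: the names as-object, as-block and all-bbb are made up for illustration, and the block layout is an assumption, not the format EasyXML or RebXML actually uses.

```rebol
REBOL [Title: "Repeated element names: object! versus block!"]

; In an object spec the second bbb: simply replaces the first one.
as-object: make object! [
    bbb: "content"
    bbb: [bbb_attrib "attrib1"]
]
probe as-object/bbb     ; only the last definition of bbb is left

; A block keeps both entries, in document order.
as-block: [
    bbb "content"
    bbb [bbb_attrib "attrib1"]
]

; Collect every value stored under the name bbb.
all-bbb: copy []
foreach [name value] as-block [
    if name = 'bbb [append/only all-bbb value]
]
probe all-bbb           ; both bbb entries survive
```

The same name/value layout can be handed to make hash! when faster first-level lookups matter, at the cost of the extra processing time mentioned above.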
Volker 7-Nov-2005 [268] | A rough idea: Maybe like vid does it? /color /colors ? it puts the first color in color if there is only one. if there are more, they are put in /colors-block . |
Christophe 7-Nov-2005 [269] | I do not get where you gain in performance? Or do i get it wrong ? |
Volker 7-Nov-2005 [270x3] | because you can use an object as long as there is only one value. But not sure if that helps. |
but 10-20% is not much anyway. | |
And with blocks there is a better chance to use rebcode? | |
BrianH 7-Nov-2005 [273] | Or for that matter, block parsing. |
Christophe 7-Nov-2005 [274x2] | Volker: i got your point. I don't know yet. I will study it tomorrow. |
rebcode could be an issue. But still under development .. | |
Gregg 7-Nov-2005 [276] | Should this group be web public? |
Pekr 7-Nov-2005 [277] | Gregg - I think no problem here to make it web-public ... |
Gregg 7-Nov-2005 [278] | Done. |
Christophe 7-Nov-2005 [279] | Gregg: as fast as lightning :-) |
Geomol 7-Nov-2005 [280] | He's like a Marvel Super Hero! :-) |
Volker 7-Nov-2005 [281] | Hat-man? :) |
Graham 7-Nov-2005 [282] | lol |
MichaelB 7-Nov-2005 [283] | carsten: I should have kept my mouth shut about XOM and asked you before :-) the port idea was just that, a thought - in any case, if one wants to use a dialect there has to be an entity to interpret the dialect; whether that's a function or something else doesn't matter, but a port seems to be a common rebol entity to encapsulate things - that's why I thought it would maybe even make sense to use a port as the abstraction .... opening a port to an xml file and the port will parse it in whatever way - by sending (inserting) a dialected block into the port the xml document could be worked on - at least from the user's point of view one wouldn't have to handle the xml-code-block/rebol code block separately - even though it might be nice to access it directly .... well, maybe I have too little clue about ports, so the idea might not make too much sense if I forgot about some important drawbacks and the like |
CarstenK 7-Nov-2005 [284x3] | to michael: maybe you can show some rebol pseudocode for how to read all chapters from a book.xml file, so we would have a nice use case to think about |
... using a XML port | |
to John (or geomol), first I've got the following error: >> my-cdoc: xml2rebxml/preserve read %short.xml ** Syntax Error: Invalid word -- --> ** Near: (line 9) --> So I replaced: insert tail output load join "<!--" data with: insert tail output join "<!--" data and it works fine with my files! You were right, the replacements in text nodes are only & > <. In attributes we need to escape the other 2 entities, as already done by you. | |
MichaelB 7-Nov-2005 [287] | carsten: I have to think about it ... it has been quite some time since I even used a java xml library |
CarstenK 7-Nov-2005 [288] | Some more ideas/wishes: I think the idea behind rebxml is great - build some common format representing xml in REBOL blocks. - maybe rebxml could be changed to ignore ignorable whitespace, that is all whitespace between elements like line feeds and indentation (except in elements with xml:space="preserve"); the block would be much smaller, but then the rebxml2xml script maybe requires a refinement /prettyprint with automatic indentation - for easier parsing maybe some words would help that indicate the beginning of special nodes, like [elem "chapter" attribs [name "value" id "0815"] [ elem "sect" attribs [ id "5x12"] [ ....]]] does it make sense? |
Geomol 7-Nov-2005 [289x2] | Yes, it makes sense. I'll think about it, before I answer. |
Carsten, I think, your removal of LOAD in the error solution, you posted, does lead to some problems. But there also is a problem with the script, as it is now. I'm doing some investigation. | |
CarstenK 7-Nov-2005 [291] | Is there some test script in rebol like Junit for java, so we could assemble some automated tests with different xml files? |
Volker 7-Nov-2005 [292] | something called runit exists AFAIK. But i never understood what the advantage in regard to rebol is. i can just write a testscript and call it? |
yeksoon 7-Nov-2005 [293] | think there is one.. rebol-unit.. http://vydra.net/rebol-unit/rebol-unit.html never used it though |
CarstenK 7-Nov-2005 [294] | But if you have 10 or more you can collect them, maybe they print some report (time, errors etc.) and you avoid things like this: Carsten removes a "load", it works for him, but breaks another piece of code. And often nobody writes test scripts/code. And the test scripts, if available, are always a good code base to learn how the real script should be used. I'll look into rebol-unit (but only tomorrow)... |
Volker 7-Nov-2005 [295x2] | foreach file scripts[ call/wait file ] and in each script: echo on print "Test1" .. -> report |
together with a bit unix for copy/deep test-directories and a diff later. | |
Geomol 7-Nov-2005 [297] | Carsten, I tried to handle comments internally in RebXML as the tag! datatype, but there seems to be a problem with tags containing newlines, other tags, etc., as a comment in XML can. So my solution doesn't work. Now I'm considering whether comments should be stored as strings in RebXML, but then there's the problem of distinguishing them from data strings. |
Volker 7-Nov-2005 [298] | files and such can be abused as strings too. |
Geomol 7-Nov-2005 [299] | A solution could be to do, as you suggested with node words (elem, attribs), which could be extended with the word: comment |
Christophe 7-Nov-2005 [300] | More recent and up-to-date (and used by the french community) is RUn : http://rebol-unit.sourceforge.net/ |
Geomol 7-Nov-2005 [301] | But that'll add to the size. I like RebXML to take up minimal space. |
Christophe 7-Nov-2005 [302] | > Some more ideas: I think the idea behind rebxml is great - build some common format representing xml in REBOL blocks. Some more ideas/wishes: > nodes like [elem "chapter" attribs [name "value" id "0815"] [ elem "sect" attribs [ id "5x12"] [ ....]]] Our first solution (actually the one we're now using in production) was similar to that. But it brings a lot of overhead to the data, and the data addressing is far from intuitive: aaa/elem/bbb/elem/ccc/attribs/name instead of aaa/bbb/ccc/name for instance. Not the most suitable solution in our experience. |
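The difference in addressing can be seen with plain REBOL path selection on nested blocks. The two layouts below (verbose and compact) are hypothetical stand-ins built only to match the two paths quoted above; they are not the actual EasyXML or RebXML formats.

```rebol
REBOL [Title: "Path addressing with and without node words"]

; Layout with explicit node words (elem, attribs) at every level.
verbose: [aaa [elem [bbb [elem [ccc [attribs [name "value"]]]]]]]
probe verbose/aaa/elem/bbb/elem/ccc/attribs/name

; Layout without node words: shorter data and a shorter path.
compact: [aaa [bbb [ccc [name "value"]]]]
probe compact/aaa/bbb/ccc/name
```

Both paths reach the same string, but the compact layout can no longer tell from the structure alone whether name is an attribute or a child element, which is the trade-off being discussed here.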
Geomol 7-Nov-2005 [303x2] | I agree. I think, if comments are to be handled in RebXML, they should be represented as strings. Then the hurdle to distinguish them from data strings has to be solved. |
It would be trivial to parse a RebXML block and add the node names (elem, attribs and comment), if that format is desired, but RebXML itself should have as little overhead as possible. | |
Christophe 7-Nov-2005 [305] | Geomol: why do you need to handle comments ? Aren't they there to facilitate the _reading_ of the XML code ? You'd not need them if you want to manipulate the data, right? |
Geomol 7-Nov-2005 [306] | Right, but Carsten asked for comments, so: output: rebxml2xml xml2rebxml <XML file> will make output the same as the original XML input. |
Christophe 7-Nov-2005 [307x2] | BTW, we called our project (not having found a better name): EasyXML. Just for the record :-) |
Ok, Geomol, I missed the point | |
Volker 7-Nov-2005 [309] | how about using some extra char? elem! attrib? aaa!/bbb!/ccc?/name ? |
Christophe 7-Nov-2005 [310] | In this case, perhaps you could consider the comments as a special case of an empty tag, marking it with a leading "--" for example. It would not create a lot of overhead, I think |
Geomol 7-Nov-2005 [311] | I need to sleep on it. :-) |
CarstenK 8-Nov-2005 [312] | Christophe: Thanks for the rebol-unit link, how different is EasyXML from rebXML? Another question: how near to XML 1.0 should the REBOL implementation be? If it should be close, the block format needs a document block with doctype information and children (elements, text, comments, processing instructions and attributes) and of course namespaces. How about DTD support and external entities like this: <?xml version="1.0"?> <!DOCTYPE root [ <!ENTITY test SYSTEM "external.xml"> ]> <root> &test; </root> They don't need to be preserved but should be resolved. Geomol: I fully agree with you about having a small format, but I think it would be nice if it supports the basic XML nodes. These are only my wishes of course ..., maybe we don't need extra words for elems and attributes, only for comments or PIs as special types of element children? |
Geomol 8-Nov-2005 [313] | Carsten, I've uploaded new versions of the RebXML scripts to: http://home.tiscali.dk/john.niclasen/rebxml/ Comments are now handled as strings, they are simply preserved without modifications, and in rebxml2xml I then check for "<!--" at the start of the string to distinguish them from other string data. Sending xml-data through first xml2rebxml and then rebxml2xml should only change white-space within tags. Try the new versions and let me know if it works. |
Christophe 8-Nov-2005 [314x2] | Carsten: "how different is EasyXML from rebXML?" I don't know :-) Most of our REBOL development is driven by the needs of my job. Now I need an easy way to access the parsed data. XPath is an easy way. So we are creating a structure which facilitates the access to nested data. And it's fun :-) It could be that John creates something similar, and that we like it and adopt it. Who knows? |
Has anybody thought about the right data structure to use with a SAX implementation? I was thinking of hash! and its performance for level 1 data retrieval. Perhaps an appropriate data structure could be a binary array labeling each element with a concatenation of the access path. Like this: <aaa attaaa="aaa1"><bbb>contentbbb</bbb></aaa> becomes make hash! [aaa id2 aaa-attaaa "aaa1" aaa-bbb "contentbbb"] based on a mapping table make hash! [id1 aaa id2 bbb] or something similar... just a rough thought! | |
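The prefix check described here could look roughly like the following. This is a sketch, not the code in the rebxml2xml script: the function name xml-comment? and the sample nodes are made up for illustration.

```rebol
REBOL [Title: "Telling comment strings from data strings"]

; Hypothetical helper: true when a string node starts with "<!--".
; find/match only succeeds when the string begins with the pattern.
xml-comment?: func [node [string!]] [
    found? find/match node "<!--"
]

nodes: ["plain text" "<!-- a comment -->" "text with <!-- inside"]
foreach node nodes [
    print [mold node "->" either xml-comment? node ["comment"] ["data"]]
]
```

Only the second sample is reported as a comment. The check relies on data strings never beginning with a literal "<!--", which is an assumption about how text nodes are stored.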
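A simplified sketch of the path-concatenation idea, leaving out the id mapping table: each key is the access path to a value, joined with "-". The variable name flat is an assumption, and the layout only mirrors the example in the message above.

```rebol
REBOL [Title: "Flat hash! keyed by access path"]

; Hypothetical flattened form of:
;   <aaa attaaa="aaa1"><bbb>contentbbb</bbb></aaa>
flat: make hash! [
    aaa-attaaa "aaa1"
    aaa-bbb    "contentbbb"
]

; Every value is one level-1 lookup away, whatever its depth in the XML.
probe select flat 'aaa-bbb
probe select flat 'aaa-attaaa
```

As written, the keys do not distinguish an attribute from a child element of the same name, and repeated elements would need some extra index in the key. Both points would have to be settled before this could serve as a real format.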