World: r3wp

[XML] xml related conversations

Graham
15-Aug-2009
[850]
Sounds like too much overhead ... unzip the docx, make changes to 
the xml portion and then rezip.
Janko
2-Jan-2010
[851]
I will need an xml parser .. I was thinking of something fast and quick, 
sax style .. I found this one http://www.rebol.org/view-script.r?script=xml-parse.r
but from looking at it, it seems to offer a lot of things I don't need. 
Has anyone used it for "serious" xml parsing? I am thinking 
of making my own simple, minimal, event-based xml parser.
Graham
2-Jan-2010
[852x2]
Yes, I have used it to parse large XML files
You can turn the xml file into a rebol object with it
Janko
2-Jan-2010
[854]
I imagine that is too costly .. I prefer the callback model to just 
extract the relevant data out
Graham
2-Jan-2010
[855]
Mine is a desktop application .. your needs for a web service differ 
..
Janko
2-Jan-2010
[856]
yes, I get a big xml made by "official" BLOATED standard for invoices 
.. I want to parse it as quick as possible and that's all
Geomol
2-Jan-2010
[857]
Janko,
http://www.fys.ku.dk/~niclasen/rebxml/rebxml-spec.html
http://www.rebol.org/view-script.r?script=xml2rebxml.r
http://www.rebol.org/view-script.r?script=rebxml2xml.r
Janko
2-Jan-2010
[858]
thanks Geomol, I will study  the links .. xml2rebxml seems short 
which is nice, but I haven't yet figured out what exactly rebxml 
is .. I am reading the first link you gave me
Robert
2-Jan-2010
[859]
Wouldn't it make a lot more sense to use a C based XML parser, construct 
a Rebol data-structure/string and return that to Rebol?
Geomol
2-Jan-2010
[860]
Janko, rebxml is a rebol version of xml. It can do the same things, 
but without the bad implementation that xml suffers from. The idea behind 
xml is ok, it's just not implemented well. Much of that is solved 
with the rebxml format.
Gregg
2-Jan-2010
[861]
I believe Maarten has done a SAX style parser.  I've used parse-xml 
in the past, sometimes post-processing the output to a different 
REBOL form, but my needs were simple.


Janko, have you tested any of the existing solutions, with test 
input on target hardware, and found them to be too slow? If so, what 
were the results, and how fast do you need it to be?
BrianH
2-Jan-2010
[862]
SAX pull parsing would work well with the port model.
Janko
3-Jan-2010
[863x2]
Robert: it's a good idea but not for my case. I don't want a data 
structure built from the whole xml, I want to stream it through the parser 
and collect the data out of it. 

Geomol: I will look at it but probably not what I want in this particular 
case for the reason above

Gregg: I haven't tested any yet. I googled and found the xml-parse.r 
above, which works sax style but seems huge. I only care about 
supporting a simplified subset of xml. xml with all its variants is 
total bloat, so I can believe the parser has to be that complex (and it 
still doesn't support 100% of it). That's why I am considering writing a 
simple sax-like parser. I wrote one in C once and it was small (but it 
parsed an even smaller subset of xml).
BrianH: What does that mean "port model"?
BrianH
3-Jan-2010
[865]
The semantic model of REBOL protocol schemes, implemented with the 
port! type, would fit well with the semantic model of SAX pull. SAX 
pull generates the same SAX events, except they are not propagated 
through callbacks - instead they are returned from function calls. 
SAX pull is sort of like a generator (in the Icon or Python sense) 
of SAX events. That is very similar in model to the behavior of command 
ports (like database ports).
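
[Editor's note] The "generator of SAX events" idea above can be sketched in Python, the language BrianH points to. This is only an illustration under assumptions, not a REBOL port scheme: the function name `pull_events` and the sample document are made up here, and the standard-library `xml.dom.pulldom` module stands in for a SAX pull parser. The consumer asks for the next event instead of registering callbacks.

```python
from xml.dom import pulldom


def pull_events(xml_text):
    """Yield (kind, value) pairs on demand instead of firing callbacks."""
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            yield ("start", node.tagName)
        elif event == pulldom.END_ELEMENT:
            yield ("end", node.tagName)
        elif event == pulldom.CHARACTERS and node.data.strip():
            # Whitespace-only text between tags is skipped.
            yield ("text", node.data.strip())


# Hypothetical sample input; the caller pulls one event at a time.
events = pull_events("<invoice><total>42</total></invoice>")
first = next(events)   # nothing further is produced until asked for
rest = list(events)    # drain the remaining events
```

Each `next` call plays the role that reading from a command port would play in the model described above.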
Pekr
4-Jan-2010
[866]
I like the SAX model, because IIRC it allows you to work on things in a 
"streamed" way, whereas DOM requires you to load everything in memory? 
Sorry if I oversimplified it :-) IIRC Doc used such an approach in his 
PostgreSQL driver, as opposed to his MySQL one ...
Dockimbel
4-Jan-2010
[867]
It's a matter of tradeoffs: if you only need fast XML document reading, 
SAX is the winner. If you need to modify the document, you need DOM 
(with or without SAX).
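
[Editor's note] A minimal sketch of the callback (SAX) side of that tradeoff, written in Python because its standard library ships a SAX parser. Everything specific is an assumption made for illustration: the `TotalCollector` class, the `collect_totals` helper, and the `<total>` element name are not from the thread. The handler keeps only the values it cares about and never builds a tree, which is the "stream it through and collect the data" approach Janko describes.

```python
import xml.sax


class TotalCollector(xml.sax.ContentHandler):
    """Collect the text of every <total> element; ignore everything else."""

    def __init__(self):
        super().__init__()
        self.totals = []
        self._buf = None  # None means "not currently inside <total>"

    def startElement(self, name, attrs):
        if name == "total":
            self._buf = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "total" and self._buf is not None:
            self.totals.append("".join(self._buf))
            self._buf = None


def collect_totals(xml_text):
    """Run the handler over a document and return the collected values."""
    handler = TotalCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.totals
```

Memory use here depends on the amount of extracted data rather than on document size, but nothing can be modified in place; that is the part that needs a DOM.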
james_nak
11-Oct-2010
[868]
Does anyone know if there is a rebol object to xml script? I've got 
xml to rebol objects but now I want to change it back to xml. (and 
I'm lazy)
GrahamC
11-Oct-2010
[869]
Lazy evaluation is useful.. a lazy programmer not so!
Maxim
12-Oct-2010
[870]
I use blocks. Although a bit slower to access, they are faster for 
big loads because they require less ram and do not require binding, 
which is a big issue on large XML blocks.
james_nak
12-Oct-2010
[871]
Yeah, what I am trying to do is convert back to XML after I've done 
my thing.
Maxim
12-Oct-2010
[872x2]
well, going to xml is easy no?
how are your objects structured?
james_nak
12-Oct-2010
[874]
Sorry for the delay. They are nested objects that represent the tags 
they were created from. I think the answer is that I will just have 
to create the routines to do what I wanted. I thought that perhaps 
there was something already out there. Thanks.
Maxim
12-Oct-2010
[875]
might find some inspiration in the JSON converters ?
james_nak
12-Oct-2010
[876]
Yes, if I run into any problems I will look into those.
Thanks Maxim. That renote app is really cool, btw.
Maxim
12-Oct-2010
[877]
thx it will improve about once a week.
Oldes
13-Oct-2010
[878]
It depends on what your input is and how the output should look, but 
you can use something like this:
context [
	xml:  copy ""
	tabs: copy ""
	set 'to-xml func[node /init][
		if init [
			xml:  copy ""
			tabs: copy ""
		]
		switch/default type?/word node [
			object! [
				append tabs #"^-"
				foreach child next first node [
					append xml rejoin [tabs "<" child ">^/"]
					to-xml node/(child)
					append xml rejoin [tabs "</" child ">^/"]
				]
				remove tabs
			]
		][
			append xml rejoin [
				tabs "<" type? node ">" node "</" type? node ">^/"
			]
		]
		xml
	]
]
o: context [
	person: context [
		name: "bla"
		age:  1
	]
]


print rejoin [
	"<o>^/"
		to-xml o
	"</o>"
]
james_nak
13-Oct-2010
[879]
Thanks Oldes.
GrahamC
13-Oct-2010
[880x3]
this is something I wrote a couple of years back ... maybe it will 
help

obj2xml: func [obj [object!] out [string!]
	/local o
] [
	foreach element next first obj [
		if all [not function? o: get in obj element o] [
			; skip functions and none values
			repend out [to-tag element]
			either object? o [
				obj2xml o out
			] [
				repend out any [o copy ""]
			]
			repend out [to-tag join "/" element]
		]
	]
	out
]
If there's a function in the object, it drops it.
posted last year to this group!  http://www.rebol.org/aga-display-posts.r?post=r3wp323x568
james_nak
14-Oct-2010
[883]
Thanks Graham. With the way that the programs I am using create the 
objects, I'm finding that I have to create something pretty specific. 
I appreciate the thought and code though.
GrahamC
3-Nov-2010
[884]
Is John's the only rebol utility that turns a rebol representation 
back into an xml document with attributes?
Maxim
3-Nov-2010
[885]
check your PMs... (it might not have turned red.... altme bug)
Maxim
10-Nov-2010
[886x2]
A question for XML users related to namespaces.


is it possible for a tag's attributes to originate from two different 
namespaces?

ex:
<tag  ns1:attr="data" ns2:other-attr="data">

or even worse:
<tag  ns1:attr="data" ns2:attr="data">


my gut tells me no, but I've been wrong before in this delightful 
world of XML spec overcomplexification.
FYI, I've just discovered that yes... you can have the same attribute 
several times in a tag so long as the namespace is different.

XML is ... so ... much ... fun....


NOT!
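
[Editor's note] A small check of the case Maxim describes, sketched in Python with the standard-library ElementTree parser. The namespace URIs and attribute values are invented for the example. A namespace-aware parser keys each attribute by namespace URI plus local name, so the two `attr` attributes stay distinct. Per the Namespaces in XML recommendation this only holds while the two prefixes are bound to different URIs; two prefixes bound to the same URI would make the attributes duplicates.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: same local name "attr" under two namespaces.
doc = (
    '<tag xmlns:ns1="urn:example:one" xmlns:ns2="urn:example:two" '
    'ns1:attr="first" ns2:attr="second"/>'
)

elem = ET.fromstring(doc)

# ElementTree expands each prefix to its URI in "{uri}local" form.
attrs = dict(elem.attrib)
first = elem.get("{urn:example:one}attr")
second = elem.get("{urn:example:two}attr")
```

So "what the attribute really means" is decided by the namespace URI, not by the prefix or the local name alone.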
Gregg
10-Nov-2010
[888]
I don't like XML, but it makes sense that namespaces prevent collisions.
Maxim
10-Nov-2010
[889]
yes, it's just pretty complicated to manage two attributes of a tag 
which come from different namespaces... so in the end, what does 
the attribute really mean?
Gregg
10-Nov-2010
[890]
Well, if the words had bindings...oh wait, wrong language. ;-)
Maxim
10-Nov-2010
[891]
hehe, yes it is similar to advanced word usage in REBOL.  but xml 
isn't really a language in my understanding (interpretation) of the 
word.
Steeve
10-Nov-2010
[892]
just a dialect with a bad messy syntax...
Oldes
13-Nov-2010
[893x3]
I just created this function to convert the data tree returned from 
REBOL's default parse-xml function back to the same string:
context [
	out: copy ""
	emitxml: func[dom][
		foreach node dom [
			either string? node [
				out: insert out node
			][
				foreach [ name atts content ] node [
					out: insert out join {<} [name #" "]
					if atts [
						foreach [att val] atts [ 
							out: insert out ajoin [att {="} any [val ""] {" }]
						]
					]
					out: remove back out
					
					either all [content not empty? content] [
						out: insert out #">"
						emitxml content
						out: insert out ajoin ["</" name #">"]
					][
						out: insert out "/>"
					]
				]
			]
		]
	]
	set 'xmltree-to-str func[dom][
		clear head out
		emitxml dom
		head out
	]
]
>> xmltree-to-str third parse-xml {<test arg="1"><bla/>hello</test>}
== {<test arg="1"><bla/>hello</test>}
I'm not sure if the naming is correct, but I don't care, I need the 
functionality.
GrahamC
13-Nov-2010
[896]
Does it handle name spaces?
Oldes
14-Nov-2010
[897x3]
do you mean this?:

>> print xmltree-to-str third parse-xml {<h:table xmlns:h="http://www.w3.org/TR/html4/">
{      <h:tr>
{        <h:td>Apples</h:td>
{        <h:td>Bananas</h:td>
{      </h:tr>
{    </h:table>}
<h:table xmlns:h="http://www.w3.org/TR/html4/">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>
But REBOL's default parse-xml has limitations, so better use Gavin's 
http://www.rebol.org/view-script.r?script=xml-parse.r if you must 
parse some advanced XML docs.
Also it's probably possible to use parse instead of the recursive 
function call, but it's working for me so I will stay with this one.