XML-processor toy. Or: "RFC"
[1/3] from: christian::ensel::gmx::de at: 18-Nov-2000 1:01
Hello list,
Looking through the hundreds of read and unread posts to this list, I see that XML and
REBOL's built-in XML 'support' are mentioned at least every few days.
Someone - wasn't it Andrew? - wanted to convert XML's DTDs to REBOL's parse
rules. I must have overlooked the :) which followed this idea ...
Okay. Instead of preparing for an exam, I played with XML and read about
it on w3.org. The grammar specified there inspired me to convert it to a
parse dialect, which isn't as hard as I first thought. In its current
state it's far from being complete - but it does some cute little things
which hint at what it could do some day in the far future.
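To give an idea of how directly the W3C grammar maps onto PARSE, here is a minimal sketch of two productions from the XML 1.0 recommendation, S and Eq, written as rules. It is not taken from the attached script; the names white-space!, s-rule and eq-rule are made up for this illustration.

```rebol
REBOL [title: "Sketch: EBNF productions as PARSE rules"]

; [3]  S  ::= (#x20 | #x9 | #xD | #xA)+
white-space!: charset " ^-^/^M"
s-rule: [some white-space!]

; [25] Eq ::= S? '=' S?
eq-rule: [opt s-rule "=" opt s-rule]

; PARSE returns true only if the rule consumes the whole input.
; /all keeps PARSE from skipping whitespace on its own.
print parse/all " = " eq-rule
print parse/all "==" eq-rule
```

The first call should print true. The second should print false, because a second "=" is left over after the rule has matched.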
>> xml-data: {<?xml version="1.0" standalone="yes"?>
<!DOCTYPE test [
<!ENTITY ME "Christian Ensel">
<!ELEMENT che:money ANY>
<!ATTLIST che:money che:currency CDATA "USD"
che:amount CDATA #REQUIRED>
]>
<space:name>
This is some text typed by &ME;.
<che:money xmlns:che = "http://www.foo.bar"
che:amount = "0.02"
che:currency = "USD">
My two cents someday?!?
<element attribute="&lt;1&gt;" />
</che:money>
</space:name>}
>> xml/process xml-data
This results in the following object tree. It is far from being complete,
but IMHO some very cute things work already (e.g. declaring entities,
see the ^^^^^^ markers):
>> probe xml/the-Document
make object! [
name: none
attrs: []
content: [
make object! [
name: "space:name"
attrs: []
content: [
"^/ This is some text typed by "
"Christian Ensel"
^^^^^^^^^^^^^^^^^
".^/ "
make object! [
name: "che:money"
attrs: [
make object! [
name: "xmlns:che"
value: "http://www.foo.bar"
]
make object! [
name: "che:amount"
value: "0.02"
]
make object! [
name: "che:currency"
value: "USD"
]
]
content: [
"^/ My two cents?!?^/ "
make object! [
name: "element"
attrs: [
make object! [
name: "attribute"
value: "<1>"
^^^^^
]
]
content: []
]
"^/ "
]
]
"^/"
]
]
]
]
It's fun working with PARSE, even though I'm sorely missing some
features which would help a lot, e.g. a NOT keyword or the possibility
to parse a string TO ["<" | "&" | "]]>"]. Things like that ...
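One workaround for the missing multi-target TO, sketched here for the two single-character delimiters only, is to match a run of characters from a complemented charset instead. The name text-char! and the sample string are assumptions for this example. Stopping at "]]>" would still need extra handling, since a "]" on its own is legal text.

```rebol
REBOL [title: "Sketch: emulating TO with several targets"]

; Every character except the delimiters "<" and "&".
text-char!: complement charset "<&"

data: "My two cents <b>someday</b>"

; ANY stops at the first "<" or "&", much like TO ["<" | "&"] would.
; REST: records the position where the scan stopped.
parse/all data [copy the-Text any text-char! rest: to end]

probe the-Text
probe rest
```

This only works because both delimiters are single characters; multi-character targets cannot be expressed as a charset.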
The processor recognizes tags which aren't nested correctly, but is very
strict in this - it simply stops execution.
I'm very busy these days, so I'll only make small steps in the next few days,
but I would appreciate any comments on the idea of using PARSE for this, because
I'm a little bit uncertain about some design decisions :) I'm not even sure
whether processing XML is a task REBOL is well suited for (thinking
about things like UNICODE etc.).
As I already said, comments, please ;)
Attached you find the most recent version. I guess in its current state
it does some 2 or 3 % of what an XML processor should do, and the code
looks (and is, I guess) very ugly :(
Hint: calling XML/PROCESS with the refinement /APPLY-RULE and the name
of one of the rules in the XML object (simply the word, no lit-word, no path)
allows for testing single rules.
As in
>> xml/process/apply-rule {<?xml?>} Prolog
== true
But you'll probably end up with
== false
more often ...
Regards
Christian
[Christian--Ensel--GMX--De]
-- Attached file included as plaintext by Listar --
-- File: xml-processor.r
;######################################################## REBOL XML-Processor ##
; ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
REBOL [
title: "XML-Processor"
author: "Christian 'CHE' Ensel"
email: [christian--ensel--gmx--de]
date: 16-Nov-2000
version: 0.0.4
]
;xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx XML xx
; ¯¯¯
XML: make object! [
;=============================================================== SETTINGS ==
; ¯¯¯¯¯¯¯¯
comments?: no
validate?: [ yes | no ]
the-application-wants-comments: true
the-application-wants-no-comments: false
;======================================================= HELPER-FUNCTIONS ==
; ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
MAKE-TAG: does [
    make object! [ name: none attrs: make block! 0 content: make block! 0 ]
]
MAKE-NAMESPACE: function [ the-NSPrefix [string!] the-NSTarget [string!] ] [] [
    repend the-Namespaces [ the-NSPrefix the-NSTarget make block! 0 ]
]
QNAME-PREFIX: function [ a-QName [string!] ] [ the-Namespace ] [
    if equal? 2 length? the-Namespace: parse a-QName ":" [ first the-Namespace ]
]
QNAME-LOCALPART: function [ a-QName [string!] ] [ the-Namespace ] [
    either equal? 2 length? the-Namespace: parse a-QName ":"
        [ second the-Namespace ]
        [ first the-Namespace ]
]
SAME-NAME?: function [ a-QName b-QName ] [ a-NSTarget b-NSTarget a-NSName b-NSName ] [
    a-NSTarget: select the-Namespaces qname-prefix a-QName
    b-NSTarget: select the-Namespaces qname-prefix b-QName
    a-NSName: qname-localpart a-QName
    b-NSName: qname-localpart b-QName
    (equal? a-NSName b-NSName) and (equal? a-NSTarget b-NSTarget)
]
;======================================================== DATA-CONTAINERS ==
; ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
the-Document: none
the-Tags: none
the-Tag: none
the-EntityRefs: [ "&amp;" ("&") | "&lt;" ("<") | "&gt;" (">") | "&quot;" ({"}) | "&apos;" ("'") ]
the-PEReferences: [ "%DEBUG;" ("DEBUG") ]
the-Namespaces: [
    "xml" "http://www.w3.org/XML/1998/namespace" []
    "che" "http://www.che.de" ["book" "title" "isbn" "author" "price"]
    "ensel" "http://www.che.de" ["book" "title" "isbn" "author" "price"]
    "w3c" "http://www.w3.org" ["book" "title" "isbn" "author" "price"]
]
;======================================================= PROCESS xml-data ==
; ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
PROCESS: function [data [string!] /APPLY-RULE 'rule [word!] ] []
[
the-Tag: the-Document: make-tag
append the-Tags: make block! [] the-Tag
either apply-rule
[
parse/all/case data get in self rule
][
parse/all/case data Document
]
]
;---------------------------------------------------------------------------
;====================================================== GENERIC DTD RULES ==
; ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
; Rules like these will be generated automatically some day. Or some
; other approach will be chosen.
the-Amount-AttRule: [ "amount" Eq AttValue ]
the-Currency-AttRule: [
    "currency" Eq [
        "'" [ "DEM" | "OES" | "SFR" ] "'"
        |
        {"} [ "DEM" | "OES" | "SFR" ] {"}
    ]
]
the-Money-ElemRule: [ "<money" S the-Amount-AttRule S the-Currency-AttRule Opt S "/>" ]
;================================================================ GRAMMAR ==
{ ¯¯¯¯¯¯¯
[Nº] ---------- http://www.w3.org/???
|
| Conventions:
| ¯¯¯¯¯¯¯¯¯¯¯¯
| · A number in format x.y denotes a rule added by me
| which acts as a helper to the rule x
|
| · Rules which are "terminal" rules in some sense are
| specifying nothing but charsets. In contrast to the
| official XML grammar these rules' names end with
| an exclamation mark - so I can use them as if they
| were REBOL datatypes.
|
| · Results of a rule named 'FooBar usually are to be
| kept in a word named 'the-FooBar .
|
| [Nº] - http://www.w3.org/TR/1999/REC-xml-names-19990114
| |
| |
| }
[01] Document:
[
Prolog
Element
any Misc
]
[03] S:
[
copy the-S some WhiteSpace!
]
[03.1] WhiteSpace!: charset [ " ^-^/^M" ]
[04] [05] NCNameChar!:
[
Letter!
|
Digit!
|
#"."
|
#"-"
|
#"_"
|
CombiningChar!
|
Extender!
]
[04] NameChar!:
[
Letter!
|
Digit!
|
#"."
|
#"-"
|
#"_"
|
#":"
|
CombiningChar!
|
Extender!
]
[05] [04] NCName:
[
copy the-NCName
[
[
Letter!
|
#"_"
]
any NCNameChar!
]
]
[05] [06] QName:
[
copy the-QName
[
opt
[
Prefix
":"
]
LocalPart
]
]
[05] Name:
[
copy the-Name
[
[
Letter!
|
#"_"
|
#":"
]
any NameChar!
]
]
[07] Nmtoken:
[
copy the-Nmtoken some NameChar!
]
[09.1] EntityValueChar!: complement charset {%&"}
; Helper for the single-quoted form: "'" must end the value there.
[09.2] EntityValueCharApos!: complement charset {%&'}
[09] EntityValue:
[
    (
        the-EntityValue: make string! 0
    )
    [
        {"}
        any
        [
            Reference
            (
                append the-EntityValue the-Reference
            )
            |
            PEReference
            (
                append the-EntityValue the-PEReference
            )
            |
            copy the-EntityValueChar EntityValueChar!
            (
                append the-EntityValue the-EntityValueChar
            )
        ]
        {"}
        |
        "'"
        any
        [
            Reference
            (
                append the-EntityValue the-Reference
            )
            |
            PEReference
            (
                append the-EntityValue the-PEReference
            )
            |
            copy the-EntityValueChar EntityValueCharApos!
            (
                append the-EntityValue the-EntityValueChar
            )
        ]
        "'"
    ]
]
[10.1] AttChar!: complement charset {<&"}
; Helper for the single-quoted form: "'" must end the value there.
[10.2] AttCharApos!: complement charset {<&'}
[10] AttValue:
[
    (
        the-AttValue: make string! 0
    )
    [
        {"}
        any
        [
            Reference
            (
                append the-AttValue the-Reference
            )
            |
            copy the-AttChar AttChar!
            (
                append the-AttValue the-AttChar
            )
        ]
        {"}
        |
        "'"
        any
        [
            Reference
            (
                append the-AttValue the-Reference
            )
            |
            copy the-AttChar AttCharApos!
            (
                append the-AttValue the-AttChar
            )
        ]
        ; A single-quoted value has to be closed by a single quote.
        "'"
    ]
]
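[11] SystemLiteral:
[
    copy the-SystemLiteral
    [
        {"}
        any SystemChar!
        {"}
        |
        "'"
        any SystemCharApos!
        "'"
    ]
]
[11.1] SystemChar!: complement charset {"}
; Helper for the single-quoted form: "'" must end the literal there.
[11.2] SystemCharApos!: complement charset {'}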
[12.1] Pubid:
[
"PUBLIC"
S
Public-ID-Lit
]
[12] PubidLiteral:
[
copy the-PubidLiteral
[
{"}
any
[2/3] from: al:bri:xtra at: 18-Nov-2000 13:39
Christian wrote:
> Someone - wasn't it Andrew? - wanted to convert XML's DTDs to REBOL's
> parse rules. I must have overseen the :) which followed this idea ...
Here's the smileyface:
:-)
> > ...I haven't got time tonight/this morning to produce a full XML
> > solution that uses objects. Maybe next week.
> It's fun working with PARSE , even though I'm strongly missing some
> features which would help a lot, e.g. a NOT keyword or the possibility to
> parse a string TO ["<" | "&" | "]]>"]. Things like that ...
I agree that 'to in the 'parse dialect does seem unnecessarily restricted. I
keep banging my head against 'to's limitations.
Now it would be nice to have a Rebol object! to XML converter.
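A rough sketch of what such a converter could look like, assuming nodes shaped like the objects printed by PROBE in Christian's post (NAME, ATTRS and CONTENT, with attributes as objects holding NAME and VALUE). TO-XML and ESCAPE are hypothetical helper names, not part of any posted script, and the sketch ignores DTDs, namespaces and empty-element tags.

```rebol
REBOL [title: "Sketch: object tree back to XML"]

; Re-escape characters that must not appear literally in XML text.
escape: func [text [string!] /local out] [
    out: copy text
    replace/all out "&" "&amp;"    ; must come first
    replace/all out "<" "&lt;"
    replace/all out ">" "&gt;"
    replace/all out {"} "&quot;"
    out
]

; Serialise a node: strings are escaped, objects become elements.
; A node whose NAME is none (the document root) emits only its content.
to-xml: func [node /local out items] [
    if string? node [return escape node]
    out: make string! 100
    if node/name [
        append out join "<" node/name
        foreach attr node/attrs [
            append out rejoin [" " attr/name {="} escape attr/value {"}]
        ]
        append out ">"
    ]
    ; Walk the children with a plain series loop so that the
    ; recursive call cannot disturb the loop state.
    items: node/content
    while [not tail? items] [
        append out to-xml first items
        items: next items
    ]
    if node/name [append out rejoin ["</" node/name ">"]]
    out
]

; Usage with a hand-built node (assumed shape, see above).
doc: make object! [
    name: "che:money"
    attrs: reduce [make object! [name: "che:amount" value: "0.02"]]
    content: ["My two cents"]
]
print to-xml doc
```

Escaping on the way out mirrors the entity decoding the processor does on the way in, so a value such as "<1>" would be written back as an entity-escaped string.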
Andrew Martin
ICQ: 26227169
http://members.nbci.com/AndrewMartin/
[3/3] from: brett:codeconscious at: 18-Nov-2000 13:17
I've been musing on this recently. Funnily enough I've been building my own
xml-processor toy like Christian's though I haven't got as far yet as
actually having it do something.
I wanted to parse the xml-dtd to create a rebol object that would in turn
offer two functions. The first to parse the actual xml document into a Rebol
form. The second to convert the Rebol form back to an xml document. Using a
separately generated object gets around the problem of mimicking xml in
Rebol (eg. attributes / contents) and skirts around the issue of having one
right Rebol form for holding xml data. So the Rebol form could be a wordy
dialect, nested objects, a block structure suitable for use with paths, or
a combination of these. Whatever is the most useful.
Unfortunately I haven't quite got the time to see this through at the
moment.
Brett.