World: r3wp
[REBOL Syntax] Discussions about REBOL syntax
Andreas 19-Feb-2012 [296] | We are not expanding anything :) We are just describing what syntactical rules the REBOL email! literal syntax follows. |
BrianH 19-Feb-2012 [297] | I'm a little more concerned with R3 URL syntax though, since in that case there are real bugs that have already affected people in real cases, and because hypothetically a lot of the bugs are fixable in mezzanine code. |
Andreas 19-Feb-2012 [298] | And as the email! datatype can be used for many a purpose within dialects, it does not necessarily have to match RFC822 (or rather 5322) exactly. |
Steeve 19-Feb-2012 [299] | but the syntax checking can't be corrected with mezzs, right? |
Andreas 19-Feb-2012 [300] | (Which would be a relatively complex problem anyway ... http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html) |
BrianH 19-Feb-2012 [301x2] | Steeve: For emails, no. For urls, yes. |
For url! the syntax checking is mostly done by the DECODE-URL mezzanine. We can't change what is recognized as a url! by REBOL, but we can change how the data is treated once it's recognized. There are errors in escape handling, for instance. | |
Steeve 19-Feb-2012 [303] | Corrected version, works with R2 and R3:
    escape-uri: [#"%" 2 hex-digit]
    email-char: complement union charset {%@:} termination-char
    email-esc: [email-char | escape-uri]
    email-syntax: [
        [
            #":" any [email-esc | #":"] #"@" any [email-esc | #":"]
            | not #"<" some email-esc #"@" any email-esc
        ]
        termination
    ] |
Andreas 19-Feb-2012 [304] | Ah, was wondering. So we can't change the syntax of url!s in R3 either; we can only improve/bugfix url! handling. |
BrianH 19-Feb-2012 [305] | You'd be surprised at how flexible the syntax of url! is in R3 :) |
Andreas 19-Feb-2012 [306] | I don't think I would. |
BrianH 19-Feb-2012 [307x2] | Fair enough. But if you can figure out exactly hor MOLD handles escaping of urls, that would help narrow down what bugs we can fix in DECODE-URL. |
hor -> how | |
Andreas 19-Feb-2012 [309] | I would be slightly surprised if it is more flexible than string syntax, but I somehow doubt that :) |
BrianH 19-Feb-2012 [310] | Fewer escaping methods, so no. What's weird is that some kinds of string escaping work for the file! type. |
Steeve 20-Feb-2012 [311] | It's calm here |
Ladislav 20-Feb-2012 [312x2] | committed a couple of 1903-5 additions. You were right that #1905 is ugly, Steeve. |
Caught up with the code posted above. | |
Steeve 23-Feb-2012 [314x5] | url! syntax (both R2,R3) I've not created specific charsets, so the rule is more verbose.
- The first char! same as for word! (less "+-")
- Must contain at least one ':'
- "/" Allowed only after the first ":"
- Escape-uri allowed like in email!
    url-syntax: [
        not digit not #"'" not sign word-char
        any [escape-uri | not termination-char not #":" skip]
        #":"
        any [escape-uri | #"/" | not termination-char skip]
    ] |
Forgot the case when it begins with ".". I should have stuck much closer to the word-syntax. | |
    url-syntax: [
        [#"." not digit | not digit not #"'" not sign word-char]
        any [escape-uri | not termination-char not #":" skip]
        #":"
        any [escape-uri | #"/" | not termination-char skip]
    ] | |
hum... still wrong | |
    url-syntax: [
        not [digit | #"'" | #"." digit | sign] word-char
        any [escape-uri | not termination-char not #":" skip]
        #":"
        any [escape-uri | #"/" | not termination-char skip]
    ] | |
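The shape of Steeve's last url-syntax rule can be approximated outside REBOL. The following is a rough Python regex sketch, not a port of the real rule: `word-char` and `termination-char` are never defined in this chat, so `TERMINATION` and `NOT_WORD` below are assumed stand-ins, and `is_url` is a hypothetical helper name. It only illustrates the structure: a restricted first char, no "/" before the first ":", and "/" permitted afterwards.

```python
import re

# ASSUMPTION: stand-ins for charsets the chat uses but does not define.
TERMINATION = r'\s()\[\]{}";/'          # assumed termination-char set
NOT_WORD = TERMINATION + r":@#$%,\\"    # assumed complement of word-char
ESCAPE_URI = r"%[0-9A-Fa-f]{2}"         # escape-uri: [#"%" 2 hex-digit]

URL_SYNTAX = re.compile(
    r"(?![0-9'+\-]|\.[0-9])"                      # not [digit | #"'" | #"." digit | sign]
    + rf"[^{NOT_WORD}]"                           # word-char (assumed set)
    + rf"(?:{ESCAPE_URI}|[^{TERMINATION}:])*"     # anything up to the first ':'
    + r":"                                        # at least one ':' is required
    + rf"(?:{ESCAPE_URI}|/|[^{TERMINATION}])*"    # '/' is allowed from here on
)

def is_url(token):
    """Return True if the whole token matches the sketched url! shape."""
    return URL_SYNTAX.fullmatch(token) is not None

for token in ["http://www.rebol.com/", "mailto:x", "abc", "a/b:c", "1abc:def"]:
    print(token, is_url(token))
```

Under these assumptions the first two tokens are accepted and the last three are rejected: no ":", a "/" before the first ":", and a leading digit respectively.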
BrianH 23-Feb-2012 [319x3] | That's a good start! I'm really curious about whether urls and emails deal with chars over 127, especially in R3. As far as I know, the URI standards don't support them directly, but various internationalization extensions add recodings for these non-ASCII characters. It would be good to know exactly which chars are supported in the data model, so we can hack the code that supports that data to match. |
When last I checked, R3 considers all chars over 127 to be word-chars. It is considered to be non of REBOL's business whether a printer or display would show the character, so that even includes the additional Unicode space and control characters beyond ASCII. R3 has a binary parser, you see. | |
non of -> none of | |
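One possible reading of the "binary parser" remark, sketched in Python: if a scanner classifies UTF-8 bytes rather than decoded characters, every byte of a non-ASCII character is over 127, so a rule "over 127 is a word-char" lets Unicode spaces and control characters through as well. This is an assumption about the mechanism, not a description of R3 internals; `samples` and `all_bytes_over_127` are illustrative names.

```python
samples = {
    "e-acute (U+00E9)": "\u00e9",
    "no-break space (U+00A0)": "\u00a0",
    "en quad space (U+2000)": "\u2000",
    "next line control (U+0085)": "\u0085",
}

def all_bytes_over_127(ch):
    """True if every UTF-8 byte of ch has a value above 127."""
    return all(b > 127 for b in ch.encode("utf-8"))

for name, ch in samples.items():
    # Show the UTF-8 bytes and whether a byte-level "over 127" test accepts them.
    print(name, ch.encode("utf-8").hex(), all_bytes_over_127(ch))
```

The ASCII space, by contrast, encodes as the single byte 0x20 and fails the same test, which is why only the ASCII whitespace would act as a delimiter under such a rule.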
Steeve 23-Feb-2012 [322] | yeah |
BrianH 23-Feb-2012 [323] | Do you know if the REBOL syntax parser (LOAD and TRANSCODE) handles the unescaping and puts the decoded data into the url! structure, or if that is handled by the DECODE-URL mezzanine code? I'm hoping it's handled by the mezzanine, because it's broken in both R2 and R3 and mezzanine changes are the only kind we can make at the moment. |
Maxim 23-Feb-2012 [324x3] | AFAICT it's part of the datatype... since a space will go back and forth when you go to/from URL! and other types like string (in R2 at least):
    >> to-url "gogo://a.com/space here"
    == gogo://a.com/space here
    >> to-string gogo://a.com/space here
    == "gogo://a.com/space here" |
or did I get you wron? | |
wrong | |
Steeve 23-Feb-2012 [327] | Brian, Can you show me what is broken ? I'm a bit unsettled by your concern |
BrianH 23-Feb-2012 [328x3] | The escape decoding gets done too early. The decoding should not be done until after the URI structure has been parsed. If you do the escape decoding too early, characters that are escaped so that they won't be treated as syntax characters (like /) are treated as syntax characters erroneously. This is a bad problem for schemes like HTTP or FTP that can use usernames and passwords, because the passwords in particular either get corrupted or have inappropriately restricted character sets. IDN encoding should be put off until the last minute too, once we add support for Unicode to the url handlers of HTTP, plus any others that should support that standard. |
Given that the URI structure is parsed by DECODE-URL (or the R3 equivalent), that means that any unescaping should be done in that function, or in the scheme handler itself, not in the native code that runs before the mezzanine code is called. | |
Re-escaping in MOLD is OK though. It's the input that's the problem, not the output. | |
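The ordering problem BrianH describes can be reproduced in a few lines of Python. This is a minimal sketch, not how DECODE-URL is written: `split_userinfo` is a hypothetical helper that handles only the `scheme://user:pass@host/path` shape, and the example URL and password are made up. `urllib.parse.unquote` stands in for the escape decoding.

```python
from urllib.parse import unquote

def split_userinfo(url):
    """Split 'scheme://user:pass@host/path' into parts without unescaping."""
    scheme, rest = url.split("://", 1)
    authority, _, path = rest.partition("/")
    userinfo, _, host = authority.rpartition("@")
    user, _, password = userinfo.partition(":")
    return scheme, user, password, host, "/" + path

# Hypothetical URL whose password is "p/s@w", escaped as p%2Fs%40w.
url = "ftp://bob:p%2Fs%40w@example.com/pub/file.txt"

# Too early: unescape the whole URL first, then parse the structure.
early = split_userinfo(unquote(url))

# Late: parse the structure first, then unescape each component.
scheme, user, password, host, path = split_userinfo(url)
late = (scheme, unquote(user), unquote(password), host, unquote(path))

print("early:", early)
print("late: ", late)
```

With early decoding the escaped "/" in the password ends the authority part, so the host comes out as `bob:p` and the user and password are lost. With late decoding the password is recovered as `p/s@w` and the host as `example.com`.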
Maxim 23-Feb-2012 [331] | yep... and I've lost hours trying to get some ftp code to work because it had strange urls (with passwds)... which the interpreter would break all the time. At some point you are mystified by what is the actual URL being sent to the server. once you see what is going on, you can get it to work, but realizing that you didn't actually send the url you expect, can take quite a long time to realize and properly fix once you've got a whole app expecting/playing with urls. |
BrianH 23-Feb-2012 [332] | I've been hoping to fix that. I can load a hot-patch into R2, and include a patch in a host kit build in R3 or replace functions from %rebol.r if necessary. |
Steeve 23-Feb-2012 [333x5] | OK, I'll try to summarize our concerns.
- The url! and email! syntax is more permissive than a valid URI. It's neither a problem nor a design flaw.
- The escape decoding should not be done at all when decoded as part of a url! or email!. Right, but it will not be corrected until Carl does it.
- DECODE-URL can be rewritten (used by schemes). The parser is too strict and can't deal with complex forms. |
Lots of inconsistencies with the file! datatype between R2 and R3. Escaping notation = huge mess | |
You can use 2 forms for file!:
in R2
- %"*" quoted string file, with ^ escape notation allowed
- %* form, with %ff escape notation allowed
in R3
- quoted string file works fine
- in the %* form, the % escape notation works fine, but the ^ char messes things up in some cases without issuing an error | |
In the %* form, R3 should recognise the ^ char as a normal char (not an escape notation), as R2 does. | |
So for the moment, I think it's better to reject the ^ char in the R3 syntax | |
Maxim 23-Feb-2012 [338] | yeah, it's surely some leftover copy/paste code from the string loader, left in the file loader by error. |
BrianH 23-Feb-2012 [339x3] | Worse than being a huge mess, R2 and R3 have different messes. R2 MOLD fails to encode the % character properly. R3 chokes on the ^ character in unquoted mode, and allows both ^ and % escaping in quoted mode, and MOLDs the ^ character without encoding it (a problem because it chokes on that character). Overall the R2 MOLD problem is worse than all of the R3 problems put together because % is a more common character in filenames than ^, but both need fixing. I wish it just did one escaping method for files, % escaping, or did only % escaping for unquoted files and only ^ escaping for quoted files. % escaping doesn't support Unicode characters over 255, but no characters like that need to be escaped anyways - they can be written directly. |
R2 file! syntax may have more problems that I'm not aware of though. | |
I guess that I just want the escaping behavior Steeve described for R2, but with the MOLD of %%25 fix from R3, along with % by itself being interpreted as and molding as %"". | |
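The behavior BrianH wishes for in the unquoted form (only % escaping, `%` itself molded as `%25`, `^` treated as an ordinary character, and an empty file name written as `%""`) can be sketched as a round trip in Python. This is an illustration under assumptions, not REBOL's actual MOLD or LOAD: `MUST_ESCAPE` is a guessed set of delimiter characters, and `mold_file` and `load_file` are hypothetical names.

```python
import re

HEX_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")

# ASSUMPTION: characters that cannot appear literally in the unquoted form.
MUST_ESCAPE = set(' %:@()[]{}";')

def mold_file(name):
    """Write a file name in the unquoted %... form, using only % escapes."""
    if name == "":
        return '%""'
    out = []
    for ch in name:
        if ch in MUST_ESCAPE or ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("%%%02X" % ord(ch))   # e.g. "%" becomes "%25"
        else:
            out.append(ch)                   # "^" and non-ASCII chars pass through
    return "%" + "".join(out)

def load_file(text):
    """Read the unquoted %... form back; "%" alone and '%""' give the empty name."""
    if not text.startswith("%"):
        raise ValueError("file literal must start with %")
    body = text[1:]
    if body == '""':
        return ""
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), body)

for name in ["100%.txt", "a^b", "my file.txt", ""]:
    molded = mold_file(name)
    print(repr(name), "->", molded, "->", repr(load_file(molded)))
```

Because every escaped character is an ASCII delimiter or control character, two hex digits are always enough here, which matches the remark that characters over 255 never need escaping and can be written directly.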
Steeve 24-Feb-2012 [342x4] |
    file-char: complement union charset {%:@} termination-char
    file-char/#"/": true ;** #"/" added
    file-syntax: [
        #"%" [
            quoted-string
            | any [file-char | escape-uri] ;** fail on ^ char
        ]
        termination
    ]
alternative-syntax R2
    file-syntax: [
        #"%" [
            quoted-string
            | some [file-char | escape-uri | #"^^"] ;** ^ valid char
        ]
        termination
    ] |
Missing rules... path! refinement! date! time! Anything else ??? | |
pair! | |
Sources https://github.com/rebolsource/rebol-syntax | |