problems with url...

[1/10] from: cyphre:seznam:cz at: 18-Jun-2002 17:48

Hi List,

I have this problem, how to 'read following url from rebol?

http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8

Anyone ?

regards,
Cyphre

[2/10] from: gscottjones:mchsi at: 18-Jun-2002 14:14

From: "Cyphre"

> I have this problem, how to 'read following url from rebol?
>
> http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8
>
> Anyone ?

Hi, Cyphre,

It is easier to explain how to bypass the problem than to explain where the real problem lies. :)

The problem seems to be that the percent sign can be used to escape hex coded characters. The dehex-ed character for C8 is È. When the url is entered, the interpreter immediately substitutes the character "È" for "%C8". However, the url parser will no longer parse the entire url, because "È" is not a part of its rules. Probing the http scheme *after* a failed read shows that the file portion contains the fragment "?co=naslepo&kde=A-", indicating to me that it failed at the next character, which *it* thinks is È instead of "%" (followed by "C8", of course).

The way to work around the problem is to do something like the following:

read rejoin [http://slovnik.nettown.cz/?co=naslepo&kde=A- "%C8"]

which then returns the page.

What I am unsure about is exactly "where" the problem lies. Is it that some urls contain hex encoded characters and that REBOL translates them incorrectly? I do not know for sure. I am not sure why my work-around works! Unfortunately, I am out of time to explore the problem further right now.

Hope this helps a bit anyway.
--Scott Jones

[3/10] from: rotenca:telvia:it at: 18-Jun-2002 22:47

Hi, Cyphre

>I have this problem, how to 'read following url from rebol?
>http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8

read to-url "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
== {<!doctype html public "-//w3c//dtd html 3.2 final//en">...

But load fails:

read load "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
** User Error: URL error: http://slovnik.nettown.cz/?co=naslepo&kde=A-È
** Near: read load "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"

I do not know why. Any ideas?

---
Ciao
Romano

[4/10] from: gscottjones:mchsi at: 18-Jun-2002 16:51

From: "G. Scott Jones"

> The way to work around the problem is to do something like the following:
>
> read rejoin [http://slovnik.nettown.cz/?co=naslepo&kde=A- "%C8"]
>
> which then returns the page.

Responding to self: or also the following:

read to-url "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"

--Scott Jones

[5/10] from: joel:neely:fedex at: 18-Jun-2002 17:28

Hi, Romano,

No really useful suggestions, but...

Romano Paolo Tenca wrote:

> Hi, Cyphre
> >I have this problem, how to 'read following url from rebol?

<<quoted lines omitted: 6>>

> ** Near: read load "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
> I do not know why. Any ideas?

I notice that LOAD seems to be doing something interesting with that percent-escaped character at the end, and possibly transforming it into a character that's not legal for a URL (prematurely?).

>> read to-url "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"

>> read load "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"

** User Error: URL error: http://slovnik.nettown.cz/?co=naslepo&kde=A-È
** Near: read load "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"

Notice that the message is

    User Error: URL error: ...

and not

    Syntax Error: Invalid url ...

as in

>> read load "http://%"

** Syntax Error: Invalid url -- http://%
** Near: (line 1) http://%

implying to my eye that LOAD was happy but the result wasn't usable by READ. When I say something like

>> gorp: "http://%77%77%77.rebol%2ecom/"

== "http://%77%77%77.rebol%2ecom/"

>> load gorp

== http://www.rebol.com/

LOAD seems to want to unescape the string. That's OK in this case, since all of the escaped characters are actually valid in URLs, but in the case of

>> bletch: "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"

== "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"

>> load bletch

== http://slovnik.nettown.cz/?co=naslepo&kde=A-È

the (premature) unescaping of "%C8" back to the high-bit-on accented-E character may be the source of grief when that literal character is deemed invalid for use in a URL.

Hope this helps!!!

-jn-

[6/10] from: belkasri::1stlegal::com at: 18-Jun-2002 16:47

I am really new to REBOL, but not new to programming! How about reading the URL as a string, then replacing the reserved characters, and then putting it in a variable of the URL datatype?

-_Abdel.

[7/10] from: ingo:2b1 at: 19-Jun-2002 8:11

Hi Cyphre,

Cyphre wrote:

> I have this problem, how to 'read following url from rebol?
>
> http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8

There is a bug in the url! handling. Percent-escaped characters are unescaped at two different places. So one possible solution is to percent-escape the percent ...

>> read http://slovnik.nettown.cz/?co=naslepo&kde=A-%25C8

[Accept Connection User-Agent Host]
["*/*" "close" "REBOL 1.2.5.4.2" "slovnik.nettown.cz"]
*/*
close
REBOL 1.2.5.4.2
slovnik.nettown.cz
** User Error: Error. Target url: http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8 could not be retrieved. Server response: HTTP/1.0 404 WWWOFFLE Will Get
** Near: read http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8

I hope that helps,
Ingo
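Editorial illustration of the "escape the percent" idea above, sketched in Python rather than REBOL. It is only a model of the reasoning, not of REBOL's internals: `unescape_once` is a hypothetical stand-in for a single unescaping pass, and Latin-1 decoding is an assumption chosen so that the byte C8 maps to one character.

```python
from urllib.parse import unquote


def unescape_once(text: str) -> str:
    """Hypothetical stand-in for one percent-unescaping pass (Latin-1 assumed)."""
    return unquote(text, encoding="latin-1")


query = "kde=A-%25C8"           # the percent sign itself is escaped as %25
first = unescape_once(query)    # one pass restores the literal text "%C8"
second = unescape_once(first)   # a second pass turns "%C8" into a raw character

print(first)
print(repr(second))
```

If only one pass runs before the request is sent, the server still receives the escaped form "%C8"; if two passes run, the raw high-bit character appears and a strict URL parser may reject it.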

[8/10] from: rotenca:telvia:it at: 19-Jun-2002 15:52

Hi Ingo, Joel, Cyphre

I think the problem is in Load, which interprets %xx and transforms it into a char. But net-utils/url-parser/parse-url, which is called by decode-url, expects the url to conform to the RFC.

Ingo, what are the two places in which the code is badly escaped? I can think only of Load (and Do, which internally calls Load with string, file ...)

Cyphre, about the proxy error: it is not exactly the same, because Load does not change it (I try to explain this at the end), it escapes it correctly. The log message "The URL Parse: ..." is correct, which should mean that the url has been decoded in the exact mode.

Joel,

>Notice that the message is
>    User Error: URL error: ...

This is the result of the function: net-utils/url-parser/parse-url/no-error

>and not
>    Syntax Error: Invalid url ...

This is a Load error message.

>implying to my eye that LOAD was happy

Yes

> but the result wasn't usable by READ.

because of decode-url, which does the right thing; it is Load which happily fails :-)

>When I say something like
>    >> gorp: "http://%77%77%77.rebol%2ecom/"

<<quoted lines omitted: 3>>

>LOAD seems to want to unescape the string. That's OK in this case,
>since all of the escaped characters are actually valid in URLs,

I think it is not correct, because Load could be unaware of some legal custom rules valid for some existing (and not existing) url. Reading the RFC, I understand that almost all chars in a URL could be escaped; I do not understand why Load should change this. It only produces a more readable url (sometimes an invalid one). That could be done by a function like Form, not by Load.

>the (premature) unescaping of "%C8" back to the high-bit-on accented-E
>character may be the source of grief when that literal character is
>deemed invalid for use in a URL.

Yes. I think that the bug arises from the fact that RT uses the same code for Loading a file and Loading an url, but this sometimes ends with an incorrect url:

>> load "% " ;== %
>> load "ftp:// " ;== ftp://
>> load "%%22" ;== %%22
>> load "ftp://%22" ;== ftp://%22
>> load "%%C8" ;== %È
>> load "ftp://%C8" ;== ftp://È

--- Ciao Romano
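Editorial illustration of the failure mode discussed in this message, again sketched in Python as a model only. The character set `URL_CHARS` and the helper `first_invalid` are assumptions made for the example; they are not the rules that REBOL's parse-url actually applies.

```python
import string
from urllib.parse import unquote

# Characters a strict parser might accept in a URL, including "%" for escapes.
# This set is an illustrative assumption, not REBOL's real rule set.
URL_CHARS = set(string.ascii_letters + string.digits + "-._~:/?#[]@!$&'()*+,;=%")


def first_invalid(url: str) -> int:
    """Return the index of the first character outside URL_CHARS, or -1."""
    for index, char in enumerate(url):
        if char not in URL_CHARS:
            return index
    return -1


escaped = "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
loaded = unquote(escaped, encoding="latin-1")  # models the premature unescape

print(first_invalid(escaped))            # escaped form passes the check
print(loaded[:first_invalid(loaded)])    # text accepted before the parser stops
```

In this model the escaped string passes the check, while the unescaped one stops right after "kde=A-", which matches the fragment Scott reported seeing in the http scheme after the failed read.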

[9/10] from: joel:neely:fedex at: 19-Jun-2002 10:20

Hi, Romano,

On Wednesday, June 19, 2002, at 08:52 AM, Romano Paolo Tenca wrote:

> Joel,
>
>> LOAD seems to want to unescape the string. That's OK in this case,
>> since all of the escaped characters are actually valid in URLs,
>
> I think it is not correct, because Load could be unaware of some legal
> custom rules valid for some existing (and not existing) url.

We're saying the same thing (although sloppily in my case ;-). All I meant was that the specific example of escaping allowable characters showed that LOAD was unescaping them, but in an example that didn't immediately blow up.

I agree that it's problematic for LOAD to unescape the URL prior to use.

-jn-

[10/10] from: ingo:2b1 at: 20-Jun-2002 0:00

Hi Romano,

Romano Paolo Tenca wrote:

> Hi Ingo, Joel, Cyphre > I think the problem is in Load, which interprets %xx and trasform it in a

<<quoted lines omitted: 3>>

> Ingo, What are the two place in which code is badly escaped? I can think only
> of Load (and do which internally call Load with string, file ...)

If I knew, I'd be the guru, not you :-)

I really can't add anything that's not already been said, but it seems to me that the problem really is 'load, which does more than would really be good for it.

Kind regards,
Ingo <..>

Notes

Quoted lines have been omitted from some messages.