problems with url...
[1/10] from: cyphre:seznam:cz at: 18-Jun-2002 17:48
Hi List,
I have this problem, how to 'read following url from rebol?
http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8
Anyone ?
regards,
Cyphre
[2/10] from: gscottjones:mchsi at: 18-Jun-2002 14:14
From: "Cyphre"
> I have this problem, how to 'read following url from rebol?
>
> http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8
>
> Anyone ?
Hi, Cyphre,
It is easier to explain how to bypass the problem than to explain where the
real problem lies.
:)
The problem seems to be that the percent sign can be used to escape hex
coded characters. The dehex-ed character for C8 is È. When the url is
entered, the interpreter immediately substitutes the character "È" for "%C8".
However, the url parser will no longer parse the entire url, because "È" is
not a part of its rules. Probing the http scheme *after* a failed read
shows that the file portion contains the fragment "?co=naslepo&kde=A-",
indicating to me that it failed at the next character, which *it* thinks is
È instead of "%" (followed by "C8", of course).
The way to work around the problem is to do something like the following:
read rejoin [http://slovnik.nettown.cz/?co=naslepo&kde=A- "%C8"]
which then returns the page.
What I am unsure about is exactly where the problem lies. Is it that some
urls contain hex-encoded characters and that REBOL translates them
incorrectly? I do not know for sure. I am not sure why
my work-around works! Unfortunately, I am out of time to explore the
problem further right now.
Hope this helps a bit anyway.
--Scott Jones
[3/10] from: rotenca:telvia:it at: 18-Jun-2002 22:47
Hi, Cyphre
>I have this problem, how to 'read following url from rebol?
>http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8
read to-url "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
== {<!doctype html public "-//w3c//dtd html 3.2 final//en">...
But load fails:
read load "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
** User Error: URL error: http://slovnik.nettown.cz/?co=naslepo&kde=A-È
** Near: read load "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
I do not know why. Any ideas?
---
Ciao
Romano
[4/10] from: gscottjones:mchsi at: 18-Jun-2002 16:51
From: "G. Scott Jones"
> The way to work around the problem is to do something like the following:
>
> read rejoin [http://slovnik.nettown.cz/?co=naslepo&kde=A- "%C8"]
>
> which then returns the page.
Responding to self:
or also the following:
read to-url "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
--Scott Jones
[5/10] from: joel:neely:fedex at: 18-Jun-2002 17:28
Hi, Romano,
No really useful suggestions, but...
Romano Paolo Tenca wrote:
> Hi, Cyphre
> >I have this problem, how to 'read following url from rebol?
<<quoted lines omitted: 6>>
> ** Near: read load "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
> I do not know why. Any ideas?
I notice that LOAD seems to be doing something interesting with that
percent-escaped character at the end, and possibly transforming it
into a character that's not legal for a URL (prematurely?).
>> read to-url "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
== {<!doctype html public "-//w3c//dtd html 3.2 final//en">
<!--
Copyright (C) 2000 Petr Kùra, [kura--nettown--cz]
All rights reserved...
>> read load "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
** User Error: URL error:
http://slovnik.nettown.cz/?co=naslepo&kde=A-È
** Near: read load "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
Notice that the message is
User Error: URL error: ...
and not
Syntax Error: Invalid url ...
as in
>> read load "http://%"
** Syntax Error: Invalid url -- http://%
** Near: (line 1) http://%
implying to my eye that LOAD was happy but the result wasn't usable
by READ.
When I say something like
>> gorp: "http://%77%77%77.rebol%2ecom/"
== "http://%77%77%77.rebol%2ecom/"
>> load gorp
== http://www.rebol.com/
LOAD seems to want to unescape the string. That's OK in this case,
since all of the escaped characters are actually valid in URLs, but
in the case of
>> bletch: "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
== "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"
>> load bletch
== http://slovnik.nettown.cz/?co=naslepo&kde=A-È
the (premature) unescaping of "%C8" back to the high-bit-on accented-E
character may be the source of grief when that literal character is
deemed invalid for use in a URL.
Hope this helps!!!
-jn-
[6/10] from: belkasri::1stlegal::com at: 18-Jun-2002 16:47
I am really new to REBOL, but not new to programming! How about reading the URL
as a string, then replacing the reserved characters, and then putting it in a
variable of the URL datatype?
-_Abdel.
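The idea above can be sketched as follows. This is only a sketch, assuming
REBOL/Core as used elsewhere in this thread and assuming that to-url leaves
the escape untouched (as messages 3 and 4 report); the words address, target
and page are made up for illustration. No replacement step is shown because
the address is already percent-escaped:

```rebol
; Sketch only: ADDRESS, TARGET and PAGE are illustrative words, not from the thread.
; Keep the address as a string! so that "%C8" is not unescaped when the line is loaded.
address: "http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8"

; Convert to url! only at the point of use, as in the TO-URL examples in messages 3 and 4.
target: to-url address

; Needs network access; the thread reports that this form returns the page.
page: read target
```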
[7/10] from: ingo:2b1 at: 19-Jun-2002 8:11
Hi Cyphre,
Cyphre wrote:
> I have this problem, how to 'read following url from rebol?
>
> http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8
There is a bug in the url! handling. Percent-escaped characters are
unescaped at two different places. So one possible solution is to
percent-escape the percent ...
>> read http://slovnik.nettown.cz/?co=naslepo&kde=A-%25C8
[Accept Connection User-Agent Host]
["*/*" "close" "REBOL 1.2.5.4.2" "slovnik.nettown.cz"]
*/*
close
REBOL 1.2.5.4.2
slovnik.nettown.cz
** User Error: Error. Target url:
http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8 could not be retrieved.
Server response: HTTP/1.0 404 WWWOFFLE Will Get
** Near: read http://slovnik.nettown.cz/?co=naslepo&kde=A-%C8
I hope that helps,
Ingo
[8/10] from: rotenca:telvia:it at: 19-Jun-2002 15:52
Hi Ingo, Joel, Cyphre
I think the problem is in Load, which interprets %xx and transforms it into a
char.
But net-utils/url-parser/parse-url, which is called by decode-url, expects that
the url conforms to the RFC.
Ingo, what are the two places in which the code is badly escaped? I can think only
of Load (and Do, which internally calls Load with string, file ...)
Cyphre,
about the proxy error: it is not exactly the same, because Load does not change
it (I try to explain this at the end); it escapes it correctly.
The log message "The URL Parse: ..." is correct, which should mean that the url
has been decoded exactly.
Joel,
>Notice that the message is
> User Error: URL error: ...
This is the result of the function:
net-utils/url-parser/parse-url/no-error
>and not
> Syntax Error: Invalid url ...
This is a Load error message.
>implying to my eye that LOAD was happy
Yes
> but the result wasn't usable by READ.
because of decode-url, which does the right thing; it is Load which happily fails
:-)
>When I say something like
> >> gorp: "http://%77%77%77.rebol%2ecom/"
<<quoted lines omitted: 3>>
>LOAD seems to want to unescape the string. That's OK in this case,
>since all of the escaped characters are actually valid in URLs,
I think it is not correct, because Load could be unaware of some legal custom
rules valid for some existing (and not existing) url.
Reading the RFC, I understand that almost all chars in a URL could be
escaped; I do not understand why Load should change this. It only produces a
more readable url (sometimes an invalid one). That could be done by a function
like Form, not by Load.
>the (premature) unescaping of "%C8" back to the high-bit-on accented-E
>character may be the source of grief when that literal character is
>deemed invalid for use in a URL.
Yes.
I think that the bug arises from the fact that RT uses the same code for Loading
a file and Loading a url, but this sometimes ends up with an incorrect url:
>> load "% " ;== %
>> load "ftp:// " ;== ftp://
>> load "%%22" ;== %%22
>> load "ftp://%22" ;== ftp://%22
>> load "%%C8" ;== %È
>> load "ftp://%C8" ;== ftp://È
---
Ciao
Romano
[9/10] from: joel:neely:fedex at: 19-Jun-2002 10:20
Hi, Romano,
On Wednesday, June 19, 2002, at 08:52 AM, Romano Paolo Tenca wrote:
> Joel,
>
>> LOAD seems to want to unescape the string. That's OK in this case,
>> since all of the escaped characters are actually valid in URLs,
>
> I think it is not correct, because Load could be unaware of some legal
> custom rules valid for some existing (and not existing) url.
We're saying the same thing (although sloppily in my case ;-). All I meant
was that the specific example of escaping allowable characters showed that
LOAD was unescaping them, but in an example that didn't immediately blow up.
I agree that it's problematic for LOAD to unescape the URL prior to use.
-jn-
[10/10] from: ingo:2b1 at: 20-Jun-2002 0:00
Hi Romano,
Romano Paolo Tenca wrote:
> Hi Ingo, Joel, Cyphre
> I think the problem is in Load, which interprets %xx and transforms it into a
<<quoted lines omitted: 3>>
> Ingo, what are the two places in which the code is badly escaped? I can think only
> of Load (and Do, which internally calls Load with string, file ...)
If I knew, I'd be the guru, not you :-)
I really can't add anything that's not already been said, but it seems
to me, that the problem really is 'load, which does more than would
really be good for it.
Kind regards,
Ingo
Notes
- Quoted lines have been omitted from some messages.