
[REBOL] Re: Bug! Rebol's parsing of urls is incorrect.

From: holger::rebol::com at: 9-Feb-2001 16:14

On Sat, Feb 10, 2001 at 12:16:58PM +1300, Andrew Martin wrote:
> Rebol also has a problem with parsing urls:
>
> >> type? http://www.rebol.com!
> == url!
> >> type? http://www.rebol.com.
> == url!
> >> type? http://www.rebol.com?
> == url!
> >> type? http://www.rebol.com,
> == url!
>
> My email client correctly leaves out the exclamation mark "!", period ".",
> question mark "?" and comma "," but Rebol treats all of them as part of the
> URL, which is clearly incorrect.

There is a difference between a legal, parsed URL and REBOL's detection of
datatypes. REBOL scans pretty much anything that starts with some text and a
colon, followed by something else, as a url! datatype. That does not mean that
all of them are necessarily legal URLs.

In fact, whether a URL is legal or not depends completely on the scheme. For
instance, the first component after (scheme):// does not necessarily have to
be a host name. Just take file://a,b,c! as an example. This is completely
legal, yet your email client might incorrectly stop after the "a".

REBOL's scanner has no knowledge of schemes and their particular rules for URL
well-formedness, because otherwise it would be impossible to add user-defined
schemes with their own URL formats. It would also prevent you from doing some
types of dynamic URL generation/manipulation at the series level.

The actual URL parsing and the check for well-formedness are done much later,
when you try to open a port with a URL. Individual schemes have their own
parsers. There is one universal parser (decode-url), which is what most
schemes use to parse URLs. It knows about the most common URL formats,
including username/password, hostname, ports, directory and file parts.

The same concept applies to scanning vs. parsing of emails.

-- 
Holger Kruse
[holger--rebol--com]
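
To illustrate the scanning vs. parsing distinction described above, here is a minimal sketch. It is not taken from REBOL's own sources. The helper name trim-trailing is hypothetical, and the set of characters it strips is simply the four from the quoted examples. decode-url is deliberately not called, because the shape of its result is not described in this thread.

```rebol
REBOL [Title: "Scanning vs. parsing of URLs (illustrative sketch)"]

; The scanner only recognizes the general shape and knows nothing
; about scheme rules, so this value is a url! (example from the post).
u: file://a,b,c!
probe type? u

; A url! is a series, so it can be generated or edited piecewise long
; before any scheme-specific parser looks at it.
base: http://www.rebol.com/
page: join base "index.html"
probe url? page

; Hypothetical helper: strips trailing punctuation the way an email
; client might. This is a policy the caller opts into, not something
; the scanner could safely apply to every scheme.
trim-trailing: func [url [url!] /local s] [
    s: copy url
    while [all [not empty? s  find "!.?," last s]] [
        remove back tail s
    ]
    s
]

probe trim-trailing http://www.rebol.com!
```

Applying the same helper to file://a,b,c! would strip the final "!" from a URL the post describes as completely legal, which is why such trimming has to remain a per-use decision rather than a scanner rule.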