World: r3wp
[Core] Discuss core issues
BrianH 26-Apr-2011 [1344x2] | It used to be generated, but Carl says the hand-written version is faster. I don't doubt him, because I've used dozens of parser generators before, and that always seems to be the case. Sometimes you can get faster generated parsers, but generated lexers are usually slower because they're table-driven rather than code-driven. The advantage of generated lexers is that they are easier to write for complex lexical rules; for simple lexical rules, hand-coding is usually worth it. |
One of the tricks when refining the details is to realize that there is a real runtime difference between recommending that people not do something and prohibiting it. Every prohibition has runtime overhead to enforce it. So every recommendation needs documenting and explaining, but every prohibition needs justifying. There are situational tradeoffs that recommendations can resolve more easily than prohibitions. This is why we have to be extra careful here. | |
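The table-driven versus code-driven distinction BrianH draws above can be made concrete with a toy example. The following is a minimal Python sketch, not REBOL's scanner. It recognizes one token class, optionally signed integers, in two ways: first with a transition table and a generic driver loop (the style lexer generators typically emit), then with direct hand-written control flow. All names here are hypothetical.

```python
# Toy token class: optionally signed decimal integers such as 12, -7 or +0.
DIGITS = "0123456789"


def char_class(ch: str) -> str:
    """Map a single character to the column it uses in the transition table."""
    if ch in DIGITS:
        return "digit"
    if ch in "+-":
        return "sign"
    return "other"


# DFA states: 0 = start, 1 = sign seen, 2 = one or more digits seen (accepting).
TRANSITIONS = {
    (0, "sign"): 1,
    (0, "digit"): 2,
    (1, "digit"): 2,
    (2, "digit"): 2,
}
ACCEPTING = {2}


def is_integer_table(text: str) -> bool:
    """Table-driven: a generic loop; all knowledge of the token is in the table."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPTING


def is_integer_hand(text: str) -> bool:
    """Code-driven: the same rule written directly as branches and loops."""
    i = 0
    if i < len(text) and text[i] in "+-":
        i += 1
    first_digit = i
    while i < len(text) and text[i] in DIGITS:
        i += 1
    # Require at least one digit and nothing left over after the digits.
    return i > first_digit and i == len(text)
```

Both functions accept exactly the same strings. A generator would emit something like TRANSITIONS for every token type and drive it with one loop, which scales well to complex rules. The hand-written version replaces the per-character classification and table lookup with direct branches, which is the usual argument for hand-coding simple lexical rules.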
Geomol 26-Apr-2011 [1346] | REBOL has 26 or so datatypes recognized by the scanner. I would call that complex lexical rules. Maybe a generated lexer would resolve many of the problems? |
BrianH 26-Apr-2011 [1347x2] | Actually, that's still considered pretty simple. You might still need a DFA for some of the rules, but most of them can be recognized more efficiently by hand-written code. The problems are not caused by not using a generated lexer; even a generated lexer can have precedence errors. The real syntax bugs in R3 are there because no one has systematically gone through and figured out what they are; most of them are still undocumented. Recently, in my spare time, I've been trying to go through and document the syntax and ticket the bugs, so soon the limit will be developer time. (In R2, the bugs are there because the syntax is frozen for backwards compatibility.) |
As for the syntax-vs-memory data restrictions, it's another tradeoff. Regular REBOL syntax is much more limited than the full data model of REBOL, even if you include MOLD/all syntax, because the syntax was designed more for readability and writability by humans. If we limit the data model to match the syntax, we limit our capabilities drastically. Limiting to the syntactic form only makes sense when you are serializing the data for storage or transport; in memory, it's unnecessary. A better solution is to make a more comprehensive serialization format that doesn't have to be human-readable (Rebin) and then use it when we need to serialize more of the in-memory data. | |
Geomol 26-Apr-2011 [1349] | I went through the scanner systematically two years ago and produced a document, which I sent to Carl. It's here: http://www.fys.ku.dk/~niclasen/rebol/rebol_scanner.html |
BrianH 26-Apr-2011 [1350] | Cool, I'll take a look. I've been trying to generate compatible parsers in mezzanine PARSE code, which could then be translated to other parse models like syntax highlighters for editors when necessary. I'm hoping to make a module of rules that can be used by a wide variety of syntax analyzers. |
Geomol 26-Apr-2011 [1351] | "Actually, that's still considered pretty simple." Can you give examples of other lexers that have to recognize more kinds of tokens? |
BrianH 26-Apr-2011 [1352] | C++ and Perl. |
Maxim 26-Apr-2011 [1353] | if you include schema validation... I'd say XML is a nightmare :-) |
Geomol 26-Apr-2011 [1354] | C++, hmm. Is that because you see each of the reserved keywords as a different token? I see them all as one. |
BrianH 26-Apr-2011 [1355] | One of the interesting tradeoff tickets is http://issue.cc/r3/537 - I wrote up the ticket initially and expanded it to include all affected characters, but looking at it now I'd have to recommend that it be dismissed. If it is accepted it would have the side effect that more syntax would be accepted, but all of the newly accepted syntax would be hard to read. Accepting that ticket would make R3 more difficult to read, debug and maintain, so it's a bad tradeoff. |
Geomol 26-Apr-2011 [1356] | XML is one of the simplest formats to parse, and I guess schemas too. |
BrianH 26-Apr-2011 [1357] | C++ is not that bad to lex, but really hard to parse. Perl is hard to do both. |
Maxim 26-Apr-2011 [1358] | The XML Schema validation process is described in an 80-page guide and an 80-page reference. It isn't quite as easy as the XML it is stored in. |
Geomol 26-Apr-2011 [1359] | OK, I mixed up lexing and parsing. I mean lexical analysis. |
BrianH 26-Apr-2011 [1360x6] | XML and HTML are relatively easy to lex, and they require Unicode support, so hand-written lexers are probably best. Schema validation is a different issue. |
REBOL is trickier to lex than to parse, but still in the middle of complexity overall. | |
Most generators separate lexical analysis and parsing, but I've used ones that don't, like ANTLR and Coco/R. There are strengths to both approaches. | |
In answer to the comments at your link above:
- Syntax errors are triggered before semantic errors: 1.3, 11
- Words that start with + and - are special because of potential ambiguity with numbers: 1.1
- Arrows are only allowed in the special-case arrow words, not generally: 1.2, 1.3, 4
- %: is ambiguous; it could be a file that wouldn't work on any OS, or the set-word form of %, so an error splits the difference: 10.2
- Fixed already: 2.2 for arrows in R3, 7, 13
Some of the rest are related to http://issue.cc/r3/537 and others have been reported already. If you want 10.2 to not trigger an error, it is more likely to be accepted as a set-word than a file. Thanks for these, particularly the lit-word bugs. | |
Also fixed already: 10.1 for ( ) [ ] | |
Never mind about the 10.2 stuff: For some reason I forgot that % wasn't a modulus operator :( | |
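BrianH's point about words that start with + and - can be illustrated with a toy classifier. The character after the sign decides whether the scanner is reading a number or a word. This is a simplified Python sketch under assumed rules: it handles integers only and ignores decimals and other sign-led forms. It does not describe the actual R2 or R3 scanner, and classify_signed is a hypothetical name.

```python
# Simplified, hypothetical rules: one character of lookahead after a leading
# + or - decides between a number and a word.
DIGITS = "0123456789"


def classify_signed(token: str) -> str:
    """Classify a token that starts with + or - as 'integer', 'word' or 'error'."""
    if not token or token[0] not in "+-":
        raise ValueError("expected a token that starts with + or -")
    rest = token[1:]
    if rest and rest[0] in DIGITS:
        # A digit right after the sign commits the scanner to a number,
        # so any non-digit later in the token is a syntax error.
        return "integer" if all(c in DIGITS for c in rest) else "error"
    # A bare sign, or a sign followed by a non-digit, is read as a word.
    return "word"
```

Each rule of this kind is a design decision. The version above makes `+1a` an error rather than a word, and a real scanner has to make the same kind of choice for every sign-led form it supports.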
Geomol 1-May-2011 [1366] | If I have a local variable v in a function, but I want the value of the variable v in the context outside the function, I can write get bind 'v bound? 'f, where f is the name of the function. Is that the way to do it, or is there a better way? Full example:
>> v: 1
== 1
>> f: func [/local v] [v: 2 get bind 'v bound? 'f]
>> f
== 1 |
Ladislav 1-May-2011 [1367x2] | "Is that the way to do it?" I guess not; there is a more efficient way. |
If you know the context you want to use and it is always the same, then calling the BIND function is a bit inefficient, not to mention that bind 'v 'f is more efficient than bind 'v bound? 'f. | |
Geomol 1-May-2011 [1369x2] | Thanks! |
It's for the parse function I'm working on, and I want to be sure I don't get a local var if vars are used in the parse rules. | |
Maxim 1-May-2011 [1371] | If the parse rule is given as a parameter, vars within the rule will not be bound to the function. The binding is static, i.e. it occurs only once, when the function is created. The words in the parse rule are already bound (or not). |
Geomol 1-May-2011 [1372] | Ah yes, thanks. |
Geomol 9-May-2011 [1373] | Tonight's Moment of REBOL Zen:
>> f: func [/ref x] [print [ref x]]
>> f/ref/ref 1 2
true 2 |
onetom 9-May-2011 [1374] | ahhha...
>> f: func [/ref x y] [print [ref x y]]
>> f/ref/ref 1 2 3 4
true 3 4 |
PeterWood 9-May-2011 [1375] | Can somebody confirm whether the following crashes 2.7.8 on their machine?
>> -1 * -2147483648 |
Sunanda 10-May-2011 [1376] | Crashes under Windows here. Nice catch! |
PeterWood 10-May-2011 [1377x2] | I was also running under Windows. |
Crashes under OS X too:
>> -1 * -2147483648
Floating point exception | |
BrianH 10-May-2011 [1379x2] | Geomol, that's something I've never seen anyone do in REBOL before. The discarded arguments are even evaluated properly and typechecked. |
Works in R3 as well. | |
Ladislav 10-May-2011 [1381] | An old one, Peter. It is in %core-tests.r |
PeterWood 10-May-2011 [1382] | Do you know if it is in RAMBO? I guess Carl doesn't take much interest in %core-tests.r. |
Ladislav 10-May-2011 [1383] | #4229 |
PeterWood 10-May-2011 [1384] | Thanks Ladislav. |
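The crashing expression multiplies -1 by the smallest signed 32-bit integer (R2 integers are 32-bit). The exact result, 2147483648, is one more than the largest 32-bit value (2147483647), so it cannot be represented; in two's complement it wraps back to -2147483648. Despite its name, "Floating point exception" is the usual message for SIGFPE, which on x86 is also raised by integer division faults such as -2147483648 / -1. An overflow check that divides the wrapped product back by one operand would hit exactly that case, but this is an unconfirmed guess about the cause, not something stated in the thread. The sketch below emulates 32-bit semantics in Python; wrap32 and checked_mul32 are hypothetical helper names.

```python
# Emulate signed 32-bit integer multiplication in Python.
INT32_MIN = -(2 ** 31)   # -2147483648
INT32_MAX = 2 ** 31 - 1  #  2147483647


def wrap32(value: int) -> int:
    """Return what a two's-complement 32-bit register would hold for value."""
    return (value + 2 ** 31) % 2 ** 32 - 2 ** 31


def checked_mul32(a: int, b: int) -> int:
    """Multiply two 32-bit values, raising OverflowError instead of wrapping."""
    product = a * b  # exact, because Python integers are unbounded
    if not INT32_MIN <= product <= INT32_MAX:
        raise OverflowError(f"{a} * {b} = {product} does not fit in 32 bits")
    return product
```

Comparing the exact product against the 32-bit range before storing it detects the overflow without ever dividing INT32_MIN by -1.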
Geomol 10-May-2011 [1385x2] | Tonight's Moment of REBOL Zen: Check this Fibonacci function:
fib: func [
    n [integer!]
    (
        if not local [a: 0 b: 1]
        prin [a b ""]
        loop n [
            prin [c: a + b ""]
            a: b
            b: c
        ]
        print ""
    )
    /local a [integer!] b [integer!] c
][
    do bind third :fib 'a
]
>> fib 10
0 1 1 2 3 5 8 13 21 34 55 89
== 89
>> fib/local 10 55 89 none
55 89 144 233 377 610 987 1597 2584 4181 6765 10946
== 10946
If you only want to execute the paren in the function spec, put this in the body instead:
do bind to block! third second load mold :fib 'a |
A simpler example of this weird function construction:
>> hello-world: func [(print "Hello, World!")] [do third :hello-world]
>> hello-world
Hello, World! | |
onetom 10-May-2011 [1387] | A Moebius function: its body bends and bites back into its own spec :) |
Maxim 10-May-2011 [1388] | Can anyone confirm that 'CALL on Windows 7 is unable to launch apps without the /show refinement? That is VERY annoying. It seems that call/shell no longer works. |
Dockimbel 11-May-2011 [1389] | I use CALL without /show in Cheyenne to start worker processes. Anyway, CALL is quite unreliable in 2.7.8 on Windows as shown by this RAMBO ticket: http://www.rebol.net/cgi-bin/rambo.r?id=4416& |
Geomol 12-May-2011 [1390] | Tonight's Moment of REBOL Zen: The /local refinement in functions is just like any other refinement. This in turn means any refinement can be used for local variables, as in this example:
exp2: func [
    "2 raised to exponent"
    exponent [number!]
    /il-locale number [number!]
][
    if not il-locale [number: 2]
    number ** exponent
]
>> exp2 3
== 8.0
>> exp2/il-locale 3 3
== 27.0
HELP, however, looks for the /local refinement when producing its output. But like any word, HELP can simply be redefined to serve one's needs. HELP is itself a function, so anyone who wants to write their own HELP function can look at its source first. |
ChristianE 12-May-2011 [1391x2] | /local is special only in that HELP does not list any refinements or args from the /local refinement onwards. You can even use that to hide refinements (something like 'private' refinements): |
>> zen: func [arg [integer!] /local /private base] [add any [base 0] arg]
>> zen 1
== 1
>> help zen
USAGE:
    ZEN arg
DESCRIPTION:
    (undocumented)
    ZEN is a function value.
ARGUMENTS:
    arg -- (Type: integer)
>> zen/private 1 10
== 11 | |
Maxim 12-May-2011 [1393] | Just remember that in R3, the /local refinement might be given special status in a future release. This would mainly be to prevent callers from supplying values to locals, which can be a pretty big security hole right now. |