World: r3wp
[Parse] Discussion of PARSE dialect
Maxim 21-Sep-2010 [5283] | obviously we could alter the rules again to account for data-less tokens, but this would require a bit different structure. |
Claude 21-Sep-2010 [5284] | great working ;-) |
Maxim 21-Sep-2010 [5285] | oops... in the above text... " this rule" should read.. "the next rule" |
Claude 21-Sep-2010 [5286] | if you have time to show me how i am ok. but for now i must take my children to school and go to work. thanks a lot again |
Maxim 21-Sep-2010 [5287x4] | basically you have to create two complete (and alternate) rule structures, and separate them with an "|". but you have to be sure that the first rule doesn't "pre-empt" the second one... meaning that the first rule must not also match the input meant for the second rule, or else you will never reach the second rule. |
for example.. [some ["a" | "aa"]] here we will never reach "aa" because "a" will be satisfied and the alternative will never be attempted ... so instead of matching "aa" you'd always match "a" twice. | |
whereas specifying [some ["aa" | "a"]] will always match "aa" IF there is still more than one "a" to parse... and will only ever reach "a" if the sequence is an odd number of "a" characters (or just one, obviously). so "aaaa" will match the "aa" rule twice, and "aaa" will match "aa" then "a". | |
IMHO, this is the basic premise of all of parse. once you really understand how this applies to a rule which has sub rules... you really understand parse.... and then you can start doing more funky stuff. | |
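[Editor's sketch] The ordering effect Maxim describes can be tried directly at a console. This is only an illustration: the PRINT actions are not part of his rules and are added here to show which alternative fires.

```rebol
; Shorter alternative first: "a" always succeeds, so "aa" is never attempted.
parse "aaaa" [some ["a" (print "matched a") | "aa" (print "matched aa")]]
; prints "matched a" four times

; Longer alternative first: "aa" is tried before "a" at every position.
parse "aaa" [some ["aa" (print "matched aa") | "a" (print "matched a")]]
; prints "matched aa", then "matched a"
```

Both calls consume the whole input; only the order in which the alternatives are tried differs.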
BrianH 21-Sep-2010 [5291] | Ladislav, your use-rule is effectively the same implementation that I had in mind when I made the USE operation proposal in the first place, but mezzanine instead of native. With the same overhead that made Carl initially reject the proposal when they were being implemented. I'm glad that you are making more headway towards getting him to accept it this time :) |
Ladislav 22-Sep-2010 [5292x5] | Actually, both variants are implemented, even the one without the overhead (which I implemented first). |
(or, to be more precise, maybe there is a possibility to make a variant not binding the rule at all, which would then deserve to be called "without the overhead" rather than any of my variants) | |
But, as you said, one of my motivations was to write it as a mezzanine to have some "inspiration"/experiences with it for Carl. | |
, since I guess, that this way, he will not have to just go into an "unknown territory" | |
I must say, that I was actually surprised, how people (including me) have struggled to circumvent this problem, while having such an elegant way available to solve it. | |
GrahamC 18-Oct-2010 [5297] | a regex question ... ([0-9]{4})(-([0-9]{2})(-([0-9]{2})(T([0-9]{2}):([0-9]{2})(:([0-9]{2})(\.([0-9]+))?)?(Z|(([-+])([0-9]{2}):([0-9]{2})))))) is apparently failing this string : 2010-10-18T07:06:25.00Z What tool can I use to check this string against this regex ? |
Sunanda 18-Oct-2010 [5298] | Regexlib has a different ISO-8601 date matching regex: http://regexlib.com/REDetails.aspx?regexp_id=2092 And the ability to enter any regex and target strings to test what happens: http://regexlib.com/RETester.aspx? |
GrahamC 18-Oct-2010 [5299x2] | found this one too http://www.fileformat.info/tool/regex.htm |
and it seems my string is passing ... hmm | |
Sunanda 18-Oct-2010 [5301] | The problem with regexes is they are impossible to debug.....Best just to rewrite continually until they work :) |
GrahamC 18-Oct-2010 [5302] | I'm trying to validate some XML against an online validator and it's rejecting my dates :( |
Henrik 18-Oct-2010 [5303] | how do you specify an element to be of the type any-type! except none! ? |
Ladislav 18-Oct-2010 [5304] | I am afraid, that you need to list all types excluding none |
Henrik 18-Oct-2010 [5305] | does R3 solve this? if not, maybe that would be a good problem to solve. |
Ladislav 18-Oct-2010 [5306] | R3 can let you define that typeset and use it any time you like |
Henrik 18-Oct-2010 [5307] | ok, that is possibly good enough for generating specs. |
Gregg 18-Oct-2010 [5308] | I don't remember what all we did Henrik, but some of our test generation stuff on another world had some support for typesets IIRC. |
Henrik 18-Oct-2010 [5309] | Gregg, ok |
Steeve 18-Oct-2010 [5310] | Henrik, with a parse rule ? |
Henrik 18-Oct-2010 [5311] | Steeve, yes. |
Steeve 18-Oct-2010 [5312] | R3 does it |
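[Editor's sketch] Steeve does not show his rule, so the following is an assumption about what an R3 answer might look like rather than a record of it. It relies on the NOT keyword of R3's PARSE, which REBOL 2 does not have; the rule name not-none is made up for the example.

```rebol
; R3 only: NOT checks that the next rule fails, without advancing the input.
not-none: [not none! skip]   ; hypothetical name: accept one value of any type except none!

parse [1 "two" three] [some not-none]     ; no NONE in the block, so every element is accepted
parse reduce [1 none 3] [some not-none]   ; the rule stops at the NONE value, so this parse should fail
```

The typeset approach Ladislav mentions would replace the NOT test with a single user-defined typeset word.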
AdrianS 18-Oct-2010 [5313] | Graham, try http://gskinner.com/RegExr for working out regexes. It has a really nice UI where you can hover over the components of the regex and see exactly what they do. |
GrahamC 18-Oct-2010 [5314] | Thanks |
Sunanda 4-Nov-2010 [5315] | Question on StackOverflow.....there must be a better answer than mine, and I'd suspect it involves PARSE (better answers usually do:) http://stackoverflow.com/questions/4093714/is-there-finer-granularity-than-load-next-for-reading-structured-data |
GrahamC 4-Nov-2010 [5316x3] | Use fixed length records |
Anyone got a parse rule that strips out everything between tags in an "xml" document | |
whitespace: charset [ "^/^- " ] swsp: [ any whitespace ] result: copy "" parse/all pqri-xml [ some [ copy t thru ">" (append result t) swsp to "<" ]] | |
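[Editor's sketch] A self-contained variant of the same idea, assuming REBOL 2 (hence parse/all) and a made-up sample string. The names sample-xml and tags-only are hypothetical. It treats every ">" as the end of a tag, so it is not safe for comments, CDATA sections, or attribute values that contain ">".

```rebol
sample-xml: {<a> text <b>more</b> tail </a>}   ; hypothetical input
tags-only: copy ""

parse/all sample-xml [
    any [
        to "<"                  ; skip whatever sits between tags
        copy tag thru ">"       ; grab one complete tag
        (append tags-only tag)
    ]
    to end                      ; accept any trailing text after the last tag
]

print tags-only   ; prints <a><b></b></a>
```

Leading with to "<" inside the loop and finishing with to end lets the parse succeed even when text follows the final tag.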
Ladislav 4-Nov-2010 [5319] | Posted an answer mentioning the test framework, which does almost exactly what Fork asked |
Gabriele 5-Nov-2010 [5320x3] | also, Carl's clean-script and script colorizer use parse + load/next to do the same thing. my Wetan uses the same method. |
http://www.colellachiara.com/soft/MD3/emitters/wetan.html#section-4.2 | |
basically, as long as you skip over [, (, ), and ] you can just use load/next. I'm also skipping over #[ because I want to preserve literal values while formatting (that is, preserve what the user typed) | |
Oldes 1-Dec-2010 [5323] | How to use the new INTO parse keyword? Could it be used to avoid the temp parse like in this (very simplified example)? parse "<a>123</a>" [thru "<a>" copy tmp to "</a>" (probe tmp probe parse tmp ["123"]) to end] Note that I know that in this example it's easy to use just one parse and avoid the temp. |
Ladislav 1-Dec-2010 [5324x3] | INTO is neither new, nor is it meant for string parsing |
You can take advantage of using it when parsing a block and needing to parse a subblock (of any-block! type) or a substring | |
(of the said block) | |
Oldes 1-Dec-2010 [5327] | can you give me a simple example, please? |
Ladislav 1-Dec-2010 [5328x2] | >> parse [a b "123" c] [2 word! into [3 skip] word!] == true |
>> parse [a b c/d/e] [2 word! into [3 word!]] == true | |
Oldes 1-Dec-2010 [5330x2] | I understand now, thanks. |
it's very useful, I wonder why I've not found it earlier :) | |
Ladislav 1-Dec-2010 [5332] | The substring property is just a recent addition |