World: r3wp
[I'm new] Ask any question, and a helpful person will try to answer.
mhinson 14-Apr-2009 [1696x2] | Thanks very much Pater & sqlab. those examples both do exactly what I was thinking. I now need to try & understand how this relates to the parse-tutorial & hopefully I will be able to start using the principles myself. Thanks again. |
Hi again. Sorry to be asking questions again so soon. I started using the syntax suggested with success, but in my input file I find the first key word is only valid if it is right at the start of the line. I have been searching through the documentation for the last hour & failed to find any references to "start of line" or similar. (like ^ in reg expressions). I wondered if there was any document to help people convert from regular expressions to Rebol parse expressions too please? Thanks, /\/\ | |
Pekr 14-Apr-2009 [1698x3] | Regexp is quite different beast, and there are no single rules for translation to REBOL's parse. However - what do you mean by the beginning of the line? Is it the first char right after the end-of-line? |
btw - do you use parse/all? I prefer to use parse with the refinement, because using plain 'parse ignores whitespaces, and I don't like when the engine messes with things instead of me :-) | |
Could you please post few lines of your input file? | |
sqlab 14-Apr-2009 [1701] | try this rule: [(wanted: copy [] ) any [copy line ["wanted" to "rubbish" ] (append wanted line) | thru newline] ] |
mhinson 14-Apr-2009 [1702] | Hi Pekr, I appreciate that the concept of parsing is different from the use of regular expressions, but there are some things that do map from one to the other, and I wondered if any table of those things existed. As a noob, sometimes the hardest questions to get answered are the ones where the answer is that there is no such concept as the one sought by the noob, e.g. how do you grow strawberries in the sea? The first match must be at the beginning of the line. If it was the first line in the set then it would not be after a newline, but in other cases it would be. I will use parse/all from now on; I like the extra control you describe. Here are a few lines of test input. The script I am hoping to develop is to parse the config files from Cisco devices in order to extract the layer 2 & 3 information together with the interface names & descriptions.
lines: {interface FastEthernet0
 description The connection to the printer
!
interface FastEthernet1
!
interface Vlan1
 description User vlan (only 1 vlan allowed)
 no ip address
!
interface Dialer0
 description Outside
 ip address negotiated
!
interface BVI1
 description Inside
 ip address 192.168.0.1 255.255.255.0
!
ip sla 3
 icmp-echo 217.0.0.1 source-interface Dialer0
ip route 0.0.0.0 0.0.0.0 Dialer0
interface ATM0.1 point-to-point
 no ip redirects
 no snmp trap link-status
 pvc 0/38
 pppoe-client dial-pool-number 1
!
}
; sqlab, your change to use "thru newline" does what I wanted in this case, which is good.
; my next step is to try & understand the "or" construct properly, as the code below doesn't quite cut it.
wanted: copy []
interface: ["interface" [to #"^/" | to "point-to-point"]]
parse lines [any [[copy temp interface (insert tail wanted temp)] | thru newline]]
foreach line wanted [print line]
; thanks very much for your help, /\/\ |
Pekr 14-Apr-2009 [1703x2] | I am far from parse guru, but above rule (while works) looks weird :-) Why to produce interface rule that way? The line is ending with line terminator anyway, no? parse/all lines [ any [ [ "interface" copy int-name to newline (print int-name) newline | skip ] ] ] |
... this is really simpler, no subrule to ruin your brain is needed ... | |
sqlab 14-Apr-2009 [1705] | I am not sure that I understand your intention. Do you want just interface ATM0.1, then you have to switch the order of your interface rule, as the condition to #"^/" (newline) is already true and done, and your cursor behind "point-to-point". As the first part is true, the second will never be done. |
Pekr 14-Apr-2009 [1706x2] | should point-to-point be filtered out? Then the rule would be a bit different .. |
Slightly different version:
wanted: copy []
spacer: charset " ^/"
name-char: complement spacer
interface: [
    "interface " copy int-name some name-char
    (append wanted int-name)
    spacer
]
parse/all lines [any [interface | skip]]
print mold wanted | |
mhinson 14-Apr-2009 [1708] | Yes, point-to-point needs to be ignored from the result, and other similar cases in real life. Once the interface string & details are found, the script will need a sub-search that looks for "description" or "ip address". I was hoping that by extracting the rule used for each search I would make it easier to add new rules as the requirement becomes clear. I tried swapping the order in the rule to interface: ["interface" [to "point-to-point" | to #"^/"]] but this just finds everything in the whole input. Perhaps I am too old to learn this. I worked programming in Pascal a good few years ago, but only for about a year. I failed to grasp SmallTalk more recently & I am really struggling with this. Thanks for all your help. /\/\ |
Pekr 14-Apr-2009 [1709x2] | to [aaaa | bbbb] is a long-standing parse enhancement request, which is not yet implemented but is planned for 3.0. It would really make the lives of parse beginners much easier. Your parse rule simply means: try to find "point-to-point" or the end of the line. But it looks for "point-to-point" until it reaches the end of the input string. |
mhinson - just don't give up ... if you are a beginner with REBOL, you chose to start with a pretty advanced topic. | |
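The overshoot Pekr describes can be reproduced with a two-line input. This is only a minimal sketch for REBOL 2: the sample string and the words overshoot and int-name are made up for illustration, and the charset alternative is the same idea Pekr used earlier with name-char.

```rebol
REBOL [Title: "Why TO can run past the end of a line"]

; Hypothetical two-line sample, not taken from a real config file.
lines: "interface Vlan1^/interface ATM0.1 point-to-point^/"

; TO scans forward to the first occurrence anywhere in the remaining
; input, so the copied text runs across the first newline.
parse/all lines ["interface " copy overshoot to "point-to-point" to end]
probe overshoot    ; "Vlan1^/interface ATM0.1 "

; Charset alternative: copy only characters that are neither a space
; nor a newline, so the match cannot leave the current line.
name-char: complement charset " ^/"
parse/all lines ["interface " copy int-name some name-char to end]
probe int-name     ; "Vlan1"
```

The first probe shows that the copy swallowed the newline and part of the next line, which is why the swapped rule appeared to find "everything".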
Henrik 14-Apr-2009 [1711] | yes, parsing is one of the most difficult topics of REBOL. |
mhinson 14-Apr-2009 [1712] | Thanks for the encouragement. I won't give up yet for a good while. Most of the programming I have done is out of a need to produce a specific result & that quite often needs to be fairly complex; however, having a real need also makes the effort seem more worthwhile. I appreciate that parsing is quite hard, but it also seems to be one of the features that differentiates REBOL from other languages & is often referred to as being more efficient once the concepts are fully grasped. If this is not true, then perhaps I would be better off with php or perl etc. I have also already had some fun with the very straightforward graphical stuff, which is fantastic. I am off out now; I hope to make a bit more code work tomorrow as I am on holiday this week. :-) Thanks again |
Pekr 14-Apr-2009 [1713x3] | You can also use REBOL and call php or perl for some stuff :-) However, your rules could be made to work; you just need to split them into sections and find some rules for the structure of the parsed file. |
spacer: charset " ^/"
name-char: complement spacer
interface: [
    "interface " copy int-text some name-char
    (print ["interface: " int-text])
    (append wanted int-text)
    thru newline
]
description: [
    "description " copy desc-text to newline
    (print ["description: " desc-text])
    newline
]
ip-address: [
    [
        "ip address " copy add-text to newline
        (print ["ip address: " add-text])
        newline
        | "no ip address" newline (print ["ip address:" "no address"])
    ]
]
int-section: [interface any [description | ip-address | "!" break | skip]]
parse/all lines [any [int-section | skip]] | |
... ignore (append wanted int-text) above - I did not use it in the code, I just used print to check how sections work ... | |
mhinson 15-Apr-2009 [1716x2] | Hi, I have broken this down to try & understand it, but my understanding is still very vague, particularly in respect of the order of things like the copy statement, & the number of brackets needed is also confusing me.
lines: {junk Interface fa0
!
interface fa1}
spacer: charset " ^/"
name-char: complement spacer
parse/all lines [
    any [
        [
            [
                "interface " copy int-text some name-char
                (print ["interface: " int-text])
                thru newline
            ]
            any ["!" break | skip]
        ]
        | skip
    ]
]
I need to find some way to make it only get the "interface " if it starts at the first position on the line. I thought I needed to remove the word "any" to do this, but that did not work. |
Perhaps I should also say that the structure of these Cisco config files tends to have the section start at the first position & sub sections are indented. The use of "!" is a bit sporadic & varies in different contexts. I have been trying to hunt down a bunch of test examples without success, test data that can be shared freely is hard to get hold of. Thanks for your help. | |
PeterWood 15-Apr-2009 [1718x2] | It is quite easy to find something that starts in the first position of a line by matching against newline + the something. I'm too lazy to remember the newline character so I tend to write something like this:
>> interface: join newline "interface "
== "^/interface "
>> spacer: charset to string! newline
== make bitset! #{
0004000000000000000000000000000000000000000000000000000000000000
}
>> name-char: complement spacer
== make bitset! #{
FFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
}
>> parse/all lines [any [interface copy int-text some name-char (print ["interface: " int-text]) | skip]]
interface: fa1
== true |
In case this isn't clear, I'll try to explain the parse rule. First, any effectively says to keep matching the rules in the following block until the end of the string is reached. The first rule in the block, [interface copy int-text some name-char (print ["interface: " int-text])], says: match the word interface (newline + "interface "); then, if there is a match, copy some name-char, which copies one or more characters that meet the criteria of a name-char; then, if there are some name-char characters, evaluate the REBOL code in parentheses. If there wasn't a match with that first rule, then the second rule, which follows the |, will be applied. skip will pass over one character and always provides a match. | |
mhinson 16-Apr-2009 [1720] | Thanks for your help. I am beginning to wonder if what I am trying to do is not possible in Rebol. I am impressed at the number of responses, but I still can't find a way to use all the bits together to create a structure that is going to find the bits of data I am after. One of the problems seems to be that catching the data starting with newline & ending at newline uses up the "newline" for the following line, so then that line gets missed. Is there really no symbolic way in Rebol to identify the beginning of the line without using the newline char from the end of the previous line? |
PeterWood 16-Apr-2009 [1721x2] | Mike - The method that I showed you does not use up the "newline" at the end of the line. If you check again, the parse rule simply says copy int-text some name-char. This "stops" before the newline at the end of the line. In fact, guessing at your requirements a little and assuming that name-char is available, something along these lines should be close to what you want:
keywords: ["^/interface " | "^/another keyword " | "^/yet another keyword"]
parse/all lines [any [
    copy int-keyword [keywords copy int-text some name-char (
        print [int-keyword ": " int-text]
    )
    | skip
] ]
(I obviously haven't tested this code.) |
Sorry a typo, this line copy int-keyword [keywords copy int-text some name-char ( should be copy int-keyword keywords copy int-text some name-char ( | |
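Peter's point that the rule stops before the newline, leaving it available for the next line, can be checked with a small sketch. The sample text and the word found are invented for illustration, and prepending a newline to the input is an assumption added here so that the very first line is treated like every other line; it is not something proposed in the thread.

```rebol
REBOL [Title: "Start-of-line matching with a leading newline"]

; Hypothetical sample: one indented "interface" that must NOT match.
lines: "interface fa0^/ interface indented^/!^/interface fa1^/"

name-char: complement charset " ^/"
found: copy []

; JOIN puts a newline in front of the text, so "^/interface " can also
; match on the first line. SOME NAME-CHAR stops before the line's own
; newline, which is then free to start the match on the next line.
parse/all join newline lines [
    any [
        "^/interface " copy int-text some name-char (append found int-text)
        | skip
    ]
]

probe found    ; ["fa0" "fa1"]
```

The indented " interface indented" line is skipped because a space, not "interface", follows its newline.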
sqlab 16-Apr-2009 [1723] | I see just two ways to get what you desire either you define different rules for interface at the beginning and interface after newline or you do it in a two pass way: first you separate the lines (either by parse or by read/lines) and then you process every line by itself. I would go the easy way with two passes. |
mhinson 16-Apr-2009 [1724] | The mist may be slowly clearing (sorry to be so slow to catch on). The 2-stage process may be the answer; perhaps I can add a key char at the first line position when I read the file, then use this as the line start reference, but continue to use the end of line as normal. I think I understand Peter's example & have tweaked it a bit to make it work for me.
lines: {~junk Interface fa0
~!
~interface fa1
~interface fa2 point-to-point
~!
~interface Fa3
~ description test three
~ ip address 1.1.3.3 255.255.255.0
~!
~interface Fa4
~ ip address 1.1.4.4 255.255.255.0
~!
~interface Fa3
~ description test four etc
~}
spacer: charset "^/"
name-char: complement spacer
stopwords: "point-to-point"
keywords: ["~interface " | "~ description " | "~ ip address"]
parse/all lines [
    any [
        copy int-keyword keywords
        copy int-text [to stopwords | some name-char]
        (print [int-keyword ": " int-text])
        | skip
    ]
] |
sqlab 16-Apr-2009 [1725x2] | This got very long, but I think it should work
ifrule: [
    ifa: "interface"
    some [
        ife: "point-to-point" break
        | ife: newline break
        | skip
    ]
    (append/only append wanted copy/part ifa ife interf: copy [])
]
drule: ["description" copy descr to newline (append interf descr)]
iprule: ["ip address" copy ip to newline (append interf ip)]
norule: ["no" to newline]
pvcrule: ["pvc" to newline]
pprule: ["pppoe" to newline]
!rule: ["!" to newline]
rule: [
    (wanted: copy [])
    some [
        ifrule
        | some [
            s: " interface" | #" " | drule | iprule | norule
            | pvcrule | pprule | !rule | break
        ] thru newline
    ]
]
parse/all lines rule |
There is a flaw; use this rule instead:
rule: [
    (wanted: copy [])
    some [
        ifrule
        | some [
            s: " interface" (interf: copy []) | #" " | drule | iprule | norule
            | pvcrule | pprule | !rule | break
        ] thru newline
    ]
]
This prevents collecting the unwanted interface attributes. | |
Pekr 16-Apr-2009 [1727] | Uh, I was on a slow connection, so my reply got lost. Mhinson - there is no symbolic way to represent the beginning of the line. I don't know of one in any system. The only thing I know is end-of-line (newline). I know what you probably mean - you want to identify the beginning of your lines, including the first line (so not a rule that matches newline first, then next char = beginning of line). But there are still various ways to do it. First - I think that your config files are chaos. Do they have any rules for some sections at all? :-) I also like what sqlab mentioned - sometimes it is easier to break stuff into a 2-pass strategy. Read/lines is your friend here. You can try it on text files and you'll see that the result is a block of lines. I usually do:
data: read/lines %my-data-file.txt
;--- remove empty lines from block of lines ...
remove-each line data [empty? trim copy line]
foreach line data [do something with data ....]
Simply put - if the rules for the parser are beyond my capabilities (which happens easily with me :-), I try to find another way around ... |
mhinson 16-Apr-2009 [1728] | sqlab, I like this as it also gives the extracted data some structure, which will be essential when using it. Pekr, the type of symbolic start & end of line is described as regular expression anchoring: http://www.regular-expressions.info/anchors.html Matching a line using anchoring in the implementations I have seen does not preclude the following line from being matched. Even in this example, ^abcd$ will match both lines:
abcd
abcd
In some contexts this is considered an extension to regular expressions, but it is very useful. |
Izkata 16-Apr-2009 [1729x2] | Also, this is a bit slower, but avoids using complicated parse rules:
>> lines: {junk Interface fa0
{    !
{    interface fa1}
== "junk Interface fa0^/!^/interface fa1"
>> SplitLines: parse/all lines {^/} ; {^/} is a string containing only the newline character, so this is a list of the separate lines
== ["junk Interface fa0" "!" "interface fa1"]
>> foreach line SplitLines [
[    if all [
[        not none? find line {interface} ;Find returns none! (equivalent of NULL or NIL) on "!"
[        head? find line {interface} ;find goes to the first instance of what is being searched for, and head? checks if it's currently at the beginning of the line
[        ][print line]
[    ]
interface fa1 ;The only match |
(hah, bit late to the party... I see it's gone beyond the simple question now) | |
mhinson 16-Apr-2009 [1731] | There is a lot to be said for straightforward finds & excludes, particularly if it is done repeatedly on the previous output. I am trying to understand how to use Rebol in a way that will be flexible enough to read maybe a few hundred Cisco config files & command outputs, with perhaps 20 or 30 different types of rules for finding stuff, then putting it into a structure that will be easy to search for patterns & extract summaries of information. All the information you might have in a network diagram, but in a text or database format. |
Sunanda 17-Apr-2009 [1732] | One huge parse may be technically neat. But it probably does not match the real-world needs. Petr's (and others') advice to break the problem down into manageable (and separately debuggable) chunks is a good approach. And, remember, in the Real World, you may also need to track the original line number (before removal of comments and blanks) to be able to give good error messages: "Bad data encountered near line 12,652" |
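One way to keep original line numbers through a two-pass approach is to pair each line with its number before filtering anything out. The following is only a sketch for REBOL 2: the data block stands in for the result of read/lines on a hypothetical file, and the words numbered and item are made up for the example.

```rebol
REBOL [Title: "Two-pass scan that keeps original line numbers"]

; Stand-in for: data: read/lines %some-config.txt  (hypothetical file)
data: ["junk Interface fa0" "" "!" "interface fa1" " description test"]

; Pair each line with its original number before any filtering.
numbered: copy []
repeat n length? data [append/only numbered reduce [n pick data n]]

; Drop blank lines; the stored numbers still refer to the original input.
remove-each item numbered [empty? trim copy item/2]

foreach item numbered [
    ; find/match succeeds only when the line starts with the given text
    if find/match item/2 "interface " [
        print ["line" item/1 ":" item/2]
    ]
]
; prints: line 4 : interface fa1
```

Because the number travels with the line, an error message can still say "near line 4" even though the blank line 2 was removed before the scan.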
mhinson 17-Apr-2009 [1733] | I have been studying the code from sqlab but I can't understand it enough to modify it. This is a deconstruction of part of it with my comments added. I would love a hand to understand this a bit more. I can't find any documentation for this sort of thing that I can understand. I have also been trying to retrieve an index number when reading lines so it can be used as suggested by Sunanda. I have drawn a blank so far.
parse/all lines [
    ;; parse the whole block called lines; /all makes parsing only use values given below
    ;; I am not sure if this is iterated or the whole block parsed as one.
    (wanted: copy [])   ;; initialise wanted
    | some [            ;; one or more matches needed to return true
        ifa: "interface" some [
            ;; ifa is given a string value right in the middle of the parsing code
            ;; I see why, but not how this is able to slip into the middle here
            ;; then some starts another block so perhaps the "interface" is used by parse too??
            ife: "point-to-point" break   ;; no idea how the syntax works here
            | ife: newline break          ;; or here
            | skip   ;; this skips I think till one of the OR conditions are met from below?
        ]
        (append/only append wanted copy/part ifa ife interf: copy [])
        ;; I don't understand what block append/only is working on here
        ;; append to block wanted using a part copy between ifa & ife but I
        ;; don't understand the source for the copy
        | some [   ;; I think perhaps all the below rules are end or search patterns?
            s: " interface" (interf: copy [])
            | drule | iprule | norule | pvcrule | pprule | !rule | break
        ] thru newline   ;; final catchall end search pattern.
    ]
]
Sorry to ask so many questions; feel free to throw me out if this is just too much, but I have spent several hours on this fragment already. Thanks. |
Henrik 17-Apr-2009 [1734] | I think we need to take them a few bits at a time |
mhinson 17-Apr-2009 [1735] | :-) |
Geomol 17-Apr-2009 [1736] | You can parse strings and blocks. The /all refinement is used when parsing strings, not blocks. From your first comments, it seems, you're parsing blocks, so you don't need /all. What is lines? A string or a block? |
Henrik 17-Apr-2009 [1737] | the ife: mentions you have there are not strings that are set in the middle of things. a set-word! will register the current index in the block being parsed. |
mhinson 17-Apr-2009 [1738] | lines is from something like lines: read %file.txt or lines: {line one line2 line3} |
Geomol 17-Apr-2009 [1739x2] | Ok, you're parsing a string then. Then using /all is ok. |
Put the wanted: copy [] up front before you parse. Then drop the first or, |, just before SOME | |
Henrik 17-Apr-2009 [1741x3] | the difference between using a set-word and SET word!:
parse [a b c d] [
    w1: word! (probe w1)
    w2: word! (probe w1 probe w2)
    set w3 word! (probe w1 probe w2 probe w3)
    w4: word! (probe w1 probe w2 probe w3 probe w4/1)
] |
using a get-word! will allow you to change the position of parsing. | |
basically you must remember that a dialect doesn't uphold normal REBOL syntax. | |
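The same position-marking idea applies when parsing a string, which is what ifa: and ife: do in sqlab's ifrule. A minimal sketch, assuming REBOL 2; the sample text and the words name-start and name-end are invented for illustration:

```rebol
REBOL [Title: "Set-words inside a parse rule mark positions"]

text: "interface ATM0.1 point-to-point"

parse/all text [
    "interface "
    name-start:      ; marks the current position; matches nothing itself
    to " "           ; advance to the space after the interface name
    name-end:        ; marks the position of that space
    to end
]

; Both words now refer to positions inside TEXT, so copy/part can
; extract the text between them, as ifrule does with ifa and ife.
probe copy/part name-start name-end   ; "ATM0.1"
probe index? name-start               ; 11
probe index? name-end                 ; 17
```

Inside the parse dialect a set-word is not an assignment of the value that follows it; it only records where the parser currently is in the input.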
mhinson 17-Apr-2009 [1744] | It sounds as if I have missed the understanding of what a dialect is. |
[unknown: 5] 17-Apr-2009 [1745] | are you familiar with SQL? SQL is a form of dialect. |