World: r3wp

[Parse] Discussion of PARSE dialect

Brett
12-Mar-2005
[44x2]
Hi Graham. Line ending in Internet protocols = Apples; Line/Paragraph 
representation in text files = Oranges. :-)

Best not to compare them for this task.  As far as I've seen, Internet 
protocols have an on-the-wire line ending of CRLF.  So yes this is 
very likely a problem with your parse rules.
With the port in binary mode - you must use CRLF to identify/transmit 
line endings.
Graham
12-Mar-2005
[46]
Hi Brett.  Do you then see a problem with my parse rules?
Tomc
12-Mar-2005
[47]
seems like, to get all possible raw line endings, something like
copy line thru [ 1 2 ["^J" | "^M"]]   might work
Graham
12-Mar-2005
[48x2]
my one-line-rule seems to work already ..
copy header thru [ 2 [ "^J" | "^M"] ] ... ?
Tomc
12-Mar-2005
[50]
unix would be a single "^J"
Graham
12-Mar-2005
[51x2]
header is separated from body by one blank line.
So, I need to look for two consecutive line endings to find the end 
of the header
Tomc
12-Mar-2005
[53]
yes, my rule was for a single line, as in your first code sample
Graham
12-Mar-2005
[54]
is ^J the line feed ?
Tomc
12-Mar-2005
[55x2]
copy header to ["^J^J" | "^M^M" | "^M^J^M^J"]
^J same as ^/
Graham
12-Mar-2005
[57]
>> header-rule: [copy header thru ["^J^J" | "^M^M" | "^M^J^M^J"] 
 (write-client header)]
== [copy header thru ["^/^/" | "^M^M" | {^M
^M
}] (write-client header)]
>> parse m header-rule
** Script Error: Invalid argument:

 |


** Near: parse m header-rule
Tomc
12-Mar-2005
[58x2]
try {^M^/^M^/}
not that it should matter
Graham
12-Mar-2005
[60]
same problem
Tomc
12-Mar-2005
[61x3]
it is interpreting the third newline ... odd  buggy odd
parse all?
or use |  2 CRLF
Graham
12-Mar-2005
[64]
>> unix: [ copy header thru "^J^J" ]
== [copy header thru "^/^/"]
>> msdos: [ copy header thru "^/^/" ]
== [copy header thru "^/^/"]
>> parse m [ [ unix | msdos ] ( write-client header ) ]
Tomc
12-Mar-2005
[65]
ms dos  not correct
Graham
12-Mar-2005
[66]
is that correct rule for unix ?
Tomc
12-Mar-2005
[67]
yes
Graham
12-Mar-2005
[68]
what's msdos ?
Tomc
12-Mar-2005
[69x3]
mac ^M^M  dos ^M^J unix ^J
mac is a single ^M for a single line
[2 "^/" | 2 "^M" | 2 CRLF]
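[Editor's sketch] The line-ending conventions listed above can be checked at the REBOL console. This is a minimal illustration, assuming REBOL 2, where `crlf` is the predefined two-character string CR LF; it is not part of the original exchange.

```rebol
REBOL []

; ^J and ^/ are two spellings of the same character (line feed)
print ("^J" = "^/")                          ; true

; crlf is the DOS / Internet-protocol line ending: CR followed by LF
print (crlf = "^M^/")                        ; true
print (length? crlf)                         ; 2

; A blank line after a CRLF-terminated line is two consecutive CRLFs
print ((join crlf crlf) = "^M^/^M^/")        ; true
```

So a header terminator is "^/^/" for unix, "^M^M" for classic mac, and `join crlf crlf` for dos and on-the-wire Internet protocols.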
Graham
12-Mar-2005
[72x2]
>> parse m [ [2 "^/" | 2 "^M" | 2 crlf ] ( write-client header ) 
]
== false
dos is : join crlf crlf .. isn't it ?
Tomc
12-Mar-2005
[74]
... copy header thru ...
Graham
12-Mar-2005
[75]
parse m [ copy header thru [2 "^/" | 2 "^M" | 2 crlf ] ( write-client 
header ) ]
** Script Error: Invalid argument: 2
 | 2 crlf
Tomc
12-Mar-2005
[76x2]
copy header [thru 2 "^/" | thru 1 "^M" | thru 2 crlf]
it annoys me that to/thru does not distribute over the block of ORs
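[Editor's sketch] Since `thru` in REBOL 2 rejects a block of alternatives (the "Invalid argument" errors earlier in this thread), one common workaround is a small recursive rule that tries every alternative at the current position and otherwise skips one character. The names `blank-line` and `thru-blank` and the sample message are hypothetical, chosen only for this illustration.

```rebol
REBOL []

; Hypothetical sample message with CRLF line endings
m: "Subject: test^M^/From: a@b^M^/^M^/body text"

; All accepted forms of "end of line followed by an empty line"
blank-line: ["^M^/^M^/" | "^/^/" | "^M^M"]

; Match a terminator here, or skip one character and try again
thru-blank: [blank-line | skip thru-blank]

parse/all m [copy header thru-blank copy body to end]

probe header    ; header text, including the terminating blank line
probe body      ; whatever follows the blank line
```

Unlike writing `thru a | thru b`, this stops at the earliest terminator of any kind. The rule recurses once per skipped character, so it may not suit very long inputs.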
Graham
12-Mar-2005
[78]
me too ...
Tomc
12-Mar-2005
[79]
off to the coast for the weekend, got to pack
Graham
12-Mar-2005
[80x3]
ok.
This rule seems to now work for me ...


  header-rule:  [ [ copy header thru {^M^J^M^J} | copy header thru 
  {^/^/} ] (write-client header )]
not sure if the second rule is needed ...
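[Editor's sketch] Whether the second alternative is needed depends on the input: it only comes into play when the message uses bare LF line endings. A minimal check, assuming REBOL 2 and using hypothetical sample data and rule names (`write-client` is left out so the snippet is self-contained):

```rebol
REBOL []

; Hypothetical message with LF-only line endings
lf-msg: "A: 1^/B: 2^/^/body"

; Rule that knows only the CRLF CRLF terminator
crlf-only: [copy header thru "^M^/^M^/" to end]

; Rule with both alternatives, as in the message above
both: [[copy header thru "^M^/^M^/" | copy header thru "^/^/"] to end]

print parse/all lf-msg crlf-only    ; false
print parse/all lf-msg both         ; true
```

Note the ordering caveat: `thru "^M^/^M^/"` is tried first and scans the whole input, so an LF-only header followed by a body that happens to contain CRLF CRLF would be split in the wrong place.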
Chris
12-Mar-2005
[83x2]
;I tend to use charsets as a way of skipping 'thru while looking 
for multiple possibilities:
line-end: charset [#"^/" #"^M"] ; etc
line: complement newlines

parse m [copy header [some [some line line-end]]]
a way of skipping
 -> "an alternative to"
Graham
12-Mar-2005
[85]
is it faster?
Chris
12-Mar-2005
[86]
It's hard to compare when 'thru doesn't fork.
Graham
12-Mar-2005
[87]
or is it slower ?
Chris
12-Mar-2005
[88]
For what it's worth, my benchmarks show it to be quick, but my benchmarks 
tend to be crude...
Graham
12-Mar-2005
[89]
line: complement line-end
Chris
12-Mar-2005
[90]
That means any character that isn't a line-ending...
Graham
12-Mar-2005
[91]
yes, you have complement newlines
Chris
12-Mar-2005
[92x2]
Sorry, changing names as I go :o)
line-end: charset [#"^/" #"^M" #"^J"]
line: complement line-end
parse m [copy header [some [some line line-end]] to end]

; Note that 'line-end in the parse line should be replaced with permutations 
of what a line-ending can be, without describing any permutation 
of a double line-ending.
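[Editor's sketch] One way to read the note above: describe a single line ending as a rule of alternatives (longest first), consume header lines one at a time, and let the first empty line end the loop. This assumes REBOL 2; `line-char`, `eol` and the sample message are hypothetical names and data, not taken from the thread.

```rebol
REBOL []

; Hypothetical sample message with CRLF line endings
m: "Subject: test^M^/From: a@b^M^/^M^/body text"

; Any character that is not part of a line ending
line-char: complement charset "^/^M"

; A single line ending; CRLF comes first so it is consumed as one unit
eol: ["^M^/" | "^/" | "^M"]

parse/all m [
    copy header some [some line-char eol]   ; one or more non-empty lines
    eol                                     ; the empty separator line
    copy body to end
]

probe header
probe body
```

Because `some line-char` needs at least one character, the loop stops at the empty line by itself, so no double line-ending has to be spelled out.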