
World: r4wp

[Rebol School] REBOL School

 doesn't accept block, so put "to" inside the block:

s1: {http://myfile.txt</br>}
s2: {http://myfile.txt</div>}
parse s1 [copy link [to </br> | to </div>] (print link)] ;works

parse s2 [copy link [to </br> | to </div>] (print link)] ;works too
I need something like the following
parse s1 [any [to "http://" copy link [to </br> | to </div>]]]
Is it possible?
Of course it is possible, if I understand well what you want:

s1: "a http://xxx</div>b http://yyy</br>"

parse/all s1 [any [to "http://" copy link any [</br> break | </div> break | skip] (print link)]]
If you do want to leave out the </br> and </div> substrings, the 
simplest way probably is:

s1: "a http://xxx</div>b http://yyy</br>"

parse/all s1 [any [to "http://" start: any [end: </br> break | </div> break | skip] (print copy/part start end)]]
Also note that it is easy to replace the </br> or </div> subrule 
by a more complicated subrule
Thank God! ;-)
There is no documentation about BREAK in PARSE (for R2), so it is 
always difficult for me to remember. Thanks, Ladislav.
Guiseppe: if you didn't read this before, here is a very good article:
other articles are also great, take a look at them all.
In Ladislav's examples I am not able to understand the use of BREAK. 
Why is it useful?

Also, in the second example, why isn't there an "end:" before "</div> 
break"?

parse/all s1 [any [to "http://" copy link any [</br> break | </div> break | skip] (print link)]]
Could it be written as:

parse/all s1 [any [to "http://" copy link TO any [</br> break | </div> break | skip] (print link)]]

parse/all s1 [any [to "http://" copy link any [TO </br> break | TO </div> break | skip] (print link)]]

Finally, what is the purpose of the SKIP keyword in this context?
I use Artisteer to prototype web pages, and it saves content in UTF-8. 
Later on, I need to make a few adaptations to the generated pages, so 
I opened one in R2, reparsed it, inserted some stuff, deleted other stuff, 
but it did not work out ....
What are my options, apart from doing it in R3?
Use some external tool to convert it to ANSI, do the adaptations, and 
convert it back to UTF-8?
Why don't you want to do it in R3? That's the obvious solution
I am trying now. I somehow lost interest in R3, as it is an unfinished 
and dead product. But it is probably still easier than using iconv together 
with R2, although I did it that way in the past, using CALL
One of the few advantages of R3 is processing Unicode. It fixed the 
Russian Syllable website
I am somehow not able to load one Czech text properly ....
I mean - the text I need to insert into the resulting file (UTF-8) is 
ANSI. I do print to-string read %text-slider.html, and in the R3 console 
the Czech text is not correct ....
I'll try with some version other than the rather old view.exe
The console may be broken. How about the actual text, in an editor?
In the editor, it's correct. Simply put - I read Czech text from an ANSI 
file, and it is distorted in the console, ditto when writing it back 
to a file of course ....
When you cut and paste it from the console, or when you write it 
with REBOL?
So you're saying the input file is not UTF-8?
Yes, ANSI. I solved it by re-saving the same source file as UTF-8 
instead of ANSI. Still a bad complication, as by default, Windows 
sets Notepad to ANSI, so it is a bit inconvenient ...
I am surprised R3 is not able to properly read/decode ANSI file with 
Czech alphabet ...
Guiseppe: "I am not able to understand the use of Break. Why is it 
useful?"
I'll try to explain:

>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [".txt" | ".dat" | skip] (print x)]]

http://a.txthttp://b.dat ; it prints just one line, from the first http:// to the last .dat

>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [".txt" break | ".dat" break | skip] (print x)]]

http://a.txt ; now it works as expected, from http:// to .txt, and breaks
http://b.dat ; and from the next http:// to .dat
Guiseppe: "Could it be written as: ..."
TO ANY doesn't work.
but ANY [TO "..." BREAK | TO "..." BREAK] works.

Just be careful using ANY and TO together, because neither of them is 
guaranteed to advance the series pointer. So you can easily put the console 
into an infinite loop (the escape key also doesn't work)
But still there is a problem in your example. Here I'll try to explain:

>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x any [thru ".txt" (print 1) break | thru ".dat" (print 2) break | skip (print 3)] (print x)]]

It looks correct, but actually it depends on which one comes first (.txt 
or .dat).

Here is the problem:

>> parse/all "http://a.txthttp://b.dat" [any [to "http://" copy x [thru ".dat" (print 1) | thru ".txt" (print 2) | skip (print 3)] (print x)]]
hmm.. links look weird in AltME, select all text, copy and paste 
to a text editor to see it correctly.
Petr, R3 can't decode any 8bit encodings with its built-in code, 
just ASCII (which is 7bit) and UTF-8. However, its binary handling 
is better so it should be easy to write your own converters. For 
R2, I would suggest looking at Gabriele's PowerMezz package; it has 
some great text converters. Of course you lose out on R3's PARSE 
if you use R2.
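A minimal sketch of the "write your own converter" idea for R3, assuming the input is ISO-8859-1 (Latin-1), where every byte value equals its Unicode code point. The function name latin1-to-string is hypothetical, not part of R3. Czech text saved by Notepad as ANSI is normally Windows-1250, not Latin-1, so it would additionally need a lookup table for the bytes above 127; that table is not shown here.

```rebol
REBOL [Title: "Latin-1 decoding sketch (R3)"]

; Hypothetical helper (not built into R3): decode ISO-8859-1 bytes.
; In R3, iterating over a binary! yields integers, and TO CHAR! turns
; an integer into the character with that code point.
latin1-to-string: func [
    bin [binary!]
    /local out
][
    out: make string! length? bin
    foreach byte bin [append out to char! byte]
    out
]

; "Français" encoded as Latin-1 (E7 is the c with cedilla)
text: latin1-to-string #{4672616EE7616973}
probe text

; TO BINARY! on an R3 string gives its UTF-8 encoding
probe to binary! text
```

For a real file, the same function could be applied to the binary returned by READ, and the resulting string written back out, which R3 encodes as UTF-8.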
Pekr, look for Oldes' UTF8 package for Rebol 2, I believe it's on 
rebol.org. It can convert anything (it supports downloading code 
pages from the net) to/from UTF8. It really saved me a lot of time when 
I was working on translations for Windows Vista.
Rebolek - thanks, I forgot about it. I needed it only once in the 
past, and so I used iconv command line tool  via CALL ....
Recently I switched to R3, as I don't need the gui, just a script 
to do some webpage source code post-processing ....
Endo, I have tried. After exchanging the positions of .txt and .dat I get 
only a single line. How could it be solved?
Also, what is the purpose of the SKIP as the third option?
Also, in Ladislav's example, why is there only one END?
On my Mac, in the script I made on Windows using a couple of international 
characters, the chars are also displayed wrong: "Nederlands" "English" 
"Deutsch" "Français" "Español" "Italiano" "Português". When I saved it as 
UTF-8 I hoped my problems would be resolved, but then REBOL complained my 
script had no REBOL header. :-(
"In Ladislav's examples I am not able to understand the use of Break. 
Why is it useful?"

 - in any [</div> break ...] the BREAK keyword stops searching for 
 the terminator when one was found (</div>). If you don't use BREAK, 
 you simply don't stop searching even if you already found the terminator.
"Also in the second example why there isn't a "end:" before "</div> 
break"?" - it is because the first END: was already used and the position 
is remembered. (However, you can use end: twice if you like.)
You should just remember that end: never fails, so the expression:

    end: </div> break | </br> break | ...

is equivalent to:

    end: [</div> break | </br> break ...]

, i.e., the end: part is known for all alternatives
"Could it be written as:

parse/all s1 [any [to "http://" copy link TO any [</br> break | </div> break | skip] (print link)]]" - no, since:

- TO ANY is not supported

- if it were supported, it would not do what you want (you want to 
find the first terminator, whatever it is, while TO ANY would find 
the </div> if it were in the input text, even when a </br> would be 
closer)

"parse/all s1 [any [to "http://" copy link any [TO </br> break | TO </div> break | skip] (print link)]]" 
- this *is* supported, but it does not do what you want; it finds the 
</br> even if </div> occurs sooner
"Finally, which is the purpose of the SKIP keyword in this context" 
 - that is the easiest question. The expression

    any [end: </div> break | </br> break | skip]

simply checks whether it "sees" the </div> terminator. If it does 
then the search for the terminator is over. If it does not then we 
check immediately whether we do not "see" the second possible terminator. 
However, if we are not at the terminator, both alternatives fail 
and the third alternative has to advance to the next position to 
be able to finally find the terminator.
(if you do not advance you cannot expect to find the terminator)
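The scanning loop described above can be tried as a small standalone R2 script. This is only a sketch: the input string is the one used earlier in the thread, the terminators are written as plain strings instead of tags, and the word links is an illustrative name that does not appear in the original messages.

```rebol
REBOL [Title: "Nearest-terminator scan (R2 sketch)"]

s1: "a http://xxx</div>b http://yyy</br>"
links: copy []  ; illustrative collector, not from the thread

parse/all s1 [
    any [
        to "http://" start:
        ; At each position: remember it in END, then test both terminators.
        ; BREAK leaves the inner ANY as soon as one of them matches.
        ; If neither matches, SKIP advances one character and we retry.
        any [end: "</div>" break | "</br>" break | skip]
        (append links copy/part start end)
    ]
]

probe links
```

Because both terminators are tested at every position before SKIP advances, whichever terminator occurs first in the text wins, regardless of the order in which the alternatives are written.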
This may be a simpler/more understandable description of the idea:

    terminator: [</div> | </br>]

    find-terminator: [start: any [end: terminator break | skip] (contents: copy/part start end)]
The code is a simplification anyway. It does not work well when the 
rule is expected to fail at the tail of the input if the terminator 
was not found. That would require the REJECT keyword or a more complicated 
The simplest way to write FIND-TERMINATOR would be recursive:

find-terminator: [terminator | skip find-terminator]

However, this version is recursive, which means that it fails when 
the search is "long" exceeding the available stack size.
By "it fails" I mean that the recursive expression would not be able 
to find the terminator even if it were present when the length of 
the search would exceed the available stack size.
Arnold: I believe that Rebol/View uses Windows codepages under Windows, 
MacRoman on OS X and ISO-8859-1 on Linux. Sadly this means it only 
really supports true ASCII characters cross-platform unless you 
manage the encoding yourself.
Ladislav, some questions are still open. I am currently remotely 
connected to my machine. I'll study your "lesson" tomorrow and I'll 