r3wp [groups: 83 posts: 189283]
  • Home
  • Script library
  • AltME Archive
  • Mailing list
  • Articles Index
  • Site search
 

World: r3wp

[!REBOL3]

BrianH
7-Oct-2010
[5196x6]
Here's a low-level function to parse and process script headers, 
which shows how many features are built into the base script model 
in R3:

load-script: funct [
	"Decode a script into [header-obj script-ref body-ref]"

 source [binary! string!] "Source code (string will be UTF-8 encoded)"
	/header "Return the header object only, no script processing"

 ;/check "Calculate checksum and assign it to the header checksum 
 field"
	/original "Use original source for Content header if possible"
] compose [
	data: either string? source [to-binary source] [
		unless find [0 8] tmp: utf? source [ ; Not UTF-8
			cause-error 'script 'no-decode ajoin ["UTF-" abs tmp]
		]
		source
	]

 ; Checksum all the data, even that before the header or outside the 
 block
	;sum: if check [checksum/secure data]  ; saved for later
	
	if tmp: script? data [data: tmp] ; Find the start of the script
	
	; Check for a REBOL header
	set/any [hdr: rst:] transcode/only data
	unless case [
		:hdr = 'rebol [ ; Possible REBOL header
			set/any [hdr rst] transcode/next/error rst
			block? :hdr ; If true, hdr is header spec
		]
		:hdr = [rebol] [ ; Possible script-in-a-block
			set/any [hdr rst] transcode/next/error rst
			if block? :hdr [ ; Is script-in-a-block
				unless header [ ; Don't decode the rest if /header
					data: first transcode/next data
					rst: skip data 2
				]
				true
			] ; If true, hdr is header spec
		]
	] [ ; No REBOL header, use default
		hdr: [] rst: data
	]
	; hdr is the header spec block, rst the position afterwards

 ;assert/type [hdr block! data [binary! block!] rst [binary! block!]]
	;assert [same? head data head rst]
	
	; Make the header object, or fail if we can't

 unless hdr: attempt [construct/with :hdr system/standard/header] 
 [
		cause-error 'syntax 'no-header data
	]
	; hdr is a correct header object! here, or you don't get here

 ;if check [append hdr 'checksum  hdr/checksum: sum]  ; calculated 
 earlier

 ;assert [sum =? select hdr 'checksum]  ; Should hdr/checksum be reserved?
	

 if header [return hdr] ; If /header, no further processing necessary

 ; Note: Some fields may not be final because post-processing is not 
 done.
	
	; Skip any whitespace after the header

 ws: (charset [1 - 32]) ; For whitespace skipping (DEL not included)
	if binary? rst [parse rst [any ws rst:]] ; Skip any whitespace
	
	; Check for compressed data and decompress if necessary
	case [
		; Magic autodetection of compressed binary
		tmp: attempt [decompress rst] [
			data: rst: tmp  ; Use decompressed data (no header source)
			append hdr 'compressed  hdr/compressed: true ; Just in case
		]
		; Else not directly compressed (without encoding)
		(select hdr 'compressed) != true [] ; Not declared, do nothing
		; Else it's declared to be compressed, thus should be
		binary? rst [ ; Regular script, check for encoded binary
			set/any [tmp rst] transcode/next/error rst
			either tmp: attempt [decompress :tmp] [
				data: rst: tmp  ; Use the decoded binary (no header source)
				hdr/compressed: 'script  ; So it saves the same way
				; Anything after the first binary! is ignored
			] [cause-error 'script 'bad-press -3] ; Else failure
		]
		; Else it's a block, check for script-encoded compressed binary
		tmp: attempt [decompress first rst] [

   data: rst: tmp  hdr/compressed: 'script  ; It's binary again now
		]
		; Else declared compressed but not compressed, so fail
		'else [cause-error 'script 'bad-press -3]
	]
	
	; Save the script content in the header if specified
	if :hdr/content = true [
		hdr/content: either original [source] [copy source]
	]
	

 ;assert/type [hdr object! data [binary! block!] rst [binary! block!]]
	;assert [same? head data head rst]

 reduce [hdr data rst]  ; Header object, start of source, start of 
 body
]


Note all the commented assert statements: they are for testing (when 
uncommented) and documentation. Also, I later removed the checksum 
calculation from this code because it was the wrong place to put 
it, at least as far as modules are concerned. However, Carl didn't 
know this because he was working on it while I was offline for a 
few days.
Here is the corresponding function in the code reorg, renamed. The 
friendly empty lines and comments haven't been added yet.

load-header: funct/with [
	"Loads script header object and body binary (not loaded)."

 source [binary! string!] "Source code (a string! will get UTF-8 encoded)"

 no-decompress [logic!] "Skip decompression of body (because we want 
 to look at header mainly)"
][
	; This function decodes the script header from the script body.

 ; It checks the 'checksum, 'compress and 'content fields of the header.

 ; It will set the 'content field to the binary source if 'content 
 is true.

 ; It will set the 'compress field to 'script for compressed embedded 
 scripts.

 ; If body is compressed, it will be decompressed (header required).

 ; Normally, returns the header object and the body text (as binary).

 ; If no-decompress is false and the script is embedded and not compressed
	; then the body text will be a decoded block instead of binary.
	; Errors are returned as words:
	;    no-header
	;    bad-header
	;    bad-checksum
	;    bad-compress
	; Note: set/any and :var used - prevent malicious code errors.
	case/all [
		binary? source [data: assert-utf8 source]
		string? source [data: to binary! source]
		not data: script? data [return reduce [none data]] ; no header

  set/any [key: rest:] transcode/only data none ; get 'rebol keyword

  set/any [hdr: rest:] transcode/next/error data none ; get header 
  block

  not block? :hdr [return 'no-header] ; header block is incomplete

  not attempt [hdr: construct/with :hdr system/standard/header][return 
  'bad-header]

  :hdr/content = true [hdr/content: data] ; as of start of header (??correct 
  position??)
		:key = 'rebol [ ; regular script

   rest: any [find rest non-ws rest] ; skip whitespace after header

   ;rest: any [find rest #[bitset! [not bits #{7FFFFFFF80}]] rest] ; 
   skip whitespace
			case/all [

    all [:hdr/checksum :hdr/checksum != checksum/secure rest] [return 
    'bad-checksum]

    no-decompress [return reduce [hdr rest]] ; decompress not done

    :hdr/compress = 'script [set/any 'rest first transcode/next rest]
			] ; rest is now suspect, use :rest
		]

  :key = [rebol] [ ; embedded script, only 'script compression supported
			case/all [
				:hdr/checksum [return 'bad-checksum] ; checksum not supported

    no-decompress [return reduce [hdr rest]] ; decompress not done

    rest: skip first transcode/next data 2 none ; decode embedded script

    :hdr/compress [hdr/compress: unbind 'script  set/any 'rest first 
    rest]
			] ; rest is now suspect, use :rest
		]

  :hdr/compress [rest: attempt [decompress :rest]] ; :rest type-checked 
  by decompress

  not :rest [return 'bad-compress] ; only happens if above decompress 
  failed
	]

 ;assert/type [hdr object! rest [binary! block!]] ; just for documentation
	reduce [hdr rest]
][
	non-ws: charset [not 1 - 32]
]

Notes:

- The other half of the CASE/all style is a lot of explicit shortcut 
RETURN statements, whenever the normal flow differs.

- Errors are returned as a word from the error catalog, which is 
later passed to CAUSE-ERROR.

- Carl redid the checksum calculation so that scripts can verify 
against a checksum in their header, to detect corruption.

- The checksum in the header probably can't be used for the module 
checksum because the header itself matters for modules.

- Compressed scripts lost a couple minor, unimportant features that 
we are likely better without. Quiz: What features?

- Part, but not all of the reason the code is shorter is because 
the doc comments haven't been added yet. The CASE/all style helps 
though.
- The option of using the original data in the content field is now 
mandatory. If you need to copy it (rare), do so yourself.
I am not yet sure if using FUNCT/with is OK with the new build process 
(haven't heard back), but serialized values are now OK. This is why 
I have some alternate code with a serialized bitset.
The new code is not much less complex than the original, but it is 
more compact and faster too. And it is easier to maintain, because 
rearranging CASE clauses is easier to do without a full reorg than 
nested conditional code.
There are some other micro-optimizations as well in the new code. 
I was writing the original to determine functionality, not trying 
to prematurely optimize.
Anton
10-Oct-2010
[5202]
Thankyou, BrianH. Illuminating.
Pekr
11-Oct-2010
[5203]
Is tasking close now? Express your opinion to proposed interpreter 
RESET functionality - http://www.rebol.net/r3blogs/0340.html
Maxim
12-Oct-2010
[5204x4]
can someone please tell me how we can generate errors in R3.  cause-error 
has no list of appropriate values and everytime I've tried to use 
it it just fails with "you have no clue" errors.
to-error doesn't create armed errors anymore which is a bit strange... 

in the least they are not triggering errors when used within an extension's 
init block.
one function which I would really like to see added to R3 is a search 
function which searches the body of all resident code and returns 
paths or full text of every place an occurence of your search is 
found.
in this case, I could see where cause-error is used and could learn 
from the mezz code.
Gabriele
12-Oct-2010
[5208]
iirc you just pass an error! value to cause-error
Maxim
12-Oct-2010
[5209x5]
It seems    do  to-error "whatever"    also works.


but I'd like to get the list of valid types and expected args for 
cause-error.
the online-docs just say that the lists should be filled in...  ' 
:-/
I'm building a search function, btw.  so far not bad.  still have 
to solve a little unset! issue
here my simple but effective r3 search function:

;------------------------------------------------------------
search-body: funct [
	data [object! block! function!] "what to search"
	word [word!] "what to find"
	/paths "only returns paths, not their values"
	/indents i "how many tabs when listing?"
	/into blk "Add matches to this block"
	/path pth [lit-path!] "keep track of path"
][
	
	i: any [i 0]
	
	unless into [
		set 'searched-objects copy [] ; will set in "globals"
	]
	
	either block? :data [
		b: data
	][
		b: body-of :data
	]
	
	
	; locals
	item: none
	match?: false
	blk: any [blk copy []]
	pth: any [all [pth copy pth] to-lit-path ""]
	last-set-word: none
	counter: 0
	
	foreach item :b [
		counter: counter + 1
		
		result: switch/default type?/word :item [
			set-word! [
				last-set-word: :item
				false
			]
			object! [
				; prevent endless cycles on self or inter references.
				unless find searched-objects :item [
					append searched-objects :item
					either block? data [

      search-body/indents/into/path :item word i + 1 blk append copy pth 
      counter
					][

      search-body/indents/into/path :item word i + 1 blk append copy pth 
      to-word last-set-word
					]
				]
				true
			]
			function! [
				either word = to-word last-set-word [
					; adds the definition OF the searched item

     append/only blk to-lit-path append/only copy pth last-set-word
					append/only blk mold :item
				][
					if search-body/indents/into/path :item word i + 1 blk pth [
						; adds a function WITH the searched item in it

      append/only blk to-lit-path append/only copy pth last-set-word
						append/only blk mold :item
					]
				]
				true
			]
			integer! tuple! string! [
				if last-set-word [
					if word = to-word last-set-word [

      append/only blk to-lit-path append/only copy pth last-set-word
						append/only blk  :item
					]
				]
				true
			]
			block! [

    search-body/indents/into/path :item word i + 1 blk append copy pth 
    counter
				true
			]
			
			; this is what we search for
			word! [
				either :item = word [
					match?: true
				][
					false
				]
			]
			
		][
			; these types are not specifically managed by the search
			false
		]
	]
	either into [
		match?
	][
		set 'quiet-search? false
		new-line/skip blk true 2
	]
]
;----------------------------------------------
in A107... 

search-body system 'red 
== [
    'contexts/system/red: 255.0.0
    'contexts/user/red: 255.0.0
]

search-body system 'error!
== [
    'contexts/system/map: {make function! [[
    "Temporary function to catch MAP usage changes."
][

    make error! {The MAP function has been rename to MAP-EACH. Update 
    your code.}
]]}
    'contexts/system/cause-error: {make function! [[

    {Causes an immediate error throw with the provided information.}
    err-type [word!]
    err-id [word!]
    args
][
    args: compose [(:args)]
    forall args [
        if any-function? first args [
            change/only args spec-of first args
        ]
    ]
    do make error! [
        type: err-type
        id: err-id
        arg1: first args
        arg2: second args
        arg3: third args
    ]
]]}

    'contexts/system/to-error: {make function! [["Converts to error! 
    value." value][to error! :value]]}
]
Henrik
12-Oct-2010
[5214]
a: [a]

parse [] a

R3 quits. Bug?
Maxim
12-Oct-2010
[5215x2]
oops ... the end of the function should be replaced by:

	either into [
		match?
	][
		either paths [
			blk: extract blk 2
			new-line/all blk true
		][
			new-line/skip blk true 2
		]
	]
henrik, any case where R3 just quits is a bug... no?
Henrik
12-Oct-2010
[5217]
I would assume so, but still asking to be sure.
Maxim
12-Oct-2010
[5218x4]
with above changes, one can use search-body()  using  the paths refinement.... 
like so:

>> search-body/paths system 'error!

== [
    'contexts/system/map:
    'contexts/system/cause-error:
    'contexts/system/to-error:
]
I hope the  above function makes it easier for you guys to track 
down where words are being used and defined.  :-)
note, it only accumulates then within objects and functions for now.
in some cases block, might trigger a match but it doesn't seem completely 
functional, but for me, the above is enough.
BrianH
12-Oct-2010
[5222]
Maxim, to answer your questions about cause-error: Three arguments, 
the first two being words from system/catalog/errors, the last one 
eiither being a single value of any type or a block of up to three 
values, depending on which error you are generating. All the info 
you need about a particular error is in system/catalog/errors. The 
number of arguments in the argument block is fixed, per error. The 
usage is in the phrasing of the error message. Pick arguments that 
when molded and put in that position in the error message would make 
the error message make sense.
Maxim
12-Oct-2010
[5223]
thx  :-)
BrianH
12-Oct-2010
[5224x3]
For instance, when I needed to come up with the right error to trigger 
when a function refinement was incompatible with the datatype of 
another argument, there wasn't an explicit error for that. But after 
looking through the catalog, I came up with this:
>> cause-error 'script 'cannot-use [load-module/as block!]
** script error: cannot use load-module/as on block! value
It will do until there is a better error for that situation.
CAUSE-ERROR is mezzanine in R3 and 2.7.7, so just source it :)
Gregg
12-Oct-2010
[5227]
Could the above notes be added to cause-error docs?
BrianH
12-Oct-2010
[5228]
Sounds like a good idea.
Maxim
12-Oct-2010
[5229]
yes, that would be a good first clue, since the current docs give 
no indication on how to proceed right now... 


I should have sourced cause-error, and I usually do... but this time 
well... I guess I assumed it was a native  :-)
BrianH
12-Oct-2010
[5230]
First thing I do when wondering about a function is HELP it. That 
tells me the basics, and also mentions its datatype.
Maxim
12-Oct-2010
[5231x2]
yep.
step 2 is source it  ;-)
BrianH
12-Oct-2010
[5233x2]
Yup :)
Step three is experiments at the console, calling it with test data.
Maxim
12-Oct-2010
[5235]
Q:  does reflect unbind the blocks it returns?
BrianH
12-Oct-2010
[5236]
It unbinds function code blocks, but intentionally binds object word 
blocks.
Maxim
12-Oct-2010
[5237x3]
is there a way to get a bound copy of a function's body?  sometimes, 
its nice to be able to figure out why a sub-function isn't doing 
what its supposed to...


this could be subject to protection schemes... so that a protected 
member cannot be shown via its function body.
this is also true of the stack function... it should not cross any 
protection... since doing so reveals what *it* calls...
(though I'm not saying it currently does just raising up the issue 
if it wasn't planned already)
Andreas
12-Oct-2010
[5240x2]
http://www.rebol.com/r3/docs/functions/cause-error.html
I took Brian's notes from above and edited some basic CAUSE-ERROR 
docs around them.
BrianH
12-Oct-2010
[5242x2]
No bound copy of a function body, for security reasons. The kind 
of hot-patching that was possible in R2 was always a security hole. 
Plus, it's not task-safe. For that matter, BODY-OF always returns 
a copy or constructed value, never the original, and code that currently 
uses it relies on this.
BODY-OF doesn't return the original for objects or modules either.
Maxim
12-Oct-2010
[5244x2]
the copy I don't mind... that's cool, its the fact that it always 
unbinds (which is what you seem to say).  its not a security hole 
if the functions aren't hidden or protected in some way.


I just want to know what a function within a function actually is 
calling... if its unbound... well I can't make any real inspection 
tool or debugger... right now I can go a lot further than R2, except 
this ... unless I didn't properly understand you.
its a bit like a dll, you only have access to the dll within a debugger 
if it was compiled with debugging... I'd like that to still be the 
case within rebol.