World: r3wp

[rebcode] Rebcode discussion

I'll see if I can make a good ram-file to test with. And maybe a 
little wrapper, so it's possible to monitor what's going on in the 
Oldes, the older rebcode version wasn't slower, it just had fewer 
features. We had to change the naming convention of the opcodes to 
add the features.
Just to clear things up regarding performance: this is an emulation 
of a 1MHz cpu. It requires quite some computing power to emulate 
another cpu. To give a hint: an INX instruction, which increments 
the X register by 1, requires 2 cycles on the 6502. So you can do 
half a million of those instructions on a 1MHz 6502 each second. 
In my emulator, that INX instruction becomes 11 rebcode instructions 
plus 6 rebcode instructions to control the loop, a total of 17 rebcode 
instructions. And it takes less than half a second to do 1 million 
of those, which is like a 4MHz 6502. So with this initial test, I'll 
say rebcode is usable.
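The arithmetic behind the "4MHz" estimate can be reproduced in a few lines. This is only a sketch in Python using the figures quoted in the message; the function and constant names are made up for illustration, and since the run took "less than half a second", both results are lower bounds.

```python
# Figures quoted in the message above; all names here are illustrative.
CYCLES_PER_INX = 2         # 6502 clock cycles for one INX
REBCODE_PER_INX = 11 + 6   # rebcode instructions: emulation + loop control
INX_COUNT = 1_000_000      # emulated INX instructions in the test run
ELAPSED_S = 0.5            # upper bound on the measured run time


def equivalent_mhz(instr_count, cycles_per_instr, seconds):
    """Clock rate (in MHz) a real 6502 would need for the same throughput."""
    return instr_count * cycles_per_instr / seconds / 1e6


def rebcode_rate_millions(instr_count, rebcode_per_instr, seconds):
    """Millions of rebcode instructions executed per second."""
    return instr_count * rebcode_per_instr / seconds / 1e6


print(equivalent_mhz(INX_COUNT, CYCLES_PER_INX, ELAPSED_S))          # 4.0
print(rebcode_rate_millions(INX_COUNT, REBCODE_PER_INX, ELAPSED_S))  # 34.0
```

So the same test implies at least 34 million rebcode instructions per second on the machine used.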
John; still can't get into 6502.org but the site holds quite a bit 
of source code, from snippets to floating point math by Steve Wozniak.
A first version of a MOS 6502 workbench tool is ready:
do http://www.fys.ku.dk/~niclasen/rebol/language/m6502wb.r

It'll load the 6502 assembler and emulator. It's a tool to compile 
6502 assembler programs to machine code and run them with the rebcode 
emulator. It's possible to see the 6502 registers and flags. Both 
asm6502.r and em6502.r have been updated.
You'll need REBOL 1.3.50 to run this!!!
Scroll the 65kb ram with arrow-keys, page-up/down, home and end.
It works like this:
1) Write some 6502 asm in the text area. Example:
lda #&80

2) Press the button "Assemble". Now you can see the opcodes in the 
ram at address 0000.

3) Press the button "Begin" to run the emulator with the produced 
machine code and see the results in the registers and flags.
Hm, probably a bad idea to use arrows to navigate ram, because it 
makes them not work in the text area.
A performance test program:

lda #0
sta &1001
lda #0
sta &1002
lda #0
sta &1003
lda &1003
adc #1
sta &1003
lda &1003
bne l3
lda &1002
adc #1
sta &1002
lda &1002
bne l2
lda &1001
adc #1
sta &1001
lda &1001
bne l1

It takes 40s to run on a BBC emulator emulating a 1MHz 6502. It took 
around 14s using the rebcode emulator on my 1.2 GHz G4, and it took 
9.5s using the rebcode emulator on my 2.4GHz Pentium 4.
A similar rebcode performance test program might look like:

ram: make binary! 3
insert/dup ram #"^(00)" 3
looptest: rebcode [/local a] [
	set a 0
	pokez ram 0 a
	label l1
	set a 0
	pokez ram 1 a
	label l2
	set a 0
	pokez ram 2 a
	label l3
	pickz a ram 2
	add a 1
	pokez ram 2 a
	eq a 256
	braf l3
	pickz a ram 1
	add a 1
	pokez ram 1 a
	eq a 256
	braf l2
	pickz a ram 0
	add a 1
	pokez ram 0 a
	eq a 256
	braf l1
]

It does 16'777'216 loops and takes less than 3 seconds on my 1.2 
GHz G4.
To sum it up:
A 1MHz 6502 takes 40 sec to do 16'777'216 loops of this kind.

Emulating the 6502 using rebcode can do the same thing in 14 sec 
(on a 1.2 GHz G4) and in 9.5 sec (on a 2.4 GHz P4).

A pure rebcode program (no emulation) can do the 'same' 16'777'216 
loops in around 2.7 sec on a 1.2 GHz G4.

So a conclusion might be, that programming in rebcode is like having 
a 40 / 2.7 = 15 MHz cpu (if run on a 1.2 GHz G4). Is this a correct 
Is it known how many cpu clocks, each rebcode instruction use in 
sounds pretty slow?
I'm not sure.
This is just one single test using only a few of the available instructions. 
To have a better view, more tests are needed. I made a similar loop 
in C, compiled it with gcc, and it runs around 6 times faster than 
the pure rebcode version. Initially I won't call rebcode slow, but 
not blasting fast either.
and R3 rebcode is going to be even slower ....
There's something wrong with my comparison with a 1MHz 6502. I counted 
the number of cycles in the inner loop and found 17 cycles. A 1MHz 
6502 can then do 1'000'000 / 17 * 40 = 2'352'941 loops in 40 seconds. 
But the BeebEm emulator made 16.7 million loops in that time. It should 
have taken 285 sec. So programming in rebcode is more like a 107 
MHz cpu in this test.
(It's probably not correct to measure it this way.)
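The corrected estimate can be checked with a short calculation. The sketch below is Python with illustrative names; it uses the 17-cycle inner loop, the 16'777'216 iterations and the "around 2.7 sec" pure rebcode time quoted in the thread, and it ignores the two outer loops, as the message does. The per-instruction cycle split in the comment assumes standard 6502 timings for absolute addressing and a taken branch without a page crossing.

```python
LOOPS = 256 ** 3        # 16'777'216 inner-loop iterations
CYCLES_PER_LOOP = 17    # lda abs 4 + adc # 2 + sta abs 4 + lda abs 4 + bne taken 3
CLOCK_HZ = 1_000_000    # a 1MHz 6502
REBCODE_S = 2.7         # "around 2.7 sec" for the pure rebcode loop on the G4


def seconds_on_6502(loops, cycles_per_loop, clock_hz):
    """Time a real 6502 at clock_hz needs for the given number of loops."""
    return loops * cycles_per_loop / clock_hz


def loops_in(seconds, cycles_per_loop, clock_hz):
    """Whole loops a real 6502 at clock_hz completes in the given time."""
    return int(seconds * clock_hz / cycles_per_loop)


real_time = seconds_on_6502(LOOPS, CYCLES_PER_LOOP, CLOCK_HZ)
print(loops_in(40, CYCLES_PER_LOOP, CLOCK_HZ))  # 2352941
print(round(real_time, 1))                      # 285.2
print(round(real_time / REBCODE_S, 1))          # 105.6
```

With the rounded 2.7 s the ratio comes out near 106; the 107 quoted in the message would correspond to a slightly shorter measured time.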
Rebcode is a higher-level language than 6502 assembler. Perhaps a 
peephole optimizer can rewrite your generated rebcode into better 
equivalent rebcode.
Geomol, i had a look at your emulator code, i think performance could be 
improved if you delay the update of all flags only when they are 
Good idea! Do you have previous experience with emulators like this, 
because I have none.
in fact the engine is very similar to the z80 one, i think we could 
make a meta-emulator using external data-sheets (one for 6502, one 
for Z80)
i made a Z80 emulator using rebcode (not complete), you can see 
it in galaga.r on rebol.org
Ah, that was you. Someone mentioned that one lately.
ah BrianH, i remember that you made the same proposal for my z80 
emu (peephole optimization)
hard to do
interesting to do on ROMs (static analysis before launching the code) 
but not valuable in RAM because the code can be modified
Steeva, about flags: e.g. the zero flag Z (bit 1 of P). Instead 
of setting it each time A, X or Y becomes zero, I could save any 
of those (A, X or Y) in a variable, and then test on that var and 
set the flag correctly, if and when the flag is actually used. Is that 
what you mean?
and limited because on 6502 for example, many branches are computed 
(not static)
yes Geomol, it's that
Flags are calculated on the last accumulator value if i don't do 
ok. One optimization I'm considering is to cross-compile 6502 opcodes 
to rebcode, instead of emulating the 6502. That won't work with self-modifying 
code and branches will be a problem. So it's hard, but I think it 
might work.
in theory
i give you an example with the TAX opcode

; updating flags in real time

label TAX
seti X A
eq X 0
either [or P 2] [and P 253]
seti i X
and i 128
eq i 128
either [or P 128] [and P 127]
bra continue

;  delay the calculation of flags

label TAX
seti X A

; remember that we have to recalculate zero and negative flags
; using A, but don't do it now
or maskA (2 + 128)
bra continue
you got the idea ? ;-)
Yup! :)
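The deferred-flag idea discussed above can be sketched outside rebcode as well. The following is a minimal Python illustration, not the code of either emulator: the class and attribute names are hypothetical, and it follows the variant described in the thread where the last result is saved and the Z and N bits of P are only worked out when something actually reads the flags.

```python
Z_FLAG = 0x02  # zero flag, bit 1 of P
N_FLAG = 0x80  # negative flag, bit 7 of P


class LazyFlagsCPU:
    """Toy 6502 register file with deferred Z/N flag updates (hypothetical)."""

    def __init__(self):
        self.a = 0
        self.x = 0
        self.p = 0
        self.pending = None  # last result that should define Z and N, if any

    def tax(self):
        # Transfer A to X; just remember the value, do no flag work now.
        self.x = self.a & 0xFF
        self.pending = self.x

    def flags(self):
        # Resolve deferred flags before anything reads P (branch, PHP, ...).
        if self.pending is not None:
            self.p &= ~(Z_FLAG | N_FLAG) & 0xFF
            if self.pending == 0:
                self.p |= Z_FLAG
            if self.pending & 0x80:
                self.p |= N_FLAG
            self.pending = None
        return self.p


cpu = LazyFlagsCPU()
cpu.a = 0x80
cpu.tax()                         # no flag work happens here
print(bool(cpu.flags() & N_FLAG)) # True: bit 7 of X is set
```

The saving comes from instruction runs where several results overwrite each other before any branch or status read: only the last one ever costs a flag computation.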
do you think that using PC as an offset (integer) instead of as 
a series could be faster ?
I should do a test before saying that
I didn't consider it in much depth, actually. It can be improved, I'm 
sure. :)
anyone happen to still have a rebol/core binary with rebcode functionality 
archived somewhere?
it's not in the download section of rebol.com anymore ?
ah, got it: http://www.rebol.net/builds/042/rebview1350042.tar.gz
resp. http://www.rebol.net/builds/031/rebview1350031.exe for windows