Frogger is all very well, but we're not going to make it big with poor quality arcade knock-offs. If White Lightning is really going to deliver, it needs to let you make something that stands out alongside the best from other developers. It needs to be something like 1983's Manic Miner ( https://en.wikipedia.org/wiki/Manic_Miner ) - one of the earliest hit games on the platform and one that catapulted its teenage author Matt Smith to fame, fortune and ... problems. For more on Matt Smith and Manic Miner, check out Kim Justice's documentary https://www.youtube.com/watch?v=0GjTJt40VRY - one of the many excellent videos on her channel chronicling the games and personalities of the early British video game scene.
But as to my attempt, I give you the imaginatively titled "Fox Game".
This unimaginatively attempts to copy Manic Miner. It has:
An animated fox!
A number of levels!
... featuring crumbling platforms
...conveyor belts
...hazards
...animated enemies
...keys to collect
...a title screen and attract mode
I.e. the sorts of fun gameplay features that you need to make a hit 1984 Spectrum game. But doing this in White Lightning is something of a challenge, even in 2020.
These challenges are, unsurprisingly, centered around:
Architecture
CPU time
Memory
Architecture
As this is a more ambitious title than Frogger, we need slightly more software structure. On the face of it, Forth looks pretty uncooperative - it does not even support arrays, much less useful things like strings and structures. What we ideally want is an architecture that gives us a simple abstraction of game objects - e.g. an array of object records, each looking something like this ( C++ )
struct GameObject
{
    int XPos;
    int YPos;
    int Dx;
    int Dy;
    unsigned int IsAlive : 1;
    // etc ...
    virtual void TickDraw();
};
We can then step through this array and call TickDraw on every object for which IsAlive is true. Different classes of object can implement TickDraw differently. Something like this was used in many games of the 80s, and this GDC post-mortem on "Robotron: 2084" ( https://www.youtube.com/watch?v=90GuCjmNzVI ) describes a more sophisticated version of the same idea. The majority of the "Ultimate: Play The Game" ( https://en.wikipedia.org/wiki/Ultimate_Play_the_Game ) Spectrum titles use this technique - as do many of the later Rare-published titles from the late 80s and 90s.
But how do you do this in Forth? The Forth approach is to simply extend the language to provide the abstractions you need. So, for example, we can do:
( define a compiler word that allocates 2 bytes in the dictionary )
( and leaves their address on the stack. The runtime semantics of this )
( word leave the contents of this address on the stack )
: <STRUCT
<BUILDS 0 , HERE 2 -
DOES>
@ ;
( create a compiler word that, at compile time, takes the current )
( value on the stack, stores it in the new word, and replaces it with )
( its original value plus an argument. At runtime the stored value )
( is added to the value on top of the stack )
: | SWAP DUP @ <BUILDS , DUP ROT SWAP +! DOES> @ + ;
( compiler words defined in terms of the previous, for word and byte fields )
: W| 2 | ;
: C| 1 | ;
: STRUCT> DROP ;
( an array of N records of a fixed size )
: [] SWAP <BUILDS DUP , * 1 + ALLOT DOES> DUP @ ROT * + 2 + ;
This is very Forth. With these esoteric definitions we can now define:
1 CONSTANT ?SPRITE-ISALIVE
<STRUCT OBJ-STRUCT
W| OBJ-SPN
W| OBJ-X
W| OBJ-Y
W| OBJ-OLDX
W| OBJ-DX
W| OBJ-DY
W| OBJ-CXMIN
W| OBJ-CYMIN
W| OBJ-CXMAX
W| OBJ-CYMAX
W| OBJ-TICK
C| OBJ-FLAGS
STRUCT>
And
7 CONSTANT MAX-SPRITES
OBJ-STRUCT MAX-SPRITES [] GAME-SPRITES
This ability to create new compiler words is probably Forth's cleverest feature. The code following <BUILDS effectively constructs the new word and the code following DOES> defines what the colon definition of the newly constructed word should be: i.e. what it does when it is executed at runtime. The word defined by <STRUCT ( e.g. OBJ-STRUCT above ) allocates 2 bytes, which are initially set to 0, and its runtime semantics return the contents of those bytes. It leaves the address of these bytes on the stack at compile time, so that the field words ( W|, C| ) can store and update it. What this effectively means is that the runtime behaviour of the word defined by <STRUCT is similar to the C sizeof operator. The runtime semantics of the field words take the stored offset and add it to the top of the stack. I.e. they take an address - which is assumed to be a pointer to an OBJ-STRUCT - and add an offset.
The array defining word '[]' is similar - it takes two arguments from the stack and stores the field size in its first two bytes. This is followed by field size * array length bytes of data. The runtime semantics use the stored field size to compute the correct offset into the array.
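To make the runtime semantics concrete, here is how field access ends up looking at the interactive prompt - a usage sketch, with the literal values made up:
( read and write fields of record 3 - the values here are arbitrary )
100 3 GAME-SPRITES OBJ-X !          ( set the X position of sprite 3 )
3 GAME-SPRITES OBJ-X @ .            ( fetch it back and print 100 )
3 GAME-SPRITES OBJ-FLAGS C@         ( OBJ-FLAGS is a byte field, so use C@ )
?SPRITE-ISALIVE AND .               ( prints 1 if the alive bit is set )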
OBJ-TICK is a pointer to a tick function. We can take the address of a word and store it for later execution, much as you would take a pointer to a function in C.
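Putting the array, the flags and the tick pointer together, the per-frame update can then look something like the sketch below. TICK-SPRITES is a name I have made up, and it assumes two things: that OBJ-TICK has been filled in with an address EXECUTE can run, and that every tick word consumes the object pointer it is passed.
( a sketch of the per-frame update loop - TICK-SPRITES is not from the game )
: TICK-SPRITES
  MAX-SPRITES 0 DO
    I GAME-SPRITES                        ( address of the Ith record )
    DUP OBJ-FLAGS C@ ?SPRITE-ISALIVE AND  ( is this object alive? )
    IF DUP OBJ-TICK @ EXECUTE             ( yes: run its tick word )
    ELSE DROP ENDIF                       ( no: discard the pointer )
  LOOP ;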
Working with structures helps address one of the issues I initially had with Forth: that reasoning about the contents of the computation stack is hard. By packing data together in structures, one can use the following "return stack" idiom:
( checks to do when we are falling )
: PLAYER-FALLING-HITS
>R
R PLAYER-CHECK-GROUND
GROUND-FLAG @ IF
R OBJ-SPN @ 239 AND R OBJ-SPN !
E-WALKING JUMP !
ENDIF
R> DROP ;
As mentioned in an earlier post, Forth uses two stacks - the computation stack, which is used to pass parameters between words, and the return stack, used to store return addresses in the same manner as a conventional call stack. It's possible to store values on the return stack, so long as you are careful to leave it in the same state you found it. A value stored in this way can be read using the "R" word. In the code above, PLAYER-FALLING-HITS is a word that expects a pointer to an OBJ-STRUCT. On entry, >R immediately moves this pointer to the top of the return stack, allowing the following code to refer to it simply as "R"; the final R> DROP throws it away again. If you squint, this is a bit like referring to an object through the "this" pointer in C++. And while it might not look terribly readable, it is a lot more manageable than passing multiple values around on the computation stack. I am not sure that using the return stack is "good" Forth style, though, and code that uses it has the potential to crash hard if the stack is not properly managed.
But this is a win for Forth, I think. Other languages with these sorts of high-level features were either not available or not practical on the machine.
CPU Time
Forth is slow compared to machine code. Oasis felt that this was not going to be a serious problem, since in a real game on a machine without hardware support for sprites or scrolling, most CPU time is going to be spent in graphics code rather than game logic. As long as the graphics routines are efficient, the overhead of Forth was not going to be noticeable. A platform game, though, has a lot of game logic ( compared to, say, Frogger ) and the cost of running it soon adds up - about a third of the code is related to player movement or collision detection. This is mostly simple stuff - checking whether a cell in the background is solid, etc - but every word that is executed carries an overhead that is often several times the cost of the actual "work" the word does. And some words are also more expensive than one would like - when placing sprites we are frequently doing operations like:
COL = (int) ( Xpos/8 )
FRAME = Xpos % 8
To find the cell and the appropriately shifted frame.
That's two divisions - arguably the most expensive arithmetic operations you could do on a CPU with no support for arithmetic beyond addition and subtraction. Forth does provide an AND word, so we can at least replace the % 8 with AND 7, but that still leaves us with the expensive division.
Fortunately, there is a way around this:
HEX
: UNROLL-RSHIFTS 0 DO CB C, 2C C, ( sra h )
CB C, 1D C, ( rr l )
LOOP ;
: /8X CREATE ;CODE
E1 C, ( pop hl )
AF C, ( xor a )
3 UNROLL-RSHIFTS ( 3 unrolled hl shift rights)
E5 C, ( push hl )
C3 C, 45 C, 61 C, ( jp Next )
SMUDGE
/8X /8 SMUDGE
DECIMAL
;CODE is effectively the native equivalent of DOES> - it allows you to specify the runtime semantics of a word using native instructions rather than Forth. It is only mentioned in passing in the manual's glossary, but it is described in the FORTH-83 standard ( https://www.complang.tuwien.ac.at/forth/fth83std/FORTH83.TXT ). I would be absolutely amazed if anyone used this in the past, but it's a nice feature. While one can in theory extend White Lightning with custom assembly - Oasis provide a CALL word for that purpose - such code won't operate very naturally with Forth; using ;CODE you can create new words that behave like ordinarily defined Forth words.
In a perfect world, there would be words that behaved like assembler mnemonics - the standard in fact describes an ASSEMBLER vocabulary ( vocabularies being something like Forth's equivalent of namespaces ) for this purpose. But White Lightning does not include a full Z80 assembler, so the assembly has to be done by hand: we can set the number base to hexadecimal and use other words as macros to unroll loops. The code above creates a new word "/8" that divides the top of the stack by 8 using shift instructions.
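With /8 defined and AND standing in for the modulo, the column/frame split from earlier no longer needs a general division. A sketch - XPOS-SPLIT is a name I have made up:
( split an x position into a column and a shifted frame number )
: XPOS-SPLIT  ( x -- col frame )
  DUP /8  SWAP 7 AND ;
( e.g. 100 XPOS-SPLIT leaves 12 and 4 on the stack )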
The other source of performance problems is the IDEAL words themselves. Unfortunately, these are by no means as efficient as one would like. As mentioned in an earlier post, the sprites are arranged in a linked list, which means that every word that takes a sprite number must walk this list searching for the corresponding sprite id. On top of that, every operation checks that its parameters are valid. An appealing feature of White Lightning is the sandboxed experience it provides when compared to assembly: it lets you experiment with the software through the interactive Forth interface with some degree of confidence, and takes away the worry that every mistake will crash the machine and force you into a lengthy reload from tape ( if you're lucky and saved your work, that is... ). But this comes at the cost of considerable CPU time.
Flexibility also has a cost. White Lightning adds a lot of words and supports a fairly impressive ( for the time ) range of operations, and getting all this into a reasonable amount of memory means performance is traded for memory. Writing performant graphics code on 8-bit machines involves catering for only a small number of cases - sprites are always the same size, always composited with the screen/back buffer in the same way, and so on. Loops must also be unrolled and the memory addresses of key structures chosen to support fast addressing. For White Lightning to pack its 200 words into 8K, most of them have to be implemented through slow but reusable routines.
Nowhere is this more obvious than in this game. While the in-game speed is borderline acceptable, the level preparation time is very long - 10 seconds or so to prepare a new level. All it is doing is looping through a 16x32 element tile map, looking up a tile image, and copying that image to the appropriate point in a larger background sprite. This wouldn't be instant in assembly, but it should take only a fraction of a second. Re-implementing it in straight Forth ( with no sprite id look-ups or error checking ) is actually twice as fast as going through the "efficient" assembly.
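For reference, the level preparation boils down to a loop like the sketch below, where TILE@ and COPY-TILE are stand-ins for the real words ( the latter wrapping the IDEAL calls that copy a tile image into the background sprite ) - 512 trips through the slow, validated IDEAL path.
( level preparation sketch - TILE@ and COPY-TILE are stand-in names )
( TILE@     ( index -- tile# )  reads one entry of the tile map )
( COPY-TILE ( tile# index -- )  copies that tile into the background )
: PREP-LEVEL
  512 0 DO               ( 16x32 = 512 tile map entries )
    I TILE@              ( look up the Ith tile )
    I COPY-TILE          ( copy its image into the background sprite )
  LOOP ;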
Now, one could re-write all the graphics code to be performant, but that definitely crosses a line into cheating - I wanted to see how close I could get to a "hit game" using just what's in the box and documented in the manual.
Memory
The third area is memory. First let us look at the memory used by Forth. There is about 7K of memory available for game words. After that, the dictionary pointer will collide with the computation stack and everything will break. During compilation we can log how much memory each section of the code takes and note that the final figure - 7075 - is very close to the limit.
sprite creation 0
custom assembly 356
type extensions 568
variables 217
sprite array 219
level decomp 657
jump curve 178
player draw 480
player input 256
player checks 621
player hits 588
player move 272
sprite tick 446
sprite draw 69
sprite check 128
enemy tick 119
door tick 36
door make 210
key tick 209
common 73
make key 209
make player 74
make enemy 139
make sprites 105
loop code 432
main 414
7075 OK
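A log like the one above can be produced by recording the dictionary pointer as each chunk of source is compiled - a sketch, with MARK and .USED being names of my own invention:
( track dictionary growth between sections - MARK and .USED are made up )
0 VARIABLE MARK
: .USED  HERE MARK @ - .  HERE MARK ! ;  ( print bytes compiled since the last call )
HERE MARK !
( ... compile a section of the game, then: )
.USED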
Of course, it's a lot easier for me to hit this limit than it would have been back in 1984 - there is nothing constraining the amount of source code I can use ... and I have nearly 24K of it. I am also using longer, more descriptive names than one would have used back then - these names, while not executed at runtime, are still present in the dictionary and cost memory to store. The amount of memory each word takes up is proportional to the number of words used in its definition: each word reference needs two bytes to store the address of its code field, integer literals require another two bytes for the literal itself, and branch instructions take a further two bytes for the jump offset. String literals need one byte for the length followed by as many bytes as there are characters. We can reduce the amount of code by baking out any integer literals.
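To make that last saving concrete: assuming the usual fig-Forth compilation model ( consistent with the costs just described ), an inline literal costs 4 bytes per use while a reference to a CONSTANT costs 2. The names below are made up:
( an inline literal compiles to LIT plus a 2-byte value: 4 bytes per use )
: SOLID-CELL?  17 = ;
( a reference to a constant compiles to just its 2-byte address )
17 CONSTANT SOLID-TILE
: SOLID-CELL2?  SOLID-TILE = ;  ( 2 bytes per use, at the cost of one extra header )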
It is still pretty easy to hit this code limit, and a few tricks are required to keep the code size manageable. The main one is to build the project in a couple of steps. Like Frogger, we have some words that are used only to build sprites; once the sprites are built, these words are dead weight. The sprite generation code contains a lot of literals and comes in at 777 bytes - enough to push the total well over the limit. So we run the sprite generation code, then forget ( i.e. erase ) the word used for it, and save a new base image that includes the newly generated sprites but none of the extra words.
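In outline, the flow looks something like this - MAKE-SPRITES is a stand-in name, and FORGET is the usual fig-Forth word for trimming the dictionary:
( hypothetical outline of the two-step build )
: MAKE-SPRITES
  ( ... the ~777 bytes of sprite generation code would go here ... ) ;
MAKE-SPRITES             ( run once to generate the sprite data )
FORGET MAKE-SPRITES      ( then reclaim the dictionary space it used )
( finally, save a new base image containing the generated sprites )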
If we have arrays, we can also move them into sprite memory by treating that memory as a crude heap. I have done this for a couple of the arrays, and could also move the object array ( 219 bytes ) to free a bit more memory for Forth at the expense of sprite memory.
Sprite memory is also tight, though. In addition to sprites, we are using this memory to store our level data, and IDEAL makes some unfortunate choices in how it renders sprites. Sprites can only be positioned at cell granularity. This makes sense for horizontal positioning - the Spectrum screen is 32 characters wide - but not for vertical positioning, since the display is not character mapped and has 192 addressable rows. It should be possible to position a sprite on any of these rows, but the IDEAL sprite system does not support this, and instead recommends that you vertically shift the sprites to get sub-cell vertical positioning. In order to position my jumping fox with 2 pixel precision, I need to create 32 4x3 cell sprites, 16 for each direction, in addition to the eight 4x2 sprites I need for the walking animation. This takes a big 3.5K chunk of the 16K of available memory. On top of that, I have the background that is built ( slowly! ) from tiles, which requires another 4K. A few tiles and enemies bring things close to 10K, leaving less than 6K for level data.
For a hit game, I feel I need to produce twenty screens. The solution, not surprisingly, is to add some simple compression. Each screen consists of 32x16 tiles, and looking at the screens above you will observe that they consist of long runs of the same tile - ideal candidates for run length encoding. This brings the cost of each screen down to around 200 bytes, so in theory I would be able to get around 20 into about 4K. In theory because, lazy sod that I am, I made only 4 unique screens in the end, one of which was a rubbish title screen. The compressed level data is all packed into a single sprite. I could have compressed it further: I ended up using fewer than 16 tile types, so could have packed a screen into half this amount.
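For what it's worth, a decoder for this sort of scheme only needs a handful of words. The sketch below is not the game's actual code: it assumes the compressed data is a stream of count/tile byte pairs with counts of at least 1, and TILE! is a stand-in for whatever writes one entry of the 32x16 tile map.
( run length decode sketch - the byte format and TILE! are assumptions )
0 VARIABLE RLE-SRC            ( pointer into the compressed data )
0 VARIABLE RLE-DST            ( index of the next tile map entry to fill )
: RLE-BYTE  RLE-SRC @ C@  1 RLE-SRC +! ;
: RLE-DECODE  ( addr -- )
  RLE-SRC !  0 RLE-DST !
  BEGIN
    RLE-BYTE RLE-BYTE         ( count tile )
    SWAP 0 DO                 ( repeat the tile count times )
      DUP RLE-DST @ TILE!
      1 RLE-DST +!
    LOOP
    DROP
    RLE-DST @ 512 < 0=        ( stop once all 32x16 entries are written )
  UNTIL ;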
In the end, this was quite a bit of work and the resulting game is far from perfect. I doubt it would have been a hit in 1984. But - with a little more polish and a little more native code - it would probably have been marketable. You can see the source here: https://github.com/tomgrove/WhiteLightningBlog/tree/master/Jumper