N64 Assembly Tutorial - Lesson 3

Updated: N64 Assembly YouTube Series

The same content, updated with more explanation and a walkthrough of how it's done, Twitch stream style.

https://www.youtube.com/playlist?list=PLjwOF_LvxhqTXVUdWZJEVZxEUG5qt8fsA


Inspired by Peter Lemon

Setup Video Interface

Unfortunately we haven't seen anything on the screen yet, and that is the whole reason anybody would write a game or program for the N64. This tutorial will set the background color and includes some suggestions for modifying the color and other details. It's simple, but there are lots of concepts to learn, so let's get started.

Since we are writing assembly language, we have to explicitly initialize any hardware before we use it. The Video Interface (VI) uses memory-mapped "Configuration Registers" that are not related to the CPU registers we have already talked about. The VI Configuration Registers start at 0x0440 0000 and run for 56 bytes (14 registers of 4 bytes each), ending at 0x0440 0037. A lot of example code sets good defaults for the VI, either NTSC or PAL, and moves on. WARNING: Some combinations may be invalid! Double check your values, and please be careful if you run these tests on actual hardware. I am NOT responsible!
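The addresses above are physical addresses, while the code later in this lesson loads $A440 into the upper half of a register. A minimal C sketch of how the two relate, assuming the standard MIPS KSEG1 (uncached) window at 0xA000 0000. The names vi_reg_phys and vi_reg_cpu are made up for this illustration and are not part of N64.INC:

```c
#include <assert.h>
#include <stdint.h>

#define VI_PHYS_BASE 0x04400000u  /* physical base address from the text */
#define VI_REG_COUNT 14u          /* 56 bytes / 4 bytes per register */
#define KSEG1_BASE   0xA0000000u  /* MIPS uncached, unmapped segment */

/* Physical address of the VI register with the given index (0..13). */
static uint32_t vi_reg_phys(uint32_t index)
{
    assert(index < VI_REG_COUNT);
    return VI_PHYS_BASE + index * 4u;
}

/* The same register as the CPU addresses it through KSEG1. */
static uint32_t vi_reg_cpu(uint32_t index)
{
    return KSEG1_BASE | vi_reg_phys(index);
}
```

With this model, vi_reg_cpu(0) gives 0xA440 0000, and the last register (index 13) starts at physical 0x0440 0034, so its final byte is 0x0440 0037.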

We are going to focus on a couple of assembly instructions and concepts and see how they work.

Next, the instructions:

Most of these read right to left except where noted.

Example:

addi t0, t1, 4

In C

t0 = t1 + 4;

nop - No Operation (Delay Slot explanation)

The nop instruction does nothing by itself; it is more of a strategy and optimization tool. It fills a slot where no useful instruction can be placed, so the instructions following it keep filling the multiple stages of the pipeline instead of just waiting. In some of the code below it's also used to separate blocks of code to improve readability in the debugger.

In MIPS assembly, nop is commonly used to wait for something to finish before continuing. There are 3 types of instructions that can require a delay:

  • Multiply & Divide Instructions

  • Load & Store: From register to Memory and back.

  • Branch and Jump Instructions

The Memory instructions have the following scenarios:

  • Write to Memory - The usual code pattern is to write to one memory location, then the next, so the delay doesn't really matter.

  • Write then Read Memory - If you really need to do this, space the two actions apart by several instructions, whether they are nop instructions or other code, to avoid stalling the pipeline.

  • Read from Memory - The result generally isn't available until 1 instruction cycle later. If no other useful instruction can be placed in that slot, use the nop instruction.

If you don't account for the delay in Load/Store from Memory, the following may happen:

  • The instruction may use the old value, very hard to debug! (This applies to early MIPS CPUs without load interlocks; the N64's VR4300 stalls instead.)

  • The processor pipeline may stall and waste 1 or more instruction cycles.

The Branch and Jump Instructions require the Delay Slot. The processor doesn't know where execution will continue until the Branch or Jump instruction completes, so the next instruction in the pipeline is always executed and should do something that doesn't depend on the outcome of the Branch or Jump. It's commonly an instruction that would have executed before the branch that gets moved into the Delay Slot; if none can be found, the nop instruction is used.

lui - Load Upper Immediate

The lui instruction places a 16-bit constant in the upper 16 bits of a Register and sets the lower 16 bits to zero. Naming-wise, remember the word "Immediate": the related Immediate instructions all use constant values, while the other similar instructions use values from either registers or memory. Since the values we work with are 32-bit and a single instruction can only carry a 16-bit constant, this instruction gets used a lot, although you will see a pattern below using sll in a couple of cases.

Example:

before

t0 = $0000 7C00

lui t0, $A440

after t0 = $A440 0000

add & addi - Add and Add Immediate

These instructions are very similar; the difference is that add uses only registers, while addi uses an "Immediate", aka a constant value.

Example add:

before

t0 = $A440 0004

t1 = $0000 0004

add t0, t0, t1

after

t0 = $A440 0008

Example addi:

before

t0 = $A440 0004

addi t0, t0, 4

after

t0 = $A440 0008

Example zero initialize, then addi:

before

t0 = $A440 0004

addi t0, r0, 8

after

t0 = $0000 0008

Note:

The addi immediate value is signed, so if it's $8000 or greater it is considered to be negative.

ori - OR Immediate

Officially this is a bit manipulation instruction, similar to the | (pipe character) operator in C, C++ or C#. When the lower 2 bytes of the source register are zero, addi and ori create the same result for immediates up to $7FFF. The difference is that ori treats its immediate as unsigned and therefore goes up to $FFFF, while addi only goes up to $7FFF (higher values being negative).
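The signed versus unsigned difference is easiest to see with numbers. Here is a small C model of lui, addi and ori, written only for illustration: it follows the hardware semantics described above and says nothing about which immediates the bass assembler will accept. The function names simply mirror the mnemonics.

```c
#include <stdint.h>

/* lui: the immediate goes to the upper 16 bits, the lower 16 become zero. */
static uint32_t lui(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* addi: the 16-bit immediate is sign-extended before the add.
   The real instruction traps on signed overflow; this model just wraps. */
static uint32_t addi(uint32_t rs, uint16_t imm)
{
    uint32_t extended = (imm & 0x8000u) ? (0xFFFF0000u | (uint32_t)imm)
                                        : (uint32_t)imm;
    return rs + extended;
}

/* ori: the 16-bit immediate is zero-extended before the OR. */
static uint32_t ori(uint32_t rs, uint16_t imm)
{
    return rs | (uint32_t)imm;
}
```

In this model ori(lui(0xA014), 0xB000) gives 0xA014 B000, while addi(lui(0xA014), 0xB000) gives 0xA013 B000 because $B000 is treated as a negative number. For a small immediate such as $2239 both instructions agree.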

sll - Shift Left Logical

This is a bit shifting instruction, similar to the << operator in C, C++ or C#. In the code below it's used for putting bits into the right position to create a readable value. In other cases it can be used as a fast multiply by powers of 2.

Example:

before

t0 = 0x0000 00FF

sll t0, t0, 16

after

t0 = 0x00FF 0000
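The Video Interface code later in this lesson uses sll to pack two small values into one 32-bit word. A hedged C sketch of that addi, sll, addi sequence follows; pack_fields is a name invented here, and the sketch assumes both values stay below $8000 so the sign extension of addi never comes into play.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors: addi t0, r0, hi / sll t0, t0, 16 / addi t0, t0, lo */
static uint32_t pack_fields(uint32_t hi, uint32_t lo)
{
    /* Both fields must be below $8000 so addi does not sign-extend them. */
    assert(hi < 0x8000u && lo < 0x8000u);

    uint32_t t0 = hi;  /* addi t0, r0, hi */
    t0 <<= 16;         /* sll  t0, t0, 16 */
    t0 += lo;          /* addi t0, t0, lo */
    return t0;
}
```

For example, pack_fields(108, 748) produces 0x006C 02EC, with 108 ($6C) in the upper half and 748 ($2EC) in the lower half.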

sw - Store Word

The Store Word instruction is our first serious work instruction; up to now we have only worked with constants in registers. By writing to memory we can make things happen, for example changing the screen resolution and color depth.

This instruction has some extra abilities granted by the 'bass' assembler: the use of constants and compile-time calculations. These abilities apply to all of the "Immediate" instructions as well.

Note: This is one of the "inverted" syntax instructions because the value (the source register) occurs before the destination (the memory address). The load word instruction has the same syntax, so there it reads the "correct" way, destination first.

Example for N64 Init to avoid resetting:

lui t0,PIF_BASE // t0 = PIF Base Register ($BFC00000)

addi t1, r0, 8

sw t1,PIF_RAM+$3C(t0) // PIF_RAM = $7C0

after:

Memory[(PIF_BASE << 16) + PIF_RAM + $3C] = 8

aka

Memory[0xBFC0 07FC] = 8
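The address arithmetic can be checked with a few lines of C. This is only a sketch: the PIF_BASE and PIF_RAM values are the ones quoted in the text above ($BFC0 as the lui operand and $7C0), and effective_address is a hypothetical helper name, not something from N64.INC.

```c
#include <stdint.h>

#define PIF_BASE 0xBFC0u  /* upper 16 bits, as loaded by lui (from the text) */
#define PIF_RAM  0x7C0u   /* PIF RAM offset (from the text) */

/* Effective address of "sw rt, offset(base)": the base register plus
   the sign-extended 16-bit offset. */
static uint32_t effective_address(uint32_t base, uint16_t offset)
{
    uint32_t extended = (offset & 0x8000u) ? (0xFFFF0000u | (uint32_t)offset)
                                           : (uint32_t)offset;
    return base + extended;
}
```

Calling effective_address((uint32_t)PIF_BASE << 16, PIF_RAM + 0x3C) returns 0xBFC0 07FC, matching the result above. Because the offset is signed, offsets of $8000 and above reach backwards from the base register.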

bne - Branch Not Equal

This branch instruction is a simple one, but always remember to fill the delay slot! If the 2 register parameters are not equal, the Branch is taken; otherwise execution continues with the instruction after the delay slot.

Note: The Delay Slot is always executed, whether or not the Branch is taken.

This example is definitely longer, but it's a very useful pattern for zero filling any memory range. There will be a similar example below using non-zero values.

Example

setupZeroPIF:

lui t0, PIF_BASE

addi t0, t0, PIF_RAM

addi t1, t0, $3C

loopZeroPIF:

sw r0, 0(t0)

bne t1,t0,loopZeroPIF

addi t0,t0,4

The 3 lines following the setupZeroPIF label set up the start and end memory locations that are going to be zeroed. Starting at the label loopZeroPIF, the Store Word instruction places zero into the first memory location. Then the bne checks whether the 2 registers hold the same memory location. If they are not equal, execution goes back to the Store Word instruction after executing the addi in the delay slot. Because the comparison happens before the addi, the end location itself is also zeroed. Replace the values in the setup to reuse this block of code.
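To see the order of events in that loop, here is a rough C trace of it. It does not model memory at all; it only follows t0 and t1 and counts the stores. The start and end addresses are the ones the setup code computes ($BFC0 07C0 and $BFC0 07C0 + $3C), and zero_fill_trace is an invented name.

```c
#include <stdint.h>

#define PIF_RAM_START 0xBFC007C0u  /* lui PIF_BASE + addi PIF_RAM */
#define PIF_RAM_LAST  0xBFC007FCu  /* start + $3C */

/* Returns the number of words stored. 'last_store' receives the address
   of the final sw, 'final_t0' the value of t0 after the loop exits. */
static unsigned zero_fill_trace(uint32_t *last_store, uint32_t *final_t0)
{
    uint32_t t0 = PIF_RAM_START;
    uint32_t t1 = PIF_RAM_LAST;
    unsigned stores = 0;
    int taken;

    do {
        *last_store = t0;    /* sw r0, 0(t0) */
        stores++;
        taken = (t1 != t0);  /* bne t1, t0, loopZeroPIF: compared first */
        t0 += 4;             /* addi t0, t0, 4: delay slot, always runs */
    } while (taken);

    *final_t0 = t0;
    return stores;
}
```

The trace stores 16 words, the last one at $BFC0 07FC, and t0 ends at $BFC0 0800 because the delay slot runs one final time after the branch falls through.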

The bass assembler's syntax

From now on I suggest more writing and less copy & paste, so let's talk about the syntax used by the bass assembler.

  • Assembly Instructions = lower case ONLY

  • Constants are case sensitive

  • binary numbers can be entered using 0b########

  • decimal numbers are entered without any notation

  • Hex values can be either 0x######## or $########

  • To improve readability a number can use an apostrophe anywhere

    • $0123'456'78 or 0b1111'0000

This is also a good time to mention what windows, documents and files I have open, essentially my 'workflow'.

  • Command Prompt

    • cd to the current directory and run gobass.cmd once at the start

    • Run the 'make' and 'run debug' commands from here.

  • Text Editor that supports multiple files in their own tabs (Notepad++)

    • N64.INC from the LIB folder

    • Lesson3.asm

    • make.cmd

    • N64_Header.asm

  • PDF Viewer and/or web browser

Time to see something on the screen!

arch n64.cpu

endian msb

output "Lesson3.N64", create

fill 1052672

origin $00000000

base $80000000

include "../LIB/N64.INC"

include "N64_HEADER.ASM"

insert "../LIB/N64_BOOTCODE.BIN"

Start:

lui t0,$BFC0

addi t1,r0,8

sw t1,$7FC(t0)

Video_Init:

lui a0, $A440

addi t0, r0, 3 // 2 = 16 BPP, 3 = 32 BPP

// Gamma, Dither, Serrate, Anti-Alias, Diagnostic

sw t0, $0000(a0)

lui t0, $A010 // Frame Buffer RDRAM Location

sw t0, $0004(a0)

addi t0, r0, 320 // Width in Pixels

sw t0, $0008(a0)

addi t0, r0, $200 // VI vertical intr

sw t0, $000C(a0)

addi t0, r0, 352

sw t0, $0010(a0)

// Conflicting documentation, sticking with a known good NTSC Value

lui t0, $3E5

addi t0, t0, $2239

sw t0, $0014(a0)

addi t0, r0, 525 // Number of Half-Lines Per field

sw t0, $0018(a0)

addi t0, r0, 0 // PAL 5-bit Leap pattern, NTSC = 0

sll t0, t0, 16 // Move current value left by 16 bits

addi t0, t0, 3093 // Total Duration Of A Line In 1/4 Pixel

sw t0, $001C(a0) // 28

addi t0, r0, 3093

sll t0, t0, 16

addi t0, t0, 3093

sw t0, $0020(a0) // 32

addi t0, r0, 108

sll t0, t0, 16

addi t0, t0, 748

sw t0, $0024(a0) // 36

addi t0, r0, 37

sll t0, t0, 16

addi t0, t0, 511

sw t0, $0028(a0) // 40

addi t0, r0, 14

sll t0, t0, 16

addi t0, t0, 516

sw t0, $002C(a0) // 44

addi t0, r0, 0 // Horizontal Sub Pixel Offset

sll t0, t0, 16 // Move current value left by 16 bits

addi t0, t0, 512 // Horizontal Scale Up Factor

sw t0, $0030(a0) // 48

addi t0, r0, 0 // Vertical Sub Pixel Offset

sll t0, t0, 16 // Move current value left by 16 bits

addi t0, t0, 1024 // Vertical Scale Up Factor

sw t0, $0034(a0) // 52

nop // Marker NOP's

nop

setupFrameBufferBackground:

// Buffer Start

lui t1, $A010

// Buffer End

lui t2, $A014

ori t2, t2, $B000

// Red

lui t0, $FF00

// Green

//lui t0, $00FF

// Blue

// ori t0, r0, $FF00

// Transparency

// addiu t0, r0, $00FF

nop // Marker NOP's

nop

loopFrameBuffer:

sw t0, 0(t1)

bne t1,t2,loopFrameBuffer

addi t1,t1,4

nop // Marker NOP

Loop:

j Loop

nop // Delay Slot

This code block ends up being around 90 lines, which seems like a lot. (Teaser: the next lesson places the proven working code into reusable macros.)

There are really only a couple of code patterns:

  • lui, addi, sw - 2 Parameters 1 upper, 1 lower

  • lui, sw - 1 Upper Parameter

  • addi, sw - 1 Lower Parameter

  • addi,sll,addi,sw - Multiple Parameters

  • Memory Fill Loop

Each of the Video Interface Registers takes only a couple of lines. Reading the instruction descriptions above and the Memory Map document, and stepping through the code in the debugger, should make these patterns clear.
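As a cross-check while debugging, the values that the listing stores can be collected in one table. The C sketch below is only a summary of the listing above, not a replacement for it; vi_ntsc_values and vi_write_all are invented names, and on a PC the function is meant to be pointed at an ordinary array rather than real hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define VI_REG_COUNT 14u

/* Values the listing stores at offsets $00, $04, ... $34 from $A4400000. */
static const uint32_t vi_ntsc_values[VI_REG_COUNT] = {
    3,                     /* $00: 3 = 32 BPP */
    0xA0100000u,           /* $04: frame buffer location */
    320,                   /* $08: width in pixels */
    0x200,                 /* $0C: VI vertical interrupt */
    352,                   /* $10 */
    0x03E52239u,           /* $14: lui $3E5, addi $2239 */
    525,                   /* $18: half-lines per field */
    (0u << 16) + 3093,     /* $1C: leap pattern, line duration */
    (3093u << 16) + 3093,  /* $20 */
    (108u << 16) + 748,    /* $24 */
    (37u << 16) + 511,     /* $28 */
    (14u << 16) + 516,     /* $2C */
    (0u << 16) + 512,      /* $30: horizontal offset and scale */
    (0u << 16) + 1024      /* $34: vertical offset and scale */
};

/* Store every value in order, one 32-bit word per register (the sw step). */
static void vi_write_all(volatile uint32_t *regs)
{
    for (size_t i = 0; i < VI_REG_COUNT; i++) {
        regs[i] = vi_ntsc_values[i];
    }
}
```

Passing a 14-element uint32_t array to vi_write_all and printing it in hex gives the same words you should see in the debugger after each sw, for example $0C15 0C15 at offset $20.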

Current Video Status

Once the Video Interface registers are all set, the display is defined to be 320 x 240 with 4 bytes of color per pixel. This is an easier mode for experimenting with colors because each Pixel is 4 bytes in RGBA format. A screen with those parameters requires a "framebuffer" of 300 KB (320 x 240 x 4 = 307,200 bytes = $4 B000), which occupies memory addresses $A010 0000 up to, but not including, $A014 B000. (The fill loop above compares before it increments, so it also writes the one word at $A014 B000.)
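The frame buffer numbers and the color words from the listing can be double checked with a little C. This is a sketch under the assumptions stated in the text (320 x 240, 4 bytes per pixel, RGBA with red in the top byte); the helper names are made up for the example.

```c
#include <stdint.h>

#define FB_BASE         0xA0100000u  /* frame buffer start from the listing */
#define FB_WIDTH        320u
#define FB_HEIGHT       240u
#define BYTES_PER_PIXEL 4u

/* Total size of the frame buffer in bytes. */
static uint32_t fb_size_bytes(void)
{
    return FB_WIDTH * FB_HEIGHT * BYTES_PER_PIXEL;
}

/* First address after the last pixel. */
static uint32_t fb_end(void)
{
    return FB_BASE + fb_size_bytes();
}

/* Pack four 8-bit channels into one RGBA word, red in the top byte. */
static uint32_t rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b << 8) | (uint32_t)a;
}
```

Here fb_size_bytes() is 307,200 (exactly 300 x 1024), fb_end() is 0xA014 B000, and rgba(255, 0, 0, 0) is 0xFF00 0000, the same word that lui t0, $FF00 produces for red.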

Challenges

  1. Modify the t0 values to create your favorite color, maybe yellow or purple if yours is too easy.

    1. Hint: lui t0, $FFFF

  2. Create a letterbox effect: black bars on the top and bottom, any color in the middle.

    1. Hint: 16:9 Creates a resolution of 320 x 180

      1. 30 Black Lines

      2. 180 Any Color

      3. 30 Black Lines

    2. Hint: Check for odd pixels and remember to addi t0,t0, -4 for every Pixel.

  3. The flag of Ethiopia? (circa 1975-1987)

    1. 13 Lines of Black

    2. 71 Lines of Green

    3. 71 Lines of Yellow

    4. 71 Lines of Red

    5. 14 Lines of Black
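For the line-based challenges it may help to know where each display line starts in memory. A small C helper for that is sketched below, assuming the lines are stored one after another from the top of the screen with 320 pixels of 4 bytes each, as set up in this lesson. The name line_start is invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define FB_BASE         0xA0100000u  /* frame buffer start from the listing */
#define FB_WIDTH        320u
#define FB_HEIGHT       240u
#define BYTES_PER_PIXEL 4u

/* Address of the first pixel of a display line (0 = top line).
   line == FB_HEIGHT is allowed and gives the address just past the buffer. */
static uint32_t line_start(uint32_t line)
{
    assert(line <= FB_HEIGHT);
    return FB_BASE + line * FB_WIDTH * BYTES_PER_PIXEL;
}
```

Each line is 1,280 bytes ($500), so line 30 starts at 0xA010 9600 and line 210 starts at 0xA014 1A00. Those values can be used as the start and end of a fill loop for one band of color.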
