N64 Assembly Tutorial - Lesson 3
Updated: N64 Assembly YouTube Series
The same content updated with more explanation and showing how it's done. Twitch stream style.
https://www.youtube.com/playlist?list=PLjwOF_LvxhqTXVUdWZJEVZxEUG5qt8fsA
Lesson 3
Setup Video Interface
Unfortunately we haven't seen anything on the screen yet, and that is the whole reason anybody would write a game or program for the N64. This tutorial will set the background color and include some suggestions for modifying the color and other details. It's simple, but there are lots of concepts to learn, so let's get started.
Since we are writing assembly language we have to explicitly initialize any hardware before we use it. The Video Interface uses memory mapped "Configuration Registers" that are not related to the CPU registers we have already talked about. The Configuration Registers for the Video Interface (VI) start at 0x0440 0000 and run for the next 56 bytes to 0x0440 0037. A lot of example code sets good defaults for the VI either NTSC or PAL and moves on. WARNING: Some combinations may be invalid! Double check your values, and please be careful if you run these tests on actual hardware. I am NOT responsible!
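The address arithmetic can be sketched in C. This is only an illustration and not part of the lesson code: the names VI_BASE_PHYS, KSEG1_BASE, VI_REG_COUNT and vi_reg_addr are made up here. The sketch assumes the standard MIPS uncached window at 0xA0000000, which is why the code later in this lesson uses $A440 rather than $0440.

```c
#include <stdint.h>

#define VI_BASE_PHYS 0x04400000u  /* physical start of the VI registers (from the text) */
#define KSEG1_BASE   0xA0000000u  /* MIPS uncached window, assumed for illustration */
#define VI_REG_COUNT 14u          /* 56 bytes / 4 bytes per 32-bit register */

/* Return the CPU-visible (uncached) address of VI register number `index`. */
static uint32_t vi_reg_addr(uint32_t index)
{
    return (KSEG1_BASE | VI_BASE_PHYS) + index * 4u;
}
```

With these assumptions the first register lands at 0xA4400000 and the last one at offset 0x34, matching the $0000 to $0034 offsets used with sw later on.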
We are going to focus on a couple of assembly instructions and concepts and see how they work.
Next the instructions:
Most of these read right to left except where noted.
Example:
addi t0, t1, 4
In C
t0 = t1 + 4;
nop - No Operation (Delay Slot explanation)
The nop instruction does nothing, which makes it more of a scheduling tool than a working instruction: it fills a slot in the pipeline when no useful instruction is available. In some of the code below it is also used to separate blocks of code, which makes them easier to spot in the debugger.
In MIPS assembly nop is commonly used to wait for something to finish before continuing. There are 3 types of instructions that require the use of delays.
Multiply & Divide Instructions
Load & Store: From register to Memory and back.
Branch and Jump Instructions
The Memory instructions have the following scenarios:
Write to Memory - Code pattern is write to a location of memory, then the next location, Delay slot doesn't really matter.
Write then Read Memory - If you really need to do this, it's better to space these actions by several instructions to avoid stalling the pipeline whether it's nop instructions or other code.
Read from Memory - Generally takes 1 instruction cycle before the result is available. If other instructions cannot be executed then use the nop instruction.
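If you don't account for the Delay in Load/Store from Memory the following may happen:
On older MIPS I processors the instruction may use the old value, very hard to debug! The N64 CPU (VR4300) interlocks loads, so there the result is correct but the pipeline waits for it.
The processor pipeline may stall and waste 1 or more instruction cycles.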
The Branch and Jump Instructions require the Delay Slot. The processor doesn't know where execution will continue until the branch or jump has been resolved, so the next instruction in the pipeline is executed regardless and should do something that doesn't depend on the outcome. It's commonly an instruction that would have executed before the branch that gets moved into the Delay Slot; if none can be found, the nop instruction is used.
lui - Load Upper Immediate
The lui instruction places a 16-bit constant in the upper 16 bits of a register and sets the lower 16 bits to zero. Naming-wise, remember the word "Immediate": there are other related instructions, and all of them take constant values, while the similar non-immediate instructions take their values from registers or memory. Since the addresses and values in these lessons are 32 bits wide, this instruction gets used a lot, although you will see a pattern below using sll in a couple of cases.
Example:
before
t0 = $0000 7C00
lui t0, $A440
after
t0 = $A440 0000
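In the spirit of the C one-liner earlier in the lesson, here is a small sketch of what lui computes. The function name lui is only a stand-in for the instruction, and the model ignores everything except the 32-bit result.

```c
#include <stdint.h>

/* Model of `lui rt, imm`: the 16-bit immediate becomes the upper half
   and the lower half is cleared. The old register value is discarded. */
static uint32_t lui(uint16_t imm)
{
    return (uint32_t)imm << 16;
}
```

Calling lui(0xA440) gives 0xA4400000, no matter what t0 held before.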
add & addi - Add and Add Immediate
These instructions are very similar, the difference is that add uses only registers and addi uses an "Immediate" aka constant value.
Example add:
before
t0 = $A440 0004
t1 = $0000 0004
add t0, t0, t1
after
t0 = $A440 0008
Example addi:
before
t0 = $A440 0004
addi t0, t0, 4
after
t0 = $A440 0008
Example zero initialize, then addi:
before
t0 = $A440 0004
addi t0, r0, 8
after
t0 = $0000 0008
Note:
The addi immediate value is signed, so if it's $8000 or greater it is considered to be negative.
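The sign rule is easier to see in a small C model. This is a sketch under simplifying assumptions: the function name addi is a stand-in for the instruction, the registers are treated as 32-bit values, and the overflow exception the real instruction can raise is ignored.

```c
#include <stdint.h>

/* Model of `addi rt, rs, imm`: the 16-bit immediate is sign-extended
   to 32 bits before the add. Overflow trapping is not modelled. */
static uint32_t addi(uint32_t rs, uint16_t imm)
{
    uint32_t extended = imm;
    if (imm & 0x8000u) {
        extended |= 0xFFFF0000u;  /* $8000..$FFFF are negative values */
    }
    return rs + extended;         /* wraps modulo 2^32 */
}
```

In this model addi(0xA0140000, 0xB000) produces 0xA013B000 rather than 0xA014B000, because $B000 counts as a negative number. That is why the frame buffer code further down uses ori for that half.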
ori - OR Immediate
Officially this is a bit manipulation instruction, similar to the | (pipe character) convention in C, C++ or C#. When the lower 2 bytes of the source register are zero, addi and ori produce the same result for values up to $7FFF. The difference is that ori treats the immediate as unsigned and therefore goes up to $FFFF, while addi treats $8000 and above as negative.
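A minimal C sketch of ori, assuming 32-bit register values; the function name ori is just a stand-in for the instruction.

```c
#include <stdint.h>

/* Model of `ori rt, rs, imm`: the 16-bit immediate is zero-extended
   and OR-ed into the register, so the upper 16 bits are untouched. */
static uint32_t ori(uint32_t rs, uint16_t imm)
{
    return rs | (uint32_t)imm;
}
```

For example, ori(0xA0140000, 0xB000) gives 0xA014B000, which is the lui plus ori pair used for the frame buffer end address later in this lesson.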
sll - Shift Left Logical
This is a bit shifting instruction, similar to the << convention in C, C++ or C#. In the code below it's used for putting bits into the right position to create a readable value. In other cases it can be used as a fast multiply by powers of 2.
Example:
before
t0 = 0x0000 00FF
sll t0, t0, 16
after
t0 = 0x00FF 0000
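The shift is mostly used below to pack two 16-bit fields into one 32-bit register value. Here is a hedged C sketch of that addi, sll, addi sequence; pack_fields is a made-up helper name, and the lower field is assumed to be below $8000 so the signed addi does not interfere.

```c
#include <stdint.h>

/* Pack two fields the way the addi / sll / addi pattern does. */
static uint32_t pack_fields(uint16_t upper, uint16_t lower)
{
    uint32_t t0 = upper;  /* addi t0, r0, upper                      */
    t0 <<= 16;            /* sll  t0, t0, 16                         */
    t0 += lower;          /* addi t0, t0, lower (lower below $8000)  */
    return t0;
}
```

With the values 108 and 748 from the code below, pack_fields(108, 748) yields 0x006C02EC.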
sw - Store Word
The Store Word instruction is our first serious work instruction, up to now we have only worked with constants in registers. By writing to RAM we can make things happen, for example changing the screen resolution and color depth.
This instruction has some extra abilities granted by the 'bass' assembler, the use of constants and compile time calculations. These abilities apply to all of the "Immediate" instructions as well.
Note: This is one of the "inverted" syntax instructions because the value occurs before the destination. The load word instruction has the same syntax so it goes the "correct" way.
Example for N64 Init to avoid resetting:
lui t0,PIF_BASE // t0 = PIF Base Register ($BFC00000)
addi t1, r0, 8
sw t1,PIF_RAM+$3C(t0) // PIF_RAM = $7C0
after:
Memory[PIF_BASE + PIF_RAM + $3C] = 8
aka
Memory[0xBFC0 07FC] = 8
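The address that sw writes to is the base register plus a signed 16-bit offset. The sketch below models only that address calculation. The helper name sw_address is made up, and the PIF_BASE and PIF_RAM values are taken from the comments in the example above rather than from N64.INC itself.

```c
#include <stdint.h>

#define PIF_BASE 0xBFC0u  /* upper half, loaded with lui ($BFC00000) */
#define PIF_RAM  0x07C0u  /* offset of PIF RAM, per the comment above */

/* Effective address of `sw rt, offset(base)`: base + sign-extended offset. */
static uint32_t sw_address(uint32_t base, int16_t offset)
{
    return base + (uint32_t)(int32_t)offset;
}
```

The offset expression PIF_RAM+$3C is evaluated by the assembler at build time, so the CPU only ever sees the single constant $7FC.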
bne - Branch Not Equal
This branch instruction is a simple one, but always remember to fill the delay slot! If the 2 register parameters are not equal, the branch is taken; otherwise execution continues with the instruction after the delay slot.
Note: Delay Slot is always executed.
This example is definitely longer but it's a very useful pattern for zero filling any memory range. There will be a similar example below using non-zero values.
Example
setupZeroPIF:
lui t0, PIF_BASE
addi t0, t0, PIF_RAM
addi t1, t0, $3C
loopZeroPIF:
sw r0, 0(t0)
bne t1,t0,loopZeroPIF
addi t0,t0,4
The 3 lines following the setupZeroPIF label set up the start and end memory locations that are going to be zeroed. Starting at the label loopZeroPIF, the Store Word instruction places zero into the first memory location. Then the bne checks whether the 2 registers hold the same memory location. If they are not equal, it goes back to the Store Word instruction after executing the addi in the delay slot. Because the comparison happens before the addi, the end location itself is also zeroed. Replace the values in the setup to reuse this block of code.
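The order of store, compare and increment is the subtle part of this loop, so here is a rough C model of it. It works on array indexes instead of byte addresses, and fill_words is a hypothetical helper name, not something from the lesson files.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill words `start` through `end` INCLUSIVE, in the same order as the
   sw / bne / addi-in-delay-slot loop. Returns the number of words written. */
static size_t fill_words(uint32_t *mem, size_t start, size_t end, uint32_t value)
{
    size_t t0 = start;
    size_t count = 0;
    int taken;
    do {
        mem[t0] = value;      /* sw value, 0(t0)                        */
        count++;
        taken = (t0 != end);  /* bne t1, t0: compares the OLD t0        */
        t0 += 1;              /* addi t0, t0, 4: delay slot, always run */
    } while (taken);
    return count;
}
```

For the PIF example the end offset is $3C, which is word 15, so the loop writes 16 words in total.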
The bass assembler's syntax
From now on I suggest more writing and less copy & paste, so let's talk about the syntax used by the bass assembler.
Assembly Instructions = lower case ONLY
Constants are case sensitive
binary numbers can be entered using 0b########
decimal numbers are entered without any notation
Hex values can be either 0x######## or $########
To improve readability a number can use an apostrophe anywhere
$0123'456'78 or 0b1111'0000
This is also a good time to mention what windows, documents and files I have open, essentially my 'workflow'.
Command Prompt
cd to the current directory and run gobass.cmd once at the start
run the 'make' and 'run debug' commands from here.
Text Editor that supports multiple files in their own tabs (Notepad++)
N64.INC from the LIB folder
Lesson3.asm
make.cmd
N64_Header.asm
PDF Viewer and/or web browser
mips-isa.pdf This is the official MIPS Instruction set documentation
file:///C:/bass/doc/bass.html The compiler documentation.
https://github.com/mikeryan/n64dev/blob/master/docs/n64ops/n64ops%23h.txt
The N64 Memory Map and Hardware Register Documentation
Time to see something on the screen!
arch n64.cpu
endian msb
output "Lesson3.N64", create
fill 1052672
origin $00000000
base $80000000
include "../LIB/N64.INC"
include "N64_HEADER.ASM"
insert "../LIB/N64_BOOTCODE.BIN"
Start:
lui t0,$BFC0
addi t1,r0,8
sw t1,$7FC(t0)
Video_Init:
lui a0, $A440
addi t0, r0, 3 // 2 = 16 BPP, 3 = 32 BPP
// Gamma, Dither, Serrate, Anti-Alias, Diagnostic
sw t0, $0000(a0)
lui t0, $A010 // Frame Buffer RDRAM Location
sw t0, $0004(a0)
addi t0, r0, 320 // Width in Pixels
sw t0, $0008(a0)
addi t0, r0, $200 // VI vertical intr
sw t0, $000C(a0)
addi t0, r0, 352
sw t0, $0010(a0)
// Conflicting documentation, sticking with a known good NTSC Value
lui t0, $3E5
addi t0, t0, $2239
sw t0, $0014(a0)
addi t0, r0, 525 // Number of Half-Lines Per field
sw t0, $0018(a0)
addi t0, r0, 0 // PAL 5-bit Leap pattern, NTSC = 0
sll t0, t0, 16 // Move current value left by 16 bits
addi t0, t0, 3093 // Total Duration Of A Line In 1/4 Pixel
sw t0, $001C(a0) // 28
addi t0, r0, 3093
sll t0, t0, 16
addi t0, t0, 3093
sw t0, $0020(a0) // 32
addi t0, r0, 108
sll t0, t0, 16
addi t0, t0, 748
sw t0, $0024(a0) // 36
addi t0, r0, 37
sll t0, t0, 16
addi t0, t0, 511
sw t0, $0028(a0) // 40
addi t0, r0, 14
sll t0, t0, 16
addi t0, t0, 516
sw t0, $002C(a0) // 44
addi t0, r0, 0 // Horizontal Sub Pixel Offset
sll t0, t0, 16 // Move current value left by 16 bits
addi t0, t0, 512 // Horizontal Scale Up Factor
sw t0, $0030(a0) // 48
addi t0, r0, 0 // Vertical Sub Pixel Offset
sll t0, t0, 16 // Move current value left by 16 bits
addi t0, t0, 1024 // Vertical Scale Up Factor
sw t0, $0034(a0) // 52
nop // Marker NOP's
nop
setupFrameBufferBackground:
// Buffer Start
lui t1, $A010
// Buffer End
lui t2, $A014
ori t2, t2, $B000
// Red
lui t0, $FF00
// Green
//lui t0, $00FF
// Blue
// ori t0, r0, $FF00
// Transparency
// addiu t0, r0, $00FF
nop // Marker NOP's
nop
loopFrameBuffer:
sw t0, 0(t1)
bne t1,t2,loopFrameBuffer
addi t1,t1,4
nop // Marker NOP
Loop:
j Loop
nop // Delay Slot
This code block ends up being around 90 lines, which seems like a lot (Teaser: the next lesson places the proven working code into reusable macros).
There are really only a couple of code patterns
lui, addi, sw - 2 Parameters 1 upper, 1 lower
lui, sw - 1 Upper Parameter
addi, sw - 1 Lower Parameter
addi,sll,addi,sw - Multiple Parameters
Memory Fill Loop
Each of the Video Interface Registers takes only a couple of lines. Reading the instruction descriptions above, the Memory Map document and debugging should make these patterns clear.
Current Video Status
Once the Video Interface registers are all set, the display is defined to be 320 x 240 with 4 bytes of color per pixel. This is an easier mode for experimenting with colors because each pixel is 4 bytes in RGBA format. A screen with those parameters requires a "framebuffer" of 300 KB, which starts at memory address $A010 0000 and ends just before $A014 B000. Note that the fill loop above also writes the word at $A014 B000 itself, one word past the end, because the store happens before the comparison.
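The 300 KB figure and the end address follow directly from the resolution. Here is a short C sketch of the arithmetic; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Size of a frame buffer in bytes. */
static uint32_t framebuffer_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* First address AFTER the frame buffer, given its start address. */
static uint32_t framebuffer_end(uint32_t start, uint32_t width, uint32_t height,
                                uint32_t bytes_per_pixel)
{
    return start + framebuffer_bytes(width, height, bytes_per_pixel);
}
```

For 320 x 240 at 4 bytes per pixel this gives 307,200 bytes ($4B000, or 300 KB), and a buffer starting at $A010 0000 ends at $A014 B000.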
Challenges
Modify the t0 values to create your favorite color, maybe yellow or purple if yours is too easy.
Hint: lui t0, $FFFF
Create a letterbox effect: black bars on the top and bottom, any color in the middle.
Hint: 16:9 Creates a resolution of 320 x 180
30 Black Lines
180 Any Color
30 Black Lines
Hint: Check for odd pixels and remember to addi t0,t0, -4 for every Pixel.
The flag of Ethiopia? (circa 1975-1987)
13 Lines of Black
71 Lines of Green
71 Lines of Yellow
71 Lines of Red
14 Lines of Black
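For planning the challenges on paper before writing the assembly, the C sketch below may help. It computes a 32-bit RGBA pixel value and the start address of a given screen row. It assumes the 320 x 240, 4 bytes per pixel mode and the $A010 0000 buffer start from this lesson; rgba8888 and row_address are hypothetical helper names, and this is not a solution to the challenges.

```c
#include <stdint.h>

#define FB_START        0xA0100000u  /* frame buffer start used in this lesson */
#define FB_WIDTH        320u         /* pixels per row */
#define BYTES_PER_PIXEL 4u           /* 32 BPP, RGBA */

/* Build one pixel: red in the top byte, alpha in the bottom byte. */
static uint32_t rgba8888(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) | ((uint32_t)b << 8) | (uint32_t)a;
}

/* Address of the first pixel of `row` (row 0 is the top of the screen). */
static uint32_t row_address(uint32_t row)
{
    return FB_START + row * FB_WIDTH * BYTES_PER_PIXEL;
}
```

For example, rgba8888(255, 255, 0, 0) is $FFFF0000, the yellow from the hint, and the middle band of the letterbox starts at row_address(30) and ends just before row_address(210).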