20.1 Overview of the Audio System

A musician or sound designer does not need to understand the audio system in order to produce sounds and music for the Nintendo 64, but a short explanation is helpful. This section gives a brief description of the audio system, followed by several important items the musician should be aware of.

20.1.1 Brief Description of the Audio System

The audio system for the N64 is composed of a Sound Player (for playing single samples, such as sound effects) and a Sequence Player (for playing music). When the game starts up, it creates and initializes (threads of) a sound player and a sequence player. It then assigns a bank of sound effects to the sound player, and assigns a bank of instruments and a bank of MIDI sequences to the sequence player. To play a sound effect, the game sends a message to the sound player, telling it what sound effect to set as its target, and then sends another message to the sound player, telling it to play the target sound. To play a MIDI sequence, the game must load the sequence data, then attach the sequence to the sequence player, and then send a message to the sequence player to start playing the music.
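The two-message pattern for sound effects (set a target, then play it) can be illustrated with a toy model. This is only a sketch of the idea: the names ToySoundPlayer, Msg, MSG_SET_TARGET, MSG_PLAY, toy_player_init, and toy_player_handle are hypothetical and are not the audio library's real identifiers or message format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message types; these are NOT the SDK's real identifiers. */
typedef enum { MSG_SET_TARGET, MSG_PLAY } MsgType;

typedef struct {
    MsgType type;
    int     sound_id;    /* used only by MSG_SET_TARGET */
} Msg;

typedef struct {
    int num_sounds;      /* size of the assigned sound-effect bank */
    int target;          /* currently targeted sound, or -1 if none */
    int last_played;     /* last sound started, or -1 if none */
} ToySoundPlayer;

/* Initialize the toy player with a bank of num_sounds effects. */
void toy_player_init(ToySoundPlayer *p, int num_sounds)
{
    assert(p != NULL && num_sounds > 0);
    p->num_sounds  = num_sounds;
    p->target      = -1;
    p->last_played = -1;
}

/* Handle one message. Returns 0 on success, -1 if the message is invalid. */
int toy_player_handle(ToySoundPlayer *p, Msg m)
{
    switch (m.type) {
    case MSG_SET_TARGET:
        if (m.sound_id < 0 || m.sound_id >= p->num_sounds)
            return -1;               /* no such sound in the bank */
        p->target = m.sound_id;
        return 0;
    case MSG_PLAY:
        if (p->target < 0)
            return -1;               /* nothing has been targeted yet */
        p->last_played = p->target;  /* "play" the target sound */
        return 0;
    }
    return -1;
}
```

In this model, a play message that arrives before any target has been set is rejected, which is why the game sends the two messages in that order.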

Note: Musical sequences can be stored as either type 0 MIDI files, or in a compressed MIDI format unique to the N64. It is very important that the programmer and the musician agree on which file format to use.

There are several components to the sound system. First, there are the samples that are stored in ROM. Accompanying the samples is a group of parameters used for playback (Key Mappings, Envelopes, Root Pitch, and so on). In order to process the sounds, a section of RAM must be allocated for the audio system. However, the N64 audio system differs from many other systems, which load whole groups of audio samples into RAM before playback. Instead, it loads only part of a sample at a time, as the need arises.

In software, there are two main sections. One part runs on the CPU and the other part runs on the RSP. The audio system must share the RSP with the graphics processing. The RSP is where most of the low-level processing takes place, and this is where the samples are mixed into an output stream. This output stream is then fed to a pair of DACs for stereo output.

There are four types of files used by the game for audio production: .ctl, .tbl, .seq, and .sbk. Before the game can play back either sound effects or music, the musician and sound designer must create these files. The .tbl files contain the compressed samples. The .ctl files contain the associated control information necessary for playback. .ctl files and .tbl files are always paired.

The .seq files are MIDI files that have all unneeded events removed, and the .sbk files are banks of .seq files. Typically, there will be at least one pair of .ctl and .tbl files for music and a separate pair for sound effects, although it is possible to put all sounds into one pair or, alternatively, to have numerous pairs.

Banks are stored in two files so that the raw audio data does not need to be loaded into RAM. Only the information pointing to the samples and the values of the playback parameters are loaded. When a sound is to be played, only a small portion of the sample is loaded into a RAM buffer. After that portion has been used for playback, it can be discarded, and the buffer reused for the next portion of the sample. The result is that a comparatively small amount of RAM is needed for sound.
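The buffer reuse described above can be sketched in a few lines of C. This is a minimal illustration, not the audio library's implementation: stream_sample, ChunkFn, and sum_bytes are hypothetical names, a plain array stands in for ROM, and memcpy stands in for the ROM-to-RAM transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Callback that consumes one portion of sample data (e.g. decode and mix). */
typedef void (*ChunkFn)(const unsigned char *chunk, size_t len, void *user);

/*
 * Stream rom_len bytes from rom through the small buffer buf,
 * at most buf_size bytes at a time. Returns the number of portions
 * processed. memcpy stands in for the ROM-to-RAM transfer.
 */
size_t stream_sample(const unsigned char *rom, size_t rom_len,
                     unsigned char *buf, size_t buf_size,
                     ChunkFn consume, void *user)
{
    size_t offset = 0, chunks = 0;
    assert(buf != NULL && buf_size > 0 && consume != NULL);

    while (offset < rom_len) {
        size_t n = rom_len - offset;
        if (n > buf_size)
            n = buf_size;
        memcpy(buf, rom + offset, n);  /* load only this portion */
        consume(buf, n, user);         /* use it for playback */
        offset += n;                   /* buffer is free for the next portion */
        chunks++;
    }
    return chunks;
}

/* Example consumer: adds up the bytes it is given. */
void sum_bytes(const unsigned char *chunk, size_t len, void *user)
{
    unsigned long *total = user;
    for (size_t i = 0; i < len; i++)
        *total += chunk[i];
}
```

Streaming a 10-byte sample through a 4-byte buffer with this sketch takes three passes (4, 4, and 2 bytes), and RAM use never exceeds the 4-byte buffer regardless of sample length.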

20.1.2 Typical Development Process

When creating audio for an N64 game, the musician typically follows these steps:

Create the samples as AIFF files.
Encode the samples into AIFC files.
Create a .inst file.
Compile the .inst file, together with the samples, into the bank files (.ctl and .tbl).
Create the MIDI sequence files.
Compile the MIDI sequence files into .seq files, and then compile the .seq files into a .sbk file.
Deliver the .tbl, .ctl, and .sbk files to the programmer.

20.1.3 Common Values

Throughout this document, and in .inst files, the following conventions are kept constant:

Middle C (MIDI note 60) is referred to as C4. (Some synthesizer and software manufacturers refer to Middle C as C3.)
Pan values range from 0 to 127, with 0 being full left, 64 center pan, and 127 full right.
Volumes are from 0 to 127, with 0 meaning there will be no sound, and 127 being full volume.
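The conventions above can be checked with two small helpers. This is a sketch under stated assumptions: midi_note_name and pan_position are hypothetical names, sharps are used for the black keys by choice, and the mapping of pan values onto the range -1.0 to 1.0 is only one possible normalization, not a description of how the audio library actually pans.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write the name of a MIDI note (0..127) into out, using the
 * convention that Middle C (note 60) is C4.
 * Returns 0 on success, -1 if the note is out of range.
 */
int midi_note_name(int note, char *out, size_t out_size)
{
    static const char *names[12] = {
        "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"
    };
    assert(out != NULL);
    if (note < 0 || note > 127)
        return -1;
    /* note 60 must land in octave 4, so octave = note / 12 - 1 */
    snprintf(out, out_size, "%s%d", names[note % 12], note / 12 - 1);
    return 0;
}

/*
 * Map a pan value (0 = full left, 64 = center, 127 = full right)
 * onto -1.0 .. 1.0. The two halves are scaled separately because
 * 64 is not the exact midpoint of 0..127. Out-of-range input is clamped.
 */
double pan_position(int pan)
{
    if (pan < 0)   pan = 0;
    if (pan > 127) pan = 127;
    return pan <= 64 ? (pan - 64) / 64.0 : (pan - 64) / 63.0;
}
```

With these definitions, note 60 is named C4 and note 69 is named A4, while pan values 0, 64, and 127 map to -1.0, 0.0, and 1.0.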