Q&A - CPU, I/O

QA1 Alignment in CPU memory access...
QA2 osGetTime() and osGetCount()...
QA3 How do I overlay?
QA4 Overlap error...
QA5 Is there a way to do high-speed RAM-to-RAM memory transfers?
QA6 Is access by the byte/word prohibited?
QA7 I'd like a detailed explanation of the RDRAM page register.
QA8 Is there a library that is compatible with the Memory Expansion Pak (High-Resolution Pak)? (199907)
QA9 The system hangs during floating-point multiplication and subtraction. (199910)


Q1 What is the alignment when the CPU accesses memory?

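A1 The R4300 has load/store instructions for 8-bit, 16-bit, 32-bit, and 64-bit accesses, each of which requires the address to be aligned to its own access size (8-bit alignment for 8-bit access, 16-bit for 16-bit, 32-bit for 32-bit, and 64-bit for 64-bit). Each of these executes in one cycle. For 32-bit and 64-bit accesses there are also instructions that work even when the address is not aligned (LWL/LWR and LDL/LDR, used in pairs), so an unaligned access takes two cycles.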

Because the compiler generates code using these instructions, accesses whose alignment matches the access size run faster and produce smaller code. Even so, misaligned accesses go through the CPU's internal cache, so their overhead is not very large.
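In portable C, a common way to read a value that may be misaligned is to copy it with memcpy and let the compiler pick the instruction sequence; with a correctly aligned pointer of the right type, a single aligned load can be used instead. The sketch below is a hedged illustration: the helper names are hypothetical, and the exact instructions emitted (for example, whether an LWL/LWR pair is used) depend on the compiler and optimization settings.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if p is aligned to 'size' bytes (size must be a power of two). */
int is_aligned(const void *p, uintptr_t size)
{
    return ((uintptr_t)p & (size - 1u)) == 0;
}

/* Read a 32-bit value from any address. memcpy avoids undefined
   behavior on misaligned pointers; on MIPS the compiler may turn
   this into an unaligned-load pair such as LWL/LWR. */
uint32_t load_u32_any(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value to any address. */
void store_u32_any(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Where data layout is under your control, keeping structures naturally aligned lets the faster one-cycle instructions be used.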



Q2 What is the content of the returned values from osGetTime() and osGetCount()?

A2 osGetCount() returns the value of the CPU (R4300) Count register unaltered. The value is a u32 and wraps around roughly every 1.5 minutes (the counter runs at 46.875 MHz, half the CPU clock, so 2^32 counts take about 91.6 seconds). To measure longer periods, the OS keeps an internal variable of type OSTime (equivalent to u64) and, at each retrace, adds the count elapsed during that frame, so the accumulated time can run for a very long time without wrapping.

osGetTime() uses this accumulated value to return the time elapsed since the most recent cold reset, in CPU count units.
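The accumulation described above can be sketched as follows. This is not the libultra source; the type and function names are hypothetical, and the code assumes the 46.875 MHz count rate. Because the delta is computed with unsigned 32-bit subtraction, it stays correct across one wraparound of the Count register, which is why updating at every retrace (far more often than every 91.6 seconds) is sufficient.

```c
#include <stdint.h>

typedef uint32_t u32;   /* same widths as the SDK's ultratypes.h */
typedef uint64_t u64;

/* Hypothetical 64-bit accumulator built from a wrapping 32-bit counter. */
typedef struct {
    u32 last_count;  /* Count register value at the previous update */
    u64 total;       /* ticks accumulated since tick_init */
} TickAccumulator;

void tick_init(TickAccumulator *acc, u32 now)
{
    acc->last_count = now;
    acc->total = 0;
}

/* Call at least once per wrap period, e.g. at every retrace.
   Unsigned subtraction gives the right delta even if the counter
   wrapped once since the last call. */
void tick_update(TickAccumulator *acc, u32 now)
{
    acc->total += (u32)(now - acc->last_count);
    acc->last_count = now;
}

/* Convert ticks to microseconds, assuming a 46,875,000 Hz counter.
   64/3000 equals 1,000,000/46,875,000 and overflows much later. */
u64 ticks_to_usec(u64 ticks)
{
    return ticks * 64u / 3000u;
}
```

For example, moving from a count of 0xFFFFFF00 to 0x00000100 adds 0x200 ticks, not a huge negative jump.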



Q3 How do I assign two or more programs to the same address in RAM and then switch between them at run time?

A3 This can be done by defining multiple waves (beginwave/endwave blocks) in the spec file.
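At run time, switching between programs that share an address amounts to copying the desired segment's image from ROM into the shared RAM region before calling into it. The sketch below simulates that in plain C so it can run anywhere; the arrays, OVERLAY_SIZE, and overlay_load() are hypothetical stand-ins. On hardware, the copy would be a PI DMA (osPiStartDma) from ROM followed by cache invalidation over the region (osInvalICache/osInvalDCache), and the ROM and RAM addresses would come from the layout defined in the spec file.

```c
#include <string.h>

/* Hypothetical stand-ins: two overlay images "in ROM" and one
   shared RAM region that both overlays are linked to run from. */
#define OVERLAY_SIZE 64

static unsigned char rom_ovl_a[OVERLAY_SIZE];
static unsigned char rom_ovl_b[OVERLAY_SIZE];
static unsigned char overlay_region[OVERLAY_SIZE]; /* shared address */
static int resident = -1; /* which overlay is loaded; -1 = none */

/* Load overlay 0 or 1 into the shared region unless it is already
   resident. On hardware this memcpy would be a ROM-to-RAM PI DMA,
   followed by invalidating the caches for the region. */
void overlay_load(int which)
{
    if (which == resident)
        return;
    const unsigned char *src = (which == 0) ? rom_ovl_a : rom_ovl_b;
    memcpy(overlay_region, src, OVERLAY_SIZE);
    resident = which;
}
```

Tracking the resident overlay avoids reloading the same code on every call.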



Q4 When I write a spec file following the overlay method described in the Programming Manual, makerom reports an overlap error.

A4 Using multiple waves is explained in Section 10-4 of the N64 Programming Manual. When using overlays, the error should no longer occur if you split the segments between beginwave and endwave into two waves, as described in that section.



Q5 Is there a way to do high-speed RAM-to-RAM memory transfers?

A5 There is no special high-speed function for RAM-to-RAM transfers; such copies can only be done by the CPU. However, because RDRAM is organized into 1-Mbyte banks, copying between different banks is faster than copying within the same bank. If transfer speed becomes a bottleneck, consider arranging the memory map so that the source and destination lie in different banks.
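To check whether a copy stays within one bank, an address can be shifted right by 20 bits (1 Mbyte) to get its bank index. The helpers below are hypothetical and assume the 1-Mbyte bank size stated above and KSEG0/KSEG1 addresses; on hardware, libultra's osVirtualToPhysical performs the virtual-to-physical conversion.

```c
#include <stdint.h>

#define RDRAM_BANK_SHIFT 20u /* 1-Mbyte banks */

/* Physical RDRAM address from a KSEG0 (0x80000000) or
   KSEG1 (0xA0000000) address: strip the segment bits. */
uint32_t to_physical(uint32_t vaddr)
{
    return vaddr & 0x1FFFFFFFu;
}

uint32_t rdram_bank(uint32_t vaddr)
{
    return to_physical(vaddr) >> RDRAM_BANK_SHIFT;
}

/* Nonzero if the banks touched by [src, src+len) and [dst, dst+len)
   overlap, i.e. the copy mixes reads and writes in the same bank.
   len must be at least 1. */
int copy_shares_bank(uint32_t src, uint32_t dst, uint32_t len)
{
    uint32_t s0 = rdram_bank(src), s1 = rdram_bank(src + len - 1u);
    uint32_t d0 = rdram_bank(dst), d1 = rdram_bank(dst + len - 1u);
    return s0 <= d1 && d0 <= s1;
}
```

A buffer that straddles a 1-Mbyte boundary counts as touching both banks, which is the conservative choice for this check.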



Q6 Is byte/word access to non-cache areas prohibited?

A6 8-bit/16-bit access to non-cache areas is prohibited.

Accesses to the N64's various control registers are accesses to uncached areas (single transfers), and only 32-bit accesses to 32-bit-aligned addresses are supported.
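When a single byte is needed from an uncached area, one approach consistent with this rule is to read the containing aligned 32-bit word and extract the byte in software. The helper below is a hypothetical sketch; it assumes big-endian byte order (as the N64 CPU uses) and takes a volatile pointer so each read is an actual 32-bit load.

```c
#include <stdint.h>

/* Read one byte at 'offset' from an uncached area using only aligned
   32-bit loads. 'base' must be 32-bit aligned. Byte 0 is the most
   significant byte of the word because the CPU runs big-endian. */
uint8_t uncached_read_u8(const volatile uint32_t *base, uint32_t offset)
{
    uint32_t word = base[offset >> 2];            /* one aligned 32-bit read */
    uint32_t shift = (3u - (offset & 3u)) * 8u;   /* big-endian byte lane */
    return (uint8_t)(word >> shift);
}
```

Writes are harder: a read-modify-write of a whole control register can have side effects, so it is usually better to keep register data in 32-bit units throughout.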



Q7 I noticed a passage in the manual stating that "since the RDRAM has a page register for every 1M, it is more efficient to divide the frame buffer and the Z buffer into banks," but it is not clear to me why this would be more efficient. It would be helpful if you could explain the flow of the memory access operation via the page register, or something similar.

A7 Internally, the RDRAM used in the N64 handles memory in 2048-byte units called "rows," and one bank consists of 512 rows.

One bank therefore holds 2048 bytes x 512 rows = 1 Mbyte, and one RDRAM chip consists of two banks:

1 Mbyte x 2 = 2 Mbytes

Since the N64 Control Deck contains two RDRAM chips, the total is:

2 Mbytes x 2 = 4 Mbytes

(More precisely, because 1 byte is 9 bits in the N64, this is 36 Mbits.)
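The address arithmetic above can be made concrete with a small helper that splits a physical RDRAM address into bank, row, and byte offset, using the 2048-byte row and 512-row bank sizes given here. It is an illustrative sketch with hypothetical names; it does not model how the chip actually decodes column addresses.

```c
#include <stdint.h>

#define ROW_BYTES     2048u  /* bytes per row */
#define ROWS_PER_BANK 512u   /* rows per 1-Mbyte bank */

typedef struct {
    uint32_t bank;    /* 0..3 across two 2-bank chips */
    uint32_t row;     /* 0..511 within the bank */
    uint32_t offset;  /* 0..2047 within the row */
} RdramLocation;

/* Split a physical RDRAM byte address into bank/row/offset. */
RdramLocation rdram_locate(uint32_t paddr)
{
    RdramLocation loc;
    loc.offset = paddr % ROW_BYTES;
    loc.row    = (paddr / ROW_BYTES) % ROWS_PER_BANK;
    loc.bank   = paddr / (ROW_BYTES * ROWS_PER_BANK);
    return loc;
}
```

For example, 0x003FFFFF, the last byte of the 4 Mbytes, maps to bank 3, row 511, offset 2047.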

Each bank of the RDRAM has a "page," which holds the "row" currently being accessed. When the RDRAM is accessed from the CPU (RCP), it is this page that is accessed. If the target address lies within the current page, the data can be accessed immediately; if not, the RDRAM must first make the row containing that address accessible before the data can be accessed, which causes a delay.

Because of this feature of RDRAM, it is better to divide the Z buffer and the frame buffer into separate banks.

When the RDP is rendering graphics under conditions in which the Z buffer is used, the Z buffer and frame buffer are alternately accessed as follows:

  1. Read frame buffer
  2. Read Z buffer
  3. Write to frame buffer (if necessary)
  4. Write to Z buffer (if necessary)

If the frame buffer and the Z buffer were in the same bank, the page would have to change on nearly every access, causing delays. If they are placed in separate banks, page changes become far less frequent, so memory access is much less likely to slow down.

In addition, because the RDP has a span buffer, it does not need to access RDRAM pixel by pixel when it reads and writes the frame buffer and Z buffer.
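The effect of bank placement can be illustrated with a simplified model in which each bank keeps one open row and any access to a different row in that bank counts as a page miss. This is a hypothetical sketch, not a timing model of the real RDRAM: it replays the four-step sequence listed above once per 16-bit pixel and ignores the span buffer, so it exaggerates the absolute miss count while showing the relative difference.

```c
#include <stdint.h>

#define ROW_BYTES  2048u
#define BANK_BYTES (1024u * 1024u)
#define NUM_BANKS  4u

/* Simplified model: one open row ("page") per bank. */
typedef struct {
    int32_t open_row[NUM_BANKS]; /* -1 = no row open yet */
    uint32_t misses;
} PageModel;

void page_model_init(PageModel *m)
{
    for (uint32_t i = 0; i < NUM_BANKS; i++)
        m->open_row[i] = -1;
    m->misses = 0;
}

/* Count a miss whenever the access targets a row other than the
   one currently open in its bank. */
void page_access(PageModel *m, uint32_t paddr)
{
    uint32_t bank = (paddr / BANK_BYTES) % NUM_BANKS;
    int32_t row = (int32_t)((paddr % BANK_BYTES) / ROW_BYTES);
    if (m->open_row[bank] != row) {
        m->misses++;
        m->open_row[bank] = row;
    }
}

/* Replay read color, read Z, write color, write Z for each pixel. */
uint32_t count_misses(uint32_t fb_base, uint32_t z_base, uint32_t npixels)
{
    PageModel m;
    page_model_init(&m);
    for (uint32_t i = 0; i < npixels; i++) {
        uint32_t fb = fb_base + i * 2u;
        uint32_t z  = z_base  + i * 2u;
        page_access(&m, fb); /* 1. read frame buffer */
        page_access(&m, z);  /* 2. read Z buffer */
        page_access(&m, fb); /* 3. write frame buffer */
        page_access(&m, z);  /* 4. write Z buffer */
    }
    return m.misses;
}
```

With the frame buffer at physical address 0 and the Z buffer 512 Kbytes later in the same bank, count_misses(0, 0x80000, 100) gives 400 misses, since every access switches rows; placing the Z buffer at 0x100000, in the next bank, gives 2.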



Q8 Games that support the Memory Expansion Pak are now being developed. Are there any libraries that support the Memory Expansion Pak, or can such games run on OS 2.0J, which is currently the newest version?

A8 Despite its name, the Memory Expansion Pak is simply 4 Mbytes of additional RAM, so you can proceed by checking only the RAM capacity.

Please refer to "12. RDRAM Capacity" in the Programming Cautions.
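In practice, the check can be as simple as comparing the detected RAM size against 8 Mbytes. The sketch below is hypothetical; on hardware the size would come from libultra's osMemSize variable, and the procedure in "12. RDRAM Capacity" should take precedence over this example.

```c
#include <stdint.h>

typedef uint32_t u32; /* as in the SDK's ultratypes.h */

/* 4 Mbytes built in; 8 Mbytes with the Memory Expansion Pak. */
#define RAM_SIZE_BASE     0x00400000u
#define RAM_SIZE_EXPANDED 0x00800000u

/* On hardware, pass the detected size (e.g. osMemSize). */
int has_expansion_ram(u32 mem_size)
{
    return mem_size >= RAM_SIZE_EXPANDED;
}

/* First address past the end of RDRAM as seen through KSEG0. */
u32 ram_top_kseg0(u32 mem_size)
{
    return 0x80000000u + mem_size;
}
```

A game can then place extra buffers between 0x80400000 and ram_top_kseg0() only when has_expansion_ram() reports the larger size.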



Q9 While debugging, I recently had a hang at the following assembler code:

MUL.S f0,f0,f2
SUB.S f0,f0,f2
I can understand a hang on division by zero, but I don't see why multiplication or subtraction would hang. Please tell me what might be causing this.

A9 When it hangs, check the values in the floating-point registers f0 and f2. On the Partner N64 you can view them with the fpu command or in the menu window.

The smallest normalized single-precision floating-point value is 1.17549435e-38. A nonzero operand smaller in magnitude than this is a denormalized number, which causes an unimplemented operation exception and can result in a hang.
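One way to guard against this is to check operands for the denormal bit pattern (exponent field zero, fraction nonzero) and replace them with zero before they reach the FPU. This is a hedged sketch with hypothetical function names; it inspects the bits with integer operations instead of comparing floats, so the check itself never passes a denormal to an FPU arithmetic or compare instruction.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if x is denormalized: exponent field zero, fraction
   nonzero, i.e. nonzero but smaller in magnitude than
   1.17549435e-38. Uses integer operations on the bit pattern only. */
int is_denormal(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}

/* Replace a denormal with a zero of the same sign. Normal values,
   zeros, infinities, and NaNs pass through unchanged. */
float flush_denormal(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    if ((bits & 0x7F800000u) == 0)
        bits &= 0x80000000u; /* keep sign, clear fraction */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Applying flush_denormal() to values read from external data or produced by long chains of small products keeps denormals out of MUL.S and SUB.S.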

In addition, on the N64 the floating-point control/status register is set with FS=1 and V=1: denormalized results are flushed to zero, and the invalid operation exception is enabled.

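Under IEEE 754, the invalid operation exception occurs in cases such as 0 × ∞ in a multiplication, ∞ - ∞ in a subtraction, and any arithmetic on a signaling NaN.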

Division-by-zero and overflow exceptions, on the other hand, are not enabled by default, so calculations that would raise them instead produce ±infinity, +0, or -0.

In other words, in the cases above a multiplication can set off a lock-up. If you also enable the division-by-zero and overflow exceptions, you can detect when those operations occur. The "fault" sample in the demos is a good reference for how to do this.
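The register settings discussed above can be expressed as bit masks. This sketch assumes the standard MIPS FCR31 layout (overflow, divide-by-zero, and invalid operation enable bits at 9, 10, and 11; the unimplemented-operation cause bit at 17; FS at 24). The macro and function names are local to this example, not SDK definitions; on hardware the register is read and written with the cfc1/ctc1 instructions, and the fault demo remains the better reference for the actual procedure.

```c
#include <stdint.h>

/* FCR31 bits on MIPS R4000-family CPUs such as the R4300.
   Names are local to this sketch. */
#define FCSR_ENABLE_O (1u << 9)   /* overflow exception enable */
#define FCSR_ENABLE_Z (1u << 10)  /* divide-by-zero exception enable */
#define FCSR_ENABLE_V (1u << 11)  /* invalid operation exception enable */
#define FCSR_CAUSE_E  (1u << 17)  /* cause: unimplemented operation */
#define FCSR_FS       (1u << 24)  /* flush denormalized results to zero */

/* The setting described above: FS=1, V=1. */
#define FCSR_N64_DEFAULT (FCSR_FS | FCSR_ENABLE_V)

/* Add the divide-by-zero and overflow enables so those cases trap
   instead of silently producing +/-infinity. */
uint32_t fcsr_enable_div_and_overflow(uint32_t fcsr)
{
    return fcsr | FCSR_ENABLE_Z | FCSR_ENABLE_O;
}

/* Nonzero if the recorded cause is an unimplemented operation,
   e.g. a denormalized operand. */
int fcsr_is_unimplemented(uint32_t fcsr)
{
    return (fcsr & FCSR_CAUSE_E) != 0;
}
```

In an exception handler, checking the cause bits this way distinguishes a denormal-operand trap from an invalid-operation trap such as 0 × ∞.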
