QA1 | What are the differences between Fast3D / Fast3D.dram / Fast3D.fifo? |
QA2 | Can the microcode be switched in one interrupt? |
QA3 | How do I determine optimum buffer space for the fifo microcode? |
QA4 | What happened to the Z Sort microcode? |
QA5 | Unsupported microcodes... |
QA6 | How do I use the JPEG microcode? |
QA7 | The microcode's rendering processing hangs up... |
QA8 | DEBUG_DL_PTR() (Microcode for F3DEX2 debug) |
Q1 There are currently [three] types of Fast3D microcodes - Fast3D / Fast3D.dram / Fast3D.fifo - but what are the concrete differences between them?
A1 Fast3D / Fast3D.dram / Fast3D.fifo are all full-feature microcodes and they are essentially identical. The place where they differ is in the method of handling data between the RSP and RDP.
Fast3D directly sends the RDP display list created by the RSP to the RDP, using DMEM (the RSP's internal work memory) as a buffer.
In contrast to this, Fast3D.dram and Fast3D.fifo operate by reserving an optional RDP display list buffer (output_buffer) inside RDRAM and having the RDP read this.
Furthermore, with Fast3D.dram the RDP starts only after the RSP has written the entire RDP display list to the output_buffer, whereas with Fast3D.fifo the output_buffer is used in a FIFO fashion, so the RDP operates without waiting for the RSP to finish.
With Fast3D, the buffer space in DMEM is small, so when RDP processing backs up, the RSP cannot add more data and has to wait.
Reserving a big enough output_buffer with Fast3D.dram eliminates the RSP wait, but total RCP processing is not very efficient, because it takes (RSP processing time) + (RDP processing time). Problems also develop in debugging applications. For these reasons, only the fifo version is supported in the upwardly compatible F3DEX series currently released.
The transfer of data between the RSP and RDP is most efficient with Fast3D.fifo. As for the relationship between output_buffer size and performance, start with a tentative output_buffer size (around 128 KB to 256 KB). If performance is insufficient, enlarge the buffer; if performance is sufficient, shrink it. Repeat this to find the optimum size for each game. (Use the upwardly compatible gspF3DEX.fifo.)
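As a rough illustration of the timing difference described in A1, the following C sketch models the two cases. It is a toy model, not SDK code: the function names and the arbitrary time units are assumptions made for illustration. The fifo figure is an idealized best case in which RSP and RDP work overlaps completely. A real frame would fall somewhere between the two results.

```c
/* Toy timing model (not SDK code). Times are in arbitrary units. */

/* Fast3D.dram: the RDP starts only after the RSP has written the whole
 * RDP display list, so the two processing times simply add up. */
static unsigned dram_total_time(unsigned rsp_time, unsigned rdp_time)
{
    return rsp_time + rdp_time;
}

/* Fast3D.fifo, idealized: the RDP consumes commands while the RSP is
 * still producing them. Assuming perfect overlap and no stalls, the
 * slower of the two units determines the total. */
static unsigned fifo_best_case_time(unsigned rsp_time, unsigned rdp_time)
{
    return (rsp_time > rdp_time) ? rsp_time : rdp_time;
}
```

For example, with an RSP time of 6 and an RDP time of 8, the dram model gives 14 while the idealized fifo model gives 8.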
Q2 Can the microcode be switched in one interrupt (VI retrace)? For instance, if the player is calculated and rendered using the F3DLX.Rej.fifo microcode, can the ground be drawn with F3DEX.fifo? Since it is processing in one interrupt, I don't know when the switch should be made, or how it should be made. Can it even be done?
A2 It can be done. Microcode switches within one interrupt include sound->graphics switching, in which the scheduler makes the graphics task yield, and graphics->graphics switching within the graphics task. Your question concerns graphics->graphics switching. Until now, this has been done by switching tasks from the CPU. However, the F3DEX1.21 microcode group includes a function for loading microcode from within the display list, so you should be able to switch relatively easily. (Of course, loading too often results in a lot of overhead.)
Q3 I am using the fifo microcode, but how should I determine the buffer size in the main memory to which the RSP writes temporarily? Assuming that I can optimize this buffer, on what kind of data should I base the determination?
A3 About 100 KB is reserved in Super Mario 64, but this cannot generally be called the optimum buffer size for fifo. From a speed standpoint, the most efficient size is one at which the RSP is never made to wait by the RDP.
However, a buffer of that size is rather large, which means that a lot of memory must be reserved in RDRAM.
In other words, the size is determined by this trade-off. Decide on a tentative buffer amount (128 KB, for example), then check roughly how much the RCP requires while the scene you wish to display is being output.
Then adjust the buffer size based on this to find the optimum size.
Incidentally, the following are examples for the F3DEX2 series (excerpted from the F3DEX2 readme file).
gspF3DEX2(.NoN) | 0x410 bytes |
gspF3DEX2.Rej | 0x600 bytes |
gspF3DLX2.Rej | 0x600 bytes |
gspL3DEX2 | 0x540 bytes |
gspS2DEX2 | 0x800 bytes |
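The tuning procedure in A3 (pick a tentative size, check whether the RSP has to wait, then adjust) can be tried out offline with a toy model such as the one below. This is only a sketch under stated assumptions: the function names, the per-step "burst" list standing in for RSP output, and the fixed drain rate standing in for the RDP are all hypothetical, and real behavior has to be measured on hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Toy FIFO model (not SDK code). On each time step the producer
 * (standing in for the RSP) tries to write its next burst of bytes and
 * the consumer (standing in for the RDP) removes up to drain_per_step
 * bytes. Returns the number of steps on which the producer had to wait
 * because the burst did not fit. */
static unsigned count_rsp_stalls(const unsigned *burst, size_t n_bursts,
                                 unsigned drain_per_step, unsigned fifo_size)
{
    unsigned fill = 0;
    unsigned stalls = 0;
    size_t i = 0;

    assert(drain_per_step > 0);
    while (i < n_bursts) {
        /* Each burst must fit in an empty FIFO, or it could never be written. */
        assert(burst[i] <= fifo_size);
        if (burst[i] <= fifo_size - fill) {
            fill += burst[i];
            i++;
        } else {
            stalls++;              /* no room: the producer waits this step */
        }
        fill -= (fill < drain_per_step) ? fill : drain_per_step;
    }
    return stalls;
}

/* Try candidate sizes in the order given (smallest first) and return
 * the first one that never stalls the producer, or 0 if none does. */
static unsigned pick_fifo_size(const unsigned *burst, size_t n_bursts,
                               unsigned drain_per_step,
                               const unsigned *candidates, size_t n_candidates)
{
    size_t c;

    for (c = 0; c < n_candidates; c++) {
        if (count_rsp_stalls(burst, n_bursts, drain_per_step, candidates[c]) == 0)
            return candidates[c];
    }
    return 0;
}
```

With bursts of 400, 400, 100, and 100 bytes and a drain of 200 bytes per step, this model stalls once with a 512-byte FIFO and not at all with a 600-byte FIFO, so the smaller size costs speed and the larger size costs memory, which is the trade-off described above.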
Q4 What is the current status of the Z Sort microcode?
A4 Its release has been announced. It is now a matter of distributing the Z Sort microcode, but the Z Sort microcode libraries have not yet been completed. Consequently, the microcode still hasn't been officially released.
There is no official release schedule for the Z Sort microcode. There are various reasons for this, including the following.
The latest version, 0.34, is available as a beta evaluation version at NTSC-ONLINE.
Note: While you can expect that using this microcode and library will be effective for games which currently cause bottlenecks in the RDP, please note that it will have no effect on those with bottlenecks in the CPU and/or RSP. In addition, even if the RDP processing is affected, this cannot be guaranteed to improve the overall performance of the game application. Please, keep these things in mind when using and evaluating this microcode.
Since the current version 0.34 is a beta version, please be forewarned that the specifications may change substantially in the future. While the majority of this is outside the scope of support, it would be extremely helpful if you would contact NTSC regarding bugs, etc.
Q5 Which microcodes are currently officially supported?
A5 As of February 1, 1999
(Officially supported microcodes)
(Not supported)
(Supported, but as beta versions) ... accepting test reports, bugs, and requests
Q6 Where can I find the "JPEG Microcode" mentioned in the All Manual?
A6 The JPEG microcode is currently in beta version release. The SGI version has been uploaded to NTSC-ONLINE and can be found there.
Q7 When I modify the program slightly, the microcode sometimes renders correctly and sometimes hangs up.
A7 There are two causes for the program hanging while waiting for microcode processing to finish.
One is that there is a problem in the contents of the display list which is created, causing the RCP to crash.
The other is that the contents of the display list were changed before rendering processing by the RSP was completely done, causing an improper display list to be processed.
To avoid the second case, write the program so that the vertex data and matrix data used by the display list are preserved until processing by the RSP is completed.
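One common way to keep display list data intact until the RSP is done is to alternate between two sets of per-frame data. The following C sketch shows only the bookkeeping. It is not SDK code: FrameData, begin_frame(), submit_frame(), and on_rsp_done() are hypothetical names, and the point at which on_rsp_done() is called (whatever signal your program uses to learn that RSP processing has finished) is an assumption left to the application.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_FRAME_SETS 2   /* assumption: two sets suffice for this sketch */

/* Hypothetical per-frame storage for everything a display list refers to. */
typedef struct {
    float modelview[4][4];   /* stand-in for matrix (and vertex) data */
    int   in_use_by_rsp;     /* 1 from submission until the RSP is done */
} FrameData;

static FrameData frame_sets[NUM_FRAME_SETS];
static int build_index = 0;  /* set the CPU fills in next */

/* Return the set the CPU may write to, or NULL if the RSP still owns it
 * (in which case the caller must wait instead of overwriting the data). */
static FrameData *begin_frame(void)
{
    FrameData *f = &frame_sets[build_index];
    return f->in_use_by_rsp ? NULL : f;
}

/* Hand the set to the RSP and move on to the other set. */
static void submit_frame(FrameData *f)
{
    assert(f == &frame_sets[build_index]);
    f->in_use_by_rsp = 1;
    build_index = (build_index + 1) % NUM_FRAME_SETS;
}

/* Call when RSP processing of this set's display list has completed. */
static void on_rsp_done(FrameData *f)
{
    f->in_use_by_rsp = 0;
}
```

Because the CPU only ever writes to a set that begin_frame() returned, data belonging to a display list that is still being processed is never modified.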
Q8 When I use the F3DEX2 ver. 2.08 debug microcode, the address of the display list currently being processed is returned by the macro DEBUG_DL_PTR(), but can I use this even when rspboot or the audio microcode is loaded?
A8 F3DEX2d writes the physical address of each GBI to the DMEM work memory before processing that GBI. Since no other microcode performs this processing, reading that work memory with DEBUG_DL_PTR() while another microcode is loaded does not return anything meaningful.