The standard gspFast3D microcode contains very precise subpixel x,y calculations for antialiasing and precise s,t calculations for textures that cover a large screen area. This precision is required for large terrain or background polygons.
This microcode is full featured, including lighting, clipping, and texture coordinate generation (reflection mapping).
The geometry microcode has a local vertex cache. Loading a block of vertices can amortize the cost of per-vertex calculations (transformation, lighting, texture coordinate computation).
Careful organization of the database can minimize these calculations. In general, it is best to load the vertex cache with as many vertices as possible, then render all the geometry which uses those vertices.
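The effect of batching can be estimated on the host with a small simulation. The following is only a sketch in portable C, not microcode or library code: `count_vertex_loads`, `MAX_CACHE`, and the cache sizes passed to it are hypothetical names and values chosen for illustration, and the "start a new batch when the next triangle does not fit" policy is an assumption about how a display list might be organized.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE 64  /* upper bound for this sketch only */

/* Returns 1 if vertex index v is already resident in the simulated cache. */
static int in_cache(const int *cache, int used, int v)
{
    for (int i = 0; i < used; i++) {
        if (cache[i] == v) return 1;
    }
    return 0;
}

/*
 * Counts how many vertices must be loaded (transformed and lit) to draw an
 * indexed triangle list in order, when the cache is refilled from empty
 * each time the next triangle no longer fits.
 */
int count_vertex_loads(const int (*tris)[3], size_t ntris, int cache_size)
{
    int cache[MAX_CACHE];
    int used = 0;
    int loads = 0;

    assert(cache_size >= 3 && cache_size <= MAX_CACHE);

    for (size_t t = 0; t < ntris; t++) {
        /* Count this triangle's distinct vertices that are not resident. */
        int missing = 0;
        for (int k = 0; k < 3; k++) {
            int v = tris[t][k];
            int repeated = 0;
            for (int j = 0; j < k; j++) {
                if (tris[t][j] == v) repeated = 1;
            }
            if (!repeated && !in_cache(cache, used, v)) missing++;
        }

        /* No room left: discard the batch and start a new one. */
        if (used + missing > cache_size) used = 0;

        for (int k = 0; k < 3; k++) {
            int v = tris[t][k];
            if (!in_cache(cache, used, v)) {
                cache[used++] = v;
                loads++;
            }
        }
    }
    return loads;
}
```

For a fan of four triangles that share vertex 0, this model reports 12 vertex loads when only one triangle fits at a time (cache_size of 3) and 6 loads when the whole fan fits in one batch.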
For non-dynamic lighting effects, the lighting can be computed at modeling time and stored as vertex colors, so that the model is rendered with simple Gouraud shading.
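A modeling-time bake of this kind might look like the sketch below. It assumes a single directional light, a simple ambient plus diffuse model, and unit-length input vectors; the types `Vec3` and `Rgb8` and the function `bake_vertex_color` are hypothetical and belong to an offline tool, not to the microcode.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { uint8_t r, g, b; } Rgb8;

/* Combines ambient and scaled diffuse for one channel, rounded and clamped. */
static uint8_t shade_channel(uint8_t ambient, uint8_t diffuse, float n_dot_l)
{
    float c = (float)ambient + (float)diffuse * n_dot_l + 0.5f;
    return (uint8_t)(c > 255.0f ? 255.0f : c);
}

/*
 * Computes a static vertex color from the vertex normal and the direction
 * toward the light. Both vectors are assumed to be unit length.
 */
Rgb8 bake_vertex_color(Vec3 normal, Vec3 to_light, Rgb8 ambient, Rgb8 diffuse)
{
    float d = normal.x * to_light.x
            + normal.y * to_light.y
            + normal.z * to_light.z;
    Rgb8 out;

    /* A dot product above 1 means the inputs were not normalized. */
    assert(d <= 1.001f);

    /* Surfaces facing away from the light receive ambient only. */
    if (d < 0.0f) d = 0.0f;

    out.r = shade_channel(ambient.r, diffuse.r, d);
    out.g = shade_channel(ambient.g, diffuse.g, d);
    out.b = shade_channel(ambient.b, diffuse.b, d);
    return out;
}
```

The resulting colors would be written into the model's vertex data once, so no lighting work remains at run time for those vertices.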
The gspFast3D microcode does not have enough instruction space to hold both the lighting and the clipping code. It swaps them in from DRAM using a least recently used algorithm. Since lighting occurs during vertex load and clipping occurs during polygon drawing, there are natural blocks of work following each microcode load. Loading just a few vertices and then drawing a small number of triangles will cause the gspFast3D microcode loading to "thrash".
Note: We have not seen performance degradation due to this swap in any games. Large block DMA transfers (such as microcode loads) are very efficient.
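The difference between interleaved and batched work can be illustrated with a deliberately simplified host-side model. It assumes a single overlay slot that holds either the lighting code or the clipping code, that every vertex load needs lighting, and that every counted triangle needs clipping; the real replacement policy is more involved. The names `Op` and `count_overlay_loads` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Each operation needs one overlay: lighting for loads, clipping for draws. */
typedef enum {
    OP_LOAD_VERTICES,    /* lit vertex load: needs the lighting code  */
    OP_DRAW_CLIPPED_TRI  /* clipped triangle: needs the clipping code */
} Op;

/*
 * Counts overlay loads for a command sequence, assuming one overlay slot
 * that is reloaded whenever the required code is not resident.
 */
int count_overlay_loads(const Op *ops, size_t n)
{
    int resident = -1;  /* nothing loaded yet */
    int loads = 0;

    assert(ops != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int needed = (int)ops[i];
        if (needed != resident) {
            resident = needed;
            loads++;
        }
    }
    return loads;
}
```

In this model, alternating one vertex load with one clipped triangle three times costs six overlay loads, while three vertex loads followed by three clipped triangles costs two.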
The cost of geometric processing in the RSP is listed below in order of decreasing performance.
When possible, use textures to represent complex geometry. The RCP is designed to draw high-quality textured primitives. Achieving complexity by using additional geometry will always be slower than using textures.
When objects are far away or are animating rapidly, you can render them at a lower level of detail (LOD) without a noticeable loss of visual quality.
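One possible way to choose a level of detail is sketched below. The function `select_lod`, the distance thresholds, and the rule of dropping one extra level for fast-moving objects are all assumptions made for illustration; an actual title would tune these against what is visible on screen.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the index of the model to draw, where 0 is the most detailed.
 * max_distance[i] is the farthest distance at which level i is still used;
 * the values must be in increasing order. Objects beyond the last threshold
 * use the coarsest level.
 */
size_t select_lod(const float *max_distance, size_t nlevels,
                  float distance, int fast_animation)
{
    size_t level;

    assert(nlevels > 0);

    level = nlevels - 1;
    for (size_t i = 0; i < nlevels; i++) {
        if (distance <= max_distance[i]) {
            level = i;
            break;
        }
    }

    /* Rapid motion hides detail, so step down one level if one exists. */
    if (fast_animation && level + 1 < nlevels) level++;

    return level;
}
```

With thresholds of 100, 400, and 1600 units, an object at distance 250 would use level 1 when still and level 2 when animating rapidly.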