First big optimisation in sprite functions.
I’ve been working for a while on optimising some functions that copy tables of data. I found out that the C compiler wasn’t optimising them at all. Here is the original piece of code:
counter = data[frame].spriteNum;
for(i=0; i<counter; i++) {
    //x = data[frame].data[i].nameLow & 0x0f;
    //y = data[frame].data[i].nameLow >> 4;
    spriteData.data[i+offset].HPos = data[frame].data[i].HPos + 0x76;
    spriteData.data[i+offset].VPos = data[frame].data[i].VPos + 0x80;
    spriteData.data[i+offset].nameLow = data[frame].data[i].nameLow;
    spriteData.data[i+offset].priority = data[frame].data[i].priority;
    spriteData.data[i+offset].color = data[frame].data[i].color;
}
// set big sprite
spriteData.prop[offset].properties = 0xaa;
spriteData.prop[offset+1].properties = 0xaa;
This was consuming almost 25% of the CPU time between each VBlank. The new code is about 30 times faster:
heroPreparedSpriteData *myData;
OBJECTData *myObjectData;
OBJECTData *currentSpriteData;
OBJECTProp *currentSpriteProp;

// init with base address
myData = data;
// set to current frame
for(i=0; i<frame; i++) {
    myData++;
}

// init with base address
currentSpriteData = (OBJECTData*) &spriteData.data;
currentSpriteProp = (OBJECTProp*) &spriteData.prop;
// set to current offset
for(i=0; i<offset; i++) {
    currentSpriteData++;
}

counter = myData->spriteNum;
// init start of the data array
// we assume we always start at index 0
myObjectData = (OBJECTData*) &(myData->data);
for(i=0; i<counter; i++) {
    currentSpriteData->HPos = myObjectData->HPos + 0x76;
    currentSpriteData->VPos = myObjectData->VPos + 0x80;
    currentSpriteData->nameLow = myObjectData->nameLow;
    currentSpriteData->priority = myObjectData->priority;
    currentSpriteData->color = myObjectData->color;
    // update myObjectData address
    myObjectData++;
    currentSpriteData++;
}

// set big sprite for the 8 first sprites
currentSpriteProp->properties = 0xaa;
currentSpriteProp++;
currentSpriteProp->properties = 0xaa;
C compilers seem to handle tables really badly.
So now I’m going to finish writing the sprite routines, since I finally have acceptable performance. Why acceptable? Because I’m sure there is a much more efficient way to do all this. It’s basically just copying data, so I should put myData in ROM, DMA it to RAM, and from there just update some values like HPos and VPos.
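The copy-then-patch idea could look roughly like the sketch below. It is only an illustration under assumptions: memcpy stands in for the DMA transfer (the real transfer is hardware-specific and not shown), and the struct layouts, field widths, MAX_FRAME_SPRITES and the function name copyFrameAndPatch are hypothetical, not the project's actual definitions.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: the real OBJECTData may have different fields or widths. */
typedef struct {
    uint8_t HPos;
    uint8_t VPos;
    uint8_t nameLow;
    uint8_t priority;
    uint8_t color;
} OBJECTData;

#define MAX_FRAME_SPRITES 8   /* hypothetical per-frame limit */

/* Assumed layout of one prepared animation frame. */
typedef struct {
    uint8_t spriteNum;
    OBJECTData data[MAX_FRAME_SPRITES];
} heroPreparedSpriteData;

/* Copy one prepared frame into dest as a single block, then patch only
   the position fields. memcpy is a stand-in for the DMA transfer. */
static void copyFrameAndPatch(OBJECTData *dest,
                              const heroPreparedSpriteData *frame)
{
    uint8_t i;
    uint8_t count = frame->spriteNum;

    /* Block copy: nameLow, priority and color arrive unchanged. */
    memcpy(dest, frame->data, (size_t)count * sizeof(OBJECTData));

    /* Only the positions need per-sprite work (8-bit wrap-around). */
    for (i = 0; i < count; i++) {
        dest[i].HPos = (uint8_t)(dest[i].HPos + 0x76);
        dest[i].VPos = (uint8_t)(dest[i].VPos + 0x80);
    }
}
```

The point of the split is that the per-sprite loop shrinks to two additions, while the bulk of the bytes move in one block operation.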
I’ll keep you updated anyway …
See ya, Lint
September 17th, 2008 at 10:29 am
indeed, i’m a bit curious to know why you haven’t used a DMA transfer …
Another trick you could use to boost performance would be to keep a bitvector of which sprites have changed and which have not (oh, unless you mostly have sprites that need to follow a scrolling screen, in which case their coordinates require an update almost every frame and it might not be worth the hassle of flagging updated sprites)
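A minimal sketch of such a bitvector, assuming one flag bit per sprite. The names dirtyBits, markDirty, isDirty, flushDirty and the SpritePos struct are made up for illustration; only the figure of 128 hardware sprites comes from the SNES itself.

```c
#include <stdint.h>

#define MAX_SPRITES 128   /* the SNES has 128 hardware sprites */

/* Hypothetical per-sprite payload, just enough for the example. */
typedef struct {
    uint8_t x;
    uint8_t y;
} SpritePos;

/* One bit per sprite: 1 means "changed since the last flush". */
static uint8_t dirtyBits[MAX_SPRITES / 8];

static void markDirty(uint8_t index)
{
    if (index >= MAX_SPRITES) {
        return;   /* ignore out-of-range indices */
    }
    dirtyBits[index >> 3] |= (uint8_t)(1u << (index & 7));
}

static int isDirty(uint8_t index)
{
    if (index >= MAX_SPRITES) {
        return 0;
    }
    return (dirtyBits[index >> 3] >> (index & 7)) & 1;
}

/* Copy only the flagged entries from src to dest and clear their flags.
   Returns how many entries were copied. */
static uint8_t flushDirty(SpritePos *dest, const SpritePos *src)
{
    uint8_t copied = 0;
    uint8_t byte, bit;

    for (byte = 0; byte < MAX_SPRITES / 8; byte++) {
        if (dirtyBits[byte] == 0) {
            continue;   /* eight clean sprites skipped with one test */
        }
        for (bit = 0; bit < 8; bit++) {
            if (dirtyBits[byte] & (1u << bit)) {
                uint8_t index = (uint8_t)((byte << 3) | bit);
                dest[index] = src[index];
                copied++;
            }
        }
        dirtyBits[byte] = 0;
    }
    return copied;
}
```

When most sprites move every frame, the flag tests cost more than they save, which is the caveat mentioned above.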
September 17th, 2008 at 10:31 am
also, you could easily replace
// init with base address
myData = data;
// set to current frame
for(i=0; i<frame; i++) {
myData++;
}
with
myData = data+frame;
that’s the very definition of adding an integer to a pointer.
September 17th, 2008 at 1:47 pm
yes and no … This really depends on the CPU and the compiler. When I do: myData = data+frame; it calls a C function that multiplies frame by the size of the data, since the 65816 has no multiplication instruction. Now it’s true there are 3 specialized registers on the SNES CPU that allow multiplication. I can maybe try that. The multiplication function offered by the compiler is really not efficient at all.
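One possible way to avoid the runtime multiplication altogether, sketched here under assumptions, is to build a table of frame pointers once and index it afterwards. The struct layouts, FRAME_COUNT, MAX_FRAME_SPRITES and the names frameTable, initFrameTable and getFrame are hypothetical. Whether the table lookup really beats the compiler's multiply routine on this toolchain is untested.

```c
#include <stdint.h>

/* Assumed layouts: the real structs may differ. */
typedef struct {
    uint8_t HPos;
    uint8_t VPos;
    uint8_t nameLow;
    uint8_t priority;
    uint8_t color;
} OBJECTData;

#define MAX_FRAME_SPRITES 8    /* hypothetical */
#define FRAME_COUNT       16   /* hypothetical number of frames */

typedef struct {
    uint8_t spriteNum;
    OBJECTData data[MAX_FRAME_SPRITES];
} heroPreparedSpriteData;

/* One precomputed pointer per frame. */
static heroPreparedSpriteData *frameTable[FRAME_COUNT];

/* Walk the array once with ++, which adds a constant size each step,
   so no general multiplication is needed. base must point to an array
   of at least FRAME_COUNT elements. */
static void initFrameTable(heroPreparedSpriteData *base)
{
    uint8_t i;
    heroPreparedSpriteData *p = base;

    for (i = 0; i < FRAME_COUNT; i++) {
        frameTable[i] = p;
        p++;
    }
}

/* Per-frame lookup: indexing an array of pointers only scales the index
   by the pointer size, which is typically a shift.
   The caller must keep frame below FRAME_COUNT. */
static heroPreparedSpriteData *getFrame(uint8_t frame)
{
    return frameTable[frame];
}
```

The table costs a few bytes of RAM per frame and trades the per-call loop or multiply for a single indexed load.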
September 22nd, 2008 at 3:06 pm
even less efficient than adding N times the value X to get N*X ?
September 22nd, 2008 at 3:08 pm
yes really … totally inefficient function.