I’ve been working for a while on optimising some functions that copy tables of data. I found out that the C compiler wasn’t optimising them at all. Here is the original piece of code:
counter = data[frame].spriteNum;
for(i=0; i<counter; i++) {
    //x = data[frame].data[i].nameLow & 0x0f;
    //y = data[frame].data[i].nameLow >> 4;
    spriteData.data[i+offset].HPos =
        data[frame].data[i].HPos + 0x76;
    spriteData.data[i+offset].VPos =
        data[frame].data[i].VPos + 0x80;
    spriteData.data[i+offset].nameLow =
        data[frame].data[i].nameLow;
    spriteData.data[i+offset].priority =
        data[frame].data[i].priority;
    spriteData.data[i+offset].color =
        data[frame].data[i].color;
}
// set big sprite
spriteData.prop[offset].properties = 0xaa;
spriteData.prop[offset+1].properties = 0xaa;
This was consuming almost 25% of the CPU time between each VBlank. The new code is about 30 times faster:
heroPreparedSpriteData *myData;
OBJECTData *myObjectData;
OBJECTData *currentSpriteData;
OBJECTProp *currentSpriteProp;
// init with base address
myData = data;
// set to current frame
for(i=0; i<frame; i++) {
    myData++;
}
// init with base address
currentSpriteData = (OBJECTData*) &spriteData.data;
currentSpriteProp = (OBJECTProp*) &spriteData.prop;
// advance to the sprite offset
for(i=0; i<offset; i++) {
    currentSpriteData++;
}
counter = myData->spriteNum;
// init start of the data array
// we assume we always start at index 0
myObjectData = (OBJECTData*) &(myData->data);
for(i=0; i<counter; i++) {
    currentSpriteData->HPos = myObjectData->HPos + 0x76;
    currentSpriteData->VPos = myObjectData->VPos + 0x80;
    currentSpriteData->nameLow = myObjectData->nameLow;
    currentSpriteData->priority = myObjectData->priority;
    currentSpriteData->color = myObjectData->color;
    // advance both pointers to the next entry
    myObjectData++;
    currentSpriteData++;
}
// set big sprite for the first 8 sprites
currentSpriteProp->properties = 0xaa;
currentSpriteProp++;
currentSpriteProp->properties = 0xaa;
C compilers sometimes seem to handle tables really badly.
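A minimal sketch of why this probably happens, assuming the compiler recomputes the full address on every indexed access (which hurts most if the target CPU has no fast multiply). The field names come from the code above, but the struct layouts, `MAX_OBJECTS`, and the two function names are hypothetical stand-ins, not the project's real definitions:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBJECTS 16 /* hypothetical table size */

/* Hypothetical layouts: field names from the post, types and sizes assumed. */
typedef struct {
    uint8_t HPos;
    uint8_t VPos;
    uint8_t nameLow;
    uint8_t priority;
    uint8_t color;
} OBJECTData;

typedef struct {
    uint8_t spriteNum;
    OBJECTData data[MAX_OBJECTS];
} heroPreparedSpriteData;

/* Indexed version: each access means
 *   frames + frame * sizeof(*frames) + i * sizeof(OBJECTData) + field offset.
 * A simple compiler may redo that arithmetic for every field. */
void copy_indexed(OBJECTData *dst, const heroPreparedSpriteData *frames,
                  int frame, int offset)
{
    int i;
    int counter = frames[frame].spriteNum;
    assert(counter <= MAX_OBJECTS);
    for (i = 0; i < counter; i++) {
        dst[i + offset].HPos = (uint8_t)(frames[frame].data[i].HPos + 0x76);
        dst[i + offset].VPos = (uint8_t)(frames[frame].data[i].VPos + 0x80);
        dst[i + offset].nameLow = frames[frame].data[i].nameLow;
        dst[i + offset].priority = frames[frame].data[i].priority;
        dst[i + offset].color = frames[frame].data[i].color;
    }
}

/* Pointer version: the base addresses are computed once before the loop,
 * and each iteration only bumps two pointers by a constant. */
void copy_pointer(OBJECTData *dst, const heroPreparedSpriteData *frames,
                  int frame, int offset)
{
    const heroPreparedSpriteData *f = frames + frame;
    const OBJECTData *src = f->data;
    OBJECTData *out = dst + offset;
    int i;
    int counter = f->spriteNum;
    assert(counter <= MAX_OBJECTS);
    for (i = 0; i < counter; i++) {
        out->HPos = (uint8_t)(src->HPos + 0x76);
        out->VPos = (uint8_t)(src->VPos + 0x80);
        out->nameLow = src->nameLow;
        out->priority = src->priority;
        out->color = src->color;
        src++;
        out++;
    }
}
```

Both functions write the same values; the difference is only in how much address arithmetic the compiler has to emit inside the loop. An optimising compiler would normally do this transformation itself, so the gain depends entirely on the toolchain.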
So now I’m going to finish writing the sprite routines, since I have acceptable performance. Why only acceptable? Because I’m sure there is a much more efficient way to do all this. It’s basically just copying data, so I should be able to keep myData in ROM, DMA it to RAM, and from there just update a few values like HPos and VPos.
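A rough sketch of that copy-then-patch idea. `memcpy` is only a stand-in for the DMA transfer here (a real version would go through the hardware's DMA registers, which are not shown), and the struct layout, `MAX_SPRITES`, and the function name `load_frame` are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SPRITES 128 /* assumed upper bound */

/* Hypothetical layout: field names from the post, types assumed. */
typedef struct {
    uint8_t HPos;
    uint8_t VPos;
    uint8_t nameLow;
    uint8_t priority;
    uint8_t color;
} OBJECTData;

/* Copy `count` prepared entries in one block, then patch the positions. */
void load_frame(OBJECTData *dst, const OBJECTData *src, int count)
{
    int i;
    assert(count >= 0 && count <= MAX_SPRITES);

    /* Bulk copy: stand-in for a ROM-to-RAM DMA transfer. */
    memcpy(dst, src, (size_t)count * sizeof(OBJECTData));

    /* Only the two position fields still need per-entry work. */
    for (i = 0; i < count; i++) {
        dst[i].HPos = (uint8_t)(dst[i].HPos + 0x76);
        dst[i].VPos = (uint8_t)(dst[i].VPos + 0x80);
    }
}
```

The per-entry loop shrinks from five field copies to two additions. If the 0x76 and 0x80 offsets turn out to be constant, they could presumably be baked into the ROM data as well, leaving only the bulk copy.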
I’ll keep you updated anyway…
See ya, Lint