First big optimisation in sprite functions.

I’ve been working for a while on optimising some functions that copy tables of data. I found out that the C compiler wasn’t optimising them at all. Here is the original piece of code:

    counter = data[frame].spriteNum;
    for(i=0; i<counter; i++) {
        //x = data[frame].data[i].nameLow & 0x0f;
        //y = data[frame].data[i].nameLow >> 4;

        spriteData.data[i+offset].HPos =
            data[frame].data[i].HPos + 0x76;
        spriteData.data[i+offset].VPos =
            data[frame].data[i].VPos + 0x80;
        spriteData.data[i+offset].nameLow =
            data[frame].data[i].nameLow;
        spriteData.data[i+offset].priority =
            data[frame].data[i].priority;
        spriteData.data[i+offset].color =
            data[frame].data[i].color;
    }

    // set big sprite
    spriteData.prop[offset].properties = 0xaa;
    spriteData.prop[offset+1].properties = 0xaa;

This was consuming almost 25% of the CPU time between each VBlank. The new code is about 30 times faster:

    heroPreparedSpriteData *myData;
    OBJECTData *myObjectData;
    OBJECTData *currentSpriteData;
    OBJECTProp *currentSpriteProp;

    // init with base address
    myData = data;
    // set to current frame
    for(i=0; i<frame; i++) {
        myData++;
    }

    // init with base address
    currentSpriteData = (OBJECTData*) &spriteData.data;
    currentSpriteProp = (OBJECTProp*) &spriteData.prop;
    // set to current frame
    for(i=0; i<offset; i++) {
        currentSpriteData ++;
    }

    counter = myData->spriteNum;

    // init start of the data array
    // we assume we always start at index 0
    myObjectData = (OBJECTData*) &(myData->data);

    for(i=0; i<counter; i++) {
        currentSpriteData->HPos = myObjectData->HPos + 0x76;
        currentSpriteData->VPos = myObjectData->VPos + 0x80;
        currentSpriteData->nameLow = myObjectData->nameLow;
        currentSpriteData->priority = myObjectData->priority;
        currentSpriteData->color = myObjectData->color;

        // advance both pointers to the next entry
        myObjectData++;
        currentSpriteData++;
    }

    // set big sprite for the first 8 sprites
    currentSpriteProp->properties = 0xaa;
    currentSpriteProp++;
    currentSpriteProp->properties = 0xaa;

C compilers sometimes seem to handle tables really badly.
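To make the cost concrete, here is a rough sketch of the address arithmetic hidden behind a single indexed access such as data[frame].data[i]. Only the field names come from the code above; the field types, MAX_SPRITES and the helper indexed_address are assumptions made for illustration. A compiler that doesn't hoist anything out of the loop repeats both multiplications for every field it copies, and the 65816 has no multiply instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts: field names are from the post, the types and
   the array size are assumptions for illustration only. */
typedef struct {
    unsigned char HPos, VPos, nameLow, priority, color;
} OBJECTData;

#define MAX_SPRITES 12 /* assumed capacity of one frame */

typedef struct {
    unsigned char spriteNum;
    OBJECTData data[MAX_SPRITES];
} heroPreparedSpriteData;

/* Spells out what frames[frame].data[i] has to compute:
   two multiplications plus a constant field offset. */
const OBJECTData *indexed_address(const heroPreparedSpriteData *frames,
                                  size_t frame, size_t i)
{
    const unsigned char *base = (const unsigned char *)frames;
    return (const OBJECTData *)(base
        + frame * sizeof(heroPreparedSpriteData)  /* multiply #1 */
        + offsetof(heroPreparedSpriteData, data)  /* constant offset */
        + i * sizeof(OBJECTData));                /* multiply #2 */
}

/* Sanity check: the manual arithmetic matches ordinary indexing. */
void check_indexing(const heroPreparedSpriteData *frames,
                    size_t frame, size_t i)
{
    assert(indexed_address(frames, frame, i) == &frames[frame].data[i]);
}
```

The pointer version above pays for the address once and then only increments it, which is presumably where most of the speed-up comes from.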

So now I’m going to finish writing the sprite routines, since I now have acceptable performance. Why acceptable? Because I’m sure there is a much more efficient way to do all this. It’s basically just copying data, so I should put myData in ROM, DMA it to RAM, and from there just update some values like HPos and VPos.
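As a rough idea of what that could look like, here is a minimal sketch in portable C where memcpy stands in for the DMA transfer. It is not the real SNES DMA setup: the OBJECTData field types and the copy_frame helper are assumptions, and only the field names and the 0x76 / 0x80 offsets come from the code above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical layout: field names are from the post, types are assumed. */
typedef struct {
    unsigned char HPos, VPos, nameLow, priority, color;
} OBJECTData;

/* Copy `count` prepared entries in one block, then patch only the
   two fields that differ. memcpy is a stand-in for the DMA here. */
void copy_frame(OBJECTData *dst, const OBJECTData *src, unsigned char count)
{
    OBJECTData *end = dst + count;

    assert(dst != NULL && src != NULL);
    memcpy(dst, src, (size_t)count * sizeof *dst);

    for (; dst != end; dst++) {
        /* 8-bit wrap-around is made explicit by the casts */
        dst->HPos = (unsigned char)(dst->HPos + 0x76);
        dst->VPos = (unsigned char)(dst->VPos + 0x80);
    }
}
```

The loop now touches two fields per sprite instead of five; whether that beats the pointer version on real hardware would have to be measured.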

I’ll keep you updated anyway …

See ya, Lint

5 Responses to “First big optimisation in sprite functions.”

  1. sylvainulg Says:

    indeed, i’m a bit curious to know why you haven’t used a DMA transfer …
    Another trick you could use to boost performance would be to keep a bitvector of which sprites have changed and which have not (oh, unless you mostly have sprites that need to follow a scrolling screen, in which case their coordinates require an update almost every frame and it might not be worth the hassle of flagging updated sprites)

  2. sylvainulg Says:

    also, you could easily replace

    // init with base address
    myData = data;
    // set to current frame
    for(i=0; i<frame; i++) {
    myData++;
    }
    with
    myData = data+frame;

    that’s the very definition of adding an integer to a pointer.

  3. lint Says:

    yes and no … This really depends on the CPU and compiler. When I do myData = data+frame; it calls a C function that multiplies frame by the size of the data, since the 65816 has no multiplication instruction. Now it’s true there are 3 specialized registers on the SNES CPU that allow it to do multiplication. Maybe I can try that. The multiplication function offered by the compiler is really not efficient at all.

  4. sylvainulg Says:

    even less efficient than adding N times the value X to get N*X ?

  5. lint Says:

    yes really … totally inefficient function.