Store sprites in memory so that 0-alpha pixels are not stored at all.
Examples:
store by scanline
store by tile (e.g., 8x8 pixels)
This should be done in a way that also enables multithreaded drawing later.
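A minimal sketch of the tile option, to make the idea concrete. Everything here is an assumption for illustration: the names (`Tile`, `TiledSprite`, `BuildTiledSprite`, `DrawTiles`) are hypothetical and not taken from the existing image classes, pixels are assumed to be 32-bit with alpha in the top byte, and drawing is a plain copy with no blending.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: 32-bit pixels, alpha in the top byte (assumption).
constexpr int kTile = 8;

struct Tile {
    int x, y;                                    // top-left corner in sprite coordinates
    std::array<uint32_t, kTile * kTile> pixels;  // row-major; 0 past the sprite edge
};

struct TiledSprite {
    int width = 0, height = 0;
    std::vector<Tile> tiles;  // only tiles with at least one non-zero-alpha pixel
};

inline uint8_t AlphaOf(uint32_t p) { return static_cast<uint8_t>(p >> 24); }

// Cut the source image into 8x8 tiles and drop the fully transparent ones.
TiledSprite BuildTiledSprite(const std::vector<uint32_t>& src, int width, int height) {
    assert(src.size() == static_cast<std::size_t>(width) * height);
    TiledSprite out;
    out.width = width;
    out.height = height;
    for (int ty = 0; ty < height; ty += kTile) {
        for (int tx = 0; tx < width; tx += kTile) {
            Tile t{tx, ty, {}};
            bool visible = false;
            for (int y = 0; y < kTile && ty + y < height; ++y) {
                for (int x = 0; x < kTile && tx + x < width; ++x) {
                    uint32_t p = src[static_cast<std::size_t>(ty + y) * width + (tx + x)];
                    t.pixels[y * kTile + x] = p;
                    visible = visible || AlphaOf(p) != 0;
                }
            }
            if (visible) out.tiles.push_back(t);
        }
    }
    return out;
}

// Draw tiles [first, last) at (dx, dy), clipping to the destination.
// Pixels with 0 alpha inside a kept tile are skipped, not copied.
void DrawTiles(const TiledSprite& s, std::size_t first, std::size_t last,
               std::vector<uint32_t>& dst, int dstWidth, int dstHeight, int dx, int dy) {
    for (std::size_t i = first; i < last; ++i) {
        const Tile& t = s.tiles[i];
        for (int y = 0; y < kTile; ++y) {
            int py = dy + t.y + y;
            if (py < 0 || py >= dstHeight) continue;
            for (int x = 0; x < kTile; ++x) {
                int px = dx + t.x + x;
                uint32_t p = t.pixels[y * kTile + x];
                if (px < 0 || px >= dstWidth || AlphaOf(p) == 0) continue;
                dst[static_cast<std::size_t>(py) * dstWidth + px] = p;
            }
        }
    }
}
```

Because the tiles of one sprite cover disjoint destination pixels, `DrawTiles` takes an index range so that separate ranges could later be handed to separate threads. That only holds within a single sprite; overlapping sprites would still need ordering.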
Have you considered using a third-party library? SDL, for example, supports both software and hardware-accelerated blitting. For software surfaces it uses RLE compression, which gives faster drawing and lower memory use.
It may be possible to just use the surface / RLE blit part of the library without replacing the existing DirectX code.
The version we are currently looking at implementing is actually a form of RLE per scanline. It only encodes 0-alpha runs: the run length for almost everything else is just 1 pixel, so it's not really worth spending cycles on anything that isn't 0-alpha (and 0-alpha pixels are never displayed anyway). Unfortunately, George's existing code has so much other stuff baked into the image storage classes that switching to a third-party library is not really feasible.
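For readers following along, here is a rough sketch of what per-scanline RLE restricted to 0-alpha runs could look like. It is not the actual implementation: the types and functions (`Run`, `RowIndex`, `RleSprite`, `EncodeRle`, `DrawRows`) are hypothetical, pixels are assumed to be 32-bit with alpha in the top byte, sprite width is assumed to fit in 16 bits, and drawing is a plain copy with no blending.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Each run says: skip `skip` transparent pixels, then copy `count` stored pixels.
struct Run { uint16_t skip; uint16_t count; };

// Where a scanline's runs and pixels begin, so rows can be drawn independently.
struct RowIndex { uint32_t firstRun; uint32_t firstPixel; };

struct RleSprite {
    int width = 0, height = 0;
    std::vector<RowIndex> rows;    // height + 1 entries; the last one is a sentinel
    std::vector<Run> runs;
    std::vector<uint32_t> pixels;  // only non-zero-alpha pixels, in scan order
};

inline uint8_t AlphaOf(uint32_t p) { return static_cast<uint8_t>(p >> 24); }

RleSprite EncodeRle(const std::vector<uint32_t>& src, int width, int height) {
    assert(width <= 0xFFFF);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    RleSprite out;
    out.width = width;
    out.height = height;
    for (int y = 0; y < height; ++y) {
        out.rows.push_back({static_cast<uint32_t>(out.runs.size()),
                            static_cast<uint32_t>(out.pixels.size())});
        const uint32_t* row = src.data() + static_cast<std::size_t>(y) * width;
        int x = 0;
        while (x < width) {
            int skip = 0;
            while (x < width && AlphaOf(row[x]) == 0) { ++x; ++skip; }
            if (x == width) break;  // trailing transparency is not stored at all
            int begin = x;
            while (x < width && AlphaOf(row[x]) != 0) ++x;
            out.runs.push_back({static_cast<uint16_t>(skip),
                                static_cast<uint16_t>(x - begin)});
            out.pixels.insert(out.pixels.end(), row + begin, row + x);
        }
    }
    out.rows.push_back({static_cast<uint32_t>(out.runs.size()),
                        static_cast<uint32_t>(out.pixels.size())});
    return out;
}

// Draw scanlines [y0, y1) of the sprite at (dx, dy), clipping to the destination.
void DrawRows(const RleSprite& s, int y0, int y1,
              std::vector<uint32_t>& dst, int dstWidth, int dstHeight, int dx, int dy) {
    for (int y = y0; y < y1; ++y) {
        int py = dy + y;
        if (py < 0 || py >= dstHeight) continue;
        const uint32_t* px = s.pixels.data() + s.rows[y].firstPixel;
        int x = dx;
        for (uint32_t r = s.rows[y].firstRun; r < s.rows[y + 1].firstRun; ++r) {
            x += s.runs[r].skip;
            for (int i = 0; i < s.runs[r].count; ++i, ++x, ++px) {
                if (x >= 0 && x < dstWidth)
                    dst[static_cast<std::size_t>(py) * dstWidth + x] = *px;
            }
        }
    }
}
```

The per-row index is what keeps this friendly to threading later: `DrawRows` can start at any scanline without decoding the ones before it, so row ranges of one sprite could be split across workers.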