The CPU vs GPU XBOX360 speed disparity.

Today and yesterday I mostly worked on class work. Actually I mostly worked on a single assignment :/

However, I did find time to write the terrain chunk drawing loop, which implicitly builds a quadtree and, at each level of recursion, tests the nodes of the tree against the camera frustum.
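To make the idea concrete, here is a rough Python sketch of that kind of loop (my real code is C#/XNA and differs in the details). The names `sphere_in_frustum` and `collect_visible_chunks` are made up for illustration, the frustum is assumed to be a list of inward-facing planes, and the bounding spheres assume flat terrain at y = 0. The quadtree is "implicit" because no node objects are ever stored; the recursion just subdivides a square region.

```python
import math

def sphere_in_frustum(center, radius, planes):
    """Conservative test: planes are (nx, ny, nz, d), inside when n.p + d >= 0."""
    for nx, ny, nz, d in planes:
        dist = nx * center[0] + ny * center[1] + nz * center[2] + d
        if dist < -radius:
            return False  # sphere lies entirely behind this plane
    return True

def collect_visible_chunks(x, z, size, chunk_size, planes, out):
    """Recurse over an implicit quadtree; return how many tests were performed."""
    half = size / 2.0
    center = (x + half, 0.0, z + half)  # assumption: flat terrain at y = 0
    radius = half * math.sqrt(2.0)      # circle enclosing the square region
    tests = 1
    if not sphere_in_frustum(center, radius, planes):
        return tests                    # whole subtree culled with one test
    if size <= chunk_size:
        out.append((x, z))              # leaf: this chunk gets drawn
        return tests
    h = size // 2
    for dx in (0, h):
        for dz in (0, h):
            tests += collect_visible_chunks(x + dx, z + dz, h, chunk_size, planes, out)
    return tests

# Worst case: a single plane far to the left, so nothing is culled.
visible = []
tests = collect_visible_chunks(0, 0, 1024, 64, [(1.0, 0.0, 0.0, 10000.0)], visible)
print(len(visible), tests)  # 256 chunks drawn, 341 nodes tested
```

When everything is visible, the recursion tests every internal node on top of every leaf, so the quadtree only pays off when large parts of the terrain can be rejected near the root.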

I quickly discovered that testing whether every single chunk is in the camera frustum is very expensive: 8 ms, which is about half of the roughly 16.7 ms I have for all draw thread computation at 60 fps.

In my previous terrain, where I used a quadtree for each chunk, the number of frustum tests was bounded: if the entire terrain was visible, the quadtree didn’t need to recurse all the way down, since farther blocks would simply be drawn larger.

In the new chunk LOD method, the worst case is that the entire terrain is visible and every block needs to be tested and drawn. With 64×64 chunks and a 1024×1024 terrain heightmap (16×16 = 256 chunks), those checks take 8 milliseconds.

On the XBOX (using only 1 core), using XNA’s BoundingSphere and BoundingFrustum classes, I can do about 40,000 sphere/frustum tests per second (600-700 per frame).

The GPU can test whether a triangle is in the frustum on the order of 100,000,000 times per second (over 1 million per frame).

The terrain chunks I’m using are 64×64 = 4096 triangles each. It is actually faster to send thousands of triangles to the GPU than to perform a single intersection test on the CPU, unless of course you’re GPU bound and have some CPU cycles to spare.
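For anyone who wants to check the arithmetic, here is a small back-of-envelope script in Python. The two throughput figures are the rough ones quoted above, and `quadtree_node_count` is a helper I made up for this note. It assumes the worst case tests every node of a full quadtree over the 16×16 chunk grid, which is only a guess at how the tests add up.

```python
FPS = 60
CPU_TESTS_PER_SEC = 40_000        # rough sphere/frustum test rate, one core
GPU_TRIS_PER_SEC = 100_000_000    # order-of-magnitude GPU triangle rate

frame_ms = 1000.0 / FPS                               # about 16.7 ms per frame
tests_per_frame = CPU_TESTS_PER_SEC / FPS             # about 667 tests per frame
tris_per_frame = GPU_TRIS_PER_SEC / FPS               # about 1.67 million triangles
us_per_test = 1_000_000 / CPU_TESTS_PER_SEC           # microseconds per CPU test
tris_per_test = GPU_TRIS_PER_SEC // CPU_TESTS_PER_SEC # triangles per test's worth of time

def quadtree_node_count(chunks_per_side):
    """Total nodes in a full quadtree whose leaves form a square chunk grid."""
    total = 0
    n = chunks_per_side
    while n >= 1:
        total += n * n
        n //= 2
    return total

nodes = quadtree_node_count(1024 // 64)               # leaves plus internal nodes
worst_case_ms = nodes * us_per_test / 1000.0

print(f"frame budget: {frame_ms:.1f} ms")
print(f"one CPU test: {us_per_test:.0f} us, or {tris_per_test} GPU triangles")
print(f"{nodes} node tests: {worst_case_ms:.3f} ms")
```

Under those assumptions one CPU test costs about as much time as 2,500 triangles on the GPU, and testing all 341 nodes comes out close to the 8 ms I measured.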

Exactly how my situation will turn out is to be determined, but I thought I’d emphasize just how large the processing speed difference is for this type of operation, between the CPU and the GPU on the XBOX with XNA.

In an effort to minimize the chunk drawing loop’s CPU time, I’ve also built a ReusableMinHeap class: a garbage-free min heap that sorts the chunk draw calls from closest to furthest, to minimize overdraw. So far, it performs roughly the same as C#’s Array.Sort, which uses quicksort.
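Until I can post the real class, here is a minimal Python sketch of the general idea. It is not my C# implementation. It assumes "garbage-free" means the storage is allocated once up front and reused every frame via `clear()`, and Python itself won't give the same no-allocation guarantee that matters on the Xbox. The method names are just illustrative.

```python
class ReusableMinHeap:
    """Fixed-capacity binary min heap that reuses its preallocated arrays."""

    def __init__(self, capacity):
        self._keys = [0.0] * capacity    # sort keys, e.g. distance to camera
        self._items = [None] * capacity  # payloads, e.g. chunk draw calls
        self._count = 0

    def __len__(self):
        return self._count

    def clear(self):
        self._count = 0                  # keep the arrays, just reset the size

    def push(self, key, item):
        if self._count == len(self._keys):
            raise IndexError("heap is full")
        i = self._count
        self._count += 1
        while i > 0:                     # sift up: shift larger parents down
            parent = (i - 1) // 2
            if self._keys[parent] <= key:
                break
            self._keys[i] = self._keys[parent]
            self._items[i] = self._items[parent]
            i = parent
        self._keys[i] = key
        self._items[i] = item

    def pop(self):
        if self._count == 0:
            raise IndexError("heap is empty")
        top = self._items[0]
        self._count -= 1
        n = self._count
        key, item = self._keys[n], self._items[n]  # last element moves to the root
        i = 0
        while True:                      # sift down: shift smaller children up
            child = 2 * i + 1
            if child >= n:
                break
            if child + 1 < n and self._keys[child + 1] < self._keys[child]:
                child += 1
            if self._keys[child] >= key:
                break
            self._keys[i] = self._keys[child]
            self._items[i] = self._items[child]
            i = child
        self._keys[i] = key
        self._items[i] = item
        return top
```

Each frame you would call `clear()`, push every visible chunk keyed by its distance to the camera, then pop until the heap is empty to get the draw order from nearest to farthest.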

I’ll be happy to share the min heap class when the competition is over. Someone remind me.

By the way, here’s what the chunk-LOD terrain looks like so far:

The current state of the terrain as of March 19, 2011, using 64×64 chunks and a 1024×1024 heightmap.

This entry was posted in Coding, DBP2011, XNA.

One Response to The CPU vs GPU XBOX360 speed disparity.

  1. Daniel says:

    Looks awesome! I’ll have to check back to see how you do :)
