Exponential shadow map filtering (in log space).

It turns out that I was performing a linear filter on my shadow map depths, but I should have been doing the shadow map prefiltering in log space. Whoops. Actually, there seems to be a lot of confusion about how … Continue reading

1 Comment

Exponential shadow mapping drawbacks.

I created a model that could demonstrate some harder cases for the various shadow mapping filters that I experimented with the other day. This model (which is two copies of a sort of church-like building), casts shadows on itself (where … Continue reading

4 Comments

Cascading shadow mapping and ESM.

I implemented shadow cascades today. Below is a screenshot of the current shadows, using two cascades. Note the artifacts that I’ve highlighted with green arrows. About 1/3 of the depth range is represented by the first cascade (and 2/3 in … Continue reading

3 Comments

Shadow map filtering experimentation.

Edit: I demonstrated some harder cases for the important filters here. I also used a brighter light so that it’s easier to see the shadow edges   Over the past two weeks I’ve been working on various parts of my … Continue reading

3 Comments

HDRBlendable Xbox 360 performance and dual paraboloid point light shadow optimization.

While working on dual paraboloid point light shadows, at first I used two R32 textures for the shadow maps, one map per hemisphere of the point light. Here’s a quick glimpse at what point light shadows are currently looking like … Continue reading

127 Comments

Deferred lighting.

My forward rendered lighting setup almost suited my purposes, but it turned out that the limitations of the XBOX CPU limited me to far too few terrain chunks, and therefore far too few lights. To this end, I’ve been exploring … Continue reading

6 Comments

“Game3″ post-mortem. (Or how I learned to stop worrying and love development outside of the DBP competition.)

This game was called “Game3″ just because this is the 3rd XNA project I’ve opened on this computer, and that was the default name. I’ve put in somewhere between 500-900 hours into this project (schoolwork being mixed in there makes … Continue reading

7 Comments

My own physics or BEPU?

Norbo (the creator of the BEPU physics engine) dropped by the day before last to let me know that I should expect much higher performance than what I was seeing in my BEPU tests. He suggested that I should be … Continue reading

1 Comment

Physics engine integrated, and sphere-terrain collision done.

Terrain collision was a little harder than I thought it would be. This is owing to the fact that the terrain is a concave body, and many contacts are possible with a single sphere and the terrain. Contrast this with … Continue reading

3 Comments

Summer classes start today.

Well, I have class again, which means I have to reduce the work on this project slightly. Hopefully there won’t be any serious tests until a week or two after the competition deadline.

Posted in Uncategorized | 3 Comments

The physics engine (made from scratch!).

In the last few days I wrote a physics engine from scratch using a technique that no other 3D XNA physics engine (that I know of) uses yet: Speculative contact continuous collision detection. I’ve been looking for a good physics … Continue reading

2 Comments

Terrain changes (and the first video capture).

If you’re like me then you’re tired of seeing the terrain, and you want to see the gameplay. Sure, I want to work on nothing but the gameplay at this point. However, the gameplay takes place at night — and … Continue reading

1 Comment

VS2010 corruption woes and a new GPU.

I bought a new GPU today, and after installing it and restarting, visual studio decided to stop working, producing random errors (e.g. “Unable to start program. Cannot find file specified.” when trying to run the project). After wasting an hour … Continue reading

1 Comment

Status update: Point light stress testing.

Yesterday I added over 1000 (one thousand) moving point lights to the terrain. On the PC this was fine. The scene partitioning used by my forward rendering setup uses a lot of CPU time though, so on the XBOX this … Continue reading

4 Comments

Last post before my next exam (in 4 days).

I have my last exam on the 28th, so I won’t post again until after that is done. In the meantime, have a look at the current state of the game below. PS3 shaders can use a maximum of 512 … Continue reading

2 Comments

Texturing steep terrain slopes.

In a GPU Gems 3 article (skip to section 1.5) they solve the issue of texturing terrain at various slopes by using triplanar texturing. The following image illustrates the problem. I’ve done something similar to solve the problem of stretching on very … Continue reading

3 Comments

Slope based terrain texturing.

Today I worked on getting the terrain texturing to match the new procedurally created level from yesterday. In the image above, a dead grass texture is applied to flat portions of terrain, a green texture on the edges at intermediate … Continue reading

1 Comment

Procedural level creation.

I spent today creating a method for procedurally building the terrain in a game level. The framework for procedural placement of various objects (e.g. trees) is started, but I won’t have time to finish implementing that today. As usual I … Continue reading

1 Comment

First enemy: Doll.

I haven’t posted in a while since I’ve been working on physics code. Progress on things that I can actually post screenshots of has been slow. Before next month I’m going to buy a video card that is fast enough … Continue reading

4 Comments

Optimizing terrain culling.

The code that culls terrain chunks was using a 3D bounding box vs camera frustum test for each terrain chunk, which was incredibly slow. On the XBOX 360 this was taking about 4ms for ~200 intersection tests. At 60FPS you … Continue reading

3 Comments

Trees and tree collision.

I made this tree last night, and started working on extending the collision testing system (which currently only tests for collisions against the terrain). I do UV mapping in 3D Coat. You might notice that I didn’t map one of … Continue reading

1 Comment

Experimenting with a darker lighting setup.

I bought a new monitor yesterday, and although I have three LCD screens that I’ve been using for testing, none of them had very good contrast or color. I calibrated this new LED/LCD monitor and started doing some testing in … Continue reading

4 Comments

Dozens of point lights with forward rendering.

I’ve spent the last few days working on spatial partitioning, object pooling code (and fighting with stackoverflow about it), and other necessary performance optimization code. Most AAA titles seem to support something like 5-6 dynamic lights. I’m sure you’ve noticed … Continue reading

2 Comments

Avoiding garbage collections on the compact CLR.

I haven’t posted in a few days because I’ve been working on almost purely programming related tasks, which don’t lead to pretty screenshots. I’ve been improving the efficiency of the spatial partitioning scheme, and terrain draw args sorting.

Today though, I (re)wrote an object pool class.

Garbage collections are expensive on the compact CLR, they occur whenever 1MB of garbage is created, and cause noticeable stutter in the game when they happen. For this reason, I simply try not to generate any garbage.

Why we can’t just choose to have collections occur every 16MB or something, I don’t know.

I’ve written custom pools in various places in this project, because creating an efficient generic pool is tricky. Have a look at swampthingtom’s generic object pool. It uses a struct as a container for each pooled item and can only pool types that meet the new() constraint (public parameterless constructor, and no abstract classes allowed).

Having to write a pool for every type though is slowing me down, so I spent some time today thinking about a good design for a generic object pool.

Giving up abstract classes (which is what swampthingtom does with his pool) is going a little too far for me. Also, a significant performance cost is associated with the use of that pool, which makes it less than ideal if you are using it to allocate hundreds of objects or more.

Here’s another approach, which is minimalist, and still uses the new() constraint. This pool is clearly faster, but offers no way to iterate over over the collection of active pooled objects, and again, no abstract classes or constructors with parameters allowed. It’s also basically just an unsafe version of Stack ;)

So I wrote a pool that partitions an array into two parts, the first part for live objects, the second for dead objects. Then I effectively do what the compact CLR does with dead objects — sweep across the dead objects once in a while, and compact them into the second part of the array. As for object initialization, I pass the pool a Func delegate that does the initialization. Delegates are a little slow, but in this case they are called rarely — in fact, they should be called never, if I pick the right pool sizes. The code is barely tested so far and might benefit from another small optimization or two, but I’ll be sure to post it soonish. As usual, I’m also not fond of the idea of posting good code this far from the competition deadline.

By the way, I fought with stackoverflow to try to get some discussion going about ways to iterate over the pool without generating garbage. Most people did not want to hear about avoiding garbage creation. Eventually Eric Lippert came to the rescue and defended my question.

Posted in Coding, DBP2011, XNA | Tagged , , , | 1 Comment

Erratic development speed.

I have a priority queue of game features that need to be completed. E.g.: [-3] Pure black rolling fog to simplify fog shader. [4] Dual paraboid point light shadow map implementation [2] Terrain diffuse lighting based lightning effect. [6] Moon … Continue reading

Leave a comment

Sky effect and bloom filter.

I’ve added a bloom filter, which every big title seems to use these days. The bloom filter takes the brightest parts of the image, blurs them, and mixes the result back into the scene. This makes lights glow, softens bright … Continue reading

1 Comment

The importance of assertion statements.

When implementing a complex system, where there are many variables/objects, I can’t stress enough how important it is to have some sort of assertion tests in your code.
In C# we can write assertion tests that are only run when compiled in Debug mode like so:

public void ReturnPointLight(PointLight3D light)
{
#if DEBUG
    // Assertion tests.

    Debug.Assert(light == _pointLightPool[light.PoolIndex]);
    Debug.Assert(_pointLightAvailable[light.PoolIndex] == false);

    // Make sure that light is not already in available stack.
    foreach (int index in _pointLightAvailableIndices)
    {
        Debug.Assert(index != light.PoolIndex);
    }

    _pointLightAvailable[light.PoolIndex] = true;
#endif
    _pointLightAvailableIndices.Push(light.PoolIndex);
}

That is an excerpt from my LightingManager class, which has a PointLight pool built in.

That function returns a point light to the pool when it is no longer needed. This is roughly the equivalent of doing free(myPointLightInstance) in C or C++ when the instance is no longer being used. Sadly, the compact CLR that the XBOX uses has a terrible garbage collection implementation, so using object pools and allocating and freeing objects from the pools manually is a necessity.
As you can see in this function, several assertions are done.

  • Assert that the light instances’ index matches the index of the internal pool.
  • Assert that the light is actually not available (because the function is meant to make unavailable lights available).
  • Assert that the light is not already stated as being available in the availability stack.

These checks are a bit slow, so I run them only when the DEBUG flag is set.
Edit: Using the DEBUG conditional is not necessary if you only write Debug.Assert() since Debug methods are only called if the DEBUG flag is set automatically, but are useful if you’re writing other code, as I have above.
And actually, the bool[] _pointLightAvailable array is only used for debugging/testing. Setting up some decent sanity checks light this throughout my code makes development much faster in the long run, because I can be confident that things are doing what they are supposed to.

With the lighting manager in particular, it would be easy for the manager to do something unwanted, and cause the lights to LOOK wrong, although the program will not crash. In this type of case, it’s hard to know what’s causing the problem without lengthy code tracing/debugging if you don’t have some pre/post condition assertions in the code.

Posted in Coding, DBP2011, XNA | Tagged | 4 Comments

Terrain point light progress.

I’ve built a GGR (Geographical Grid Registration) component for the terrain, which will permit better draw call arrangement and in particular, better effect state management when drawing many point lights — which is a goal. Since the terrain is arranged … Continue reading

3 Comments

Terrain lighting progress.

In order to light the terrain, normals must be generated from which we can measure the angle of a light ray’s incidence with the terrain geometry, so that we can decide how bright a terrain triangle will be. For the … Continue reading

1 Comment

Fixing up cracks in the terrain.

Since the terrain is made up of varying level of detail chunks, when two chunks of different detail levels meet, their edges do not match up perfectly. This results in gaps/cracks in the terrain. To fix this, I’ve created a … Continue reading

1 Comment

The CPU vs GPU XBOX360 speed disparity.

Today and yesterday I mostly worked on class work. Actually I mostly worked on a single assignment :/ However, I did find time to write the terrain chunk drawing loop, which implicitly builds a quadtree and tests at each recursive … Continue reading

1 Comment

Terrain started, and quadtree vs. block based LOD.

I put pathfinding on hold because when implementing a hexagonal coordinate based path finding algorithm, I realized that there would be various difficulties with dealing with constructs imposed on the world such as the terrain grid or quadtree and any … Continue reading

2 Comments

Path finding: What would Kirk do?

In my previous post about the challenge of path finding on the XBOX with XNA, my A* algorithm benchmarks were abysmally slow. Recall from my previous post that pathfinding was done in 3D, including diagonal movement, allowing 26 possible directions … Continue reading

3 Comments

Every day can’t be a game development day.

My reality is that I can’t make every single day a game development day. Today I had to catch up with class work, and run various personal errands. To maintain momentum though, I did find time to do a little … Continue reading

Leave a comment

A 64GB SSD Is Not Large Enough For Indie Game Developer’s Toolset

  I bought a Kingston 64GB SSD last week and the enormous performance improvement compared to my spinning HDD has been nothing short of breathtaking. When I purchased it I thought that 64GB would be plenty of room for just … Continue reading

1 Comment

The Challenge of Efficient Path Finding on the XBOX360

Yesterday, a friend taking an AI course showed me his midterm exam, and I noticed that there was an A* related question. This got me thinking about A* and after 6 hours of (trying to) make an SMT solver for … Continue reading

6 Comments

GUI Elements: Bars and Notifications.

Today I made a notification manager and some notifiers. This allows the game to intelligently queue notifications like notifying the user that she has a new ability or that an event is about to occur. There are now notifications at … Continue reading

82 Comments

HUD GUI: Health/Energy orbs.

I created the 2D overlay, the 3D holder tentacles, and I diverged somewhat from the draft designs seen in my previous post. The orb contents swirls and is composed of multiple layers — I’ve made numerous optimizations to make this … Continue reading

1 Comment

HUD GUI: Health & Energy Indicators

I actually have several possible indicators drawn out including bars, circular/spherical orbs and diegetic displays. The diegetic display would be a glowing bar drawn on the character in the game world, visible only if a 3rd person camera is always … Continue reading

5 Comments

Menu system started.

I’ve done a menu in XNA before, so working from that code and the gamestate management sample as examples of how to proceed, I built the following menu. Getting all of the animations and transitions right was a surprisingly long … Continue reading

2 Comments

DBP 2011 Entry Started

For the past two years I’ve been writing down specific ideas about a game that I’ve wanted to play for years, but no one has ever made. I have documented everything from user interface design to various algorithms to network … Continue reading

2 Comments

Fields vs. properties performance on the Xbox 360.

I took Sam Allen’s performance test and ran a similar test (below) on the Xbox 360 with XNA.

static string _backing; // Backing store for property
static string Property  // Getter and setter
{
    get
    {
        return _backing;
    }
    set
    {
        _backing = value;
    }
}
static string Field;    // Static field

public FieldPropertyTest(Game game)
    : base(game)
{

    long[] results1 = new long[10];
    long[] results2 = new long[10];

    Property = "string";
    Field = "string";
    const int m = 1000000;
    for (int x = 0; x < 10; x++) // Ten tests
    {
        Stopwatch s1 = new Stopwatch();
        s1.Start();
        for (int i = 0; i < m; i++) // Test property
        {
            Property = "string";
            //if (Property == "cat")
            //{
            //}
        }
        s1.Stop();
        Stopwatch s2 = new Stopwatch();
        s2.Start();
        for (int i = 0; i < m; i++) // Test field
        {
            Field = "string";
            //if (Field == "cat")
            //{
            //}
        }
        s2.Stop();
        results1[x] = s1.ElapsedMilliseconds;
        results2[x] = s2.ElapsedMilliseconds;
    }
}

The results:

Get:
Property: 341 ms
Field: 307 ms

Set:
Property: 85 ms
Field: 66 ms

This performance difference from the more robust .NET CLR used in Allen’s tests is owing to the fact that the .NET Compact CLR (which the Xbox 360 uses) doesn’t inline functions at all — to my knowledge.

Of course, the small size of the numbers above is good news — it took 10 million gets on a field to produce a 34ms difference.
It’s hard to imagine a scenario where using a field in place of a property would be a relatively large enough performance gain to affect your framerate.

There are also the usual caveats with micro-benchmarks like these, and the difference may be smaller (or larger) than what you see above, in real-world uses.

So, I’d take Jeff Atwood’s advice and simply always use (auto-)properties, unless you have a very tight performance sensitive loop that is accessing many properties.

Edit:
The NetCF team blog states that getters/setters are inlined, which is probably a source of some confusion. It’s also possible that my results above are wrong, if I’ve fallen into a micro benchmarking trap. Let me know if you find some results opposing what I’ve written above.

Posted in Coding, XNA | Tagged , , | 4 Comments

Occurences of one string in another.

The string “ABC” occurs in “ABBC” twice, if you remove any characters you wish from “ABBC”.

The following algorithm finds such occurences in O(n) time given any two strings.
One assumption worth noting for the O(n) time bound is that the size of the alphabet is limited to some constant size.

Nested loops in this algorithm scream O(n^2)!! However, those loops are bounded by a constant that is proportional to the size of the alphabet, squared. Since the alphabet is bounded by a constant (by my assumption above), these nested loops will be bounded by a constant — so they don’t take us out of O(n) time.

__author__ = "Kris Olhovsky"
__date__ = "$Dec 29, 2009 6:54:11 PM$"

class Char:
    def __init__(self, char):
        self.char = char
        self.paths = {} # Key is length of paths, value is the # of paths.

    ''' Return the number of paths to this char of length len. '''
    def retrieve_matches(self, len):
        count = 0
        for c in self.paths:
            if len in self.paths[c]:
                count += self.paths[c][len]
                self.paths[c][len] = 0 # Delete recorded matches.
        return count

    ''' Copies paths from char c to this char, adding 1 to the
        length of every path. Paths of length greater than len are ignored. '''
    def add_paths(self, c, len):
        temp = {} # Avoid altering map during iteration.
        for char in c.paths:
            for length in c.paths[char]:
                if length + 1 <= len:
                    if c not in temp:
                        temp[c] = {}
                    if length + 1 not in temp[c]:
                        temp[c][length+1] = 0
                    temp[c][length+1] += c.paths[char][length]

        for k in temp: # Apply changes to map.
            for j in temp[k]:
                if k not in self.paths:
                    self.paths[c] = {}
                if j not in self.paths[k]:
                    self.paths[k][j] = 0
                self.paths[k][j] += temp[k][j]

''' Returns the number of occurences of string s1 in string s2. '''
def subseq(s1, s2):
    if s1 == "": # Special case defined by the problem.
        return 1

    prev = {}   # Maps a char to a dict of chars that appear one index to
                # the left of that char anywhere in s1.

    chars = {}  # All of the chars that appear in s1.

    for c in s1: # Initialize maps.
        prev[c] = {}
        chars[c] = Char(c)

    i = 1
    while i < len(s1): # Populate maps with data from s1.
        prev[s1[i]][s1[i-1]] = chars[s1[i-1]]
        i += 1

    start = Char('start')
    chars[s1[0]].paths[start] = {}
    chars[s1[0]].paths[start][0] = 0
    occurences = 0 # Record number of occurences of s1 in s2.
    for char in s2:

        if char in chars: # A char in s2 matches a char in s1.
            if char in prev[char]: # If same exists, prioritize this.
                chars[char].add_paths(prev[char][char], len(s1) - 1)
            for c in prev[char]:
                if c != char:
                    chars[char].add_paths(prev[char][c], len(s1) - 1)
            if char == s1[0]: # First char in s1 always starts a new path.
                chars[s1[0]].paths[start][0] += 1

            if char == s1[-1]: # Update occurences when last char of s1 seen.
                occurences += chars[char].retrieve_matches(len(s1) - 1)

    return occurences

Here are some example usages:

if __name__ == "__main__":
    s1 = "ABC"
    s2 = "ABBC"
    print subseq(s1, s2)

    s1 = "ABCB"
    s2 = "ABBCBBCB"
    print subseq(s1, s2)

    s1 = "ABCB"
    s2 = "ABCB"
    print subseq(s1, s2)

    s1 = "ABBB"
    s2 = "ABBBBB"
    print subseq(s1, s2)

    s1 = "ABBB"
    s2 = "ABABB"
    print subseq(s1, s2)
Posted in Coding, Misc | Tagged , | Leave a comment

Extract Longest Non-Decreasing Sequence From Any Sequence

I wrote some python code that extracts the longest non-decreasing subsequence from any given sequence.

This runs in O(n log n) time, and uses O(n) memory.

''' An item in the final sequence, used to form a linked list. '''
class SeqItem():
    val = 0      # This item's value.
    prev = None  # The value before this one.
    def __init__(self, val, prev):
        self.val = val
        self.prev = prev

''' Extract longest non-decreasing subsequence from sequence seq.'''
def extract_sorted(seq):
    subseqs = [SeqItem(seq[0], None)] # Track decreasing subsequences in seq.
    result_list = [subseqs[0]]
    for i in range(1, len(seq)):
        result = search_insert(subseqs, seq[i], 0, len(subseqs))

    # Build Python list from custom linked list:
    final_list = []
    result = subseqs[-1] # Longest nondecreasing subsequence is found by
                         # traversing the linked list backwards starting from
                         # the final smallest value in the last nonincreasing
                         # subsequence found.
    while(result != None and result.val != None):
        final_list.append(result.val)
        result = result.prev # Walk backwards through longest sequence.

    final_list.reverse()
    return final_list

''' Seq tracks the smallest value of each nonincreasing subsequence constructed.
Find smallest item in seq that is greater than search_val.
If such a value does not exist, append search_val to seq, creating the beginning
of a new nonincreasing subsequence.
If such a value does exist, replace the value in seq at that position, and
search_val will be considered the new candidate for the longest subseq if
a value in the following nonincreasing subsequence is added.
Seq is guaranteed to be in increasing sorted order.
Returns the index of the element in seq that should be added to results. '''
def search_insert(seq, search_val, start, end):
    median = (start + end)/2

    if end - start < 2: # End of the search.
        if seq[start].val > search_val:
            if start > 0:
                new_item = SeqItem(search_val, seq[start - 1])
            else:
                new_item = SeqItem(search_val, None)

            seq[start] = new_item
            return new_item
        else: # seq[start].val <= search_val
            if start + 1 < len(seq):
                new_item = SeqItem(search_val, seq[start])
                seq[start + 1] = new_item
                return new_item
            else:
                new_item = SeqItem(search_val, seq[start])
                seq.append(new_item)
                return new_item

    if search_val < seq[median].val: # Search left side
        return search_insert(seq, search_val, start, median)
    else: #search_val >= seq[median].val: # Search right side
        return search_insert(seq, search_val, median, end)

And here is an example usage:

import random
if __name__ == '__main__':
    seq = []
    for i in range(100000):
        seq.append(int(random.random() * 1000))

    print extract_sorted(seq)
Posted in Coding, Misc | Tagged , | 9 Comments

Foreach through non-primitive types creates garbage.

Time and again I have seen code like this used in XNA tutorials.

effectDrawBlock.Begin();
foreach (EffectPass pass in effectDrawBlock.CurrentTechnique.Passes)
{
    pass.Begin(); 

    gd.DrawIndexedPrimitives(PrimitiveType.TriangleStrip, 0, 0, 35, 0, 70); 

    pass.End();
}
effectDrawBlock.End();

When you use a foreach over an array of ints (or any value type), no garbage is generated, it’s fast, readable, and often preferable to a for loop.

However, in the above code, “EffectPass pass in” creates a managed effect pass object, used to iterate through the collection. That object then needs to be handled by the garbage collector later.

The fix is to use a for loop when iterating over non-primitive types.
In my case, by changing my code to something more like what you see below, I was able to reduce the number of managed objects generated by my code from 2500/sec to 200/sec as measured by the XNA Framework Remote Perf Monitor.

effectDrawBlock.Begin();
for (int i = 0; i < effectDrawBlock.CurrentTechnique.Passes.Count; i++)
{
    effectDrawBlock.CurrentTechnique.Passes[i].Begin();

    gd.DrawIndexedPrimitives(PrimitiveType.TriangleStrip, 0, 0, 35, 0, 70);

    effectDrawBlock.CurrentTechnique.Passes[i].End();
}
effectDrawBlock.End();
Posted in Coding, XNA | Tagged , | 1 Comment

ToString Garbage Creation in C#

I profiled my XNA game project today using the XNA Framework Remote Perf Monitor and discovered that I was generating about 8000 more manage objects per second than I was expecting to.

It turns out that this code was generating 7400 managed objects per second:

            text[0] = "FPS: " + fps.FPS.ToString();
            text[1] = "charPos: " + charPos.ToString();
            text[2] = "tLOD: " + tLOD.ToString();
            text[3] = "tTris: " + tTris.ToString();
            text[4] = "tBlocksDrawn: " + tBlocksDrawn.ToString();
            text[5] = "tBlocksCulled: " + tBlocksCulled.ToString();
            text[6] = "drawCodeTime: " + drawCodeTime.ToString();

A complicated optimization for code that isn’t even going to get compiled into the release build would be silly. The easiest fix here is to only periodically update the content of these strings (say, once per second, instead of once per frame.)
Also, some of the garbage generation could be reduced by separating the arrays into one array for the static strings “drawCodeTime: ” for example and then drawCodeTime.ToString() in a separate string. That would still leave thousands of objects per second created, sadly.

If you have a game that relies on many string conversions, your best bet is to use StringBuilder objects (which have predefined size) and replace subsections of those strings with the new string using the .Replace method that StringBuilders have built-in.

Posted in Coding, XNA | Tagged , , | 2 Comments

Convert greyscale images to Alpha8 format with a Custom Content Processor in XNA 3.1.

I noticed that at least two terrain engine examples in XNA are reading heightmap images into 4 channel textures instead of single channel textures.

To create a custom content processor that will permit you to convert any Texture2D compatible input format into an Alpha8 format texture, do the following:

Right click your game solution -> add new item
Under XNA 3.1 select Custom Content Pipeline, and replace the default class in this solution with this code:

using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.Xna.Framework.Content.Pipeline;
using Microsoft.Xna.Framework.Content.Pipeline.Processors;
using Microsoft.Xna.Framework.Content.Pipeline.Graphics;
using Microsoft.Xna.Framework.Graphics.PackedVector;
using Microsoft.Xna.Framework;
using System.ComponentModel;

namespace ContentPipelineExtension1
{
    [ContentProcessor]
    [DesignTimeVisible(true)]
    class HeightMapTextureProcessor : ContentProcessor
    {
        ///

        /// Process converts displacement maps to Alpha8 textures.
        /// 

        public override TextureContent Process(TextureContent input,
            ContentProcessorContext context)
        {
            // Convert to data that we can work with:
            input.ConvertBitmapType(typeof(PixelBitmapContent));

            // Select first mipmap, there should only be one:
            MipmapChain mipmapChain = input.Faces[0];

            // There should only be one bitmap, but it doesnt hurt to write this loop:
            foreach (PixelBitmapContent bitmap in mipmapChain)
            {
                for (int x = 0; x < bitmap.Width; x++)
                {
                    for (int y = 0; y < bitmap.Height; y++)
                    {
                        Vector4 pixel = bitmap.GetPixel(x, y);

                        // Move R channel to A channel:
                        bitmap.SetPixel(x, y, new Vector4(0, 0, 0, pixel.X));
                    }
                }
            }

            // Alpha8 contains values from 0 - 1 in the W channel.
            input.ConvertBitmapType(typeof(PixelBitmapContent));

            input.GenerateMipmaps(false);
            return input;
        }
    }
}

Now add a reference to your CustomContentPipeline1 solution in the “Content” module in your game solution.

Right click on the heightmap file(s) that you wish to be converted to Alpha8 and select the HeightMapTextureProcessor as the content processor.

That’s it!

Posted in Coding, XNA | Tagged , , , | 1 Comment

2D CDF 9/7 Wavelet Transform in Python

As promised, here is an implementation of the Cohen-Daubechies-Feauveau 9 tap / 7 tap wavelet transform on a 2D signal in Python. This is the same transform used in the JPEG2000 codec.

'''
2D CDF 9/7 Wavelet Forward and Inverse Transform (lifting implementation)

This code is provided "as is" and is given for educational purposes.
2008 - Kris Olhovsky - code.inquiries@olhovsky.com
'''

from PIL import Image # Part of the standard Python Library

''' Example matrix as a list of lists: '''
mat4x4 = [
         [0,   1,  2,  3], # Row 1
         [4,   5,  6,  7], # Row 2
         [8,   9, 10, 11], # Row 3
         [12, 13, 14, 15], # Row 4
         ]                 # We don't do anything with this matrix.
                           # It's just here for clarification.

def fwt97_2d(m, nlevels=1):
    ''' Perform the CDF 9/7 transform on a 2D matrix signal m.
    nlevel is the desired number of times to recursively transform the
    signal. '''

    w = len(m[0])
    h = len(m)
    for i in range(nlevels):
        m = fwt97(m, w, h) # cols
        m = fwt97(m, w, h) # rows
        w /= 2
        h /= 2

    return m

def iwt97_2d(m, nlevels=1):
    ''' Inverse CDF 9/7 transform on a 2D matrix signal m.
        nlevels must be the same as the nlevels used to perform the fwt.
    '''

    w = len(m[0])
    h = len(m)

    # Find starting size of m:
    for i in range(nlevels-1):
        h /= 2
        w /= 2

    for i in range(nlevels):
        m = iwt97(m, w, h) # rows
        m = iwt97(m, w, h) # cols
        h *= 2
        w *= 2

    return m

def fwt97(s, width, height):
    ''' Forward Cohen-Daubechies-Feauveau 9 tap / 7 tap wavelet transform
    performed on all columns of the 2D n*n matrix signal s via lifting.
    The returned result is s, the modified input matrix.
    The highpass and lowpass results are stored on the left half and right
    half of s respectively, after the matrix is transposed. '''

    # 9/7 Coefficients:
    a1 = -1.586134342
    a2 = -0.05298011854
    a3 = 0.8829110762
    a4 = 0.4435068522

    # Scale coeff:
    k1 = 0.81289306611596146 # 1/1.230174104914
    k2 = 0.61508705245700002 # 1.230174104914/2
    # Another k used by P. Getreuer is 1.1496043988602418

    for col in range(width): # Do the 1D transform on all cols:
        ''' Core 1D lifting process in this loop. '''
        ''' Lifting is done on the cols. '''

        # Predict 1. y1
        for row in range(1, height-1, 2):
            s[row][col] += a1 * (s[row-1][col] + s[row+1][col])
        s[height-1][col] += 2 * a1 * s[height-2][col] # Symmetric extension

        # Update 1. y0
        for row in range(2, height, 2):
            s[row][col] += a2 * (s[row-1][col] + s[row+1][col])
        s[0][col] +=  2 * a2 * s[1][col] # Symmetric extension

        # Predict 2.
        for row in range(1, height-1, 2):
            s[row][col] += a3 * (s[row-1][col] + s[row+1][col])
        s[height-1][col] += 2 * a3 * s[height-2][col]

        # Update 2.
        for row in range(2, height, 2):
            s[row][col] += a4 * (s[row-1][col] + s[row+1][col])
        s[0][col] += 2 * a4 * s[1][col]

    # de-interleave
    temp_bank = [[0]*width for i in range(height)]
    for row in range(height):
        for col in range(width):
            # k1 and k2 scale the vals
            # simultaneously transpose the matrix when deinterleaving
            if row % 2 == 0: # even
                temp_bank[col][row/2] = k1 * s[row][col]
            else:            # odd
                temp_bank[col][row/2 + height/2] = k2 * s[row][col]

    # write temp_bank to s:
    for row in range(width):
        for col in range(height):
            s[row][col] = temp_bank[row][col]

    return s

def iwt97(s, width, height):
    ''' Inverse CDF 9/7. '''

    # 9/7 inverse coefficients:
    a1 = 1.586134342
    a2 = 0.05298011854
    a3 = -0.8829110762
    a4 = -0.4435068522

    # Inverse scale coeffs:
    k1 = 1.230174104914
    k2 = 1.6257861322319229

    # Interleave:
    temp_bank = [[0]*width for i in range(height)]
    for col in range(width/2):
        for row in range(height):
            # k1 and k2 scale the vals
            # simultaneously transpose the matrix when interleaving
            temp_bank[col * 2][row] = k1 * s[row][col]
            temp_bank[col * 2 + 1][row] = k2 * s[row][col + width/2]

    # write temp_bank to s:
    for row in range(width):
        for col in range(height):
            s[row][col] = temp_bank[row][col]

    for col in range(width): # Do the 1D transform on all cols:
        ''' Perform the inverse 1D transform. '''

        # Inverse update 2.
        for row in range(2, height, 2):
            s[row][col] += a4 * (s[row-1][col] + s[row+1][col])
        s[0][col] += 2 * a4 * s[1][col]

        # Inverse predict 2.
        for row in range(1, height-1, 2):
            s[row][col] += a3 * (s[row-1][col] + s[row+1][col])
        s[height-1][col] += 2 * a3 * s[height-2][col]

        # Inverse update 1.
        for row in range(2, height, 2):
            s[row][col] += a2 * (s[row-1][col] + s[row+1][col])
        s[0][col] +=  2 * a2 * s[1][col] # Symmetric extension

        # Inverse predict 1.
        for row in range(1, height-1, 2):
            s[row][col] += a1 * (s[row-1][col] + s[row+1][col])
        s[height-1][col] += 2 * a1 * s[height-2][col] # Symmetric extension

    return s

def seq_to_img(m, pix):
    ''' Copy matrix m to pixel buffer pix.
    Assumes m has the same number of rows and cols as pix. '''
    for row in range(len(m)):
        for col in range(len(m[row])):
            pix[col,row] = m[row][col]

And here is an example usage:

if __name__ == "__main__":
    # Load image.
    im = Image.open("test1_512.png") # Must be a single band image! (grey)

    # Create an image buffer object for fast access.
    pix = im.load()

    # Convert the 2d image to a 1d sequence:
    m = list(im.getdata())

    # Convert the 1d sequence to a 2d matrix.
    # Each sublist represents a row. Access is done via m[row][col].
    m = [m[i:i+im.size[0]] for i in range(0, len(m), im.size[0])]

    # Cast every item in the list to a float:
    for row in range(0, len(m)):
        for col in range(0, len(m[0])):
            m[row][col] = float(m[row][col])

    # Perform a forward CDF 9/7 transform on the image:
    m = fwt97_2d(m, 3)

    seq_to_img(m, pix) # Convert the list of lists matrix to an image.
    im.save("test1_512_fwt.png") # Save the transformed image.

    # Perform an inverse transform:
    m = iwt97_2d(m, 3)

    seq_to_img(m, pix) # Convert the inverse list of lists matrix to an image.
    im.save("test1_512_iwt.png") # Save the inverse transformation.

This is a first step in a larger task of compressing a certain type of vertex data, which can be decompressed entirely by a GPU via an implementation written in shader code.

The reason for a Python implementation is to create a working, easy to understand version of the implementation. Not only for myself, but for the interwebs.

The Python code does use “height” and “width” variables, although in order to use rectangular images, some adjustments need to be made. This code also assumes that your images have width and height of the form 2^n. E.g. 1024 x 1024, 512 x 512, etc.

Sample input image for transforming.

Sample input image for transforming.

Image after a single level CDF 9/7 transform.

Image after a single level CDF 9/7 transform.

2 level CDF 9/7 wavelet transformed image

2 level CDF 9/7 wavelet transformed image

Posted in Coding | Tagged , , , , , , | 11 Comments