<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Olhovsky</title>
	<atom:link href="http://www.olhovsky.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.olhovsky.com</link>
	<description>Programming, meet art.</description>
	<lastBuildDate>Mon, 23 Jan 2012 05:19:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>Exponential shadow map filtering (in log space).</title>
		<link>http://www.olhovsky.com/2011/07/exponential-shadow-map-filtering-in-hlsl/</link>
		<comments>http://www.olhovsky.com/2011/07/exponential-shadow-map-filtering-in-hlsl/#comments</comments>
		<pubDate>Wed, 20 Jul 2011 21:20:11 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[ESM]]></category>
		<category><![CDATA[exponential shadow mapping]]></category>
		<category><![CDATA[hlsl]]></category>
		<category><![CDATA[shadow mapping]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=888</guid>
		<description><![CDATA[It turns out that I was performing a linear filter on my shadow map depths, but I should have been doing the shadow map prefiltering in log space. Whoops. Actually, there seems to be a lot of confusion about how &#8230; <a href="http://www.olhovsky.com/2011/07/exponential-shadow-map-filtering-in-hlsl/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>It turns out that I was performing a linear filter on my shadow map depths, but I should have been doing the shadow map prefiltering in log space. Whoops.</p>
<p>Actually, there seems to be a lot of confusion about how to filter in log space when outputting linear depth, and why you need to. Let me clear things up.</p>
<p>There are two approaches to exponential shadow mapping:</p>
<p><strong>1. Output exponential light depth map, prefilter linearly.</strong></p>
<p>You can create a depth map for the light by outputting <em>exp(c*depth)</em> and then perform the typical gaussian/box/tent prefiltering as normal. In this case, when drawing the light, the ESM filtering is done like so:</p>
<pre>float occluder = tex2D(shadowMap, texCoords).r;
float lit = occluder / exp(c*receiver);</pre>
<p>The advantage of this method is that the prefiltering is less expensive than the next method. Hardware bilinear, anisotropic and mip filtering all automatically filter the shadow map correctly.</p>
<p>The disadvantage of this method is that it is precision hungry, because <em>exp(c*depth)</em> grows very quickly. With depth normalized to [0, 1], a 32 bit floating point texture allows values of <em>c</em> up to about 88 before overflow errors start to occur. In my experience, 16 bits is not enough precision with this method, as contact light leaking becomes very problematic.</p>
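<p>As a rough sketch of method 1 (not the exact shaders used here), the light pass could write the exponential and the lighting pass could divide the receiver term back out. The names <em>ESM_C</em>, <em>ShadowMapSampler</em>, <em>LightDepthPS</em> and <em>EsmLitLinear</em> are made up for illustration, the value 80 is arbitrary, depth is assumed to be normalized to [0, 1], and the render target is assumed to be a 32 bit float format:</p>
<pre>// Hypothetical names throughout; assumes depth is normalized to [0, 1].
static const float ESM_C = 80.0; // illustrative; must stay below ~88 for 32 bit floats

sampler ShadowMapSampler : register(s1);

// Light pass: store exp(c * depth) instead of raw depth.
float4 LightDepthPS(float depth : TEXCOORD0) : COLOR0
{
	float e = exp(ESM_C * depth);
	return float4(e, e, e, e);
}

// Lighting pass: one (hardware filtered) tap, then divide out the receiver term.
float EsmLitLinear(float2 texCoords, float receiver)
{
	float occluder = tex2D(ShadowMapSampler, texCoords).r;
	// Clamp, since filtering can push the ratio above 1.
	return saturate(occluder / exp(ESM_C * receiver));
}</pre>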
<p>&nbsp;</p>
<p><strong>2. Output linear depth, prefilter in log space.</strong></p>
<p>You can create a depth map for the light by outputting <em>depth</em> as is, and then perform a gaussian/box/tent prefilter in log space. In this case, when drawing the light, the ESM filtering is done like so:</p>
<pre>float occluder = tex2D(shadowMap, texCoords).r;
float lit = exp(c*(occluder - receiver));</pre>
<p>The advantages of this method:</p>
<ul>
<li>Less precision is required to store depth, which now varies linearly. So we only need 16 bits to store depth!</li>
<li>The ESM filtering when drawing the light is slightly faster as we can remove a division.</li>
</ul>
<p>The disadvantages:</p>
<ul>
<li>Hardware bilinear and anisotropic filtering will introduce some error, although it is generally close enough &#8212; the artifact is just a little bit of shadow overdarkening.</li>
<li>Prefiltering must be done in log space, which is slower (see below).</li>
<li>If mipmaps are used, they must be generated using filtering in log space as well.</li>
</ul>
<p>Overall, these two methods represent a tradeoff between memory and ALU, with method 1 requiring more memory and less ALU overall.</p>
<p>So how and why do we filter in log space?</p>
<p>Consider for example that we wish to average two values in the shadow map (like in a 2&#215;2 separable gaussian or box blur). Then we are averaging the two values <em>exp(d1)</em> and <em>exp(d2)</em>.</p>
<p>Using gaussian/box blur weights <em>w1</em> and <em>w2</em>, we then have:</p>
<p style="padding-left: 30px; text-align: center;"><em>w1*exp(d1)</em> + <em>w2*exp(d2)</em></p>
<p style="padding-left: 30px; text-align: center;"><em>exp(d1) * (w1 + w2*exp(d2 &#8211; d1))</em></p>
<p style="padding-left: 30px; text-align: center;"><em>exp(d1) * exp(log(w1 + w2*(exp(d2-d1))))</em></p>
<p style="padding-left: 30px; text-align: center;"><em>exp(d1 + log(w1 + w2*exp(d2-d1)))</em></p>
<p>So now the sum of the two exponentials is written as one exponential. Taking the log of the previous expression, we can perform the averaging of the box/gaussian blur by working on the exponential&#8217;s argument only:</p>
<p style="text-align: center;"><em>d1 + log(w1 + w2*exp(d2-d1))</em></p>
<p style="text-align: left;">So we filter the arguments of the exponentials, and then go back to exponential space when actually drawing the light, using <em>exp(c*(occluder - receiver))</em> as we saw earlier.</p>
<p style="text-align: left;">I&#8217;ve generalized the above reasoning for two arguments to arbitrarily many arguments in the following code, which performs a box blur in log space using an HLSL pixel shader. To perform a Gaussian blur instead, replace the constant sample weight <em>w</em> with the appropriate Gaussian weight for each sample; the 1.0 applied to the running result stays as it is, since that result already contains the weighted sum so far.</p>
<pre>sampler TextureSampler : register(s0);
#define SAMPLE_COUNT 3
float2 Offsets[SAMPLE_COUNT];

// Returns log(w0*exp(d1) + w1*exp(d2)) without leaving log space.
float log_conv(float w0, float d1, float w1, float d2){
	return (d1 + log(w0 + (w1 * exp(d2 - d1))));
}

float4 Blur(float2 texCoord : TEXCOORD0) : COLOR0
{
	float v, B, B2;
	float w = (1.0/SAMPLE_COUNT);

	// Combine the first two samples.
	B = tex2D(TextureSampler, texCoord + Offsets[0]).r;
	B2 = tex2D(TextureSampler, texCoord + Offsets[1]).r;
	v = log_conv(w, B, w, B2);

	// Fold each remaining sample into the running result (which keeps weight 1.0).
	for(int i = 2; i &lt; SAMPLE_COUNT; i++)
	{
		B = tex2D(TextureSampler, texCoord + Offsets[i]).r;
		v = log_conv(1.0, v, w, B);
	}

	return float4(v, v, v, v);
}</pre>
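<p>For the Gaussian case, a minimal sketch could look like the following. The <em>Weights</em> array and the names <em>LogConv</em> and <em>GaussianBlurLog</em> are assumptions for illustration, and the weights are assumed to be positive and to sum to 1:</p>
<pre>sampler TextureSampler : register(s0);
#define SAMPLE_COUNT 5
float2 Offsets[SAMPLE_COUNT];
float Weights[SAMPLE_COUNT]; // hypothetical: Gaussian weights set by the application

// log(w0*exp(d1) + w1*exp(d2)), evaluated without leaving log space.
float LogConv(float w0, float d1, float w1, float d2)
{
	return d1 + log(w0 + w1 * exp(d2 - d1));
}

float4 GaussianBlurLog(float2 texCoord : TEXCOORD0) : COLOR0
{
	float d0 = tex2D(TextureSampler, texCoord + Offsets[0]).r;
	float d1 = tex2D(TextureSampler, texCoord + Offsets[1]).r;
	float v = LogConv(Weights[0], d0, Weights[1], d1);

	for (int i = 2; i &lt; SAMPLE_COUNT; i++)
	{
		float d = tex2D(TextureSampler, texCoord + Offsets[i]).r;
		// v already holds the log of the weighted sum so far, hence weight 1.0.
		v = LogConv(1.0, v, Weights[i], d);
	}

	return float4(v, v, v, v);
}</pre>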
<p>So what does all this extra work get us? The error introduced by filtering linearly instead of in log space was so small in my tests, that I couldn&#8217;t produce a screenshot that clearly demonstrates it.</p>
<p><a href="http://www.gamedev.net/topic/554566-esm-blocky-blur/page__view__findpost__p__4566159">Here is the thread that inspired me to try filtering in log space</a>; it also demonstrates a clear difference between linear and log filtering.</p>
<p>I found that on the Xbox 360, log filtering didn&#8217;t cost anything extra, as the prefiltering step was texture bandwidth bound anyway &#8212; so I&#8217;ll leave it in place for now.</p>
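<p>The same trick applies to the log space mipmaps that method 2 needs. Below is a hedged sketch of a 2&#215;2 downsample pass, where <em>SourceMip</em>, <em>SourceTexelSize</em> and <em>DownsampleLog</em> are hypothetical names and the source sampler is assumed to use point filtering. Like the blur code, it filters <em>exp(d)</em>; to filter <em>exp(c*d)</em> instead, the differences inside <em>exp</em> would be scaled by <em>c</em> and the <em>log</em> term divided by <em>c</em>.</p>
<pre>sampler SourceMip : register(s0); // hypothetical: previous mip level, point sampled
float2 SourceTexelSize;           // hypothetical: 1.0 / source mip dimensions

float4 DownsampleLog(float2 texCoord : TEXCOORD0) : COLOR0
{
	// texCoord is assumed to sit at the shared corner of the four source texels.
	float2 h = 0.5 * SourceTexelSize;
	float d0 = tex2D(SourceMip, texCoord + float2(-h.x, -h.y)).r;
	float d1 = tex2D(SourceMip, texCoord + float2( h.x, -h.y)).r;
	float d2 = tex2D(SourceMip, texCoord + float2(-h.x,  h.y)).r;
	float d3 = tex2D(SourceMip, texCoord + float2( h.x,  h.y)).r;

	// Factor out the largest depth so every exp() argument is &lt;= 0.
	float m = max(max(d0, d1), max(d2, d3));
	float sum = exp(d0 - m) + exp(d1 - m) + exp(d2 - m) + exp(d3 - m);

	// Equals log(0.25 * (exp(d0) + exp(d1) + exp(d2) + exp(d3))).
	float v = m + log(0.25 * sum);
	return float4(v, v, v, v);
}</pre>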
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2011/07/exponential-shadow-map-filtering-in-hlsl/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Exponential shadow mapping drawbacks.</title>
		<link>http://www.olhovsky.com/2011/07/exponential-shadow-mapping-drawbacks/</link>
		<comments>http://www.olhovsky.com/2011/07/exponential-shadow-mapping-drawbacks/#comments</comments>
		<pubDate>Tue, 19 Jul 2011 13:10:38 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[XNA]]></category>
		<category><![CDATA[ESM]]></category>
		<category><![CDATA[PCF]]></category>
		<category><![CDATA[shadow mapping]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=874</guid>
		<description><![CDATA[I created a model that could demonstrate some harder cases for the various shadow mapping filters that I experimented with the other day. This model (which is two copies of a sort of church-like building), casts shadows on itself (where &#8230; <a href="http://www.olhovsky.com/2011/07/exponential-shadow-mapping-drawbacks/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I created a model that could demonstrate some harder cases for the various <a title="Shadow map filtering experimentation." href="http://www.olhovsky.com/2011/07/shadow-map-filtering-experimentation/">shadow mapping filters that I experimented with</a> the other day.</p>
<div id="attachment_875" class="wp-caption aligncenter" style="width: 775px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_churches.png"><img class="size-full wp-image-875" title="july15_churches" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_churches.png" alt="" width="765" height="572" /></a><p class="wp-caption-text">Raytraced render of the new test model.</p></div>
<p>This model (which is two copies of a sort of church-like building), casts shadows on itself (where we will see contact light leaking), and the windows provide various sized gaps in the self shadowing (where we will see ESM&#8217;s oversized shadows drown out the gaps).</p>
<p>&nbsp;</p>
<p>Here is a render of the model(s) in the game engine, viewed from the inside:</p>
<div id="attachment_877" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_esm_back.png"><img class="size-large wp-image-877" title="july15_esm_back" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_esm_back-1024x574.png" alt="" width="640" height="358" /></a><p class="wp-caption-text">1 tap ESM, 1024x1024 shadow map with 2 cascades, 4xMSAA + 3x3 box blur</p></div>
<p>Notice how light leaks through at the corners of the church in the image above.</p>
<p>By rescaling the ESM results further (and throwing away more of the blur), we can reduce this artifact:</p>
<p><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_clamped_esm_back.png"><img class="aligncenter size-large wp-image-878" title="july15_clamped_esm_back" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_clamped_esm_back-1024x574.png" alt="" width="640" height="358" /></a>However this makes the shadows even larger (&#8220;fatter&#8221;), and reduces the size that a window can be before light shows through. (The two small windows are already missing in the shadow.)</p>
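<p>The exact rescale is not shown here. One simple way such a rescale could be written, as a sketch with a made-up helper name and a made-up <em>threshold</em> parameter, is a linear remap of the ESM result. Raising the threshold darkens leaked light, but also discards the softest part of the penumbra:</p>
<pre>// Hypothetical helper: threshold is assumed to be in [0, 1).
float RescaleEsm(float lit, float threshold)
{
	// Results at or below threshold become fully shadowed; 1.0 stays fully lit.
	return saturate((lit - threshold) / (1.0 - threshold));
}</pre>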
<div id="attachment_876" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_clamped_esm_front.png"><img class="size-large wp-image-876" title="july15_clamped_esm_front" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_clamped_esm_front-1024x574.png" alt="" width="640" height="358" /></a><p class="wp-caption-text">1 tap ESM with 1024x1024 shadow map, 2 cascades, 4xMSAA + 3x3 box blur</p></div>
<p>Notice the lack of shadow at the top of the church, near the roof edge. Overall I would say that this is not too noticeable. However, this is a best case, where the church is quite large relative to the size of the scene (the ground edge in the background stops just before the depth range ends).</p>
<div id="attachment_879" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_PCF_4_dithered.png"><img class="size-large wp-image-879" title="july15_PCF_4_dithered" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_PCF_4_dithered-1024x574.png" alt="" width="640" height="358" /></a><p class="wp-caption-text">4 tap dithered PCF, 1024x1024 shadow map, 2 cascades.</p></div>
<p>When using 4 tap dithered PCF, there is no leaking and the small windows let some light through. Notice that the large windows on the right also now allow light through. Of course, the shadow edges are not as soft, the shadows shimmer during movement, and the dithering pattern changes during camera movement.</p>
<div id="attachment_880" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_PCF_4_dithered_front.png"><img class="size-large wp-image-880" title="july15_PCF_4_dithered_front" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_PCF_4_dithered_front-1024x574.png" alt="" width="640" height="358" /></a><p class="wp-caption-text">4 tap dithered PCF</p></div>
<p>With 4 tap dithered PCF, self shadowing is improved. Notice that the roof now begins to self shadow the building. Also notice slight peter-panning in the roof shadows due to the roof offset being large, as slope-scale biasing must add more bias to the roof than other parts of the building (because the roof is at a large angle to the light).</p>
<div id="attachment_881" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_PCF_4_disc_back.png"><img class="size-large wp-image-881" title="july15_PCF_4_disc_back" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_PCF_4_disc_back-1024x574.png" alt="" width="640" height="358" /></a><p class="wp-caption-text">4 tap PCF with 50px sampling area random disc</p></div>
<p>Above, the shadow error is unstructured compared to the regular dithering pattern, and the sampling area is larger, so the shadows appear softer. This sampling method mostly hides the underlying shadow map, although the dithering pattern changes during camera movement. It&#8217;s far from ideal, but may be better in cases where self shadowing is important and avoiding contact light leaking is important. Adding two more cascades would sharpen the shadow edges while completely hiding the underlying shadow map.</p>
<p>Exponential Variance Shadow Mapping also solves these issues at 4x the memory usage, while still producing smooth shadows. I may experiment with EVSM, although I doubt that it is an affordable technique on the Xbox 360 unless significant compromises are made.</p>
<div id="attachment_882" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_PCF_4_disc_front.png"><img class="size-large wp-image-882" title="july15_PCF_4_disc_front" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july15_PCF_4_disc_front-1024x574.png" alt="" width="640" height="358" /></a><p class="wp-caption-text">4 tap PCF with random sampling from 50px area disc </p></div>
<p>In the image above, we can see that there are shadow acne problems due to too little bias while at the same time the top roof shadows are peter-panning due to too much bias. Slope-scale bias is already implemented here, and only surfaces facing the light are drawn. Using more cascades, or a larger shadow map, or a smaller sampling area would solve this problem.</p>
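<p>For reference, a shader-side slope-scale bias is often written along the following lines. This is only a sketch with hypothetical names and constants; it says nothing about whether the bias in these screenshots is applied in the shader or through the rasterizer&#8217;s depth bias state. It does show why steep surfaces such as the roof end up with more bias:</p>
<pre>// Hypothetical helper: normal and toLight need not be normalized on input.
float SlopeScaledBias(float3 normal, float3 toLight, float constantBias, float slopeScale)
{
	float cosTheta = saturate(dot(normalize(normal), normalize(toLight)));
	float sinTheta = sqrt(1.0 - cosTheta * cosTheta);

	// tan(theta) grows as the surface turns edge-on to the light; clamp it
	// so grazing angles do not produce an unbounded bias.
	float tanTheta = min(sinTheta / max(cosTheta, 0.001), 10.0);

	return constantBias + slopeScale * tanTheta;
}</pre>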
<p>Final decisions on which shadow technique combinations will be used will be made when the scene is more complete, and left over performance is more limited.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2011/07/exponential-shadow-mapping-drawbacks/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cascading shadow mapping and ESM.</title>
		<link>http://www.olhovsky.com/2011/07/adding-cascading-shadow-mapping-to-esm/</link>
		<comments>http://www.olhovsky.com/2011/07/adding-cascading-shadow-mapping-to-esm/#comments</comments>
		<pubDate>Tue, 12 Jul 2011 16:40:24 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[XNA]]></category>
		<category><![CDATA[cascaded shadow maps]]></category>
		<category><![CDATA[CSM]]></category>
		<category><![CDATA[ESM]]></category>
		<category><![CDATA[exponential shadow maps]]></category>
		<category><![CDATA[PSSM]]></category>
		<category><![CDATA[shadows]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=844</guid>
		<description><![CDATA[I implemented shadow cascades today. Below is a screenshot of the current shadows, using two cascades. Note the artifacts that I&#8217;ve highlighted with green arrows. About 1/3 of the depth range is represented by the first cascade (and 2/3 in &#8230; <a href="http://www.olhovsky.com/2011/07/adding-cascading-shadow-mapping-to-esm/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I implemented shadow cascades today. Below is a screenshot of the current shadows, using two cascades. Note the artifacts that I&#8217;ve highlighted with green arrows.</p>
<div id="attachment_845" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july12_ESM_CSM.png"><img class="size-large wp-image-845" title="july12_ESM_CSM" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july12_ESM_CSM-1024x456.png" alt="" width="640" height="285" /></a><p class="wp-caption-text">1 tap ESM filtering, 2 cascades encoded into a 512x512 shadow map, and 5x5 box blur prefiltering of shadow map. Shadow map is rendered using 4xMSAA.</p></div>
<p>About 1/3 of the depth range is represented by the first cascade (and 2/3 in the second cascade). The difference in shadow size between these two cascades is much bigger than expected.</p>
<p>To reduce the artifacts, we can linearly interpolate between the cascades. <a href="http://msdn.microsoft.com/en-us/library/ee416307(v=vs.85).aspx">This article</a> suggests this method (look at figure 9). A region where blending will be done is marked below in light red.</p>
<div id="attachment_847" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july12_ESM_CSM_blend_region.png"><img class="size-large wp-image-847" title="july12_ESM_CSM_blend_region" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july12_ESM_CSM_blend_region-1024x462.png" alt="" width="640" height="288" /></a><p class="wp-caption-text">Blend region where shadow cascades will be linearly interpolated.</p></div>
<p>I use a conditional shader branch so that samples from both cascades are only taken in the blend region. This means that reducing the blend region <em>in screen space </em>will reduce computation time. Two ways to reduce the blend region are:</p>
<ul>
<li>Reduce the blend region width.</li>
<li>Increase the cascade split distance from the camera.</li>
</ul>
<p>Note that increasing the cascade split distance will also reduce the resolution difference between cascades, and therefore make the cascade discontinuity smaller.</p>
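<p>A minimal sketch of the branch described above, assuming two overlapping cascades and a directional light (so no perspective divide). All names here (<em>CascadeViewProj</em>, <em>SplitDistance</em>, <em>BlendWidth</em>, <em>EsmC</em>, <em>SampleCascade</em>, <em>CascadedShadow</em>) are hypothetical, and each matrix is assumed to map world space straight to shadow map texture coordinates and depth:</p>
<pre>sampler ShadowMapSampler : register(s1);
float4x4 CascadeViewProj[2]; // hypothetical: world to shadow map texture space
float SplitDistance;         // hypothetical: view depth where cascade 0 ends
float BlendWidth;            // hypothetical: blend region width in view depth units
float EsmC;                  // hypothetical: ESM exponent c

// One ESM tap into a single cascade.
float SampleCascade(int index, float3 worldPos)
{
	float4 p = mul(float4(worldPos, 1.0), CascadeViewProj[index]);
	// tex2Dlod avoids gradient problems when sampling inside a branch.
	float occluder = tex2Dlod(ShadowMapSampler, float4(p.xy, 0, 0)).r;
	return saturate(exp(EsmC * (occluder - p.z)));
}

float CascadedShadow(float3 worldPos, float viewDepth)
{
	float blendStart = SplitDistance - BlendWidth;

	if (viewDepth &lt; blendStart)
		return SampleCascade(0, worldPos);
	if (viewDepth &gt;= SplitDistance)
		return SampleCascade(1, worldPos);

	// Only pixels inside the blend region pay for two samples.
	float t = (viewDepth - blendStart) / BlendWidth;
	return lerp(SampleCascade(0, worldPos), SampleCascade(1, worldPos), t);
}</pre>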
<div id="attachment_848" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july12_ESM_CSM_blending.png"><img class="size-large wp-image-848" title="july12_ESM_CSM_blending" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july12_ESM_CSM_blending-1024x462.png" alt="" width="640" height="288" /></a><p class="wp-caption-text">Blending between shadow cascades is performed on the previously shown blend region. Blending artifacts are highlighted with green arrows.</p></div>
<p>Click the above image to enlarge it. The middle green arrow highlights a particularly bad case, where a very long shadow is in the blend region. Increasing the blend region would make the artifact larger, but less noticeable.</p>
<p>Without modifying the cascades, blending between cascades can result in seams at the blend region:</p>
<div id="attachment_854" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july07_ESM_CSM_seam.png"><img class="size-large wp-image-854" title="july07_ESM_CSM_seam" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/july07_ESM_CSM_seam-1024x456.png" alt="" width="640" height="285" /></a><p class="wp-caption-text">A seam results from selection between both cascades.</p></div>
<p>Thankfully though, these seams can be avoided by making the cascades overlap (i.e. their depth bounds should overlap with one another).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2011/07/adding-cascading-shadow-mapping-to-esm/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Shadow map filtering experimentation.</title>
		<link>http://www.olhovsky.com/2011/07/shadow-map-filtering-experimentation/</link>
		<comments>http://www.olhovsky.com/2011/07/shadow-map-filtering-experimentation/#comments</comments>
		<pubDate>Sat, 09 Jul 2011 04:40:50 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[XNA]]></category>
		<category><![CDATA[ESM]]></category>
		<category><![CDATA[exponential shadow mapping]]></category>
		<category><![CDATA[PCF]]></category>
		<category><![CDATA[PCSS]]></category>
		<category><![CDATA[percentage closer filtering]]></category>
		<category><![CDATA[percentage closer soft shadows]]></category>
		<category><![CDATA[shadows]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=813</guid>
		<description><![CDATA[Edit: I demonstrated some harder cases for the important filters here. I also used a brighter light so that it&#8217;s easier to see the shadow edges &#160; Over the past two weeks I&#8217;ve been working on various parts of my &#8230; <a href="http://www.olhovsky.com/2011/07/shadow-map-filtering-experimentation/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Edit: I demonstrated some harder cases for the important filters <a title="Exponential shadow mapping drawbacks." href="http://www.olhovsky.com/2011/07/exponential-shadow-mapping-drawbacks/">here</a>. I also used a brighter light so that it&#8217;s easier to see the shadow edges <img src='http://www.olhovsky.com/wp/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </strong></p>
<p>&nbsp;</p>
<p>Over the past two weeks I&#8217;ve been working on various parts of my game engine&#8217;s shadow mapping, mostly experimenting with different shadow filtering techniques.</p>
<div id="attachment_815" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/no_shadow_filtering.png"><img class="size-large wp-image-815" title="no_shadow_filtering" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/no_shadow_filtering-1024x459.png" alt="" width="640" height="286" /></a><p class="wp-caption-text">No shadow filtering.</p></div>
<div id="attachment_814" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/no_shadow_filtering_many_shadows.png"><img class="size-large wp-image-814" title="no_shadow_filtering_many_shadows" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/no_shadow_filtering_many_shadows-1024x525.png" alt="" width="640" height="328" /></a><p class="wp-caption-text">No shadow filtering.</p></div>
<p>For reference, above are two images of a simple test scene without any shadow map filtering applied.</p>
<p>I started with basic 4 tap PCF, using a simple averaging of the samples. See this filtering below.</p>
<div id="attachment_817" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_simple_average.png"><img class="size-large wp-image-817" title="pcf_4_tap_simple_average" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_simple_average-1024x389.png" alt="" width="640" height="243" /></a><p class="wp-caption-text">4 tap PCF taking the mean of the samples.</p></div>
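<p>A basic 4 tap PCF of this kind can be sketched as follows. The names <em>ShadowTexelSize</em>, <em>ShadowTest</em> and <em>Pcf4</em> are hypothetical, and the shadow map sampler is assumed to use point filtering:</p>
<pre>sampler ShadowMapSampler : register(s1); // assumed point sampled
float2 ShadowTexelSize;                  // hypothetical: 1.0 / shadow map dimensions

// Returns 1 if the receiver is lit according to one shadow map texel, else 0.
float ShadowTest(float2 uv, float receiver, float bias)
{
	float occluder = tex2D(ShadowMapSampler, uv).r;
	return (receiver - bias &lt;= occluder) ? 1.0 : 0.0;
}

// Mean of four binary depth tests around uv.
float Pcf4(float2 uv, float receiver, float bias)
{
	float2 h = 0.5 * ShadowTexelSize;
	float lit = ShadowTest(uv + float2(-h.x, -h.y), receiver, bias)
	          + ShadowTest(uv + float2( h.x, -h.y), receiver, bias)
	          + ShadowTest(uv + float2(-h.x,  h.y), receiver, bias)
	          + ShadowTest(uv + float2( h.x,  h.y), receiver, bias);
	return lit * 0.25;
}</pre>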
<p>Then I moved on to edge-tap smoothing of the 4 tap PCF filtering method:</p>
<div id="attachment_818" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_edge_tap_smoothing.png"><img class="size-large wp-image-818" title="pcf_4_tap_edge_tap_smoothing" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_edge_tap_smoothing-1024x346.png" alt="" width="640" height="216" /></a><p class="wp-caption-text">4 tap PCF with edge tap smoothing (a linear blending between sample values).</p></div>
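<p>One common reading of this kind of smoothing is to weight the four binary test results by the sub-texel position, much like bilinear filtering. The sketch below takes that reading; it is not necessarily the exact variant used for the image above, and <em>ShadowMapSize</em> and <em>PcfBilinear</em> are made-up names:</p>
<pre>sampler ShadowMapSampler : register(s1); // assumed point sampled
float2 ShadowMapSize;                    // hypothetical: shadow map dimensions in texels

float PcfBilinear(float2 uv, float receiver, float bias)
{
	float2 texelSize = 1.0 / ShadowMapSize;

	// Position in texel units, shifted so texel centers land on integers.
	float2 pos = uv * ShadowMapSize - 0.5;
	float2 f = frac(pos);
	float2 base = (floor(pos) + 0.5) * texelSize; // center of the top-left texel

	float r = receiver - bias;
	float s00 = (r &lt;= tex2D(ShadowMapSampler, base).r) ? 1.0 : 0.0;
	float s10 = (r &lt;= tex2D(ShadowMapSampler, base + float2(texelSize.x, 0)).r) ? 1.0 : 0.0;
	float s01 = (r &lt;= tex2D(ShadowMapSampler, base + float2(0, texelSize.y)).r) ? 1.0 : 0.0;
	float s11 = (r &lt;= tex2D(ShadowMapSampler, base + texelSize).r) ? 1.0 : 0.0;

	// Blend the four binary results by the sub-texel position.
	return lerp(lerp(s00, s10, f.x), lerp(s01, s11, f.x), f.y);
}</pre>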
<p>Then I experimented with using more than 4 samples:</p>
<div id="attachment_819" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/PCF_7_tap_edge_smoothing.png"><img class="size-large wp-image-819" title="PCF_7_tap_edge_smoothing" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/PCF_7_tap_edge_smoothing-1024x454.png" alt="" width="640" height="283" /></a><p class="wp-caption-text">7 tap PCF with edge tap smoothing.</p></div>
<p>Below is an implementation of the dithering idea in the article <a href="http://http.developer.nvidia.com/GPUGems/gpugems_ch11.html">Shadow Map Antialiasing</a> from GPU Gems 1.</p>
<div id="attachment_816" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_dithered.png"><img class="size-large wp-image-816" title="pcf_4_tap_dithered" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_dithered-1024x396.png" alt="" width="640" height="247" /></a><p class="wp-caption-text">4 tap PCF with screenspace dithering.</p></div>
<p>The GPU Gems 1 method above selects shadow sampling positions based on the fragment&#8217;s current screenspace coordinates. So I tried rotating the 4 samples based on the current fragment position (in screen space) instead. See the image below.</p>
<div id="attachment_820" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_screenspace_rotated_cross.png"><img class="size-large wp-image-820" title="pcf_4_tap_screenspace_rotated_cross" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_screenspace_rotated_cross-1024x449.png" alt="" width="640" height="280" /></a><p class="wp-caption-text">4 tap PCF with screenspace based rotations.</p></div>
<div id="attachment_822" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/PCF_image_space_disc_8_tap.png"><img class="size-large wp-image-822" title="PCF_image_space_disc_8_tap" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/PCF_image_space_disc_8_tap-1024x426.png" alt="" width="640" height="266" /></a><p class="wp-caption-text">8 tap PCF using rotated poisson disc with screenspace based rotations.</p></div>
<p>I also tried doing the rotations based on object space instead:</p>
<div id="attachment_821" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/PCF_object_space_disc.png"><img class="size-large wp-image-821" title="PCF_object_space_disc" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/PCF_object_space_disc-1024x427.png" alt="" width="640" height="266" /></a><p class="wp-caption-text">4 tap PCF with object space based rotations.</p></div>
<p>Then I tried using a small lookup texture (32&#215;32) with random 2&#215;2 matrix rotations, used to offset the 4 samples in random directions based on a screenspace sampling of the random noise texture. See below.</p>
<div id="attachment_824" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_random_202_area.png"><img class="size-large wp-image-824" title="pcf_4_tap_random_202_area" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_random_202_area-1024x412.png" alt="" width="640" height="257" /></a><p class="wp-caption-text">4 tap PCF with 202 pixel sampling area, sampling chosen by taking rotations from a lookup texture. </p></div>
<div id="attachment_825" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_random_50_area.png"><img class="size-large wp-image-825" title="pcf_4_tap_random_50_area" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_random_50_area-1024x375.png" alt="" width="640" height="234" /></a><p class="wp-caption-text">4 tap PCF with 50 pixel sampling area, sampling chosen by taking rotations from a lookup texture. </p></div>
<div id="attachment_826" class="wp-caption aligncenter" style="width: 1006px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_random_50_area_many_shadows.png"><img class="size-full wp-image-826" title="pcf_4_tap_random_50_area_many_shadows" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/pcf_4_tap_random_50_area_many_shadows.png" alt="" width="996" height="535" /></a><p class="wp-caption-text">4 tap PCF with 50 pixel sampling area, sampling chosen by taking rotations from a lookup texture. </p></div>
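<p>The lookup texture approach can be sketched roughly like this. Everything named here is an assumption for illustration: the rotation texture is taken to store the cosine and sine of a random angle (remapped to [0, 1]) in its red and green channels and to use wrap addressing, and <em>SampleRadius</em> stands in for the sampling area size:</p>
<pre>sampler ShadowMapSampler : register(s1);
sampler RotationSampler : register(s2); // hypothetical: 32x32, wrap, point sampled
float2 ScreenSize;      // hypothetical: render target size in pixels
float2 ShadowTexelSize; // hypothetical: 1.0 / shadow map dimensions
float SampleRadius;     // hypothetical: sampling radius in shadow map texels

static const float2 Taps[4] =
{
	float2(-0.5, -0.5), float2(0.5, -0.5), float2(-0.5, 0.5), float2(0.5, 0.5)
};

float PcfRotated(float2 uv, float2 screenUv, float receiver, float bias)
{
	// Tile the 32x32 rotation texture across the screen, one texel per pixel.
	float2 cs = tex2D(RotationSampler, screenUv * ScreenSize / 32.0).rg * 2.0 - 1.0;
	float2x2 rot = float2x2(cs.x, -cs.y, cs.y, cs.x);

	float lit = 0.0;
	for (int i = 0; i &lt; 4; i++)
	{
		// Rotate each tap, then scale it to the chosen sampling area.
		float2 offset = mul(rot, Taps[i]) * SampleRadius * ShadowTexelSize;
		float occluder = tex2D(ShadowMapSampler, uv + offset).r;
		lit += (receiver - bias &lt;= occluder) ? 1.0 : 0.0;
	}
	return lit * 0.25;
}</pre>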
<p>It&#8217;s worth noting that even though the last two methods shown are the same and both use 4 samples, the first method chooses samples from a larger area than the second. This makes the penumbra larger, hides the underlying shadow map better, but also costs more texture bandwidth due to worse cache coherency. The 202 sampling area PCF requires a large bias, to the point where even with slope-scale biasing, I couldn&#8217;t choose a small enough bias to avoid shadow peter-panning completely.</p>
<p>The last method shown actually produces pretty good looking shadows, and is quite efficient. It also allows for an easily modifiable sampling area, which could be very useful &#8212; more on this later.</p>
<p>None of these filtering methods produced particularly smooth looking static shadows. Except for the last method displayed, all of these shadows &#8220;shimmer&#8221; horribly when the camera or light source moves. The rotated disc method shadows also shimmer, but the effect is not nearly as bad as the other PCF sampling methods. And even though they are all pretty fast, they require fairly large shadow maps. All of the images so far have used 2048&#215;2048 shadow maps.</p>
<p>Next I started implementing exponential shadow maps (ESMs). These allow us to blur the shadow map or otherwise filter it before drawing the light, and then we can take a single sample from the shadow map, using an approximation of the shadow penumbra. This method avoids biasing problems that large penumbra PCF sampling has. It also requires less texture bandwidth.</p>
<p>I started by using 2xMSAA when creating the shadow map (which effectively blurs the shadow map edges). A 2048&#215;2048 shadow map is used, and 2xMSAA doubles the size of the texture at creation time. This makes this texture 32MB in total, which is far larger than the Xbox 360&#8217;s 10MB eDRAM, and therefore far too large to be practical on the Xbox 360. A 5&#215;5 separable gaussian blur is also applied to the shadow map before drawing the light.</p>
<div id="attachment_829" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/ESM_1_tap_2048_5tap_gaussian.png"><img class="size-large wp-image-829" title="ESM_1_tap_2048_5tap_gaussian" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/ESM_1_tap_2048_5tap_gaussian-1024x390.png" alt="" width="640" height="243" /></a><p class="wp-caption-text">1 tap ESM with a 2048x2048 shadow map, 2xMSAA + 5x5 gaussian blur.</p></div>
<p>I also tried taking 4 samples of the shadow map, creating the shadow map with 4xMSAA, and not even bothering with a prefiltering step (so there is no gaussian blurring of the shadow map). This is potentially a faster method, and is the method that I am currently using for point light shadows (which are less dominant lights than the directional light shown in all of the screenshots so far).</p>
<div id="attachment_828" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/ESM_2048_4xMSAA_4taps.png"><img class="size-large wp-image-828" title="ESM_2048_4xMSAA_4taps" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/ESM_2048_4xMSAA_4taps-1024x426.png" alt="" width="640" height="266" /></a><p class="wp-caption-text">4 tap ESM filtering with 4xMSAA used when creating shadow map. No shadow map prefiltering. </p></div>
<p>Since the results of the ESM with a gaussian blur prefilter are quite good, I started reducing the shadow map size. Below is a 512&#215;512 shadow map with 4xMSAA (which means that the texture easily fits into the Xbox 360&#8217;s eDRAM and therefore 4xMSAA incurs very little texture bandwidth cost). A 5&#215;5 separable gaussian blur is applied so that a single sample is enough to produce a smooth penumbra.</p>
<div id="attachment_827" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/esm_1_tap_512_4xMSAA_5x5_gaussian.png"><img class="size-large wp-image-827" title="esm_1_tap_512_4xMSAA_5x5_gaussian" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/07/esm_1_tap_512_4xMSAA_5x5_gaussian-1024x386.png" alt="" width="640" height="241" /></a><p class="wp-caption-text">1 tap ESM filtering, 4xMSAA + 5x5 gaussian prefiltering of shadow map.</p></div>
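<p>The memory numbers above are easy to sanity check. Here&#8217;s a small sketch, assuming 4 bytes per texel (a 32 bit format, which is what the 32MB figure implies) and that the multisampled target stores every sample:</p>
<pre class="brush:c#">static class ShadowMapMemory
{
    // Size in MB of a square render target while it is multisampled.
    // bytesPerTexel = 4 is an assumption (a 32 bit format).
    public static double SizeInMB(int resolution, int msaaSamples, int bytesPerTexel)
    {
        long bytes = (long)resolution * resolution * bytesPerTexel * msaaSamples;
        return bytes / (1024.0 * 1024.0);
    }
}</pre>
<p><code>SizeInMB(2048, 2, 4)</code> gives 32, well over the 10MB of eDRAM, while <code>SizeInMB(512, 4, 4)</code> gives 4.</p>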
<p>This technique is fast and looks good. One problem with it, though, is that the penumbra size is fixed, which is not realistic. In real life, light is scattered around, and light sources are not ideal points or directional lights. Percentage-closer soft shadows (PCSS) approximate real life shadow penumbras by determining where the penumbra should be sharper or softer. Shadow penumbras are typically sharper close to the occluding object, and softer further from the object, instead of soft everywhere like in the image above.</p>
<p>The problem with a pre-filtered shadow map using one ESM tap is that you cannot change the size of the penumbra by expanding the sampling kernel size (as there is only 1 sample), and the shadow map prefiltering uses a fixed size kernel (a 5&#215;5 Gaussian in this case). Varying the penumbra is easier with PCF, but as we saw, PCF does not look good with only a few samples. PCSS can be expensive, and is only an approximation (and not a very good one when there are multiple layered occluders). So this is a problem that I&#8217;ll defer until later. Possible solutions include varying the ESM exponent to vary the penumbra size, or combining the PCF sampling techniques shown earlier with ESM to vary the kernel size.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2011/07/shadow-map-filtering-experimentation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>HDRBlendable Xbox 360 performance and dual paraboloid point light shadow optimization.</title>
		<link>http://www.olhovsky.com/2011/06/hdrblendable-xbox-360-performance/</link>
		<comments>http://www.olhovsky.com/2011/06/hdrblendable-xbox-360-performance/#comments</comments>
		<pubDate>Thu, 09 Jun 2011 03:34:23 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[XNA]]></category>
		<category><![CDATA[deferred]]></category>
		<category><![CDATA[lighting]]></category>
		<category><![CDATA[shadows]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=788</guid>
		<description><![CDATA[While working on dual paraboloid point light shadows, at first I used two R32 textures for the shadow maps, one map per hemisphere of the point light. Here&#8217;s a quick glimpse at what point light shadows are currently looking like &#8230; <a href="http://www.olhovsky.com/2011/06/hdrblendable-xbox-360-performance/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>While working on dual paraboloid point light shadows, at first I used two R32 textures for the shadow maps, one map per hemisphere of the point light.</p>
<p>Here&#8217;s a quick glimpse at what point light shadows are currently looking like with a deferred rendering setup, using a small 7 tap PCF filter and a 512&#215;512 shadow map. The model used is <a href="http://www.crytek.com/cryengine/cryengine3/downloads">Crytek&#8217;s Sponza model</a>.</p>
<div id="attachment_791" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/06/june_08_point_light_shadows.jpg"><img class="size-medium wp-image-791" title="june_08_point_light_shadows" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/06/june_08_point_light_shadows-300x186.jpg" alt="" width="300" height="186" /></a><p class="wp-caption-text">Point light shadows using dual paraboloid shadow mapping.</p></div>
<p>I wanted to save some texture bandwidth, so I tried to put one shadow map in the R channel of an R16G16 texture, and then the other shadow map in the G channel. However, you cannot use ColorWriteChannels on a non-blendable texture format in XNA (and HalfSingle textures are not blendable).</p>
<p>So I tried using two channels of an A2R10G10B10 texture instead (i.e. HDRBlendable on the Xbox). This worked, except that it didn&#8217;t give me the bandwidth savings I was hoping for. It turns out that on the Xbox, this format is <strong>stored as 32bpp in the EDRAM, but actually resolves to HalfVector4 (R16G16B16A16), 64bpp, when copying to system memory</strong>. So I was using the same amount of texture bandwidth in the end anyway, because the HDRBlendable format expands to twice the size on resolve.</p>
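<p>To put rough numbers on that (a sketch only, assuming the 512&#215;512 maps mentioned above and counting just the bytes written by the resolve):</p>
<pre class="brush:c#">static class ResolveCost
{
    // Bytes copied from EDRAM to system memory when resolving one square target.
    public static long ResolveBytes(int resolution, int bitsPerTexelAfterResolve)
    {
        return (long)resolution * resolution * bitsPerTexelAfterResolve / 8;
    }
}</pre>
<p>Two R32 maps cost <code>2 * ResolveBytes(512, 32)</code> = 2,097,152 bytes, and one HDRBlendable map costs <code>ResolveBytes(512, 64)</code> = 2,097,152 bytes, so nothing is saved.</p>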
<p>Instead, I&#8217;m experimenting with packing two float values into a 32bpp texture with 8 bits per channel. The cons are:</p>
<ul>
<li>Packing/unpacking costs time in the pixel shader.</li>
<li>Must switch blend state to mask the color channel when switching to drawing the next shadow map.</li>
<li>Lower precision (although this is not a problem at all if the point lights are small enough, and the fixed decimal point place is chosen well).</li>
</ul>
<p>The pros are:</p>
<ul>
<li>Half the texture bandwidth used to resolve the shadow map from the EDRAM to system memory.</li>
<li>Avoid having to switch render targets when drawing the next shadow map.</li>
<li>Save a little bit of texture bandwidth when reading from the shadow map (when drawing the light), because some lookups will use both channels at the same time. Mostly the R channel will have totally separate lookup locations from the G channel since the channels represent opposite hemispheres, but in some places (along the seams?) caching will save a little bit of bandwidth.</li>
</ul>
<p>Whether using two textures or one texture with packing is better depends mostly on how much of a bottleneck texture bandwidth is vs fill rate.</p>
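<p>For reference, the packing itself is simple. This is a CPU-side C# sketch of the arithmetic only (in practice it would live in the pixel shader), assuming depth is normalized to [0, 1] and each shadow map gets two of the four 8 bit channels; the names are made up for illustration:</p>
<pre class="brush:c#">using System;

static class DepthPacking
{
    // Packs a depth in [0, 1] into two 8 bit channels (16 bit fixed point).
    public static void Pack(float depth, out byte high, out byte low)
    {
        float clamped = Math.Min(Math.Max(depth, 0.0f), 1.0f);
        int fixedPoint = (int)Math.Round(clamped * 65535.0f);
        high = (byte)(fixedPoint / 256);
        low = (byte)(fixedPoint % 256);
    }

    // Rebuilds the depth from the two channels written by Pack.
    public static float Unpack(byte high, byte low)
    {
        return (high * 256 + low) / 65535.0f;
    }
}</pre>
<p>A round trip through <code>Pack</code> and <code>Unpack</code> loses at most about 1/65535 of the depth range, which is the precision cost listed above.</p>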
<p>There is yet another option.</p>
<p>For small shadow casting point lights, 8 bits of depth precision might be enough. Then up to four shadow maps can be stored in a single texture, without any extra encoding ALU costs. This might allow for many small shadow casting point lights.</p>
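<p>Whether 8 bits is enough depends on the light radius. A quick sketch, assuming depth is stored linearly from 0 to the light radius (the function name and the example radius are hypothetical):</p>
<pre class="brush:c#">static class DepthPrecision
{
    // Smallest depth difference that an unsigned normalized channel with the
    // given number of bits can represent over [0, lightRadius].
    public static float DepthStep(float lightRadius, int bits)
    {
        int levels = (1 &lt;&lt; bits) - 1;
        return lightRadius / levels;
    }
}</pre>
<p>A light with a 5m radius gets depth steps of about 2cm with 8 bits (<code>DepthStep(5.0f, 8)</code>), versus well under a millimeter with 16 bits.</p>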
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2011/06/hdrblendable-xbox-360-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Deferred lighting.</title>
		<link>http://www.olhovsky.com/2011/05/deferred-lighting/</link>
		<comments>http://www.olhovsky.com/2011/05/deferred-lighting/#comments</comments>
		<pubDate>Tue, 31 May 2011 15:34:46 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[XNA]]></category>
		<category><![CDATA[deferred]]></category>
		<category><![CDATA[light pre pass]]></category>
		<category><![CDATA[lighting]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=761</guid>
		<description><![CDATA[My forward rendered lighting setup almost suited my purposes, but it turned out that the limitations of the XBOX CPU limited me to far too few terrain chunks, and therefore far too few lights. To this end, I&#8217;ve been exploring &#8230; <a href="http://www.olhovsky.com/2011/05/deferred-lighting/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>My <a title="Experimenting with a darker lighting setup." href="http://www.olhovsky.com/2011/04/some-point-light-testing/">forward rendered lighting setup</a> almost suited my purposes, but it turned out that the limitations of the XBOX CPU limited me to far too few terrain chunks, and therefore far <a title="Status update: Point light stress testing." href="http://www.olhovsky.com/2011/05/status-update-point-light-stress-testing/">too few lights</a>.</p>
<div id="attachment_655" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/05/april01_point_lights2.png"><img class="size-medium wp-image-655" title="april01_point_lights2" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/05/april01_point_lights2-300x175.png" alt="" width="300" height="175" /></a><p class="wp-caption-text">Maximum of 7 lights touching each chunk in the terrain shader, but chunks were far too large due to CPU limitations on the XBOX.</p></div>
<p>To this end, I&#8217;ve been exploring some deferred lighting options.</p>
<p>For the Xbox 360, an attractive option is light-pre-pass (LPP), which suits the Xbox much more so than a fully deferred setup, as LPP uses less intermediate texture buffering, and therefore avoids Xbox 360 texture bandwidth limitations and predicated tiling issues.</p>
<p>J. Coluna has a good LPP example, and <a href="http://jcoluna.wordpress.com/2011/04/27/xna-4-0-light-pre-pass-casting-shadows/#comment-98">he was wondering</a> how to deal with the problem of being limited to 32 bit render target formats on the Xbox 360, yet wanting to store specular light data in the light buffer. (I use HDRBlendable for the light buffer; on the Xbox this format stores 10 bits each for R, G and B, and 2 bits for alpha.)</p>
<div>
<p>If we had a 64 bit rendertarget, we&#8217;d have 16 bits for each channel, and (monochrome) specular color could be stored in the alpha.</p>
<p>On the xbox we have only 2 bits in the alpha.</p>
<p>I am currently not using a dedicated specular channel. Instead I&#8217;m accumulating  specular in the light buffer RGB channels. This means that you can’t use  a specular strength map, and it means that you can more easily exceed  the HDR “overbright” limits, since specular highlights are often close  to full brightness (or at least, they are always brighter than the diffuse color). However, it does allow you to have colored specular  highlights, which a single specular channel does not.</p>
<p>One idea I’ve been considering to work around this, is to pack the  specular into the 1010102 format using the 2 bit alpha, plus 1 bit in R  and 1 bit in B. Then, I would apply half of the specular to the RGB  channels, and store the other half in RBA (4 packed bits total).</p>
<p>Then your specular strength maps can at most reduce specular highlights by half.</p>
<p>However, if we are willing to sacrifice some range on the specular  highlights, and use only 6 bits for specular, we can apply 25% of the  specular to the RGB channels, and then store the remaining 75% of the  specular packed in RBA (4 bits worth). This would allow specular  strength maps to reduce specular highlights by up to 75%, which is  probably enough for decent results.</p>
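<p>The 50% and 75% figures fall out of a simple relationship. As a sketch (the function and parameter names are invented for illustration): whatever share of the specular is accumulated straight into RGB can no longer be scaled by a strength map, so it sets the floor.</p>
<pre class="brush:c#">static class SpecularSplit
{
    // Fraction of the full specular highlight that remains after applying a
    // specular strength map (0..1), when rgbShare of the specular was already
    // added to the RGB channels and cannot be scaled afterwards.
    public static float RemainingSpecular(float rgbShare, float strength)
    {
        return rgbShare + (1.0f - rgbShare) * strength;
    }
}</pre>
<p><code>RemainingSpecular(0.5f, 0.0f)</code> is 0.5 and <code>RemainingSpecular(0.25f, 0.0f)</code> is 0.25, matching the two cases above.</p>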
<p>One drawback to this method is that we lose 1 bit of red and 1 bit of  blue channel information. This means that for red and blue, lights can  more easily reach the HDR overbright limit.</p>
<p>Which method suits your purpose depends on your application.</p>
<p>Part of the benefit of the LPP method is that we can avoid much of the texture bandwidth that a fully deferred setup requires, which makes it a better fit for the Xbox. For this reason, I’ll be avoiding using a second texture to store specular.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2011/05/deferred-lighting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;Game3&#8221; post-mortem. (Or how I learned to stop worrying and love development outside of the DBP competition.)</title>
		<link>http://www.olhovsky.com/2011/05/game3-post-mortem-or-how-i-learned-to-stop-worrying-and-love-development-outside-of-the-dbp-competition/</link>
		<comments>http://www.olhovsky.com/2011/05/game3-post-mortem-or-how-i-learned-to-stop-worrying-and-love-development-outside-of-the-dbp-competition/#comments</comments>
		<pubDate>Thu, 26 May 2011 09:54:05 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[DBP2011]]></category>
		<category><![CDATA[XNA]]></category>
		<category><![CDATA[dream build play]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=742</guid>
		<description><![CDATA[This game was called &#8220;Game3&#8221; just because this is the 3rd XNA project I&#8217;ve opened on this computer, and that was the default name. I&#8217;ve put somewhere between 500 and 900 hours into this project (schoolwork being mixed in there makes &#8230; <a href="http://www.olhovsky.com/2011/05/game3-post-mortem-or-how-i-learned-to-stop-worrying-and-love-development-outside-of-the-dbp-competition/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This game was called &#8220;Game3&#8221; just because this is the 3rd XNA project I&#8217;ve opened on this computer, and that was the default name.</p>
<p>I&#8217;ve put somewhere between 500 and 900 hours into this project (schoolwork being mixed in there makes it hard to estimate), and I think that I&#8217;ve been very productive in that time. The project is now 8300 lines of code, in about 187 class files.</p>
<p>So, why didn&#8217;t I finish in time for <a href="http://www.dreambuildplay.com/">DBP</a>? Here are some of the mistakes I made:</p>
<ul>
<li>Overambitious project goals.</li>
<li>Picked the wrong light rendering design for this game.</li>
<li>Underestimated the complexity of implementing some design ideas, such as the terrain and the lighting setup.</li>
<li>Focused too much on building clean, pretty code, instead of hacking together a finished game.</li>
<li>Spent too much time building the game engine, and too little time building the actual game logic.</li>
<li>Spent two weeks building <a title="The physics engine (made from scratch!)." href="http://www.olhovsky.com/2011/05/physics-engine/">my own physics engine</a> from scratch, instead of using an existing engine.</li>
</ul>
<p>Here&#8217;s a composite of a few screenshots of my Git commit history:</p>
<div id="attachment_744" class="wp-caption aligncenter" style="width: 81px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/05/may26_commit_history.png"><img class="size-medium wp-image-744" title="may26_commit_history" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/05/may26_commit_history-71x300.png" alt="" width="71" height="300" /></a><p class="wp-caption-text"> (Click to enlarge.)</p></div>
<p>I tried to choose a game concept that was as simple as possible, while still maintaining some of the key elements from my 100+ page outline of a much more complex game.</p>
<p>That said, I was still far too ambitious for a 3 month project. I think that any 3D game that can be competitive with a 2D game is going to take a lot of extra time to polish well. For example, I didn&#8217;t realize how much time triplanar normal mapping would take to get right. It&#8217;s simple in theory, but getting the slope angles just right, getting the transitions between texturing planes just right, mapping the normal maps onto the correct plane, etc, etc, all took a lot of extra time.</p>
<p>Before deciding to do this project in first person 3D, I read that many people avoid indie 3D games because creating the artwork takes so much longer (or they can&#8217;t do it themselves). As one example, it took me about 45 minutes to create the <a title="First enemy: Doll." href="http://www.olhovsky.com/2011/04/first-enemy-doll/">doll enemy</a> model (pretty quick, no?). I figured, well, I can create a 3D model of an enemy in ~45 minutes, so 3D artwork is not a big deal &#8212; I&#8217;ll just go make the next <a href="http://en.wikipedia.org/wiki/Halo:_Reach">Halo</a> now!</p>
<p>In reality, the artwork is no big deal. It&#8217;s really everything but the artwork that makes 3D much more work than 2D, especially from a first person perspective. When I say &#8220;artwork&#8221; I mean 3D models, textures, texture mapping, animations.</p>
<p>In a first person perspective, you can see thousands of meters into the distance, but if the player looks straight down at the ground, only 1-2 meters are visible, and we expect to see a great deal of detail there. Managing detail at close and far distances well is an order of magnitude harder than managing detail in a 2D game. In a 2D game, or even in a 3D game with a relatively fixed camera perspective (e.g. Starcraft or Diablo), you know how much geometry can be viewable in the worst case, and there&#8217;s much less detail scaling needed.</p>
<p>As an example of detail scaling, getting terrain detail to scale well in this game was done over the course of many weeks. Indeed, it&#8217;s still not perfect. If you&#8217;ve tried your hand at realtime 3D drawing, then you&#8217;ve probably done a terrain. However, you may not have encountered the mess of fading textures and specular/normal/emissive maps between detail levels. I used geometry instancing for the distant chunks, which meant no lights in the distance in my lighting model. Things like this added up quickly.</p>
<p>I thought a bit about what the title of this post should be. A reference to <a href="http://en.wikipedia.org/wiki/Duke_Nukem_Forever#Development">Duke Nukem Forever</a> felt appropriate at first, until I realized that it&#8217;s probably not fair to compare my 3 months of development to 14 years of off-and-on DNF development. Hell, even if I am writing DNF-level vaporware, at least Duke Nukem Forever is finally being released, right?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2011/05/game3-post-mortem-or-how-i-learned-to-stop-worrying-and-love-development-outside-of-the-dbp-competition/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>My own physics or BEPU?</title>
		<link>http://www.olhovsky.com/2011/05/my-own-physics-or-bepu/</link>
		<comments>http://www.olhovsky.com/2011/05/my-own-physics-or-bepu/#comments</comments>
		<pubDate>Sun, 22 May 2011 13:26:24 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[DBP2011]]></category>
		<category><![CDATA[XNA]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=719</guid>
		<description><![CDATA[Norbo (the creator of the BEPU physics engine) dropped by the day before last to let me know that I should expect much higher performance than what I was seeing in my BEPU tests. He suggested that I should be &#8230; <a href="http://www.olhovsky.com/2011/05/my-own-physics-or-bepu/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Norbo (the creator of the <a href="http://bepu.squarespace.com/">BEPU physics engine</a>) <a title="Physics engine integrated, and sphere-terrain collision done." href="http://www.olhovsky.com/2011/05/physics-engine-integrated-and-sphere-terrain-collision-done/">dropped by the day before last</a> to let me know that I should expect much higher performance than what I was seeing in my BEPU tests. He suggested that I should be seeing about 25ms per frame on the XBOX, to simulate 400 spheres colliding against a terrain. I thought it took more like 150ms.</p>
<div id="attachment_720" class="wp-caption aligncenter" style="width: 282px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/05/Futurama_morbo.jpg"><img class="size-full wp-image-720" title="Futurama's Morbo" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/05/Futurama_morbo.jpg" alt="" width="272" height="307" /></a><p class="wp-caption-text">This is Morbo, but that&#39;s who I think of whenever I see Norbo&#39;s name. Also I like pictures in my posts.</p></div>
<p>&nbsp;</p>
<p>The following code creates 392 spheres. I replaced the code in the BEPU terrain demo that adds boxes to the terrain with this code, to do a quick benchmark of spheres on a terrain.</p>
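<pre class="brush:c#">Random rand = new Random();

// 7 x 7 x 8 = 392 spheres: a 7 x 8 grid of columns 128 units apart,
// with 7 spheres per column.
for (int i = 0; i &lt; 7; i++)
{
    for (int j = 0; j &lt; 7; j++)
    {
        for (int k = 0; k &lt; 8; k++)
        {
            // Jitter x and z by up to 10 units; stagger the drop height by j.
            Vector3 position =
                new Vector3((float) rand.NextDouble()*10 + i*128,
                    400 + -j*2,
                    (float) rand.NextDouble()*10 + k*128);
            // Random radius in [1, 2); the last argument (mass) is radius cubed.
            float radius = (float) rand.NextDouble() + 1;
            Space.Add(new Sphere(position,
                radius,
                radius * radius * radius));
        }
    }
}</pre>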
<p>First, I want to point out that BEPU is an awesome project, and is generally very efficient from what I&#8217;ve been able to measure.</p>
<p>My physics simulation is much, much more limited in scope compared to BEPU. I don&#8217;t support any type of constraints, and I don&#8217;t have accurate restitution or friction.</p>
<p>In the BEPU terrain demo, on my E5300 2.6GHz (overclocked to 3.0GHz), on a single thread, the 392 previously mentioned spheres start the simulation at about 1-2ms per update while all of the spheres are in the air. Once the spheres hit the terrain, the simulation takes 7-17ms per update, until all of the spheres reach a resting (inactive) state, where the simulation takes 2ms per update.</p>
<p>On the XBOX 360, this same simulation starts at 7ms per update, and then as soon as the spheres hit the terrain, the simulation jumps to 60ms per update, and steadily increases to well over 100ms per update over the course of the next 1-2 minutes, until it eventually starts to speed back up as spheres start to become inactive.</p>
<p><strong>Edit:</strong> Testing with a much flatter terrain resulted in update times of around 30ms per frame with the default setup. Excellent! BEPU totally wins then.</p>
<p>Under similar conditions in my game (so the terrain is a different shape), my simulation takes 30ms per update on the XBOX 360, when all 400 spheres are in the air, falling at high speed under gravity. It <em>starts</em> slow compared to BEPU because I always sphere-cast spheres against the terrain no matter what, since they usually will intersect the terrain in the case of this game. The simulation then continues to about 35ms when the spheres hit the terrain, and updates continue to take about 30-35ms as the spheres eventually slow down.</p>
<p>Edit: some simple optimizations reduced this time to 21ms.</p>
<p>Of this 35 ms, the breakdown is:</p>
<ul>
<li> The sweep and prune broadphase takes 3ms per update.</li>
<li>The sphere-terrain contact point detection/generation takes 7ms per  update, and the sphere-sphere contact point generation takes 23ms per  update.</li>
<li>The collision solver takes a total of about 3ms per update.</li>
</ul>
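<p>For anyone unfamiliar with sweep and prune, here is a minimal single-axis sketch of the general idea (an illustration only, not a description of any particular engine&#8217;s implementation; the <code>Ball</code> struct and method name are invented for the example). Spheres are sorted by their minimum X extent, and a pair is only passed on to the narrow phase if the X extents overlap.</p>
<pre class="brush:c#">using System;
using System.Collections.Generic;

static class SweepAndPrune
{
    public struct Ball { public float X, Y, Z, Radius; }

    // Returns index pairs whose X extents overlap (candidate collisions).
    // Y and Z are left for the narrow phase to check.
    public static List&lt;int[]&gt; FindCandidatePairs(Ball[] balls)
    {
        // Sort indices by the minimum X extent of each ball.
        int[] order = new int[balls.Length];
        for (int i = 0; i &lt; order.Length; i++) order[i] = i;
        Array.Sort(order, (a, b) =&gt;
            (balls[a].X - balls[a].Radius).CompareTo(balls[b].X - balls[b].Radius));

        var pairs = new List&lt;int[]&gt;();
        for (int i = 0; i &lt; order.Length; i++)
        {
            Ball first = balls[order[i]];
            float maxX = first.X + first.Radius;
            for (int j = i + 1; j &lt; order.Length; j++)
            {
                Ball second = balls[order[j]];
                // Sorted by minimum X, so nothing later can overlap either.
                if (second.X - second.Radius &gt; maxX) break;
                pairs.Add(new[] { order[i], order[j] });
            }
        }
        return pairs;
    }
}</pre>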
<p>Implementing a resting/inactive state would greatly reduce the 30ms of contact point generation (7ms sphere-terrain plus 23ms sphere-sphere) after 30-60 seconds into the simulation, as most of the spheres would no longer need to generate new contact points unless they started moving again. Implementing this optimization is not a priority right now, but I&#8217;m very curious to see what can be done with that.</p>
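<p>A resting state could be as simple as the following sketch. The thresholds and the <code>Body</code> class are hypothetical, and a real implementation would also need to wake a body when something hits it:</p>
<pre class="brush:c#">static class Sleeping
{
    // Hypothetical thresholds, not tuned values from the game.
    const float SleepSpeed = 0.05f;    // meters per second
    const int UpdatesBeforeSleep = 30; // consecutive slow updates required

    public class Body
    {
        public float Speed;
        public int SlowUpdates;
        public bool Asleep;
    }

    // Call once per update, before contact generation.
    // Bodies that are asleep can be skipped when generating contacts.
    public static void UpdateSleepState(Body body)
    {
        if (body.Speed &lt; SleepSpeed)
        {
            body.SlowUpdates++;
            if (body.SlowUpdates &gt;= UpdatesBeforeSleep) body.Asleep = true;
        }
        else
        {
            body.SlowUpdates = 0;
            body.Asleep = false;
        }
    }
}</pre>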
<p>In my simulation, 7 contact points (this is somewhat arbitrary) are generated for every sphere with the terrain no matter what position and velocity the sphere has. This is a place for some optimization if desired.</p>
<p>I generate a completely new set of contacts every single update. I think BEPU might persist contacts across multiple frames. I&#8217;m not sure if it&#8217;s practical for me to do this or not &#8212; I haven&#8217;t given it much thought yet. I&#8217;m also not sure how this affects BEPU performance.</p>
<div id="attachment_727" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/05/may22_physics_benchmark.png"><img class="size-large wp-image-727" title="may22_physics_benchmark" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/05/may22_physics_benchmark-1024x576.png" alt="" width="640" height="360" /></a><p class="wp-caption-text">A screenshot of the physics simulation running on the XBOX 360, while the white spheres are still rolling around on the terrain.</p></div>
<p>In the end, this somewhat unscientific preliminary comparison seems to indicate that BEPU is about 3x slower than my physics for 400 active spheres colliding with a terrain at not very high speed. (In my simulation, ~1m radius spheres travelled at speeds in the range of 1m/s &#8211; 20m/s.)</p>
<p>Again, my simulation is limited in what it can simulate compared to BEPU, but in the special case of my game, where I want many spheres with only sphere-sphere and sphere-terrain contact, and I don&#8217;t care about accurate rotation and friction simulation, it looks like my physics simulator might make more sense.</p>
<p>I&#8217;m going to do some more tests, and get in touch with Norbo to see if there&#8217;s something I&#8217;m missing here, as I&#8217;d prefer to use the much more flexible BEPU if I can.</p>
<p><del datetime="2011-05-22T15:10:52+00:00">If I had to take sides right this moment though, then I&#8217;d say that it looks like I&#8217;m stuck with my own physics simulator.</del></p>
<p>Edit: BEPU may have won. I might be able to get a slight performance improvement over BEPU with a lot of effort, but BEPU&#8217;s performance is excellent under the new test conditions, and it has way more functionality than I have implemented. The only remaining question mark is the large difference in speed between different test conditions in BEPU (placement of spheres and terrain shape), whereas in my simulation, things are a pretty constant 30ms per update (on XBOX) across the board for 400 spheres with the same tests used with BEPU.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2011/05/my-own-physics-or-bepu/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Physics engine integrated, and sphere-terrain collision done.</title>
		<link>http://www.olhovsky.com/2011/05/physics-engine-integrated-and-sphere-terrain-collision-done/</link>
		<comments>http://www.olhovsky.com/2011/05/physics-engine-integrated-and-sphere-terrain-collision-done/#comments</comments>
		<pubDate>Tue, 17 May 2011 09:45:11 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[DBP2011]]></category>
		<category><![CDATA[XNA]]></category>
		<category><![CDATA[dream build play]]></category>
		<category><![CDATA[physics]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=706</guid>
		<description><![CDATA[Terrain collision was a little harder than I thought it would be. This is owing to the fact that the terrain is a concave body, and many contacts are possible with a single sphere and the terrain. Contrast this with &#8230; <a href="http://www.olhovsky.com/2011/05/physics-engine-integrated-and-sphere-terrain-collision-done/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Terrain collision was a little harder than I thought it would be. This is owing to the fact that the terrain is a concave body, and many contacts are possible with a single sphere and the terrain. Contrast this with two convex bodies (which many physics engines use exclusively) where only a single contact point is possible, and that point is also the closest point between the two objects.</p>
<p>Anyways, terrain-sphere collision works now, and the new terrain engine is integrated into the game. Simulating 400 spheres on the PC takes only ~3ms per update (single threaded). This should make 400 spheres viable on the XBOX 360, which will have about 33ms per update on a CPU core dedicated to physics. This is about a 5-10x speed increase over the performance I was seeing with the default BEPU physics setup.</p>
<div id="attachment_711" class="wp-caption aligncenter" style="width: 650px"><a href="http://www.olhovsky.com/wp/wp-content/uploads/2011/05/may17_physics_integrated1.png"><img class="size-large wp-image-711" title="may17_physics_integrated" src="http://www.olhovsky.com/wp/wp-content/uploads/2011/05/may17_physics_integrated1-1024x598.png" alt="Sphere-terrain collision." width="640" height="373" /></a><p class="wp-caption-text">Sphere-terrain collision.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2011/05/physics-engine-integrated-and-sphere-terrain-collision-done/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Summer classes start today.</title>
		<link>http://www.olhovsky.com/2011/05/summer-classes-start-today/</link>
		<comments>http://www.olhovsky.com/2011/05/summer-classes-start-today/#comments</comments>
		<pubDate>Mon, 16 May 2011 01:05:29 +0000</pubDate>
		<dc:creator>olhovsky</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.olhovsky.com/?p=685</guid>
		<description><![CDATA[Well, I have class again, which means I have to reduce the work on this project slightly. Hopefully there won&#8217;t be any serious tests until a week or two after the competition deadline.]]></description>
			<content:encoded><![CDATA[<p>Well, I have class again, which means I have to reduce the work on this project slightly. Hopefully there won&#8217;t be any serious tests until a week or two after the competition deadline.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.olhovsky.com/2011/05/summer-classes-start-today/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

