Per vertex ambient occlusion

Dynasty · **Joined:** Tue Feb 05, 2013 5:44 pm **Posts:** 33

I attempted to implement ambient occlusion on a per vertex basis for Ace of Spades maps. Unfortunately, it seems that the current raycasting is just too slow to be of much use. Ace of Spades maps can range from 2.8 - 5.7 million vertices.

I attempted to cast rays at every vertex position. The algorithm I used to determine which directions to cast rays is described in http://www.xsi-blog.com/archives/115

Picture of points distributed evenly on a unit hemisphere generated by the algorithm:

I modified it like this so that it works on a hemisphere:

Code:

def pointsOnHemisphere(N):
    old_N = N
    N = 2 * N
    pts = []
    inc = math.pi * (3 - math.sqrt(5))
    off = 2 / float(N)
    for k in range(old_N, N):
        y = k * off - 1 + (off / 2)
        r = math.sqrt(1 - y*y)
        phi = k * inc
        pts.append([math.cos(phi)*r, y, math.sin(phi)*r])
    return pts

So my final conclusion is that AO is just too slow to be used on large maps. Of course, my C++ skills are currently lacking, and there is probably a lot of room for optimization. If anyone else wants to take a stab at the problem, please go ahead.

David Williams · **Joined:** Sun May 04, 2008 6:35 pm **Posts:** 1827

We have a 'TestRaycast' unit test in PolyVox so I just did some very quick calculations. It seems PolyVox can perform something in the order of 1 million ray casts per second. You have (say) 3 million vertices and I reckon you want to cast approx 100 rays for each. So that's 300 million rays = 300 seconds = 5 minutes.

So yes, it does seem like this approach is too slow for your needs. Threading might help as might optimisations to the raycast but probably not enough.

A different algorithm then? AO changes slowly, so what if you only cast 10 rays per vertex but then average together the results of neighbouring vertices? Or maybe 100 rays per vertex was too much anyway?

One very fake approach I have thought about in the past is to threshold the volume so that all voxels are just solid (black) or empty (white), reduce the resolution, and then blur it. This should result in darker voxels near the walls and lighter ones further away. No physical basis though.

Otherwise you could try just casting a one (or very few) rays per voxel in an upwards direction. This would give you shadows at least but not real AO.

So yeah... I don't have any firm answers here as AO is an area which needs some research. We didn't get it into Voxeliens but we'll come back to it for Cubiquity I think.

Dynasty · **Joined:** Tue Feb 05, 2013 5:44 pm **Posts:** 33

I briefly read through some of the material related to ambient occlusion and cone tracing. I don't really see how implementing AO on the CPU is viable.

You've said before that the current AO occlusion algorithm is inefficient. How can it be made better? I don't really see any way this can be done in reasonable amount of time on many different types of hardware.

Dynasty · **Joined:** Tue Feb 05, 2013 5:44 pm **Posts:** 33

I've been thinking about this problem a lot and I think that I have an elegant solution for per-face ambient occlusion.

Here is an illustration of the approach in 2D. It can be pretty easily extended to 3D.

Create a hemisphere radiating outward from the face you want to test. In the picture above, the pink dot is the voxel being tested. The black voxels are solid.

Let the number of blue "surface" voxels be s.

Project the solid voxels onto the surface of the hemisphere. The orange voxels are those which fall into the "shadow" of the solid black voxels. The important thing to realize is that for every voxel in the green area, we pre-compute which surface voxels will be marked as orange.

Let the number of orange "shadow" voxels be r.

Then the ambient occlusion multiplier is 1-(r/s)

Using the 2D example above, the ambient occlusion multiplier is 1-(6/41)=0.85 which means that 85% of ambient light will reach that face.

I'm not sure how fast this algorithm will be compared to the current one implemented.

Edit: per-vertex AO could be possibly implemented by subdividing the grid. Also if you have an suggestions on how to determine which surface voxels fall into the shadow that would be awesome.

Edit 2: This is most likely a numerical approximation to an integral that happens to work very well with voxel data. The more subdivisions made, the more accurate the result will be. Only one subdivision should be needed to achieve per-vertex AO!

Edit 3: Also the current architecture of CubicSurfaceExtracter, etc is not particularly conducive to implementing this algorithm.

Dynasty · **Joined:** Tue Feb 05, 2013 5:44 pm **Posts:** 33

The algorithm I discussed previously probably has no advantage over the ray casting method. The primary bottleneck in the current algorithm is the large number of reads/writes made to memory. In the current implementation, anytime the raycaster wants to know if a voxel is solid or not it needs to pass an object to a callback, which in turn needs to get the pointer to the SimpleVolume, and then needs to get the specific voxel object, which calls another function which checks if the block is solid, and then even that block may call another function....

To reduce memory requests I suggest that a boolean array be created which marks with a simple true and false whether or not a block is solid. Hopefully parts of this array will be stored in the CPU cache.

Also the path of all the rays needs to be computed only once. This means that all of those floating point operations in the AO calculator can essentially be eliminated. Hopefully these arrays of relative points will also be stored in the CPU cache.

Dynasty · **Joined:** Tue Feb 05, 2013 5:44 pm **Posts:** 33

Relevant discussion on Build and Shoot: http://www.buildandshoot.com/viewtopic. ... 57&p=14321

I'm Yourself on the Build and Shoot forums by the way.

David Williams · **Joined:** Sun May 04, 2008 6:35 pm **Posts:** 1827

Dynasty wrote:

I briefly read through some of the material related to ambient occlusion and cone tracing. I don't really see how implementing AO on the CPU is viable.

Well it's certainly an open research problem, and as I said before we didn't manage to get it working well enough for Voxeliens. So I can't really give you an answer. But I do still think it has potential and hope to do some more work in this area in the coming year.

Dynasty wrote:

I've been thinking about this problem a lot and I think that I have an elegant solution for per-face ambient occlusion.
.
.
.

I think my gut feeling is that this is quite difficult. I think you'll still need to touch a lot of voxel data and (as you note below) it's these memory accesses which can be quite slow.

Dynasty wrote:

Edit 3: Also the current architecture of CubicSurfaceExtracter, etc is not particularly conducive to implementing this algorithm.

I wouldn't really expect to implement AO within the CubicSurfaceExtracter. If you want per-vertex AO I'd run the CubicSurfaceExtracter first and then compute AO value for each of the vertices, or if you want per-voxel AO I'd compute that before running the CubicSurfaceExtracter. But per face AO? I'm not exactly sure but I guess after extraction.

That said, there's another kind of AO which I think does belong in the CubicSurfaceExtractor. You show it in one of the screenshots on the Build 'n' shoot thread. It's where extreamely local AO is computer per-vertex from just the immediate surrounding voxels.

This is the image I mean:

http://i.imgur.com/no0YrjN.png

I've been thinking about this for a while so I added a task so that it doesn't get forgotten. It won't happen soon though:

https://bitbucket.org/volumesoffun/poly ... al-ambient

Dynasty wrote:

The algorithm I discussed previously probably has no advantage over the ray casting method. The primary bottleneck in the current algorithm is the large number of reads/writes made to memory. In the current implementation, anytime the raycaster wants to know if a voxel is solid or not it needs to pass an object to a callback, which in turn needs to get the pointer to the SimpleVolume, and then needs to get the specific voxel object, which calls another function which checks if the block is solid, and then even that block may call another function....

Yes, this could probably be made faster. But it's a trade off between speed and flexibility really, as the raycast is also useful for other task such as picking. I believe the callbacks should be inlined (I did do some testing here) and it's probably the memory access which is slow... but really some more profiling is needed.

Dynasty wrote:

To reduce memory requests I suggest that a boolean array be created which marks with a simple true and false whether or not a block is solid. Hopefully parts of this array will be stored in the CPU cache.

You can copy your volume data into a seperate volume with just one byte per voxel but PolyVox won't let you store a single bit per voxel. You'd need you own representation for that. Your new volume could also be downsampled which would speed things up a lot. When we were doing the experiments in Voxeliens we downsampled by a factor of four in each direction.

Dynasty wrote:

Also the path of all the rays needs to be computed only once.

Yes, it should be possible to do something like this. You can precompute the path for say 100 rays and then reuse these same rays for every vertex but with different starting points. You can also trace rays in parallel, so that you start say 10,000 rays at different points in the scene (all pointing in the same direction), compute the next step, and then apply that step to all rays. There are some research papers on this - try searching for 'Parallel Ray Traversal'.

Dynasty · **Joined:** Tue Feb 05, 2013 5:44 pm **Posts:** 33

I implemented the fake minecraft AO and I'm very pleased with the results.

Pic:

A problem that I'm currently having, which is especially noticeable when the AO effect is turned up.

I assume that if I change the order of the vertices that the problem will be fixed. I'm still trying to figure out a way to fix this.

Edit: Another problem has appeared. For colors close to white, the float used for material doesn't have enough precision to achieve what I want. I've been using bit operation tricks so far to store the color and AO shading data in a single uint32_t. When this number gets converted to blue, I assume that the bits representing the blue channel get cut off. When it converts back, it sets the blue channel to zero. Can I modify PositionMaterialNormal to use a uint32_t? I was amble to find it defined in the header file VertexTypes.h but I can't find the implementation anywhere.

David Williams · **Joined:** Sun May 04, 2008 6:35 pm **Posts:** 1827

Wow, that AO looks very nice! By 'fake minecraft AO' I assume you are referring to a propagation approach rather than one based on raycasting?

Dynasty wrote:

A problem that I'm currently having, which is especially noticeable when the AO effect is turned up.
.
.
.
I assume that if I change the order of the vertices that the problem will be fixed.

I don't think so, I think that will just make the diagonal lines go in the other direction?

I think this problem is conceptually similar to having a quad in 3D space where the four corners do not all lie on the same plane. In this case you cannot triagnulate the quad without a fold appearing in the middle. So you need to make sure that your four vertices do lie on a plane.

Your case is similar, exept we're talking about the RGB colour space rather than the XYZ physical space. The colours have been assigned to the four vertices of the quad in such a way that it cannot be split in to two triangles without a 'fold' in the colours.

I'm not exactly sure how to fix it... it would need some thought :-)

Dynasty wrote:

Can I modify PositionMaterialNormal to use a uint32_t? I was able to find it defined in the header file VertexTypes.h but I can't find the implementation anywhere

There is no seperate implementation - it's right there in the header ;-)

Have a look at the Cubiquity branch:

https://bitbucket.org/volumesoffun/poly ... ty-version

PositionMaterial is templatised on MaterialType but you could just use uint32_t instead. You'll need to update the surface extractor so look at the Cubiquity version of CubicSurfaceExtractor for inspiration.

Dynasty · **Joined:** Tue Feb 05, 2013 5:44 pm **Posts:** 33

David Williams wrote:

Wow, that AO looks very nice! By 'fake minecraft AO' I assume you are referring to a propagation approach rather than one based on raycasting?

Yeah, that is what I am referring to.

David Williams wrote:

I don't think so, I think that will just make the diagonal lines go in the other direction?

I think this problem is conceptually similar to having a quad in 3D space where the four corners do not all lie on the same plane. In this case you cannot triagnulate the quad without a fold appearing in the middle. So you need to make sure that your four vertices do lie on a plane.

Your case is similar, exept we're talking about the RGB colour space rather than the XYZ physical space. The colours have been assigned to the four vertices of the quad in such a way that it cannot be split in to two triangles without a 'fold' in the colours.

I'm not exactly sure how to fix it... it would need some thought :-)

The fix for this was actually pretty easy. Here is the idea behind it:

Each is decomposed into two separate triangles. There are two possible different ways you can partition the face into two triangles. For each diagonal of the square, take the absolute difference in lighting levels between the two vertices on the diagonal. You should partition the face so that the set of diagonals with the lowest absolute difference in lighting form the hypotenuse of the triangles.

For example, consider this picture:

The numbers at each corner are the lighting levels at that vertex. You should choose to split the quad along the red line because

abs(10-9) < abs(1-13)

Per vertex ambient occlusion

Who is online