Ok, I can expand on my previous answer now...
PolyVox treats the representation of the volume separately from the surface extraction code, which means that more than one surface extractor is available and other volume representations can be added in the future. I think it's useful to keep this distinction when looking at improvements which can be made.
Focusing on the volume representation, the biggest problem is currently the rather poor compression rate. This isn't completely accidental, as one of the advantages of keeping the data largely uncompressed is that it can be modified very quickly without needing to decode any complex data structures. However, the memory usage quickly gets out of hand and it's clear people want much larger volumes.
My proposed solution is to scrap the block sharing mechanism and implement a simple, fast, RLE compression of the blocks. However, I will keep a pool of recently used blocks (perhaps 10 or so) in an uncompressed state, because if they have been written to recently then there's a good chance they will be written to again soon. This can all be done without modifying the Volume class interface, and I intend to implement something like this in the next month or so.
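To make the RLE idea concrete, here is a minimal sketch of run-length encoding a single block. It is not PolyVox code: the `Run` type, the function names and the use of one byte per voxel are all assumptions made for illustration, and the pool of recently used uncompressed blocks is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical run type: a voxel value and how many times it repeats in a row.
using Run = std::pair<std::uint8_t, std::uint32_t>;

// Compress a flattened block of voxels into (value, count) runs.
std::vector<Run> rleEncode(const std::vector<std::uint8_t>& voxels)
{
    std::vector<Run> runs;
    for (std::uint8_t v : voxels)
    {
        if (!runs.empty() && runs.back().first == v)
            ++runs.back().second;      // same value as before: extend the run
        else
            runs.emplace_back(v, 1u);  // value changed: start a new run
    }
    return runs;
}

// Expand the runs back into a flat voxel array of the expected size.
std::vector<std::uint8_t> rleDecode(const std::vector<Run>& runs, std::size_t voxelCount)
{
    std::vector<std::uint8_t> voxels;
    voxels.reserve(voxelCount);
    for (const Run& run : runs)
        voxels.insert(voxels.end(), run.second, run.first);
    assert(voxels.size() == voxelCount);  // runs must describe exactly one block
    return voxels;
}
```

A mostly empty 32^3 block collapses to a handful of runs this way, while a noisy block can actually grow, which is one reason to keep the scheme simple and fast to undo.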
The next step (which is what you really want) is to allow blocks to drop out of memory completely, so that you can have infinite terrain with only the nearby part in memory. This will probably need some changes to the Volume class interface, possibly so that some callback mechanism can be implemented to allow user code to provide data when it is not in memory (possibly by reading it from disk, maybe by generating it procedurally, etc). It's probably not that hard to implement, but may need some experimentation to find the best approach. Unfortunately I simply have no need for it at the moment as I'm not expecting to work with volumes of more than 1024^3 or so, and I do have to prioritise work based around what I will use/test.
You also mention using an octree - this is of course a good structure from the point of view of data compression and for obtaining lower resolution data for lower LODs. The downsides are that it's probably slower to modify, as changes need to be propagated up the tree, and that it is slower to access the neighbours of the voxel you are currently in (as they may be in a different node). This last property is important because both the marching cubes algorithm and the surface normal calculations rely on accessing neighbouring voxels quickly. Overall the octree might be a good choice of data structure but I'd need to experiment to see how these different aspects balance out.
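As a rough illustration of what such a callback mechanism might look like, here is a sketch under my own assumptions. `PagedVolumeSketch`, `BlockCoord`, `Loader` and `Unloader` are hypothetical names and not part of the PolyVox interface; the loader is assumed to return exactly one block's worth of voxels, and eviction is triggered manually rather than by any policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; none of this is the PolyVox API.
using BlockCoord = std::tuple<int, int, int>;
using BlockData = std::vector<std::uint8_t>;

class PagedVolumeSketch
{
public:
    // User code supplies a block on a miss (from disk, procedurally, ...).
    using Loader = std::function<BlockData(const BlockCoord&)>;
    // User code receives a block just before it leaves memory.
    using Unloader = std::function<void(const BlockCoord&, const BlockData&)>;

    PagedVolumeSketch(int blockSide, Loader loader, Unloader unloader)
        : m_side(blockSide), m_loader(std::move(loader)), m_unloader(std::move(unloader)) {}

    std::uint8_t getVoxel(int x, int y, int z) { return block(x, y, z)[indexInBlock(x, y, z)]; }

    void setVoxel(int x, int y, int z, std::uint8_t value)
    {
        block(x, y, z)[indexInBlock(x, y, z)] = value;
    }

    // Hand a block back to user code and drop it from memory.
    void evict(const BlockCoord& coord)
    {
        auto it = m_blocks.find(coord);
        if (it == m_blocks.end()) return;
        m_unloader(coord, it->second);
        m_blocks.erase(it);
    }

    std::size_t residentBlocks() const { return m_blocks.size(); }

private:
    // Floor division, so negative coordinates land in the correct block.
    int blockIndex(int v) const { return v >= 0 ? v / m_side : -((-v + m_side - 1) / m_side); }
    int local(int v) const { return v - blockIndex(v) * m_side; }

    std::size_t indexInBlock(int x, int y, int z) const
    {
        return (static_cast<std::size_t>(local(z)) * m_side + local(y)) * m_side + local(x);
    }

    BlockData& block(int x, int y, int z)
    {
        BlockCoord coord(blockIndex(x), blockIndex(y), blockIndex(z));
        auto it = m_blocks.find(coord);
        if (it == m_blocks.end())
        {
            // Block is not in memory: ask user code to provide it.
            it = m_blocks.emplace(coord, m_loader(coord)).first;
            assert(it->second.size() == static_cast<std::size_t>(m_side) * m_side * m_side);
        }
        return it->second;
    }

    int m_side;
    Loader m_loader;
    Unloader m_unloader;
    std::map<BlockCoord, BlockData> m_blocks;
};
```

A real version would also need a policy for deciding when to evict (for example distance from the camera or a memory budget), which is the part that would need the experimentation.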
Moving on to the surface extraction, I did actually experiment with generating the surface at different LOD levels in the past. It's possible to do it at the moment by keeping a second volume at half the resolution of the first one, and simply running the surface extractor over it as usual. The problem is that there are usually seams between the high resolution mesh and the low resolution mesh. My understanding of Eric Lengyel's work is that he resolves this by examining both volumes at the same time whilst performing the extraction - i.e. you need both the high and low resolution data in memory at this time, so it's not really a memory-saving technique. But it does fix the seams between LOD levels in a very elegant and robust way.
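For reference, building the half-resolution volume can be as simple as the sketch below. `DenseVolume` and `downsampleHalf` are hypothetical names rather than PolyVox classes, and averaging each 2x2x2 cell is my assumption: it only makes sense for density values, whereas material IDs would need something like a majority vote instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical dense cubic volume of density values, stored with x varying fastest.
struct DenseVolume
{
    int side;
    std::vector<std::uint8_t> data;

    explicit DenseVolume(int s) : side(s), data(static_cast<std::size_t>(s) * s * s, 0) {}

    std::uint8_t& at(int x, int y, int z)
    {
        return data[(static_cast<std::size_t>(z) * side + y) * side + x];
    }

    std::uint8_t at(int x, int y, int z) const
    {
        return data[(static_cast<std::size_t>(z) * side + y) * side + x];
    }
};

// Build a volume at half the resolution by averaging each 2x2x2 cell of the input.
DenseVolume downsampleHalf(const DenseVolume& fine)
{
    assert(fine.side % 2 == 0);  // the side length must be even
    DenseVolume coarse(fine.side / 2);
    for (int z = 0; z < coarse.side; ++z)
        for (int y = 0; y < coarse.side; ++y)
            for (int x = 0; x < coarse.side; ++x)
            {
                unsigned sum = 0;
                for (int dz = 0; dz < 2; ++dz)
                    for (int dy = 0; dy < 2; ++dy)
                        for (int dx = 0; dx < 2; ++dx)
                            sum += fine.at(2 * x + dx, 2 * y + dy, 2 * z + dz);
                coarse.at(x, y, z) = static_cast<std::uint8_t>(sum / 8);
            }
    return coarse;
}
```

Running an extractor over `coarse` gives the low resolution mesh. Nothing here addresses the seams, which is exactly the part that needs both resolutions at once.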
I should say that I haven't read Eric's thesis properly. My comments are just based on skimming it and on the discussions I have read about it, so apologies for any misinformation. It's also worth noting that Eric said he plans to release his lookup tables at some point in the future:
http://www.gamedev.net/topic/591926-mar ... try4753715
If he does so then it could well pave the way for his algorithm to be implemented in PolyVox.
Right, I've written a lot. I hope I don't sound too negative; you raise some excellent and interesting points and describe some features which I would like to implement. It's just a matter of limited time and prioritising what I work on. I'll update the Wiki todo list with some of the things mentioned above.