Coplanar face merging - OPTIMIZED!

onimatrix · **Joined:** Wed Dec 08, 2010 5:54 am **Posts:** 27

Hi!

I made a coplanar face merging function that takes your favorite cubic extracted mesh and simplifies it leaving an average of 17% of the poly count.

This helps a lot both when rendering and when serializing the mesh to a file (So the mesh extractor doesn't slow things down).

As examples, I have this pics to show you:

This mesh here was composed of 3120 triangles, now it's made of 501 quads. (16% ratio)

This other mesh was composed of 5274 triangles, now it's made of 947 quads. (17% ratio)

I know that comparing tris with quads is unfair, but it's still a great way to minimize game data (And lighting fast, 2 seconds to merge 5000 quads).

Well, here's the code for you all =D
Please note that, while you'll still be using the vertex data in the surface mesh, the indices are now filling a vector<Quad>.
Also, I'm using PositionMaterialNormal vertexType. Sorry

DoCoplanarMerge returns the number of successful merging operations done in that pass. I call it until no more merging can be done

.h

Code:

struct Quad
{
   uint32_t s_idx[4];
};

vector<Quad>      c_quads;

void OptimizeMesh(void);
int DoCoplanarMerge(void);

.cpp

Code:

   void PolyvoxModel::OptimizeMesh(void){
      
      const vector<uint32_t>& vecIndices = c_surfaceMesh->getIndices();
      const vector<PolyVox::PositionMaterialNormal>& vecVertices = c_surfaceMesh->getVertices();

      // FOR EVERY TWO TRIS, MAKE A QUAD
      for(uint32_t l_triWalk = 0; l_triWalk < vecIndices.size(); l_triWalk += 6)
      {
         uint32_t l_triA = l_triWalk;
         uint32_t l_triB = l_triWalk + 3;

         if ((vecIndices[l_triA + 1] == vecIndices[l_triB]) &&
            (vecIndices[l_triA + 2] == vecIndices[l_triB + 2]))
         {
            Quad l_newQuad;

            l_newQuad.s_idx[0] = vecIndices[l_triA];
            l_newQuad.s_idx[1] = vecIndices[l_triA + 1];
            l_newQuad.s_idx[2] = vecIndices[l_triB + 1];
            l_newQuad.s_idx[3] = vecIndices[l_triA + 2];

            c_quads.push_back(l_newQuad);
         }
         else if (   (vecIndices[l_triA + 1] == vecIndices[l_triB + 1]) &&
                  (vecIndices[l_triA + 2] == vecIndices[l_triB]))
         {
            Quad l_newQuad;

            l_newQuad.s_idx[0] = vecIndices[l_triA];
            l_newQuad.s_idx[1] = vecIndices[l_triA + 1];
            l_newQuad.s_idx[2] = vecIndices[l_triB + 2];
            l_newQuad.s_idx[3] = vecIndices[l_triA + 2];

            c_quads.push_back(l_newQuad);
         }
      }

      int l_totalMerges = 0;
      int l_passMerges = 0;
      do{
         l_passMerges = DoCoplanarMerge();
         l_totalMerges += l_passMerges;
      } while(l_passMerges);

      PrepareOptimizedDisplayList();
   }

int PolyvoxModel::DoCoplanarMerge(void){

      const vector<PolyVox::PositionMaterialNormal>& vecVertices = c_surfaceMesh->getVertices();

      // MERGE ADYACENT QUADS
      vector<Quad> l_quadsByPlane[6];
      enum{
         eUp, eDown, eLeft, eRight, eForward, eBackward, ePlaneCount
      };

      for (int l_quadWalk = 0; l_quadWalk < c_quads.size(); l_quadWalk++){

         PolyVox::Vector3DFloat l_normal = vecVertices[c_quads[l_quadWalk].s_idx[0]].getNormal();

         if (l_normal.getX() == 1)      l_quadsByPlane[eRight].push_back(c_quads[l_quadWalk]);
         else if (l_normal.getX() == -1) l_quadsByPlane[eLeft].push_back(c_quads[l_quadWalk]);
         else if (l_normal.getY() == 1)   l_quadsByPlane[eUp].push_back(c_quads[l_quadWalk]);
         else if (l_normal.getY() == -1) l_quadsByPlane[eDown].push_back(c_quads[l_quadWalk]);
         else if (l_normal.getZ() == 1)   l_quadsByPlane[eForward].push_back(c_quads[l_quadWalk]);
         else if (l_normal.getZ() == -1) l_quadsByPlane[eBackward].push_back(c_quads[l_quadWalk]);
      }

      c_quads.clear();

      // CHECK IF ANY TWO QUADS SHARE TWO VERTICES, MERGE THEM!

      int l_mergedQuads = 0;

      for (int l_planeWalk = 0; l_planeWalk < ePlaneCount; l_planeWalk++){

         bool* l_alreadyMerged = new bool[l_quadsByPlane[l_planeWalk].size()];
         memset(l_alreadyMerged, 0, l_quadsByPlane[l_planeWalk].size());
         
         // MERGE AND PUSH ADJACENT QUADS
         for (int l_currentQuadWalk = 0; l_currentQuadWalk < l_quadsByPlane[l_planeWalk].size(); l_currentQuadWalk++){
            
            if (l_alreadyMerged[l_currentQuadWalk]) continue;

            Quad l_currentQuad = l_quadsByPlane[l_planeWalk][l_currentQuadWalk];
            PolyVox::Vector3DFloat l_currentPos[4];

            for (int l_vWalk = 0; l_vWalk < 4; l_vWalk++){
               l_currentPos[l_vWalk] = vecVertices[l_currentQuad.s_idx[l_vWalk]].getPosition();
            }

            for (int l_compareQuadWalk = 0; l_compareQuadWalk < l_quadsByPlane[l_planeWalk].size(); l_compareQuadWalk++){
            
               if (l_currentQuadWalk == l_compareQuadWalk) continue;
               if (l_alreadyMerged[l_currentQuadWalk]) break;
               if (l_alreadyMerged[l_compareQuadWalk]) continue;
               
               Quad l_compareQuad = l_quadsByPlane[l_planeWalk][l_compareQuadWalk];
               PolyVox::Vector3DFloat l_comparePos[4];

               for (int l_vWalk = 0; l_vWalk < 4; l_vWalk++){
                  l_comparePos[l_vWalk] = vecVertices[l_compareQuad.s_idx[l_vWalk]].getPosition();
               }

               if ((l_currentPos[2] == l_comparePos[1]) &&
                  (l_currentPos[3] == l_comparePos[0])){

                  Quad l_mergedQuad;
                  l_mergedQuad.s_idx[0] = l_currentQuad.s_idx[0];
                  l_mergedQuad.s_idx[1] = l_currentQuad.s_idx[1];
                  l_mergedQuad.s_idx[2] = l_compareQuad.s_idx[2];
                  l_mergedQuad.s_idx[3] = l_compareQuad.s_idx[3];

                  c_quads.push_back(l_mergedQuad);

                  l_alreadyMerged[l_currentQuadWalk] = true;
                  l_alreadyMerged[l_compareQuadWalk] = true;
               }
               else if ((l_currentPos[1] == l_comparePos[0]) &&
                     (l_currentPos[2] == l_comparePos[3])){

                  Quad l_mergedQuad;
                  l_mergedQuad.s_idx[0] = l_currentQuad.s_idx[0];
                  l_mergedQuad.s_idx[1] = l_compareQuad.s_idx[1];
                  l_mergedQuad.s_idx[2] = l_compareQuad.s_idx[2];
                  l_mergedQuad.s_idx[3] = l_currentQuad.s_idx[3];

                  c_quads.push_back(l_mergedQuad);

                  l_alreadyMerged[l_currentQuadWalk] = true;
                  l_alreadyMerged[l_compareQuadWalk] = true;
               }
               else if ((l_currentPos[0] == l_comparePos[3]) &&
                     (l_currentPos[1] == l_comparePos[2])){

                  Quad l_mergedQuad;
                  l_mergedQuad.s_idx[0] = l_compareQuad.s_idx[0];
                  l_mergedQuad.s_idx[1] = l_compareQuad.s_idx[1];
                  l_mergedQuad.s_idx[2] = l_currentQuad.s_idx[2];
                  l_mergedQuad.s_idx[3] = l_currentQuad.s_idx[3];

                  c_quads.push_back(l_mergedQuad);

                  l_alreadyMerged[l_currentQuadWalk] = true;
                  l_alreadyMerged[l_compareQuadWalk] = true;
               }
               else if ((l_currentPos[0] == l_comparePos[1]) &&
                     (l_currentPos[3] == l_comparePos[2])){

                  Quad l_mergedQuad;
                  l_mergedQuad.s_idx[0] = l_compareQuad.s_idx[0];
                  l_mergedQuad.s_idx[1] = l_currentQuad.s_idx[1];
                  l_mergedQuad.s_idx[2] = l_currentQuad.s_idx[2];
                  l_mergedQuad.s_idx[3] = l_compareQuad.s_idx[3];

                  c_quads.push_back(l_mergedQuad);

                  l_alreadyMerged[l_currentQuadWalk] = true;
                  l_alreadyMerged[l_compareQuadWalk] = true;
               }
            }
         }


         // PUSH LONELY QUADS
         for (int l_currentQuadWalk = 0; l_currentQuadWalk < l_quadsByPlane[l_planeWalk].size(); l_currentQuadWalk++){
            if (!l_alreadyMerged[l_currentQuadWalk]){

               c_quads.push_back(l_quadsByPlane[l_planeWalk][l_currentQuadWalk]);
            }
            else{
               l_mergedQuads++;
            }
         }

         delete[] l_alreadyMerged;
      }   

      return l_mergedQuads;
   }

onimatrix · **Joined:** Wed Dec 08, 2010 5:54 am **Posts:** 27

I'm aware that keeping lonely quads in another vector until the end of the passes would make it somewhat faster. That will be optimization number one I guess =p

EDIT: I have done that just now. It creates some artifacts (It doubles the amount of quads), but takes 30% less time than the other version. I'll make it toggleable, it's speed against memory and performance.

The code that you need to add/modify is this:

.h

Code:

vector<Quad> c_lonelyQuads;

.cpp

Code:

c_lonelyQuads.clear();

int l_totalMerges = 0;
int l_passMerges = 0;
do{
   l_passMerges = DoCoplanarMerge();
   l_totalMerges += l_passMerges;
} while(l_passMerges);

// Although c_quads should be empty by now, we append c_lonelyQuads to it
c_quads.reserve(c_quads.size() + c_lonelyQuads.size());
c_quads.insert(c_quads.end(), c_lonelyQuads.begin(), c_lonelyQuads.end());

----------------------------------------------------

// PUSH LONELY QUADS
for (int l_currentQuadWalk = 0; l_currentQuadWalk < l_quadsByPlane[l_planeWalk].size(); l_currentQuadWalk++){
   if (l_alreadyMerged.find(l_currentQuadWalk) == l_alreadyMerged.end()){
      c_lonelyQuads.push_back(l_quadsByPlane[l_planeWalk][l_currentQuadWalk]);
   }
   else{
      l_mergedQuads++;
   }
}

onimatrix · **Joined:** Wed Dec 08, 2010 5:54 am **Posts:** 27

Another thing that's missing is the purge of non used vertices... but I should get to work =p

AndiNo · **Joined:** Sun Oct 03, 2010 10:13 pm **Posts:** 73

This sounds very interesting, please keep us updated about any development.
However from what I know all Ogre users won't be able to use this, as Ogre has no quad support. Correct me if I'm wrong though. But I guess it wouldn't be that hard to just output merged triangles instead of quads.

onimatrix · **Joined:** Wed Dec 08, 2010 5:54 am **Posts:** 27

I'm not using Ogre so I wouldn't know.
Surely it must have quad support, it's one huge SDK :S

The thing with merged triangles is that it makes my head hurt in funny places.
IMO, quads are more well behaved and simplify a lot of stuff.

Still, converting the resulting quads to triangles again would be trivial (Although rather time consuming).

David Williams · **Joined:** Sun May 04, 2008 6:35 pm **Posts:** 1827

Great work, onimatrix! Those are some really nice results.

onimatrix wrote:

Altough it's not very optimized yet, it took 11 seconds to generate the volume, the terrain, the mesh and the optimized mesh...)

I'm curious, how is the total time split between those tasks? In particular the mesh generation vs the mesh simplification? Which one is faster?

Regarding quads vs triangles, Ogre does indeed have support for rendering quads via the ManualObject class:

http://www.ogre3d.org/docs/api/html/classOgre_1_1ManualObject.html#a1ac70dbc6f35180d5c8bab671c3f2257

There is clearly some desire in the community to have triangle reduction available for the cubic surface extractor, and as I've mentioned there is already some code available for the Marching Cubes surface extractor. Adapting this code for the cubic surface extractor is definatly on my todo list...

onimatrix · **Joined:** Wed Dec 08, 2010 5:54 am **Posts:** 27

Quote:

I'm curious, how is the total time split between those tasks? In particular the mesh generation vs the mesh simplification? Which one is faster?

Well, obviously it depends on the source volume, but the simplification takes 60% avg of the time. Right now, I do it to reduce filesize and augment rendering performance.

Also, I feel like that map<Quad> l_alreadyMerged coud be faster simply with an old fashioned fixed sized array. map::find works on logarithmic time

David Williams · **Joined:** Sun May 04, 2008 6:35 pm **Posts:** 1827

onimatrix wrote:

Well, obviously it depends on the source volume, but the simplification takes 60% avg of the time. Right now, I do it to reduce filesize and augment rendering performance.

Ok, so it slightly more than doubles the total time to extract the mesh. I think that's a pretty good tradeoff for the improvement you get, and I'm sure it can be optimised further.

onimatrix · **Joined:** Wed Dec 08, 2010 5:54 am **Posts:** 27

I added some profiling to the code.

The new average for this algorithm is 85% of the chunk generation time.
It's a lot, basically. I'll try and optimize it now, it's unusable right now for any serious business :S

EDIT:
Well, I finally optimized this little sucker =)
Looks like std::map is a time eating machine as this numbers will tell.
Both profiling sessions were made without the c_lonelyQuads array.

With std::map::find
Time to merge tris: 0.006500
Time to merge 3380 quads: 12.534206
Time to merge 1374 quads: 6.232852
Time to merge 444 quads: 4.126423
Time to merge 152 quads: 3.380110
Time to merge 6 quads: 2.602364
Time to merge 0 quads: 2.426862
Total merges: 5356
Time to optimize mesh: 31.317020

With a freaking stone age bool array
Time to merge tris: 0.006767
Time to merge 3380 quads: 0.349637
Time to merge 1374 quads: 0.323783
Time to merge 444 quads: 0.336214
Time to merge 152 quads: 0.342179
Time to merge 6 quads: 0.368082
Time to merge 0 quads: 0.371735
Total merges: 5356
Time to optimize mesh: 2.105889

I'll update the OP with the functional code

onimatrix · **Joined:** Wed Dec 08, 2010 5:54 am **Posts:** 27

I changed PositionMaterialNormal vertices for PositionMaterial ones, as a mean to use CubicSurfaceExtractor and get rid of the artifacts in the edges of the volume.

Then, after merging the tris, I calculate the normals by hand (Which takes no time at all) and propagate this data to the merged quads.

This added a lot of missing geometry, resulting in almost 300% the time while merging.

Still, it's worth it.

If you need the code, ask me for it. I'll release a new version of the code in the OP when something awesome happens to it.

Coplanar face merging - OPTIMIZED!

Who is online