I added some profiling to the code.
On average, this algorithm now takes 85% of the chunk generation time.
That's a lot. I'll try to optimize it now, because right now it's unusable for any serious business :S
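For anyone who wants to collect per-phase numbers like these, here is a minimal C++11 sketch of scope-based timing with `std::chrono`. `ScopedTimer` and `phaseShare` are hypothetical helpers written for illustration. They are not taken from the original code, which isn't shown here.

```cpp
#include <cassert>
#include <chrono>

// Adds the wall-clock time of its enclosing scope, in milliseconds, to an accumulator.
class ScopedTimer {
public:
    explicit ScopedTimer(double& accumulatorMs)
        : acc_(accumulatorMs), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        // Convert the integer clock ticks to fractional milliseconds.
        std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start_;
        acc_ += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& acc_;
    std::chrono::steady_clock::time_point start_;
};

// Fraction of the total time spent in one phase (0.85 corresponds to 85%).
double phaseShare(double phaseMs, double totalMs) {
    assert(totalMs > 0.0);
    return phaseMs / totalMs;
}
```

One timer would wrap the whole chunk build and another would wrap only the mesh optimizer (say, hypothetical accumulators `chunkMs` and `mergeMs`). `phaseShare(mergeMs, chunkMs)` then gives a figure comparable to the 85% above. `steady_clock` is used because it never jumps backwards the way the system clock can.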
EDIT:
Well, I finally optimized this little sucker =)
Looks like std::map is a time-eating machine, as these numbers show.
Both profiling sessions were run without the c_lonelyQuads array.
With std::map::find:
Time to merge tris: 0.006500
Time to merge 3380 quads: 12.534206
Time to merge 1374 quads: 6.232852
Time to merge 444 quads: 4.126423
Time to merge 152 quads: 3.380110
Time to merge 6 quads: 2.602364
Time to merge 0 quads: 2.426862
Total merges: 5356
Time to optimize mesh: 31.317020
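The per-pass counts above add up exactly to the reported total (3380 + 1374 + 444 + 152 + 6 + 0 = 5356). This suggests the optimizer repeats merge passes until a pass merges nothing. Below is a minimal C++ sketch of such a loop. It makes several assumptions: `Quad`, `tryMerge` and `mergeUntilStable` are hypothetical names, quads are treated as axis-aligned rectangles on a single face, and two quads merge only when they share a complete edge. It is not the original implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical face quad on one chunk face, in voxel units.
struct Quad {
    int x, y, w, h;
};

// If a and b share a full edge, writes their union to out and returns true.
static bool tryMerge(const Quad& a, const Quad& b, Quad& out) {
    if (a.y == b.y && a.h == b.h && (a.x + a.w == b.x || b.x + b.w == a.x)) {
        out = {std::min(a.x, b.x), a.y, a.w + b.w, a.h};  // side by side
        return true;
    }
    if (a.x == b.x && a.w == b.w && (a.y + a.h == b.y || b.y + b.h == a.y)) {
        out = {a.x, std::min(a.y, b.y), a.w, a.h + b.h};  // stacked
        return true;
    }
    return false;
}

// Runs merge passes until one merges nothing; returns the merge count of each pass.
std::vector<std::size_t> mergeUntilStable(std::vector<Quad>& quads) {
    std::vector<std::size_t> perPass;
    std::size_t merges = 0;
    do {
        std::vector<bool> touched(quads.size(), false);  // already used in this pass
        std::vector<Quad> next;
        merges = 0;
        for (std::size_t i = 0; i < quads.size(); ++i) {
            if (touched[i]) continue;  // absorbed by an earlier quad
            touched[i] = true;
            Quad merged = quads[i];
            for (std::size_t j = i + 1; j < quads.size(); ++j) {
                if (!touched[j] && tryMerge(quads[i], quads[j], merged)) {
                    touched[j] = true;  // j is absorbed into i
                    ++merges;
                    break;              // at most one merge per quad per pass
                }
            }
            next.push_back(merged);
        }
        assert(next.size() + merges == quads.size());  // each merge removes one quad
        quads.swap(next);
        perPass.push_back(merges);
    } while (merges > 0);
    return perPass;
}
```

As an example, a row of eight 1x1 quads collapses into a single 8x1 quad over four passes. Those passes merge 4, 2, 1 and 0 quads, which is the same shrinking pattern seen in the profile. The `touched` vector is a dense flag array indexed by quad position, in the spirit of the bool array used in the second run.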
With a freaking stone age bool array:
Time to merge tris: 0.006767
Time to merge 3380 quads: 0.349637
Time to merge 1374 quads: 0.323783
Time to merge 444 quads: 0.336214
Time to merge 152 quads: 0.342179
Time to merge 6 quads: 0.368082
Time to merge 0 quads: 0.371735
Total merges: 5356
Time to optimize mesh: 2.105889
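Going by the totals, the switch cut mesh optimization time from 31.317020 to 2.105889, roughly a 15x speedup. On the first 3380-quad pass the gain is about 36x. A likely reason is that std::map::find searches a balanced tree, O(log n), and in common implementations that tree is built from separately allocated nodes with poor cache locality. An array lookup is a single indexed load. The C++ sketch below contrasts the two as a hypothetical "already merged" flag store keyed by quad index. `MapFlags` and `ArrayFlags` are illustrative names, and the assumption that the original map was keyed this way is not confirmed by the numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical "already merged" flags keyed by quad index, std::map version.
// Each test() is an O(log n) tree search.
class MapFlags {
public:
    void set(std::size_t i) { flags_[i] = true; }
    bool test(std::size_t i) const {
        auto it = flags_.find(i);
        return it != flags_.end() && it->second;
    }
    void clear() { flags_.clear(); }

private:
    std::map<std::size_t, bool> flags_;
};

// Same flags as a dense array sized to the quad count.
// Each test() is a single indexed load.
class ArrayFlags {
public:
    explicit ArrayFlags(std::size_t n) : flags_(n, 0) {}
    void set(std::size_t i) {
        assert(i < flags_.size());
        flags_[i] = 1;
    }
    bool test(std::size_t i) const {
        assert(i < flags_.size());
        return flags_[i] != 0;
    }
    void clear() { flags_.assign(flags_.size(), 0); }

private:
    std::vector<unsigned char> flags_;  // one byte per flag, no bit packing
};
```

The array version only works because the keys are dense integers in [0, n). Sparse keys, such as world coordinates, would first need to be mapped to a compact index. `unsigned char` is used instead of `std::vector<bool>` so each flag is a plain byte rather than a packed bit.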
I'll update the OP with the working code.
