Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

tugrul512bit/FastCollisionDetectionLib

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

85 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FastCollisionDetectionLib

C++ fast collision detection for uniform(and non-uniform)-distributed AABB particles using adaptive grid with implicit vectorization.

Non-Sparse Adaptive Grid

  • 10 million dynamic particles AABB collision check per second against static grid of 8000 particles
  • 1000x speedup against naive brute-force algorithm for 40k particles (static vs static), with uniform-distribution in range [0 - 1]).
    • teapot-in-stadium problem is partially solved by "adaptive" grid:
    • 290x speedup when half of AABBs are 10x further than each other [0-1] and [10-11]
    • 230x speedup when half of AABBs are 10x far and a single AABB 100x far: [0-1] x N/2, [10-11] x N/2, [100-101] x1
  • Produced collision list does not contain duplicate pairs of collisions
  • Particle data is not touched, work done only on pointers internally
  • Currently it is adaptive, but needs optimizations on memory handling.
    • On every cell-overflow, it stretches the cell to AABB of all particles and converts to a grid of 4x4x4 cells each with 4 capacity
  • Implementation of IParticle is an AABB (axis-aligned bounding box) model
    • In user defined particle (box as example here), methods (getMinX/Y/Z and getMaxX/Y/Z) must return AABB corners of the underlying user-particle

Sparse - Linear - Adaptive Grid

  • More than 2000x speedup against naive brute-force, in single-thread for 10000 particles
  • 60 FPS for 20000 particles and ~25000 collision pairs on 2.1GHz FX8150 single-thread
    • 100+ FPS for new CPUs
  • Better performance stability compared to non-sparse version
  • Better SIMD support on all-pairs computation method using tiled-computing
  • Non-zero based object-id values supported (getId() method in IParticle interface)
// prepare memory pool
FastColDetLib::MemoryPool memPool;

// map grid to a volume of cube between corners of (0,0,0) and (10005,10005,10005)
FastColDetLib::AdaptiveGridV2 grid(memPool,0,0,0,10005,10005,10005);

// implement IParticle<float>
struct AABBofPointCloud: public FastColDetLib::IParticle<float>
{
   ...
   	const CoordType getMaxX()const {return xmax;}
	const CoordType getMaxY()const {return ymax;}
	const CoordType getMaxZ()const {return zmax;}
	const CoordType getMinX()const {return xmin;}
	const CoordType getMinY()const {return ymin;}
	const CoordType getMinZ()const {return zmin;}
	const int getId()const {return id;}
    ...
};

// initialize AABB vector
std::vector<AABBofPointCloud> AABBs;

while(simulation)
{
    // clear tree data
    grid.clear();
    
    // add particles that implement IParticle<float> into grid
    grid.addParticles(N,AABBs.data());
    
    // build tree
    grid.buildTree();
    
    // compute all-pairs collision array 
    // 60FPS on FX8150 2.1GHz single-thread for 20000 particles with less than 29000 collisions
    // 100FPS on Xeon Gold 5215 2.9GHz single-thread for 20000 particles with less than 29000 collisions
    std::vector<std::pair<int,int>> vec = grid2_0.findCollisionsAll();
    
    // the vec contains id-values of particles that have their AABBs collide so that you can do further fine-grained collision checks between them
}

Multithreaded Tree of Sparse - Linear - Adaptive Grid

  • 60 FPS for 40000 particles' AABB all-pair computations
  • Bottlenecked by RAM bandwidth and mutex-array locking throughput
  • Only zero-based object-id values supported
  • Work load is balanced on particles, not volumes, this makes better distribution of particles on threads but causes duplicated work due to merging of results from all leaf nodes
    // 7 threads with load-balancing by a tree, mapped to (0,0,0)-(10005,10005,10005) region
    AdaptiveGridTree<7> test(0,0,0,10005,10005,10005);

    for(int i=0;i<100;i++)
    {
		size_t nano;
		{
			FastColDetLib::Bench bench(&nano);
			
			// clear contents of memory pool of adaptive grid tree
			test.clear();
			
			// AABBs is a vector of objects that implements IParticle<float> interface 
	                // (only zero-based object-id values supported from getId() method of IParticle<float>)
			test.addParticles(N,AABBs.data()); 
			
			// non-duplicate pairs of collisions 
			const auto coll = test.computeAllPairs();
			std::cout<<"c="<<coll.size()<<std::endl;
		}
		std::cout<<"t="<<nano<<std::endl;
    }

For details, please visit https://github.com/tugrul512bit/FastCollisionDetectionLib/wiki wiki page.

Working demo for non-sparse (old version) adaptive grid (requires linking pthread for header and gomp/fopenmp for this demo):

#include"FastCollisionDetectionLib.h"

// user-object, with IParticle<coordinate_type> interface for querying AABB within the API
class Box: public FastColDetLib::IParticle<float>
{
public:
	Box(float xPrm=0, float yPrm=0, float zPrm=0, float x2Prm=0, float y2Prm=0, float z2Prm=0, size_t idPrm=0)
	{
		x=xPrm;
		y=yPrm;
		z=zPrm;

		x2=x2Prm;
		y2=y2Prm;
		z2=z2Prm;
		id=idPrm;
	}

	const float getMaxX() const { return x>=x2?x:x2;}
	const float getMaxY() const { return y>=y2?y:y2;}
	const float getMaxZ() const { return z>=z2?z:z2;}

	const float getMinX() const { return x>=x2?x2:x;}
	const float getMinY() const { return y>=y2?y2:y;}
	const float getMinZ() const { return z>=z2?z2:z;}
	const int getId() const {return id;}

	~Box(){}

private:
	float x,y,z,x2,y2,z2;
	size_t id;
};

#include<iostream>
#include<omp.h>
#include"Generator.h"
int main()
{

	oofrng::Generator<64> gen;

	{
		// d^3 number of particles
		const int d = 20;
		const int n=d*d*d;
		std::cout<<"n="<<n<<std::endl;
		std::vector<Box> box(n);
		for(int i=0;i<n;i++)
		{
			auto x1 = gen.generate1Float()*d;
			auto y1 = gen.generate1Float()*d;
			auto z1 = gen.generate1Float()*d;

			float dx = 0.05f;
			float dx2 = 0.05f;
			box[i]=Box(x1,y1,z1,x1+dx2+dx*gen.generate1Float(),y1+dx2+dx*gen.generate1Float(),z1+dx2+dx*gen.generate1Float(),i);
		}

		// thread-pool releases itself once it is out of scope
		// single instance of thread pool can be used for multiple adaptive grids
		FastColDetLib::ThreadPool<float> thr;
		FastColDetLib::AdaptiveGrid<float> grid(thr,-1,-1,-1,d+1,d+1,d+1);
		FastColDetLib::BruteForce<float> bruteForce;
		std::vector<FastColDetLib::CollisionPair<float>> coll3D,coll3Dbrute;


		std::cout<<"add"<<std::endl;
		bruteForce.add(&box[0],n);
		std::cout<<"compute"<<std::endl;
		bool bfor = true;


		for(int k=0;k<40;k++)
		{
			size_t nano1,nano2;

			{
				{
					FastColDetLib::Bench bench(&nano1);
					{
						FastColDetLib::Bench bench(&nano1);
						grid.clear();
					}
					std::cout<<"grid-static-object-clear: "<<nano1/1000000000.0<<" s "<<std::endl;
					{
						FastColDetLib::Bench bench(&nano1);
						grid.add(box.data(),n);
					}
					std::cout<<"grid-static-object-add ("<<n<<" particles AABB): "<<nano1/1000000000.0<<" s "<<std::endl;
					{
						FastColDetLib::Bench bench(&nano1);
						coll3D = grid.getCollisions();
					}
					std::cout<<"grid-static-object-compute: "<<nano1/1000000000.0<<" s "<<coll3D.size()<<std::endl;
				}
				std::cout<<"grid-static-total: "<<nano1/1000000000.0<<" s "<<coll3D.size()<<std::endl;
				std::vector<float> rand(1000000*3);
				gen.generate(rand.data(),1000000*3);
				{
					std::mutex mut;
					std::vector<FastColDetLib::IParticle<float>*> res;
					{
						FastColDetLib::Bench bench(&nano1);
						#pragma omp parallel for
						for(int j=0;j<100;j++)
						{
							std::vector<FastColDetLib::IParticle<float>*> resTmp;
							for(int i=0;i<1000;i++)
							{
								auto x = rand[i*3]*d;
								auto y = rand[i*3+1]*d;
								auto z = rand[i*3+2]*d;
								auto item = Box(x,y,z,x+0.25,y+0.25,z+0.25);
								auto collisions = grid.getDynamicCollisionListFor(&item);
								std::copy(collisions.begin(),collisions.end(),std::back_inserter(resTmp));
							}
							std::lock_guard<std::mutex> lg(mut);
							std::copy(resTmp.begin(),resTmp.end(),std::back_inserter(res));
						}

					}
					std::cout<<"grid-compute-dynamic (100k particles AABB): "<<nano1/1000000000.0<<" s "<<res.size()<<std::endl;
				}

			}


			if(bfor)
			{
				FastColDetLib::Bench bench(&nano2);
				coll3Dbrute = bruteForce.getCollisions();
			}
			if(bfor)
				std::cout<<"Brute-force ("<<n<<" particles AABB): "<<nano2/1000000000.0<<" s "<<coll3Dbrute.size()<<std::endl;
			std::cout<<"------------------------------------------------------------------------------------"<<std::endl;
		}


		if(bfor)
		{
			if(coll3D.size() != coll3Dbrute.size())
			{
				std::cout<<"ERROR: not equal sizes of collision lists"<<std::endl;
				std::cout<<coll3D.size()<<" vs "<< coll3Dbrute.size()<<std::endl;
				const int sz = std::min(coll3D.size(),coll3Dbrute.size());
				const int sz2 = sz<10?sz:10;
				for(int i=0;i<sz2;i++)
				{
					if((coll3D[i].getParticle1()->getId()!=coll3Dbrute[i].getParticle1()->getId()) &&
							(coll3D[i].getParticle2()->getId()!=coll3Dbrute[i].getParticle2()->getId())
							)
					{
						std::cout<<"ERRRROOOR!"<<std::endl;
						std::cout<<coll3D[i].getParticle1()->getId()<<"<-->"<<coll3D[i].getParticle2()->getId()<<"        "<<coll3Dbrute[i].getParticle1()->getId()<<"<-->"<<coll3Dbrute[i].getParticle2()->getId()<<std::endl;
					}
				}
			}
			else
			{
				const size_t sz = coll3D.size();
				for(size_t i=0;i<sz;i++)
				{

					if((coll3D[i].getParticle1()->getId()!=coll3Dbrute[i].getParticle1()->getId()) &&
							(coll3D[i].getParticle2()->getId()!=coll3Dbrute[i].getParticle2()->getId())
							)
					{
						std::cout<<"ERRRROOOR!"<<std::endl;
						std::cout<<coll3D[i].getParticle1()->getId()<<"<-->"<<coll3D[i].getParticle2()->getId()<<"        "<<coll3Dbrute[i].getParticle1()->getId()<<"<-->"<<coll3Dbrute[i].getParticle2()->getId()<<std::endl;
					}
				}
			}
		}
		std::cout<<"--"<<std::endl;


	}
	return 0;
}

output for FX8150 at 2.1GHz:

n=8000
add
compute
grid-static-object-clear: 6.4614e-05 s 
grid-static-object-add (8000 particles AABB): 0.00656692 s 
grid-static-object-compute: 0.00195991 s 16
grid-static-total: 0.00866529 s 16
grid-compute-dynamic (100k particles AABB): 0.0137853 s 3600
Brute-force (8000 particles AABB): 0.541069 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00143487 s 
grid-static-object-add (8000 particles AABB): 0.00523593 s 
grid-static-object-compute: 0.00182263 s 16
grid-static-total: 0.00853913 s 16
grid-compute-dynamic (100k particles AABB): 0.00835323 s 4300
Brute-force (8000 particles AABB): 0.541662 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00161331 s 
grid-static-object-add (8000 particles AABB): 0.00521717 s 
grid-static-object-compute: 0.00180544 s 16
grid-static-total: 0.0086723 s 16
grid-compute-dynamic (100k particles AABB): 0.00864333 s 3200
Brute-force (8000 particles AABB): 0.54128 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00151348 s 
grid-static-object-add (8000 particles AABB): 0.00524639 s 
grid-static-object-compute: 0.00180144 s 16
grid-static-total: 0.00860804 s 16
grid-compute-dynamic (100k particles AABB): 0.010513 s 3200
Brute-force (8000 particles AABB): 0.541246 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00160501 s 
grid-static-object-add (8000 particles AABB): 0.00521408 s 
grid-static-object-compute: 0.00176137 s 16
grid-static-total: 0.00862669 s 16
grid-compute-dynamic (100k particles AABB): 0.0120599 s 3000
Brute-force (8000 particles AABB): 0.54391 s 16
------------------------------------------------------------------------------------
grid-static-object-clear: 0.00174257 s 
grid-static-object-add (8000 particles AABB): 0.00548767 s 
grid-static-object-compute: 0.00182023 s 16
grid-static-total: 0.00909577 s 16
grid-compute-dynamic (100k particles AABB): 0.0083235 s 3500
Brute-force (8000 particles AABB): 0.54521 s 16
------------------------------------------------------------------------------------

This is ~8500x performance for checking a particle collision against a grid of static objects and ~50x performance for static-static collision check (or dynamic-dynamic).

More particles:

n=64000
add
compute
grid-static-object-clear: 1.3058e-05 s 
grid-static-object-add (64000 particles AABB): 0.0347275 s 
grid-static-object-compute: 0.0106954 s 114
grid-static-total: 0.0455527 s 114
grid-compute-dynamic (100k particles AABB): 0.0154652 s 3700
Brute-force (64000 particles AABB): 35.473 s 114

777x performance for static vs static collision checking, more than 2000x performance for static vs dynamic.

About

C++ adaptive grid for fast collision detection between AABB particles.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

Morty Proxy This is a proxified and sanitized view of the page, visit original site.