<Problem>
When two cameras look at different parts of a scene, their depth maps can partially overlap. If there is overlap, it should be possible to detect it and reverse-engineer the full 3D scene: match patches between the maps and, using their geometry and relative positions, merge everything into a single reference frame.
Pick a handful of distinctive points in each depth map, give every point a short “fingerprint” describing the shape of its immediate neighborhood, and line up points whose fingerprints look most alike. Many of those matches will be wrong, so keep only the pairs whose geometric relationships agree in both maps. If two pairs fit the same distance-and-angle pattern they likely belong to the same rigid object. The surviving few dozen matches are enough to solve for one rigid rotation-and-translation that best aligns the two clouds.
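For what it's worth, this pipeline maps almost one-to-one onto Open3D's feature-based RANSAC registration: FPFH descriptors play the role of the "fingerprints", the edge-length checker is the distance-consistency test between pairs, and point-to-point estimation solves for the final rigid transform. A minimal sketch under those assumptions (the voxel size, radii, and thresholds are placeholder values you'd tune to your depth maps' scale):

```python
import open3d as o3d

def align_depth_clouds(source, target, voxel=0.05):
    """Estimate the rigid transform mapping `source` onto `target`.

    Rough pipeline: downsample -> normals -> FPFH "fingerprints" ->
    RANSAC over feature matches with geometric-consistency checkers.
    `voxel` is a placeholder scale; tune it to your sensor.
    """
    def preprocess(pcd):
        down = pcd.voxel_down_sample(voxel)
        down.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
        fpfh = o3d.pipelines.registration.compute_fpfh_feature(
            down,
            o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
        return down, fpfh

    src_down, src_fpfh = preprocess(source)
    tgt_down, tgt_fpfh = preprocess(target)

    dist = voxel * 1.5  # max distance for a correspondence to count as an inlier
    return o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src_down, tgt_down, src_fpfh, tgt_fpfh,
        mutual_filter=True,
        max_correspondence_distance=dist,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
        ransac_n=3,
        checkers=[
            # The edge-length checker is the "same distance pattern in both maps" test.
            o3d.pipelines.registration.CorrespondenceCheckerBasedOnEdgeLength(0.9),
            o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(dist),
        ],
        criteria=o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
```

The returned result's `.transformation` is the 4x4 pose; Open3D's "Global registration" tutorial walks through essentially this recipe, so it's a useful baseline to compare a hand-rolled matcher against.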
Points that lie on a perfectly flat wall are useless for matching, because every neighborhood there looks almost identical; and when more cameras are present, this can produce matches between cameras looking at completely different scenes. I am now filtering out flat surfaces to make sure such matches don't happen (a sketch of one way to do this is below).
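One common way to implement that flat-surface filter is per-point "surface variation" from local PCA: take each point's k nearest neighbors, eigen-decompose their covariance, and drop points where the smallest eigenvalue is tiny relative to the sum, i.e. the neighborhood is near-planar. A numpy/scipy sketch; `k` and `variation_thresh` are assumptions to tune against your sensor noise:

```python
import numpy as np
from scipy.spatial import cKDTree

def drop_planar_points(points, k=30, variation_thresh=0.01):
    """Remove points whose local neighborhood is nearly flat.

    points: (N, 3) array. For each point, take its k nearest neighbors,
    compute the covariance eigenvalues l0 <= l1 <= l2, and score planarity
    as surface variation l0 / (l0 + l1 + l2): ~0 on a plane, larger on
    curved or cluttered geometry.
    """
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)          # (N, k) neighbor indices
    neighborhoods = points[idx]               # (N, k, 3)
    centered = neighborhoods - neighborhoods.mean(axis=1, keepdims=True)
    # Batched 3x3 covariance per point, then its eigenvalues (ascending).
    cov = np.einsum('nki,nkj->nij', centered, centered) / k
    eigvals = np.linalg.eigvalsh(cov)         # (N, 3), sorted ascending
    variation = eigvals[:, 0] / eigvals.sum(axis=1).clip(min=1e-12)
    return points[variation > variation_thresh]
```

The score is scale-invariant (it's a ratio of eigenvalues), so the same threshold behaves consistently across near and far parts of the depth map.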
</Problem>
What other approaches might be worth exploring? I've been deep in this for the past couple of days and would love some fresh perspective.