I'm still thinking that a 3D video recording would be the best way to gather this information.
Don't worry about taking stills or optimal placement of the camera for stills, because all the information you need is captured in the process of taking a decent 3D video scan of each room. And each frame of that 3D video captures important information not only from the objects it's closest to, but also of the objects across the room, including which objects may partially or completely occlude other objects.
The geometry of still images is superior to that of video, assuming proper gear. Thus, optimal frames taken with a good camera are better than some video capture.
I wonder if techniques like this could work for Gaussian Splatting too?
Also, they mention AR headsets and drones in the conclusion, which would both be fantastic use cases. High-quality inspections could be done by letting the drone capture the absolute best angles just-in-time rather than following a predetermined path.
Most techniques for NeRFs also work for the newer variants like Gaussian splatting.
It's the same fundamental process - optimizing a 3D representation of a scene using gradient descent. Neural networks are a very general-purpose way to represent anything, but that comes at a cost. Using a more 3D-specific representation improves performance.
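As a concrete (and heavily simplified) sketch of that shared loop — render from the current parameters, compare to the photos, step the parameters downhill — here's a toy 1D version in plain NumPy. The Gaussian "scene" and all its numbers are made up for illustration; real NeRF/splatting pipelines do the same thing with 3D representations and a differentiable renderer:

```python
import numpy as np

# Toy version of the fit-a-scene-by-gradient-descent loop (not a real
# NeRF or splatting renderer): the "scene" is three fixed 1D Gaussians
# whose amplitudes we optimize to match a target "photo" by minimizing
# a photometric (MSE) loss.
xs = np.linspace(0.0, 1.0, 64)           # pixel coordinates
centers = np.array([0.2, 0.5, 0.8])      # fixed Gaussian centers
sigma = 0.1

# "Differentiable renderer": each pixel is a sum of Gaussian blobs,
# a stand-in for alpha-compositing / volume rendering.
basis = np.exp(-((xs[:, None] - centers[None, :]) ** 2) / (2 * sigma**2))

target = basis @ np.array([1.0, 0.5, 2.0])   # ground-truth "photo"

amps = np.zeros(3)                           # parameters to optimize
lr = 0.05
for _ in range(1000):
    residual = basis @ amps - target           # render, compare to photo
    grad = 2.0 * basis.T @ residual / len(xs)  # analytic MSE gradient
    amps -= lr * grad                          # gradient-descent step

print(np.round(amps, 2))                     # recovers roughly [1.0, 0.5, 2.0]
```

Swapping the Gaussian basis for an MLP gives you the NeRF flavor; keeping explicit primitives and optimizing their parameters directly is the splatting flavor — which is exactly why the representation-specific version can be so much faster.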
... that's exactly what I find so exciting about this field: a factor of 3 here and a factor of 5 there, and pretty soon I can go out with my lightfield camera and make a 3D model of your place of business to put into the "metaverse" — something you could never afford to do the way they make video games.
From a researcher's personal page[1], they're working on cashing in that GS stuff: estimating pose dynamics by regularizing Gaussians’ motion and rotation with local-rigidity constraints [2]. The videos are impressive[3], and seem to be bringing in the 4th dimension.
Yep, “neural radiance fields”: effectively, a neural network can be trained to model how light is emitted and moves through a volume of space so that, for instance, you can take a few pictures of a scene and then synthesize new views of it. If it can be practically sped up, it would be a way to make VR models of spaces without specialist talent in 3D modeling.
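To make that a bit more concrete: the core of NeRF's view synthesis is classic volume rendering — the network predicts a density and a color at sample points along each camera ray, and those samples get alpha-composited into one pixel. A minimal sketch, with made-up sample values standing in for the network's outputs:

```python
import numpy as np

def composite(densities, colors, deltas):
    """Alpha-composite samples along one ray into a pixel color."""
    # alpha_i = 1 - exp(-density_i * delta_i)
    alphas = 1.0 - np.exp(-densities * deltas)
    # transmittance T_i = product of (1 - alpha_j) for all j < i
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas          # per-sample contribution to the pixel
    return weights @ colors           # weighted sum of sample colors (RGB)

# Pretend these came from querying the trained network at 4 points
# along a ray, spaced 0.25 apart; the dense blue sample dominates.
densities = np.array([0.0, 0.5, 3.0, 0.1])
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
deltas = np.full(4, 0.25)

pixel = composite(densities, colors, deltas)
```

Training is then just making this compositing differentiable and pushing the rendered pixels toward the captured photos.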
Gaussian splatting is an interesting take on the technique that seems to speed up fitting and rendering, see this article from a couple days ago: https://news.ycombinator.com/item?id=37415478
My understanding is that NeRF¹ is just one (very interesting) approach to this. Another approach would be "Multi-View Stereo (MVS)", which you may be familiar with via software like COLMAP.