By Team Blue Bird (Bașnic Ioan, Belea Sebastian, Horvath Petra)
INGP (Instant Neural Graphics Primitives) is AI-based software that demonstrates 4 neural graphics applications and trains them at lightning speed. Compared to its fairly recent predecessors, it cuts training time dramatically. We will focus on 2 of them: NeRF and gigapixel images.
Architecture
Graphics primitives are 'represented by mathematical functions that parameterize appearance.' The goal is to have high-quality, detailed graphics that are also fast and compact. The finer the grid of data, the more detailed the resulting graphics, but also the higher the cost in memory and computation. This trade-off is where the paper improves most on its predecessors. Previous work represented the data with dense grids, multi-resolution grids or octrees, but most of these were either GPU-unfriendly or wasted storage on empty space. The paper's solution is to hash the grid coordinates and use the result directly as an index into a data array, leaving the network to sort out the hash collisions.
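To make the hashing idea concrete, here is a minimal sketch of the spatial hash described in the paper: each integer grid coordinate is multiplied by a fixed constant, the products are XOR-ed together, and the result is taken modulo the table size. The multipliers are the ones listed in the paper; the table size, the feature width and the sample coordinates are assumptions chosen for illustration, and coordinates are assumed to be non-negative.

```python
# Per-dimension multipliers from the Instant-NGP paper (the first is 1, the others are large primes).
PRIMES = (1, 2654435761, 805459861)

def spatial_hash(coords, table_size):
    """Map non-negative integer grid coordinates (up to 3D) to a slot in [0, table_size)."""
    h = 0
    for c, p in zip(coords, PRIMES):
        h ^= c * p  # XOR the per-dimension products together
    return h % table_size

T = 2 ** 14  # hypothetical table size, chosen for illustration
table = [[0.0, 0.0] for _ in range(T)]  # T feature vectors with F = 2 entries each

slot = spatial_hash((12, 7, 3), T)
feature = table[slot]  # direct array lookup, no tree traversal
```

Collisions are not handled at all here. With a tiny table of 16 slots, for example, the grid points (0, 0, 0) and (1, 1, 0) both land in slot 0 and therefore share one feature vector; it is up to the training process to make that shared entry useful.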
In a nutshell, for a fully connected neural network m(y; Φ), what matters most is an efficient storage and encoding of its inputs, y = enc(x; θ). Their neural network has not only trainable weight parameters Φ, but also trainable encoding parameters θ. These are arranged into L levels, each containing up to T feature vectors of dimensionality F. In this way, the approximation quality and training speed of several applications can be improved without a notable performance overhead.
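The encoding y = enc(x; θ) can be sketched roughly as follows for a 2D input. Each of the L levels has its own table of T feature vectors and its own grid resolution (growing geometrically between a coarsest and a finest value, as in the paper); the point is located in a grid cell, the four corner features are looked up and bilinearly interpolated, and the per-level results are concatenated. This is a simplified sketch, not the authors' implementation: every level is hashed (the paper indexes coarse levels directly when they fit in the table), and all sizes below are hypothetical.

```python
import math
import random

PRIMES = (1, 2654435761)  # per-dimension hash multipliers from the paper (2D case)

def level_resolutions(n_min, n_max, num_levels):
    """Grid resolution per level, growing geometrically from n_min towards n_max."""
    b = math.exp((math.log(n_max) - math.log(n_min)) / (num_levels - 1))
    return [math.floor(n_min * b ** level) for level in range(num_levels)]

def encode(x, tables, resolutions):
    """Encode a 2D point x in [0, 1]^2 as L concatenated, interpolated feature vectors."""
    y = []
    for table, n in zip(tables, resolutions):
        sx, sy = x[0] * n, x[1] * n
        x0, y0 = math.floor(sx), math.floor(sy)  # lower corner of the enclosing grid cell
        wx, wy = sx - x0, sy - y0                # fractional position inside the cell
        feature = [0.0] * len(table[0])
        for dx, dy in ((0, 0), (1, 0), (0, 1), (1, 1)):
            slot = (((x0 + dx) * PRIMES[0]) ^ ((y0 + dy) * PRIMES[1])) % len(table)
            weight = (wx if dx else 1.0 - wx) * (wy if dy else 1.0 - wy)
            for k, value in enumerate(table[slot]):
                feature[k] += weight * value
        y.extend(feature)
    return y

L, T, F = 4, 2 ** 10, 2  # hypothetical sizes, chosen for illustration
rng = random.Random(0)
# The trainable encoding parameters θ: L tables of T feature vectors, small random init.
tables = [[[rng.uniform(-1e-4, 1e-4) for _ in range(F)] for _ in range(T)] for _ in range(L)]
resolutions = level_resolutions(n_min=16, n_max=128, num_levels=L)

y = encode((0.3, 0.7), tables, resolutions)  # L * F numbers, the input to the MLP m(y; Φ)
```

The vector y has L * F entries regardless of how fine the grids are, so the network that follows can stay small; the detail lives in the tables.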
The figure below illustrates the steps performed in the multiresolution hash encoding method they propose:
NeRF
Given a set of pictures that show the same object from different angles, the neural network can reconstruct the views in between, creating an explorable 3D representation of the subject. This is not entirely new, since it was already put into practice a few years ago. The astounding improvement, however, is the speed with which the model gets created.
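For readers unfamiliar with how a NeRF turns its network into pictures: in the original NeRF paper, each pixel is rendered by sampling points along the camera ray, asking the network for a density and a color at each point, and blending the samples front to back. Below is a minimal sketch of that blending step only. The densities, colors and step sizes are hand-picked stand-ins for network outputs, and the function name is made up for this example.

```python
import math

def composite_ray(densities, colors, deltas):
    """Blend samples along one camera ray into a single RGB pixel (front to back)."""
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed on the way to the camera
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        for k in range(3):
            pixel[k] += transmittance * alpha * color[k]
        transmittance *= 1.0 - alpha
    return pixel

# Hypothetical samples: empty space first, then a dense green surface hiding a blue one.
densities = [0.0, 50.0, 50.0]
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
deltas = [0.1, 0.1, 0.1]

pixel = composite_ray(densities, colors, deltas)
```

The empty first sample contributes nothing, the green surface dominates, and the blue sample behind it is almost fully occluded. Because the network is queried many times per pixel, a faster input encoding speeds up both training and rendering.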
Only 2 years ago, the model from the original NeRF paper (https://arxiv.org/pdf/2003.08934.pdf) took at least 12 hours to train on the scene with the bulldozer (can be seen in the first image in the paper). This time has now been reduced to about 5 seconds, with real-time rendering of the result. By these figures, that is a speedup of several thousand times.
Gigapixel image
The gigapixel image application takes any 2D RGB image as input and trains the neural network on it. Using the graphics primitives and the hash encoding, the network learns a sparse representation of the image that maps pixel coordinates to colors. The reconstruction is blurry at first and becomes clearer and clearer, with more details being added as training goes on. At the end, one can still zoom in on the resulting image and analyze the generated details. Finally, astoundingly, the image also ends up compressed, since the network's parameters take up far less space than the raw pixels.
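The "blurry first, sharper later" behaviour can be reproduced on a toy scale. The sketch below fits a single hashed grid to a small synthetic RGB pattern with plain gradient descent and records the error after every step. It is heavily simplified and is not the authors' implementation: there is one level instead of many, no neural network (each table entry stores a color directly), and the image, grid size, table size and learning rate are all made up for illustration.

```python
import math

PRIMES = (1, 2654435761)  # per-dimension hash multipliers from the paper (2D case)
SIZE, GRID, T = 16, 8, 64  # hypothetical image size, grid resolution and table size

def target(u, v):
    """Synthetic stand-in for the input photo: a smooth RGB pattern with values in [0, 1]."""
    return (0.5 + 0.5 * math.sin(6 * u), 0.5 + 0.5 * math.cos(6 * v), u * v)

def corners(u, v):
    """Table slots and bilinear weights of the four grid vertices around (u, v)."""
    sx, sy = u * GRID, v * GRID
    x0, y0 = math.floor(sx), math.floor(sy)
    wx, wy = sx - x0, sy - y0
    result = []
    for dx, dy in ((0, 0), (1, 0), (0, 1), (1, 1)):
        slot = (((x0 + dx) * PRIMES[0]) ^ ((y0 + dy) * PRIMES[1])) % T
        result.append((slot, (wx if dx else 1.0 - wx) * (wy if dy else 1.0 - wy)))
    return result

# One training sample per pixel center, with precomputed lookups and target colors.
pixels = [((i + 0.5) / SIZE, (j + 0.5) / SIZE) for i in range(SIZE) for j in range(SIZE)]
samples = [(corners(u, v), target(u, v)) for u, v in pixels]

table = [[0.0, 0.0, 0.0] for _ in range(T)]  # F = 3: each entry stores an RGB color
losses = []
for step in range(200):
    grad = [[0.0, 0.0, 0.0] for _ in range(T)]
    loss = 0.0
    for lookup, rgb in samples:
        for k in range(3):
            pred = sum(w * table[slot][k] for slot, w in lookup)
            err = pred - rgb[k]
            loss += err * err / len(samples)
            for slot, w in lookup:
                grad[slot][k] += 2.0 * err * w / len(samples)
    losses.append(loss)  # error of the reconstruction before this step's update
    for slot in range(T):
        for k in range(3):
            table[slot][k] -= 0.9 * grad[slot][k]  # plain gradient descent step
```

The entries of losses shrink from step to step, which is the numerical counterpart of the image sharpening on screen. The table here has 64 entries for 256 pixels, so several pixels and grid vertices share each entry, a small-scale version of the compression described above.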
Conclusions
The research group demonstrates “Instant Neural Graphics Primitives” (Instant-NGP), a framework that allows a neural network to learn representations of gigapixel images, 3D objects, and NeRFs in seconds.
The consequences of the new research may be huge. In technology, advances often focus on either speed or quality, but rarely are both achieved at once in a way that also reduces the required computational overhead. Not only is the iterative, adaptive encoding method significantly faster, but it also runs on a single high-end consumer GPU (NVIDIA RTX 3090) that anyone can purchase, rather than on an expensive network of super-powerful computers.
Bibliography:
https://arxiv.org/pdf/2201.05989v1.pdf
https://github.com/NVlabs/instant-ngp
https://www.casualganpapers.com/fastest_nerf_3d_neural_rendering/Instant-Neural-Graphics-Primitives-explained.html