By Neko Team (Denisa Gal, Ioana Lazar)
Initially, the shape and appearance of the object are represented in a time-invariant
rest pose, which is then warped by time-dependent deformations to match each video frame.
To achieve this, each 3D point in space is associated with three properties: colour,
density, and a canonical embedding, an encoding that maps the point to a feature
descriptor stored by the network.
This embedding is what allows pixels to be matched across different viewpoints and
at different times.
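The canonical model described above can be pictured as a small network queried at a 3D point. The sketch below is only illustrative: the layer sizes, random weights, and 16-dimensional embedding are placeholder assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights for a toy canonical model (random, not trained).
W1 = rng.standard_normal((3, 64))
W2 = rng.standard_normal((64, 3 + 1 + 16))  # colour (3) + density (1) + embedding (16)

def canonical_model(point_xyz):
    """Query colour, density, and a 16-D canonical embedding at a 3D rest-pose point."""
    h = np.tanh(point_xyz @ W1)              # hidden features
    out = h @ W2
    colour = 1 / (1 + np.exp(-out[:3]))      # sigmoid keeps colour in [0, 1]
    density = np.log1p(np.exp(out[3]))       # softplus keeps density non-negative
    embedding = out[4:]
    embedding = embedding / np.linalg.norm(embedding)  # unit-norm descriptor for matching
    return colour, density, embedding

colour, density, emb = canonical_model(np.array([0.1, -0.2, 0.3]))
```

Because the embedding lives in the time-invariant rest pose, two pixels from different frames that map to the same canonical point receive the same descriptor, which is what makes cross-view and cross-time matching possible.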
The method is optimised by minimising three types of losses:
reconstruction losses (colour, silhouette, optical flow), feature-registration
losses (which enforce 3D point predictions via the canonical embeddings), and a
3D cycle-consistency regularisation loss.
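The three loss families above are typically combined into one training objective as a weighted sum. The sketch below assumes hypothetical weights chosen for illustration; the paper's actual balancing terms may differ.

```python
def total_loss(colour_l, silhouette_l, flow_l, feature_match_l, cycle_l,
               w_sil=0.1, w_flow=0.5, w_match=1.0, w_cycle=1.0):
    """Combine BANMo-style loss terms into a single scalar objective.

    The weights here are illustrative placeholders, not the paper's values.
    """
    # Reconstruction losses compare rendered outputs with the observed video.
    reconstruction = colour_l + w_sil * silhouette_l + w_flow * flow_l
    # Feature registration ties 2D pixels to 3D canonical points;
    # cycle consistency regularises the forward/backward deformations.
    return reconstruction + w_match * feature_match_l + w_cycle * cycle_l

loss = total_loss(1.0, 1.0, 1.0, 1.0, 1.0)
```

In practice each term would be averaged over sampled rays and frames before being combined; the scalar sum is what the optimiser actually minimises.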
BANMo manages to tackle two major challenges: the high volume of input data,
and the free movement of both the subject and the camera, which it handles
without restrictive assumptions. Moreover, its reconstructions improve as
more data is provided.
Given these strengths, it has proven itself better than previous methods, offering
both more detailed geometry (which ViSER lacks) and better reconstruction
of motion (which Nerfies fails to render).
Bibliography:
https://paperswithcode.com/paper/banmo-building-animatable-3d-neural-model
https://arxiv.org/pdf/2112.12761v2.pdf