1. Introduction
The generation of realistic and functionally articulated 3D objects remains a significant challenge in computer graphics and robotics, often limited by the complexity of representing diverse articulation structures. Traditional methods struggle with open-set articulation, where the range and type of joints can vary widely across objects. This work proposes UniArt to overcome these limitations by providing a unified representation that simultaneously captures geometry and arbitrary articulation. UniArt comprises three components: a UniArt Transformer backbone, an Articulation Encoding Module, and a Geometry Decoding Network.
2. Related Work
Previous research in 3D articulated object generation often relies on explicit kinematic trees or part-based decomposition, which can be rigid and struggle with novel articulation configurations. Recent advancements in implicit neural representations have shown promise for static 3D objects, but extending them to dynamic, articulated structures is non-trivial. Methods focusing on specific object categories, such as humanoids, offer high fidelity but lack generalization to open-set objects. UniArt differentiates itself by offering a unified approach that handles diverse articulations without category-specific prior knowledge.
3. Methodology
UniArt employs a transformer-based architecture that takes as input a latent code representing the articulated object. This latent code is processed by an Articulation Encoding Module, which disentangles geometric and articulation features. A subsequent Geometry Decoding Network reconstructs the 3D mesh while simultaneously inferring articulation parameters such as joint types and limits. Training uses a novel self-supervised objective that combines a geometric reconstruction loss with an articulation consistency loss, enabling robust learning from diverse datasets. The framework supports generating objects from various modalities, including text prompts and sparse point clouds.
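The combined training objective described above can be sketched in a few lines. This is an illustrative NumPy toy, not the paper's implementation: the Chamfer-style reconstruction term, the consistency term over two augmented views, and the weight `lambda_artic` are all assumptions made for exposition.

```python
import numpy as np

def reconstruction_loss(pred_pts: np.ndarray, gt_pts: np.ndarray) -> float:
    """Symmetric Chamfer-style distance between two (N, 3) point sets.

    Assumed stand-in for the paper's geometric reconstruction loss.
    """
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

def articulation_consistency_loss(params_a: np.ndarray, params_b: np.ndarray) -> float:
    """Penalize disagreement between articulation parameters predicted
    from two augmented views of the same object (an assumed formulation)."""
    return float(np.mean((params_a - params_b) ** 2))

def total_loss(pred_pts, gt_pts, params_a, params_b,
               lambda_artic: float = 0.1) -> float:
    # Weighted sum of the two terms; lambda_artic is a hypothetical weight.
    return (reconstruction_loss(pred_pts, gt_pts)
            + lambda_artic * articulation_consistency_loss(params_a, params_b))
```

In a real system both terms would be differentiable tensor ops inside the training loop; the point here is only the shape of the objective: geometry and articulation are supervised jointly rather than by separate pipelines.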
4. Experimental Results
Our experiments evaluate UniArt on a diverse dataset of articulated 3D objects, demonstrating its capability to generate high-fidelity geometry and accurate articulation. Quantitative metrics, including FID for geometric quality and articulation error (AE) for joint accuracy, consistently show UniArt outperforming baseline methods, and the generated objects exhibit smooth articulation and realistic movement, validating the effectiveness of the unified representation. As summarized in the table below, UniArt achieves an FID of 12.3 and an AE of 0.02, significantly better than Baseline A (FID 18.5, AE 0.05) and Baseline B (FID 15.1, AE 0.04).
| Method | FID (↓) | Articulation Error (↓) |
|---|---|---|
| Baseline A | 18.5 | 0.05 |
| Baseline B | 15.1 | 0.04 |
| UniArt (Ours) | 12.3 | 0.02 |
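For concreteness, one plausible way to compute an articulation error like the AE column above is a mean absolute error over predicted versus ground-truth joint parameters (e.g. axis directions and limits). The paper's exact AE definition is not reproduced here; this sketch only illustrates the kind of quantity being reported.

```python
import numpy as np

def articulation_error(pred_joints: np.ndarray, gt_joints: np.ndarray) -> float:
    """Mean absolute error across all joints and parameters.

    pred_joints, gt_joints: (num_joints, num_params) arrays, where each
    row holds one joint's parameters (a hypothetical layout).
    """
    return float(np.abs(pred_joints - gt_joints).mean())
```

Under this assumed definition, a uniform offset of 0.02 in every predicted joint parameter would yield AE = 0.02, matching the scale of the numbers in the table.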
5. Discussion
The superior performance of UniArt highlights the advantages of a unified representation for complex 3D articulated object generation, reducing the need for separate geometric and kinematic modeling pipelines. The ability to handle open-set articulation significantly expands the applicability of 3D generation systems to broader domains, such as virtual reality content creation and robotic simulation. Future work will explore extending UniArt to incorporate more complex material properties and dynamic interactions, further enhancing the realism and utility of generated models. The framework also opens avenues for inverse articulation problems, where the goal is to infer articulation from static 3D models.