
Zero123++: From Single Image to Multi-view with One Click

From a single image, generate multiple images of the object or scene from different angles.

Pricing Type

  • Pricing Type: Open Source

Introduction to Zero123++

Zero123++ takes a single image of an object or scene and generates a set of images of it from different angles.

For example, given a picture of an apple seen from the front, it can generate pictures of the same apple seen from the side, top, and bottom.

The generated views look realistic and stay consistent with one another across angles.

Users can also control the generated images in finer detail, for example their shape and size, through the depth ControlNet described under the working principle.

Working principle:

1. Conditions and training schemes: Zero123++ builds on a pre-trained 2D generative model (Stable Diffusion) and fine-tunes it with a set of conditioning and training schemes so that it generates multi-view images.

2. Attention mechanism: It adds an extra conditioning branch and modifies the key (K) and value (V) matrices of the self-attention layers so that they also accept a conditioning image. This lets the model attend to the relevant parts of the input image, so that the generated multi-angle images are more accurate.

3. Global conditions: In the original Stable Diffusion model, global conditioning comes mainly from text embeddings. Zero123++ introduces a trainable variant of the linear guidance mechanism from FlexDiffuse to incorporate global image conditioning into the model while minimizing fine-tuning. The global image information is combined with the existing text embeddings rather than replacing them, so the generated images better match the requirements.

4. Depth ControlNet: A depth-conditioned ControlNet is used to control geometric structure during generation. It is trained on normalized linear depth images rendered to correspond to the target RGB images, and it gives finer control over the shape and structure of the generated images.
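The attention change in item 2 can be illustrated with a small NumPy sketch. It is not the project's code: the function name reference_attention, the single-head layout, and the random toy weights are all assumptions made for illustration. It only shows the general idea of letting the queries of the image being generated attend to keys and values computed from a conditioning image as well as to their own.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def reference_attention(x, ref, w_q, w_k, w_v):
    """Single-head self-attention whose K and V also include the reference image.

    x:   (n, d) tokens of the image being generated
    ref: (m, d) tokens of the conditioning image at the same layer
    (hypothetical helper, not part of the Zero123++ codebase)
    """
    q = x @ w_q
    # Append the reference keys/values to the layer's own keys/values.
    k = np.concatenate([x @ w_k, ref @ w_k], axis=0)  # (n + m, d)
    v = np.concatenate([x @ w_v, ref @ w_v], axis=0)  # (n + m, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (n, n + m)
    return softmax(scores) @ v                        # (n, d)

# Toy data: 4 target tokens, 6 reference tokens, width 8.
rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))
ref = rng.normal(size=(6, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = reference_attention(x, ref, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

The output keeps the shape of the target tokens, so a layer modified this way can replace a plain self-attention layer without changing the rest of the network.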
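The global conditioning in item 3 can be sketched in a few lines. The linked paper describes adding a global image embedding I to each prompt embedding T_i, scaled by a trainable weight w_i that starts as a linear ramp, roughly T'_i = T_i + w_i * I. The function name, the zero-based ramp indexing, and the toy shapes below are assumptions, and the weights are plain arrays here rather than trained parameters.

```python
import numpy as np

def add_global_image_condition(prompt_emb, image_emb, weights=None):
    """Add a global image embedding to every prompt token embedding.

    prompt_emb: (L, D) prompt (text) embeddings T
    image_emb:  (D,)   global image embedding I
    weights:    (L,)   per-token weights w; defaults to a linear ramp
    (hypothetical helper; in the real model the weights are trainable)
    """
    length = prompt_emb.shape[0]
    if weights is None:
        # Linear ramp used as the starting point; 0-based indexing is an assumption.
        weights = np.arange(length) / length
    return prompt_emb + weights[:, None] * image_emb[None, :]

# Toy stand-ins: 4 prompt tokens of width 3, all zeros, and an all-ones image embedding.
prompt = np.zeros((4, 3))
image = np.ones(3)
conditioned = add_global_image_condition(prompt, image)
print(conditioned)
```

Because the image information is added to the existing embeddings, the rest of the cross-attention path stays as it was, which is consistent with the goal of minimizing fine-tuning.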
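For item 4, the following sketch shows one plausible way to turn a linear depth map into a normalized depth image of the kind a depth ControlNet takes as its conditioning input. The exact normalization used by Zero123++ is not stated here, so the min-max scheme, the convention that the nearest surface maps to 0, and the function name are assumptions.

```python
import numpy as np

def normalize_linear_depth(depth, mask=None, eps=1e-8):
    """Scale linear depth to [0, 1] over valid pixels; other pixels become 0.

    depth: (H, W) linear depth values, with non-finite values for background
    mask:  (H, W) boolean array of valid pixels; defaults to finite depths
    (hypothetical helper; the near = 0, far = 1 convention is an assumption)
    """
    depth = np.asarray(depth, dtype=np.float64)
    if mask is None:
        mask = np.isfinite(depth)
    out = np.zeros_like(depth)
    if not mask.any():
        return out  # nothing to normalize
    valid = depth[mask]
    near, far = valid.min(), valid.max()
    out[mask] = (valid - near) / max(far - near, eps)
    return out

# Toy 2x2 depth map with one background pixel (infinite depth).
toy_depth = np.array([[1.0, 2.0], [3.0, np.inf]])
normalized = normalize_linear_depth(toy_depth)
print(normalized)
```

An image produced this way would be passed to the ControlNet alongside the input picture, so that the generated views follow the given geometry.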

Reference

GitHub: https://github.com/SUDO-AI-3D/zero123plus
Paper: https://arxiv.org/abs/2310.15110
Demo: https://huggingface.co/spaces/sudo-ai/zero123plus-demo-space
