Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation
AI Tool Profile


In the second stage, an audio-driven talking-head generation method produces a compelling video from the audio generated in the first stage.

Website: github.com
Pricing model: Free
Price start: Contact for pricing

Description of Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation

GitHub Link

The GitHub link is https://github.com/zhichaowang970201/text-to-video

Introduction

This GitHub repository, titled "Text-to-Video," presents a two-stage framework for generating talking-head videos without requiring a specific speaker identity. It includes text-to-speech models (Tacotron, VITS, YourTTS, Tortoise), audio-driven talking-head generation methods (Audio2Head, StyleHEAT, SadTalker), and VideoRetalking. In the first stage, a text-to-speech model converts the input text to audio; in the second stage, an audio-driven talking-head generation method produces a compelling video from that audio. The repository provides links to the code and assets for these models, facilitating research and development in zero-shot identity-agnostic talking-head generation.
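The two-stage decomposition can be sketched as a simple pipeline: text goes into a TTS model, and the resulting audio drives a talking-head generator. The function names and signatures below are illustrative assumptions, not the repository's actual API; the stage bodies are stand-ins for the listed models.

```python
# A minimal sketch of the two-stage text-to-video pipeline, assuming
# hypothetical interfaces for each stage. In a real setup, stage 1 would
# call Tacotron/VITS/YourTTS/Tortoise, and stage 2 would call
# Audio2Head/StyleHEAT/SadTalker.

def synthesize_speech(text: str, tts_model: str = "VITS") -> bytes:
    """Stage 1: text-to-speech. Returns a placeholder instead of a waveform."""
    return f"[{tts_model} audio for: {text}]".encode()

def generate_talking_head(audio: bytes, face_image: str,
                          driver: str = "SadTalker") -> str:
    """Stage 2: audio-driven talking-head generation (placeholder output)."""
    return f"[{driver} video of {face_image} driven by {len(audio)}-byte audio]"

def text_to_video(text: str, face_image: str) -> str:
    """Compose the two stages: text -> audio -> talking-head video."""
    audio = synthesize_speech(text)
    return generate_talking_head(audio, face_image)

print(text_to_video("Hello, world!", "speaker.png"))
```

Because the stages communicate only through an audio clip and a face image, either component can be swapped independently, which is what makes the framework identity-agnostic.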

Content

KDD workshop: Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation

Alternatives & Similar Tools

LongLLaMA: handles very long text contexts, up to 256,000 tokens

LongLLaMA is a large language model designed to handle very long text contexts, up to 256,000 tokens. It's based on OpenLLaMA and uses a technique called Focused Transformer (FoT) for training. The repository provides a smaller 3B version of LongLLaMA for free use. It can also be used as a replacement for LLaMA models with shorter contexts.
