
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion

Objective and subjective evaluations show that Phoneme Hallucinator outperforms existing VC methods in both intelligibility and speaker similarity.

Pricing Type

  • Pricing Type: Free

GitHub Link

The GitHub link is https://github.com/PhonemeHallucinator/Phoneme_Hallucinator

Introduction

The GitHub repository “Phoneme_Hallucinator” accompanies the research paper “Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion,” currently under double-blind review. The repository contains audio samples and resources related to the paper.

To use the voice conversion (VC) pipeline, download the “Phoneme Hallucinator DEMO.ipynb” notebook and run it on Google Colab. Training the model requires Python packages such as PyTorch and TensorFlow, among others. The training data is prepared by using WavLM to extract speech representations. Detailed instructions on dataset preparation and training are provided in the repository, and training progress is saved to the “./exp/speech_XXL_cond/” directory.
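Before starting, it can help to confirm that the training dependencies are importable. A minimal sketch, where the package list is an assumption based on the repository's mention of Torch, TensorFlow, and others; consult the repo's requirements for the authoritative list and versions:

```python
# Sketch: quick pre-training dependency check. The package names below are
# assumptions (the README mentions Torch, TensorFlow, "and others"); this
# helper is illustrative and not part of the Phoneme_Hallucinator repo.
import importlib.util

def check_packages(pkgs=("torch", "tensorflow")):
    """Return {package_name: True/False} for whether each package is importable."""
    return {p: importlib.util.find_spec(p) is not None for p in pkgs}

if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Any package reported missing should be installed before launching training.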


Content

This is the repository of the paper “Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion,” under double-blind review. Some audio samples are provided here.

Prepare environment: Python 3.6.3 and the packages listed in the repository are required.

To prepare the training set, WavLM is used to extract speech representations. Go to the kNN-VC repo and follow its instructions to extract speech representations. Namely, after placing the LibriSpeech dataset in the correct location, run the command:

python prematch_dataset.py --librispeech_path /path/to/librispeech/root --out_path /path/where/you/want/outputs/to/go --topk 4 --matching_layer 6 --synthesis_layer 6

Note that the “--prematch” option is not used, because we only need to extract representations, not extract and then perform kNN regression. After the above step, the --out_path folder contains three subfolders, train-clean-100, test-clean, and dev-clean, where each folder contains the speech representation files (“.pt”).

Next, go to ./dataset/speech.py in this repo and change the variables path_to_wavlm_feat and tfrecord_path accordingly: path_to_wavlm_feat should point to where the speech representations were stored in the previous step. If tfrecord_path doesn’t exist, the code will create tfrecords and save them to tfrecord_path before training starts.

Note that if you encounter numerical issues (“NaN, INF”) when training starts, just re-run the command a few times. Training logs will be saved to ./exp/speech_XXL_cond/.
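Before editing path_to_wavlm_feat in ./dataset/speech.py, it can help to verify that the extraction step produced the expected layout. A minimal sketch; the split names come from the instructions above, but the helper itself is not part of the repo:

```python
# Sketch: verify that the --out_path folder produced by kNN-VC's
# prematch_dataset.py contains the three expected LibriSpeech splits with
# ".pt" feature files. Illustrative only, not part of Phoneme_Hallucinator.
from pathlib import Path

EXPECTED_SPLITS = ("train-clean-100", "dev-clean", "test-clean")

def count_feature_files(feat_root):
    """Return {split: number of .pt files} for each expected split directory."""
    root = Path(feat_root)
    return {
        split: len(list((root / split).glob("*.pt"))) if (root / split).is_dir() else 0
        for split in EXPECTED_SPLITS
    }
```

If any split reports zero files, re-run the extraction command before letting the code build tfrecords.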

