
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion

Objective and subjective evaluations show that Phoneme Hallucinator outperforms existing VC methods in both intelligibility and speaker similarity.

Pricing Type

  • Pricing Type: Free

GitHub Link

The GitHub link is https://github.com/PhonemeHallucinator/Phoneme_Hallucinator

Introduction

The GitHub repository “Phoneme_Hallucinator” accompanies the research paper “Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion,” currently under double-blind review. The repository contains audio samples and resources related to the paper.

To use the voice conversion (VC) pipeline, download the “Phoneme Hallucinator DEMO.ipynb” notebook and run it on Google Colab. Training the model requires Python packages such as PyTorch and TensorFlow, among others. The training data is prepared by using WavLM to extract speech representations. Detailed instructions on dataset preparation and training are provided in the repository, and training progress is saved to the “./exp/speech_XXL_cond/” directory.
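Before starting, it can help to confirm that the training dependencies are importable. A minimal sketch, where the package list is an assumption based on the repository's mention of Torch, TensorFlow, and others; consult the repo's requirements for the authoritative list and versions:

```python
# Sketch: quick pre-training dependency check. The package names below are
# assumptions (the README mentions Torch, TensorFlow, "and others"); this
# helper is illustrative and not part of the Phoneme_Hallucinator repo.
import importlib.util

def check_packages(pkgs=("torch", "tensorflow")):
    """Return {package_name: True/False} for whether each package is importable."""
    return {p: importlib.util.find_spec(p) is not None for p in pkgs}

if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Any package reported missing should be installed before launching training.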


Content

This is the repository of the paper “Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion,” under double-blind review. Some audio samples are provided here.

Prepare environment: Python 3.6.3 and the packages listed in the repository are required.

To prepare the training set, WavLM is used to extract speech representations. Go to the kNN-VC repo and follow its instructions to extract speech representations. Namely, after placing the LibriSpeech dataset in the correct location, run the command:

python prematch_dataset.py --librispeech_path /path/to/librispeech/root --out_path /path/where/you/want/outputs/to/go --topk 4 --matching_layer 6 --synthesis_layer 6

Note that the “--prematch” option is not used, because we only need to extract representations, not extract and then perform kNN regression. After the above step, the --out_path folder contains three subfolders, train-clean-100, test-clean, and dev-clean, where each folder contains the speech representation files (“.pt”).

Next, go to ./dataset/speech.py in this repo and change the variables path_to_wavlm_feat and tfrecord_path accordingly: path_to_wavlm_feat should point to where the speech representations were stored in the previous step. If tfrecord_path doesn’t exist, the code will create tfrecords and save them to tfrecord_path before training starts.

Note that if you encounter numerical issues (“NaN, INF”) when training starts, just re-run the command a few times. Training logs will be saved to ./exp/speech_XXL_cond/.
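Before editing path_to_wavlm_feat in ./dataset/speech.py, it can help to verify that the extraction step produced the expected layout. A minimal sketch; the split names come from the instructions above, but the helper itself is not part of the repo:

```python
# Sketch: verify that the --out_path folder produced by kNN-VC's
# prematch_dataset.py contains the three expected LibriSpeech splits with
# ".pt" feature files. Illustrative only, not part of Phoneme_Hallucinator.
from pathlib import Path

EXPECTED_SPLITS = ("train-clean-100", "dev-clean", "test-clean")

def count_feature_files(feat_root):
    """Return {split: number of .pt files} for each expected split directory."""
    root = Path(feat_root)
    return {
        split: len(list((root / split).glob("*.pt"))) if (root / split).is_dir() else 0
        for split in EXPECTED_SPLITS
    }
```

If any split reports zero files, re-run the extraction command before letting the code build tfrecords.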

