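VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

VoiceCraft is a technique for zero-shot speech editing and text-to-speech (TTS) that works on in-the-wild data such as audiobooks, internet videos, and podcasts. This opens up new options for speech-based interaction.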


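How to Run VoiceCraft

VoiceCraft can be run in several ways, so you can pick whichever suits your speech editing or TTS inference needs:

Run more flexible inference through the Gradio UI in Google Colab.
Run it with Docker.
Run Gradio locally, without Docker.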

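Setting Up the Development Environment

For model development with VoiceCraft, such as training and fine-tuning, follow the environment setup and training steps. This means preparing a suitable development environment: creating the Python environment and installing the required dependencies.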

conda create -n voicecraft python=3.9.16
conda activate voicecraft

pip install -e git+https://github.com/facebookresearch/audiocraft.git@c5157b5bf14bf83449c17ea1eeb66c19fb4bc7f0#egg=audiocraft
pip install xformers==0.0.22
pip install torchaudio==2.0.2 torch==2.0.1 # this assumes your system is compatible with CUDA 11.7, otherwise checkout https://pytorch.org/get-started/previous-versions/#v201
apt-get install ffmpeg # if you don't already have ffmpeg installed
apt-get install espeak-ng # backend for the phonemizer installed below
pip install tensorboard==2.16.2
pip install phonemizer==3.2.1
pip install datasets==2.16.0
pip install torchmetrics==0.11.1
pip install huggingface_hub==0.22.2
# install MFA for getting forced-alignment, this could take a few minutes
conda install -c conda-forge montreal-forced-aligner=2.2.17 openfst=1.8.2 kaldi=5.5.1068
# install MFA english dictionary and model
mfa model download dictionary english_us_arpa
mfa model download acoustic english_us_arpa
# pip install huggingface_hub
# conda install pocl # above gives a warning for installing pocl, not sure if really need this
# to run ipynb
conda install -n voicecraft ipykernel --no-deps --force-reinstall
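
After running the installation commands above, it can help to confirm that the environment actually holds the pinned versions. The following is a minimal sketch, not part of VoiceCraft itself: the names `PINNED`, `installed_version`, and `find_mismatches` are hypothetical helpers introduced here, and the version numbers are copied from the pip commands above. It only uses the standard-library `importlib.metadata` module, and it assumes that a local build tag such as `+cu117` on an installed version should be ignored when comparing.

```python
from importlib import metadata

# Version pins copied from the pip commands above.
PINNED = {
    "xformers": "0.0.22",
    "torch": "2.0.1",
    "torchaudio": "2.0.2",
    "tensorboard": "2.16.2",
    "phonemizer": "3.2.1",
    "datasets": "2.16.0",
    "torchmetrics": "0.11.1",
    "huggingface_hub": "0.22.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package that differs from its pin to (expected, found)."""
    problems = {}
    for package, expected in pins.items():
        found = lookup(package)
        # Assumption: drop a local build tag such as "+cu117" before comparing.
        if found is None or found.split("+")[0] != expected:
            problems[package] = (expected, found)
    return problems
```

Calling `find_mismatches(PINNED)` inside the activated `voicecraft` environment returns an empty dictionary when every pin is satisfied. The `lookup` parameter exists only so the check can be exercised without the packages installed.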


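How to Use VoiceCraft

When running VoiceCraft, you choose the model you need, load it, and then perform speech transcription. You can also adjust a number of parameters to improve the output speech. The example below runs the phonemize_encodec_encode_hf.py script from the data directory with a sample set of arguments.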

conda activate voicecraft
export CUDA_VISIBLE_DEVICES=0
cd ./data
python phonemize_encodec_encode_hf.py \
--dataset_size xs \
--download_to path/to/store_huggingface_downloads \
--save_dir path/to/store_extracted_codes_and_phonemes \
--encodec_model_path path/to/encodec_model \
--mega_batch_size 120 \
--batch_size 32 \
--max_len 30000
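
If the command above is launched from a Python script rather than typed by hand, the argument list can be assembled programmatically. The sketch below is an illustration under stated assumptions: `build_encode_command` is a hypothetical helper, not part of the VoiceCraft repository. The flag names and default values are taken from the command above, and the paths remain placeholders that you must replace.

```python
import shlex


def build_encode_command(download_to, save_dir, encodec_model_path,
                         dataset_size="xs", mega_batch_size=120,
                         batch_size=32, max_len=30000):
    """Return the argv list for phonemize_encodec_encode_hf.py."""
    return [
        "python", "phonemize_encodec_encode_hf.py",
        "--dataset_size", str(dataset_size),
        "--download_to", str(download_to),
        "--save_dir", str(save_dir),
        "--encodec_model_path", str(encodec_model_path),
        "--mega_batch_size", str(mega_batch_size),
        "--batch_size", str(batch_size),
        "--max_len", str(max_len),
    ]


# Placeholder paths, as in the command above; replace them before running.
command = build_encode_command(
    download_to="path/to/store_huggingface_downloads",
    save_dir="path/to/store_extracted_codes_and_phonemes",
    encodec_model_path="path/to/encodec_model",
)

# Print a shell-quoted form of the command for inspection.
print(shlex.join(command))
```

To execute it, the list could be passed to `subprocess.run(command, cwd="data", check=True)` from a process started inside the activated `voicecraft` environment with `CUDA_VISIBLE_DEVICES` set. Building a list instead of a single string avoids shell quoting problems when paths contain spaces.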


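In my own experience, what impressed me most about VoiceCraft is its operational flexibility. With a simple setup I could perform accurate speech editing and TTS on different kinds of data, which was hard to imagine with earlier techniques. Its zero-shot speech editing in particular made handling varied speech data much easier.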

Note:

Article title: VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Article link: https://yuntunft.cn/38406.html

Last updated: July 5, 2024


