The world of open source AI projects can be dizzying, with countless options vying for your attention. As developers ourselves, helping clients build AI software, we understand the challenge of sifting through myriad projects to find the gems – those with real-world applicability and commercial value.

Many of you have shared your struggles in navigating this landscape, unsure which projects are usable or how to choose between similar ones. That’s why we’ve committed to curating lists of the best open source AI projects, so you can make informed decisions quickly.

Text to Speech

We welcome your input! Please share any additions or corrections in the comments. And if you’d like to dive deeper, scan the QR code to join our AI discussion group (be sure to include your occupation).

Note: The order here does not imply ranking. Choose what suits your needs best.

1. Fish Speech

  • GitHub Stars: 5.1k
  • Last Updated: July 11
  • Function: Voice cloning
  • Features: Specially trained on Chinese, with good results. Struggles with long audio and has weak English support.
  • Link: https://github.com/fishaudio/fish-speech

2. ChatTTS

  • GitHub Stars: 27.5k
  • Last Updated: July 8
  • Function: Multi-speaker text-to-speech with fine-grained prosody control (laughter, pauses, intonation).
  • Features: Control of laughter, pauses, and interjections is not always accurate. Chinese & English only. Parameters are adjustable, but the same settings may yield different output from run to run.
  • Link: https://github.com/2noise/ChatTTS

3. MARS5-TTS

  • GitHub Stars: 2.2k
  • Last Updated: July 5
  • Function: Voice cloning
  • Features: Needs only 2-12 seconds of sample audio. Deep & shallow cloning options. More realistic emotions. Clones a single speaker only; no multi-speaker dialogue.
  • Link: https://github.com/camb-ai/mars5-tts

4. GPT-SoVITS

  • GitHub Stars: 29k
  • Last Updated: July 11
  • Function: Voice cloning
  • Features: Fine-tunes a model on just 1 min of training data for better similarity & realism. A 5-sec sample is enough for instant text-to-speech without fine-tuning. Adapts to various languages & voice needs. Supports Chinese, English, Japanese. Models trained on a Mac GPU are of significantly lower quality than those trained on other devices. Runs locally, no internet needed.
  • Link: https://github.com/RVC-Boss/GPT-SoVITS

5. IMS-Toucan

  • GitHub Stars: 1.3k
  • Last Updated: July 9
  • Function: Text-to-speech
  • Features: Supports more than 7,000 languages, including dialects. Pure Python & PyTorch. Human-in-the-loop editing to fine-tune speech to taste. Setup can be complex, especially on non-Linux systems.
  • Link: https://github.com/DigitalPhonetics/IMS-Toucan

6. OpenVoice

  • GitHub Stars: 27.2k
  • Last Updated: July 6
  • Function: Voice cloning
  • Features: Adjusts emotion, speaking style, pauses, etc. Cross-lingual voice cloning. Weak Chinese support. Allows commercial use.
  • Link: https://github.com/myshell-ai/OpenVoice
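If you would rather narrow the list programmatically, here is a minimal sketch that encodes the six projects above as plain records and filters them by need. The `Project` class, the `shortlist` helper, and the two condensed function labels ("Voice cloning" and "Text-to-speech") are our own illustrative assumptions, not part of any of these projects; the star counts and notes are copied from the entries above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Project:
    """One entry from the list above (hypothetical helper type)."""
    name: str
    function: str   # condensed label: "Voice cloning" or "Text-to-speech"
    stars_k: float  # GitHub stars in thousands, as listed in this post
    notes: str      # condensed from the Features line


PROJECTS = [
    Project("Fish Speech", "Voice cloning", 5.1,
            "Trained on Chinese; weak on long audio and English"),
    Project("ChatTTS", "Text-to-speech", 27.5,
            "Prosody control; Chinese and English only"),
    Project("MARS5-TTS", "Voice cloning", 2.2,
            "2-12 second sample; single speaker only"),
    Project("GPT-SoVITS", "Voice cloning", 29.0,
            "Chinese, English, Japanese; runs locally"),
    Project("IMS-Toucan", "Text-to-speech", 1.3,
            "7000 languages; complex setup on non-Linux"),
    Project("OpenVoice", "Voice cloning", 27.2,
            "Cross-lingual; weak Chinese; allows commercial use"),
]


def shortlist(projects: List[Project],
              function: Optional[str] = None,
              keyword: Optional[str] = None) -> List[Project]:
    """Return projects matching the function label and/or a notes keyword.

    Both checks are case-insensitive; a filter left as None is skipped.
    The original list order is preserved (it does not imply ranking).
    """
    result = []
    for p in projects:
        if function is not None and p.function.lower() != function.lower():
            continue
        if keyword is not None and keyword.lower() not in p.notes.lower():
            continue
        result.append(p)
    return result


if __name__ == "__main__":
    for p in shortlist(PROJECTS, function="Voice cloning", keyword="English"):
        print(f"{p.name} ({p.stars_k}k stars): {p.notes}")
```

Keyword matching on free-text notes is deliberately crude. It is enough for a first pass, but check each project's README before committing to one.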

This is just a sampling of the many open source TTS projects out there. Please share your own finds in the comments or join our group chat. If you prioritize quality and stability over local deployment, there are also excellent closed-source options such as ElevenLabs and HeyGen's voice cloning service (which uses ElevenLabs).

We’ll continue curating project lists for various AI functions. We’re also considering roundups of deployable projects or teardowns of successful ones. Let us know if you have suggestions!
