@codecraftersllc/local-voice-mcp v0.1.4
Local Voice MCP
Give your MCP clients the ability to speak by running local voice models using Chatterbox TTS.
Quickstart
The package includes a high-quality female reference voice that's used by default. All environment variables are optional.
{
"mcpServers": {
"local-voice-mcp": {
"command": "npx",
"args": ["-y", "@codecraftersllc/local-voice-mcp"],
"env": {
"USE_MALE_VOICE": "false",
"CHATTERBOX_EXAGGERATION": "0.5",
"CHATTERBOX_CFG_WEIGHT": "1.2",
"CHATTERBOX_MAX_CHARACTERS": "2000",
"CHATTERBOX_PLAYBACK_VOLUME": "75"
}
}
}
}
Features
- MCP Server Implementation: Full Model Context Protocol server using @modelcontextprotocol/sdk
- HTTP API: ElevenLabs-compatible REST API for direct integration
- Text-to-Speech Synthesis: High-quality voice synthesis using Chatterbox TTS
- Voice Cloning: Support for reference audio for voice cloning
- Prosody Controls: Adjustable exaggeration and configuration weights
- Volume Control: Configurable audio playback volume with cross-platform support
- Robust File Management: Automatic cleanup of temporary audio files
- Security: Path validation and sanitization to prevent directory traversal
- Dual Mode Operation: Run as MCP server or HTTP server
Installation
From npm (Recommended)
npm install -g local-voice-mcp
From Source
git clone <repository-url>
cd local-voice-mcp
npm install
npm run build
Usage
MCP Server Mode (Default)
Run as an MCP server with stdio transport:
local-voice-mcp-server
Or using npx:
npx local-voice-mcp-server
HTTP Server Mode
Run as an HTTP server:
MCP_MODE=http local-voice-mcp-server
Or set the port:
PORT=3000 MCP_MODE=http local-voice-mcp-server
Development
# Run MCP server in development
npm run dev:mcp
# Run HTTP server in development
npm run dev:http
# Run tests
npm test
# Build project
npm run build
MCP Tools
When running in MCP mode, the following tools are available:
synthesize_text
Converts text to speech and returns audio data.
Parameters:
- text (string, required): Text to synthesize
- referenceAudio (string, optional): Path to reference audio for voice cloning
- exaggeration (number, optional): Voice style exaggeration (0-2, default: 0.2)
- cfg_weight (number, optional): Configuration weight (0-5, default: 1.0)
Returns:
- JSON response with synthesis status and file path
Example Response:
{
"success": true,
"message": "Speech synthesis completed successfully",
"audioFile": "/tmp/local-voice-mcp/audio_20240115_103000_abc123.wav",
"textLength": 25,
"audioFormat": "wav",
"options": {
"exaggeration": 0.2,
"cfg_weight": 1.0
},
"generatedAt": "2024-01-15T10:30:00.000Z"
}
The audio file is saved to the temporary directory and can be played using any audio player or accessed programmatically.
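The documented parameter ranges can be checked on the client side before calling the tool. The following is a minimal sketch, not the package's own validation code: the `SynthesizeArgs` interface and the `validateSynthesizeArgs` helper are hypothetical names, and rejecting empty text is an assumption. The 2000-character limit mirrors the documented default of CHATTERBOX_MAX_CHARACTERS.

```typescript
// Hypothetical argument shape mirroring the documented synthesize_text parameters.
interface SynthesizeArgs {
  text: string;
  referenceAudio?: string;
  exaggeration?: number; // documented range: 0-2
  cfg_weight?: number; // documented range: 0-5
}

// Returns a list of problems; an empty list means the arguments look valid.
// maxCharacters defaults to 2000, the documented CHATTERBOX_MAX_CHARACTERS default.
function validateSynthesizeArgs(args: SynthesizeArgs, maxCharacters: number = 2000): string[] {
  const problems: string[] = [];
  const inRange = (value: number, low: number, high: number): boolean =>
    isFinite(value) && value >= low && value <= high;

  // Assumption: an empty or whitespace-only string is treated as missing text.
  if (args.text.trim().length === 0) {
    problems.push("text must not be empty");
  }
  if (args.text.length > maxCharacters) {
    problems.push("text exceeds " + maxCharacters + " characters");
  }
  if (args.exaggeration !== undefined && !inRange(args.exaggeration, 0, 2)) {
    problems.push("exaggeration must be between 0 and 2");
  }
  if (args.cfg_weight !== undefined && !inRange(args.cfg_weight, 0, 5)) {
    problems.push("cfg_weight must be between 0 and 5");
  }
  return problems;
}
```

Returning a list of problems rather than throwing lets a client report every invalid field at once.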
play_audio
Play an audio file using the system's default audio player with optional volume control.
Parameters:
- audioFile (string, required): Path to the audio file to play
- volume (number, optional): Playback volume as percentage (0-100). If not specified, uses CHATTERBOX_PLAYBACK_VOLUME environment variable or default of 50.
Supported Formats:
- WAV files (.wav)
- MP3 files (.mp3)
Returns:
- JSON response with playback status and system information
Example Response:
{
"success": true,
"message": "Successfully played audio file: /tmp/local-voice-mcp/audio_123.wav",
"audioFile": "/tmp/local-voice-mcp/audio_123.wav",
"volume": 50,
"platform": "darwin",
"command": "afplay -v 0.5 /tmp/local-voice-mcp/audio_123.wav",
"timestamp": "2024-01-15T10:30:00.000Z"
}
Platform Support:
- Cross-platform: Prefers ffplay (from ffmpeg) for consistent volume control across all platforms
- macOS: Falls back to the afplay command with the -v volume flag
- Windows: Falls back to PowerShell with MediaPlayer and volume control
- Linux: Falls back to mpg123 (MP3) with gain control or aplay (WAV, no volume control)
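The fallback order above can be sketched as a function that builds the player command. This is an illustration under stated assumptions, not the package's actual implementation: `buildPlaybackCommand` and its `hasFfplay` parameter are hypothetical, and the Windows (PowerShell) and mpg123 branches are left out because their exact invocations are not shown in this document. The afplay mapping follows the example response, where volume 50 becomes `-v 0.5`.

```typescript
type Platform = "darwin" | "win32" | "linux";

// Builds an argv array for playing a file, or returns null when this sketch
// does not cover the platform/format combination.
function buildPlaybackCommand(
  audioFile: string,
  volume: number,
  platform: Platform,
  hasFfplay: boolean
): string[] | null {
  // Keep the percentage inside the documented 0-100 range.
  const clamped = Math.min(100, Math.max(0, volume));

  if (hasFfplay) {
    // ffplay takes its startup volume as a value from 0 to 100.
    return ["ffplay", "-nodisp", "-autoexit", "-volume", String(Math.round(clamped)), audioFile];
  }
  if (platform === "darwin") {
    // afplay takes volume as a fraction, so 50% becomes 0.5.
    return ["afplay", "-v", String(clamped / 100), audioFile];
  }
  if (platform === "linux" && /\.wav$/i.test(audioFile)) {
    // aplay plays WAV files but offers no volume control.
    return ["aplay", audioFile];
  }
  // Windows and Linux MP3 playback are intentionally omitted from this sketch.
  return null;
}
```

Returning an argv array instead of a single shell string avoids quoting problems when the file path contains spaces.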
tts_status
Returns the current status of the TTS service.
Parameters: None
Returns:
- JSON response with service status and capabilities
Example Response:
{
"success": true,
"status": "operational",
"message": "TTS service is ready and operational",
"timestamp": "2024-01-15T10:30:00.000Z",
"service": {
"name": "Chatterbox TTS",
"version": "0.1.0",
"capabilities": [
"text-to-speech synthesis",
"voice cloning with reference audio",
"prosody controls"
]
}
}
MCP Resources
service-info
Provides information about the Local Voice MCP service.
URI: local-voice://service-info
HTTP API
When running in HTTP mode, the server exposes:
POST /tts
ElevenLabs-compatible text-to-speech endpoint.
Headers:
- X-API-Key: API key (placeholder for authentication)
- Content-Type: application/json
Request Body:
{
"text": "Hello, world!",
"options": {
"referenceAudio": "path/to/reference.wav",
"exaggeration": 0.5,
"cfg_weight": 1.2
}
}
Response:
- Content-Type: audio/wav
- Binary audio data
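A client request for this endpoint can be assembled as follows. This is a sketch under assumptions: `buildTtsRequest` is a hypothetical helper, the host `localhost` and the API key value are placeholders, and the port is the documented default of 59125.

```typescript
// Optional synthesis settings, matching the documented request body.
interface TtsOptions {
  referenceAudio?: string;
  exaggeration?: number;
  cfg_weight?: number;
}

// Builds the URL and request settings for POST /tts without sending anything.
function buildTtsRequest(
  text: string,
  options: TtsOptions = {},
  apiKey: string = "placeholder-key", // assumption: any value, since the key is a placeholder
  port: number = 59125 // documented default HTTP port
) {
  return {
    url: "http://localhost:" + port + "/tts",
    init: {
      method: "POST",
      headers: {
        "X-API-Key": apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text: text, options: options }),
    },
  };
}

// Usage with a runtime that provides fetch (for example Node.js 18+):
//   const request = buildTtsRequest("Hello, world!", { exaggeration: 0.5 });
//   const response = await fetch(request.url, request.init);
//   const wavBytes = await response.arrayBuffer(); // audio/wav payload
```

Separating request construction from sending keeps the example testable without a running server.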
Configuration
Environment Variables
Server Configuration
- PORT: HTTP server port (default: 59125)
- MCP_MODE: Operation mode - "mcp" or "http" (default: "mcp")
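The two server variables could be read along these lines. This is a hedged sketch rather than the server's real startup code: `resolveServerConfig`, `ServerEnv`, and `ServerConfig` are hypothetical names, and falling back to the defaults on unrecognized or invalid values is an assumption. The environment is passed in as a plain object (in Node.js this would typically be `process.env`).

```typescript
// Environment variables as a plain string map.
type ServerEnv = { [key: string]: string | undefined };

interface ServerConfig {
  mode: "mcp" | "http";
  port: number;
}

function resolveServerConfig(env: ServerEnv): ServerConfig {
  // Assumption: anything other than "http" selects the documented default, "mcp".
  const mode = env["MCP_MODE"] === "http" ? "http" : "mcp";

  // Assumption: a missing or invalid PORT falls back to the documented default.
  const parsed = parseInt(env["PORT"] || "", 10);
  const port = parsed >= 1 && parsed <= 65535 ? parsed : 59125;

  return { mode: mode, port: port };
}
```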
TTS Configuration
These environment variables can be used to set default values for TTS synthesis. They will be used if not overridden by options passed to the synthesize method:
- CHATTERBOX_REFERENCE_AUDIO: Path to reference audio file for voice cloning (can be anywhere on your system, supports .wav, .mp3, .flac, .ogg, .m4a, .aac). If not specified, uses the bundled high-quality female reference voice.
- USE_MALE_VOICE: Use male voice instead of bundled female reference voice (true/false, default: false). When set to true, uses the default Chatterbox male voice instead of the bundled female voice. This only applies when no custom reference audio is specified.
- CHATTERBOX_EXAGGERATION: Voice style exaggeration level (float, default: 0.2)
- CHATTERBOX_CFG_WEIGHT: Configuration weight for TTS model (float, default: 1.0)
- CHATTERBOX_MAX_CHARACTERS: Maximum number of characters allowed for text input (integer, default: 2000)
- CHATTERBOX_OUTPUT_DIR: Output directory for generated audio files (default: system temp + "local-voice-mcp")
- CHATTERBOX_PLAYBACK_VOLUME: Default audio playback volume as percentage (integer, 0-100, default: 50)
Example:
# Set default TTS parameters via environment variables
# Reference audio can be anywhere on your system
export CHATTERBOX_REFERENCE_AUDIO="/Users/john/Music/my-voice.wav"
export CHATTERBOX_EXAGGERATION="0.5"
export CHATTERBOX_CFG_WEIGHT="1.2"
export CHATTERBOX_MAX_CHARACTERS="3000"
export CHATTERBOX_PLAYBACK_VOLUME="75"
# Run the MCP server with these defaults
local-voice-mcp-server
Using with npx:
{
"mcpServers": {
"local-voice-mcp": {
"command": "npx",
"args": ["-y", "@codecraftersllc/local-voice-mcp"],
"env": {
"CHATTERBOX_REFERENCE_AUDIO": "/Users/john/Music/my-voice.wav",
"CHATTERBOX_EXAGGERATION": "0.5",
"CHATTERBOX_CFG_WEIGHT": "1.2",
"CHATTERBOX_MAX_CHARACTERS": "3000",
"CHATTERBOX_PLAYBACK_VOLUME": "75"
}
}
}
}
Using male voice instead of bundled female voice:
{
"mcpServers": {
"local-voice-mcp": {
"command": "npx",
"args": ["-y", "@codecraftersllc/local-voice-mcp"],
"env": {
"USE_MALE_VOICE": "true",
"CHATTERBOX_EXAGGERATION": "0.3",
"CHATTERBOX_CFG_WEIGHT": "1.0"
}
}
}
}
Priority Order:
- Options passed to the synthesize_text or play_audio tools (highest priority)
- Environment variables
- Built-in defaults (lowest priority)
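The priority order can be expressed as a small resolution helper. This is a sketch of the documented precedence, not the package's source: `resolveNumber`, `resolveTtsDefaults`, and `TtsEnv` are hypothetical names, and ignoring environment values that do not parse as numbers is an assumption. The built-in defaults (0.2 and 1.0) are the ones listed above.

```typescript
// Environment variables as a plain string map (in Node.js, typically process.env).
type TtsEnv = { [key: string]: string | undefined };

// Tool option first, then environment variable, then built-in default.
function resolveNumber(
  option: number | undefined,
  envValue: string | undefined,
  fallback: number
): number {
  if (option !== undefined) {
    return option;
  }
  if (envValue !== undefined && envValue.trim() !== "") {
    const parsed = Number(envValue);
    // Assumption: a non-numeric environment value is ignored.
    if (isFinite(parsed)) {
      return parsed;
    }
  }
  return fallback;
}

function resolveTtsDefaults(
  options: { exaggeration?: number; cfg_weight?: number },
  env: TtsEnv
) {
  return {
    exaggeration: resolveNumber(options.exaggeration, env["CHATTERBOX_EXAGGERATION"], 0.2),
    cfg_weight: resolveNumber(options.cfg_weight, env["CHATTERBOX_CFG_WEIGHT"], 1.0),
  };
}
```

The same helper would cover play_audio's volume by passing the tool's volume option, CHATTERBOX_PLAYBACK_VOLUME, and the default of 50.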
MCP Client Configuration
Add to your MCP client configuration:
{
"local-voice-mcp": {
"command": "npx",
"args": ["-y", "local-voice-mcp-server"],
"env": {}
}
}
Testing with Cursor
Cursor is a popular AI-powered code editor that supports MCP. Here's how to test the Local Voice MCP server with Cursor:
1. Install the Package
First, install the package globally or ensure it's available:
npm install -g local-voice-mcp
# or
npm install local-voice-mcp
2. Configure Cursor
Add the MCP server to your Cursor configuration file. The location depends on your operating system:
- macOS: ~/Library/Application Support/Cursor/User/globalStorage/cursor.mcp/config.json
- Windows: %APPDATA%\Cursor\User\globalStorage\cursor.mcp\config.json
- Linux: ~/.config/Cursor/User/globalStorage/cursor.mcp/config.json
Add this configuration:
{
"mcpServers": {
"local-voice-mcp": {
"command": "local-voice-mcp-server",
"args": [],
"env": {}
}
}
}
Or if using npx:
{
"mcpServers": {
"local-voice-mcp": {
"command": "npx",
"args": ["-y", "local-voice-mcp-server"],
"env": {}
}
}
}
3. Restart Cursor
After adding the configuration, restart Cursor to load the MCP server.
4. Test the Integration
Once Cursor is restarted, you can test the TTS functionality:
- Open Cursor's AI chat
- Ask Cursor to use the TTS tools:
  Can you synthesize speech for "Hello, this is a test of the local voice MCP server"?
- Check TTS status:
  What's the status of the TTS service?
- Test with options:
  Synthesize "Welcome to the future of AI coding" with exaggeration set to 0.5
- Test audio playback:
  Play the audio file that was just generated
- Test volume control:
  Play the audio file at 25% volume
5. Verify the Tools Are Available
You should see the following tools available in Cursor:
- synthesize_text - For text-to-speech conversion
- play_audio - For playing audio files through system audio
- tts_status - For checking service status
6. Troubleshooting
If the MCP server doesn't appear in Cursor:
- Check the logs: Look for error messages in Cursor's developer console
- Verify installation: Run local-voice-mcp-server directly in terminal to ensure it works
- Check paths: Ensure the command path is correct in your configuration
- Restart Cursor: Sometimes a full restart is needed after configuration changes
- JSON parsing errors: If you see "Unexpected token" errors, ensure you're using the latest version with proper stdio logging
7. Expected Behavior
When working correctly:
- Cursor will be able to call the TTS tools
- You'll receive structured JSON responses with file paths
- Audio files will be saved to the temporary directory
- The TTS service will use the Chatterbox TTS engine
- Files can be played using system audio players
All responses are in structured JSON format with clear file paths, making it easy for MCP clients and AI agents to understand and work with the results.
Requirements
- Node.js 16+
- Python 3.8+
- PyTorch
- Chatterbox TTS
The service automatically sets up the Python environment and installs required dependencies on first run.
Architecture
┌─────────────────┐  ┌──────────────────┐   ┌─────────────────┐
│   MCP Client    │  │   HTTP Client    │   │    CLI Tool     │
│ (Cursor, etc.)  │  │                  │   │                 │
└─────────┬───────┘  └─────────┬────────┘   └─────────┬───────┘
          │                    │                      │
          │ stdio              │ HTTP                 │ stdio
          │                    │                      │
          ▼                    ▼                      ▼
┌─────────────────────────────────────────────────────────────┐
│                   Local Voice MCP Server                    │
│  ┌─────────────────┐  ┌─────────────────────────────────┐   │
│  │   MCP Server    │  │           HTTP Server           │   │
│  │     (stdio)     │  │          (Express.js)           │   │
│  └─────────────────┘  └─────────────────────────────────┘   │
│                              │                              │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │                  TTS Tools & Services                   │ │
│ │  ┌─────────────────┐  ┌─────────────────────────────┐   │ │
│ │  │ChatterboxService│  │       File Management       │   │ │
│ │  │                 │  │    (Cleanup & Security)     │   │ │
│ │  └─────────────────┘  └─────────────────────────────┘   │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
                    ┌─────────────────────┐
                    │     Python TTS      │
                    │    (Chatterbox)     │
                    └─────────────────────┘
License
MIT
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request