Gemma 4 Multimodal Fine-Tuner: Open-Source Tool for Apple Silicon Enables Local AI Model Training
A developer has released an open-source tool that enables fine-tuning of Google's Gemma 4 multimodal models on Apple Silicon, filling a gap left by MLX's lack of audio fine-tuning support.
The Tool
Available on GitHub as gemma-tuner-multimodal, the tool supports:
- Gemma 4 multimodal model fine-tuning
- Whisper audio-model fine-tuning (the project's original focus)
- Cloud-to-local data streaming from Google Cloud Storage during training
- Apple Silicon optimization (designed for an M2 Ultra Mac Studio with 64 GB of RAM)
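The cloud-to-local streaming could work along the lines of the sketch below: a background thread downloads objects ahead of the training loop through a bounded queue, so only a few samples are in memory at once. This is a minimal illustration, not the tool's actual code; `stream_prefetch` and the `fetch` callback are hypothetical names, and the real project would pass something like a `google-cloud-storage` blob download as `fetch`.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

def stream_prefetch(keys: Iterable[str],
                    fetch: Callable[[str], T],
                    depth: int = 4) -> Iterator[T]:
    """Yield fetched items while a background thread downloads ahead.

    `fetch` stands in for a cloud download, e.g. a GCS blob read;
    `depth` bounds how many items are buffered locally at once.
    """
    q: "queue.Queue" = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the key stream

    def producer() -> None:
        for key in keys:
            q.put(fetch(key))  # blocks when the buffer is full
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not done:
        yield item
```

The bounded queue is the key design choice: the producer blocks once `depth` items are buffered, so the 15,000-hour corpus never needs to fit on disk or in RAM.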
The Origin Story
The developer started six months ago trying to fine-tune Whisper locally with:
- 15,000 hours of audio data in Google Cloud Storage
- Limited compute budget
- No way to fit all the audio on a local machine
They built a streaming pipeline from GCS to the local machine, then added Gemma 3n support, and finally upgraded to Gemma 4.
Technical Challenges
"It's very easy to OOM when you fine-tune on longer sequences! My local Mac Studio has 64GB RAM, so I run out of memory constantly."
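One common way to keep peak memory bounded on long sequences (a general technique, not necessarily what this tool does) is to truncate overlong samples and pack length-sorted batches under a fixed padded-token budget. The `bucket_batches` helper below is a hypothetical sketch of that idea.

```python
from typing import List, Sequence

def bucket_batches(seqs: List[Sequence],
                   max_len: int = 2048,
                   batch_tokens: int = 8192) -> List[List[Sequence]]:
    """Truncate overlong sequences, then pack length-sorted batches so
    the padded size (batch size x longest sequence) stays under a fixed
    token budget, keeping peak activation memory predictable."""
    clipped = [s[:max_len] for s in seqs]
    clipped.sort(key=len)  # ascending, so the newest item is the longest
    batches: List[List[Sequence]] = [] 
    cur: List[Sequence] = []
    for s in clipped:
        # Padded cost if `s` joins the current batch: every row is padded
        # to len(s), the longest sequence seen so far.
        if cur and (len(cur) + 1) * len(s) > batch_tokens:
            batches.append(cur)
            cur = []
        cur.append(s)
    if cur:
        batches.append(cur)
    return batches
```

Because batches are length-homogeneous, the worst-case allocation per step is capped by `batch_tokens` rather than by whatever outlier sequence the shuffle happens to draw, which is exactly the failure mode the quote describes.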
Key limitation: MLX (Apple's ML framework) doesn't support audio fine-tuning, which is the primary reason this tool exists.
Why It Matters
- Democratizes multimodal fine-tuning — No cloud GPU required
- Apple Silicon ecosystem growth — More tools enabling local AI development on Macs
- Gemma 4 momentum — Google's open model continues to gain traction
- Practical use cases — Audio processing, custom vision-language models
Hacker News Reception
The post earned 23 points with positive community feedback, reflecting interest in local AI development on Apple hardware.