Gemma 4 Multimodal Fine-Tuner: Open-Source Tool for Apple Silicon Enables Local AI Model Training
A developer has released an open-source tool that enables fine-tuning of Google's Gemma 4 multimodal models on Apple Silicon, filling a gap left by MLX's lack of audio fine-tuning support.
The Tool
Available on GitHub as gemma-tuner-multimodal, the tool supports:
- Gemma 4 multimodal model fine-tuning
- Whisper audio-model fine-tuning (the project's original focus)
- Cloud-to-local data streaming from Google Cloud Storage during training
- Apple Silicon optimization (designed for an M2 Ultra Mac Studio with 64 GB of RAM)
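The cloud-to-local streaming could work along the lines of the sketch below: a background thread downloads objects ahead of the training loop through a bounded queue, so only a few samples are in memory at once. This is a minimal illustration, not the tool's actual code; `stream_prefetch` and the `fetch` callback are hypothetical names, and the real project would pass something like a `google-cloud-storage` blob download as `fetch`.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")

def stream_prefetch(keys: Iterable[str],
                    fetch: Callable[[str], T],
                    depth: int = 4) -> Iterator[T]:
    """Yield fetched items while a background thread downloads ahead.

    `fetch` stands in for a cloud download, e.g. a GCS blob read;
    `depth` bounds how many items are buffered locally at once.
    """
    q: "queue.Queue" = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the key stream

    def producer() -> None:
        for key in keys:
            q.put(fetch(key))  # blocks when the buffer is full
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not done:
        yield item
```

The bounded queue is the key design choice: the producer blocks once `depth` items are buffered, so the 15,000-hour corpus never needs to fit on disk or in RAM.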
The Origin Story
The developer started six months ago trying to fine-tune Whisper locally with:
- 15,000 hours of audio data in Google Cloud Storage
- Limited compute budget
- No way to fit all the audio on a local machine
They built a streaming pipeline from GCS to the local machine, then added Gemma 3n support, and finally upgraded to Gemma 4.
Technical Challenges
"It's very easy to OOM when you fine-tune on longer sequences! My local Mac Studio has 64GB RAM, so I run out of memory constantly."
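One common way to keep peak memory bounded on long sequences (a general technique, not necessarily what this tool does) is to truncate overlong samples and pack length-sorted batches under a fixed padded-token budget. The `bucket_batches` helper below is a hypothetical sketch of that idea.

```python
from typing import List, Sequence

def bucket_batches(seqs: List[Sequence],
                   max_len: int = 2048,
                   batch_tokens: int = 8192) -> List[List[Sequence]]:
    """Truncate overlong sequences, then pack length-sorted batches so
    the padded size (batch size x longest sequence) stays under a fixed
    token budget, keeping peak activation memory predictable."""
    clipped = [s[:max_len] for s in seqs]
    clipped.sort(key=len)  # ascending, so the newest item is the longest
    batches: List[List[Sequence]] = [] 
    cur: List[Sequence] = []
    for s in clipped:
        # Padded cost if `s` joins the current batch: every row is padded
        # to len(s), the longest sequence seen so far.
        if cur and (len(cur) + 1) * len(s) > batch_tokens:
            batches.append(cur)
            cur = []
        cur.append(s)
    if cur:
        batches.append(cur)
    return batches
```

Because batches are length-homogeneous, the worst-case allocation per step is capped by `batch_tokens` rather than by whatever outlier sequence the shuffle happens to draw, which is exactly the failure mode the quote describes.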
Key limitation: MLX (Apple's ML framework) doesn't support audio fine-tuning, which is the primary reason this tool exists.
Why It Matters
- Democratizes multimodal fine-tuning — No cloud GPU required
- Apple Silicon ecosystem growth — More tools enabling local AI development on Macs
- Gemma 4 momentum — Google's open model continues to gain traction
- Practical use cases — Audio processing, custom vision-language models
Hacker News Reception
The post earned 23 points with positive community feedback, reflecting interest in local AI development on Apple hardware.