Apple is declining to spend hundreds of millions of dollars on its own NVIDIA H100 clusters and is instead running ready-made neural networks, most notably Google's Gemini, on its M-series chips with unified memory. The CPU, GPU and Neural Engine sit on a single die and share one memory pool, which eliminates copying terabytes of data between separate device memories. Inference gets faster; training does not, but for businesses the speed of shipping features on iOS and macOS matters more than raw H100 performance.
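A back-of-the-envelope way to see why shared memory helps inference latency: with a discrete accelerator, each request pays a host-to-device copy over PCIe; with unified memory that term drops out. The sizes and bandwidths below are illustrative assumptions, not measurements of any Apple or NVIDIA hardware.

```python
# Toy latency model: unified memory removes the host<->device copy term.
# All figures here are illustrative assumptions, not benchmarks.

def inference_latency_s(activation_bytes, compute_s, copy_bandwidth_bytes_per_s=None):
    """Latency = pure compute time + (optional) host-to-device transfer time."""
    transfer_s = 0.0
    if copy_bandwidth_bytes_per_s is not None:
        transfer_s = activation_bytes / copy_bandwidth_bytes_per_s
    return compute_s + transfer_s

ACT = 2 * 1024**3        # 2 GiB moved per request (assumed)
COMPUTE = 0.050          # 50 ms of pure compute (assumed)
PCIE = 32 * 1024**3      # ~32 GiB/s effective PCIe bandwidth (assumed)

discrete = inference_latency_s(ACT, COMPUTE, PCIE)  # pays the copy
unified = inference_latency_s(ACT, COMPUTE)         # no copy term

print(f"discrete: {discrete*1000:.1f} ms, unified: {unified*1000:.1f} ms")
# prints: discrete: 112.5 ms, unified: 50.0 ms
```

Under these assumed numbers the copy alone costs more than the compute, which is the intuition behind the "fractions of a second" figure the article cites for on-device inference.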

On a mid-range Mac Studio the model responds in a fraction of a second, which is sufficient for classification, text generation and other enterprise tasks. Apple gets a rapid return on investment, saves an estimated $2–3 billion in capital expenditure, and can redirect those funds to AI-solution aggregator startups, whose growth is projected at +12% by 2027.

The drawback is clear: without its own training pipeline the company depends on external providers. If Google alters access terms or accelerates new feature development, Apple could fall behind. Nonetheless, this approach is currently more cost‑effective than building a proprietary GPU farm and developing models from scratch.

llm_releases