The Ultra and Lightning versions of GigaChat‑3.1 already outpace Qwen3‑235B‑A22B and DeepSeek‑V3‑0324 on benchmark tasks, and the 1.8‑billion‑parameter Lightning model matches GPT‑4o. Both models are published on HuggingFace under the MIT license, so they can be self‑hosted without paying for an external API.
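To see what self‑hosting involves, here is a minimal sketch of loading one of the checkpoints from HuggingFace with the `transformers` library. The repository id and chat‑template details are assumptions, not taken from an official model card, so check the actual card before running it.

```python
# Minimal sketch of running a GigaChat checkpoint locally with Hugging Face
# `transformers`. The repository id below is hypothetical -- check the actual
# model card on HuggingFace for the real name and license terms.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ai-sage/GigaChat-3.1-Lightning"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # switch to an FP8 build if the card ships one
    device_map="auto",           # spread layers across the available GPUs
    trust_remote_code=True,      # MoE checkpoints often ship custom modeling code
)

messages = [{"role": "user", "content": "Summarise this support ticket in one sentence: ..."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```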
The new Mixture‑of‑Experts (MoE) architecture combined with native FP8‑DPO roughly halves memory consumption relative to 16‑bit weights without sacrificing accuracy, which makes local deployment economically viable: servers require fewer GPU hours, and infrastructure costs fall almost in proportion.
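Where the halving comes from is simple arithmetic: FP8 stores one byte per weight where BF16 stores two. The sketch below makes the GPU‑memory implication concrete, using illustrative parameter counts that are assumptions rather than official GigaChat‑3.1 figures.

```python
# Back-of-the-envelope check on the memory halving: FP8 stores one byte per
# parameter, BF16 stores two. Parameter counts here are illustrative
# assumptions, not official figures for GigaChat-3.1.

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights (no KV cache, no activations)."""
    return num_params * bytes_per_param / 1e9

TOTAL_PARAMS = 30e9    # assumed total parameters of a mid-size MoE checkpoint
ACTIVE_PARAMS = 1.8e9  # parameters activated per token by the MoE router

print(f"BF16 weights: {weight_memory_gb(TOTAL_PARAMS, 2):.1f} GB")   # 60.0 GB
print(f"FP8  weights: {weight_memory_gb(TOTAL_PARAMS, 1):.1f} GB")   # 30.0 GB -> half
print(f"Active per token (FP8): {weight_memory_gb(ACTIVE_PARAMS, 1):.2f} GB")
```

Activations and the KV cache add to this, but the weight footprint alone shows why a checkpoint that needs two GPUs in BF16 can often fit on one in FP8.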
A case study of a call‑center and document‑processing workflow shows that replacing OpenAI models with GigaChat‑3.1 raised ROI by 12 % over six months. Lower licensing fees, faster request handling, and the removal of dependence on third‑party providers were the main drivers of this result.
What does this mean for you? You gain a tangible competitive edge: lower AI licensing costs, the ability to keep the model locked within your own infrastructure, and freedom from recurring payments to external services.