Lemana Tech supports more than 40,000 employees, manages 112 stores and processes roughly 100,000 support tickets each month. Traditional machine‑learning models could no longer keep up with the load: simple boosting algorithms and rule‑based systems covered only a fraction of scenarios, leaving the rest to manual handling. The team led by Dmitry Terentyev solved the problem by deploying a large language model (LLM) enhanced with retrieval‑augmented generation (RAG), which pulls answers from corporate wikis and formats them in human‑like prose.
RAG replaces conventional keyword search over a database with on‑the‑fly answer generation. The system first retrieves the most relevant wiki passages via embedding similarity, caching results for recurring queries; the model then generates a concise response grounded in those passages, so the full heavyweight language layer is not invoked for every request. As a result, average resolution time for a typical ticket fell from three to five minutes to ten to fifteen seconds, and computational load dropped to roughly one‑quarter of its previous level.
Lemana Tech’s hybrid approach retains classic ML models for simple triggers—such as automatically opening an incident based on an error code. Those models run faster and cheaper, while the RAG‑enabled LLM is invoked only for complex cases that require a "human" explanation. The team estimates annual savings of up to $200K in licensing and maintenance costs, since a share of requests is now handled fully automatically without expensive cloud services.
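The routing logic behind such a hybrid setup can be sketched as a cheap rule‑based check with an LLM fallback. The error‑code patterns and action names below are purely illustrative assumptions, not Lemana Tech's actual rules.

```python
import re

# Hypothetical simple triggers handled by the cheap rule-based path.
INCIDENT_PATTERNS = {
    r"\bERR[-_ ]?\d{3,4}\b": "open_incident",   # e.g. "ERR-504"
    r"\bdisk full\b": "open_incident",
}

def route_ticket(text):
    """Return ('rules', action) when a simple trigger matches,
    or ('llm', None) to escalate to the RAG-enabled model."""
    for pattern, action in INCIDENT_PATTERNS.items():
        if re.search(pattern, text, flags=re.IGNORECASE):
            return ("rules", action)
    return ("llm", None)
```

Because the rule check is a handful of regex matches, the expensive model only ever sees the minority of tickets that genuinely need a written explanation.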
Why does this matter to you as a CEO? Faster Service Desk response times directly lift employee productivity and cut downtime. Lower compute expenses free budget for scaling other business initiatives. And the hybrid architecture lets support scale rapidly with growing ticket volumes while keeping cost growth negligible.