Why LinkedIn says prompting was a non-starter — and small models were the breakthrough

Taryn Plumb
January 21, 2026
CleoP made with Midjourney

LinkedIn is a leader in AI recommender systems, having developed them over the past 15-plus years. But getting to a next-gen recommendation stack for the job-seekers of tomorrow required a whole new technique. The company had to look beyond off-the-shelf models to achieve next-level accuracy, latency, and efficiency.

"There was just no way we were gonna be able to do that through prompting," Erran Berger, VP of product engineering at LinkedIn, says in a new Beyond the Pilot podcast. "We didn't even try that for next-gen recommender systems because we realized it was a non-starter."

Instead, his team set out to develop a highly detailed product policy document to fine-tune an initially massive 7-billion-parameter model; that was then further distilled into additional teacher and student models optimized down to hundreds of millions of parameters. The technique has become a repeatable cookbook now reused across LinkedIn's AI products.

"Adopting this eval process end to end will drive substantial quality improvement of the likes we probably haven't seen in years here at LinkedIn," Berger says.

Why multi-teacher distillation was a 'breakthrough' for LinkedIn

Berger and his team set out to build an LLM that could interpret individual job queries, candidate profiles and job descriptions in real time, and in a way that mirrored LinkedIn's product policy as accurately as possible. Working with the company's product management team, engineers eventually built out a 20-to-30-page document scoring job description and profile pairs "across many dimensions."

"We did many, many iterations on this," Berger says.
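LinkedIn has not published its pipeline, but the policy-grounded scoring step can be sketched as a prompt template plus a parser. Everything below is hypothetical: the policy excerpt, the dimension names, and the `call_llm` function (a stand-in for whatever LLM client is used).

```python
# Hypothetical sketch: a product-policy excerpt is embedded in a scoring
# prompt, an LLM (stubbed via call_llm) rates a (query, profile, job) triple
# across policy dimensions, and numeric scores are parsed from the reply.
import re

# Hypothetical two-dimension excerpt; LinkedIn's real document runs 20-30 pages.
POLICY_EXCERPT = """Score each job/profile pair from 1-5 on:
- skills_match: required skills appear in the profile
- seniority_fit: the candidate's level matches the job's level"""

def build_prompt(query: str, profile: str, job: str) -> str:
    return (
        f"{POLICY_EXCERPT}\n\n"
        f"Query: {query}\nProfile: {profile}\nJob: {job}\n"
        "Answer with one line per dimension, like 'skills_match: 4'."
    )

def parse_scores(llm_output: str) -> dict:
    # Pull 'dimension: score' lines out of the model's free-text reply.
    return {m.group(1): int(m.group(2))
            for m in re.finditer(r"(\w+):\s*([1-5])\b", llm_output)}

def score_pair(query: str, profile: str, job: str, call_llm) -> dict:
    """Label one pair; run over a corpus to build golden/synthetic datasets."""
    return parse_scores(call_llm(build_prompt(query, profile, job)))
```

In a setup like this, the loop would first run over thousands of human-reviewed pairs to validate the golden dataset, then over a much larger corpus to generate the synthetic labels used to train a teacher model.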
That product policy document was then paired with a "golden dataset" comprising thousands of pairs of queries and profiles; the team fed this into ChatGPT during data generation and experimentation, prompting the model over time to learn to score pairs and eventually generate a much larger synthetic dataset to train a 7-billion-parameter teacher model.

However, Berger says, it's not enough to have an LLM running in production on product policy alone. "At the end of the day, it's a recommender system, and we need to do some amount of click prediction and personalization." So his team used that initial product policy-focused teacher model to develop a second teacher model oriented toward click prediction. Using the two, they further distilled a 1.7-billion-parameter student model for training purposes. That eventual student model was run through "many, many training runs," and was optimized "at every point" to minimize quality loss, Berger says.

This multi-teacher distillation technique allowed the team to "achieve a lot of affinity" to the original product policy and "land" click prediction, he says. They were also able to "modularize and componentize" the training process for the student.

Consider it in the context of a chat agent with two different teacher models: One is training the agent on accuracy in responses, the other on tone and how it should communicate. Those two things are very different, yet critical, objectives, Berger notes. "By not mixing them, you get better outcomes, but also iterate on them independently," he says. "That was a breakthrough for us."

Changing how teams work together

Berger says he can't overstate the importance of anchoring on a product policy and an iterative eval process. Getting a "really, really good product policy" requires translating product manager domain expertise into a unified document.
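The multi-teacher setup described above can be sketched as a student loss that blends separate divergence terms, one per teacher, so each objective can be reweighted or retrained independently. This is a minimal numpy sketch, not LinkedIn's implementation: the softmax-style distributions, the KL-divergence formulation, and the weights are all assumed for illustration.

```python
# Minimal multi-teacher distillation sketch: the student's loss is a weighted
# sum of KL divergences against a policy-alignment teacher and a
# click-prediction teacher. Real training would use a deep-learning framework;
# numpy keeps the arithmetic visible.
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q) for discrete distributions, in nats."""
    return float(np.sum(p * np.log(p / q)))

def multi_teacher_loss(student, policy_teacher, click_teacher,
                       w_policy=0.7, w_click=0.3):
    # Each teacher contributes its own term; because the terms stay separate,
    # either teacher can be swapped, retrained, or reweighted on its own.
    return (w_policy * kl(policy_teacher, student)
            + w_click * kl(click_teacher, student))

student = np.array([0.5, 0.3, 0.2])          # student's output distribution
policy_teacher = np.array([0.6, 0.3, 0.1])   # hypothetical policy scores
click_teacher = np.array([0.4, 0.4, 0.2])    # hypothetical click distribution
loss = multi_teacher_loss(student, policy_teacher, click_teacher)
```

The design point the sketch illustrates is the modularity Berger describes: because the two teachers never share a loss term, tuning the click weight does not disturb policy alignment, and vice versa.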
Historically, Berger notes, the product management team was laser-focused on strategy and user experience, leaving model iteration to ML engineers. Now, though, the two teams work together to "dial in" and create an aligned teacher model.

"How product managers work with machine learning engineers now is very different from anything we've done previously," he says. "It's now a blueprint for basically any AI products we do at LinkedIn."

Watch the full podcast to hear more about:
- How LinkedIn optimized every step of the R&D process to support velocity, delivering real results in days or hours rather than weeks;
- Why teams should design pipelines for pluggability and experimentation, trying out different models to support flexibility;
- The continued importance of traditional engineering debugging.

You can also listen and subscribe to Beyond the Pilot on Spotify, Apple or wherever you get your podcasts.
==============
LinkedIn found that prompting alone was a non-starter for its next-gen recommender systems. Instead, the company developed a detailed product policy document to fine-tune a 7-billion-parameter model, which it then distilled into smaller models to achieve high accuracy, low latency, and efficiency. The technique has become a reusable cookbook applied across LinkedIn's AI products. Key to the effort was close collaboration between the product and machine learning teams on an effective evaluation process, which enabled substantial quality improvements.