Rethinking AI power efficiency with Vishal Sarin from Sagence AI

At the Generative AI Summit Silicon Valley 2025, Vishal Sarin, Founder, President & CEO of Sagence AI, sat down with Tim Mitchell, Business Line Lead, Technology, at the AI Accelerator Institute, to explore one of the most urgent challenges in generative AI: its staggering power demands. In this interview, Vishal shares insights from his talk on the economic imperative of breaking power efficiency barriers and rethinking the AI stack to make generative AI scalable and sustainable.

The power problem in Generative AI

Tim Mitchell: Vishal, great to have you here. Can you start by summarizing what you spoke about at the summit?

Vishal Sarin: Absolutely. While generative AI has opened up enormous opportunities, it’s facing a major bottleneck: power efficiency. The massive energy requirements of AI workloads threaten their economic viability. My talk focused on the urgent need to break through these power efficiency barriers, not through incremental tweaks but by rethinking the entire AI stack, from chips to cooling. We need radical innovation to make generative AI truly scalable and sustainable.

Opportunities across the AI stack

Tim: Where are the biggest opportunities to reduce power consumption across the AI stack?

Vishal: There are improvements possible across the stack, in power generation and distribution, and even at the network level. But the most foundational leverage point is in computation and memory. If you look at the power consumption of generative AI workloads, memory access and data movement dominate. So optimizing compute and memory together, what we call in-memory compute, is a massive opportunity to cut both cost and energy use. It’s where you get the most ROI.

The promise of in-memory compute

Tim: In-memory compute sounds like a major shift. Can you elaborate?

Vishal: Definitely. Today’s architectures are built on the von Neumann model, which separates compute and memory. Data constantly moves between memory and processor, an extremely power-hungry process. In-memory compute integrates computation closer to the memory, significantly reducing data movement. This architectural change could improve power efficiency by orders of magnitude, especially for inference tasks in generative AI.
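Vishal’s claim that data movement dominates is easy to sanity-check with back-of-envelope arithmetic. The sketch below is a minimal, hypothetical Python model, not anything from Sagence AI; the per-operation energies are illustrative order-of-magnitude figures in the spirit of widely cited ~45 nm estimates, and the function name, parameters, and layer size are assumptions chosen purely for illustration.

```python
# Back-of-envelope energy split for one matrix-vector multiply (y = W @ x),
# the core operation of generative AI inference, when weights stream from
# off-chip DRAM. All per-operation energies are illustrative,
# order-of-magnitude figures, not measurements of any specific chip.

PJ_PER_MAC = 4.0          # ~energy of one 32-bit multiply-accumulate (pJ)
PJ_PER_DRAM_BYTE = 160.0  # ~energy to read one byte from off-chip DRAM (pJ)

def matvec_energy(rows: int, cols: int, bytes_per_weight: int = 4) -> dict:
    """Estimate compute vs. memory energy for y = W @ x, W of shape
    (rows, cols), assuming every weight is fetched from DRAM once per
    matvec (no on-chip reuse), the worst case for memory-bound inference."""
    macs = rows * cols  # one multiply-accumulate per weight
    compute_pj = macs * PJ_PER_MAC
    memory_pj = macs * bytes_per_weight * PJ_PER_DRAM_BYTE
    return {
        "compute_uJ": compute_pj / 1e6,
        "memory_uJ": memory_pj / 1e6,
        "memory_share": memory_pj / (compute_pj + memory_pj),
    }

# One 4096 x 4096 layer with weights streamed from DRAM per token:
print(matvec_energy(4096, 4096))
# -> compute ~67 uJ, memory ~10,737 uJ, memory_share ~0.99
```

With these numbers, fetching a 4-byte weight from DRAM costs roughly 160x the multiply-accumulate that consumes it, so about 99% of the energy in this example is data movement. Shrinking that traffic, rather than speeding up the arithmetic, is where an in-memory architecture would find its orders-of-magnitude headroom.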
The role of accelerators and SLMs

Tim: What about hardware accelerators like GPUs? Where do they fit into this picture?

Vishal: GPUs and other accelerators are critical, but they also contribute significantly to energy usage, largely because of memory bandwidth. Even with very efficient compute units, moving data to and from memory becomes a bottleneck. To go beyond current limits, we need a paradigm shift that combines the strengths of accelerators with architectural innovations like in-memory compute. That’s where the next wave of performance and efficiency gains will come from.

Tim: Does this mean small language models (SLMs) are also part of the solution?

Vishal: Yes. SLMs are more lightweight and reduce the need for massive infrastructure to run inference. When designed and trained efficiently, they can offer comparable performance for many applications while drastically cutting power and cost. They’re not a replacement for large models in every case, but they play an important role in making AI more accessible and sustainable.

Cooling and infrastructure innovation

Tim: Beyond compute, how important are cooling and energy systems?

Vishal: They’re essential. High-density AI infrastructure creates tremendous heat, and traditional cooling is no longer enough. Innovations like liquid cooling and better energy recovery systems are necessary to keep power usage in check and manage operational costs. These systems need to evolve alongside the compute stack for AI infrastructure to be economically viable at scale.

A path toward sustainable scaling

Tim: Looking ahead, what gives you optimism about solving these challenges?

Vishal: What’s exciting is that we’re at a turning point. We know where the problems are and have clear paths forward, from architectural innovation to smarter energy use. The industry is no longer just chasing performance; it’s aligning around efficiency and sustainability. That shift in mindset is what will drive generative AI toward long-term viability.

Final thoughts

Vishal Sarin’s vision for the future of generative AI is clear: breaking power efficiency barriers isn’t optional; it’s a necessity. By rethinking compute architectures, embracing small models, and innovating across the AI stack, the industry can achieve both performance and sustainability. As the field matures, these breakthroughs will determine whether generative AI can scale beyond hype to deliver enduring, transformative value.