Small Language Models, Big Possibilities: The Future Of AI At The Edge

July 23, 2025

Iri Trashanski

The AI landscape is taking a dramatic turn, as small language and multimodal models are approaching the capabilities of larger, cloud-based systems.

This acceleration reflects a broader shift toward on-device intelligence. As the industry races toward AI that is local, fast, secure and power-efficient, the future is increasingly unfolding on the smallest, most resource-constrained devices at the very edge of the network.

From wearables and smart speakers to industrial sensors and in-vehicle systems, the demand is growing for language-capable AI that can operate independently of the cloud. As small language models (SLMs) continue to improve, they are poised to play a key role in making language AI more accessible across a wide range of embedded applications.

The New Edge Imperative

Device makers are pushing to reduce latency, strengthen privacy, lower operational costs and design more sustainable products. All of these point to a shift away from cloud-reliant AI toward local processing.

However, delivering meaningful AI performance in devices with tight power and memory budgets isn't easy. Traditional approaches fall short, and hardware like the $95,000 "desktop supercomputer" capable of running full large language models (LLMs) offline is impressive but cost- and energy-prohibitive for mass deployment.

By contrast, SLMs running on ultra-efficient processors offer a practical and sustainable path forward. Breakthroughs like Microsoft's Phi, Google's Gemini Nano and open models like Mistral and Meta's Llama are closing the performance gap rapidly. Some models, such as Google's Gemma 3 and TinyLlama, achieve remarkable results with only around one billion parameters, enabling summarization, translation and command interpretation directly on-device.

Optimizations such as pruning, quantization and distillation further shrink their size and energy draw. These models are already running on consumer-grade chipsets, proving that lean, localized intelligence is ready for prime time.
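As a concrete illustration of one of these techniques, here is a minimal sketch of post-training dynamic quantization using PyTorch. The toy feed-forward block and its layer sizes are illustrative assumptions, not a real SLM architecture; a genuine deployment would quantize a full model and re-validate accuracy afterward.

```python
# A minimal sketch of post-training dynamic quantization in PyTorch.
# The layer sizes below are illustrative, not a real SLM architecture.
import io

import torch
import torch.nn as nn

# A toy stand-in for one transformer feed-forward block.
model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)

# Convert the Linear layers' weights from float32 to int8; activations
# are quantized on the fly at inference time, so no retraining is needed.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def saved_size_mb(m: nn.Module) -> float:
    """Approximate on-disk size of a model's serialized state dict."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

# int8 weights typically cut storage to roughly a quarter of float32.
print(f"float32: {saved_size_mb(model):.2f} MB")
print(f"int8:    {saved_size_mb(quantized):.2f} MB")
```

Dynamic quantization is attractive at the edge precisely because it requires no retraining: weights shrink to int8 at conversion time, which is often the difference between a model fitting a device's memory budget or not.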

Bridging The Gap In Edge AI Deployment

As someone working closely with global chipmakers and system designers, I see this trend as a strategic inflection point. The industry is shifting toward AI that is leaner, faster and embedded where decisions happen—where milliseconds matter, and where compute resources are tightly bound.

At events like Embedded World 2025, it has become clear to me that the appetite for intelligent edge solutions is growing faster than the infrastructure needed to support it. Device manufacturers want to bring AI to the edge, but they face a fragmented ecosystem of silicon platforms, development tools and AI frameworks.

Edge AI adoption is growing rapidly across industries: the global edge AI in smart devices market is forecast to exceed $385 billion by 2034, according to Market.us research.

The challenge is bridging the gap between today's state-of-the-art models and tomorrow's real-world deployment requirements. A model must not only fit the tight power and memory budget of an edge device; it must also be easy to deploy, efficient to update and cost-effective to scale.

Many device manufacturers are struggling with this "last mile" of inference: getting a model to run locally is only the first step toward maintaining, updating and scaling it across a fleet of devices.

Building Blocks For The Smart Edge

To solve these challenges, organizations across the tech ecosystem—from global chipmakers and tool vendors to consumer device manufacturers—are coalescing around a shared vision: The smarter future of AI lies at the edge.

This shift is fueled by increasing demands for real-time responsiveness, privacy-preserving data handling, lower latency and more sustainable compute alternatives—particularly in scenarios like wearables, automotive systems and industrial IoT.

Recent surveys show that a majority of enterprises are either deploying edge AI or planning to do so imminently, reflecting how on-device inference has shifted from experimental to strategic.

This momentum is supported by advances on multiple fronts: edge-ready NPUs and accelerators embedded into devices; lightweight runtimes and model formats such as TensorFlow Lite and ONNX Runtime; and hybrid cloud-edge architectures that offer flexibility and scale.
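To make that portability concrete, here is a minimal, hypothetical sketch of running a pre-exported ONNX model with ONNX Runtime's CPU execution provider. The file name "model.onnx" and the dummy token IDs are assumptions for illustration; a real SLM would be exported first and fed actual tokenizer output.

```python
# A minimal sketch of on-device inference with ONNX Runtime.
# "model.onnx" is an assumed, pre-exported model file.
import numpy as np
import onnxruntime as ort

# CPUExecutionProvider keeps this portable to small edge targets
# that have no GPU or vendor-specific accelerator.
session = ort.InferenceSession(
    "model.onnx", providers=["CPUExecutionProvider"]
)

# Discover the input the exporter declared rather than hard-coding it.
input_meta = session.get_inputs()[0]
print(input_meta.name, input_meta.shape)

# Dummy int64 token IDs; a real SLM would take tokenizer output here.
tokens = np.array([[101, 2023, 2003, 1037, 3231, 102]], dtype=np.int64)

# Run the model and inspect the first output tensor.
outputs = session.run(None, {input_meta.name: tokens})
print(outputs[0].shape)
```

The same exported model file can then be served by ONNX Runtime builds on very different targets, which is exactly the kind of decoupling a fragmented silicon ecosystem needs.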

As AI capabilities become leaner and more optimized, real-time, intelligent inference at the device level is gaining value not just in verticals like automotive, consumer electronics and industrial systems, but as a foundational requirement for the next generation of smart, energy-efficient connectivity and interaction.

The Real-World Challenges Of Deploying SLMs At The Edge

Despite the excitement, several hurdles still need to be addressed before SLMs at the edge can reach mainstream adoption:

• Model Compatibility And Scaling: Not all models can be easily pruned or quantized for edge deployment. Choosing the right architecture, and understanding the trade-offs between size, latency and accuracy, is critical (a minimal measurement sketch follows this list).

• Ecosystem Fragmentation: Many edge hardware platforms are siloed with proprietary software development kits (SDKs). This lack of standardization increases complexity for developers and slows adoption.

• Security And Update Infrastructure: Deploying and managing models on edge devices over time—e.g., via over-the-air (OTA) updates—requires robust, secure infrastructure.
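As referenced in the first item above, the latency side of the size/latency/accuracy trade-off can at least be measured simply. The sketch below is a small, hypothetical timing harness comparing a float32 toy model against its int8 dynamically quantized counterpart; the model is illustrative, and real benchmarks belong on the target hardware with representative inputs.

```python
# A small sketch of measuring the latency side of the trade-off.
# The toy model is illustrative; absolute numbers depend entirely
# on the target hardware.
import time

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 512),
)
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)

def mean_latency_ms(m: nn.Module, runs: int = 100) -> float:
    """Average per-inference latency over several timed runs."""
    with torch.inference_mode():
        m(x)  # warm-up run, excluded from timing
        start = time.perf_counter()
        for _ in range(runs):
            m(x)
    return (time.perf_counter() - start) * 1000 / runs

print(f"float32: {mean_latency_ms(model):.3f} ms/inference")
print(f"int8:    {mean_latency_ms(quantized):.3f} ms/inference")
```

The accuracy side of the trade-off has no shortcut: it has to be re-measured on a representative evaluation set after every pruning or quantization step.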

Democratizing Intelligence—And Sustainability—One Device At A Time

Perhaps the most exciting outcome of the SLM revolution is that it levels the playing field. By removing the infrastructure barriers traditionally associated with AI, it allows startups, original equipment manufacturers (OEMs) and makers to embed meaningful intelligence in nearly any device.

With tens of billions of connected devices already in use, spanning everything from thermostats to factory robots, the opportunity is vast. And local inference is more than just responsive: by running small models on efficient local silicon instead of data-center hardware, it can be dramatically more energy efficient than cloud-based alternatives, supporting greener AI deployment strategies.

AI doesn’t need to be massive to be meaningful. Sometimes the most powerful intelligence is also the most efficient.

As SLMs continue to evolve and hardware support becomes more ubiquitous, the smart edge will move from possibility to default. In the process, we’ll unlock new classes of real-time, personalized and sustainable AI experiences—delivered not from distant data centers, but from the device in your hand, pocket or factory floor.

Published on Forbes Technology Council 

Iri Trashanski

Iri Trashanski is Ceva’s Chief Strategy Officer, overseeing strategy, marketing, and corporate development. With over 20 years in the semiconductor industry, he has held leadership roles at GlobalFoundries, Hitachi Vantara, Samsung, Marvell, and SanDisk. He holds an MBA from Babson College and a BA from IDC, Israel.
