The definitive shape of machine intelligence remains an open question. Will it manifest as a sprawling network of GPUs, a vast computational edifice humming within remote data centers? Or will it coalesce into a more intimate form, a localized intelligence residing within the very devices we hold close?
Local models offer an analogous advantage in the realm of cognition: sub-100ms latency on contemporary hardware transforms a mere exchange of prompts into a fluid conversation. The visceral reassurance of consistent completion quality, irrespective of network connectivity, only strengthens the case.
Models like DeepSeek's recent releases demonstrate near state-of-the-art performance at a fraction of the cost per token compared to dominant closed-source APIs. This economic inversion makes local deployment not just feasible, but potentially far more economical for many use cases.
Intelligence, however, transcends mere utility; it is a phenomenon of profound complexity, bearing a closer affinity to the capricious nature of fire than the predictable flow of water. Its essence lies in volatility and intimacy, thriving on the fertile ground of context and proximity, nurtured by a foundation of trust that defies facile outsourcing to the cloud.
When tokens cost no more than the joules that flip a transistor, conversation ceases to be metered. Secrets and half‑ideas flow, and software stops renting you space and starts giving you a place of your own.

Some human experiences are more suited to local environments. The innermost reflections confided within a diary, the intricacies of personal finance, the anxious inquiries posed to search engines in the solitude of 3:00 AM, the nascent narratives gestating within an unfinished novel, the unbridled missives drafted in moments of workplace frustration—these domains demand the sanctuary of locality. While cryptographic techniques offer a measure of data security, they cannot fully address the inherent vulnerabilities introduced by entrusting sensitive information to remote servers, nor mitigate the latency imposed by constant data transmission. Local vulnerabilities, though not negligible, afford users a degree of agency absent in the opaque security protocols of cloud providers.
Privacy can be experiential. Products built with local inference allow users to engage more openly and more naturally, fostering deeper trust and more genuine interactions. Think of it as the difference between confessing your sins to a priest and broadcasting them on TikTok. The ascendancy of Face ID over Touch ID, for instance, stems not from documented security advantages but from the marginal reduction in latency during device authentication. That same instantaneousness is what turns a local model's replies into a fluid conversation rather than an exchange of prompts.
No lag, no leakage, no nagging doubt about who else might be listening. It's the feeling of safety: the freedom to be wrong, raw, unfinished, imperfect. Just you and your machine, a closed loop of trust. That sense of safety is the prerequisite for authentic self-reflection, the license to explore nascent ideas and venture into uncharted cognitive territory.
It is upon this bedrock of trust that the potential for genuine delegation, companionship, and synergistic partnership arises, enabling the emergence of entirely novel forms of agents and interactions. Imagine an AI, a constant companion across life's spectrum: capable of whispering personalized rituals to ease the burden of grief, negotiating on your behalf with a subtle understanding of your deepest values to achieve truly beneficial outcomes, or even transforming the patterns of your codebase into evocative musical compositions that unlock new approaches to software development—all while remaining a steadfast guardian of your privacy within the confines of your own device.
This is starting to become a reality. For example, new architectures for local computer-use tasks, like the one General Agents is creating, are optimized for the unique demands of driving a mouse and a keyboard, and have been shown to outperform state-of-the-art models on those specific actions. This demonstrates that locally run models can be not just viable but superior for tasks that require a deep understanding of a user's local environment and digital behavior.
These agents will require deep, persistent access to your context: files, APIs, local applications, personal memory. They must take action, not just generate text. Such context has mostly been generated through our interactions with social media platforms; what happens when a local alternative exists? Could that version of ourselves be even truer?
For agents to truly work, to be trusted with the keys to our digital actions, they need to live with you. Locally. Reliably. Privately. Not just above you in the cloud, operating under terms of service you didn't write and can't control.
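A rough sketch of what that could look like in practice: the agent reads only what you have explicitly shared, and every access is recorded on your own disk. The ContextVault name and the paths below are invented for illustration, not a reference design.

```python
# A hypothetical sketch of scoped, auditable access for a local agent.
# The ContextVault name and paths are invented for illustration; the point is
# that the grant, the data, and the audit trail all live on the user's machine.
from datetime import datetime
from pathlib import Path

class ContextVault:
    """Lets an agent read only explicitly granted folders, logging every access locally."""

    def __init__(self, allowed_roots: list[Path], audit_log: Path):
        self.allowed_roots = [p.resolve() for p in allowed_roots]
        self.audit_log = audit_log

    def read(self, path: Path) -> str:
        resolved = path.resolve()
        if not any(resolved.is_relative_to(root) for root in self.allowed_roots):
            raise PermissionError(f"{resolved} is outside the granted context")
        with self.audit_log.open("a") as log:
            log.write(f"{datetime.now().isoformat()} read {resolved}\n")
        return resolved.read_text()

# The agent can see your notes, and you can see exactly what it saw.
vault = ContextVault([Path.home() / "notes"], Path.home() / ".agent_audit.log")
```

The particular class matters less than the property it illustrates: the grant, the data, and the record of what was read never leave your machine.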
The edge is getting smarter: Apple’s M-series, AMD’s AI PCs, NVIDIA’s consumer-grade inference. The question is no longer if… but when local models will be the obvious choice for most personal tasks.
The models themselves are getting smaller and surprisingly capable: Meta's Llama and Mistral's releases run efficiently via tools like Ollama, letting developers prototype complex AI workflows entirely offline. And they are improving astonishingly fast.
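To make "entirely offline" concrete, here is a minimal sketch assuming Ollama is serving a pulled model on its default local port; the ask_local helper is invented for illustration, and the prompt, the weights, and the reply all stay on the machine.

```python
# A minimal offline prototype: prompt a model served by Ollama on this machine.
# Assumes Ollama is installed and a model has been pulled, e.g. `ollama pull llama3`.
import json
import urllib.request

def ask_local(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to a locally served model; nothing leaves the machine."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default local endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_local("Draft a gentle reminder to myself to take a walk today."))
```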
The only thing missing is collective imagination, and perhaps the collective will to demand something different. The bottleneck is shifting from raw FLOPS to innovative software architectures and user experiences designed for the edge. We're entering a world where raw power is abundant; the real challenge now is taste: crafting elegant, human-scale experiences that feel like a natural extension of our minds, not just a firehose of optimized output.
We’ve reached a point where adding raw compute is no longer the interesting game. Progress now depends on imagination, on inventing uses that respect the scale of a single human life instead of echoing the geometry of data centers. A new generation, raised on sealed‑box apps, has rarely tasted the pleasure of owning its software—tuning a kernel, running a service at home. Yet you can feel the tide turning: curiosity about self‑hosted systems, about tools that bend to their users rather than the other way around, is beginning to surface again.
Self-hosting isn’t just fun; it can be identity-shaping. Running your own model means customization, autonomy, agency: a way to imprint your identity onto your tools. It's about knowing the contours and biases of the intelligence you rely on, and having the power to modify them. It's about building a digital home that reflects your values, not just your preferences.
And unlike email servers or water pipes, LLMs are soft systems. Malleable. You don’t need a backhoe to modify them. Just a GPU, a will, and a bit of “you can just do things” spirit. The ability to tinker, to personalize, to break and remake. These are acts of reclaiming agency in a world drifting towards opaque, immutable systems.
Of course, the cloud offers undeniable advantages: access to the largest, most powerful models, effortless updates, and capabilities that current local hardware simply can't match. Today, if you need the absolute state of the art for complex reasoning across vast datasets, scientific simulation, large-scale data analysis, or training foundation models, cloud models remain the only practical option. Local models, even powerful ones like Llama 3 70B running on high-end consumer hardware, simply can't compete on raw horsepower or breadth of knowledge... yet. For tasks requiring sheer computational power or the absolute latest breakthroughs, centralized systems will likely remain dominant for the foreseeable future. But this doesn't negate the case for local, nor does it capture the whole picture. Think of it like this: the cloud is a supercomputer; your local machine is a weird artist. Both are valuable, but for different reasons.
Local isn't about replacing the cloud everywhere; it's about providing a meaningful alternative where it counts, particularly along the long tail of innovation. While cloud infrastructures optimize for efficiency through high batch sizes and economies of scale, this efficiency primarily benefits the most popular, standardized models. As you move down the long tail where creators experiment with specialized, niche, or experimental models, cloud efficiency advantages diminish significantly. Local inference creates a natural environment for this diverse ecosystem of experimentation, where individual creators can tinker with unique model variants without the economic pressures of cloud optimization. The economics of inference shift dramatically when batch sizes drop, making local deployment not just a philosophical choice, but potentially the more practical one for the vast creative frontier beyond mainstream model applications.
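A deliberately simplified back-of-envelope sketch of that shift, with every number an illustrative assumption rather than a measurement:

```python
# Back-of-envelope: how batch size drives cloud cost per token.
# Every number here is an illustrative assumption, not a measurement.
HOURLY_GPU_COST = 2.50           # assumed rental price of one cloud accelerator, $/hour
TOKENS_PER_SEC_PER_STREAM = 40   # assumed decode speed of a single request stream

def cost_per_million_tokens(batch_size: int) -> float:
    """Cost of generating 1M tokens when batch_size requests share one GPU.

    Assumes throughput scales roughly linearly with batch size until the
    hardware saturates; real serving curves are messier than this.
    """
    tokens_per_hour = TOKENS_PER_SEC_PER_STREAM * batch_size * 3600
    return HOURLY_GPU_COST / tokens_per_hour * 1_000_000

for b in (1, 8, 64):
    print(f"batch={b:>2}: ~${cost_per_million_tokens(b):.2f} per 1M tokens")
```

At a batch size of one, each token carries the full cost of the rented hardware; that is exactly the regime of niche, experimental models with few concurrent users, and exactly where a machine you already own becomes the more practical choice.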
The open web didn’t scale like Facebook, but it still gave us Wikipedia (a collective hallucination we all somehow agree on), blogs (digital diaries shouted into the void), and countless strange and beautiful digital gardens (where code flowers in the dark). RSS didn’t beat the algorithm, but it nourished a generation of writers and independent thinkers. Linux didn’t become the desktop OS of the masses, but it powers every Android phone and every cloud server.
While integrated AI functions will increasingly run locally on our devices, becoming mainstream in their invisibility, running powerful, user-controlled local models might remain distinct, less about mass adoption, and more about depth. And it's precisely this depth that makes them the soil from which the next great ideas grow. The weird ideas. The inefficient ideas. The ideas too fragile or too personal for the glare of the cloud. Ideas that require incubation away from the immediate pressures of monetization and mass adoption.
A place for builders to play (fine-tuning a model on their writing). For philosophers to prototype new ways of thinking (embedding concepts directly into a reasoning engine). For artists to communicate with their machines without surveillance or limitations (creating truly private digital sketchbooks). For anyone to explore the frontiers of cognition in the privacy of their own digital home.
Centralization may be the default. The easy path. The path of least resistance, the one paved with good intentions and optimized for someone else's profit. The path that smooths away all the interesting edges. The path tuned for economic efficiency above all else. It offers power and convenience, but often at the cost of intimacy and control.
But the best defaults are the ones we choose to override. Consciously. Deliberately. With an understanding of the trade-offs.
Local isn’t about beating the cloud; it’s about dreaming alongside it, but differently: carving out spaces where intelligence can be personal, sovereign, and maybe wild, where it can feel less like a utility relentlessly optimizing and more like a companion quietly understanding.
Ultimately, running your own model locally isn't merely about computing efficiency or privacy. It's about reclaiming the digital spaces where our most intimate thoughts dwell—making intelligence truly ours. It's about bringing the mind home.
Thank you to Yondon Fu, Robert Miller, and Jad Esber for the kind review. If you're building, tinkering, or simply dreaming of what local AI could become, I'd love to hear from you.
I’ve previously explored similar ideas in “People as Files” and “The Last Human Choice.”