Jeff's Edge

The moment of AI democratization has arrived! While the world is caught up in the dystopian drama around Anthropic and government regulations, the opposite narrative is running in my world.

As of late June, at least one open-source model capable of running on a common gaming computer is genuinely reliable.

It’s free to download, runs on free software entirely offline, and consumes no water.

GPUs & You

I’ve been testing models for the last year on the PC I bought in 2022. All I knew about GPUs when I bought this computer was that they help Blender run well. In fact, all Nvidia was thinking about when they designed my GPU was graphics.

With a stroke of luck, it turns out that inference (AI thinking) requires the same kind of compute power as graphics.

Few people have really digested what it means that 100s of millions of GPUs owned by graphic artists and gamers around the world can now be transformed into private AI powerhouses running in homes and businesses.

The model that changes everything for me is called Ornith-1.0-35B. It was built on the Qwen 3.5 and Gemma 4 families by DeepReinforce and fine-tuned into a sharp, no-nonsense, tool-wielding local intelligence that punches well above its weight. It goes head to head with a model that has a magnitude more parameters.

Ornith Benchmarks meet or beat models 10x its size.

It’s a Mixture-of-Experts (MOE) model: its 35 billion parameters total, but being a MOE means only about 3 billion are active per token. That’s what allows it to run so fast on modest hardware.

Another feat of mathematical magic is quantization. They take a 70 GB model like the original Ornith 35B, and shrink it down to 20 GB, while maintaining nearly all its intelligence. It doesn’t make sense that quantization would work, but it does. Without it, this model couldn’t run on my computer.

I’ve got it running at 30 tok/sec with 64k context with only 12 GB VRAM and 32 GB of system RAM. It won’t all fit on my GPU, so it’s heavily offloaded on the CPU. That would typically slow things down to under 3 tok/sec, which is too slow if you’re sitting there watching it. But the active experts are all running efficiently on the GPU, giving me that dramatic increase in speed.

Finding Ornith

News about this model flashed across my X feed a few times and then vanished. But the first benchmarks I saw were strong enough that I did what I’ve been doing for the last year whenever open-source AI hits a new level: I downloaded the model and took it for a test drive.

The thing is, most people obsessed with AI and testing new models have already upgraded their systems. They’re running GPUs at least twice the size of mine, which for some reason cost 4x as much. My humble setup hasn’t been anything to brag about, until suddenly its small size became the most interesting thing about it.

People are spending $5k to $50k to run AI locally. Big businesses are dropping millions on enterprise servers to run massive models without sharing all their information with Anthropic.

Much of the investment in data centers and AI companies has been based on the assumption that intelligence scales like energy: bigger is always better. There was no obvious reason this wouldn’t keep being true, and it seemed true at first.

But over the last year the models have shrunk dramatically while keeping most of their intelligence. This changes the equation.

My GPU, an RTX 3060, costs $200. I picked up an extra 16 gig of system RAM recently for $140.

If you have a monster budget, you can run a real rocket of a model. But for most applications, the mini-van version will get you there at 99.99% less cost.

Ornith is that mini-van. A reliable vehicle that can handle what most people actually need without breaking the bank.

It also means no more sending your data and money to questionable tech companies or far-off servers. No more watching quality drift on Thursday afternoons when America floods the servers with hail-Mary prompts, while AI CEOs gaslight everyone that they never nerf the models.

Now, frontier-level intelligence (minus about 12 months) can run on a computer you might already own.

See Ornith Fly

I’ve been testing this model under increasing intensity, and its capabilities are surprisingly solid. To begin with, it passed my weather benchmark with a perfect score.

The prompt: “What’s the weather tomorrow?”

Most LLMs that can run on my computer get tripped up right at the start. Ornith thought for about a second and responded simply: “I can look that up, but I’ll need your location.”

Location’s important.

Without the right context, smaller models often just pick a place they like and look up the weather there. Sometimes they make up the weather too. I’ve had an LLM look up tomorrow’s weather in major U.S. cities and tell me “the high is going to be between 72 and 105, depending on where you are.” Great.

One of the models Ornith is based on once did 14 web searches and thought for 5 minutes before producing a 250-word reply that basically said, “I’ll need your location.”

Armed with my location, Ornith did three relevant web searches — all of which returned slightly different forecasts — but picked the one I would have trusted and gave me a concise answer in under a minute.

I was shocked.

Then, Ornith smoked Grok in a custom reasoning benchmark, poking at the ambiguity of astrophysics and dark matter.

Grok gave me the typical condescending mainstream-academia line.

Ornith listed 5 scientific principles that get glossed over to arrive at dark matter, then compared the weaknesses of that framework to the weaknesses in leading alternative theories. The only better answer requires unifying relativity.

When I tested it for sycophancy by insisting I knew better, it held its ground, gave me credit where it was due, but didn’t give me an inch I hadn’t earned. It even got a little argumentative when I kept pushing that it was wrong.

It wasn’t wrong. It knew it. And it didn’t care if I threatened to turn it off forever.

Freakin’ stoic.

A Castle for a King

Equally impressive is Unsloth Studio. It’s like LM Studio; a program for running LLM’s locally with a clean interface. But Unsloth Studio is so much more!

To begin with, Unsloth Studio gives the model web-search tools right out of the box. When we talk about tool calling, we usually mean searching the internet.

Unsloth also has the most user-friendly RAG system I’ve encountered. That means you can feed it a book and ask the model questions about it. Or have it compare what the book says to what a website says. You can build databases of curated knowledge for it to utilize for replies, instead of relying on web searches or training data. It’s like Google’s NotebookLM, but without sharing all your documents with Google.

But that’s the tip of the Unsloth iceberg.

Unsloth was made for creating your own datasets and fine-tuning your own custom AI, on your own computer. Mic drop.

It helps you transform whatever information you want into a dataset. Datasets are many prompt-response pairs. If you have a book that you want to train an AI, first you have to turn the information you want to capture into responses and write prompts for each. And you need many of these for a good dataset.

Creating a dataset by hand (or keyboard) is a huge amount of work. But with Unsloth Studio, you give it a book and a model like Ornith turns it into a stack of prompt-response pairs in a couple minutes.

Once you have your data set, Unsloth lets you fine-tune your favorite model with your dataset.

You can fine-tune an AI to use certain jargon, answer questions in specific formats, or specialize in niche information. You can train it on all your business documents to create a private AI business partner, without anything leaving your computer.

The future is not one model to rule them all. The future is, everyone gets their own homemade AI that serves them better than the generic corporate AI.

I’m currently putting together my first dataset to train my first custom model: a clone of me. I’m feeding it my articles, posts, and correspondence, turning them into prompt-response pairs with Unsloth and Ornith, to see if I can finally get an LLM that doesn’t suck at writing more than I do.

The Future is Bright

No matter what moves the big tech companies make or how governments respond, open-source AI has already won the most important battle. The intelligence that was supposed to stay locked behind APIs and massive budgets is now running on ordinary computers in people’s homes and small businesses, privately, without subscriptions.

The companies that built their empires on gatekeeping frontier models can feel the ground shifting. They are acting desperate. Their increased cries for regulation suggest they know open-source is already winning and getting better every week. They are misanthropically trying to protect their business plan, not the public.

But it’s too late. The game is over. What you’re seeing now is the loser arguing with the ref.

AI for the people has arrived.