Recently, I was thinking again about where AI is going and wondering what I, as a developer, should or should not be building. I wrote a post about my thoughts and concluded two things:
Firstly, cloud-based AI gets roughly 90% cheaper every year. If your AI application costs $1 per day to run now, next year it'll cost 10 cents, and just a penny the year after that. And while the price drops, the models keep getting better.
Secondly, the best AI tools don't necessarily come from the big technology companies. Take Cursor AI for example. Microsoft had everything needed to make the best AI code editor: the most GPUs, the most popular code editor (Visual Studio Code), and years of AI experience. Yet a small startup built Cursor, which many developers now prefer. The same happened with DeepSeek. Google, Meta, Microsoft, and Amazon are all spending billions on developing the best models, but DeepSeek came out of nowhere and delivered great results. This isn't new. The same thing happened with Google in the early 2000s: AltaVista was the biggest search engine until Google, a small newcomer, made something better.
This got me thinking about building AI tools. It's probably time to design new tools from the ground up instead of taking existing tools and putting AI on top. That puts us back in the exciting hacker-and-tinkerer era of the 2000s, when young people came up with the new ideas. Think of Aaron Swartz, who helped develop RSS, the technical architecture for Creative Commons, and Reddit.
This got me thinking about how to run these models. I looked into running the state-of-the-art DeepSeek R1 model locally. The results weren't great. You need a $2000 EPYC server just to run the model, and it's very slow - about 2-3 tokens per second. That means waiting 10 to 20 minutes for one response. Also, you have to install many dependencies.
Running AI locally has clear benefits. You don't depend on anyone else, you're not locked into a provider, and you won't get surprise bills. But there's a simple choice to make:
Buy $2000 worth of hardware to run models locally, or
Use cloud services at $0.60 per million tokens (probably $0.06 next year).
Looking at these options, the cloud wins by a wide margin: at $0.60 per million tokens, the $2000 that the hardware costs buys you more than 3 billion tokens. So the cloud is probably the way to go.
Now back to the question of what to build. I was thinking: what if I made a terminal tool that summarizes PDFs? You'd point it at a PDF, wait a few seconds while it processes, and get a summary. Then you could pipe that summary into another tool to ask questions or create images. Or what about a tool that takes a text and reads it out loud, so you can hear whether the text flows well?
Rust is a great language for building tools like this. It has great support for distributing applications, either via cargo or as a standalone binary. It's fast, has a great package manager, and has good support for parallel processing. But compared to Python, Rust has far fewer AI-related packages for running models locally. Because of this, some people conclude that Rust isn't ready for machine learning yet. But Rust can do machine learning just fine if we use the cloud: all you need is good HTTP and JSON support, which Rust has.
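To illustrate how little is needed, here is a rough, hand-rolled sketch (not transformrs code) that calls an OpenAI-compatible chat endpoint with reqwest and serde_json. The endpoint URL and the DEEPINFRA_KEY environment variable are assumptions on my part:
use serde_json::json;

// Hand-rolled sketch: call an OpenAI-compatible chat completions endpoint
// using plain HTTP and JSON. The URL and the environment variable name
// are assumptions, not part of any crate shown in this post.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let key = std::env::var("DEEPINFRA_KEY")?;
    let body = json!({
        "model": "meta-llama/Llama-3.3-70B-Instruct",
        "messages": [
            { "role": "user", "content": "This is a test. Please respond with 'hello world'." }
        ]
    });
    let resp: serde_json::Value = reqwest::Client::new()
        .post("https://api.deepinfra.com/v1/openai/chat/completions")
        .bearer_auth(key)
        .json(&body)
        .send()
        .await?
        .json()
        .await?;
    let content = resp["choices"][0]["message"]["content"]
        .as_str()
        .unwrap_or("");
    println!("{}", content);
    Ok(())
}
Doing this once is easy; handling errors, streaming, and the other endpoints robustly is where the work starts.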
However, I don't want to rely on just one cloud provider. What if the PDF tool stops working because the provider has an outage? But building both a PDF tool and handling multiple cloud providers seems like a lot of work.
That's why I created transformrs: a Rust crate that provides a unified interface across multiple AI cloud providers, so you don't have to handle each of them yourself. For example, this is how you can ask Llama 3.3 to respond with "hello world":
use transformrs::openai;
use transformrs::Message;
use transformrs::Provider;

#[tokio::main]
async fn main() {
    let messages = vec![
        Message {
            role: "system".to_string(),
            content: "You are a helpful assistant.".to_string(),
        },
        Message {
            role: "user".to_string(),
            content: "This is a test. Please respond with 'hello world'.".to_string(),
        },
    ];
    let keys = transformrs::load_keys(".env");
    let key = keys.for_provider(&Provider::DeepInfra).unwrap();
    let model = "meta-llama/Llama-3.3-70B-Instruct";
    // Using the OpenAI-compatible API for chat completions.
    let resp = openai::chat_completion(&key, model, &messages)
        .await
        .unwrap();
    println!("{}", resp.choices[0].message.content);
}
(More examples are available at https://transformrs.org.)
I've tested this example hundreds of times while building the crate. It consistently returns "hello world" (or variations like "Hello world" or "hello world!"), which surprised me. It works much better than two years ago when I built a simple chat application using ChatGPT. Back then, responses were less predictable. Asking for "hello world" might get you something wordy like "Here is the answer to your request: 'Hello world!'". Plus, it cost a lot more at $60 per million tokens.
To back up my claims about reliability and low cost, I've set up automated tests in CI that run against the actual cloud APIs. This way, you can be sure the crate will work well with whichever provider you pick.
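For example, a minimal sketch of such a test, reusing the same API as the example above (the exact test setup in the repository may differ):
use transformrs::openai;
use transformrs::Message;
use transformrs::Provider;

// Sketch of an integration test against a real provider. It assumes a
// DeepInfra key is available in `.env`, like in the example above.
#[tokio::test]
async fn chat_completion_says_hello_world() {
    let messages = vec![
        Message {
            role: "user".to_string(),
            content: "This is a test. Please respond with 'hello world'.".to_string(),
        },
    ];
    let keys = transformrs::load_keys(".env");
    let key = keys.for_provider(&Provider::DeepInfra).unwrap();
    let model = "meta-llama/Llama-3.3-70B-Instruct";
    let resp = openai::chat_completion(&key, model, &messages)
        .await
        .unwrap();
    let content = resp.choices[0].message.content.to_lowercase();
    // Allow variations such as "Hello world" or "hello world!".
    assert!(content.contains("hello world"));
}
A test like this catches both API changes on the provider side and regressions in the crate itself.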
The core idea is simple: one consistent interface for all providers. When you call chat_completion or any of the other functions, they should just work, regardless of which provider you're using.
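To connect this back to the PDF-summary idea from earlier, here is a rough sketch of how such a tool could look on top of transformrs. It reads already-extracted text from stdin, so any PDF-to-text tool can pipe into it; the prompt wording and the stdin-based design are my own assumptions:
use std::io::Read;

use transformrs::openai;
use transformrs::Message;
use transformrs::Provider;

// Rough sketch: read text (for example, extracted from a PDF by another
// tool) from stdin and print a model-generated summary to stdout.
#[tokio::main]
async fn main() {
    let mut text = String::new();
    std::io::stdin().read_to_string(&mut text).unwrap();

    let messages = vec![
        Message {
            role: "system".to_string(),
            content: "Summarize the following text in a few sentences.".to_string(),
        },
        Message {
            role: "user".to_string(),
            content: text,
        },
    ];
    let keys = transformrs::load_keys(".env");
    let key = keys.for_provider(&Provider::DeepInfra).unwrap();
    let model = "meta-llama/Llama-3.3-70B-Instruct";
    let resp = openai::chat_completion(&key, model, &messages)
        .await
        .unwrap();
    println!("{}", resp.choices[0].message.content);
}
Because the interface is consistent, swapping Provider::DeepInfra for another provider would be the only change needed if one of them has an outage.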
The crate currently supports:
chat completions,
streaming chat completions,
text to image, and
text to speech.
And is tested against:
OpenAI,
DeepInfra, and
Hyperbolic.
Depending on how popular the crate becomes, I will add more functions and providers. If you like it, consider trying it out or starring the repo at https://github.com/rikhuijzer/transformrs.
Thanks for reading, and I'm excited to see what you'll build!