When AIs Would Rather Use Your CLI Than curl

TL;DR

A while back I wrote about catching myself building a CLI for an AI, not for me — og, the unofficial command line for the OpenGate IoT platform. That weekend experiment just reached v1.0. This is the road-to-1.0 follow-up, and it opens with a moment I didn’t design for but can’t stop thinking about:

Given the choice, my colleagues’ AI assistants stopped reaching for curl and started reaching for og. For everything.

What the journey taught me, in four lines:

A good CLI is already an API for agents. Typed verbs, discoverable subcommands, structured output. An LLM doesn’t reconstruct auth headers and endpoint paths — it expresses intent.
Build for two users at once. Every operation is a CLI command and an MCP tool, from the same core. Humans and agents drive the same levers.
The v1.0 capstone was multi-tenant auth for the MCP server: per-request credentials in the transport, never through the model. That is the line between “neat laptop demo” and “a real multi-user product can stand on this.”
My most demanding tester was an AI. It used the tool, and it did not forgive a single rough edge. That made the tool better than any human QA would have.

Repo (Go, public): https://github.com/carlosprados/og-cli.

The moment it clicked

I made the full case in May: I’d set out to build a normal CLI for OpenGate — the industrial IoT platform we build at Amplía — and kept catching myself optimizing for a reader that never tires of verbosity, never memorizes anything, and can be handed the manual right before it acts. I was building for an LLM.

What I didn’t anticipate was how fast that would show up in the wild. As og matured, I shared builds with colleagues. Several of them work with AI assistants in their editors. And those assistants, asked to do real work against the platform, started doing something quietly remarkable: they refused to go back to curl.

Asked to “check which devices are in alarm,” an assistant didn’t hand-build a request with the right Authorization header, the right /north/v80/... path, and a bespoke search filter. It called devices_search. Asked to push a config change, it didn’t paste JSON into a half-remembered flag — it called the tool, with typed arguments, and got typed results back.

Once you watch that a few times, the question inverts. It’s no longer “how do I let AIs use my API?” It’s “why would an AI ever choose raw curl when a typed, self-describing tool is right there?” It wouldn’t. And that reframed what v1.0 needed to be.

The short version of why (the long version is the last post)

I won’t re-litigate the whole thesis here — the previous post is the deep dive. The one-paragraph recap, because it’s load-bearing for everything below:

curl is a blank page. You supply the method, the auth, the headers, the body, the exact path, the encoding — every call a fresh chance to get one of them wrong. A well-made CLI is the opposite: intent, not plumbing. Discoverable (og devices --help), typed and validated, structured on demand (--output json), and stateful in the boring ways (auth and host handled once). An LLM is precisely the consumer that benefits most, because it reconstructs intent on every turn — and models are very good at intent and very bad at remembering your URL scheme. The punchline that drove the whole project: the properties that make a CLI pleasant for a human are the same ones that make it reliable for an agent.
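To make "structured on demand" concrete, here is a minimal Go sketch of one result rendered two ways: a table for a person, JSON for a program. It is not og's actual code. The `device` type, the `render` function, and the sample data are all hypothetical, and the format names are assumptions chosen for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/tabwriter"
)

// device is a hypothetical result row; the real platform model is richer.
type device struct {
	ID     string `json:"id"`
	Status string `json:"status"`
}

// render formats the same result for a human ("table") or a program ("json").
// An unknown format is rejected with an error that names the valid choices.
func render(devs []device, format string) (string, error) {
	switch format {
	case "json":
		b, err := json.Marshal(devs)
		return string(b), err
	case "table":
		var buf bytes.Buffer
		tw := tabwriter.NewWriter(&buf, 0, 4, 2, ' ', 0)
		fmt.Fprintln(tw, "ID\tSTATUS")
		for _, d := range devs {
			fmt.Fprintf(tw, "%s\t%s\n", d.ID, d.Status)
		}
		if err := tw.Flush(); err != nil {
			return "", err
		}
		return buf.String(), nil
	default:
		return "", fmt.Errorf("unknown output format %q (want table or json)", format)
	}
}

func main() {
	devs := []device{{ID: "pump-1", Status: "alarm"}, {ID: "pump-2", Status: "ok"}}
	for _, format := range []string{"table", "json"} {
		out, err := render(devs, format)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

The point of the sketch is that the data is computed once and only the last step differs, so the human view and the machine view cannot drift apart.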

Designing for two users at once

If a CLI is already half an agent interface, the honest move is to finish the job. So og is three front doors over one body:

        og <command>                 (humans, terminal)
              │
   pkg/opengate  ──  one client, one set of operations
              │
        og mcp        (agents, Model Context Protocol)
              │
        og            (humans, interactive TUI)

CLI for the terminal, an interactive TUI for browsing, and an MCP server exposing the same operations as ~90 tools an LLM can call — over stdio for a local editor, or HTTP for a remote service. All thin layers over one Go client. The rule I held to: a capability doesn’t exist until a human can run it and an agent can call it. Humans and agents become interchangeable operators — a person can take over what an agent started, because they’re pulling the same levers.
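The "one body, several front doors" rule can be sketched in a few lines of Go. This is an illustration under stated assumptions, not og's implementation: the `operation` registry, `runCLI`, `callTool`, the `key=value` argument syntax, and the `fleet` sample data are all hypothetical. Only the tool name `devices_search` comes from the post.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sort"
	"strings"
)

// operation is one capability of the shared core. Both front doors look
// operations up here, so neither can offer something the other lacks.
type operation struct {
	description string
	run         func(args map[string]string) ([]string, error)
}

// fleet is made-up sample data standing in for the platform.
var fleet = map[string]string{"pump-1": "alarm", "pump-2": "ok", "valve-7": "alarm"}

var core = map[string]operation{
	"devices_search": {
		description: "List device IDs whose status equals the status argument.",
		run: func(args map[string]string) ([]string, error) {
			status := args["status"]
			if status == "" {
				return nil, errors.New("missing required argument: status")
			}
			ids := []string{}
			for id, s := range fleet {
				if s == status {
					ids = append(ids, id)
				}
			}
			sort.Strings(ids) // map order is random; sort for stable output
			return ids, nil
		},
	},
}

// runCLI is the human front door: e.g. "devices search status=alarm".
func runCLI(argv []string) (string, error) {
	if len(argv) < 2 {
		return "", errors.New("usage: <group> <verb> [key=value ...]")
	}
	op, ok := core[argv[0]+"_"+argv[1]]
	if !ok {
		return "", fmt.Errorf("unknown command: %s %s", argv[0], argv[1])
	}
	args := map[string]string{}
	for _, kv := range argv[2:] {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			return "", fmt.Errorf("bad argument %q, want key=value", kv)
		}
		args[parts[0]] = parts[1]
	}
	ids, err := op.run(args)
	if err != nil {
		return "", err
	}
	return strings.Join(ids, "\n"), nil
}

// callTool is the agent front door: a tool name plus JSON arguments.
func callTool(name string, rawArgs []byte) ([]byte, error) {
	op, ok := core[name]
	if !ok {
		return nil, fmt.Errorf("unknown tool: %s", name)
	}
	args := map[string]string{}
	if err := json.Unmarshal(rawArgs, &args); err != nil {
		return nil, err
	}
	ids, err := op.run(args)
	if err != nil {
		return nil, err
	}
	return json.Marshal(ids)
}

func main() {
	text, _ := runCLI([]string{"devices", "search", "status=alarm"})
	fmt.Println(text) // pump-1 and valve-7, one per line
	out, _ := callTool("devices_search", []byte(`{"status":"alarm"}`))
	fmt.Println(string(out)) // ["pump-1","valve-7"]
}
```

Both entry points do nothing but parse their own input format and call the same `run` function, which is what keeps a human and an agent pulling the same lever.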

The surfaces are deliberately tiered, not identical, and knowing what to leave out of each mattered as much as what to put in. The CLI is the complete surface. The MCP tools cover what an agent should drive remotely — and long-running log streams become bounded calls (“give me N lines or stop after T seconds”), because an agent shouldn’t hold a socket open forever. The TUI is a browser with a few high-value actions, not a data-entry form you fill with arrow keys.

The capstone: an MCP server a multi-user product can trust
#

For most of its life, the MCP server had a limitation that didn’t bother me and would have quietly killed it in production: it authenticated once, at startup, with my profile. Perfect for “me, my laptop, my editor.” Useless for a chatbot serving many users, where every request belongs to a different person with different permissions.

The v1.0 headliner fixed exactly that, around a constraint worth stating plainly — because it’s where a lot of “AI + API” integrations quietly go wrong:

Credentials must never travel through the model.

Pass a token as a tool argument and it flows through the LLM’s context — logged, cached, possibly echoed back. Non-starter. So in multi-tenant mode og mcp becomes a stateless conduit: the caller’s credentials ride in the HTTP transport headers, the model only ever sees tool names and business arguments, and the server builds a fresh, correctly-scoped client per request. No fallback to a startup identity, no cross-user leakage; a tool that needs a credential the caller didn’t present simply refuses.

The elegant part is how little the agent-facing surface had to change — the tools an LLM sees are identical. The identity plumbing moved entirely into the transport, where it belongs. The agent asks for what it wants; the infrastructure decides who it’s allowed to be. That separation is, I think, the right shape for almost any agent-facing service, and it’s what turned og’s MCP server from a personal toy into something a product could sit on.

My most demanding user was an AI

Here’s the part I find genuinely funny, and a little humbling.

I built a lot of og in tandem with a coding agent — pair programming, except one of us never tires of writing table-formatting code. But the real value wasn’t the typing. It was that the agent then became a first-class user of the tool, and it was a brutal tester.

A human developer forgives a clunky tool. We learn the quirks, work around them, stop noticing the friction after a week. An agent does not forgive. If output is ambiguous, it picks wrong. If an error is vague, it loops. If two commands do almost-the-same-thing, it second-guesses which to call. Every soft spot surfaces immediately, because the agent has no muscle memory to paper over it.

So the loop became: build a capability, then watch an AI try to accomplish a real task with it. Where it stumbled was the backlog. Output that wasn’t structured enough got structured; tools that were too vague got sharper descriptions; the dogfooding caught real rough edges in og before any unlucky user could — because my tester was relentless and had zero patience for “you just have to know.”

That loop is quietly the strongest argument for AI-native tooling I know: if you make a tool an agent can use well, you’ve almost certainly made a tool a human will love — because you were forced to delete every hidden assumption and undocumented step along the way.

The road to v1.0, briefly

Between that weekend experiment and v1.0, og grew up: automation rules and connector functions whose JavaScript you edit locally and deploy back, bulk device provisioning, dashboards and workspaces, and a South-plane IoT client that publishes telemetry and can even act as a virtual device — answering platform operations over MQTT as if it were real hardware. Eighteen command groups, ~90 MCP tools, three surfaces, one core.

Then, right before tagging v1.0, I did the least glamorous and most satisfying pass of all: I took things out. Dead code, orphaned helpers, a screen nothing reached, constants no one used. Saint-Exupéry gets quoted to death, but he earns it:

Perfection is achieved, not when there is nothing more to add, but when there is nothing more to take away.

A tool that humans and agents both depend on has to be legible. Every dead branch is a place an agent can wander into and a human has to explain away. Shipping 1.0 was as much about what I deleted as what I built.

What I’d tell you to steal

If you maintain an API, a platform, or a gnarly internal workflow and you’re wondering how to make it “AI-ready,” here’s the whole post compressed:

Build the CLI first. A clean, typed, self-describing CLI is 80% of an agent interface for free. If your CLI is pleasant, your MCP server almost writes itself.
Expose the same operations to agents over MCP, from the same core. Don’t fork a separate “AI integration.” One body, multiple front doors.
Keep credentials out of the model. Identity belongs in the transport, scoped per request. The agent expresses intent; the infrastructure decides permission.
Let an agent be your QA. It won’t forgive what humans tolerate. That’s a feature.
Then take things away. Legibility is a requirement, not a nicety, when your users can’t ask you what you meant.

og is public and small enough to read in an afternoon: https://github.com/carlosprados/og-cli. It’s an unofficial, personal project, but the patterns are the ones I’d reach for anywhere I want humans and agents operating the same system, side by side — neither one reaching for curl.

Built, dogfooded, and pruned in close collaboration with an AI coding agent — which is, after all, the whole point.
