How to Build a World-Class CLI

$ expensicat login
Waiting for authorization...
Open: expensicat.com/device
Code: A7K-9MX
Logged in successfully!

$ expensicat transaction list
Date    Name           Amount    Status
Apr 15  Vercel Pro     -$20.00   posted
Apr 14  Stripe Payout  +$2,400   posted
Apr 12  Figma Team     -$45.00   posted

npx @expensicat/cli

A weird thing happened to the command line in the last eighteen months.

For twenty years, the CLI was treated as a legacy interface. It was kept alive for sysadmins and CI pipelines, but otherwise modern products built a web app instead. Startups shipped dashboards. Enterprises shipped desktop clients. CLIs got treated as a compliance checkbox: "yes, we have one."

Then AI agents started using them. Not because the dashboards didn't exist, but because CLIs turned out to be cheaper to reason about, faster to call, and easier to compose. A recent comparison of agent tool-calling patterns found agents using CLIs hit roughly 100% task completion with 10–32× fewer tokens than the same work done through MCP or other protocols. Agents are trained on millions of shell examples. Pipes and exit codes are their native tongue.

That shift changed what a CLI is for. In 2026, a command-line tool is simultaneously a developer UX and a machine-readable API surface. If you build it for one audience, you fail the other.

Most CLIs fail the other.

What actually separates good CLIs from bad ones

When we rebuilt the Expensicat CLI last month, we started by auditing what makes the CLIs people love (gh, stripe, fly, supabase) different from the ones they tolerate. The Command Line Interface Guidelines cover the fundamentals, but the guidelines were written in 2020, before agents became a first-class user. We added the things clig.dev doesn't yet say.

Ten practices. Any of them moves your bar up a notch. All ten together is the difference between "works" and "something people recommend to a friend."

Most of the rest of this post has code samples in it, but you don't need to write code to follow along. If you ship any product that has a CLI (or might need one), the decisions here are product decisions first and implementation details second.

1. Dual citizenship from the start

Your CLI has two users: a human at a terminal, and a non-human calling execSync from a script or an agent. They want different things.

Humans want colors, progress spinners, and a confirmation prompt before the scary thing. Agents want structured output, stable contracts, no prompts, predictable exit codes, and a way to discover your commands programmatically.

The move is detecting which one is running and adapting:

  • Auto-format your output. Tables in a terminal, JSON when piped. Let users override with --json or --table.
  • Ship a --describe flag that dumps the full command tree as JSON. Name, description, flags, examples, enum values, everything. An agent can call it once and know your entire surface.
  • Only show interactive prompts when both process.stdin.isTTY and process.stdout.isTTY are true. Never prompt an agent.
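The --describe idea from the list above can be sketched as a plain serialization of the command tree. This is a minimal illustration, not Expensicat's actual schema: the CommandSpec and FlagSpec shapes, the --status flag, and its enum values are assumptions made up for the example.

```typescript
// Hypothetical spec shapes; field names are assumptions for this sketch.
interface FlagSpec {
  name: string;
  description: string;
  enumValues?: string[];
}

interface CommandSpec {
  name: string;
  description: string;
  flags: FlagSpec[];
  examples: string[];
  subcommands: CommandSpec[];
}

// Serialize the whole command tree so an agent can read it in one call.
function describe(root: CommandSpec): string {
  return JSON.stringify(root, null, 2);
}

// Illustrative tree; the --status flag and its values are invented.
const tree: CommandSpec = {
  name: "expensicat",
  description: "Expensicat CLI",
  flags: [{ name: "--json", description: "Force JSON output" }],
  examples: [],
  subcommands: [
    {
      name: "invoice",
      description: "Manage invoices",
      flags: [],
      examples: [],
      subcommands: [
        {
          name: "list",
          description: "List invoices",
          flags: [
            {
              name: "--status",
              description: "Filter by status",
              enumValues: ["draft", "sent", "paid"],
            },
          ],
          examples: ["expensicat invoice list --status paid"],
          subcommands: [],
        },
      ],
    },
  ],
};

console.log(describe(tree));
```

Keeping the spec as data rather than scattering it across handler code is what makes this cheap: the same object can feed the help renderer and the --describe output.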

If your CLI still calls inquirer.prompt() without a TTY check, an agent calling it will hang forever waiting for input that will never come. That's not a compatibility issue. It's a broken contract.
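The detection rules can be sketched as two small pure functions. This is one possible shape under stated assumptions: the Streams and FormatFlags types are hypothetical, and the sketch treats a prompt as safe only when both stdin and stdout are terminals.

```typescript
type OutputFormat = "json" | "table";

// Hypothetical inputs; a real CLI would fill these from the process streams.
interface Streams {
  stdinIsTTY: boolean;
  stdoutIsTTY: boolean;
}

interface FormatFlags {
  json?: boolean;
  table?: boolean;
}

// Explicit flags win; otherwise tables for terminals and JSON for pipes.
// If both flags are set, this sketch lets --json win.
function pickFormat(flags: FormatFlags, streams: Streams): OutputFormat {
  if (flags.json) return "json";
  if (flags.table) return "table";
  return streams.stdoutIsTTY ? "table" : "json";
}

// Prompt only when a person can both see the question and answer it.
function canPrompt(streams: Streams): boolean {
  return streams.stdinIsTTY && streams.stdoutIsTTY;
}
```

In Node, the inputs would come from process.stdin.isTTY === true and process.stdout.isTTY === true. The explicit comparison matters because isTTY is undefined, not false, when the stream is not a terminal.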

2. Help that leads with examples

Auto-generated help is a flag dump. It's a useful reference, but a terrible first impression. The clig.dev maintainers are explicit about this: lead with examples, not options.

$ expensicat invoice create --help

Usage: expensicat invoice create|new [options]

Create an invoice

Options:
  --customer-id <value>
  --template-id <value>
  --due-date <value>
  --items <value>
  ...

Examples:
  $ expensicat invoice create
  $ expensicat invoice create --customer-id cust_123 --items '[...]'

The Examples: block at the bottom is the single most-read part of our docs. It turns help output from "here are the levers" into "here's how people actually use this." Every command in our CLI gets at least one example. Seeding seventy-nine of them across the codebase took half a day of typing that we'd do again tomorrow.
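A minimal sketch of a help renderer that appends an Examples: block, assuming a hypothetical HelpSpec shape. It reproduces the layout shown above and is not how Commander builds its help text.

```typescript
// Hypothetical spec for one command's help text.
interface HelpSpec {
  usage: string;
  summary: string;
  options: string[];
  examples: string[];
}

// Render usage, summary, options, and (if any) an Examples: block.
function renderHelp(spec: HelpSpec): string {
  const lines: string[] = [`Usage: ${spec.usage}`, "", spec.summary, "", "Options:"];
  for (const option of spec.options) lines.push(`  ${option}`);
  if (spec.examples.length > 0) {
    lines.push("", "Examples:");
    for (const example of spec.examples) lines.push(`  $ ${example}`);
  }
  return lines.join("\n");
}

const invoiceCreateHelp = renderHelp({
  usage: "expensicat invoice create|new [options]",
  summary: "Create an invoice",
  options: ["--customer-id <value>", "--due-date <value>"],
  examples: ["expensicat invoice create", "expensicat invoice create --customer-id cust_123"],
});

console.log(invoiceCreateHelp);
```

If you are on Commander.js, you likely don't need a custom renderer: its addHelpText('after', text) method appends extra text to a command's generated help.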

3. Typo suggestions, always

$ expensicat invoic list
error: unknown command 'invoic'
(Did you mean 'invoice'?)

One line of code. In Commander.js it's called showSuggestionAfterError(), and most CLI frameworks have an equivalent. Turn it on. Every time someone fat-fingers a noun and gets a useful redirect instead of a dead-end error, your tool feels less frustrating.

Worth doing at two levels: unknown root command and unknown subcommand. The cost is nothing. The polish is disproportionate to the effort.
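If your framework has no built-in suggestion feature, the idea is small enough to write by hand. The sketch below uses plain Levenshtein edit distance with an arbitrary cutoff of 2. Both choices are assumptions for illustration and not Commander's actual matching logic.

```typescript
// Classic dynamic-programming edit distance (Levenshtein).
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the closest known command, or undefined if nothing is close enough.
function suggest(input: string, known: string[], maxDistance = 2): string | undefined {
  let best: string | undefined;
  let bestDistance = maxDistance + 1;
  for (const candidate of known) {
    const distance = editDistance(input, candidate);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}

const commands = ["invoice", "transaction", "customer"];
const hit = suggest("invoic", commands);
if (hit !== undefined) console.log(`(Did you mean '${hit}'?)`);
```

The same function covers both levels: call it with the root command names for an unknown noun and with that noun's subcommands for an unknown verb.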

4. Errors that tell you what to do next

The worst pattern in CLIs:

error: Unauthorized

Fine. Now what?

The better pattern:

error [AUTH_EXPIRED] Session expired.
  → Run `expensicat login` to re-authenticate.

Three pieces, in order. A structured code (machine-readable, stable). A sentence about what happened (human-readable). A next-step line. For scripts, the same error rendered as --json includes {code, message, hint} so an agent can branch on code directly instead of parsing a message string.

Every error in our CLI ships a hint now. We backfilled six categories (auth, not-found, network, rate-limit, server, validation) and moved the remediation text out of the message field into hint. The convention: message describes what happened, hint tells you what to do next. Three evenings of work. A permanent reduction in support questions.
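A minimal sketch of the code / message / hint convention, assuming a hypothetical CliError shape. The human rendering matches the example above. The JSON field names come from the text, while emitting hint as null when absent is an assumption.

```typescript
// Hypothetical error shape following the {code, message, hint} convention.
interface CliError {
  code: string;    // stable, machine-readable
  message: string; // what happened
  hint?: string;   // what to do next
}

// One error, two renderings: a line-oriented one for people, JSON for scripts.
function renderError(err: CliError, asJson: boolean): string {
  if (asJson) {
    return JSON.stringify({ code: err.code, message: err.message, hint: err.hint ?? null });
  }
  const head = `error [${err.code}] ${err.message}`;
  return err.hint ? `${head}\n  → ${err.hint}` : head;
}

const authExpired: CliError = {
  code: "AUTH_EXPIRED",
  message: "Session expired.",
  hint: "Run `expensicat login` to re-authenticate.",
};

console.log(renderError(authExpired, false));
console.log(renderError(authExpired, true));
```

An agent consuming the JSON form can branch on the code field alone and never needs to read the message string.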

5. Destructive operations: confirm, require --yes, or abort

A CLI that lets you run expensicat invoice delete inv_123 without pausing is a foot-gun. The delete happens as fast as you hit enter, except when you hit enter on the wrong shell, wrong terminal, or wrong invoice ID.

Here's the rule we landed on:

Context                            Flag       Result
Interactive terminal               no --yes   Prompt "Delete invoice inv_123? (y/N)"
Interactive terminal               --yes      Delete
Non-interactive (pipe, CI, cron)   no --yes   Abort with a usage error asking for --yes
Non-interactive                    --yes      Delete

The third row is the one that matters. A non-interactive shell can't prompt, so either you silently delete (and people lose data), or you require explicit consent. gh does the explicit-consent version. We copied it.

Applied to all ten of our delete and remove commands. Zero extra friction in the 99% case. Impossible to accidentally delete in the 1%.
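The four-row rule reduces to a two-line decision function. A sketch, with hypothetical names:

```typescript
type DestructiveDecision = "prompt" | "proceed" | "abort";

// --yes always proceeds; otherwise prompt if a human is there, abort if not.
function decideDestructive(isInteractive: boolean, yes: boolean): DestructiveDecision {
  if (yes) return "proceed";
  return isInteractive ? "prompt" : "abort";
}

console.log(decideDestructive(false, false)); // the row that matters: no TTY, no --yes
```

Keeping the decision separate from the prompt and the delete call makes it trivial to unit-test and to reuse across every delete and remove command.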

6. Sensible defaults, explicit overrides

A CLI that shows you all the data by default is wrong. It should show you your data, with a flag to widen the scope.

Bad:

$ expensicat tracking list   # dumps time entries for the entire team

Good:

$ expensicat tracking list              # your entries
$ expensicat tracking list --user all   # everyone's
$ expensicat tracking list --user <id>  # someone specific

We built this as a reusable pattern: --user me/all/<uuid>. The API resolves me to the session's user id on the server side. The same shape works for task list --assignee me/all/<uuid>. The move that matters is making the default match the 90% case and making the widening explicit. Nobody has to remember which flag to always pass just to see their own stuff.
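On the client side, the me/all/<uuid> pattern can be sketched as a small parser that defaults to the caller's own data. The UserScope type and the UUID regex are assumptions for illustration. Resolving me to a real user id stays on the server, as described above.

```typescript
// Hypothetical parsed form of the --user (or --assignee) flag.
type UserScope =
  | { kind: "me" }
  | { kind: "all" }
  | { kind: "user"; id: string };

const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// No flag means "me": the default matches the common case.
function parseUserScope(flag: string | undefined): UserScope {
  if (flag === undefined || flag === "me") return { kind: "me" };
  if (flag === "all") return { kind: "all" };
  if (UUID_PATTERN.test(flag)) return { kind: "user", id: flag };
  throw new Error(`Invalid --user value "${flag}": expected me, all, or a UUID`);
}

console.log(parseUserScope(undefined)); // default scope
console.log(parseUserScope("all"));     // explicit widening
```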

7. Progress for anything that takes more than a second

Silent CLIs feel broken. A fetch that takes five seconds with no output is indistinguishable from a hang. Add a spinner. Not everywhere, just where it counts:

  • Uploads and downloads
  • Long list pagination (--fetch-all)
  • Any network call that might exceed one second
  • Background work like auth refresh, if it's user-facing

And critically: suppress the spinner when stdout is being consumed by a script. A spinner rendering between JSON lines is worse than silence.

withSpinner('Uploading…', () => upload(file), { silent: opts.json })

One wrapper, TTY-aware, cleanly cancellable on Ctrl-C. Every slow operation gets it.
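A minimal sketch of what such a wrapper might look like. The SpinnerOptions shape, the injected write function, and the frame set are assumptions, and Ctrl-C handling is left out to keep the example short.

```typescript
// Hypothetical options; `write` is injected so the sketch stays testable.
interface SpinnerOptions {
  silent?: boolean;                // e.g. true for --json or a non-TTY stream
  write?: (chunk: string) => void; // where frames go
  intervalMs?: number;
}

const FRAMES = ["|", "/", "-", "\\"];

// Run `task`, drawing a spinner until it settles. Silent mode just runs it.
async function withSpinner<T>(
  label: string,
  task: () => Promise<T>,
  options: SpinnerOptions = {},
): Promise<T> {
  const { silent = false, write = () => {}, intervalMs = 80 } = options;
  if (silent) return task();

  let frame = 0;
  const draw = () => write(`\r${FRAMES[frame++ % FRAMES.length]} ${label}`);
  draw();
  const timer = setInterval(draw, intervalMs);
  try {
    return await task();
  } finally {
    // Runs on success and on failure, so the line is always cleaned up.
    clearInterval(timer);
    write("\r\x1b[2K"); // carriage return + ANSI "erase line"
  }
}
```

In a real CLI, write would typically be something like (chunk) => process.stderr.write(chunk). Sending frames to stderr instead of stdout is a common way to keep piped output clean even if the silent check is missed somewhere.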

8. Update itself like Claude Code does

Claude Code updates itself in the background. You launch it, it checks npm once a day, and if there's a newer version it spawns a detached npm install -g that finishes while you're doing real work. Next launch, you're on the new version. Nobody ran a command. Nobody saw a banner that blocked anything.

This pattern is under-appreciated. CLIs that ship weekly updates but whose users sit on three-month-old versions are broken by default. You have to meet users halfway. Fetch automatically, install in the background, stay out of the way.

Three opt-outs, in order of specificity:

  • --no-auto-update for a single invocation
  • EXPENSICAT_DISABLE_AUTOUPDATER=1 for the current shell
  • auto_update = false in config for permanent

Plus an expensicat upgrade command for when someone wants to force a check right now. For direct-binary or brew installs, fall back to a "new version available" notice. Never try to rewrite a binary you didn't manage.
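The opt-out precedence and the once-a-day check can be sketched as one pure function. The UpdateInputs shape and the stored lastCheckMs timestamp are assumptions. The flag, environment variable, and config key are the ones named above.

```typescript
// Hypothetical inputs gathered at startup.
interface UpdateInputs {
  noAutoUpdateFlag: boolean;               // --no-auto-update
  env: Record<string, string | undefined>; // e.g. process.env
  configAutoUpdate?: boolean;              // auto_update in config
  lastCheckMs?: number;                    // when the last check ran, if ever
  nowMs: number;
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Any opt-out disables the check; otherwise check at most once a day.
function shouldCheckForUpdate(input: UpdateInputs): boolean {
  if (input.noAutoUpdateFlag) return false;
  if (input.env["EXPENSICAT_DISABLE_AUTOUPDATER"] === "1") return false;
  if (input.configAutoUpdate === false) return false;
  if (input.lastCheckMs === undefined) return true;
  return input.nowMs - input.lastCheckMs >= ONE_DAY_MS;
}

console.log(shouldCheckForUpdate({ noAutoUpdateFlag: false, env: {}, nowMs: Date.now() }));
```

The background install itself is outside this sketch. In Node it would likely be a child process started with the detached option and ignored stdio, followed by unref(), so the parent can exit without waiting for it.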

9. Batch input that agents can stream into

An agent that wants to create fifty customers should never spawn fifty processes. It should pipe:

cat customers.jsonl | expensicat customer create --batch - --concurrency 5

JSONL in, JSONL out. One object per line going in, one result per line coming out: {ok, data, error, input}. Validation is per item. Errors are tagged with line numbers. Partial failures get their own exit code (we use 9, named PARTIAL_FAILURE) so || branches in shell know the difference between "all failed" and "most worked."

This one feature is what turns a CLI from "an agent can shell out to it" into "an agent can actually build workflows with it." On our end it cost a 150-line runtime and a boolean on six commands' specs. The workflows it unlocks are worth the investment many times over.
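A simplified sketch of the per-line contract, not the 150-line runtime itself. It processes lines sequentially and leaves out --concurrency and real streaming. The BatchResult shape follows the {ok, data, error, input} fields above plus a line number. Exit code 9 is from the text, while 1 for total failure is an assumption.

```typescript
interface BatchResult {
  ok: boolean;
  line: number;   // 1-based line number in the input
  input: string;  // the raw line, echoed back
  data?: unknown;
  error?: string;
}

const EXIT_OK = 0;
const EXIT_FAILURE = 1;         // assumption: generic failure code
const EXIT_PARTIAL_FAILURE = 9; // PARTIAL_FAILURE, as described in the text

// Parse and handle each JSONL line independently; one bad line never stops the rest.
function runBatch(
  jsonl: string,
  handle: (item: unknown) => unknown,
): { results: BatchResult[]; exitCode: number } {
  const results: BatchResult[] = [];
  jsonl.split("\n").forEach((raw, index) => {
    const input = raw.trim();
    if (input === "") return; // skip blank lines
    const line = index + 1;
    try {
      results.push({ ok: true, line, input, data: handle(JSON.parse(input)) });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      results.push({ ok: false, line, input, error });
    }
  });
  const failures = results.filter((r) => !r.ok).length;
  const exitCode =
    failures === 0 ? EXIT_OK : failures === results.length ? EXIT_FAILURE : EXIT_PARTIAL_FAILURE;
  return { results, exitCode };
}

// Stand-in for the real API call; the validation rule is hypothetical.
function createCustomer(item: unknown): unknown {
  const name = (item as { name?: string }).name;
  if (!name) throw new Error("name is required");
  return { id: `cust_${name.toLowerCase()}` };
}

const batch = runBatch('{"name":"Ada"}\nnot json\n{"name":"Grace"}', createCustomer);
console.log(batch.results.map((r) => JSON.stringify(r)).join("\n")); // JSONL out
console.log(batch.exitCode); // 9 here: one of three lines failed
```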

10. Welcome first-run users, don't reject them

What happens the first time someone runs your CLI? For most CLIs, this:

error: Not authenticated.

Red text. Exit code 1. That's how we greet our new users.

Replace it with:

Welcome to Expensicat.

  Run `expensicat login` to get started.
  See `expensicat --help` for what's possible.

Exit 0. This isn't an error, it's an onboarding. The distinction matters. "User isn't logged in yet" is a different state from "user had a session that expired." Treat them differently. An auth-expired error should still be an error (exit 3, red, with the hint to re-auth). A brand-new user should be greeted.

One error code split (AUTH_NEW_USER vs AUTH_EXPIRED), one branch in the renderer. Measurably better first impression.
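The renderer branch might look like the sketch below. The Rendered shape is hypothetical. The two codes, the banner text, and the exit codes 0 and 3 are taken from the text above.

```typescript
type AuthState = "AUTH_NEW_USER" | "AUTH_EXPIRED";

// Hypothetical render result: what to print and how to exit.
interface Rendered {
  text: string;
  exitCode: number;
  isError: boolean;
}

// New users get a welcome and exit 0; expired sessions stay real errors.
function renderAuthState(state: AuthState): Rendered {
  if (state === "AUTH_NEW_USER") {
    return {
      text: [
        "Welcome to Expensicat.",
        "",
        "  Run `expensicat login` to get started.",
        "  See `expensicat --help` for what's possible.",
      ].join("\n"),
      exitCode: 0,
      isError: false,
    };
  }
  return {
    text: "error [AUTH_EXPIRED] Session expired.\n  → Run `expensicat login` to re-authenticate.",
    exitCode: 3,
    isError: true,
  };
}

console.log(renderAuthState("AUTH_NEW_USER").text);
```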

Why all of this matters more now

Each of those ten practices stands on its own for human users. The reason to invest in all of them now, not when you have more time or more users, is that every one of them also makes your CLI better for agents.

  • --describe lets an agent learn your whole surface without scraping help text.
  • Structured errors with codes let an agent branch reliably on "code": "NOT_FOUND" instead of regex-matching a message.
  • --json everywhere means no HTML parsing and no screen-scraping.
  • --yes on destructive operations gives agents explicit-consent ergonomics that an audit trail can rely on.
  • Sensible defaults mean agents don't need to pass six flags just to see their own data.
  • Batch input means agents don't spawn one process per record.
  • Exit codes that distinguish total failure from partial failure mean agents can retry intelligently.

A CLI designed to be conversational with humans is, almost by accident, the CLI design that agents work with best. The overlap between "good UX for a developer" and "good contract for an LLM" is closer to 95% than it looks from the outside.

And the inverse holds. A CLI that's sloppy for humans is useless for agents. If your error output is a wall of untagged text, an agent can't parse it. If your help is a flag dump with no examples, the agent has to re-derive your semantics from scratch. If you prompt for confirmation without a --yes flag, the agent just hangs.

What this looks like in practice

We took our own CLI through every one of these ten items over a couple of weeks. The git history runs from a bare-bones Commander app to something that behaves like gh or stripe: confirmation prompts with --yes, --describe for agents, spinners on uploads, auto-update via npm, batch JSONL input, typo suggestions, welcome banners, actionable error hints. Seventy-nine commands, each with examples, short aliases (ls, rm, get, new), and a user-scoped default.

None of it is flashy. No single line here is a feature anyone is going to tweet about. It's the cumulative weight of a hundred tiny design choices that makes the difference between a CLI people tolerate and a CLI people (and their agents) actually rely on.

The tools exist. The patterns exist. The clig.dev guidelines exist. The only thing between most CLIs and that higher bar is the decision to care about every one of those small choices instead of shipping the first thing that runs.

If you're building a product in 2026, your CLI isn't a checkbox. It's a surface your agents will live on whether you designed for that or not. The question isn't whether to invest. It's how long you're willing to leave that surface rough.

If you want to see what this looks like in the flesh, expensicat ships on npm, with full docs and a machine-readable --describe schema that agents can call today.