How do I move off Cohere Trial vs Production Keys without getting stuck?

Decide when Cohere workloads must move off trial keys given per-endpoint RPM caps, monthly call ceilings, and production-only throughput expectations.

Upgrade to a standard production key if the workload is commercial, sustained, or likely to exceed 1,000 calls/month; stay on trial keys only for low-volume prototypes.

Candidates

Stay on a trial or evaluation key for prototyping only

As of 2026-03-20, Cohere documents free trial usage under evaluation keys; in this card, trial and evaluation refer to the same limited key type. These keys are capped at 1,000 API calls per month. For Chat models such as Command A, Command R+, Command R, and Command R7B, the trial limit is 20 requests per minute. Several non-chat endpoints have similarly low trial limits, including Rerank at 10 requests per minute, Embed Images at 5 inputs per minute, and EmbedJob at 5 requests per minute.

When to choose

Use this only for proofs of concept, local testing, or small internal experiments where expected usage is comfortably below 1,000 total calls per month. The decisive factor is that trial keys are explicitly not for production or commercial use, so throughput is not the only blocker.

Tradeoffs

Zero direct API spend and fastest start, but hard monthly and per-endpoint ceilings make it unsuitable for public traffic, sustained batch jobs, or commercial deployment.

Cautions

Do not assume trial is acceptable just because your RPM is low; the 1,000-calls-per-month ceiling can block long-running QA, batch evaluation, or internal dogfooding. Cohere's pricing FAQ also states trial keys are not permitted for production or commercial purposes.
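
To make the monthly-ceiling caution concrete, here is a minimal arithmetic sketch. It assumes a steady number of calls per day and uses the 1,000-call trial cap quoted above; the function name and the 31-day horizon are illustrative choices, not anything Cohere defines.

```python
import math
from typing import Optional

# Trial/evaluation monthly cap quoted in this card (as of 2026-03-20).
TRIAL_MONTHLY_CALL_CAP = 1_000


def days_until_cap(calls_per_day: int, cap: int = TRIAL_MONTHLY_CALL_CAP) -> Optional[int]:
    """Return the day of the month on which the cap is used up.

    Returns None if a steady calls_per_day never exhausts the cap
    within a 31-day month (hypothetical planning horizon).
    """
    if calls_per_day <= 0:
        return None
    day = math.ceil(cap / calls_per_day)
    return day if day <= 31 else None
```

Under these assumptions, a nightly QA job making 50 calls uses up the cap on day 20, even though its per-minute rate may be far below any RPM limit.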

Upgrade to a standard production key for commercial or sustained API traffic

As of 2026-03-20, Cohere says production keys are paid, pay-as-you-go, and intended for production use. On standard Chat models, published production limits are much higher than trial: Command A, Command R+, Command R, and Command R7B are listed at 500 requests per minute versus 20 RPM on trial. Other relevant upgrades include Rerank at 1,000 requests per minute, Tokenize at 2,000 requests per minute, Embed Images at 400 inputs per minute, and EmbedJob at 50 requests per minute. Cohere publishes the 1,000-call monthly cap for trial keys, but not a general monthly call ceiling for standard production keys.
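
As a rough illustration of what the RPM gap means for a batch job, the sketch below computes a lower bound on wall-clock time at a steady request rate. The RPM figures are the Chat numbers quoted above; the dictionary keys and function name are hypothetical, and real runs will be slower because of latency, retries, and token limits.

```python
import math

# Chat requests-per-minute figures quoted in this card (as of 2026-03-20).
CHAT_RPM = {"trial": 20, "production": 500}


def minutes_to_drain(total_requests: int, rpm: int) -> int:
    """Lower bound on minutes needed to send total_requests at a steady rpm."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return math.ceil(total_requests / rpm)


trial_minutes = minutes_to_drain(1_000, CHAT_RPM["trial"])
production_minutes = minutes_to_drain(1_000, CHAT_RPM["production"])
```

For a 1,000-request batch, this gives at least 50 minutes on a trial key (which would also consume the whole monthly trial allowance) versus at least 2 minutes on a standard production key.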

When to choose

Use this when the workload must be permitted for production or commercial use, or when expected usage exceeds any trial threshold on standard self-serve endpoints: more than 1,000 API calls per month, more than 20 RPM on Command A/Command R chat models, more than 10 RPM on Rerank, more than 5 inputs per minute on Embed Images, or more than 5 RPM on EmbedJob. This applies when you are staying on standard production-key endpoints rather than newer variants such as Command A Reasoning, Translate, or Vision.
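
The thresholds above can be turned into a simple checklist. The following is a sketch only: the endpoint labels are hypothetical names chosen for this example (not Cohere API identifiers), and the limits are the trial figures quoted in this card.

```python
# Trial per-minute limits quoted in this card; keys are hypothetical labels.
TRIAL_LIMITS_PER_MINUTE = {
    "chat": 20,
    "rerank": 10,
    "embed_images": 5,
    "embed_job": 5,
}
TRIAL_MONTHLY_CALLS = 1_000


def upgrade_reasons(commercial, monthly_calls, peak_per_minute):
    """List every reason a workload has outgrown a trial key.

    peak_per_minute maps an endpoint label to its expected peak rate.
    An empty result means no trial threshold in this card is exceeded.
    """
    reasons = []
    if commercial:
        reasons.append("production or commercial use")
    if monthly_calls > TRIAL_MONTHLY_CALLS:
        reasons.append("monthly calls exceed 1,000")
    for endpoint, limit in TRIAL_LIMITS_PER_MINUTE.items():
        if peak_per_minute.get(endpoint, 0) > limit:
            reasons.append(f"{endpoint} peak exceeds {limit}/min")
    return reasons
```

Any non-empty list is enough to justify the upgrade; a commercial workload qualifies even when every volume figure is under the trial limits.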

Tradeoffs

Removes the free-tier monthly ceiling for standard production usage and unlocks materially higher RPM on key endpoints, but introduces billed usage and production onboarding requirements.

Cautions

An owner must complete the Go to Production workflow before the team can create production keys. If your use case is marked sensitive, Cohere says the production key may remain rate-limited like a trial key until manual approval, with review taking up to 72 business hours.

Escalate to sales or Model Vault for newer Chat variants or guaranteed production throughput

As of 2026-03-20, Cohere's rate-limit docs say production keys work like trial keys for newer variants such as Command A Reasoning, and that trial plus production usage on newer Chat variants is limited to 1,000 API calls per month. For Command A Reasoning, Command A Translate, and Command A Vision, Cohere does not publish a standard self-serve production RPM and instead directs users to contact sales. Cohere's model pages also note that Command A Reasoning and Command A Translate are free until rate limits are reached, and that Command A Reasoning can be used in production through Model Vault.

When to choose

Use this when the target workload depends on Command A Reasoning, Command A Translate, Command A Vision, or on explicit production throughput guarantees that the self-serve production key path does not publish. The decisive factor is that these newer variants do not have the same self-serve production RPM profile as standard Command A or Command R.

Tradeoffs

Matches specialized or enterprise-grade production requirements better, but adds sales dependency, less self-serve predictability, and potentially dedicated-instance pricing.

Cautions

Do not plan capacity for newer Chat variants using the 500-RPM standard production numbers from Command A or Command R. Cohere's docs explicitly separate these newer variants and route production use through sales, with Model Vault as the production path called out for at least Command A Reasoning.
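
One way to keep the 500-RPM figure from leaking into plans for newer variants is to make the planning lookup refuse to return a number for them. This is a minimal sketch: the model-name strings and function name are assumptions for illustration, and any model not listed as a newer variant is treated as a standard Chat model, which may not hold for every Cohere model.

```python
from typing import Optional

# Standard production Chat RPM quoted in this card (as of 2026-03-20).
STANDARD_PRODUCTION_CHAT_RPM = 500

# Newer Chat variants named in this card; no self-serve production RPM is published.
NEWER_CHAT_VARIANTS = {
    "command a reasoning",
    "command a translate",
    "command a vision",
}


def planning_rpm(model_name: str) -> Optional[int]:
    """Return a production RPM to plan against, or None if none is published.

    None signals that capacity must be confirmed with sales (or Model Vault)
    rather than assumed from the standard Chat numbers.
    """
    if model_name.strip().lower() in NEWER_CHAT_VARIANTS:
        return None
    return STANDARD_PRODUCTION_CHAT_RPM
```

Returning None instead of a default forces the caller to handle the "no published number" case explicitly.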

Facts updated: 2026-03-20
Published: 2026-04-03
