OpenAI Batch API vs Standard Requests — when does one get cheaper?

Decide whether asynchronous jobs should shift to the Batch API now that, as of 2026-04-08, OpenAI still advertises a 50% discount versus standard request processing.

Shift eligible async workloads to the Batch API now if a 24-hour completion window and the absence of streaming are acceptable; keep standard requests only when latency or operational control matters.

Candidates

Shift eligible asynchronous workloads to the Batch API

As of 2026-04-08, OpenAI's pricing page still states that the Batch API saves 50% on input and output costs relative to standard processing. The Batch API runs work asynchronously; per the Batch API FAQ, requests are processed within a 24-hour window. OpenAI also states that batch rate limits are separate from existing limits, which helps isolate bulk jobs from online traffic. This is the clearest cost lever for non-interactive inference when waiting hours is acceptable.
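The mechanics above can be sketched in Python. A batch job takes a JSONL file where each line is one request with a `custom_id`; the file is uploaded and then submitted with a 24-hour completion window. The model name, prompts, and filename below are placeholder assumptions, and the submission steps are shown commented out for shape only, since they require an API key:

```python
import json

def build_batch_line(custom_id: str, model: str, prompt: str) -> str:
    """One line of the JSONL input file the Batch API expects:
    a custom_id plus the method, URL, and body of a normal request."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

# Write the input file, one request per line.
lines = [build_batch_line(f"task-{i}", "gpt-4o-mini", p)
         for i, p in enumerate(["Summarize document A", "Summarize document B"])]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")

# Submission sketch (requires credentials; not executed here):
# from openai import OpenAI
# client = OpenAI()
# batch_file = client.files.create(file=open("batch_input.jsonl", "rb"),
#                                  purpose="batch")
# batch = client.batches.create(input_file_id=batch_file.id,
#                               endpoint="/v1/chat/completions",
#                               completion_window="24h")
```

The `custom_id` is what lets you join results back to your own records once the output file is ready, since batch results are not returned in submission order.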

When to choose

Use this for cost-sensitive, high-scale jobs such as offline enrichment, classification, summarization, or embeddings where same-request interactivity is unnecessary. The decisive factor is whether a 24-hour completion window and no streaming are acceptable in exchange for the 50% token discount.

Tradeoffs

Lowest verified token cost for eligible async work, plus separate batch rate limits. In return, you give up low-latency responses and streaming, and completion of every individual item inside the 24-hour window is not guaranteed.

Cautions

OpenAI says Batch API support is available across most models, but not all, so verify model support before migrating. If a batch expires or is manually cancelled, completed work is still returned and billed, while unfinished work is dropped. New async platform work should avoid depending on the Assistants API because OpenAI has already scheduled its removal for August 26, 2026 in favor of the Responses API and Conversations API.
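Because completed items in an expired or cancelled batch are billed and returned, a resubmission pass only needs to cover the remainder. A minimal sketch, assuming a `request_counts` mapping with `total` and `completed` fields as exposed on batch objects (the helper name is ours):

```python
def resubmit_count(batch_status: str, request_counts: dict) -> int:
    """For an expired or cancelled batch, completed items are already
    billed and returned; report how many items were not completed and
    therefore need to go into a follow-up batch (this count includes
    any failed items, which you may want to inspect before resubmitting)."""
    if batch_status in ("expired", "cancelled"):
        return request_counts["total"] - request_counts["completed"]
    return 0

# e.g. a batch that expired with 800 of 1,000 items done:
remaining = resubmit_count("expired", {"total": 1000, "completed": 800})
# remaining = 200
```

Pairing this with stable `custom_id` values makes the follow-up batch a pure diff of the original input file.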

Keep asynchronous jobs on standard requests via Responses API or Chat Completions

As of 2026-04-08, OpenAI's standard published model prices remain the baseline, and the pricing page still frames Batch as a separate 50% discount against those standard rates. Standard requests avoid the Batch API's 24-hour processing window and support normal online request patterns, which is simpler when jobs must finish promptly or need streaming-style behavior. The main post-March 31, 2026 pricing change visible on the pricing page is for Containers billing, not a removal of the Batch discount. This means standard requests remain the operationally simpler option, but not the cheapest for bulk async work.

When to choose

Use this when jobs are asynchronous in implementation but still operationally time-sensitive, user-triggered, or dependent on immediate retries and observability. The decisive factor is that latency, predictability, or feature compatibility matters more than the 50% Batch discount.
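One concrete thing standard requests buy you is the immediate retry loop: a failed call can be retried in-process within seconds, whereas a failed batch item can only be resubmitted in a new batch. A generic backoff wrapper sketch, not specific to any SDK (the function and parameter names are ours):

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.5):
    """Call fn(), retrying on any exception with exponential backoff.
    Re-raises the last exception if all attempts fail."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch: wrap a standard request, e.g.
# with_retries(lambda: client.responses.create(model=model, input=prompt))
```

This kind of per-request control, together with ordinary logging and tracing around each call, is the operational simplicity the standard path preserves.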

Tradeoffs

Simpler operational model and no 24-hour queue semantics, but you pay full standard token prices. You also miss the batch-specific rate-limit separation that can protect online traffic from bulk backlogs.

Cautions

Do not anchor new async integrations on the Assistants API: as of 2026-04-08, OpenAI has already announced its shutdown date as August 26, 2026, with Responses API and Conversations API as the replacement path. If you stay on standard requests, prefer building on the current Responses-era APIs rather than legacy Assistants flows.

Facts updated: 2026-04-08
Published: 2026-04-09

Try with your AI agent

$ npm install -g pocketlantern
$ pocketlantern init
# Restart Claude Code, Cursor, or your MCP client, then ask:
# "OpenAI Batch API vs Standard Requests — when does one get cheaper?"