OpenAI Batch API vs Standard Requests — when does one get cheaper?
Decide whether asynchronous jobs should shift to the Batch API, given that, as of 2026-04-08, OpenAI still advertises a 50% discount versus standard request processing.
Key facts and blockers
- 50% discount on input and output costs relative to standard processing
- package/openai-batch-api is incompatible with capability/streaming: the Batch API gives up streaming
- package/openai-assistants-api: EOL 2026-08-26
- package/openai-responses-api replaces package/openai-assistants-api
- package/openai-conversations-api replaces package/openai-assistants-api
Who this is for
- Cost-sensitive teams
- High-scale asynchronous workloads
Candidates
Shift eligible asynchronous workloads to the Batch API
As of 2026-04-08, OpenAI's pricing page still states that the Batch API saves 50% on input and output costs relative to standard processing. The Batch API runs work asynchronously with a 24-hour completion window, and the Batch API FAQ confirms requests are processed within that window. OpenAI also states that batch rate limits are separate from existing limits, which helps isolate bulk jobs from online traffic. This is the clearest cost lever for non-interactive inference when waiting hours is acceptable.
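To make the mechanics concrete, here is a minimal submission sketch using the official openai Python SDK. It assumes OPENAI_API_KEY is set in the environment and that gpt-4o-mini is a batch-eligible model on your account; both are assumptions to verify, not guarantees.

import json
from openai import OpenAI

client = OpenAI()

# Each line of the input file is one request, keyed by a custom_id you choose.
requests = [
    {
        "custom_id": f"doc-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",  # assumed batch-eligible; check the model list
            "messages": [{"role": "user", "content": f"Summarize document {i}."}],
        },
    }
    for i in range(3)
]
with open("batch_input.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload the JSONL file, then create the batch with the 24-hour completion window.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)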
When to choose
Use this for cost-sensitive, high-scale jobs such as offline enrichment, classification, summarization, or embeddings where same-request interactivity is unnecessary. The decisive factor is whether a 24-hour completion window and no streaming are acceptable in exchange for the 50% token discount.
Tradeoffs
Lowest verified token cost for eligible async work, plus separate batch rate limits. In return, you give up low-latency responses, streaming, and any guarantee that every item completes inside the 24-hour window.
Cautions
OpenAI says Batch API support is available across most models, but not all, so verify model support before migrating. If a batch expires or is manually cancelled, completed work is still returned and billed, while unfinished work is cancelled. New async platform work should avoid depending on the Assistants API because OpenAI has already scheduled its removal for August 26, 2026, in favor of the Responses API and Conversations API.
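A polling-and-retrieval sketch under the same assumptions follows; the batch id is hypothetical, carried over from the submission sketch above. Note that an expired or cancelled batch can still carry a partial output file, and those completed items are billed.

import time
from openai import OpenAI

client = OpenAI()
batch_id = "batch_abc123"  # hypothetical id saved from the submission step

# Poll until the batch reaches a terminal status.
while True:
    batch = client.batches.retrieve(batch_id)
    if batch.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)  # batches can take hours; poll sparingly in real jobs

# output_file_id holds successful results; error_file_id holds per-item errors.
# Both can be present on an expired or cancelled batch.
if batch.output_file_id:
    for line in client.files.content(batch.output_file_id).text.splitlines():
        print(line)  # one JSON object per finished request, matched by custom_id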
Keep asynchronous jobs on standard requests via Responses API or Chat Completions
As of 2026-04-08, OpenAI's standard published model prices remain the baseline, and the pricing page still frames Batch as a separate 50% discount against those standard rates. Standard requests avoid the Batch API's 24-hour processing window and support normal online request patterns, including streaming, which is simpler when jobs must finish promptly. The main post-March 31, 2026 pricing change visible on the pricing page is for Containers billing, not a removal of the Batch discount. Standard requests therefore remain the operationally simpler option, but not the cheapest for bulk async work.
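For contrast, a minimal standard (non-batch) Responses API call with streaming, again in the openai Python SDK with gpt-4o-mini as an assumed model choice:

from openai import OpenAI

client = OpenAI()

# Streaming is only available on standard requests; the Batch API gives it up.
stream = client.responses.create(
    model="gpt-4o-mini",  # assumed model choice
    input="Summarize this document in one sentence.",
    stream=True,
)
for event in stream:
    # Print incremental text deltas as they arrive.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
print()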
When to choose
Use this when jobs are asynchronous in implementation but still operationally time-sensitive, user-triggered, or dependent on immediate retries and observability. The decisive factor is that latency, predictability, or feature compatibility matters more than the 50% Batch discount.
Tradeoffs
Simpler operational model and no 24-hour queue semantics, but you pay full standard token prices. You also miss the batch-specific rate-limit separation that can protect online traffic from bulk backlogs.
Cautions
Do not anchor new async integrations on the Assistants API: as of 2026-04-08, OpenAI has already announced its shutdown date as August 26, 2026, with Responses API and Conversations API as the replacement path. If you stay on standard requests, prefer building on the current Responses-era APIs rather than legacy Assistants flows.
Try with your AI agent
$ npm install -g pocketlantern
$ pocketlantern init
# Restart Claude Code, Cursor, or your MCP client, then ask:
# "OpenAI Batch API vs Standard Requests — when does one get cheaper?"