Learn

OpenAI-compatible API: the shape everything speaks

An OpenAI-compatible API is a server that accepts the same request and response shape OpenAI's API popularised, most visibly the /v1/chat/completions endpoint. Most local inference servers expose it, so a client written for one points at another by changing only the address.

At a glance

What it is
The de-facto request/response shape most inference servers expose
The signature endpoint
/v1/chat/completions, with messages in, a completion out
Why it matters
Swap server or model by changing an address, not your code
What it does not promise
Identical behaviour; the wire format matches, the model does not
Flow

One client, any compatible server

The client speaks one format. Point it at a local server or a hosted one and the request looks the same; only the address and the model behind it change.

1
Your client or agent speaks the chat-completions format
2
OpenAI-compatible endpoint local server or a hosted API, same wire shape
3
A model answers swap the model without touching the client

Why does everything speak this format?

When OpenAI’s API took off, the request and response shape it used, a list of messages in and a completion out at /v1/chat/completions, became the format every client and library was written against. Rather than invent their own, the local inference servers adopted the same shape. The result is a de-facto standard: an OpenAI-compatible API is any server that accepts those requests and returns those responses, whoever wrote it and whatever model sits behind it.

The payoff is plain. A client, an agent, or a script written for the format points at a different server by changing one thing: the base address. You can run a model on your own box and have tools built for a hosted service talk to it without modification. That portability is the main reason a self-hosted stack cares about the compatibility at all.

What does compatible not mean?

It means the envelope matches, not the contents. Two compatible servers take the same request, but the model behind each answers differently, so compatible is not interchangeable in quality or behaviour. Servers also implement the shape to different depths: an optional field one supports, another may ignore. And the format itself is not frozen, so fields and defaults drift over time and a client can fall behind.

Speed is a separate matter entirely. The API shape says nothing about how fast tokens come back; that is set by the hardware and the inference engine, not the wire format. Compatibility buys you the freedom to swap pieces. It does not promise the pieces perform alike.

The standard shape gives you

  • Point an existing client at a local server by changing the base address
  • Swap the model behind the endpoint without rewriting clients
  • Reuse tooling and libraries built for the format off the shelf
  • Move between a hosted API and your own box with little friction

It does not give you

  • The same answers; the wire format matches, the model behind it does not
  • Every optional feature; servers implement the shape to varying depth
  • A guarantee of speed; that is set by the hardware and the engine
  • Freedom from version drift; fields and defaults still change over time

Related terms

← All terms Reviewed: June 2026