OpenAI-compatible API: the shape everything speaks : Learn

An OpenAI-compatible API is a server that accepts the same request and response shape OpenAI's API popularised, most visibly the /v1/chat/completions endpoint. Most local inference servers expose it, so a client written for one points at another by changing only the address.

At a glance

What it is

The de-facto request/response shape most inference servers expose

The signature endpoint

/v1/chat/completions, with messages in, a completion out

Why it matters

Swap server or model by changing an address, not your code

What it does not promise

Identical behaviour; the wire format matches, the model does not

Why does everything speak this format?

When OpenAI’s API took off, the request and response shape it used, a list of messages in and a completion out at /v1/chat/completions, became the format every client and library was written against. Rather than invent their own, the local inference servers adopted the same shape. The result is a de-facto standard: an OpenAI-compatible API is any server that accepts those requests and returns those responses, whoever wrote it and whatever model sits behind it.

The payoff is plain. A client, an agent, or a script written for the format points at a different server by changing one thing: the base address. You can run a model on your own box and have tools built for a hosted service talk to it without modification. That portability is the main reason a self-hosted stack cares about the compatibility at all.

What does compatible not mean?

It means the envelope matches, not the contents. Two compatible servers take the same request, but the model behind each answers differently, so compatible is not interchangeable in quality or behaviour. Servers also implement the shape to different depths: an optional field one supports, another may ignore. And the format itself is not frozen, so fields and defaults drift over time and a client can fall behind.

Speed is a separate matter entirely. The API shape says nothing about how fast tokens come back; that is set by the hardware and the inference engine, not the wire format. Compatibility buys you the freedom to swap pieces. It does not promise the pieces perform alike.

OpenAI-compatible API: the shape everything speaks

At a glance

One client, any compatible server

Why does everything speak this format?

What does compatible not mean?

The standard shape gives you

It does not give you

Related terms

At a glance

One client, any compatible server

Why does everything speak this format?

What does compatible not mean?

The standard shape gives you

It does not give you

Related terms

Go deeper