Why does everything speak this format?
When OpenAI’s API took off, the request and response shape it used, a list of messages in and a completion out at /v1/chat/completions, became the format every client and library was written against. Rather than invent their own, the local inference servers adopted the same shape. The result is a de-facto standard: an OpenAI-compatible API is any server that accepts those requests and returns those responses, whoever wrote it and whatever model sits behind it.
The payoff is plain. A client, an agent, or a script written for the format points at a different server by changing one thing: the base address. You can run a model on your own box and have tools built for a hosted service talk to it without modification. That portability is the main reason a self-hosted stack cares about the compatibility at all.
What does compatible not mean?
It means the envelope matches, not the contents. Two compatible servers take the same request, but the model behind each answers differently, so compatible is not interchangeable in quality or behaviour. Servers also implement the shape to different depths: an optional field one supports, another may ignore. And the format itself is not frozen, so fields and defaults drift over time and a client can fall behind.
Speed is a separate matter entirely. The API shape says nothing about how fast tokens come back; that is set by the hardware and the inference engine, not the wire format. Compatibility buys you the freedom to swap pieces. It does not promise the pieces perform alike.