===== Readiness probe =====
* GET /v1/models
* Consider the server READY when the response is HTTP 200 and the body parses as JSON.
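
A minimal sketch of the probe rule above, using only the Python standard library. The names `response_is_ready` and `probe`, the `base_url` parameter, and the 2-second request timeout are assumptions for illustration; only the `GET /v1/models` path and the "HTTP 200 + JSON parses" rule come from the notes.

```python
import http.client
import json
import urllib.error
import urllib.request


def response_is_ready(status: int, body: bytes) -> bool:
    """READY rule: HTTP 200 and a body that parses as JSON."""
    if status != 200:
        return False
    try:
        json.loads(body)
    except ValueError:  # JSONDecodeError and UnicodeDecodeError are ValueErrors
        return False
    return True


def probe(base_url: str, timeout: float = 2.0) -> bool:
    """Send one GET /v1/models; any connection or HTTP error means not ready."""
    url = base_url.rstrip("/") + "/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return response_is_ready(resp.status, resp.read())
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connections, timeouts, and non-2xx statuses all land here.
        return False
```

Splitting the decision (`response_is_ready`) from the network call (`probe`) keeps the READY rule testable without a running server.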

Startup sequence
* Start subprocess → probe every 0.5s initially (short burst) until ready, then back off.
* Startup max wait: 120s before declaring startup failed and moving to FAILED (this is about “server didn’t come up”, not model prefill).
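
The wait loop can be sketched as below. This covers only the polling part, not launching the subprocess. The 0.5s interval and 120s limit come from the notes; the burst length (`burst_probes`), `backoff_factor`, and `max_interval` are assumed values, and the sketch reads "back off" as lengthening the interval once the initial burst ends without a successful probe. The `probe`, `clock`, and `sleep` parameters are hypothetical hooks so the loop can be tested without real delays.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    max_wait: float = 120.0,       # startup max wait from the notes
    burst_interval: float = 0.5,   # initial probe interval from the notes
    burst_probes: int = 10,        # assumption: length of the short burst
    backoff_factor: float = 2.0,   # assumption
    max_interval: float = 5.0,     # assumption: cap on the backed-off interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "READY" once probe() succeeds, or "FAILED" after max_wait seconds."""
    deadline = clock() + max_wait
    interval = burst_interval
    attempts = 0
    while True:
        if probe():
            return "READY"
        attempts += 1
        if attempts >= burst_probes:
            # Burst is over: lengthen the interval, up to the cap.
            interval = min(interval * backoff_factor, max_interval)
        remaining = deadline - clock()
        if remaining <= 0:
            return "FAILED"
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

A caller would pass a zero-argument probe, for example `wait_until_ready(lambda: my_probe(url))` where `my_probe` is whatever readiness check the worker uses, and move the worker to FAILED when the result is `"FAILED"`. A per-worker override of the limit is then just a different `max_wait` argument.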

Rationale: the server should bind and respond quickly even if model load is heavy; if a worker truly needs longer, increase its startup max wait.