
==== ### ====

Each worker instance is created with a TimeoutPolicy, with fields such as:
* connect_timeout_s (TCP connect)
* headers_timeout_s (time to receive HTTP headers / accept the request)
* idle_stream_timeout_s (once streaming, max time with no bytes)
* prefill_liveness_timeout_s (before first byte/token, max time with no liveness evidence)
* absolute_timeout_s (optional, usually None)
* restart_backoff_s and max_restarts_per_window (crash-loop protection)

Default stance for your environment:
* connect_timeout_s: small (1–3s)
* headers_timeout_s: moderate (10–30s)
* prefill_liveness_timeout_s: large (e.g., 20–60+ minutes) or None if you want “never kill slow prefill unless dead”
* idle_stream_timeout_s: moderate/large (e.g., 60–300s) since once it’s streaming, silence usually means trouble
* absolute_timeout_s: None by default
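A minimal sketch of such a policy object, assuming a frozen Python dataclass. The field names come from the list above. The concrete defaults are picked from inside the suggested ranges, and the two restart defaults are pure assumptions, since no values are given for them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeoutPolicy:
    """Per-worker timeout settings. All durations are seconds; None disables a timeout."""

    connect_timeout_s: float = 2.0                          # small (1-3s)
    headers_timeout_s: float = 20.0                         # moderate (10-30s)
    prefill_liveness_timeout_s: Optional[float] = 30 * 60.0  # large, or None
    idle_stream_timeout_s: float = 120.0                    # moderate/large (60-300s)
    absolute_timeout_s: Optional[float] = None              # off by default
    restart_backoff_s: float = 5.0                          # assumed value
    max_restarts_per_window: int = 3                        # assumed value

    def __post_init__(self) -> None:
        # Reject zero or negative durations early so a bad config fails at
        # worker creation rather than in the middle of a request.
        for name in (
            "connect_timeout_s",
            "headers_timeout_s",
            "prefill_liveness_timeout_s",
            "idle_stream_timeout_s",
            "absolute_timeout_s",
        ):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be positive or None, got {value!r}")
        if self.restart_backoff_s < 0:
            raise ValueError("restart_backoff_s must be >= 0")
        if self.max_restarts_per_window < 0:
            raise ValueError("max_restarts_per_window must be >= 0")
```

Freezing the dataclass keeps a worker's policy immutable after creation, so per-request changes have to go through an explicit override path instead of mutating shared state.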

===== Timeouts that apply before first token depend on having something to watch. Require: =====
* liveness_probe_interval_s
* liveness_sources list (enabled/disabled):
** subprocess alive (always)
** /proc/<pid> CPU time delta (portable on Linux)
** optional GPU activity sampling (NVIDIA-only; nice-to-have)
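As one possible shape for the CPU-time source, the sketch below reads utime and stime from /proc/<pid>/stat (fields 14 and 15 in the proc(5) layout) and treats an increase between two samples as liveness evidence. The function names are hypothetical, and this only works on Linux.

```python
from typing import Optional


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the LAST ')' before tokenizing the rest.
    _, _, rest = stat_line.rpartition(")")
    fields = rest.split()
    # `rest` starts at field 3 (state), so field N sits at index N - 3.
    utime = int(fields[11])  # field 14
    stime = int(fields[12])  # field 15
    return utime + stime


def read_cpu_ticks(pid: int) -> Optional[int]:
    """Sample the process's CPU ticks, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/stat", "r") as fh:
            return parse_cpu_ticks(fh.read())
    except (FileNotFoundError, ProcessLookupError):
        return None


def cpu_made_progress(previous: Optional[int], current: Optional[int]) -> bool:
    """True when both samples exist and CPU time advanced between them."""
    return previous is not None and current is not None and current > previous
```

A prober would call read_cpu_ticks every liveness_probe_interval_s and record a liveness timestamp whenever cpu_made_progress returns True.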

Track two timestamps per request and derive a single progress value from them:
* last_liveness_at
* last_stream_byte_at
* and a derived last_progress_at = max(last_liveness_at, last_stream_byte_at)

All “stall” logic keys off last_progress_at.
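The bookkeeping above could look roughly like the following. RequestProgress and stall_reason are hypothetical names, and timestamps are assumed to come from a monotonic clock. The sketch follows the text literally, so even after streaming starts, fresh liveness evidence postpones the idle-stream timeout. A stricter variant would compare against last_stream_byte_at alone once the first byte has arrived.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestProgress:
    last_liveness_at: float                      # set at submit, bumped by probes
    last_stream_byte_at: Optional[float] = None  # None until the first byte

    @property
    def last_progress_at(self) -> float:
        """max(last_liveness_at, last_stream_byte_at), ignoring an unset byte time."""
        if self.last_stream_byte_at is None:
            return self.last_liveness_at
        return max(self.last_liveness_at, self.last_stream_byte_at)

    def note_liveness(self, now: float) -> None:
        self.last_liveness_at = max(self.last_liveness_at, now)

    def note_stream_byte(self, now: float) -> None:
        self.last_stream_byte_at = now


def stall_reason(
    progress: RequestProgress,
    now: float,
    prefill_liveness_timeout_s: Optional[float],
    idle_stream_timeout_s: Optional[float],
) -> Optional[str]:
    """Return the stall reason if the request has been silent too long, else None."""
    silent_for = now - progress.last_progress_at
    if progress.last_stream_byte_at is None:
        limit, reason = prefill_liveness_timeout_s, "prefill_liveness_timeout"
    else:
        limit, reason = idle_stream_timeout_s, "idle_stream_timeout"
    # A limit of None means that timeout is disabled.
    if limit is not None and silent_for > limit:
        return reason
    return None
```

In a real worker, `now` would typically be time.monotonic() so that wall-clock adjustments cannot trigger or mask a stall.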

===== Callers can override upwards or downwards within guardrails: =====
* e.g., submit(..., timeouts_override={ "absolute_timeout_s": 3600 })

Precedence (highest first; the safety rails clamp whichever value wins):
# request override (if provided)
# worker policy
# module hard minimums/maximums (safety rails)
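One way to read that precedence is "pick the request override if present, else the worker policy value, then clamp the result to the module's rails". The sketch below assumes plain dicts for the policy and override, and a hypothetical rails mapping of field name to (minimum, maximum). Whether None (disabled) may bypass a hard maximum is a design decision the text leaves open. Here it passes through unclamped.

```python
from typing import Dict, Optional, Tuple

Timeouts = Dict[str, Optional[float]]
Rails = Dict[str, Tuple[float, float]]  # field name -> (hard minimum, hard maximum)


def resolve_timeouts(
    worker_policy: Timeouts,
    override: Optional[Timeouts],
    rails: Rails,
) -> Timeouts:
    """Merge a per-request override into the worker policy, clamped to the rails."""
    override = override or {}
    unknown = set(override) - set(worker_policy)
    if unknown:
        raise KeyError(f"unknown timeout fields: {sorted(unknown)}")

    resolved: Timeouts = {}
    for name, policy_value in worker_policy.items():
        # 1. request override (if provided), 2. worker policy
        value = override[name] if name in override else policy_value
        # 3. module hard minimums/maximums clamp the winner
        if value is not None and name in rails:
            low, high = rails[name]
            value = min(max(value, low), high)
        resolved[name] = value
    return resolved
```

For example, with a rail of (1.0, 7200.0) on absolute_timeout_s, an override of 3600 is accepted as-is, while an override of 86400 would be reduced to 7200.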

===== If a timeout triggers, the request ends with: =====
* FAILED(reason="headers_timeout")
* FAILED(reason="prefill_liveness_timeout")
* FAILED(reason="idle_stream_timeout")
* FAILED(reason="absolute_timeout")

The worker may also restart the subprocess for stall-type reasons (prefill_liveness_timeout, idle_stream_timeout), per policy.
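A small sketch of how these outcomes and the restart decision might be modeled. The reason strings are the four listed above. The Failed and FailureReason types and the should_restart helper are hypothetical, and the restart budget check assumes the caller tracks how many restarts have already happened in the current window.

```python
from dataclasses import dataclass
from enum import Enum


class FailureReason(str, Enum):
    HEADERS_TIMEOUT = "headers_timeout"
    PREFILL_LIVENESS_TIMEOUT = "prefill_liveness_timeout"
    IDLE_STREAM_TIMEOUT = "idle_stream_timeout"
    ABSOLUTE_TIMEOUT = "absolute_timeout"


# Only stall-type reasons suggest the subprocess itself may be wedged.
STALL_REASONS = frozenset(
    {FailureReason.PREFILL_LIVENESS_TIMEOUT, FailureReason.IDLE_STREAM_TIMEOUT}
)


@dataclass(frozen=True)
class Failed:
    reason: FailureReason


def should_restart(
    outcome: Failed,
    restarts_in_window: int,
    max_restarts_per_window: int,
) -> bool:
    """Restart only for stall-type failures, and only within the restart budget."""
    if outcome.reason not in STALL_REASONS:
        return False
    return restarts_in_window < max_restarts_per_window
```

Capping restarts per window is what keeps a persistently stalling model from turning into a crash loop. Once the budget is spent, the request still fails but the worker is left alone until the window rolls over.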

===== You can optionally define a few named presets (just for convenience in your higher-level config): =====
* fast: aggressive timeouts (small models)
* normal: reasonable defaults
* slow_prefill: huge or disabled TTFT (time-to-first-token) limit plus a large prefill liveness timeout
* debug: disables restarts, logs everything

But the key requirement is the policy object, not the naming.
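If presets are used, they can be nothing more than named bundles of overrides applied on top of a base policy. The sketch below assumes dict-based policies. Every number is illustrative only, and log_level is a hypothetical key standing in for "logs everything". The policy field list has no separate TTFT limit, so slow_prefill is modeled through prefill_liveness_timeout_s alone.

```python
from typing import Any, Dict

# Illustrative values only; tune them for your own models and hardware.
PRESETS: Dict[str, Dict[str, Any]] = {
    "fast": {
        "headers_timeout_s": 10.0,
        "prefill_liveness_timeout_s": 120.0,
        "idle_stream_timeout_s": 60.0,
    },
    "normal": {},  # keep the base policy unchanged
    "slow_prefill": {
        "prefill_liveness_timeout_s": 60 * 60.0,
    },
    "debug": {
        "max_restarts_per_window": 0,  # disables restarts
        "log_level": "DEBUG",          # hypothetical key
    },
}


def apply_preset(base_policy: Dict[str, Any], name: str) -> Dict[str, Any]:
    """Return a copy of base_policy with the named preset's overrides applied."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset: {name!r}")
    merged = dict(base_policy)
    merged.update(PRESETS[name])
    return merged
```

Because a preset only produces another policy, everything downstream keeps depending on the policy object rather than on preset names.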