This section makes the “boring but critical” rules explicit so that the implementation and tests have a tight correctness target. The module should enforce these invariants rather than rely on convention.

===== Worker state machine =====
====== States ======
* STOPPED: subprocess not running; cannot accept requests
* RUNNING: subprocess started but not yet confirmed ready (or starting/restarting)
* READY: readiness probe succeeds (<code>GET /v1/models</code> → 200 + JSON)
* FAILED: subprocess dead or crash-loop lockout; cannot accept requests

====== Allowed transitions ======
* STOPPED → RUNNING (start)
* RUNNING → READY (readiness probe succeeds)
* RUNNING → FAILED (subprocess exits before ready, or repeated probe failure beyond policy)
* READY → RUNNING (restart initiated)
* READY → FAILED (subprocess exits, or crash-loop lockout triggers)
* FAILED → RUNNING (explicit restart/start allowed by policy)
* RUNNING/READY → STOPPED (stop)

====== Invariants ======
# No request is dispatched unless the state is READY.
# READY implies:
#* the subprocess is alive, and
#* the last readiness probe succeeded within a reasonably recent window (configurable, but “probe success happened” is the minimum).
# FAILED implies:
#* <code>submit()</code> returns WORKER_FAILED immediately (no slot is consumed).
# STOPPED implies:
#* <code>submit()</code> returns WORKER_NOT_READY immediately.

===== Request state machine =====
====== States ======
* RUNNING: request task is executing (includes prefill and token streaming)
* TOOL_RUNNING: normal tool execution in progress (awaiting ToolRunner)
* COMPLETED: finished successfully
* FAILED: finished with an error
* CANCELED: canceled by the caller

====== Terminal states ======
* COMPLETED, FAILED, CANCELED

====== Allowed transitions ======
* (created) → RUNNING
* RUNNING → TOOL_RUNNING (model emitted a normal tool call)
* TOOL_RUNNING → RUNNING (tool result appended; generation continues)
* RUNNING → COMPLETED
* RUNNING → FAILED
* TOOL_RUNNING → FAILED (tool error / parse error)
* RUNNING/TOOL_RUNNING → CANCELED (cancel)
* RUNNING/TOOL_RUNNING → FAILED (worker restart / server death / timeout)

====== Invariants ======
# One request occupies exactly one slot for its entire lifetime, from acceptance to terminal state.
# A request ID is unique within the lifetime of a worker instance.
# Output accumulation is monotonic (it only grows) until the request reaches a terminal state.
# Exit tools never cause a transition by themselves:
#* they may be recorded at any time, but do not change request state.
# The tool iteration budget is monotonically decreasing and never negative.
# After <code>get_result()</code> succeeds, the request record is released:
#* subsequent <code>get_status()</code>/<code>get_result()</code> calls return NOT_FOUND.

===== Slot accounting =====
Slots are the core stability mechanism and must be leak-proof.
# No slot is consumed on a failed submit.
#* If <code>submit()</code> returns NO_SLOT_AVAILABLE, nothing was allocated.
# If <code>submit()</code> succeeds, a slot is consumed immediately and will be released exactly once.
# Slot release must occur in a <code>finally:</code> block in the request task, so that none of the following leak a slot:
#* tool errors
#* transport exceptions
#* cancellations
#* restart-triggered failures
# <code>slots_used</code> = the number of non-terminal request records currently held (equivalently, the number of acquired semaphore permits).

===== Restart semantics =====
When a restart happens, correctness matters more than salvage.
# In-flight requests are never replayed.
# On restart initiation, all in-flight requests must transition to FAILED:
#* with <code>fail_reason="worker_restarted"</code> (or something more specific, e.g. <code>"server_died"</code>), and
#* must free their slots promptly.
# A restart is performed as follows:
## stop the process group (SIGTERM → SIGKILL)
## start a new subprocess
## wait for the readiness probe to succeed
## move the worker state to READY or FAILED
# Crash-loop protection invariant:
#* if restarts exceed <code>max_restarts_per_window</code>, the worker becomes FAILED and will not auto-restart until explicitly restarted (or until policy allows).

===== Timeouts and progress =====
Given long prefill times, timeouts must be designed carefully.
# Progress is defined as any bytes received after headers, plus any liveness evidence from prefill probes.
# A request updates:
#* <code>last_stream_byte_at</code> on any received stream data
#* <code>last_liveness_at</code> on each successful liveness probe
#* <code>last_progress_at = max(last_stream_byte_at, last_liveness_at)</code>
# “Stall” decisions are based on the absence of progress for configured windows, not on a lack of tokens.
# Connect and header timeouts remain short to moderate and are treated as “server unreachable” rather than “model slow”.

===== Tool handling =====
Normal tools:
# A tool call must match a known tool name from <code>normal_tools</code>; otherwise it is a tool parse/validation failure (FAILED).
# Each tool iteration decrements the remaining tool budget.
# ToolRunner output must be JSON-serializable (or the worker serializes it safely) and is appended as a tool message.

Exit tools:
# Exit tool calls are recorded into <code>signals[]</code> even if they are malformed (best effort), but malformed signals must not crash the worker.
# Exit tools do not decrement normal tool budgets unless you explicitly decide to share budgets (separate budgets are the default, though this is not mandated).

===== Status reporting =====
Worker status:
* READY/RUNNING/FAILED/STOPPED must accurately reflect the readiness probe and subprocess state.

Request status:
* <code>state</code> must reflect the real request lifecycle
* output length must be consistent with the accumulated text
* after release, the request is NOT_FOUND

===== Debug assertions =====
In debug builds or behind a flag, periodically assert:
* <code>0 <= slots_used <= slots_total</code>
* semaphore permits and active requests are consistent (available permits + active requests = <code>slots_total</code>)
* no request is both terminal and still has an active task
* <code>tool_iters_remaining</code> is never negative

These checks are cheap and surface silent corruption early.

If you want to continue, the next section would be Appendix D: Concrete defaults (recommended default values for timeouts, tool budgets, loop detector thresholds and restart backoff), to help the implementation start with sensible behavior without requiring more decisions later.
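The worker and request "Allowed transitions" lists lend themselves to lookup tables, so that an unlisted transition fails loudly instead of silently corrupting state. Below is a minimal sketch, assuming a Python implementation. The state names come from the lists above. <code>WORKER_TRANSITIONS</code>, <code>REQUEST_TRANSITIONS</code>, <code>check_transition</code>, <code>can_dispatch</code> and <code>IllegalTransition</code> are hypothetical names.

```python
from enum import Enum


class WorkerState(Enum):
    STOPPED = "STOPPED"
    RUNNING = "RUNNING"
    READY = "READY"
    FAILED = "FAILED"


class RequestState(Enum):
    RUNNING = "RUNNING"
    TOOL_RUNNING = "TOOL_RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELED = "CANCELED"


W, R = WorkerState, RequestState

# Worker edges, taken from the worker "Allowed transitions" list (hypothetical table name).
WORKER_TRANSITIONS = {
    W.STOPPED: {W.RUNNING},                     # start
    W.RUNNING: {W.READY, W.FAILED, W.STOPPED},  # probe ok / died before ready / stop
    W.READY: {W.RUNNING, W.FAILED, W.STOPPED},  # restart / died or lockout / stop
    W.FAILED: {W.RUNNING},                      # explicit restart allowed by policy
}

# Request edges; terminal states have no outgoing edges.
REQUEST_TRANSITIONS = {
    R.RUNNING: {R.TOOL_RUNNING, R.COMPLETED, R.FAILED, R.CANCELED},
    R.TOOL_RUNNING: {R.RUNNING, R.FAILED, R.CANCELED},
    R.COMPLETED: set(),
    R.FAILED: set(),
    R.CANCELED: set(),
}
TERMINAL = frozenset(s for s, edges in REQUEST_TRANSITIONS.items() if not edges)


class IllegalTransition(Exception):
    """Hypothetical error raised instead of applying an unlisted transition."""


def check_transition(table, current, target):
    """Return target if current -> target is an allowed edge in table, else raise."""
    if target not in table.get(current, set()):
        raise IllegalTransition(f"{current.name} -> {target.name}")
    return target


def can_dispatch(worker_state) -> bool:
    # Worker invariant 1: no request dispatch unless the worker is READY.
    return worker_state is W.READY
```

Every state change then goes through one choke point, e.g. <code>state = check_transition(WORKER_TRANSITIONS, state, W.READY)</code> after a successful probe. The tests in the next section can assert on <code>IllegalTransition</code> rather than inspecting internal fields.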
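The slot-accounting rules can be sketched as follows, assuming an asyncio-based worker. <code>SlotPool</code>, <code>Worker.submit</code>, the <code>"ACCEPTED"</code> return value and the <code>work</code> callable are hypothetical. Only <code>NO_SLOT_AVAILABLE</code> comes from the text.

A plain non-blocking counter stands in for the semaphore, so a failed submit provably allocates nothing. Release happens in the request task's <code>finally:</code> block, as the rules require. A done-callback acts as a safety net, because an asyncio task cancelled before its first step never enters its <code>try</code> block. A flag makes sure the two paths release the slot exactly once.

```python
import asyncio


class SlotPool:
    """Hypothetical non-blocking slot counter standing in for a semaphore."""

    def __init__(self, total: int):
        self.total = total
        self.used = 0

    def try_acquire(self) -> bool:
        # Never blocks: either a slot is taken right now or nothing is allocated.
        if self.used >= self.total:
            return False
        self.used += 1
        return True

    def release(self) -> None:
        if self.used <= 0:
            raise RuntimeError("slot released more than once")
        self.used -= 1


class Worker:
    """Hypothetical worker showing only the slot-accounting path."""

    def __init__(self, slots_total: int):
        self.slots = SlotPool(slots_total)
        self.tasks = {}  # request_id -> asyncio.Task

    def submit(self, request_id, work):
        """Start work() as a request task; must be called inside a running event loop."""
        if not self.slots.try_acquire():
            return "NO_SLOT_AVAILABLE"  # failed submit: nothing was allocated

        released = False

        def release_once():
            nonlocal released
            if not released:  # guarantees the slot is released exactly once
                released = True
                self.slots.release()

        async def run():
            try:
                return await work()
            finally:
                # Runs on success, tool/transport errors, cancellation, restart failures.
                release_once()

        task = asyncio.create_task(run())
        # Safety net: a task cancelled before its first step never enters the try block.
        task.add_done_callback(lambda _task: release_once())
        self.tasks[request_id] = task
        return "ACCEPTED"
```

A restart then reduces to cancelling (or failing) every task in <code>tasks</code>. Both release paths funnel into <code>release_once()</code>, so <code>slots.used</code> returns to zero without any restart-specific bookkeeping.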
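The crash-loop protection invariant amounts to a sliding-window restart counter. Below is a sketch, assuming restarts are timestamped with a monotonic clock such as <code>time.monotonic()</code>. <code>CrashLoopGuard</code>, <code>record_restart</code>, <code>explicit_reset</code> and the window length <code>window_s</code> are hypothetical. Only <code>max_restarts_per_window</code> comes from the text.

```python
from collections import deque


class CrashLoopGuard:
    """Hypothetical helper: at most max_restarts_per_window restarts per window_s seconds."""

    def __init__(self, max_restarts_per_window: int, window_s: float):
        self.max_restarts = max_restarts_per_window
        self.window_s = window_s
        self.restarts = deque()  # timestamps of restarts still inside the window
        self.locked_out = False

    def record_restart(self, now: float) -> bool:
        """Record a restart at time now; return False once the worker must become FAILED."""
        if self.locked_out:
            return False  # no auto-restart while locked out
        # Drop restarts that have aged out of the window.
        while self.restarts and now - self.restarts[0] >= self.window_s:
            self.restarts.popleft()
        self.restarts.append(now)
        if len(self.restarts) > self.max_restarts:
            self.locked_out = True  # restarts exceeded the limit: lock out
            return False
        return True

    def explicit_reset(self) -> None:
        """An explicit operator restart clears both the lockout and the history."""
        self.restarts.clear()
        self.locked_out = False
```

A <code>False</code> return maps to moving the worker to FAILED. Only an explicit restart calls <code>explicit_reset()</code>, which matches the rule that a locked-out worker does not auto-restart.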
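The progress rules under "Timeouts and progress" can be tracked per request as shown below. The sketch assumes timestamps come from a monotonic clock and that the time the response headers arrived is the baseline before any progress is seen. <code>last_stream_byte_at</code>, <code>last_liveness_at</code> and <code>last_progress_at</code> come from the text. <code>ProgressTracker</code>, <code>headers_at</code>, <code>on_stream_bytes</code>, <code>on_liveness_ok</code>, <code>is_stalled</code> and <code>stall_window_s</code> are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressTracker:
    """Hypothetical per-request progress record (monotonic seconds)."""

    headers_at: float  # assumed baseline: when response headers arrived
    last_stream_byte_at: Optional[float] = None
    last_liveness_at: Optional[float] = None

    def on_stream_bytes(self, now: float, n: int) -> None:
        # Any received stream data counts as progress, not only complete tokens.
        if n > 0:
            self.last_stream_byte_at = now

    def on_liveness_ok(self, now: float) -> None:
        # A successful liveness probe during a long prefill also counts as progress.
        self.last_liveness_at = now

    @property
    def last_progress_at(self) -> float:
        seen = [t for t in (self.last_stream_byte_at, self.last_liveness_at) if t is not None]
        # max() of the two evidence sources, falling back to the headers baseline.
        return max(seen, default=self.headers_at)

    def is_stalled(self, now: float, stall_window_s: float) -> bool:
        # Stall means no progress for the configured window, not "no tokens yet".
        return now - self.last_progress_at > stall_window_s
```

During a long prefill with no tokens, periodic liveness probes keep <code>is_stalled()</code> false. Connect and header timeouts stay outside this tracker, because they fire before <code>headers_at</code> exists.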