=== Assistant: ===

This section defines the concrete typed interfaces the implementation and test suite will code against. It's intentionally "boring" and explicit.

<syntaxhighlight lang="python">
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Any, Mapping, MutableMapping, Sequence

from typing_extensions import (
    Literal,
    NotRequired,
    Protocol,
    TypedDict,
    runtime_checkable,
)
</syntaxhighlight>

Worker and request lifecycle states:

<syntaxhighlight lang="python">
class WorkerState(str, Enum):
    RUNNING = "running"  # process started (may or may not be ready yet)
    READY = "ready"      # GET /v1/models succeeds
    FAILED = "failed"    # crashed, or locked out by crash-loop protection
    STOPPED = "stopped"  # not running


class RequestState(str, Enum):
    RUNNING = "running"
    TOOL_RUNNING = "tool_running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"
</syntaxhighlight>

Failure and finish reasons are kept as stable strings so the orchestrator can route on them without parsing free text:

<syntaxhighlight lang="python">
RequestFailReason = Literal[
    "worker_restarted",
    "server_died",
    "connect_failed",
    "headers_timeout",
    "stall_timeout",
    "tool_parse_error",
    "tool_execution_error",
    "repeated_line_loop",
    "canceled",
    "unknown_error",
]

RequestFinishReason = Literal[
    "stop",        # normal stop sequence / end-of-generation
    "max_tokens",  # hit max new tokens
    "canceled",
    "failed",
]
</syntaxhighlight>

Message and tool payload types stay permissive, so the module doesn't need edits whenever llama.cpp adds fields:

<syntaxhighlight lang="python">
class ChatMessage(TypedDict, total=False):
    role: Literal["system", "user", "assistant", "tool"]
    content: str
    name: str
    tool_call_id: str
    # For tool calling, the server may include more fields:
    tool_calls: Any


class ToolFunctionDef(TypedDict, total=False):
    name: str
    description: str
    parameters: dict[str, Any]  # JSON schema


class ToolDef(TypedDict, total=False):
    type: Literal["function"]
    function: ToolFunctionDef


class ToolCall(TypedDict, total=False):
    id: str
    type: Literal["function"]
    function: dict[str, Any]  # expects {"name": str, "arguments": str|dict}
</syntaxhighlight>

Exit tools are "one-way": they are recorded, not executed, and do not change control flow.

<syntaxhighlight lang="python">
class ExitSignal(TypedDict, total=False):
    tool_name: str
    arguments: dict[str, Any]
    # helpful metadata for debugging / correlation
    emitted_at: float
</syntaxhighlight>
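For illustration only (not part of the locked-in interface), a minimal sketch of turning a captured exit-tool call into an <code>ExitSignal</code> record. The helper name <code>record_exit_signal</code> and the string-or-dict handling of <code>function.arguments</code> are assumptions here:

<syntaxhighlight lang="python">
import json
import time


def record_exit_signal(call: ToolCall) -> ExitSignal:
    """Hypothetical helper: capture an exit-tool call as a one-way signal."""
    fn = call.get("function", {})
    raw_args = fn.get("arguments", {})
    if isinstance(raw_args, str):
        # OpenAI-style servers often send arguments as a JSON string.
        try:
            args = json.loads(raw_args)
        except json.JSONDecodeError:
            args = {"_raw": raw_args}  # keep unparseable payloads for debugging
    else:
        args = dict(raw_args)
    return ExitSignal(
        tool_name=fn.get("name", ""),
        arguments=args,
        emitted_at=time.time(),
    )
</syntaxhighlight>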
Request and result envelopes. Request IDs are incrementing integers; job_name is caller-provided.

<syntaxhighlight lang="python">
class SubmitOk(TypedDict):
    ok: Literal[True]
    request_id: int


class SubmitErr(TypedDict):
    ok: Literal[False]
    error: Literal["NO_SLOT_AVAILABLE", "WORKER_NOT_READY", "WORKER_FAILED"]


SubmitResult = SubmitOk | SubmitErr


class RequestStatus(TypedDict, total=False):
    request_id: int
    job_name: str
    state: RequestState
    created_at: float
    dispatched_at: NotRequired[float]
    completed_at: NotRequired[float]
    last_progress_at: NotRequired[float]
    output_chars: NotRequired[int]
    # optional nice-to-haves
    tokens_received: NotRequired[int]
    tokens_per_second: NotRequired[float]
    # tool loop info
    tool_iters_remaining: NotRequired[int]
    # exit-tool info
    signals: NotRequired[list[ExitSignal]]
    # error info if terminal failed/canceled
    fail_reason: NotRequired[RequestFailReason]
    fail_detail: NotRequired[str]


class RequestResult(TypedDict, total=False):
    request_id: int
    job_name: str
    state: Literal["completed", "failed", "canceled"]
    finish_reason: RequestFinishReason
    text: str
    signals: NotRequired[list[ExitSignal]]
    # terminal error info (if failed/canceled)
    fail_reason: NotRequired[RequestFailReason]
    fail_detail: NotRequired[str]
</syntaxhighlight>

Release semantics (locked in):

* get_result(request_id) returns RequestResult only once; it also releases stored state/output.
* After release, later lookups return a stable "not found".

<syntaxhighlight lang="python">
class NotFound(TypedDict):
    ok: Literal[False]
    error: Literal["NOT_FOUND"]


GetResultResponse = RequestResult | NotFound
GetStatusResponse = RequestStatus | NotFound
</syntaxhighlight>

Worker status reporting. The minimum is running/ready/failed; the extra fields are optional.

<syntaxhighlight lang="python">
class WorkerStatus(TypedDict, total=False):
    state: WorkerState
    slots_total: int
    slots_used: int
    active_request_ids: list[int]
    restart_count: int
    last_error: NotRequired[str]
    last_ready_at: NotRequired[float]


class WorkerDebugInfo(TypedDict, total=False):
    # bounded ring buffer content (most recent N lines)
    recent_logs: list[str]
    recent_restart_reasons: list[str]
</syntaxhighlight>

Timeouts are fixed per worker; there are no per-request overrides.

<syntaxhighlight lang="python">
@dataclass(frozen=True, slots=True)
class TimeoutProfile:
    connect_timeout_s: float
    headers_timeout_s: float
    # time-to-first-token is usually disabled / huge in your environment:
    ttft_timeout_s: float | None
    # prefill-safe: based on liveness probes before any bytes arrive
    prefill_liveness_timeout_s: float | None
    # once streaming starts: max allowed time with no bytes
    idle_stream_timeout_s: float | None
    absolute_timeout_s: float | None
    liveness_probe_interval_s: float
    # restart control
    restart_backoff_s: float
    restart_window_s: float
    max_restarts_per_window: int
</syntaxhighlight>

BIOS generation is its own component, so it should be easy to unit test.

<syntaxhighlight lang="python">
@dataclass(frozen=True, slots=True)
class BiosContext:
    now: datetime
    timezone_name: str
    worker_name: str
    tool_iters_remaining: int
    normal_tools: Sequence[ToolDef]
    exit_tools: Sequence[ToolDef]
    # optional: stable version tag for formatting evolution
    bios_version: str = "bios-v1"


@runtime_checkable
class BiosProvider(Protocol):
    def __call__(self, ctx: BiosContext) -> str: ...
</syntaxhighlight>

A separate method/component assembles the message list:

<syntaxhighlight lang="python">
def build_message_stack(
    *,
    bios_text: str,
    caller_system_prompt: str,
    conversation: Sequence[ChatMessage],
) -> list[ChatMessage]:
    """Pure function: returns full message list in required order."""
    ...
</syntaxhighlight>
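For concreteness, one possible implementation sketch. The ordering (BIOS text first, then the caller's system prompt, then the conversation) is an assumption here, since the spec above only fixes the signature:

<syntaxhighlight lang="python">
def build_message_stack(
    *,
    bios_text: str,
    caller_system_prompt: str,
    conversation: Sequence[ChatMessage],
) -> list[ChatMessage]:
    """Sketch: BIOS and caller prompt as two leading system messages (assumed order)."""
    stack: list[ChatMessage] = [{"role": "system", "content": bios_text}]
    if caller_system_prompt:
        stack.append({"role": "system", "content": caller_system_prompt})
    stack.extend(conversation)
    return stack
</syntaxhighlight>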
Normal tools run through a ToolRunner plugin that can be swapped between lightweight and heavyweight implementations (a minimal sketch follows the summary list below):

<syntaxhighlight lang="python">
@runtime_checkable
class ToolRunner(Protocol):
    async def run_tool(
        self,
        *,
        name: str,
        arguments: dict[str, Any],
        request_id: int,
        job_name: str,
    ) -> Any:
        """
        Return any JSON-serializable result (dict/list/str/number/bool/null).
        Worker will serialize it into a tool message payload.
        """
        ...
</syntaxhighlight>

Per-worker configuration:

<syntaxhighlight lang="python">
GenerationParams = Mapping[str, Any]


@dataclass(frozen=True, slots=True)
class WorkerConfig:
    name: str
    host: str
    port: int
    # full command including executable:
    # ["./llama-server", "-m", "...", "--port", "...", ...]
    server_cmd: Sequence[str]
    # env overrides, e.g. {"CUDA_VISIBLE_DEVICES": "0"}
    env: Mapping[str, str]
    slots: int
    timeouts: TimeoutProfile
    # Tools
    normal_tools: Sequence[ToolDef]
    tool_runner: ToolRunner | None
    # Exit tools (one-way)
    exit_tools: Sequence[ToolDef]
    # BIOS
    bios_provider: BiosProvider
    timezone_name: str
</syntaxhighlight>

The readiness probe has a locked-in default:

<syntaxhighlight lang="python">
@dataclass(frozen=True, slots=True)
class ReadinessProbe:
    method: Literal["GET"] = "GET"
    path: str = "/v1/models"
    # Used with timeouts.headers_timeout_s and connect_timeout_s
</syntaxhighlight>

Semantics locked in by these interfaces:

* Exit tools never alter control flow. They're recorded into signals[] and returned to the orchestrator.
* Normal tool calls run through ToolRunner and continue generation (the tool loop is budgeted).
* Progress is "any bytes after headers count as progress," plus prefill liveness via /proc/<pid> CPU time deltas.
* <code>get_result()</code> releases everything for that request_id immediately.
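As referenced above, a minimal sketch of a lightweight ToolRunner. The dict-registry pattern, the class name DictToolRunner, and the demo tool are illustrative assumptions, not part of the spec:

<syntaxhighlight lang="python">
import asyncio
from typing import Awaitable, Callable


class DictToolRunner:
    """Hypothetical lightweight ToolRunner backed by a name -> coroutine map."""

    def __init__(self, tools: Mapping[str, Callable[..., Awaitable[Any]]]) -> None:
        self._tools = dict(tools)

    async def run_tool(
        self,
        *,
        name: str,
        arguments: dict[str, Any],
        request_id: int,
        job_name: str,
    ) -> Any:
        fn = self._tools.get(name)
        if fn is None:
            # The worker would presumably report this as tool_execution_error.
            raise KeyError(f"unknown tool: {name}")
        return await fn(**arguments)


async def _demo() -> None:
    async def add(a: int, b: int) -> int:
        return a + b

    runner = DictToolRunner({"add": add})
    assert isinstance(runner, ToolRunner)  # structural check via runtime_checkable
    result = await runner.run_tool(
        name="add", arguments={"a": 2, "b": 3}, request_id=1, job_name="demo"
    )
    print(result)  # -> 5


asyncio.run(_demo())
</syntaxhighlight>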
If you want the next section after this, I'd suggest Appendix B: Internal module layout (classes/files and which component owns what), so the implementation stays clean and testable from day one.