Editing Openai/694057b6-101c-8007-9a65-c40578c7252d (section)

===== llama-server supports OpenAI-style function/tool calling via its chat handling, including: =====
* native tool-call formats for many model families
* a generic tool-call handler when a template isn’t recognized
* optional parallel tool calling via payload "parallel_tool_calls": true (supported but disabled by default). GitHub<ref>{{cite web|title=GitHub|url=https://raw.githubusercontent.com/ggml-org/llama.cpp/master/docs/function-calling.md|publisher=raw.githubusercontent.com|access-date=2025-12-16}}</ref>

Worker behavior requirements (transport-facing):
* The transport should not “decide” tool semantics; it should simply surface parsed JSON events to the worker/tool loop.
* Tool calls may appear: - in a final message object, or - in streaming deltas (depending on server/model/template behavior).
* The tool loop layer (tooling.py) must support: - structured tool_calls when provided, and - the BIOS-driven fallback parsing strategy when they’re not.

Note: the function-calling doc indicates llama-server tool calling is used when started with --jinja and that generic/native handlers exist. GitHub<ref>{{cite web|title=GitHub|url=https://raw.githubusercontent.com/ggml-org/llama.cpp/master/docs/function-calling.md|publisher=raw.githubusercontent.com|access-date=2025-12-16}}</ref>
(Your worker config keeps the server command fully configurable, so enabling --jinja or templates is an orchestrator concern.)