Editing Openai/694ab010-3e7c-8000-b0e8-b98c22b7a063 (section)

==== ### ====

If UI + API are together:
* 1 user opens the page → UI load
* 1 user chats → API load
* You must scale both together

Example:
* 100 users browsing UI
* Only 10 users chatting

You still pay for 100 backend LLM-capable containers

===== <syntaxhighlight>UI service: 2–3 tasks =====
API service: autoscale on CPU / requests

</syntaxhighlight>

Result:
* UI stays cheap and stable
* API scales only when chat traffic increases
* Much lower cost
* Much better reliability

This is why every serious SaaS separates frontend and backend.