Editing Openai/692aba85-5b10-8006-817a-d5a83b020f8e (section)

==== ### ====
* モデルカード：PrimeIntellect/INTELLECT-3Hugging Face<ref>{{cite web|title=Hugging Face|url=https://huggingface.co/PrimeIntellect/INTELLECT-3|publisher=Hugging Face|access-date=2025-11-29}}</ref>
* BF16 版と FP8 版があり、さらに 8bit/4bit など複数の量子化モデルもコレクションとして公開されています。Hugging Face<ref>{{cite web|title=Hugging Face|url=https://huggingface.co/collections/PrimeIntellect/intellect-3|publisher=Hugging Face|access-date=2025-11-29}}</ref>

ライセンスは MIT / Apache2 系で、商用利用を含めかなり緩い条件。Hugging Face<ref>{{cite web|title=Hugging Face|url=https://huggingface.co/PrimeIntellect/INTELLECT-3|publisher=Hugging Face|access-date=2025-11-29}}</ref>

===== Hugging Face のモデルカードに、vLLM 用の起動例が掲載されています。Hugging Face<ref>{{cite web|title=Hugging Face|url=https://huggingface.co/PrimeIntellect/INTELLECT-3|publisher=Hugging Face|access-date=2025-11-29}}</ref> =====
* BF16 版（2× H200）：

<syntaxhighlight lang="bash">vllm serve PrimeIntellect/INTELLECT-3 \
  --tensor-parallel-size 2 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser deepseek_r1

</syntaxhighlight>
* FP8 版（1× H200）：

<syntaxhighlight lang="bash">vllm serve PrimeIntellect/INTELLECT-3-FP8 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser deepseek_r1

</syntaxhighlight>

ポイント：
* 107B パラメータ MoE（アクティブ ~12B） なので、それなりに GPU が必要。
* 公式例では H200 を想定していますが、 Hugging Face 側の量子化モデルを使えば小さめの GPU（A100 80G 相当など）でも現実的になります。Hugging Face<ref>{{cite web|title=Hugging Face|url=https://huggingface.co/PrimeIntellect/INTELLECT-3|publisher=Hugging Face|access-date=2025-11-29}}</ref>
* vLLM を立ててしまえば、あとは OpenAI 互換の HTTP API として どのクライアントからも使えます。