MiniMax-M3 - Model ใหม่ 1M Context ควบคู่กับ LiteLLM Gateway

June 16, 2026 · 7 min read

Author

สารบัญ

TL;DR
MiniMax-M3 คืออะไร
Benchmark ที่น่าสนใจ
พารามิเตอร์ที่แตกต่างจาก Qwen
ตั้งค่า LiteLLM Gateway
1. M3-CHAT - ใช้แทน GPT ทั่วไป
2. M3-THINK - Reasoning ทั่วไป
3. M3-CODE - Coding แม่นยำ
4. M3-AGENT - Tool Calling
5. M3-LONGCTX - งานเอกสารใหญ่
6. M3-ULTRA - Agent สุดขั้ว
ถ้าจะทำ LiteLLM Gateway จริง
สรุป
ตารางเปรียบเทียบ profile
อ้างอิง

เมื่อสัปดาห์ที่แล้วผมลองเปลี่ยนโมเดลบน LiteLLM proxy จาก Qwen3.6-35B ไปเป็น MiniMax-M3 แล้วเจอว่าพารามิเตอร์ที่ใช้อยู่กับ Qwen ใช้กับ M3 ไม่ได้เลย

Qwen ผมตั้ง presence_penalty=1.5, top_k=20, chat_template_kwargs={"preserve_thinking": true} — แต่ M3 เพิกเฉย top_k (บน API), ไม่มี presence_penalty, และใช้ reasoning_split แทน preserve_thinking

Note: อย่า copy-paste config ระหว่าง model — แต่ละโมเดลมีพารามิเตอร์ default และ recommended range ต่างกัน การตั้งค่า Qwen มาใส่ M3 โดยไม่ตรวจสอบ = พารามิเตอร์ที่ถูกเพิกเฉยโดยไม่มีการแจ้งเตือน

พออ่าน docs ของ MiniMax อย่างละเอียดแล้ว เลยต้องปรับ profile ใหม่ทั้งหมด — ผลที่ได้คือชุด config ที่ผมใช้มาตลอด และอยากเอามาเล่าให้ฟัง

TL;DR

MiniMax-M3 เป็น open-weight model จาก MiniMax ที่รวม 3 capability สำคัญไว้ในโมเดลเดียว: coding & agentic (SWE-bench Pro 59%), context window 1M tokens และ native multimodal สำหรับคนที่ใช้ผ่าน LiteLLM Gateway ต้องระวังว่า M3 ไม่รองรับ presence_penalty / top_k (บน API) และใช้ max_completion_tokens + reasoning_split แทน max_tokens / preserve_thinking ผมแบ่ง profile ไว้ 6 แบบตาม use case ตั้งแต่ chat ทั่วไป ไปจนถึง autonomous agent สุดขั้ว

MiniMax-M3 คืออะไร

MiniMax-M3 เป็น open-weight model ที่ปล่อยออกมาเมื่อ 1 มิถุนายน 2026 ภายใต้สัญญาอนุญาต MiniMax Community License

M3 มี 3 จุดเด่นที่รวมอยู่ในโมเดลเดียว:

Coding & agentic performance — SWE-bench Pro 59% เอาชนะ GPT-5.5
Context window 1 ล้านโทเคน (รับประกันขั้นต่ำ 512K)
Multimodal แบบดั้งเดิม — รองรับข้อความ ภาพ และวิดีโอ

โมเดลนี้ใช้ architecture ชื่อ MiniMax Sparse Attention (MSA) ซึ่งแบ่ง KV cache เป็น block เพื่อลด per-token compute ที่ context ยาวลงเหลือประมาณ 1/20 ของ dense attention ธรรมดา

Property	Value
Parameters	~428B total / ~23B active (MoE)
Context Window	1,000,000 tokens (min. 512K)
Max Output	512,000 tokens (recommended 128K)
Temperature Default	1.0 (range [0, 2])
Top-P Default	0.95
Pricing	$0.60 / M tokens input, $2.40 / M tokens output

Benchmark ที่น่าสนใจ

BrowseComp: 83.5 — เอาชนะ Opus 4.7 (79.3)

PostTrainBench: 37.1 — อันดับ 3 ของโลก (หลัง Opus 4.7: 42.4, GPT-5.5: 39.3)

SWE-bench Pro: 59% — เอาชนะ GPT-5.5

Terminal-Bench 2.1: 66%

MCP Atlas: 74.2%

ที่น่าสนใจคือ autonomous case study — M3 สามารถ replicate งานวิจัย Outstanding Paper ของ ICLR 2025 ได้เองใน ~12 ชั่วโมง โดยไม่มีคนช่วย ตั้งแต่การ parse ตารางและสูตรจากภาพ ไปจนถึงเขียนโค้ดและรัน experiment ทั้งหมด

พารามิเตอร์ที่แตกต่างจาก Qwen

ก่อนจะเข้า config ผมขอชี้จุดที่ M3 ต่างจาก Qwen3.6-35B ที่ผมใช้อยู่:

Parameter	Qwen3.6-35B	MiniMax-M3
`temperature`	0.7	1.0 (default)
`top_p`	0.8	0.95 (default)
`top_k`	20	เพิกเฉยบน API (ใช้ได้บน vLLM/SGLang: แนะนำ 40)
`presence_penalty`	1.5	เพิกเฉย
`reasoning`	`chat_template_kwargs.preserve_thinking`	`reasoning_split: true` + `thinking` param
`max_tokens`	ใช้ได้	ใช้ `max_completion_tokens` แทน

Note: max_tokens deprecated แล้ว — M3 ใช้ max_completion_tokens แทน ถ้ายังส่ง max_tokens อยู่ API จะเพิกเฉยโดยไม่แจ้งเตือน

reasoning_split vs thinking: reasoning_split เป็นแค่ output-format switch — แยก thinking ออกจาก content ส่วน thinking คือตัวควบคุมว่าจะให้ M3 reasoning หรือไม่ (enabled / adaptive / disabled) ถ้าไม่ระบุ thinking จะ default เป็น adaptive ซึ่งเปิด reasoning อัตโนมัติ

ตั้งค่า LiteLLM Gateway

ผมแบ่ง profile ไว้ 6 แบบสำหรับงานแต่ละประเภท — ทุก profile ใช้ model เดียวคือ minimax/MiniMax-M3 ต่างกันแค่พารามิเตอร์

1. M3-CHAT - ใช้แทน GPT ทั่วไป

{
  "model": "minimax/MiniMax-M3",
  "temperature": 0.8,
  "top_p": 0.95,
  "max_completion_tokens": 8192,
  "reasoning_split": false
}

เหมาะสำหรับ: Chat, QA, Summarization, Translation

LiteLLM alias:

model_name: minimax-m3-chat

2. M3-THINK - Reasoning ทั่วไป

{
  "model": "minimax/MiniMax-M3",
  "temperature": 1.0,
  "top_p": 0.95,
  "max_completion_tokens": 16384,
  "reasoning_split": true
}

เหมาะสำหรับ: วิเคราะห์ระบบ, Architecture Review, Agent Planning, Research

LiteLLM alias:

model_name: minimax-m3-think

3. M3-CODE - Coding แม่นยำ

{
  "model": "minimax/MiniMax-M3",
  "temperature": 0.7,
  "top_p": 0.95,
  "max_completion_tokens": 16384,
  "reasoning_split": true
}

เหมาะสำหรับ: Coding ทั่วไป เช่น FastAPI, React, PostgreSQL, Docker, Kubernetes

Note: ผมไม่ลด temperature ลงเหลือ 0.3–0.5 เหมือน Qwen — จาก official docs recommended value คือ 1.0 (range [0, 2]) การลดต่ำเกินไปอาจทำให้ response ตายตัว

LiteLLM alias:

model_name: minimax-m3-code

4. M3-AGENT - Tool Calling

{
  "model": "minimax/MiniMax-M3",
  "temperature": 0.6,
  "top_p": 0.9,
  "max_completion_tokens": 8192,
  "reasoning_split": true
}

เหมาะสำหรับ: Hermes, OpenCode, Codex CLI, Roo, MCP Workflow

ลด temperature เพื่อให้ workflow มีเสถียรภาพมากขึ้น

LiteLLM alias:

model_name: minimax-m3-agent

5. M3-LONGCTX - งานเอกสารใหญ่

{
  "model": "minimax/MiniMax-M3",
  "temperature": 0.7,
  "top_p": 0.9,
  "max_completion_tokens": 16384,
  "reasoning_split": true
}

เหมาะสำหรับ: Log Analysis, Repo Analysis, PR Review, RFC Review, 100K–1M Token Context

M3 มีจุดขายเรื่อง 1M Context โดยเฉพาะ — profile นี้เน้นใช้ capability นั้นโดยตรง

LiteLLM alias:

model_name: minimax-m3-longctx

6. M3-ULTRA - Agent สุดขั้ว

{
  "model": "minimax/MiniMax-M3",
  "temperature": 1.0,
  "top_p": 0.95,
  "max_completion_tokens": 32768,
  "reasoning_split": true
}

เหมาะสำหรับ: Deep Research, Autonomous Agent, Multi-step Planning, SWE-bench style task

LiteLLM alias:

model_name: minimax-m3-ultra

ถ้าจะทำ LiteLLM Gateway จริง

config เต็มใน config.yaml:

model_list:
  - model_name: minimax-m3-chat
    litellm_params:
      model: minimax/MiniMax-M3
      api_key: os.environ/MINIMAX_API_KEY
      api_base: https://api.minimax.io/v1
      temperature: 0.8
      top_p: 0.95
      max_completion_tokens: 8192

  - model_name: minimax-m3-think
    litellm_params:
      model: minimax/MiniMax-M3
      api_key: os.environ/MINIMAX_API_KEY
      api_base: https://api.minimax.io/v1
      temperature: 1.0
      top_p: 0.95
      max_completion_tokens: 16384
      reasoning_split: true

  - model_name: minimax-m3-code
    litellm_params:
      model: minimax/MiniMax-M3
      api_key: os.environ/MINIMAX_API_KEY
      api_base: https://api.minimax.io/v1
      temperature: 0.7
      top_p: 0.95
      max_completion_tokens: 16384
      reasoning_split: true

  - model_name: minimax-m3-agent
    litellm_params:
      model: minimax/MiniMax-M3
      api_key: os.environ/MINIMAX_API_KEY
      api_base: https://api.minimax.io/v1
      temperature: 0.6
      top_p: 0.9
      max_completion_tokens: 8192
      reasoning_split: true

  - model_name: minimax-m3-longctx
    litellm_params:
      model: minimax/MiniMax-M3
      api_key: os.environ/MINIMAX_API_KEY
      api_base: https://api.minimax.io/v1
      temperature: 0.7
      top_p: 0.9
      max_completion_tokens: 16384
      reasoning_split: true

  - model_name: minimax-m3-ultra
    litellm_params:
      model: minimax/MiniMax-M3
      api_key: os.environ/MINIMAX_API_KEY
      api_base: https://api.minimax.io/v1
      temperature: 1.0
      top_p: 0.95
      max_completion_tokens: 32768
      reasoning_split: true

พิมพ์สลับโมเดลใน OpenCode หรือ Hermes จะง่ายมาก:

/model minimax-m3-code
/model minimax-m3-agent
/model minimax-m3-longctx

Note: ถ้าต้องเลือกแค่ 3 profile สำหรับงานสาย DevOps / Backend / Agent ที่ใช้งานบ่อย ผมจะเก็บแค่ minimax-m3-chat, minimax-m3-code, minimax-m3-agent เพราะครอบคลุมงานประมาณ 90% ที่ใช้จริง

สรุป

M3 เป็นโมเดลที่น่าสนใจสำหรับงาน coding และ agentic โดยเฉพาะถ้าต้องการ context window ยาว — แต่ต้องอย่าลืมว่าพารามิเตอร์ default ต่างจากโมเดลอื่นๆ การ copy config มาใช้โดยตรงโดยไม่ตรวจสอบ = พารามิเตอร์ที่ถูกเพิกเฉย

ถ้าใครกำลังทดลอง M3 ผ่าน LiteLLM Gateway — ลองเริ่มจาก profile m3-chat ก่อน แล้วค่อยขยับไป m3-code หรือ m3-agent ตาม use case

ตารางเปรียบเทียบ profile

Profile	Temp	Top-P	Max Tokens	Reasoning	เหมาะกับ
M3-CHAT	0.8	0.95	8K	❌	Chat, QA, Summarization, Translation
M3-THINK	1.0	0.95	16K	✅	วิเคราะห์ระบบ, Architecture, Research
M3-CODE	0.7	0.95	16K	✅	Coding ทั่วไป: FastAPI, React, PostgreSQL, Docker, Kubernetes
M3-AGENT	0.6	0.90	8K	✅	Hermes, OpenCode, Codex CLI, MCP
M3-LONGCTX	0.7	0.90	16K	✅	Log/Repo/PR Review, 100K–1M tokens
M3-ULTRA	1.0	0.95	32K	✅	Deep Research, Autonomous Agent, SWE-bench

Note: temperature ต่ำสุดใน profile ของผมคือ 0.6 (M3-AGENT) — API รองรับ range [0, 2] แต่การลดต่ำเกินไปอาจทำให้ response ตายตัว

อ้างอิง

MiniMax M3 Official - Model overview, benchmarks, architecture
MiniMax API Docs - API reference, parameters, examples
MiniMax OpenAI-Compatible API - Chat completions format, multimodal input
LiteLLM MiniMax Provider Docs - Proxy configuration, SDK usage
Ollama MiniMax-M3 - Ollama Cloud deployment (minimax-m3:cloud)
HuggingFace MiniMax-M3 - Model weights, inference parameters (temperature=1.0, top_p=0.95, top_k=40), MiniMax Community License

model minimax litellm llm-gateway coding

แชร์บทความ

Facebook X

☕

เนื้อหานี้มีประโยชน์ไหม? ช่วยสนับสนุนค่ากาแฟให้ผู้เขียนสักแก้ว

Buy Me a Coffee

สารบัญ

TL;DR​

MiniMax-M3 คืออะไร​

Benchmark ที่น่าสนใจ​

พารามิเตอร์ที่แตกต่างจาก Qwen​

ตั้งค่า LiteLLM Gateway​

1. M3-CHAT - ใช้แทน GPT ทั่วไป​

2. M3-THINK - Reasoning ทั่วไป​

3. M3-CODE - Coding แม่นยำ​

4. M3-AGENT - Tool Calling​

5. M3-LONGCTX - งานเอกสารใหญ่​

6. M3-ULTRA - Agent สุดขั้ว​

ถ้าจะทำ LiteLLM Gateway จริง​

สรุป​

ตารางเปรียบเทียบ profile​

อ้างอิง​

TL;DR

MiniMax-M3 คืออะไร

Benchmark ที่น่าสนใจ

พารามิเตอร์ที่แตกต่างจาก Qwen

ตั้งค่า LiteLLM Gateway

1. M3-CHAT - ใช้แทน GPT ทั่วไป

2. M3-THINK - Reasoning ทั่วไป

3. M3-CODE - Coding แม่นยำ

4. M3-AGENT - Tool Calling

5. M3-LONGCTX - งานเอกสารใหญ่

6. M3-ULTRA - Agent สุดขั้ว

ถ้าจะทำ LiteLLM Gateway จริง

สรุป

ตารางเปรียบเทียบ profile

อ้างอิง