Abstract

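For complex problems (e.g., multi-hop reasoning or multi-modal tasks), instead of distilling a single teacher into a single student, the teacher is distilled into multiple specialized student agents, each focused on a different sub-task or capability. These students then carry out multi-agent cooperation (e.g., through communication and task division) using specialized MCP Boxes. We compare whether this outperforms "pure multi-agent cooperation", in which multiple agents collaborate directly without any distillation.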

Proposed Method

The framework is fully training-free, with three phases: teacher-driven MCP generation, MCP Box construction, and multi-agent collaboration.

Teacher Role and MCP Generation:
- A large teacher agent (e.g., GPT-4o) analyzes the complex task and decomposes it into sub-tasks (e.g., image analysis for multi-modal tasks, numerical reasoning for multi-hop problems).
- For each sub-task, the teacher generates specialized MCPs (e.g., an MCP for image thresholding, another for arithmetic combinations) from its task-solving trajectories.
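
The teacher's output can be pictured as a list of small tool records, one per sub-task. The following is a minimal sketch under stated assumptions: the `MCP` record, `generate_mcps`, and both toy tools are hypothetical names invented for illustration, and the hard-coded list stands in for whatever the teacher would actually extract from its trajectories.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MCP:
    """Hypothetical record for one teacher-generated tool (not a real MCP SDK type)."""
    name: str
    sub_task: str              # sub-task label assigned during decomposition
    description: str
    fn: Callable[..., object]  # executable tool body


def threshold_image(pixels: List[int], cutoff: int = 128) -> List[int]:
    """Toy image-thresholding tool: binarize grayscale pixel values."""
    return [1 if p >= cutoff else 0 for p in pixels]


def combine(a: float, b: float, op: str) -> float:
    """Toy arithmetic-combination tool for one Game-of-24-style step."""
    if op == "/":
        return a / b
    ops = {"+": a + b, "-": a - b, "*": a * b}
    return ops[op]


def generate_mcps() -> List[MCP]:
    """Stand-in for the teacher step; assumed here to be a fixed list."""
    return [
        MCP("threshold_image", "image_analysis",
            "Binarize a grayscale image.", threshold_image),
        MCP("combine", "numerical_reasoning",
            "Apply one arithmetic operation.", combine),
    ]
```

Tagging each record with its `sub_task` is the design choice that later allows subsets to be handed to different students.
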
MCP Box Construction:
- Extracted MCPs undergo abstraction, clustering, and consolidation [Qiu et al., 2025] to form a shared MCP Box.
- Specialization: Subsets of MCPs are assigned to students (e.g., image-related MCPs to Student A), creating "specialized" views of the box while maintaining a shared core for collaboration.
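
One way to read "specialized views with a shared core" is a box that stores tool names per sub-task plus a common set visible to every student. The sketch below is an assumption about how such a box might be organized, not the construction from Qiu et al.; the `MCPBox` class and its methods are hypothetical, and the abstraction and clustering steps are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Set


class MCPBox:
    """Hypothetical MCP Box: per-sub-task tool names plus a shared core."""

    def __init__(self) -> None:
        self._by_subtask: Dict[str, Set[str]] = defaultdict(set)
        self._shared: Set[str] = set()

    def add(self, name: str, sub_task: str, shared: bool = False) -> None:
        # Sets deduplicate repeated names, a crude stand-in for consolidation.
        if shared:
            self._shared.add(name)
        else:
            self._by_subtask[sub_task].add(name)

    def view(self, sub_tasks: List[str]) -> List[str]:
        """Return the tools visible to a student assigned these sub-tasks."""
        names = set(self._shared)
        for sub_task in sub_tasks:
            names |= self._by_subtask.get(sub_task, set())
        return sorted(names)
```

A student assigned only `image_analysis` would see the image tools and the shared core, but not the arithmetic tools.
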
Multi-Agent Collaboration Phase:
- Multiple small student agents (e.g., Llama-3.1-8B) directly integrate the MCP Boxes (no training or fine-tuning).
- Students cooperate via communication/task division (e.g., in AutoGen), invoking MCPs from the box (e.g., Student A handles visual input with image MCPs, Student B integrates with reasoning MCPs).
- This leverages teacher-generated specialization: Students inherit sub-task expertise via targeted MCP subsets, enabling efficient use of the shared box.
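
The hand-off between students can be sketched without any agent framework. The code below is a plain-Python illustration under stated assumptions: `Student` and `run_pipeline` are hypothetical names, the tools are ordinary functions rather than LLM calls, and no AutoGen API is used or implied. Each tool invocation is counted as one communication round.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[object], object]


class Student:
    """Hypothetical student agent holding a specialized tool subset."""

    def __init__(self, name: str, tools: Dict[str, Tool]) -> None:
        self.name = name
        self.tools = tools

    def can_handle(self, tool_name: str) -> bool:
        return tool_name in self.tools

    def act(self, tool_name: str, payload: object) -> object:
        return self.tools[tool_name](payload)


def run_pipeline(students: List[Student], plan: List[str],
                 payload: object) -> Tuple[object, int]:
    """Route each planned tool call to the first student that owns the tool.

    Returns the final payload and the number of rounds (one per tool call).
    """
    rounds = 0
    for tool_name in plan:
        owner = next((s for s in students if s.can_handle(tool_name)), None)
        if owner is None:
            raise ValueError(f"no student owns tool {tool_name!r}")
        payload = owner.act(tool_name, payload)
        rounds += 1
    return payload, rounds
```

In this toy setting, Student A could own a thresholding tool and Student B a counting tool, so a plan of `["threshold", "count"]` passes the intermediate result from A to B.
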

Experimental Design

Baselines:
- Pure multi-agent cooperation: Multiple small LLMs collaborating without MCP Boxes or teacher generation.
- Ablations: Multi-agents with generic (non-teacher-generated) tools; single-agent with MCP Box.
Benchmarks:
- Multi-hop reasoning: Game of 24 (mathematical puzzles).
- Multi-modal tasks: real-world VQA datasets (PathVQA and SLAKE).
Metrics:
- Task success rate, collaboration efficiency (e.g., communication rounds), generalization to unseen sub-tasks, and latency.
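
Three of these metrics (success rate, communication rounds, latency) can be aggregated from per-task logs. This is a minimal sketch; the `EpisodeLog` fields and the `summarize` helper are hypothetical names, and generalization to unseen sub-tasks is not covered because it depends on how the held-out split is defined.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class EpisodeLog:
    """Hypothetical per-task record collected during evaluation."""
    success: bool
    rounds: int        # communication rounds used on this task
    latency_s: float   # wall-clock seconds for this task


def summarize(logs: List[EpisodeLog]) -> Dict[str, float]:
    """Aggregate success rate, mean rounds, and mean latency."""
    if not logs:
        raise ValueError("no episodes to summarize")
    return {
        "success_rate": sum(log.success for log in logs) / len(logs),
        "mean_rounds": mean(log.rounds for log in logs),
        "mean_latency_s": mean(log.latency_s for log in logs),
    }
```
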
Implementation:
- Teacher: GPT-4o for sub-task decomposition and MCP generation.
- Students: 3-5 small LLMs (e.g., Qwen-8B), each assigned specialized MCP subsets.
- Framework: AutoGen for collaboration; evaluate on 100-500 task instances per benchmark.
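
The conditions to compare can be enumerated up front so that every run shares the same configuration shape. This is only a sketch of one possible layout: the `Condition` record, the condition names, and the `tools` labels are hypothetical, chosen here to mirror the proposed method and the baselines and ablations listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Condition:
    """Hypothetical description of one experimental condition."""
    name: str
    num_students: int
    tools: str  # "teacher_mcp", "generic", or "none"


def build_conditions(num_students: int = 3) -> List[Condition]:
    """List the proposed method plus the baseline and ablation conditions."""
    if not 3 <= num_students <= 5:
        raise ValueError("the proposal uses 3-5 students")
    return [
        Condition("specialized_mcp_multi_agent", num_students, "teacher_mcp"),
        Condition("pure_multi_agent", num_students, "none"),
        Condition("generic_tools_multi_agent", num_students, "generic"),
        Condition("single_agent_mcp_box", 1, "teacher_mcp"),
    ]
```
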
Hypotheses:
- Teacher-generated specialized MCP Boxes improve multi-agent task success rate by 10-20% over the pure multi-agent baseline, due to inherited sub-task expertise.
- Shared boxes reduce collaboration overhead (e.g., communication rounds) by 15-30%.
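
The percentages in these hypotheses can be read as absolute or relative changes, and the two readings differ noticeably. The helpers below are a small sketch of both computations; the function names are hypothetical, and which reading the hypotheses intend is left open here.

```python
def absolute_gain(baseline: float, treatment: float) -> float:
    """Difference in success rate, in the same units as the inputs."""
    return treatment - baseline


def relative_gain(baseline: float, treatment: float) -> float:
    """Improvement as a fraction of the baseline success rate."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (treatment - baseline) / baseline


def overhead_reduction(baseline_rounds: float, treatment_rounds: float) -> float:
    """Fractional reduction in communication rounds versus the baseline."""
    if baseline_rounds == 0:
        raise ValueError("reduction is undefined for a zero baseline")
    return (baseline_rounds - treatment_rounds) / baseline_rounds
```

Fixing one definition before running experiments avoids ambiguity when the results are compared against the hypothesized ranges.
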

References

https://arxiv.org/pdf/2506.14728

https://arxiv.org/pdf/2505.13820v1

https://arxiv.org/pdf/2502.11271

https://arxiv.org/abs/2505.17612