It has long been assumed that if one AI agent is effective, more must be better. However, a recent study by Google, Google DeepMind, and MIT indicates that this assumption does not always hold. The study found that adding more AI agents can, in some cases, reduce performance by up to 70%.
This finding comes from one of the most extensive controlled studies of AI agent systems conducted to date. In the paper entitled ‘Towards a Science of Scaling Agent Systems’, researchers evaluated 180 distinct configurations across financial analysis, web search, game strategy, and workplace tasks. Their analysis compared single-agent systems with multi-agent teams organised in different ways: independent workers, manager-led teams, peer discussion groups, and hybrid models.
The results were striking: multi-agent teams excelled at some tasks, improving financial analysis performance by over 80%, yet they significantly underperformed on others, such as sequential game planning, where they scored up to 70% lower than a single agent.
The study attributes the difference to “task fit.” Tasks that can be divided into parallel subtasks benefit from many agents, whereas tasks that require sequential reasoning are held back by the communication overhead between agents. Multi-agent performance also deteriorates as the tools involved become more complex.
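The trade-off described above can be illustrated with a toy model. The sketch below is not the model from the paper; it is an Amdahl-style approximation with an added linear coordination cost, and the function name, the parameters `parallel_fraction` and `overhead_per_agent`, and the example values are all assumptions chosen for illustration.

```python
def relative_speedup(n_agents: int, parallel_fraction: float,
                     overhead_per_agent: float) -> float:
    """Toy estimate of multi-agent speedup relative to a single agent.

    Assumptions (illustrative only, not taken from the study):
      - a fraction `parallel_fraction` of the work splits evenly across agents
      - the remainder is strictly sequential
      - each extra agent adds a fixed coordination cost `overhead_per_agent`,
        expressed as a fraction of the single-agent runtime
    """
    sequential_part = 1.0 - parallel_fraction
    parallel_part = parallel_fraction / n_agents
    coordination = overhead_per_agent * (n_agents - 1)
    return 1.0 / (sequential_part + parallel_part + coordination)


# A mostly parallel task with cheap coordination: four agents help.
parallel_task = relative_speedup(4, parallel_fraction=0.9, overhead_per_agent=0.02)

# A mostly sequential task with costly coordination: four agents hurt.
sequential_task = relative_speedup(4, parallel_fraction=0.1, overhead_per_agent=0.1)

print(f"parallel-friendly task: {parallel_task:.2f}x")
print(f"sequential task:        {sequential_task:.2f}x")
```

A value above 1 means the team beats a single agent under these assumptions, and a value below 1 means coordination costs outweigh the gains from splitting the work.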
The study is the first to offer a clear, data-backed methodology for deciding when to use multiple agents. The team developed a predictive model that analyses a task’s structure, including the number of tools required, how sequential its steps are, and each agent’s performance, and recommends the optimal system design with 87% accuracy.
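To make the idea of a task-structure-based recommendation concrete, here is a minimal rule-based sketch that uses the three kinds of inputs mentioned above. It is a hypothetical stand-in, not the predictive model from the paper: the `TaskProfile` fields, the `recommend_architecture` function, the direction of each rule, and every threshold are placeholder assumptions.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of a task (field names are assumptions)."""
    num_tools: int                # how many tools the task requires
    sequential_fraction: float    # share of steps that must run in order, 0..1
    single_agent_accuracy: float  # how well one agent already does, 0..1


def recommend_architecture(task: TaskProfile,
                           max_tools: int = 8,
                           max_sequential: float = 0.5,
                           saturation: float = 0.8) -> str:
    """Return "single-agent" or "multi-agent" using illustrative thresholds.

    The default thresholds are arbitrary placeholders, not values from the study.
    """
    # Heavily sequential work is assumed to suffer from coordination overhead.
    if task.sequential_fraction > max_sequential:
        return "single-agent"
    # Tool-heavy tasks are assumed to degrade in multi-agent setups.
    if task.num_tools > max_tools:
        return "single-agent"
    # Assumption: if one agent already performs well, extra agents add little.
    if task.single_agent_accuracy >= saturation:
        return "single-agent"
    return "multi-agent"


# Example: a parallelisable analysis task versus a step-by-step planning task.
analysis = TaskProfile(num_tools=3, sequential_fraction=0.1, single_agent_accuracy=0.5)
planning = TaskProfile(num_tools=3, sequential_fraction=0.9, single_agent_accuracy=0.5)

print(recommend_architecture(analysis))
print(recommend_architecture(planning))
```

A real predictor would be fitted to measured results rather than hand-set rules; the point here is only that the decision can be driven by properties of the task instead of a default preference for more agents.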
The findings have practical implications for AI developers and enterprises. Rather than defaulting to multi-agent configurations, teams should match the system design to the task. Doing so should lead to smarter, faster, and more cost-effective AI deployments, and to a more precise, evidence-based approach to AI system architecture.