[ACL 2025] Do Androids Question Electric Sheep? A Multi-Agent Cognitive Simulation of Philosophical Reflection on Hybrid Table Reasoning
Do Androids Overthink Electric Sheep?
Turns out, AI might be just like us when it comes to thinking too much.
In my latest paper, I set up a “philosophical debate club” for language models—where AI agents role-play as Socrates, Aristotle, and other great thinkers to solve tricky table-reasoning problems.
The result? They definitely think. But just like humans, more thinking doesn’t always mean better answers. In fact, after a certain point—what I call the “overthinking threshold”—performance starts dropping. Sound familiar?
LLMs showed eerily human-like behaviors: under-confidence, echo chambers, going off-topic, and even daydreaming. Maybe androids do question electric sheep… a little too much.
👉 Read the full paper here – and let me know if you’ve also found yourself stuck in an infinite thought loop.
Abstract:
While LLMs demonstrate remarkable reasoning capabilities and multi-agent applicability, their tendencies to “overthink” and “groupthink” pose intriguing parallels to human cognitive limitations. Inspired by this observation, we conduct an exploratory simulation to investigate whether LLMs are wise enough to be thinkers of philosophical reflection. We design two frameworks, Philosopher and Symposium, which simulate self- and group-reflection across multiple personas in hybrid table reasoning tasks. Through experiments across four benchmarks, we discover that while introducing varied perspectives might help, LLMs tend to underperform simpler end-to-end approaches. Through close reading, we reveal five emergent behaviors that strikingly resemble human cognitive closure-seeking behaviors, and identify a consistent “overthinking threshold” pattern across all tasks, where collaborative reasoning often reaches a critical point of diminishing returns. This study sheds light on a fundamental challenge shared by both human and machine intelligence: the delicate balance between deliberation and decisiveness.