Thinking Machines Lab is the AI research company founded by Mira Murati, the former CTO of OpenAI and one of the most visible figures behind ChatGPT’s development. Since leaving OpenAI, Murati has built Thinking Machines as a new AI lab focused on making advanced AI systems more useful, collaborative, and natural to work with.
On 11 May 2026, Thinking Machines announced a research preview of a new type of interaction model. Rather than simply making AI smarter on traditional text benchmarks, the work focuses on how AI behaves in live interaction: how quickly it responds, how naturally it handles voice, video, and text, and whether it can work with people in a more fluid way.
That is why this announcement is interesting for Move Agent. It suggests that the next AI race may not only be about who has the most intelligent model. It may also be about who can build AI that feels natural enough to use in live work.
The latency number that matters
One number stands out in the Thinking Machines benchmark table.
Thinking Machines reports that TML-Interaction-Small achieved a turn-taking latency of 0.40 seconds on FD-bench V1. In plain terms, this is the delay between a person finishing a spoken turn and the model beginning to respond.
The reported comparison table includes:
These are vendor-reported research benchmarks, so they should not be treated as independent proof of real-world product performance. But the direction matters.
In conversation, small delays are not small. Human dialogue is built around rapid timing, overlap, correction, and prediction. Stivers et al. (2009) report that the mean response offset across ten spoken languages was about 208 ms, with language means ranging from about 7 ms (Japanese) to about 469 ms (Danish) for the measured question-response sequences.
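To make the metric concrete, the sketch below computes turn-taking latency from two timestamps and sets the reported 0.40 second figure against the human response offsets quoted above. The function names and the example timestamps are hypothetical, chosen only for illustration; this is not the FD-bench V1 scoring code.

```python
# Illustrative only: names and example timestamps are assumptions,
# not part of FD-bench V1 or any Thinking Machines API.

# Figures quoted in the text, in milliseconds (human figures rounded).
HUMAN_MEAN_OFFSET_MS = 208        # cross-language mean (Stivers et al., 2009)
HUMAN_FASTEST_MEAN_MS = 7         # Japanese
HUMAN_SLOWEST_MEAN_MS = 469       # Danish
REPORTED_MODEL_LATENCY_MS = 400   # 0.40 s, vendor-reported


def turn_taking_latency_ms(user_turn_end_s: float, model_response_start_s: float) -> float:
    """Delay between the user finishing a turn and the model starting to respond.

    A negative value would mean the response began before the turn ended (overlap).
    """
    return (model_response_start_s - user_turn_end_s) * 1000.0


def within_range_of_language_means(latency_ms: float) -> bool:
    """True if the latency falls between the fastest and slowest language means."""
    return HUMAN_FASTEST_MEAN_MS <= latency_ms <= HUMAN_SLOWEST_MEAN_MS


def ratio_to_human_mean(latency_ms: float) -> float:
    """How many times longer the latency is than the cross-language human mean."""
    return latency_ms / HUMAN_MEAN_OFFSET_MS


if __name__ == "__main__":
    # Hypothetical timestamps: user stops speaking at 12.30 s, model starts at 12.70 s.
    latency = turn_taking_latency_ms(12.30, 12.70)
    print(round(latency), within_range_of_language_means(latency))
    print(round(ratio_to_human_mean(REPORTED_MODEL_LATENCY_MS), 2))
```

On these figures, 0.40 seconds is a little under twice the cross-language human mean, yet still inside the range of per-language means that people already treat as normal conversation.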
The point is not that AI must exactly copy human timing. The point is that people are sensitive to conversational rhythm. Once AI gets close enough to natural timing, slower agents will feel less like assistants and more like forms with a voice interface.
The real shift is from turns to presence
Most AI still feels turn-based:
user speaks -> AI waits -> AI thinks -> AI replies
That works for research, writing, coding, and structured analysis. It works whenever the user can package the task neatly and wait for an answer.
Live operations are different. People pause, interrupt themselves, change direction, point at things, ask side questions, correct details, and show new information. The more natural pattern is closer to:
user speaks, pauses, moves, shows, interrupts, and changes direction while the AI keeps context and responds at the right moment
That is a different interaction model. Thinking Machines describes a system designed around continuous audio, video, and text streams split into small time-aligned micro-turns. The company also describes a split between a real-time interaction model and a background model that can handle deeper reasoning, tool use, browsing, and longer tasks while the interaction layer remains present with the user.
That architecture is important because it separates two needs that often conflict: immediate presence and deeper intelligence.
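The split described above can be sketched with Python's asyncio: a fast loop that handles one micro-turn at a time, and a slower background task that it never blocks on. Everything here is an assumption made for illustration, including the function names, the "ask:" convention, and the micro-turn length; it is not Thinking Machines' implementation or API.

```python
import asyncio

# Hypothetical micro-turn length, kept very short so the demo finishes quickly.
MICRO_TURN_SECONDS = 0.01


async def background_reasoning(question: str) -> str:
    """Stand-in for a slower background model (deep reasoning, tool use, browsing)."""
    await asyncio.sleep(0.05)  # simulate work that spans several micro-turns
    return f"answer to {question!r}"


async def interaction_loop(micro_turns):
    """Handle each micro-turn promptly; hand slow work to a background task."""
    events = []
    pending = None  # background task currently in flight, if any
    for chunk in micro_turns:
        if chunk.startswith("ask:") and pending is None:
            # Delegate the slow request and acknowledge it immediately.
            pending = asyncio.create_task(background_reasoning(chunk[4:]))
            events.append("ack: working on it")
        else:
            # Keep following the user while the background task runs.
            events.append(f"heard: {chunk}")
        if pending is not None and pending.done():
            events.append(f"result: {pending.result()}")
            pending = None
        await asyncio.sleep(MICRO_TURN_SECONDS)  # wait for the next micro-turn
    if pending is not None:
        # The input stream ended first, so wait for the outstanding result.
        events.append(f"result: {await pending}")
    return events


def run_demo():
    turns = [
        "hello",
        "ask:quote for a three-bed move",
        "actually",
        "there are stairs",
        "and a piano",
    ]
    return asyncio.run(interaction_loop(turns))


if __name__ == "__main__":
    for event in run_demo():
        print(event)
```

The design point is that the loop acknowledges the request at once and keeps registering what the user says next, while the slower answer is surfaced whenever it becomes available.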
Beyond voice: real-time multimodal AI
This is not only about faster voice response.
Thinking Machines is pitching a broader real-time multimodal architecture covering audio, video, text, simultaneous listening and speaking, visual proactivity, time awareness, background reasoning, and tool use.
That matters because most AI products still feel like submitting a request and waiting for a response. Even when the model is powerful, the interaction pattern often feels mechanical.
For voice agents, customer support, sales calls, training, tutoring, operational assistants, and field workflows, the experience has to feel alive. A slow AI assistant quickly feels like a blocker. A responsive one starts to feel present.
The hype and the reality
This does not mean Thinking Machines has won the voice AI race. That would be hype.
The announcement is best understood as a research preview of a real-time multimodal interaction model, not as a fully proven mass-market phone-agent product. Thinking Machines says it will open a limited research preview in the coming months, with a wider release planned later in 2026.
The latency numbers are impressive, but the real test will be how systems like this perform in commercial conditions: interruptions, accents, background noise, weak connections, emotional nuance, complex tool handoffs, and messy human behaviour.
The company also flags limitations around long sessions, connectivity, safety, and scaling larger models. Those constraints matter. Low latency is only useful if the agent remains accurate, safe, and operationally reliable.
Our view: slow AI will soon feel broken
The real trend in AI is no longer only about making models more intelligent. It is about making AI more human in the way it interacts: faster, more responsive, more aware of context, and able to process multiple signals at once.
The next leap may not come only from larger reasoning models. It may come from AI that behaves more like a live collaborator: listening, seeing, reacting, adapting, and coordinating in real time.
Interaction speed and multimodal behaviour may soon matter as much as raw intelligence for agents.
Because humans will get tired of slow AI.
People tolerate delays because the technology is still new. That patience will not last forever. Once customers and operators experience AI that responds naturally, slow AI will not merely feel less advanced. It will feel broken.
Why this matters for Move Agent
For Move Agent, this is exactly the kind of AI research worth paying attention to.
The removals and storage industry does not need AI for the sake of AI. It needs technology that reduces friction, responds faster, understands context, and helps people move from enquiry to action with less delay.
This matters across the operational workflow:
- Lead qualification depends on momentum. If a customer is ready to explain a move, the system should capture the details before the opportunity cools.
- Survey preparation depends on context. Rooms, access issues, inventory notes, timing constraints, and customer corrections rarely arrive in a neat order.
- Customer support depends on responsiveness. People moving house are often time-pressured, uncertain, and looking for clear next steps.
- Surveyor support depends on live information. The value is not just collecting data, but helping the operator understand what matters while the job is still being shaped.
- Job planning depends on structured follow-through. The agent should help turn messy enquiry data into operationally useful information.
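As a rough illustration of the points above about out-of-order details and customer corrections, the sketch below folds a stream of updates into one structured record. The MoveEnquiry class, its fields, and the update format are hypothetical; they do not describe Move Agent's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MoveEnquiry:
    """Hypothetical structured record built from out-of-order customer input."""

    rooms: dict = field(default_factory=dict)          # room name -> list of items
    access_notes: list = field(default_factory=list)   # stairs, parking, lifts
    move_date: Optional[str] = None                    # ISO date string

    def apply(self, update: dict) -> None:
        """Fold one update into the record; later corrections win."""
        kind = update["kind"]
        if kind == "item":
            self.rooms.setdefault(update["room"], []).append(update["item"])
        elif kind == "remove_item":
            items = self.rooms.get(update["room"], [])
            if update["item"] in items:
                items.remove(update["item"])
        elif kind == "access":
            self.access_notes.append(update["note"])
        elif kind == "date":
            self.move_date = update["value"]  # overwrites any earlier date
        else:
            raise ValueError(f"unknown update kind: {kind!r}")

    def missing_fields(self) -> list:
        """Names of fields still empty, i.e. what to ask the customer next."""
        missing = []
        if not self.rooms:
            missing.append("rooms")
        if not self.access_notes:
            missing.append("access_notes")
        if self.move_date is None:
            missing.append("move_date")
        return missing


def build_enquiry(updates) -> MoveEnquiry:
    enquiry = MoveEnquiry()
    for update in updates:
        enquiry.apply(update)
    return enquiry
```

For example, a customer who mentions a sofa, gives a date, adds a piano, describes the stairs, then retracts the piano and changes the date still ends up with one consistent record, and missing_fields() shows what remains to be asked.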
Move Agent is built for practical removals operations, not abstract AI demos. The lesson from Thinking Machines is that useful AI agents will need to be fast enough and context-aware enough to fit naturally into the work.
Whether it is qualifying a lead, supporting a customer, helping a surveyor, or preparing a job for review, the future of AI in removals will depend on how naturally and quickly it can work alongside people.
That is the real shift.
Sources
This article discusses research published by Thinking Machines Lab.
Thinking Machines Lab, “Interaction Models: A Scalable Approach to Human-AI Collaboration”, Thinking Machines Lab: Connectionism, May 2026. DOI: 10.64434/tml.20260511.
https://thinkingmachines.ai/blog/interaction-models/
Stivers, T. et al. “Universals and cultural variation in turn-taking in conversation.” Proceedings of the National Academy of Sciences, 2009.
https://www.pnas.org/doi/10.1073/pnas.0903616106