What did Thinking Machines announce in May 2026?

Thinking Machines Lab announced a research preview of interaction models: AI systems designed to handle audio, video, and text continuously, with real-time perception and response rather than a strict turn-based interface.

Why does turn-taking latency matter for AI agents?

Latency changes how an AI assistant feels in live work. Small delays can make a voice or multimodal agent feel mechanical, while faster response can make it feel present, collaborative, and easier to use during active workflows.

Is Move Agent using Thinking Machines models?

This article is a research commentary, not a product or partnership announcement. The relevance for Move Agent is the direction of travel: responsive, multimodal, context-aware AI matters for removals operations.

Why is real-time AI important for removals and storage companies?

Removals workflows depend on quick qualification, accurate context, and practical follow-up. Customers correct themselves, provide incomplete details, ask timing questions, and share visual information. AI agents need to handle that fluidly to reduce friction.

Thinking Machines and the Real Shift in AI: Interaction Speed May Matter as Much as Intelligence

Thinking Machines Lab is the AI research company founded by Mira Murati, the former CTO of OpenAI and one of the most visible figures behind ChatGPT’s development. Since leaving OpenAI, Murati has built Thinking Machines as a new AI lab focused on making advanced AI systems more useful, collaborative, and natural to work with.

On 11 May 2026, Thinking Machines announced a research preview of a new type of interaction model. Rather than simply making AI smarter on traditional text benchmarks, the work focuses on how AI behaves in live interaction: how quickly it responds, how naturally it handles voice, video, and text, and whether it can work with people in a more fluid way.

That is why this announcement is interesting for Move Agent. It suggests that the next AI race may not only be about who has the most intelligent model. It may also be about who can build AI that feels natural enough to use in live work.

The latency number that matters

One number stands out in the Thinking Machines benchmark table.

Thinking Machines reports that TML-Interaction-Small achieved a turn-taking latency of 0.40 seconds on FD-bench V1. In plain terms, this measures the delay between a person finishing a spoken turn and the model beginning to respond.

The reported comparison table includes:

Model: turn-taking latency
TML-Interaction-Small: 0.40s
Gemini-3.1-flash-live-preview minimal: 0.57s
GPT-realtime-1.5: 0.59s
Gemini-3.1-flash-live-preview high: 0.94s
GPT-realtime-2.0 minimal: 1.18s
GPT-realtime-2.0 xhigh: 1.63s
Qwen 3.5 OMNI-plus-realtime: 2.14s

These are vendor-reported research benchmarks, so they should not be treated as independent proof of real-world product performance. But the direction matters.

In conversation, small delays are not small. Human dialogue is built around rapid timing, overlap, correction, and prediction. Stivers et al. (2009) report that the mean human response offset across ten spoken languages was about 208 ms, ranging from about 7.29 ms for Japanese to about 468.88 ms for Danish in the measured question-response sequences.
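To put the reported figures next to the human baseline, the short Python sketch below expresses each latency as a multiple of the 208 ms cross-language mean. The numbers are copied from the vendor-reported table and the Stivers et al. figure cited above. The function name compare_to_human and the constant names are illustrative only, and the comparison is rough arithmetic, not a like-for-like measurement, because the benchmark and the human study measure different settings.

```python
# Turn-taking latencies in seconds, as reported in the Thinking Machines table.
REPORTED_LATENCY_S = {
    "TML-Interaction-Small": 0.40,
    "Gemini-3.1-flash-live-preview minimal": 0.57,
    "GPT-realtime-1.5": 0.59,
    "Gemini-3.1-flash-live-preview high": 0.94,
    "GPT-realtime-2.0 minimal": 1.18,
    "GPT-realtime-2.0 xhigh": 1.63,
    "Qwen 3.5 OMNI-plus-realtime": 2.14,
}

# Cross-language mean response offset cited from Stivers et al. (2009).
HUMAN_MEAN_S = 0.208


def compare_to_human(latencies, human_mean_s=HUMAN_MEAN_S):
    """Return (model, latency, multiple of human mean), fastest first."""
    rows = []
    for model, latency in sorted(latencies.items(), key=lambda kv: kv[1]):
        rows.append((model, latency, latency / human_mean_s))
    return rows


if __name__ == "__main__":
    for model, latency, multiple in compare_to_human(REPORTED_LATENCY_S):
        print(f"{model:40s} {latency:.2f}s  {multiple:.1f}x human mean")
```

On these figures, the fastest reported model sits at roughly 1.9 times the human mean, while the slowest is roughly ten times it.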

The point is not that AI must exactly copy human timing. The point is that people are sensitive to conversational rhythm. Once AI gets close enough to natural timing, slower agents will feel less like assistants and more like forms with a voice interface.

The real shift is from turns to presence

Most AI still feels turn-based:

user speaks -> AI waits -> AI thinks -> AI replies

That works for research, writing, coding, and structured analysis. It works whenever the user can package the task neatly and wait for an answer.

Live operations are different. People pause, interrupt themselves, change direction, point at things, ask side questions, correct details, and show new information. The more natural pattern is closer to:

user speaks, pauses, moves, shows, interrupts, and changes direction while the AI keeps context and responds at the right moment

That is a different interaction model. Thinking Machines describes a system designed around continuous audio, video, and text streams split into small time-aligned micro-turns. The company also describes a split between a real-time interaction model and a background model that can handle deeper reasoning, tool use, browsing, and longer tasks while the interaction layer remains present with the user.

That architecture is important because it separates two needs that often conflict: immediate presence and deeper intelligence.
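The split between a fast interaction layer and a slower background layer can be illustrated with a toy Python sketch using asyncio. This is not Thinking Machines' implementation: the functions background_reasoner and interaction_loop, the sleep durations, and the string messages are all hypothetical stand-ins, and no real model is involved. The sketch only shows the general pattern of acknowledging each small chunk of input promptly while a longer task runs concurrently.

```python
import asyncio


async def background_reasoner(question: str) -> str:
    """Hypothetical stand-in for a slower model doing deeper reasoning."""
    await asyncio.sleep(0.3)  # simulated slow work (assumed duration)
    return f"detailed answer to: {question}"


async def interaction_loop(micro_turns, question):
    """Respond to every micro-turn while the background task runs."""
    events = []
    # Start the slow work without blocking the interaction layer.
    task = asyncio.create_task(background_reasoner(question))
    delivered = False
    for chunk in micro_turns:
        await asyncio.sleep(0.05)  # simulated arrival of streamed input
        events.append(f"ack: {chunk}")  # fast, shallow response
        # Surface the deeper result as soon as it is ready.
        if task.done() and not delivered:
            events.append(task.result())
            delivered = True
    if not delivered:
        # Input ended first, so wait for the background result.
        events.append(await task)
    return events


if __name__ == "__main__":
    turns = ["moving on the 14th", "actually the 15th", "two flights of stairs"]
    for event in asyncio.run(interaction_loop(turns, "which crew size fits?")):
        print(event)
```

The design choice worth noticing is that the loop never waits on the background task while input is still arriving, which is the conflict between immediate presence and deeper intelligence that the architecture is meant to resolve.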

Beyond voice: real-time multimodal AI

This is not only about faster voice response.

Thinking Machines is pitching a broader real-time multimodal architecture covering audio, video, text, simultaneous listening and speaking, visual proactivity, time awareness, background reasoning, and tool use.

That matters because most AI products still feel like submitting a request and waiting for a response. Even when the model is powerful, the interaction pattern often feels mechanical.

For voice agents, customer support, sales calls, training, tutoring, operational assistants, and field workflows, the experience has to feel alive. A slow AI assistant quickly feels like a blocker. A responsive one starts to feel present.

The hype and the reality

This does not mean Thinking Machines has won the voice AI race. That would be hype.

The announcement is best understood as a research preview of a real-time multimodal interaction model, not as a fully proven mass-market phone-agent product. Thinking Machines says it will open a limited research preview in the coming months, with a wider release planned later in 2026.

The latency numbers are impressive, but the real test will be how systems like this perform in commercial conditions: interruptions, accents, background noise, weak connections, emotional nuance, complex tool handoffs, and messy human behaviour.

The company also flags limitations around long sessions, connectivity, safety, and scaling larger models. Those constraints matter. Low latency is only useful if the agent remains accurate, safe, and operationally reliable.

Our view: slow AI will soon feel broken

The real trend in AI is no longer only about making models more intelligent. It is about making AI more human in the way it interacts: faster, more responsive, more aware of context, and able to process multiple signals at once.

The next leap may not come only from larger reasoning models. It may come from AI that behaves more like a live collaborator: listening, seeing, reacting, adapting, and coordinating in real time.

Interaction speed and multimodal behaviour may soon matter as much as raw intelligence for agents, because humans will get tired of slow AI.

People tolerate delays because the technology is still new. That patience will not last forever. Once customers and operators experience AI that responds naturally, slow AI will not just feel less advanced. It will feel broken.

Why this matters for Move Agent

For Move Agent, this is exactly the kind of AI research worth paying attention to.

The removals and storage industry does not need AI for the sake of AI. It needs technology that reduces friction, responds faster, understands context, and helps people move from enquiry to action with less delay.

This matters across the operational workflow:

Lead qualification depends on momentum. If a customer is ready to explain a move, the system should capture the details before the opportunity cools.
Survey preparation depends on context. Rooms, access issues, inventory notes, timing constraints, and customer corrections rarely arrive in a neat order.
Customer support depends on responsiveness. People moving house are often time-pressured, uncertain, and looking for clear next steps.
Surveyor support depends on live information. The value is not just collecting data, but helping the operator understand what matters while the job is still being shaped.
Job planning depends on structured follow-through. The agent should help turn messy enquiry data into operationally useful information.
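As a rough illustration of what turning messy enquiry data into something structured could look like, the Python sketch below keeps the latest value for each detail, logs corrections, and reports what is still missing. It is a hypothetical example, not a description of how Move Agent works: the class MoveEnquiry, its methods, and the field names such as bedrooms and move_date are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MoveEnquiry:
    """Hypothetical record of an enquiry built up during a conversation."""

    details: dict = field(default_factory=dict)
    corrections: list = field(default_factory=list)

    def update(self, key: str, value: str) -> None:
        """Store a detail; a changed value for a known key is a correction."""
        if key in self.details and self.details[key] != value:
            self.corrections.append((key, self.details[key], value))
        self.details[key] = value

    def missing(self, required):
        """Return the required fields the customer has not yet provided."""
        return [key for key in required if key not in self.details]


if __name__ == "__main__":
    enquiry = MoveEnquiry()
    # Details arrive out of order, and one is corrected later.
    enquiry.update("bedrooms", "2")
    enquiry.update("access", "third floor, no lift")
    enquiry.update("bedrooms", "3")
    print(enquiry.details)
    print(enquiry.corrections)
    print(enquiry.missing(["bedrooms", "access", "move_date"]))
```

Keeping the correction history, rather than silently overwriting, gives a surveyor or planner a way to see where the customer changed their mind.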

Move Agent is built for practical removals operations, not abstract AI demos. The lesson from Thinking Machines is that useful AI agents will need to be fast enough and context-aware enough to fit naturally into the work.

Whether it is qualifying a lead, supporting a customer, helping a surveyor, or preparing a job for review, the future of AI in removals will depend on how naturally and quickly it can work alongside people.

That is the real shift.

Sources

This article discusses research published by Thinking Machines Lab.

Thinking Machines Lab, “Interaction Models: A Scalable Approach to Human-AI Collaboration”, Thinking Machines Lab: Connectionism, May 2026. DOI: 10.64434/tml.20260511.
https://thinkingmachines.ai/blog/interaction-models/

Stivers, T. et al. “Universals and cultural variation in turn-taking in conversation.” Proceedings of the National Academy of Sciences, 2009.
https://www.pnas.org/doi/10.1073/pnas.0903616106