FlashBack: Consistency Model-Accelerated Shared Autonomy

Toyota Technological Institute at Chicago

Consistency Shared Autonomy (CSA) achieves millisecond‑level inference for high‑precision tasks using a single, task‑agnostic training pipeline with fixed hyperparameters—no tuning required.

Shared Control

CSA enables the user to complete the task by providing only high-level input.

Real Robot Peg Insertion

We conducted a user study to evaluate the performance of CSA on a real-robot peg insertion task, in which users were asked to insert a peg into a hole while observing a camera feed of the scene. CSA significantly improved the success rate and reduced the time required to complete the task.

Abstract

Shared autonomy is an enabling technology that provides users with control authority over robots that would otherwise be difficult, if not impossible, to control directly. Yet, standard methods make assumptions that limit their adoption in practice -- for example, prior knowledge of the user's goals or the objective (i.e., reward) function that they wish to optimize, knowledge of the user's policy, or query-level access to the user during training.

Diffusion-based approaches to shared autonomy do not make such assumptions and instead only require access to demonstrations of desired behaviors, while allowing the user to maintain control authority. However, these advantages have come at the expense of high computational complexity, which has made real-time shared autonomy all but impossible.

To overcome this limitation, we propose Consistency Shared Autonomy (CSA), a shared autonomy framework that employs a consistency model-based formulation of diffusion. Key to CSA is that it employs the distilled probability flow ordinary differential equation (PF ODE) to generate high-fidelity samples in a single step. This results in inference speeds significantly faster than those of previous diffusion-based approaches to shared autonomy, enabling real-time assistance in complex domains with only a single function evaluation. Further, by intervening on flawed actions at intermediate states of the PF ODE, CSA enables varying levels of assistance. We evaluate CSA on a variety of challenging simulated and real-world robot control problems, demonstrating significant improvements over state-of-the-art methods in terms of both task performance and computational efficiency.

Method

ODE-based diffusion distillation

First, we train an ODE‑based diffusion teacher model $g(a^n,n)\mapsto \hat{a}^{n-1}$ that incrementally denoises the noisy action $a^n$. Because the ODE denoising trajectory is deterministic, we can then train a student model that distills this trajectory, producing the clean action in a single step $f(a^n,n)\mapsto a^0$, which significantly improves efficiency.
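
To make the distillation step concrete, below is a minimal, hedged PyTorch sketch of consistency distillation. The network, noise schedule, dimensions, and names are illustrative rather than the paper's implementation; state conditioning and the EMA target network of a full implementation are omitted for brevity, and the teacher is assumed pretrained.

```python
import torch
import torch.nn as nn

ACTION_DIM, T = 7, 50  # hypothetical action dimension and number of diffusion steps

def make_net():
    return nn.Sequential(nn.Linear(ACTION_DIM + 1, 128), nn.ReLU(),
                         nn.Linear(128, ACTION_DIM))

teacher, student = make_net(), make_net()  # teacher g is assumed pretrained
opt = torch.optim.Adam(student.parameters(), lr=1e-4)

def denoise(net, a, n):
    # Condition on the normalized step index by simple concatenation.
    step = torch.full((a.shape[0], 1), n / T)
    return net(torch.cat([a, step], dim=-1))

a0 = torch.randn(64, ACTION_DIM)           # stand-in for expert actions
n = int(torch.randint(1, T + 1, (1,)))     # random diffusion step
a_n = a0 + (n / T) * torch.randn_like(a0)  # noised action at step n (toy schedule)

with torch.no_grad():
    a_prev = denoise(teacher, a_n, n)          # teacher: one ODE step, n -> n-1
    target = denoise(student, a_prev, n - 1)   # self-consistency target
    # (a full implementation would use an EMA copy of the student here)

# Train the student so that all points on one ODE trajectory map to the same a^0.
loss = ((denoise(student, a_n, n) - target) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()
```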

Partial diffusion as inference

During inference, we model the user-provided action $a^u$ as a point along the learned denoising trajectory: $a^u\sim a^0+k\cdot \mathcal{N}(0,I)$, $k\in\{0,\dots,T\}$. Here, $k$, the partial diffusion ratio, sets the level of assistance, trading off fidelity (i.e., preserving the user's intent) against conformity (i.e., aligning with expert actions). The student model then predicts the corrected action, $f(a^u, k) \mapsto a^0$.
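
As an illustration, here is a hedged sketch of this inference step, reusing the `denoise` helper and student model from the sketch above; `assist` and the sweep over `k` are our illustrative names, not the paper's API.

```python
import torch

def assist(student_f, a_user: torch.Tensor, k: int) -> torch.Tensor:
    """Correct a user action with a single function evaluation (sketch).

    student_f(a, n) is the distilled consistency model (e.g., the denoise
    helper above); k is the partial diffusion level. k = 0 keeps the user's
    action essentially unchanged (fidelity), while k = T regenerates an
    expert-like action (conformity).
    """
    return student_f(a_user, k)  # f(a^u, k) -> a^0 in one forward pass

# Hypothetical usage: sweep the assistance level for a single user action.
# for k in (0, 10, 25, 50):
#     a_corrected = assist(denoise_fn, a_user, k)
```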

Intent preservation via next-state prediction

Next state prediction

During training, CSA randomly conditions the model on either $\{\mathrm{State}_t\}$ or $\{\mathrm{State}_t, \mathrm{State}_{t+1}\}$ with a fixed dropout probability, enabling it to capture the user’s instantaneous intent.
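
A minimal sketch of this conditioning dropout is shown below, assuming the two states are concatenated into the model's condition vector; the dropout probability and the zeroed "null" token are illustrative choices, not necessarily the paper's.

```python
import torch

P_DROP = 0.3  # illustrative fixed dropout probability

def build_condition(state_t: torch.Tensor, state_t1: torch.Tensor) -> torch.Tensor:
    # With probability P_DROP the model is conditioned only on State_t
    # (the next-state slot is zeroed as a null token); otherwise it also
    # sees State_{t+1}, which encodes the user's instantaneous intent.
    if torch.rand(()).item() < P_DROP:
        state_t1 = torch.zeros_like(state_t1)
    return torch.cat([state_t, state_t1], dim=-1)
```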

Preserving intent

Unlike prior approaches to diffusion-based shared autonomy, CSA leverages next‑state prediction to preserve action fidelity while enhancing conformity, making performance largely insensitive to the chosen assistance level ($k$).


Performance

Let $\alpha = \tfrac{k}{T}$ denote the assistance level and $\mathrm{CSA}^{\dagger}$ denote the variant of CSA that includes next‑state prediction. Relative to vanilla $\mathrm{CSA}$, $\mathrm{CSA}^{\dagger}$ is markedly less sensitive to the choice of $\alpha$ (i.e., amount of assistance).


Inference time

By distilling the PF ODE, CSA attains millisecond-level inference. On Lunar Lander, it delivers more than a 15$\times$ speed-up over the prior diffusion-based approach to shared autonomy. Full details are available in the paper.
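
For context, single-function-evaluation latency of this kind can be measured with a simple timing loop. The network below is a stand-in MLP, not the paper's architecture, so the absolute numbers will vary with model size and hardware.

```python
import time
import torch

# Stand-in consistency model of modest size; only indicative of single-NFE cost.
student = torch.nn.Sequential(torch.nn.Linear(8, 256), torch.nn.ReLU(),
                              torch.nn.Linear(256, 7))
a_user = torch.randn(1, 8)

with torch.no_grad():
    for _ in range(10):        # warm-up
        student(a_user)
    n_calls = 1000
    start = time.perf_counter()
    for _ in range(n_calls):
        student(a_user)        # one function evaluation per assist call
    elapsed = time.perf_counter() - start

print(f"~{elapsed / n_calls * 1e3:.3f} ms per single-step inference")
```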


BibTeX

If you find our work useful in your research, please consider citing the paper as follows:

@article{sun25,
  author    = {Luzhe Sun and Jingtian Ji and Xiangshan Tan and Matthew R. Walter},
  title     = {{FlashBack}: {C}onsistency Model-Accelerated Shared Autonomy},
  journal   = {arXiv preprint arXiv:2505.16892},
  year      = {2025},
}