Flow-based Policy Adaptation without Policy Updates

Sun, Luzhe; Ji, Jingtian; Chen, Haoran; Zhou, Jiawei; Walter, Matthew R.

Overview

Real Robot: Charger Insertion

01

VLA (Finetuned Flower)

Base VLA policy

Zoom-in view

Normal view

02

VLA+FPAS

Flow-prior action sampling

Zoom-in view

Normal view

03

VLA+FEEG

Energy-guided flow editing

Zoom-in view

Normal view

04

VLA+IFAE

Inversion-free editing

Zoom-in view

Normal view

Real Robot: Cup Serve

01

VLA (Finetuned Flower)

Collision failure

Red: collision after 2.5s

02

VLA+FPAS

Minor collision

Yellow: minor collision after 2.5s

03

VLA+FEEG

Successful serving

Green: final success state

04

VLA+IFAE

Successful serving

Green: final success state

Abstract

Leveraging prior knowledge from pretrained policies, foundation models, or human operators offers an efficient alternative to learning robot skills from scratch. However, these agents often provide actions that are suboptimal, noisy, or misaligned with task-specific expert behavior. We propose GLOVES, a family of flow-based adaptation methods that correct non-expert actions by transporting them toward an expert action distribution. Rather than replacing agentic control with full autonomy, GLOVES performs selective action-level adaptation, improving task success while preserving agent intent. The learned flow also provides a natural in-distribution scoring mechanism through reverse flow evaluation. We use this signal as an intervention gate: actions that appear consistent with the expert distribution are passed through unchanged, while anomalous or out-of-distribution (OOD) actions are corrected, so assistance is provided only when necessary. GLOVES requires only limited expert supervision, in the form of a small number of demonstrations or reusable successful skill segments. By learning local expert action patterns and stitching them together during execution, GLOVES provides a lightweight shared-control module for robust action adaptation across tasks and environments.

Methods

OOD Detection

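GLOVES reuses the expert flow as an intervention gate. For a proposed action chunk \(x\) under context \(c\), we run the learned flow backward to recover a latent \(\hat z(x,c)\) and use its negative log-likelihood under the prior as a nonconformity score. For a standard Gaussian prior, this reduces to the squared latent norm, up to constants that do not affect the ranking of scores:
\[ \hat z(x,c)=F_\theta^{-1}(x;c),\qquad s(x,c)\propto \|\hat z(x,c)\|_2^2. \]
Given a calibration set of \(n\) expert action-context pairs \(\mathcal{D}_{\mathrm{cal}}=\{(x_i,c_i)\}_{i=1}^n\), we compute \(s_i=s(x_i,c_i)\). Conformal prediction then gives the p-value
\[ p(x,c)= \frac{1+\sum_{i=1}^{n}\mathbf{1}\{s_i\ge s(x,c)\}}{n+1}, \]
that is, the smoothed fraction of calibration scores at least as large as the score of the proposed chunk. Given a user-chosen significance level \(\alpha\), we execute chunks with \(p(x,c)>\alpha\) unchanged and adapt the rest.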
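The gating rule can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the reverse flow \(F_\theta^{-1}\) has already been evaluated, so each function takes the recovered latent \(\hat z\) or its score directly. The function names, the toy calibration scores, and the value of \(\alpha\) are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def nonconformity(z_hat: Sequence[float]) -> float:
    """Squared L2 norm of the latent recovered by the reverse flow.

    Assumption: z_hat = F_theta^{-1}(x; c) is computed elsewhere; the flow
    itself is not modeled in this sketch.
    """
    return sum(v * v for v in z_hat)


def conformal_p_value(score: float, cal_scores: Sequence[float]) -> float:
    """p = (1 + #{i : s_i >= score}) / (n + 1)."""
    n_at_least = sum(1 for s_i in cal_scores if s_i >= score)
    return (1 + n_at_least) / (len(cal_scores) + 1)


def pass_through(score: float, cal_scores: Sequence[float], alpha: float) -> bool:
    """True: execute the chunk unchanged. False: hand it to the adapter."""
    return conformal_p_value(score, cal_scores) > alpha


# Hypothetical calibration scores s_i from n = 9 expert action-context pairs.
cal_scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
alpha = 0.2  # hypothetical significance level

typical = nonconformity([1.0, 0.5])   # 1.25: seven calibration scores are >= 1.25
unusual = nonconformity([2.0, 1.0])   # 5.0: no calibration score is >= 5.0

p_typical = conformal_p_value(typical, cal_scores)  # (1 + 7) / 10 = 0.8
p_unusual = conformal_p_value(unusual, cal_scores)  # (1 + 0) / 10 = 0.1

keep_typical = pass_through(typical, cal_scores, alpha)  # True: executed unchanged
keep_unusual = pass_through(unusual, cal_scores, alpha)  # False: adapted
```

Because the p-value depends only on how the test score ranks among the calibration scores, any positive rescaling or constant offset of \(s\) leaves the gate's decisions unchanged. This is why the score only needs to be defined up to proportionality.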