Question about the hybrid training strategy

Thanks for open-sourcing DM0!

Where in the code is the hybrid training strategy from the paper implemented? The paper says: "DM0 employs a hybrid training strategy: for embodied data, gradients from the action expert are not backpropagated to the VLM to preserve generalized representations". Do you freeze the VLM to stop this gradient flow?
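
To make the question concrete, here is a minimal PyTorch sketch of what I imagine a stop-gradient (as opposed to freezing) would look like. The `vlm` and `action_expert` modules and both losses are toy stand-ins I made up for illustration; they are not taken from the DM0 code, and I am only guessing that something like `detach()` is used.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins, NOT DM0's actual modules.
vlm = nn.Linear(8, 4)            # plays the role of the VLM backbone
action_expert = nn.Linear(4, 2)  # plays the role of the action expert

x = torch.randn(3, 8)
features = vlm(x)

# Stop-gradient: the action expert consumes detached features, so the
# action loss cannot backpropagate into the VLM parameters.
action_loss = action_expert(features.detach()).pow(2).mean()
action_loss.backward()

vlm_untouched_by_action_loss = vlm.weight.grad is None
expert_got_grad = action_expert.weight.grad is not None

# The VLM is still trainable: a separate (toy) loss on its own output
# does produce gradients for it.
vlm_loss = features.pow(2).mean()
vlm_loss.backward()
vlm_got_grad_from_own_loss = vlm.weight.grad is not None
```

By contrast, freezing with `vlm.requires_grad_(False)` would block gradients from every loss, not only the action loss, so I would like to know which of the two the released code uses.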

Thanks