Generating large-scale synthetic data in simulation is a feasible alternative to collecting and labelling real data for training vision-based deep learning models; however, models trained purely in simulation often fail to generalize to the physical world due to modelling inaccuracies. In this paper, we present a Domain-Invariant Robot Learning (DIRL) algorithm that adapts deep models to the physical environment with a small amount of real data. Existing approaches that mitigate only the covariate shift, by aligning the marginal distributions across domains and assuming the conditional distributions to be domain-invariant, can lead to ambiguous transfer in real scenarios. We propose to jointly align the marginal (input domains) and the conditional (output labels) distributions with adversarial learning, mitigating both the covariate and the conditional shift across domains, and combine this with a triplet distribution loss that keeps the conditional distributions disjoint in the shared feature space. Experiments on digit domains yield state-of-the-art performance on challenging benchmarks, while sim-to-real transfer of object recognition for vision-based decluttering with a mobile robot improves from 26.8 % to 91.0 %.
Domain Invariant Robot Learning (DIRL)
(top) conventional supervised learning on labeled source data
(middle) conventional domain alignment with marginal distributions, showing cross-label mismatch (top) and label shift (bottom)
(bottom) DIRL: domain and policy alignment with marginal and conditional distributions
Conceptual illustration of domain-invariant robot learning on 2D synthetic data: (top) conventional supervised learning on the source domain does not generalize to a target domain drawn from a different distribution, (middle) unsupervised alignment of marginal distributions across domains leads to cross-mapping of class categories and negative transfer, (bottom) DIRL leverages a few labeled target examples to semantically align both the marginal and the class-conditional distributions for semi-supervised domain adaptation.
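To make the marginal-alignment step concrete, below is a minimal NumPy sketch (an illustration with hypothetical function and parameter names, not the authors' implementation) of the loss of a linear domain discriminator on shared features. In adversarial alignment, the feature extractor is trained to maximize this loss (e.g. via a gradient-reversal layer) so the discriminator cannot tell source from target features; as the middle panel above illustrates, this alone says nothing about class labels, which is why DIRL additionally aligns the conditional distributions.

```python
import numpy as np

def domain_discriminator_loss(feat_src, feat_tgt, w, b):
    """Binary cross-entropy of a linear domain classifier on shared features.

    Labels: 0 = source domain, 1 = target domain. The feature extractor
    in adversarial alignment is updated to *increase* this loss, pushing
    the two marginal feature distributions toward each other.
    """
    feats = np.concatenate([feat_src, feat_tgt])        # stack both domains
    labels = np.concatenate([np.zeros(len(feat_src)),   # 0 = source
                             np.ones(len(feat_tgt))])   # 1 = target
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))          # P(domain = target)
    eps = 1e-12                                         # numerical safety
    return -np.mean(labels * np.log(p + eps)
                    + (1.0 - labels) * np.log(1.0 - p + eps))
```

With a zero (chance-level) discriminator the loss sits at ln 2, its value when features are indistinguishable; a discriminator that separates the domains drives the loss toward zero, which is exactly what the extractor must prevent.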
Benchmarks on Digit Domains
t-SNE visualization of MNIST -> MNIST-M transfer (source in blue, target in red). DIRL compactly clusters the class distributions across datasets for transfer learning, in comparison to DANN and source-only transfer.
Normalized average test accuracy on the target domains of the digit datasets under unsupervised and semi-supervised domain adaptation. DIRL performs well across all target domains in comparison to the other baselines.
Vision-Based Decluttering by Sim-to-Real Transfer
DIRL aligns marginal and conditional distributions of source and target domains, and uses a soft metric learning triplet loss to make the feature distributions disjoint in a shared feature space.
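The triplet term described above can be sketched as a softmax-based ("soft", margin-free) triplet loss on feature embeddings; the NumPy snippet below is a hypothetical variant for illustration, not the exact loss used in the paper.

```python
import numpy as np

def soft_triplet_loss(anchor, positive, negative):
    """Softmax-based triplet loss on batches of feature embeddings.

    Pushes the anchor-positive (same-class) distance below the
    anchor-negative (other-class) distance, which encourages the
    class-conditional feature clusters to remain disjoint in the
    shared feature space.
    """
    d_ap = np.sum((anchor - positive) ** 2, axis=-1)  # dist. to same-class sample
    d_an = np.sum((anchor - negative) ** 2, axis=-1)  # dist. to other-class sample
    # Softplus of the distance gap: the loss decays to 0 as d_an >> d_ap
    # and approaches ln 2 when the two distances are equal.
    return np.mean(np.log1p(np.exp(d_ap - d_an)))
```

Unlike a hard-margin triplet loss, this soft form yields a smooth gradient for every triplet: well-separated clusters contribute almost nothing, while overlapping clusters are penalized at roughly ln 2 per triplet.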
Performance evaluation of domain-invariant object recognition by sim-to-real transfer on the target test set, using mean Average Precision (mAP), classification accuracy on synthetic test images (sim_eval) and real test images (real_eval), and silhouette score (SS). DIRL outperforms the other compared approaches across both domains.
(left) Experimental setup for decluttering objects into bins with the Toyota HSR; (right) object recognition and grasp planning model output on a simulated image (top) and a real image (bottom).
The Toyota HSR picks 86.5 % of objects over 200 grasp attempts with the object recognition network concatenated with the grasp planning network, compared to 76.2 % with the PCA baseline.
For further inquiries, please reach out at: ajay.tanwani at berkeley.edu