Few-shot In-Context Preference Learning Using Large Language Models: Environment Details

3 Dec 2024

Table of Links

A. Appendix

A.1. Full Prompts and A.2 ICPL Details

Table 4 presents the observation and action dimensions, the task description, and the task metric for each of the 9 IsaacGym tasks.

Table 4: Details of IsaacGym Tasks.

Authors:

(1) Chao Yu, Tsinghua University;

(2) Hong Lu, Tsinghua University;

(3) Jiaxuan Gao, Tsinghua University;

(4) Qixin Tan, Tsinghua University;

(5) Xinting Yang, Tsinghua University;

(6) Yu Wang, Tsinghua University (equal advising);

(7) Yi Wu, Tsinghua University and the Shanghai Qi Zhi Institute (equal advising);

(8) Eugene Vinitsky, New York University (equal advising).

This paper is available on arXiv under the CC 4.0 license.
