Li Jiang's Homepage

Publications (* indicates equal contribution)

Full publication list on Google Scholar.

LLM

Preprint 2026
Self-Refined Distillation [efficiency]
Li Jiang, Haoran Xu, Yichuan Ding, Amy Zhang
Submitted to NeurIPS 2026
TL;DR: Distills a teacher into a student through iterative self-refinement, tightening the student's output distribution without external supervision.
Preprint 2026
LLM Human Response Alignment: A Multi-Sample Debiasing Framework [safety]
Li Jiang, Xiao Liu
Submitted to NeurIPS 2026
TL;DR: Debiases LLM alignment by aggregating multiple sampled responses, reducing single-sample preference noise during human-feedback training.
ICML 2026
Which Heads Matter for Reasoning? RL-Guided KV Cache Compression [efficiency]
Wenjie Du, Li Jiang, Keda Tao, Xue Liu, Huan Wang
TL;DR: Uses RL as a probe to identify reasoning-critical attention heads, then aggressively compresses the KV caches of non-critical heads, achieving 20–50% cache reduction with minimal accuracy drop.
COLM 2024
Hummer: Towards Limited Competitive Preference Dataset [safety]
Li Jiang*, Yusen Wu*, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng
TL;DR: Reduces conflicting preference signals across alignment objectives via a low-competition preference dataset, improving multi-attribute RLHF stability.

Reinforcement Learning

ICLR 2023
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization [offline-rl]
Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Victor Wai Kin Chan, Xianyuan Zhan
TL;DR: Derives a unified in-sample offline RL framework from implicit value regularization, avoiding queries on out-of-distribution actions while matching state-of-the-art performance on D4RL.
Notable, Top 5%
NeurIPS 2022
A Policy-Guided Imitation Approach for Offline Reinforcement Learning [offline-rl]
Li Jiang*, Haoran Xu*, Jianxiong Li, Xianyuan Zhan
TL;DR: Decouples offline RL into a guide-policy that plans optimal next states and an execute-policy that imitates them, sidestepping value extrapolation error.
Oral, Top 2%
ICC 2023
An Efficient Multi-Agent Optimization Approach for Coordinated MIMO Beamforming [multi-agent]
Li Jiang, Xiangsen Wang, Aidong Yang, Ye Ouyang, Xianyuan Zhan
TL;DR: Casts coordinated MIMO beamforming as a cooperative multi-agent RL problem, achieving a higher sum rate than convex-optimization baselines at low online cost.
NeurIPS 2023
Exploiting Fundamental Symmetry for Sample-Efficient Offline RL [offline-rl]
Peng Cheng, Zhihao Wu, Wenjia Zhang, Shoucheng Song, Han Wang, Youfang Lin, Li Jiang
TL;DR: Leverages the time-reversal symmetry of system dynamics as a self-supervised signal, boosting the sample efficiency of offline RL in small-data regimes.
IEEE Transactions on Games 2023
Curriculum Goal-conditioned Imitation for Offline Reinforcement Learning [offline-rl]
Li Jiang, Xiaoyun Feng, Xudong Yu, Haoran Xu, Xiaoyan Sun, Jie Wang, Xianyuan Zhan
TL;DR: Trains a goal-conditioned policy via curriculum-relabeled imitation, enabling stable offline RL on long-horizon goal-reaching tasks.