We propose DisCo for referring human dance generation, which generates human dance images and videos with the following three properties: (a) Faithfulness: retaining the appearance of the foreground (FG) and background (BG) consistent with the reference image while precisely following the pose; (b) Generalizability: generalizing to unseen human subject FG, BG, and pose; (c) Compositionality: adapting