DialogSum

Summary

DialogSum is a large-scale labeled dialogue summarization dataset. The dialogues are collected from three public dialogue corpora, namely DailyDialog, DREAM and MuTual, as well as from an English-speaking practice website. These sources contain face-to-face spoken dialogues that cover a wide range of daily-life topics, including schooling, work, medication, shopping, leisure and travel. Most conversations take place between friends or colleagues, or between service providers and customers. We clean and preprocess the dialogue data into a unified format, and ask annotators to summarize each dialogue from an observer's perspective. Topics are also manually labeled for each dialogue.
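The description above does not spell out the unified format, so the following is only a minimal sketch of how one might load and inspect records. It assumes a JSON Lines layout with the fields `fname`, `dialogue`, `summary` and `topic`, and speaker tags such as `#Person1#` at the start of each turn; these field names, the tag style, the helper functions and the sample record are all hypothetical illustrations, not data taken from the dataset.

```python
import json
from collections import Counter

# Hypothetical record: field names and "#PersonN#:" speaker tags are assumptions,
# and the content is invented for illustration (not a real DialogSum example).
SAMPLE_JSONL = "\n".join([
    json.dumps({
        "fname": "example_0",
        "dialogue": "#Person1#: Hi, can I help you?\n"
                    "#Person2#: Yes, I'd like to buy a ticket to Boston.",
        "summary": "#Person2# buys a ticket to Boston with #Person1#'s help.",
        "topic": "buy a ticket",
    }),
])


def load_records(jsonl_text):
    """Parse one JSON object per non-empty line (JSON Lines)."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def parse_turns(dialogue):
    """Split a dialogue string into (speaker, utterance) pairs.

    Each line is assumed to look like "<speaker>: <utterance>"; only the
    first colon is treated as the separator, and lines without one are skipped.
    """
    turns = []
    for line in dialogue.splitlines():
        speaker, sep, text = line.partition(":")
        if sep:
            turns.append((speaker.strip(), text.strip()))
    return turns


records = load_records(SAMPLE_JSONL)
# Count how often each manually labeled topic occurs.
topic_counts = Counter(rec["topic"] for rec in records)

for rec in records:
    for speaker, utterance in parse_turns(rec["dialogue"]):
        print(f"{speaker} -> {utterance}")
    print("Summary:", rec["summary"])
```

Splitting on the first colon keeps utterances that themselves contain colons intact; if the real files use a different speaker convention, only `parse_turns` would need to change.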