DialogSum
Official
GitHub - cylnlp/dialogsum: DialogSum: A Real-life Scenario Dialogue Summarization Dataset - Findings of ACL 2021
DialogSum is a large-scale dialogue summarization dataset consisting of 13,460 dialogues with corresponding manually labeled summaries and topics. You can download the data directly from this folder or from Hugging Face Datasets. This work was accepted to Findings of ACL 2021. You may find the paper here.
Summary
DialogSum is a large-scale labeled dialogue summarization dataset. Dialogue data are collected from three public dialogue corpora, namely DailyDialog, DREAM, and MuTual, as well as from an English-speaking practice website. These datasets contain face-to-face spoken dialogues that cover a wide range of daily-life topics, including schooling, work, medication, shopping, leisure, and travel. Most conversations take place between friends, between colleagues, or between service providers and customers. We clean and preprocess the dialogue data into a unified format, and ask annotators to summarize each dialogue from an observer's perspective. Topics are also manually labeled for each dialogue.
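
To make the "unified format" concrete, here is a minimal sketch of reading one record and splitting it into speaker turns. It assumes each record is a JSON object with the fields `fname`, `dialogue`, `summary`, and `topic`, and that turns are separated by newlines and prefixed with tags such as `#Person1#:`. These field names and the tag convention are assumptions to verify against the released files, and the sample record is invented for illustration, not taken from the dataset.

```python
import json
import re

# Hypothetical record, invented for illustration; not taken from the dataset.
# The field names (fname, dialogue, summary, topic) are assumed.
SAMPLE_LINE = json.dumps({
    "fname": "example_0",
    "dialogue": (
        "#Person1#: Hi, can I return this jacket?\n"
        "#Person2#: Sure, do you have the receipt?\n"
        "#Person1#: Yes, here it is."
    ),
    "summary": "#Person1# returns a jacket to #Person2# with a receipt.",
    "topic": "return a jacket",
})

# Assumed speaker-tag convention: each turn starts with "#PersonN#:".
TURN_PATTERN = re.compile(r"^(#Person\d+#):\s*(.*)$")


def parse_record(line):
    """Parse one JSON record and split its dialogue into (speaker, utterance) pairs."""
    record = json.loads(line)
    turns = []
    for raw_turn in record["dialogue"].split("\n"):
        match = TURN_PATTERN.match(raw_turn.strip())
        if match:  # lines that do not start with a speaker tag are skipped
            turns.append((match.group(1), match.group(2)))
    return {
        "turns": turns,
        "summary": record["summary"],
        "topic": record["topic"],
    }


parsed = parse_record(SAMPLE_LINE)
for speaker, utterance in parsed["turns"]:
    print(speaker, "->", utterance)
```

Keeping the speaker tag with each utterance is a deliberate choice here: the summaries are written from an observer's perspective, so they may refer to the same speaker tags.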



Comparison




Experiments



Performance





Further Readings
Papers with Code - DialogSum Dataset
DialogSum is a large-scale dialogue summarization dataset consisting of 13,460 dialogues with corresponding manually labeled summaries and topics. This work was accepted to Findings of ACL 2021. You may find the paper here: https://arxiv.org/pdf/2105.06762.pdf. If you want to use our dataset, please cite our paper.

Summarization - OpenNMT-py documentation
Note: The process and results below are presented in the paper Bottom-Up Abstractive Summarization. Please consider citing it if you follow these instructions. This document describes how to replicate summarization experiments on the CNN-DM and Gigaword datasets using OpenNMT-py. In the following, we assume access to a tokenized form of the corpus split into train/valid/test sets.
https://opennmt.net/OpenNMT-py/examples/Summarization.html