SAMSum Corpus
Official

Models

Papers
SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization
Calibrating Sequence likelihood Improves Conditional Language Generation
UL2: Unifying Language Learning Paradigms
SummerTime: Text Summarization Toolkit for Non-experts
Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization
MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization
Summary
This paper introduces the Samsung Abstractive Messenger Summarization Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles.

Inspired by the Lead-n model, which takes the first n sentences of a document as its summary, we propose several simple baseline models:
- MIDDLE-n, which takes n utterances from the middle of the dialogue
- LONGEST-n, which treats only the n longest utterances, in order of length, as a summary
- LONGER-THAN-n, which takes only utterances longer than n characters, in order of length (if the dialogue contains no such utterance, it takes the longest one)
- MOST-ACTIVE-PERSON, which treats all utterances of the most active person in the dialogue as a summary
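
The four baselines above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: a dialogue is a non-empty list of (speaker, utterance) pairs, "in order of length" means longest first, the MIDDLE-n window is centred by integer division, and the "most active" person is the one with the most utterances. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Assumed representation: a dialogue is a non-empty list of (speaker, utterance) pairs.
Dialogue = List[Tuple[str, str]]


def middle_n(dialogue: Dialogue, n: int) -> List[str]:
    """MIDDLE-n: take n utterances from the middle of the dialogue."""
    # Assumption: the window is centred using integer division.
    start = max((len(dialogue) - n) // 2, 0)
    return [text for _, text in dialogue[start:start + n]]


def longest_n(dialogue: Dialogue, n: int) -> List[str]:
    """LONGEST-n: take the n longest utterances, longest first (assumed order)."""
    texts = [text for _, text in dialogue]
    return sorted(texts, key=len, reverse=True)[:n]


def longer_than_n(dialogue: Dialogue, n: int) -> List[str]:
    """LONGER-THAN-n: take utterances longer than n characters, longest first.

    Falls back to the single longest utterance when none qualifies.
    """
    texts = [text for _, text in dialogue]
    selected = sorted((t for t in texts if len(t) > n), key=len, reverse=True)
    return selected or [max(texts, key=len)]


def most_active_person(dialogue: Dialogue) -> List[str]:
    """MOST-ACTIVE-PERSON: take all utterances of the most active speaker.

    Assumption: activity is measured by utterance count; ties go to the
    speaker who appears first in the dialogue.
    """
    counts = Counter(speaker for speaker, _ in dialogue)
    top_speaker = counts.most_common(1)[0][0]
    return [text for speaker, text in dialogue if speaker == top_speaker]
```

Each function returns a list of utterances; joining them with spaces or newlines gives the baseline summary string that would then be scored against the reference summary.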

Experiments



Performance




Further Readings




