SAMSum Corpus

Official

Models

Papers

SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization

https://arxiv.org/pdf/1911.12237v2.pdf

Calibrating Sequence likelihood Improves Conditional Language Generation

https://arxiv.org/pdf/2210.00045v1.pdf

UL2: Unifying Language Learning Paradigms

https://arxiv.org/pdf/2205.05131v2.pdf

SummerTime: Text Summarization Toolkit for Non-experts

https://arxiv.org/pdf/2108.12738v2.pdf

Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization

https://arxiv.org/pdf/2105.12544v2.pdf

MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization

https://arxiv.org/pdf/2103.06410v2.pdf

Summary

This paper introduces the Samsung Abstractive Messenger Summarization Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles.

Inspired by the Lead-n model, we propose a few different simple models:

MIDDLE-n, which takes n utterances from the middle of the dialogue

LONGEST-n, which treats only the n longest utterances, in order of length, as a summary

LONGER-THAN-n, which takes only utterances longer than n characters, in order of length (if no utterance in the dialogue is that long, it takes the longest one)

MOST-ACTIVE-PERSON, which treats all utterances of the most active person in the dialogue as a summary
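The four baselines above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the descriptions leave open: a dialogue is a list of (speaker, text) pairs, length is measured in characters, "in order of length" means longest first, MIDDLE-n centers its window on the dialogue, and the "most active" person is the one with the most utterances. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Assumed representation: one (speaker, text) pair per utterance.
Utterance = Tuple[str, str]


def middle_n(dialogue: List[Utterance], n: int) -> List[Utterance]:
    """MIDDLE-n: n consecutive utterances from the middle (centering is an assumption)."""
    if n <= 0:
        return []
    start = max(0, (len(dialogue) - n) // 2)
    return dialogue[start:start + n]


def longest_n(dialogue: List[Utterance], n: int) -> List[Utterance]:
    """LONGEST-n: the n longest utterances, longest first (ordering is an assumption)."""
    ranked = sorted(dialogue, key=lambda u: len(u[1]), reverse=True)
    return ranked[:max(n, 0)]


def longer_than_n(dialogue: List[Utterance], n: int) -> List[Utterance]:
    """LONGER-THAN-n: utterances longer than n characters, longest first.

    Falls back to the single longest utterance when none qualifies.
    """
    selected = [u for u in dialogue if len(u[1]) > n]
    if not selected:
        return longest_n(dialogue, 1)
    return sorted(selected, key=lambda u: len(u[1]), reverse=True)


def most_active_person(dialogue: List[Utterance]) -> List[Utterance]:
    """MOST-ACTIVE-PERSON: all utterances of the speaker with the most turns.

    Activity is counted in utterances (an assumption); ties go to the
    speaker who appears first in the dialogue.
    """
    if not dialogue:
        return []
    counts = Counter(speaker for speaker, _ in dialogue)
    top_speaker, _ = counts.most_common(1)[0]
    return [u for u in dialogue if u[0] == top_speaker]
```

Each function returns the selected utterances rather than a joined string, so the caller decides how to render the summary, for example by joining the texts with a space before scoring them against the reference.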

Experiments

Performance

Further Readings

Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

http://pfliu.com/pl-summarization/summ_data.html