
SAMSum Corpus

Official

samsum · Datasets at Hugging Face
We're on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/datasets/samsum

Models

Models - Hugging Face
https://huggingface.co/models?search=samsum

Papers

SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization

https://arxiv.org/pdf/1911.12237v2.pdf

Calibrating Sequence likelihood Improves Conditional Language Generation

https://arxiv.org/pdf/2210.00045v1.pdf

UL2: Unifying Language Learning Paradigms

https://arxiv.org/pdf/2205.05131v2.pdf

SummerTime: Text Summarization Toolkit for Non-experts

https://arxiv.org/pdf/2108.12738v2.pdf

Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization

https://arxiv.org/pdf/2105.12544v2.pdf

MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization

https://arxiv.org/pdf/2103.06410v2.pdf

Summary

This paper introduces the Samsung Abstractive Messenger Summarization (SAMSum) Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles.

Inspired by the Lead-n model, we propose a few different simple models:
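As a rough illustration of what such extractive baselines look like, here is a minimal sketch that assumes a dialogue is a list of utterance strings. `lead_n` follows the usual Lead-n idea (keep the first n units). `middle_n` and `longest_n` are hypothetical variants written for this note and are not necessarily the paper's exact definitions; the sample dialogue is made up.

```python
def lead_n(utterances, n=3):
    """Lead-n: use the first n utterances as the summary."""
    return utterances[:n]


def middle_n(utterances, n=3):
    """Hypothetical variant: take n utterances from the middle of the dialogue."""
    start = max(0, (len(utterances) - n) // 2)
    return utterances[start:start + n]


def longest_n(utterances, n=3):
    """Hypothetical variant: take the n longest utterances, longest first."""
    return sorted(utterances, key=len, reverse=True)[:n]


# Made-up dialogue, used only to exercise the functions.
dialogue = [
    "Anna: Are you coming tonight?",
    "Ben: Not sure yet.",
    "Anna: The concert starts at eight, doors open at seven.",
    "Ben: OK, I will try to leave work early.",
    "Anna: Great!",
]

if __name__ == "__main__":
    print(" ".join(lead_n(dialogue, 2)))
    print(" ".join(middle_n(dialogue, 3)))
    print(" ".join(longest_n(dialogue, 1)))
```

Baselines like these need no training, so they mainly serve as a floor against which the neural models listed under Experiments can be compared.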

Experiments

GitHub - abisee/pointer-generator: Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"
Note: this code is no longer actively maintained. However, feel free to use the Issues section to discuss the code with other users. Some users have updated this code for newer versions of Tensorflow and Python - see information below and Issues section.
https://github.com/abisee/pointer-generator
GitHub - OpenNMT/OpenNMT-py: Open Source Neural Machine Translation in PyTorch
OpenNMT-py is the PyTorch version of the OpenNMT project, an open-source (MIT) neural machine translation framework. It is designed to be research friendly to try out new ideas in translation, summary, morphology, and many other domains. Some companies have proven the code to be production ready. We love contributions!
https://github.com/OpenNMT/OpenNMT-py
GitHub - ChenRocks/fast_abs_rl: Code for ACL 2018 paper: "Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting. Chen and Bansal"
This repository contains the code for our ACL 2018 paper: Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting.
https://github.com/ChenRocks/fast_abs_rl
GitHub - glample/fastBPE: Fast BPE
C++ implementation of Neural Machine Translation of Rare Words with Subword Units, with Python API.
https://github.com/glample/fastBPE
py-rouge
A full Python implementation of the ROUGE metric, producing the same results as the official Perl implementation. The original Porter stemmer in NLTK is slightly different from the one used in the official ROUGE Perl script, as it has been written by hand. Therefore, there might be slightly different stems for certain words.
https://pypi.org/project/py-rouge/
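For intuition about what ROUGE measures, the following is a simplified sketch of ROUGE-N as clipped n-gram overlap between a candidate and a reference summary. It is not the py-rouge API: it assumes lowercased whitespace tokenization and applies no stemming or stopword handling, so its scores will generally differ from those of the official script. The function names and example strings are made up for illustration.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: precision, recall and F1 over clipped n-gram overlap."""
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"p": precision, "r": recall, "f": f1}


if __name__ == "__main__":
    scores = rouge_n("amanda baked cookies", "amanda baked cookies for jerry")
    print(scores)
```

In the usage example all three candidate unigrams appear in the five-token reference, so precision is 3/3 and recall is 3/5.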

Performance

Further Readings

Exploring Datasets And Approaches For Conversational Summarization
In this blog we will be exploring the available datasets. The AMI Meeting Corpus consists of 100 hours of meeting recordings. The recordings use a range of signals synchronized to a common timeline. These include close-talking and far-field microphones, individual and room-view video cameras, and output from a slide projector and an electronic whiteboard.
https://ashwinpathak20.github.io/multimodal/2021/10/blog-post-13/
Papers with Code - SAMSum Corpus Dataset
A new dataset with abstractive dialogue summaries.
https://paperswithcode.com/dataset/samsum-corpus
Conversational Summarization with Natural Language Processing
Imagine a world where car companies release better and faster cars every month, but they have a completely different set of controls, and they don't give anyone instructions on how to operate them. This is the current state of data science and machine learning in natural language processing.
https://medium.com/rocket-mortgage-technology-blog/conversational-summarization-with-natural-language-processing-c073a6bcaa3a
NLP2019
Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
http://pfliu.com/pl-summarization/summ_data.html