
BART

Official

GitHub - facebookresearch/GENRE: Autoregressive Entity Retrieval
The GENRE (Generative ENtity REtrieval) system, as presented in Autoregressive Entity Retrieval, implemented in PyTorch. The mGENRE system, as presented in Multilingual Autoregressive Entity Linking. Please consider citing our works if you use code from this repository. In a nutshell, (m)GENRE uses a sequence-to-sequence approach to entity retrieval (e.g., linking), based on a fine-tuned BART architecture, or mBART for the multilingual case.
https://github.com/facebookresearch/GENRE

Summary

BART is a denoising autoencoder for pre-training sequence-to-sequence models. It is trained by corrupting text with an arbitrary noising function and learning a model to reconstruct the original text.

It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pre-training schemes.
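
To make that concrete, here is a minimal sketch (assuming the Hugging Face transformers library and the public facebook/bart-base checkpoint, neither of which is mentioned above) that loads a BART model and inspects its BERT-like encoder and GPT-like decoder:

```python
# Minimal sketch: BART as a bidirectional encoder plus an autoregressive decoder.
# Assumes the Hugging Face `transformers` package and the "facebook/bart-base" checkpoint.
from transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartModel.from_pretrained("facebook/bart-base")

# The two halves are exposed as separate sub-modules.
print(type(model.encoder).__name__)  # BartEncoder  (bidirectional, BERT-like)
print(type(model.decoder).__name__)  # BartDecoder  (left-to-right, GPT-like)

inputs = tokenizer("BART generalizes BERT and GPT.", return_tensors="pt")
outputs = model(**inputs)  # decoder inputs are created by shifting the input right
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```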

BART also presents a new scheme for machine translation where a BART model is stacked above a few additional transformer layers. These layers are trained to essentially translate the foreign language to noised English, by propagation through BART, thereby using BART as a pre-trained target-side language model.
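
The sketch below is a rough illustration of that setup, not the paper's actual code: a small, randomly initialised source-side encoder (the name source_encoder and the vocabulary size are my own assumptions) maps foreign tokens into BART's embedding space, while a frozen Hugging Face BART model serves as the pre-trained target-side English model.

```python
# Rough sketch of the MT fine-tuning scheme: new source-side layers feed a frozen BART.
# Assumes the Hugging Face `transformers` package; names and sizes here are illustrative.
import torch
import torch.nn as nn
from transformers import BartForConditionalGeneration

bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
d_model = bart.config.d_model

# New randomly initialised source-side components, trained from scratch.
src_vocab_size = 32_000  # hypothetical foreign-language vocabulary size
source_embed = nn.Embedding(src_vocab_size, d_model)
source_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=2,
)

# First training stage (simplified): freeze BART and train only the new layers,
# so they learn to map the foreign input into something BART can turn into English.
for p in bart.parameters():
    p.requires_grad = False

def translation_loss(src_ids, tgt_ids):
    src_states = source_encoder(source_embed(src_ids))          # foreign -> BART input space
    return bart(inputs_embeds=src_states, labels=tgt_ids).loss  # BART decodes English

loss = translation_loss(torch.randint(0, src_vocab_size, (2, 16)),
                        torch.randint(0, bart.config.vocab_size, (2, 20)))
loss.backward()  # gradients flow only into the new source-side layers
```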

Architecture

It uses the standard sequence-to-sequence Transformer architecture except that, following GPT, the ReLU activation functions are changed to GeLUs and parameters are initialised from N(0, 0.02). The architecture is closely related to BERT, with the following differences: (1) each layer of the decoder additionally performs cross-attention over the final hidden layer of the encoder, as in the standard Transformer sequence-to-sequence model, and (2) BART does not use the additional feed-forward network that BERT adds before word prediction.
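
These two choices show up in the default Hugging Face BartConfig (assuming the transformers library), which makes for a quick sanity check:

```python
# Quick check of the defaults described above, assuming Hugging Face `transformers`.
from transformers import BartConfig

config = BartConfig()
print(config.activation_function)  # "gelu"
print(config.init_std)             # 0.02  (std of the N(0, 0.02) initialisation)
print(config.encoder_layers, config.decoder_layers)  # 12 12 (bart-large-sized defaults)
```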

Pre-Training Task

BART pre-trains a model combining Bidirectional and Auto-Regressive Transformers. Pre-training has two stages: (1) text is corrupted with an arbitrary noising function, and (2) a sequence-to-sequence model is learned to reconstruct the original text.

BART optimizes a reconstruction loss: the cross-entropy between the decoder's output and the original document. Unlike denoising autoencoders tailored to a specific noising scheme, this setup allows any type of document corruption to be applied.
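
A minimal sketch of this objective with the Hugging Face transformers API (an assumption on my part; the original implementation lives in fairseq): the corrupted text goes into the encoder, the original text is the target, and the returned loss is the cross-entropy described above.

```python
# Minimal sketch of the denoising / reconstruction objective.
# Assumes Hugging Face `transformers` and the "facebook/bart-base" checkpoint.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

original = "BART is trained to reconstruct the original document."
corrupted = "BART is trained to <mask> the original document."  # e.g. text infilling

inputs = tokenizer(corrupted, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt").input_ids

outputs = model(**inputs, labels=labels)
print(outputs.loss)  # cross-entropy between the decoder's output and the original text
```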

Experiments

Performance

The authors evaluate a number of noising approaches and find the best performance comes from both randomly shuffling the order of the original sentences and using a novel in-filling scheme where spans of text are replaced with a single mask token.
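
For intuition, here is a toy, word-level re-implementation of those two corruptions, sentence permutation and text infilling with Poisson-distributed span lengths (λ = 3 in the paper). The helper names, the start probability, and the 30% masking budget are my own simplifications; the real implementation operates on subword tokens.

```python
# Toy versions of sentence permutation and text infilling (not the paper's code).
import random
import numpy as np

def permute_sentences(sentences):
    """Sentence permutation: shuffle the sentences of a document into random order."""
    shuffled = list(sentences)
    random.shuffle(shuffled)
    return shuffled

def text_infilling(tokens, mask_ratio=0.3, poisson_lambda=3.0, mask_token="<mask>"):
    """Replace spans of tokens with a single mask token until roughly `mask_ratio`
    of the tokens are removed; span lengths ~ Poisson(poisson_lambda).
    A length-0 span just inserts a mask without removing anything."""
    budget = int(len(tokens) * mask_ratio)
    out, i = [], 0
    while i < len(tokens):
        if budget > 0 and random.random() < 0.2:  # 0.2 start rate is arbitrary here
            span = min(int(np.random.poisson(poisson_lambda)), budget, len(tokens) - i)
            out.append(mask_token)        # one mask token replaces the whole span
            i += span
            budget -= max(span, 1)        # max(..., 1) only guarantees termination
        else:
            out.append(tokens[i])
            i += 1
    return out

print(permute_sentences(["the cat sat .", "it was warm .", "birds sang outside ."]))
print(text_infilling("the quick brown fox jumps over the lazy dog".split()))
```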

Fine-Tuning

The representations produced can be used in several ways for downstream applications such as sequence classification, token classification, sequence generation and machine translation.
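
Below is a short sketch of two of those modes with Hugging Face transformers (an assumption; the two-label task and example strings are hypothetical): sequence classification adds a new head on top of the pre-trained model, while sequence generation lets the decoder produce the output text directly.

```python
# Sketch of two fine-tuning modes: sequence classification and sequence generation.
# Assumes Hugging Face `transformers`; the 2-label task here is hypothetical.
from transformers import (
    BartTokenizer,
    BartForSequenceClassification,
    BartForConditionalGeneration,
)

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")

# Sequence classification: a freshly initialised classification head on top of BART.
clf = BartForSequenceClassification.from_pretrained("facebook/bart-base", num_labels=2)
enc = tokenizer("This movie was great!", return_tensors="pt")
print(clf(**enc).logits.shape)  # torch.Size([1, 2])

# Sequence generation (e.g. summarization): the decoder generates the target text.
# In practice one would start from a summarization checkpoint such as facebook/bart-large-cnn.
gen = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
enc = tokenizer("BART is a denoising autoencoder for pre-training seq2seq models.",
                return_tensors="pt")
summary_ids = gen.generate(**enc, num_beams=4, max_length=20)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```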

Further Readings

Text Summarization using Facebook BART Large CNN
Text summarization is a natural language processing (NLP) technique that enables users to quickly and accurately summarize vast amounts of text without losing the crux of the topic. We've all read articles and other lengthy writings that completely divert our attention from the topic at hand because of a tonne of extraneous information.
https://techblog.geekyants.com/text-summarization-using-facebook-bart-large-cnn
BART: Are all pretraining techniques created equal?
Paper summary of "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension" (Oct. 2019). In this paper, Lewis et al. present valuable comparative work on different pre-training techniques and show how this kind of work can be used to guide large pre-training experiments reaching state-of-the-art (SOTA) results.
https://dair.ai/BART-Summary/
paper summary: "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation...
While the model architecture is quite simple, the key contribution of this work is its elaborate experimentation with the various pretraining tasks. While many other papers were about "oh we used this pretraining task along with others and got better performance! WOW", this paper is more about "of all those many pretraining tasks, which are really helpful and effective?"
https://medium.com/mlearning-ai/paper-summary-bart-denoising-sequence-to-sequence-pre-training-for-natural-language-generation-69e41dfbb7fe
Transformers BART Model Explained for Text Summarization
Generalizing BERT (due to the bidirectional encoder) and GPT-2 (with the left-to-right decoder), BART is a sequence-to-sequence model that can be used for tasks such as abstractive text summarization. HuggingFace Transformers provides an easy-to-use implementation of some of the best-performing models in natural language processing.
https://www.projectpro.io/article/transformers-bart-model-explained/553
Introducing BART
For the past few weeks, I worked on integrating BART into transformers. This post covers the high-level differences between BART and its predecessors and how to use the new BartForConditionalGeneration to summarize documents. In October 2019, teams from Google and Facebook published new transformer papers: T5 and BART.
https://sshleifer.github.io/blog_v2/jupyter/2020/03/12/bart.html