Synthesizer: Rethinking Self-Attention in Transformer Models

By Yi Tay, Dara Bahri, Donald Metzler et al. (Google Research), 2020

Contrary to the widely held view that query-key dot-product self-attention is largely responsible for the superior performance of Transformer models on various NLP tasks, this paper suggests that replacing the dot-product attention weights with random or simply synthesized alignment matrices is enough to achieve comparable results with better efficiency.
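
To make that concrete, here is a minimal sketch (not the authors' code) of the two synthetic-attention variants the paper studies: a Dense Synthesizer head that predicts its attention weights from each token alone, and a Random Synthesizer head whose weights are a trainable random matrix shared across all inputs. The single-head simplification, module names, and dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseSynthesizerHead(nn.Module):
    """Predicts attention weights from each token alone (no query-key dot product)."""
    def __init__(self, d_model, max_len):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, max_len),        # one logit per attended position
        )
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        logits = self.proj(x)[..., :seq_len]    # (batch, seq_len, seq_len)
        return F.softmax(logits, dim=-1) @ self.value(x)

class RandomSynthesizerHead(nn.Module):
    """Attention weights are a trainable random matrix, shared across all inputs."""
    def __init__(self, d_model, max_len):
        super().__init__()
        self.logits = nn.Parameter(torch.randn(max_len, max_len))
        self.value = nn.Linear(d_model, d_model)

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        attn = F.softmax(self.logits[:seq_len, :seq_len], dim=-1)
        return attn @ self.value(x)

x = torch.randn(2, 16, 64)                      # toy batch: 2 sequences of length 16
print(DenseSynthesizerHead(d_model=64, max_len=128)(x).shape)   # torch.Size([2, 16, 64])
print(RandomSynthesizerHead(d_model=64, max_len=128)(x).shape)  # torch.Size([2, 16, 64])
```

Neither variant compares pairs of tokens with a query-key dot product; the alignment matrix is produced without any token-token interaction.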

Continue Reading

Supervised Contrastive Learning

By Prannay Khosla, Piotr Teterwak, Chen Wang et al. (Google Research), 2020

The authors take the contrastive loss, which has recently been shown to be very effective at learning deep neural network representations in the self-supervised setting, and adapt it to fully supervised learning, achieving better results than cross-entropy training for ResNet-50 and ResNet-200.
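
At its core the method swaps the cross-entropy objective for a supervised variant of the contrastive loss in which every sample sharing the anchor's label is treated as a positive. The sketch below is an independent PyTorch rendering of that per-anchor-averaged loss; the temperature, batch construction, and variable names are assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def sup_con_loss(features, labels, temperature=0.1):
    """features: (N, d) embeddings, labels: (N,) integer class ids."""
    z = F.normalize(features, dim=1)                        # project onto the unit sphere
    sim = z @ z.t() / temperature                           # (N, N) scaled similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)                  # drop self-comparisons
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Positives: every other sample in the batch with the same label as the anchor.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask.float()).sum(dim=1) / pos_counts
    return loss[pos_mask.any(dim=1)].mean()                 # average over anchors with positives

feats = torch.randn(8, 128)                                 # toy projection-head outputs
labels = torch.randint(0, 3, (8,))
print(sup_con_loss(feats, labels))
```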

Continue Reading

ResNeSt: Split-Attention Networks

By Hang Zhang (Amazon), Chongruo Wu (UC Davis), Zhongyue Zhang (Amazon) et al., 2020

The authors propose a new ResNet-like network architecture that incorporates attention across groups of feature maps. In contrast to previous attention models such as SENet and SKNet, the new Split-Attention block applies squeeze-and-excitation-style channel attention separately within each group of feature maps, in a computationally efficient way and within a simple modular structure.
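
Below is a minimal, single-cardinal-group sketch of the split-attention fusion step in PyTorch: the splits are summed and globally pooled, a small bottleneck produces per-split channel logits, and a softmax across the splits weights their combination. The layer sizes, reduction factor, and names are illustrative assumptions rather than the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitAttention(nn.Module):
    """Fuses `radix` feature-map splits with channel-wise soft attention across splits."""
    def __init__(self, channels, radix=2, reduction=4):
        super().__init__()
        inner = max(channels // reduction, 8)
        self.radix = radix
        self.fc1 = nn.Conv2d(channels, inner, kernel_size=1)
        self.bn = nn.BatchNorm2d(inner)
        self.fc2 = nn.Conv2d(inner, channels * radix, kernel_size=1)

    def forward(self, splits):                   # list of `radix` tensors, each (B, C, H, W)
        stacked = torch.stack(splits, dim=1)     # (B, radix, C, H, W)
        b, r, c = stacked.shape[:3]
        gap = stacked.sum(dim=1).mean(dim=(2, 3), keepdim=True)   # squeeze: (B, C, 1, 1)
        logits = self.fc2(F.relu(self.bn(self.fc1(gap))))         # (B, C * radix, 1, 1)
        attn = F.softmax(logits.view(b, r, c, 1, 1), dim=1)       # r-softmax over the splits
        return (attn * stacked).sum(dim=1)       # weighted fusion: (B, C, H, W)

splits = [torch.randn(2, 32, 8, 8) for _ in range(2)]             # toy cardinal group, radix 2
print(SplitAttention(channels=32, radix=2)(splits).shape)         # torch.Size([2, 32, 8, 8])
```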

Continue Reading

ELECTRA: Pre-training Text Encoders As Discriminators Rather Than Generators

By Kevin Clark (Stanford University), Minh-Thang Luong (Google Brain), Quoc V. Le (Google Brain), and Christopher D. Manning (Stanford University), 2020

This paper describes a new pre-training approach for Transformer text encoders: instead of learning to predict masked-out tokens, the model is trained as a discriminator that detects tokens replaced by a small generator network. The authors demonstrate that this technique results in greatly improved training efficiency and better performance on common benchmark datasets (GLUE, SQuAD) compared to other state-of-the-art NLP models of similar size.
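
The sketch below illustrates the replaced-token-detection objective with hypothetical stand-ins: a small generator fills in the masked positions, the input is corrupted by sampling from it, and the discriminator learns to flag every replaced token. The toy generator/discriminator modules, masking rate, and loss weight are assumptions for illustration, not the released ELECTRA code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def electra_step(tokens, mask_positions, generator, discriminator,
                 mask_id, disc_weight=50.0):
    """tokens: (B, T) token ids; mask_positions: (B, T) bool mask of corrupted slots."""
    # 1. A small generator (a masked LM) predicts the masked-out tokens.
    masked = tokens.masked_fill(mask_positions, mask_id)
    gen_logits = generator(masked)                              # (B, T, vocab)
    mlm_loss = F.cross_entropy(gen_logits[mask_positions], tokens[mask_positions])
    # 2. Corrupt the input by sampling replacements from the generator.
    with torch.no_grad():
        sampled = torch.distributions.Categorical(logits=gen_logits).sample()
    corrupted = torch.where(mask_positions, sampled, tokens)
    # 3. The discriminator classifies every position as original vs. replaced.
    is_replaced = (corrupted != tokens).float()
    disc_logits = discriminator(corrupted).squeeze(-1)          # (B, T)
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, is_replaced)
    return mlm_loss + disc_weight * disc_loss

# Toy stand-ins so the sketch runs end to end (the real models are Transformer encoders).
vocab, d = 100, 32
generator = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, vocab))
discriminator = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, 1))

tokens = torch.randint(1, vocab, (4, 16))
mask_positions = torch.rand(4, 16) < 0.15
print(electra_step(tokens, mask_positions, generator, discriminator, mask_id=0))
```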

Continue Reading

N-BEATS: Neural Basis Expansion Analysis For Interpretable Time Series Forecasting

By Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados (Element AI), and Yoshua Bengio (MILA), 2019

This paper presents a block-based deep neural architecture for univariate time series point forecasting whose depth and residual structure are reminiscent of very deep models (e.g., ResNet) used in more common deep learning applications such as image recognition. Furthermore, the authors demonstrate how their approach can be configured to build predictive models that are interpretable.
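
To illustrate the block structure, here is a minimal sketch of the doubly residual stacking scheme with generic (learned linear) bases: each block emits a backcast, which is subtracted from its input before the next block, and a partial forecast, and the partial forecasts are summed into the final prediction. Widths, depth, and names are illustrative assumptions; the interpretable configuration described in the paper replaces the learned bases with fixed polynomial (trend) and Fourier (seasonality) bases.

```python
import torch
import torch.nn as nn

class NBeatsBlock(nn.Module):
    """One block: an FC stack predicts expansion coefficients for a backcast and a forecast."""
    def __init__(self, backcast_len, forecast_len, hidden=64, theta_dim=8):
        super().__init__()
        self.theta_dim = theta_dim
        self.fc = nn.Sequential(
            nn.Linear(backcast_len, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * theta_dim),             # theta_b and theta_f
        )
        # Generic basis: learned linear maps from the expansion coefficients.
        self.backcast_basis = nn.Linear(theta_dim, backcast_len, bias=False)
        self.forecast_basis = nn.Linear(theta_dim, forecast_len, bias=False)

    def forward(self, x):                                 # x: (batch, backcast_len)
        theta = self.fc(x)
        theta_b, theta_f = theta[:, :self.theta_dim], theta[:, self.theta_dim:]
        return self.backcast_basis(theta_b), self.forecast_basis(theta_f)

class NBeats(nn.Module):
    def __init__(self, backcast_len, forecast_len, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [NBeatsBlock(backcast_len, forecast_len) for _ in range(n_blocks)])

    def forward(self, x):
        forecast = 0.0
        for block in self.blocks:                         # doubly residual stacking
            backcast, block_forecast = block(x)
            x = x - backcast                              # remove what this block explained
            forecast = forecast + block_forecast          # accumulate partial forecasts
        return forecast

model = NBeats(backcast_len=24, forecast_len=6)
print(model(torch.randn(8, 24)).shape)                    # torch.Size([8, 6])
```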

Continue Reading

Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network

By Jungkyu Lee, Taeryun Won, and Kiho Hong (Clova Vision, NAVER Corp), 2019

A great review of many state-of-the-art tricks that can be used to improve the performance of a deep convolutional network (ResNet), combined with actual implementation details, source code, and performance results. A must-read for Kaggle competitors and anyone who wants to squeeze maximum performance out of computer vision models.

Continue Reading
