Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

What Attention Does Not Look At Is All You Need

5 minute read

Published:

Motivation

  • This post proposes a hypothesis about how BERT learns syntactic representations, mainly based on an analysis of BERT’s attention (Clark et al., 2019).
  • As shown in Figure 1, attention heads often focus on special tokens: early heads attend to [CLS]; middle heads attend to [SEP]; deep heads attend to periods and commas.
  • Inspired by this, we hypothesize that the heads fall into three phases, each focusing on a different kind of representation.

portfolio

publications

talks

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.