
Applica at SemEval-2020 Task 11: On RoBERTa-CRF, Span CLS, and whether self-training helps them

SemEval is an ongoing series of evaluations of computational semantic analysis systems, held every year as a set of shared tasks in which leading research teams compete. Applica currently holds the #1 and #2 spots on the leaderboards of SemEval-2020 Task 11, a shared task on detecting propaganda techniques in news articles.

The team that developed Applica’s 2D Contextual Awareness applied our proprietary technology in a similar way to this project. The resulting paper describes our winning system for the propaganda Technique Classification (TC) subtask and our second-placed system for the propaganda Span Identification (SI) subtask.

Specifically, the TC subtask was to identify which propaganda technique is applied in a given propaganda text fragment, while the SI subtask was to find the specific text fragments that contain at least one propaganda technique.
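One common way to make SI learnable is to cast it as token-level sequence labeling: each token is tagged B (begins a propaganda fragment), I (inside one), or O (outside). The sketch below converts character-offset fragments into such tags. It is illustrative only: whitespace tokenization, the BIO scheme, and the helper name `spans_to_bio` are assumptions, and a RoBERTa-based system would label subword tokens instead.

```python
from typing import List, Optional, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Tag whitespace-separated tokens with B/I/O labels (hypothetical helper).

    Assumption: whitespace tokens and BIO tags, for illustration only.
    `spans` are (start, end) character offsets, end exclusive, marking
    propaganda fragments. A token counts as inside a span if it overlaps it.
    """
    tagged: List[Tuple[str, str]] = []
    pos = 0
    prev_span: Optional[int] = None  # span index of the previous token, if any
    for token in text.split():
        # Locate this token's character offsets, scanning forward from the last one.
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        # Index of the first fragment that overlaps this token, or None.
        span_id = next(
            (i for i, (s, e) in enumerate(spans) if start < e and end > s),
            None,
        )
        if span_id is None:
            tag = "O"
        elif span_id == prev_span:
            tag = "I"  # continues the same fragment as the previous token
        else:
            tag = "B"  # first token of a new fragment
        tagged.append((token, tag))
        prev_span = span_id
    return tagged


text = "They will destroy our great nation forever"
print(spans_to_bio(text, [(10, 34)]))
```

For this sentence, with the fragment "destroy our great nation" at characters 10 to 34, the tags come out as O O B I I I O.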

Both systems rely on self-training, a semi-supervised learning technique. Interestingly, although CRF layers are rarely combined with transformer-based language models, we approached the SI subtask with a RoBERTa-CRF architecture.
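A CRF layer on top of an encoder scores transitions between tags as well as the tag at each token, so decoding picks the best whole sequence rather than the best tag at each position independently. The sketch below shows Viterbi decoding for a linear-chain CRF in plain Python. The O/B/I tag set, the hand-set scores, and the omission of start and end transitions are simplifying assumptions, not details of the submitted system.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (in practice, from the encoder)
    transitions[i][j]: score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    # best[j]: score of the best tag path ending in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Previous tag i that maximizes the path score into tag j.
            i_best = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the back-pointers to the start.
    tag = max(range(n_tags), key=lambda j: best[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Tags: 0 = O, 1 = B, 2 = I. Hand-set scores that forbid O -> I (assumption).
NEG = -1e4
TRANSITIONS = [
    [0.0, 0.0, NEG],  # from O: O -> I is effectively disallowed
    [0.0, 0.0, 0.0],  # from B
    [0.0, 0.0, 0.0],  # from I
]
emissions = [[2.0, 0.0, 0.0], [0.0, 0.5, 1.0]]
print(viterbi_decode(emissions, TRANSITIONS))  # [0, 1], i.e. O B
```

Per-token argmax would produce the invalid sequence O I here; because the O to I transition is heavily penalized, the decoder returns O B instead. In a trained model, the emission scores would come from the encoder and the transition scores would be learned rather than set by hand.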

For the TC subtask, we proposed an ensemble of RoBERTa-based models, one of which uses the Span CLS layers introduced in the paper. Beyond describing the submitted systems, the paper investigates the impact of architectural decisions and training schemes, and offers remarks on training models of the same or better quality with a lower computational budget. Finally, it presents the results of an error analysis.
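This post does not say how the TC ensemble combines its members' predictions. As one plausible scheme, and purely as an assumption, the sketch below averages each model's softmax probabilities over technique labels and picks the most probable one. The helper names, the three labels, and the logit values are illustrative.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(model_logits: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Average per-model class probabilities and return the top label.

    Assumption: probability averaging is one possible combination rule,
    not necessarily the one used by the submitted system.
    """
    probs = [softmax(logits) for logits in model_logits]
    n_models = len(probs)
    avg = [sum(p[k] for p in probs) / n_models for k in range(len(labels))]
    return labels[max(range(len(labels)), key=lambda k: avg[k])]


labels = ["Loaded Language", "Name Calling", "Doubt"]
model_logits = [
    [2.0, 1.0, 0.0],  # model 1 leans towards Loaded Language
    [0.0, 2.5, 0.0],  # model 2 is confident in Name Calling
    [1.5, 1.4, 0.0],  # model 3 narrowly prefers Loaded Language
]
print(ensemble_predict(model_logits, labels))  # Name Calling
```

Two of the three models prefer Loaded Language, so a majority vote would pick it, but the second model's confident prediction tips the averaged probabilities to Name Calling. Which combination rule works better for TC is an empirical question that this sketch does not settle.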

Download the full paper to discover detailed insights about this stimulating project. If you have questions or comments, please feel free to contact me.