Wednesday, September 15, 2021
Transformer-based models have been dominant in the NLP landscape due to their state of the art performance on a wide variety of benchmarks and tasks. However, deploying such large models at scale can be quite difficult and costly. Learn about the techniques that we've utilized at Stream to overcome these challenges and moderate real-time chat messages efficiently on relatively inexpensive hardware. While this talk will focus on the BERT and its offshoots, many of these techniques can also be applied to other models.