Thursday, October 28, 2021
NLP is a key component in many data science systems that must understand or reason about text. This hands-on tutorial uses the open-source Spark NLP library to explore advanced NLP in Python. Spark NLP provides state-of-the-art accuracy, speed, and scalability for language understanding by delivering production-grade implementations of some of the most recent research in applied deep learning. It's the most widely used NLP library in the enterprise today. You'll edit and extend a set of executable Python notebooks by implementing these common NLP tasks: named entity recognition, sentiment analysis, spell checking and correction, document classification, and multilingual and multi domain support. The discussion of each NLP task includes the latest advances in deep learning used to tackle it, including the prebuilt use of BERT embeddings within Spark NLP, using tuned embeddings, and 'post-BERT' research results like XLNet, ALBERT, and roBERTa. Spark NLP builds on the Apache Spark and TensorFlow ecosystems, and as such it's the only open-source NLP library that can natively scale to use any Spark cluster, as well as take advantage of the latest processors from Intel and Nvidia. You'll run the notebooks locally on your laptop, but we'll explain and show a complete case study and benchmarks on how to scale an NLP pipeline for both training and inference.