RTI International’s gobbli bridges the gap between modern natural language processing techniques and real-world challenges, bringing recent innovations to the text classification field.
How did we get here?
Imagine having to completely relearn how to read every time you picked up a new book. Until recently, this was the process used by machine learning algorithms to solve problems like sentiment analysis or document classification. These algorithms—unable to carry information between tasks or understand anything about general language—were only capable of solving the narrow problem defined by the data they learned from.
In the last year, there’s been a complete paradigm shift in how these problems are solved, known by experts as NLP’s “ImageNet moment,” after the breakthrough that transformed computer vision. That earlier shift is why computer vision now allows machines to detect pneumonia from X-rays, classify diabetic retinopathy in fundus photographs, and catalog satellite imagery into residential areas for surveys. But while computer vision applications have had time to flourish, the world of text is just getting its first taste of these improvements.
At the heart of recent improvements is a concept known as transfer learning. In machine learning, an algorithm uses patterns in example data to learn statistical rules that associate an input with an output. For each new task, a model has to relearn unique rules for that specific task and can’t carry information between problems. Transfer learning solves this problem by first modeling how an entire language works and then fine-tuning this language model for specific tasks. In this way, the algorithm learns from text much as humans do: applying a general understanding of language to each specific task, without having to relearn the language from scratch every time a new task begins.
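To make the fine-tuning idea concrete, here is a minimal sketch using the Hugging Face transformers library rather than gobbli itself: it loads a language model that was pretrained on general English text and nudges its weights on a handful of labeled examples for a specific task. The model name, toy examples, labels, and hyperparameters are illustrative assumptions, not recommendations.

```python
# Minimal fine-tuning sketch. Assumptions: PyTorch and Hugging Face
# transformers are installed; "bert-base-uncased" and the toy sentiment
# examples below are placeholders chosen for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Start from a language model pretrained on general English text.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# A tiny labeled dataset for one specific task (sentiment, in this toy case).
texts = ["This movie was fantastic.", "What a waste of two hours."]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

# Fine-tune: the pretrained weights are updated only slightly for the new task,
# rather than learning the language from scratch.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the toy data
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The key point is that only the final fine-tuning step is task-specific; the expensive general-language learning happens once, upstream, and is reused across tasks.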