What is SMITH — Google’s Latest Natural Language Processing Algorithm
Meet SMITH, Google's latest advancement in natural language processing.
In November of 2019, we called BERT the Swiss army knife of natural language processing tools. Unlike many NLP “utensils” before it, BERT was the first algorithm Google deployed in Search that leverages bidirectional processing to define the meaning of a single word based on the contextual cues around it. (Bidirectional processing means the model reads a sequence of words both left-to-right and right-to-left, so each word’s meaning is informed by the context on either side of it.)
As explained by Search Engine Land, BERT was able to understand the fundamental difference between the “to” in “nine-to-five” and the “to” in “quarter to nine.” Recently, Google shared a research paper on a new NLP algorithm called SMITH (short for Siamese Multi-depth Transformer-based Hierarchical encoder), one that reportedly outperforms BERT at understanding long documents.
BERT may be the Swiss army knife of NLP in that it can mince words and uncork meanings. But SMITH is the Instant Pot — you throw in your ingredients, and it’ll cook the whole dang meal!
How Does SMITH Work?
In layman’s terms: SMITH does with passages what BERT does with words. It weighs the sentences before, after, and even far away from a given passage to better inform its interpretation of what that passage means. Under the hood, SMITH splits a document into blocks of sentences, encodes each block, and then learns how those blocks relate to one another across the whole document.
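The real SMITH encoder is a two-level stack of transformers, but the hierarchical idea, encode small blocks first and then combine them into one document-level representation, can be sketched in plain Python. Everything below (the sentence splitter, the bag-of-words “encoder,” the cosine scorer) is a toy stand-in for illustration, not Google’s implementation:

```python
import math
import re
from collections import Counter


def blocks(doc, block_size=2):
    """Split a document into blocks of `block_size` sentences each."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", doc) if s.strip()]
    return [" ".join(sentences[i:i + block_size])
            for i in range(0, len(sentences), block_size)]


def vectorize(text):
    """Toy stand-in for a transformer encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def doc_vector(doc):
    """Hierarchical encoding: encode each block, then merge the block
    vectors into a single document-level vector (here, by summing counts)."""
    total = Counter()
    for block in blocks(doc):
        total.update(vectorize(block))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

With real transformer encoders in place of `vectorize`, this block-then-document structure is what lets a model score similarity between documents far longer than a single encoder’s input limit.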
What Does SMITH Do that BERT Can’t?
BERT understands words based on passage content.
SMITH understands passages based on document content.
BERT is adept at understanding conversational queries, where the use and placement of a single word or preposition carries a lot of meaning. This is useful for short-to-short or short-to-long semantic matching: ranking search results in order of relevance based on a short query, for example, or having a chatbot answer your question with copy pulled straight from a website’s FAQ page.
SMITH, on the other hand, is good at making long-to-long semantic connections. For a (very hypothetical and mildly dystopian) example, imagine an algorithm that can interpret the contents of the email you’re drafting and automatically predict which document in your Google Drive is the most relevant attachment for it. That’s the type of connection SMITH has the potential to make, given the right data set.
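That email-to-attachment scenario boils down to ranking candidate documents by whole-document similarity to the draft. A minimal sketch, assuming a bag-of-words embedding as a hypothetical stand-in for a trained long-document encoder like SMITH:

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy document encoder: bag-of-words counts. A real system would
    use a learned long-document model here."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def best_attachment(draft, candidates):
    """Rank candidate documents (name -> text) by similarity to the
    draft and return the name of the closest match."""
    draft_vec = embed(draft)
    return max(candidates, key=lambda name: cosine(draft_vec, embed(candidates[name])))


draft = "Attached is the marketing budget with ad spend projections."
drive = {
    "budget.doc": "Marketing budget: ad spend and projections for the quarter.",
    "recipe.doc": "Instant Pot chili recipe with beans and tomatoes.",
}
print(best_attachment(draft, drive))  # prints "budget.doc"
```

The file names and documents above are invented for the example; the point is only that long-to-long matching compares entire documents against entire documents, rather than a short query against a document.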
Will SMITH Replace BERT?
Don’t get us wrong, Instant Pots are great — but we wouldn’t trade our Swiss army knives for one of these handy appliances. Instead, we’d wait for the right promo code to come along and find a way to fit both into our budget.
Likewise, SMITH is unlikely to replace BERT, for technical reasons the research paper sums up neatly: “processing of long texts is more likely to trigger practical issues like out of TPU/GPU memories without careful model design.” NLP algorithms for long-to-long semantic matching demand a lot of brain- and AI-power, which will likely make implementing SMITH on a large scale challenging.
Additionally, given BERT’s effectiveness at short-to-short and short-to-long semantic matching, it is likely to remain in use. But since the research clearly states that SMITH outperforms Google’s current state-of-the-art NLP models, we won’t be surprised if (or when) it starts being implemented.
Long story short, BERT and SMITH are not mutually exclusive, and the possibility of them being used in tandem is a very real one. As usual, until we get confirmation from an internal resource at Google Search (ahem… @JohnMu), we can’t be certain whether SMITH will be integrated into the algorithm or how it will be applied.
How to Write for SMITH
We can say for sure that Google is going to great lengths to better process natural language patterns. Perhaps this is because of the rise of voice search, the podcast boom, or both! Regardless, as search engines make strides toward processing content the way we do, creating content that’s relevant, useful, and engaging for humans becomes increasingly vital.