Saturday, September 14, 2013

Natural Language Processing - Making Your Computer Understand You

One of the more interesting branches of analytics is natural language processing (henceforth NLP). In English (haha!), NLP is getting computers to understand language. This is not a trivial task. Think about how many words there are, how many years it took you to master language, how difficult it can be to explain grammar rules, etc. Now make a computer understand all of that.

Despite the challenges of NLP, many victories have been won, and many amazing things developed. We haven't quite reached the era of a Star Trek-style computer, but our phones can now handle basic commands (sometimes), and technology like IBM's Watson is now being adapted to things besides Jeopardy. It's probably only a matter of a few years before something as powerful as Watson is available on your smartphone.

Such stuff may seem to be the realm of researchers, far removed from the work of data scientists in the trenches of industry. But that's not the case at all. Tools like the Natural Language Toolkit (NLTK) make experimenting with NLP fun and easy (if you know Python), and there's a free book available on the site that even includes an intro to Python. And recently, Google open-sourced a set of algorithms in a project they're calling word2vec. Word2vec requires more technical weightlifting to get started with than the NLTK, but it's also very cutting-edge stuff, with algorithms fresh from Google Research. Word2vec is especially cool because it learns relationships between words on its own, straight from raw text. GigaOm has more to say about why word2vec is especially fascinating.
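To give a feel for what "relationships between words" means here: word2vec represents each word as a vector of numbers, and the well-known demo is that the arithmetic king - man + woman lands near queen. The sketch below is not word2vec itself. It uses tiny three-number vectors that I made up by hand (the `toy_vectors` table and the `analogy` helper are purely hypothetical), just to show the kind of vector arithmetic involved. Real word2vec vectors are learned from a large body of text and have hundreds of dimensions.

```python
import math

# Hand-picked toy vectors for illustration only; these are NOT word2vec output.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.2, 0.3, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Answer 'a is to b as ? is to c' by computing a - b + c and
    returning the closest remaining word by cosine similarity."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Skip the three input words, as analogy demos usually do.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    print(analogy("king", "man", "woman", toy_vectors))
```

With these made-up numbers the closest remaining word is "queen". The interesting part of the real thing is that nobody hand-picks the numbers: the algorithm works them out from how words are used.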

In a few years, you might actually want to use Siri.

1 comment:

  1. This actually looks really fun... I wonder if we could incorporate it into one of our projects.