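Open Source Daily recommends one quality open source project on GitHub and one selected English-language technology or programming article every day. Keep reading Open Source Daily to maintain a good habit of daily learning.
Today's recommended open source project: "To Truly Know a Thing, You Must Do It Yourself: simple-computer"
Today's recommended English article: "Getting to Know Natural Language Understanding"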

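Today's recommended open source project: "To Truly Know a Thing, You Must Do It Yourself: simple-computer" Link: GitHub link
Why we recommend it: But How Do It Know? is a book that explains how computers work, and the author of this project built it to simulate the computer described in the book. Implementing it yourself, from the simple parts to the complex ones, is a lot of work, but it teaches you things that can only be learned through practice, and testing newly gained knowledge in practice is the best way to confirm it. As the old line of poetry goes: what you learn on paper always feels shallow; to truly know a thing, you must do it yourself.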

Today's recommended English article: "Getting to Know Natural Language Understanding" by #ODSC - Open Data Science
Original link: https://medium.com/@ODSC/getting-to-know-natural-language-understanding-f18a0dc5c97d
Why we recommend it: an introduction to natural language processing

Getting to Know Natural Language Understanding

We like to imagine talking to computers the way Picard spoke to Data in Star Trek: The Next Generation, but in reality, natural language processing is more than just teaching a computer to understand words. The subtext of how and why we use the words we do is notoriously difficult for computers to comprehend. Instead of Data, we get frustrations with our assistants and endless SNL jokes.

Related article: An Introduction to Natural Language Processing (NLP): https://opendatascience.com/an-introduction-to-natural-language-processing-nlp/

The Challenges of AI Language Processing

Natural Language Understanding (NLU) is a subfield of NLP concerned with teaching computers to comprehend the deeper contextual meanings of human communication. It’s considered an AI-hard problem for a few notable reasons. Let’s take a look at why computers can win chess matches against world champions and calculate billions of bits of data in seconds but can’t seem to grasp sarcasm.

Humans Make Mistakes

The first obstacle is teaching a computer to understand despite typos and misspellings. Humans aren’t always accurate in what they write, but a simple typo that you could skip right over without missing a beat could be enough to trip up the filters for computer understanding.
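To make the typo problem concrete, here is a minimal sketch that contrasts an exact vocabulary lookup with a fuzzy one. It uses `difflib.get_close_matches` from the Python standard library; the vocabulary, the function names, and the 0.8 similarity cutoff are assumptions chosen for illustration, not something the article specifies.

```python
from difflib import get_close_matches

# Hypothetical vocabulary that a simple keyword filter might know about.
VOCABULARY = ["weather", "restaurant", "schedule", "appointment"]

def exact_lookup(word):
    """Return the word if it appears in the vocabulary verbatim, else None."""
    return word if word in VOCABULARY else None

def fuzzy_lookup(word, cutoff=0.8):
    """Return the closest vocabulary entry above the similarity cutoff, else None."""
    matches = get_close_matches(word, VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None

typo = "resturant"
print(exact_lookup(typo))  # None: the exact filter trips on the typo
print(fuzzy_lookup(typo))  # restaurant: the fuzzy match recovers the intended word
```

A human reader does this kind of correction without noticing. Character-level similarity is only one crude way to approximate it, and it can still mislead when the typo happens to resemble a different word.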

Human Speech Requires Context

We mentioned sarcasm above, but grasping the true meaning of any utterance requires a strong understanding of context. Sarcastic replies change the intended meaning, and not every negative utterance contains an explicitly negative word. Ask “How was lunch?” and receive the reply “I spent the entire time waiting at the doctor’s,” and the meaning is clear to you (lunch was bad) but not necessarily to a computer trained to search for negative words (“no” or “not,” for example).
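The lunch example can be sketched in a few lines. The detector below is a deliberately naive keyword matcher of the kind the paragraph describes; the word list and the function name are assumptions made up for this illustration.

```python
# Hypothetical list of explicitly negative words a naive filter searches for.
NEGATIVE_WORDS = {"no", "not", "never", "bad", "terrible"}

def naive_is_negative(utterance):
    """Flag an utterance as negative only if it contains a listed negative word."""
    tokens = {token.strip(".,!?\"'").lower() for token in utterance.split()}
    return bool(tokens & NEGATIVE_WORDS)

print(naive_is_negative("It was not good."))  # True
print(naive_is_negative("I spent the entire time waiting at the doctor's."))  # False
```

The second reply is clearly negative to a person, yet the matcher misses it because none of its trigger words appear. Closing that gap takes context, which is what NLU tries to supply.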

Human Language is Irregular

Language understanding also has to account for variation within the same language. British English and American English are broadly similar, but a few differences, including spelling and meaning, can trip up a computer. And those are just two of the many, many varieties of English, which is itself a non-standardized language and yet remains the most parsed language in all of NLP. What about the others?
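As a rough illustration of the spelling half of this problem, the sketch below maps a few British spellings to their American forms before any further processing. The three-entry table and the function name are hypothetical; a real system would need a far larger resource.

```python
# Hypothetical, tiny mapping of British spellings to American spellings.
BRITISH_TO_AMERICAN = {
    "colour": "color",
    "organise": "organize",
    "centre": "center",
}

def normalize_spelling(text):
    """Lowercase the text and replace known British spellings word by word."""
    words = text.lower().split()
    return " ".join(BRITISH_TO_AMERICAN.get(word, word) for word in words)

print(normalize_spelling("The colour centre"))  # the color center
```

A lookup table like this handles spelling only. Differences in meaning, where the same word refers to different things in the two varieties, cannot be resolved without context.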

Related article: The Promise of Retrofitting: Building Better Models for Natural Language Processing: https://opendatascience.com/models-for-natural-language-processing/

What Is Natural Language Understanding?

Natural Language Processing is the system we use to handle machine/human interactions, but NLU is narrower than that. When in doubt, use NLU to refer to the simple act of machines understanding what we say.

NLU is post-processing. Once your algorithms have scrubbed the text and added part-of-speech tagging, for example, you begin to work with the real context of what’s going on. This post-processing is what starts to reveal to the computer the true meaning of the text and not just a surface understanding.

NLU is a huge problem and an ongoing research area because the ability of computers to recognize and process human language at human-like accuracy holds enormous potential. Computers capable of understanding human speech and its intent could finally stand in for low-paid customer service agents.

In language teaching, students often complain that they can understand their teacher’s language, but that understanding doesn’t transfer when they walk outside the classroom. Computers are similar to these language students. When researchers formulate test texts, for example, they may unconsciously formulate them in ways that avoid those three common problems above, a luxury not afforded in a real-world context. A Twitter user isn’t going to scrub tweets of misspellings and ambiguous language before publishing, but that’s precisely what the computer must understand.

The subfield relies heavily on both training lexicons and semantic theory. We can quantify semantics to an extent as long as we have large amounts of training data to provide context. As computers consume this training data, deep learning begins to make sense of intent.

The biggest draw for NLU is a computer’s ability to interact with humans unsupervised. The algorithms classify speech into a structured ontology, but AI takes over to organize the intent behind the words. This method of deep learning allows computers to learn context and create rules based on more substantial amounts of input through training.

What Are The Implications?

Aside from everyone having their very own Data? Cracking Natural Language Understanding is the key to computers learning to understand human language without extraordinary intervention from humans themselves.

NLU can be used to provide predictive insights for businesses by analyzing unstructured data feeds such as news reports. This capability is especially valuable in areas such as high-frequency trading, where trades are handled by automated systems.

Unlocking NLU also rockets our AI assistants like Siri and Alexa into what finally counts as real human interaction. Siri still makes numerous errors that shows like SNL exploit for humor, and those errors plague developers in search of human-like accuracy. If developers want off the SNL joke series, cracking NLU is the key.

Humans are still reigning champions for understanding language despite roadblocks (mispronunciations, misspellings, colloquialisms, implicit meaning), but the NLU problem could unlock the final door we need for machines to step up to our level.

Download the Open Source Daily app: https://openingsource.org/2579/
Join us: https://openingsource.org/about/join/
Follow us: https://openingsource.org/about/love/

Open Source Daily, Issue 436: "To Truly Know a Thing, You Must Do It Yourself: simple-computer"