Natural Language Processing

Natural language processing (NLP) is a significant subfield of machine learning, which deals with the interactions between machine (computer) and human (natural) languages. Natural languages are not limited to speech and conversation. They can be in writing and sign languages as well. Nowadays, NLP has been broadly involved in our daily lives: we cannot live without machine translation; weather forecast scripts are automatically generated; we find voice search convenient; we get the answer to a question quickly thanks to the intelligent question answering system; speech-to-text technology helps students with special needs.

The post is taken from the book Python Machine Learning By Example by Packt Publishing written by Yuxi (Hayden) Liu. In this book, you will learn the fundamentals of machine learning and master the art of building your own machine learning systems with an example-based practical guide.

In this post, you will learn about NLP and its powerful NLP libraries in Python.

Understanding a language might be difficult, but would it be easier to automatically translate texts from one language to another? In my first ever programming course, the lab booklet had the algorithm for coarse machine translation. We can imagine that this type of translation involved consulting dictionaries and generating new text. A more practically feasible approach would be to gather texts that are already translated by humans and train a computer program on these texts. In 1954, scientists claimed in the Georgetown–IBM experiment that machine translation would be solved within three to five years.

Unfortunately, a machine translation system that can beat human translators doesn’t exist yet. But machine translation has been greatly evolving since the introduction of deep learning.

Conversational agents or chatbots are another hot topic in NLP. The fact that computers are able to have a conversation with us has reshaped the way businesses are run. In 2016, Microsoft's AI chatbot Tay was unleashed to mimic a teenage girl and converse with users on Twitter in real time. She learned how to speak from what users posted and commented on Twitter. However, she was overwhelmed by tweets from trolls, automatically learned their bad behavior, and started to output inappropriate things on her feed. She ended up being terminated within 24 hours.

An important use case for NLP at a much lower level compared to the previous cases is part of speech tagging. A part of speech (POS) is a grammatical word category such as a noun or verb. Part of speech tagging tries to determine the appropriate tag for each word in a sentence or a larger document. The following table gives examples of English POS:

Part of speech    Examples
Noun              David, machine
Pronoun           Them, her
Adjective         Awesome, amazing
Verb              Read, write
Adverb            Very, quite
Preposition       Out, at
Conjunction       And, but
Interjection      Unfortunately, luckily
Article           A, the

Touring powerful NLP libraries in Python

After a short list of real-world applications of NLP, we will now be learning the essential stack of Python NLP libraries. These packages handle a wide range of NLP tasks as mentioned above as well as others such as sentiment analysis, text classification, named entity recognition, and many more.

The most famous NLP libraries in Python include Natural Language Toolkit (NLTK), Gensim and TextBlob. The scikit-learn library also has NLP related features. NLTK was originally developed for education purposes and is now being widely used in industries as well. There is a saying that you can't talk about NLP without mentioning NLTK. It is the most famous and leading platform for building Python-based NLP applications. We can install it simply by running the sudo pip install -U nltk command in Terminal.

NLTK comes with over 50 collections of large and well-structured text datasets, which are called corpora in NLP. Corpora can be used as dictionaries for checking word occurrences and as training pools for model learning and validation. Some useful and interesting corpora include the Web Text corpus, Twitter samples, the Shakespeare corpus sample, Sentiment Polarity, the Names corpus (it contains lists of popular names, which we will be exploring very shortly), WordNet, and the Reuters benchmark corpus. The full list can be found here. Before using any of these corpus resources, we first need to download them by running the following commands in the Python interpreter:

>>> import nltk

>>> nltk.download()

A new window will pop up and ask us which package or specific corpus to download:

 

Installing the whole package named popular is strongly recommended, since it contains all the important corpora needed for our current study and future research. Once the package is installed, we can look at its Names corpus. First, import the corpus:

>>> from nltk.corpus import names

The first ten names in the list can be displayed with the following:

>>> print(names.words()[:10])

['Abagael', 'Abagail', 'Abbe', 'Abbey', 'Abbi', 'Abbie', 'Abby', 'Abigael', 'Abigail', 'Abigale']

There are in total 7,944 names:

>>> print(len(names.words()))

7944

Other corpora are also fun to explore.

Besides the easy-to-use and abundant corpora pool, more importantly, NLTK is responsible for conquering many NLP and text analysis tasks, including the following:

  • Tokenization: Given a text sequence, tokenization is the task of breaking it into fragments separated by whitespace. Meanwhile, certain characters are usually removed, such as punctuation, digits, and emoticons. These fragments are the so-called tokens used for further processing. Moreover, tokens composed of one word are also called unigrams in computational linguistics; bigrams are composed of two consecutive words, trigrams of three consecutive words, and n-grams of n consecutive words. For example, the text "I am reading a book." yields the tokens I, am, reading, a, and book.

  • POS tagging: We can apply an off-the-shelf tagger or combine multiple NLTK taggers to customize the tagging process. It is easy to directly use the built-in tagging function pos_tag, as in pos_tag(input_tokens) for instance. But behind the scenes, it is actually a prediction from a prebuilt supervised learning model. The model is trained on a large corpus composed of words that are correctly tagged.
  • Named entity recognition: Given a text sequence, the task of named entity recognition is to locate and identify words or phrases that belong to definitive categories, such as names of persons, companies, and locations. We will briefly mention it again in the next chapter.
  • Stemming and lemmatization: Stemming is a process of reverting an inflected or derived word to its root form. For instance, machine is the stem of machines, and learning and learned are both derived from learn. Lemmatization is a cautious version of stemming. It considers the POS of the word when conducting stemming. We will discuss these two text preprocessing techniques in further detail shortly. For now, let's quickly look at how each is implemented in NLTK:

First, import one of the three built-in stemmer algorithms (LancasterStemmer and SnowballStemmer are the other two), and initialize a stemmer:

>>> from nltk.stem.porter import PorterStemmer

>>> porter_stemmer = PorterStemmer()

Stem machines, learning:

>>> porter_stemmer.stem('machines')

'machin'

>>> porter_stemmer.stem('learning')

'learn'

Note that stemming sometimes involves chopping off letters, if necessary, as we can see in machin.

Now import a lemmatization algorithm based on the built-in WordNet corpus, and initialize a lemmatizer:

>>> from nltk.stem import WordNetLemmatizer

>>> lemmatizer = WordNetLemmatizer()

Similarly, lemmatize machines, learning:

>>> lemmatizer.lemmatize('machines')

'machine'

>>> lemmatizer.lemmatize('learning')

'learning'

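Why is learning unchanged? It turns out that this algorithm treats every word as a noun by default. A different POS can be requested through the pos argument, as in lemmatizer.lemmatize('learning', pos='v'), which treats the word as a verb.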
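To make the tokenization and n-gram definitions from the task list above concrete, here is a minimal pure-Python sketch. It is not NLTK's tokenizer: the tokenize and ngrams helpers are hypothetical names introduced for illustration, and the rule of keeping only alphabetic fragments is a simplifying assumption.

```python
import re


def tokenize(text):
    """Lowercase the text and keep only alphabetic fragments as tokens."""
    # Punctuation, digits, and other symbols are dropped by this simple rule.
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("I am reading a book, and it is 100% fun!")
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(tokens)
print(bigrams[:3])
```

A real tokenizer has to deal with contractions, hyphenation, and language-specific rules, which is why a library function is usually preferable to a regular expression like this one.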

Gensim, developed by Radim Rehurek, has gained popularity in recent years. It was initially designed in 2008 to generate a list of similar articles, given an article, hence the name of this library (generate similar, shortened to Gensim). It was later drastically improved by Radim Rehurek in terms of its efficiency and scalability. Again, we can easily install it via pip by running the command pip install --upgrade gensim in terminal. Just make sure the dependencies NumPy and SciPy are already installed.

Gensim is famous for its powerful semantic and topic modeling algorithms. Topic modeling is a typical text-mining task of discovering the hidden semantic structures in a document. Semantic structure in plain English is the distribution of word occurrences. It is obviously an unsupervised learning task. What we need to do is feed in plain text and let the model figure out the abstract topics.

In addition to the robust semantic modelling methods, Gensim also provides the following functionalities:

  • Similarity querying, which retrieves objects that are similar to the given query object
  • Word vectorization, which is an innovative way to represent words while preserving word co-occurrence features
  • Distributed computing, which makes it feasible to efficiently learn from millions of documents
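The idea behind similarity querying in the list above can be sketched without any library. The following is not Gensim's API; it is a pure-Python illustration that assumes documents are compared by cosine similarity over raw word counts, and the three documents are made up for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count how often each whitespace-separated word occurs."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up mini collection used only for illustration.
documents = [
    "machine learning with python",
    "deep learning for natural language",
    "cooking pasta with tomato sauce",
]

query = bag_of_words("python machine learning")
scores = [cosine_similarity(query, bag_of_words(doc)) for doc in documents]
best = max(range(len(documents)), key=lambda i: scores[i])

print(documents[best])
```

Libraries built for this task typically replace the raw counts with weighted or learned vectors and use an index so that the query does not have to be compared against every document.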

TextBlob is a relatively new library built on top of NLTK. It simplifies NLP and text analysis with easy-to-use built-in functions and methods and also wrappers around common tasks. We can install TextBlob by running the pip install -U textblob command in terminal.

Additionally, TextBlob has some useful features, which are not available in NLTK currently, such as spell checking and correction, language detection and translation.

Last but not least, scikit-learn provides all text processing features we need, such as tokenization, besides the comprehensive machine learning functionalities. Plus, it comes with a built-in loader for the 20 newsgroups dataset.

In this post, we learned about NLP and also about its powerful libraries in Python. To learn more about click-through prediction with Tree-based algorithms in Python, read our book Python Machine Learning By Example by Packt Publishing.

 

Analyzing Text Data in Just Two Lines of Code

Back in the days before the deep learning era, when a neural network was more of a scary, enigmatic mathematical curiosity than a powerful tool, there were surprisingly many relatively successful applications of classical data mining algorithms in the Natural Language Processing (NLP) domain. It seemed that problems like spam filtering or part-of-speech tagging could be solved using rather straightforward and interpretable models.

But not every problem can be solved this way. Simple models fail to adequately capture linguistic subtleties like context, idioms, or irony (though humans often fail at that one too). Algorithms based on overall summarization (e.g. bag-of-words) turned out to be not powerful enough to capture the sequential nature of text, whereas n-grams struggled to model general context and suffered severely from the curse of dimensionality. Even HMM-based models had trouble overcoming these issues due to their memorylessness.
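Both limitations mentioned above are easy to see on a toy example. In this sketch the sentences and the vocabulary size of 10,000 are assumptions chosen for illustration: a bag-of-words summary cannot tell two sentences with opposite meaning apart, while bigrams can, at the price of a much larger feature space.

```python
from collections import Counter


def bag_of_words(sentence):
    """Word counts only; word order is discarded."""
    return Counter(sentence.lower().split())


def bigrams(sentence):
    """Pairs of consecutive words; local order is preserved."""
    words = sentence.lower().split()
    return list(zip(words, words[1:]))


a = "the dog bites the man"
b = "the man bites the dog"

# The two sentences mean different things but have identical word counts.
same_bag = bag_of_words(a) == bag_of_words(b)

# Bigrams keep enough order to distinguish them.
different_bigrams = bigrams(a) != bigrams(b)

# With V distinct words there are up to V ** n distinct n-grams.
vocabulary_size = 10_000  # assumed size, for illustration only
possible_bigrams = vocabulary_size ** 2

print(same_bag, different_bigrams, possible_bigrams)
```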

First breakthrough – Word2Vec

One of the main challenges in language analysis is the method of transforming text into numerical input, which makes modeling feasible. It is not a problem in computer vision tasks because in an image, each pixel is represented by three numbers depicting the intensities of the three base colors. For many years, researchers tried numerous algorithms for finding so-called embeddings, which refer, in general, to representing text as vectors. At first, most of these methods were based on counting words or short sequences of words (n-grams).

The initial approach to this problem is one-hot encoding, where each word from the vocabulary is represented as a unique binary vector with only one nonzero entry. A simple generalization is to encode n-grams (sequences of n consecutive words) instead of single words. The major disadvantage of this method is very high dimensionality: each vector has the size of the vocabulary (or even bigger in the case of n-grams), which makes modeling difficult. Another drawback of this approach is the lack of semantic information. All vectors representing single words are equidistant, so in this embedding space synonyms are just as far from each other as completely unrelated words. Using this kind of word representation makes tasks unnecessarily difficult, as it forces your model to memorize particular words instead of trying to capture the semantics.
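The equidistance problem can be checked directly. This is a minimal sketch with a made-up three-word vocabulary; the helper names are hypothetical.

```python
import math

# Toy vocabulary chosen for illustration: two near-synonyms and an unrelated word.
vocabulary = ["good", "great", "banana"]


def one_hot(word):
    """Binary vector with a single 1 at the word's vocabulary position."""
    return [1 if entry == word else 0 for entry in vocabulary]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


d_synonyms = euclidean(one_hot("good"), one_hot("great"))
d_unrelated = euclidean(one_hot("good"), one_hot("banana"))

# Every pair of distinct words ends up at the same distance.
print(d_synonyms == d_unrelated)
```

Each vector also has as many entries as the vocabulary has words, so a realistic vocabulary leads to very long and almost entirely zero vectors.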

Figure 1: Word2Vec representations of words projected onto a two-dimensional space.

The first major leap forward for natural language processing algorithms came in 2013 with the introduction of Word2Vec, a neural network based model used exclusively for producing embeddings. Imagine starting from a sequence of words, removing the middle one, and having a model predict it only by looking at the context words (this is Continuous Bag of Words, CBOW). The alternative version of that model asks it to predict the context given the middle word (skip-gram). This idea is counterintuitive: such a model could be used in information retrieval tasks (a certain word is missing and the problem is to predict it from its context), but that is rarely how it is applied. Instead, it turns out that if you initialize your embeddings randomly and then use them as learnable parameters in training a CBOW or skip-gram model, you obtain a vector representation of each word that can be used for any task. Those powerful representations emerge during training because the model is forced to recognize words that appear in the same context. This way you avoid memorizing particular words and instead capture the semantic meaning of a word, expressed not by the word itself but by its context.

In 2014, Stanford's research group challenged Word2Vec with a strong competitor: GloVe. They proposed a different approach, arguing that the best way to encode the semantic meaning of words in vectors is through a global word-word co-occurrence matrix, as opposed to local co-occurrences as in Word2Vec. As you can see in Figure 2, the ratio of co-occurrence probabilities is able to discriminate between words when compared to the context word. It is around 1 when both target words co-occur very often or very rarely with the context word. Only when the context word co-occurs with just one of the target words is the ratio either very small or very big. This is the intuition behind GloVe. The exact algorithm involves representing words as vectors in such a way that their difference, combined with a context word vector through a dot product, reflects the ratio of the co-occurrence probabilities.
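The two training setups described above differ only in which side of each example is the input. The sketch below builds the training examples for both from a toy sentence; it does not train a network, and the function names and the window parameter are illustrative assumptions rather than part of any library.

```python
def cbow_pairs(tokens, window=2):
    """CBOW examples: (list of context words, word to predict)."""
    pairs = []
    for i, target in enumerate(tokens):
        # Up to `window` words on each side of position i.
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens, window=2):
    """Skip-gram examples: (middle word, one context word to predict)."""
    pairs = []
    for context, target in cbow_pairs(tokens, window):
        pairs.extend((target, context_word) for context_word in context)
    return pairs


tokens = "the cat sat on the mat".split()

print(cbow_pairs(tokens, window=1)[1])
print(skipgram_pairs(tokens, window=1)[:3])
```

In an actual model each word would be looked up in a randomly initialized embedding table, and those embeddings would be updated while learning to make these predictions.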

Figure 2: Ratios of co-occurrence probabilities semantically discriminating words: the motivation behind GloVe.
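The behavior of the probability ratio can be reproduced with a few lines of arithmetic. The words and counts below are invented for illustration and are not real corpus statistics; the sketch only assumes that the co-occurrence probability of a context word is its count divided by the total count for the target word.

```python
# Invented co-occurrence counts, counts[target][context]; not real corpus data.
counts = {
    "ice":   {"solid": 8, "gas": 1, "water": 30, "fashion": 1},
    "steam": {"solid": 1, "gas": 8, "water": 30, "fashion": 1},
}


def cooccurrence_probability(target, context):
    """Probability of seeing `context` near `target`, estimated from counts."""
    row = counts[target]
    return row[context] / sum(row.values())


def probability_ratio(context, target_a="ice", target_b="steam"):
    """Ratio of co-occurrence probabilities of one context word for two targets."""
    return (cooccurrence_probability(target_a, context)
            / cooccurrence_probability(target_b, context))


for context in ["solid", "gas", "water", "fashion"]:
    print(context, round(probability_ratio(context), 3))
```

With these toy numbers the ratio is large for a context word tied to the first target, small for one tied to the second, and close to 1 for words that relate to both or to neither.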

 

Further improvements

Even though the new powerful Word2Vec representation boosted the performance of many classical algorithms, there was still a need for a solution capable of capturing sequential dependencies in a text (both long- and short-term). The first approach to this problem was the so-called vanilla Recurrent Neural Network (RNN). Vanilla RNNs take advantage of the temporal nature of text data by feeding words to the network sequentially while using the information about previous words stored in a hidden state.

Figure 3: A recurrent neural network. Image courtesy of an excellent post on Colah's blog.
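The recurrence behind a vanilla RNN fits in a few lines. This is a minimal pure-Python sketch assuming the common formulation h_t = tanh(W_xh x_t + W_hh h_(t-1) + b); the weights are hand-picked rather than trained, and the two-dimensional "word vectors" are placeholders.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b):
    """One vanilla RNN step: combine the current input with the previous state."""
    h_new = []
    for j in range(len(h_prev)):
        total = b[j]
        total += sum(w_xh[j][i] * x[i] for i in range(len(x)))
        total += sum(w_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(inputs, hidden_size, w_xh, w_hh, b):
    """Feed the inputs one by one, carrying the hidden state forward."""
    h = [0.0] * hidden_size
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b)
    return h


# Hand-picked illustrative weights; a real network would learn these.
w_xh = [[0.5, -0.3], [0.1, 0.8]]
w_hh = [[0.2, 0.0], [0.0, 0.2]]
b = [0.0, 0.0]

sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy word vectors
final_state = run_rnn(sentence, 2, w_xh, w_hh, b)
reversed_state = run_rnn(sentence[::-1], 2, w_xh, w_hh, b)

# Unlike a bag-of-words summary, the final state depends on word order.
print(final_state != reversed_state)
```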

These networks proved very effective in handling local temporal dependencies, but performed quite poorly when presented with long sequences. This failure was caused by the fact that after each time step, the content of the hidden state was overwritten by the output of the network. To address this issue, computer scientists and researchers designed a new RNN architecture called long short-term memory (LSTM). LSTM deals with the problem by introducing an extra unit in the network called a memory cell, a mechanism that is responsible for storing long-term dependencies, together with several gates responsible for controlling the information flow in the unit. At each time step, the forget gate generates a fraction that determines how much of the memory cell content to forget. Next, the input gate determines how much of the input will be added to the content of the memory cell. Finally, the output gate decides how much of the memory cell content to expose as the whole unit's output. All the gates act like regular neural network layers with learnable parameters, which means that over time, the network adapts and gets better at deciding what kind of input is relevant for the task and what information can and should be forgotten.

LSTMs have actually been around since the late 1990s, but they are quite expensive in terms of computation and memory, so it is only recently, thanks to remarkable advances in hardware, that it has become feasible to train LSTM networks in reasonable time. Nowadays, there exist many variations of LSTM, such as mLSTM, which introduces a multiplicative dependency on the input, or GRU, which, thanks to an intelligent simplification of the memory cell update mechanism, significantly decreases the number of trainable parameters.
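The gate mechanics described above can be written out for a single LSTM unit with scalar input and state. This is a sketch of the standard formulation, not a specific library's implementation; the parameter names are hypothetical and the values are hand-set (a trained network would learn them) so that the forget gate stays near 1 and the input gate near 0.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM; p holds the gate weights and biases."""
    forget = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])     # share of memory kept
    write = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])      # share of input stored
    output = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])     # share of memory exposed
    candidate = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])
    c = forget * c_prev + write * candidate   # update the memory cell
    h = output * math.tanh(c)                 # output of the whole unit
    return h, c


# Hand-set illustrative parameters: remember almost everything, store almost nothing.
params = {
    "wf": 0.0, "uf": 0.0, "bf": 10.0,
    "wi": 0.0, "ui": 0.0, "bi": -10.0,
    "wo": 1.0, "uo": 0.0, "bo": 0.0,
    "wg": 1.0, "ug": 0.0, "bg": 0.0,
}

h, c = 0.0, 1.0  # start with something stored in the memory cell
for step in range(50):
    h, c = lstm_step(0.5, h, c, params)

# After 50 steps the stored value is still almost intact.
print(round(c, 3))
```

Because the memory cell is updated additively and scaled by the forget gate, information can survive many time steps instead of being overwritten at each one.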

After a short while it became clear that these models significantly outperform classic approaches, but researchers were hungry for more. They started to study the astounding success of Convolutional Neural Networks in Computer Vision and wondered whether those concepts could be incorporated into NLP. It quickly turned out that a simple replacement of 2D filters (processing a small segment of the image, e.g. regions of 3×3 pixels) with 1D filters (processing a small part of the sentence, e.g. 5 consecutive words) made it possible. Similarly to 2D CNNs, these models learn more and more abstract features as the network gets deeper, with the first layer processing raw input and each subsequent layer processing the outputs of its predecessor. Of course, a single word embedding (the embedding space usually has around 300 dimensions) carries much more information than a single pixel, which means that it is not necessary to use networks as deep as in the case of images. You may think of the embedding as doing the job that would otherwise be done by the first few layers, so they can be skipped. Those intuitions proved correct in experiments on various tasks. 1D CNNs were much lighter and more accurate than RNNs and could be trained even an order of magnitude faster due to easier parallelization.

👀 Convolutional Neural Networks were first used to solve Computer Vision problems and remain state-of-the-art in that space. Learn more about their applications and capabilities here.
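A 1D filter sliding over a sentence can be sketched as follows. The example is deliberately tiny and every number in it is made up: it assumes two-dimensional embeddings instead of roughly 300, a filter covering 3 words instead of 5, and a ReLU activation.

```python
def conv1d(sequence, kernel, bias=0.0):
    """Slide one filter over a sequence of word vectors, without padding."""
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias
        for word_vec, kernel_vec in zip(window, kernel):
            total += sum(w * k for w, k in zip(word_vec, kernel_vec))
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs


# Six toy word vectors standing in for an embedded sentence.
sentence = [
    [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
    [0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
]

# One filter covering three consecutive words (made-up weights).
kernel = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

feature_map = conv1d(sentence, kernel)
print(feature_map)
```

Each output position depends only on its own window, so all positions can be computed independently, which is the reason convolutions parallelize more easily than a recurrence.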

Despite the incredible contributions made by CNNs, the networks still suffered from several drawbacks. In a classic setup, a convolutional network consists of several convolutional layers, which are responsible for creating so-called feature maps, and a module transforming them into predictions. Feature maps are essentially high-level features extracted from the text (or image), preserving the location where each feature emerged in the text (or image). The prediction module performs aggregating operations on feature maps and either ignores the location of the feature (fully convolutional networks) or, more commonly, learns where particular features appear most often (fully connected modules). The problem with these approaches arises, for example, in the Question Answering task, where the model is supposed to produce the answer given the text and a question. In this case, it is difficult and often unnecessary to store all the information carried by the text in a single vector, as is done by classic prediction modules. Instead, we would like to focus on the particular part of the text where the most crucial information for a particular question is stored. This problem is addressed by the Attention Mechanism, which weighs parts of the text depending on what may be relevant based on the input. This approach has also been found useful for classic applications like text classification or translation. Will Attention transform the NLP field? Ilya Sutskever, Co-founder and Research Director of OpenAI, stated in an interview:

“I am very excited by the recently introduced attention models, due to their simplicity and due to the fact that they work so well. Although these models are new, I have no doubt that they are here to stay, and that they will play a very important role in the future of deep learning.”

Ilya Sutskever, OpenAI
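The weighting step of an attention mechanism, as described before the quote, can be sketched in plain Python. The text does not say which scoring function is used, so this example assumes dot-product scores followed by a softmax, and the token and question vectors are made up.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, token_vectors):
    """Weight each token by its relevance to the query and average them."""
    scores = [sum(q * t for q, t in zip(query, vec)) for vec in token_vectors]
    weights = softmax(scores)
    dim = len(token_vectors[0])
    context = [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dim)
    ]
    return weights, context


# Made-up vectors: tokens 0 and 2 point in the same direction as the question.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
question = [4.0, 0.0]

weights, context = attend(question, tokens)
print([round(w, 3) for w in weights])
```

The resulting context vector is dominated by the tokens that match the question, so the model does not need to squeeze the entire text into one fixed summary.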

Typical NLP problems

There are a variety of language tasks that, while simple and second-nature to humans, are very difficult for a machine. The confusion is mostly due to linguistic nuances like irony and idioms. Let’s take a look at some of the areas of NLP that researchers are trying to tackle (roughly in order of their complexity):

The most common and possibly easiest one is sentiment analysis. This is, essentially, determining the attitude or emotional reaction of a speaker/writer toward a particular topic (or in general). Possible sentiments are positive, neutral, and negative. Check out this great article about using Deep Convolutional Neural Networks for gauging sentiment in tweets. Another interesting experiment showed that a Deep Recurrent Net could learn the sentiment by accident.

Figure 4: Activation of a neuron from a net used to generate next character of text. It is clear that it learned the sentiment even though it was trained in an entirely unsupervised environment.

A natural generalization of the previous case is document classification, where instead of assigning one of three possible flags to each article, we solve an ordinary classification problem. According to a comprehensive comparison of algorithms, it is safe to say that Deep Learning is the way to go for text classification.

Natural Language Processing Algorithms for Machine Translation

Now, we move on to the real deal: Machine Translation. Machine Translation has posed a serious challenge for quite some time. It is important to understand that this is an entirely different task from the two previous ones we've discussed. For this task, we require a model to predict a sequence of words instead of a label. Machine Translation makes clear what the fuss is all about with Deep Learning, as it has been an incredible breakthrough when it comes to sequential data. In this blog post you can read more about how (yep, you guessed it) Recurrent Neural Networks tackle translation, and in this one you can learn about how they achieve state-of-the-art results.

Say you need an automatic text summarization model, and you want it to extract only the most important parts of a text while preserving all of the meaning. This requires an algorithm that can understand the entire text while focusing on the specific parts that carry most of the meaning. This problem is neatly solved by previously mentioned attention mechanisms, which can be introduced as modules inside an end-to-end solution.

Lastly, there is question answering, which comes as close to Artificial Intelligence as you can get. For this task, not only does the model need to understand a question, but it is also required to have a full understanding of a text of interest and know exactly where to look to produce an answer. For a detailed explanation of a question answering solution (using Deep Learning, of course), check out this article.

Figure 5: Beautiful visualization of an attention mechanism in a recurrent neural network trained to translate English to French.

Since Deep Learning offers vector representations for various kinds of data (e.g., text and images), you can build models to specialize in different domains. This is how researchers came up with visual question answering. The task is “trivial”: just answer a question about an image. Sounds like a job for a 7-year-old, right? Nonetheless, deep models are the first to produce any reasonable results without human supervision. Results and a description of such a model are in this paper.

Natural Language Generation

You may have noticed that all of the above tasks share a common denominator. For sentiment analysis, an article is always positive, negative, or neutral. In document classification, each example belongs to one class. This means that these problems belong to a family of problems called supervised learning, where the model is presented with an example and a correct value associated with it. Things get tricky when you want your model to generate text.

Andrej Karpathy provides a comprehensive review of how RNNs tackle this problem in his excellent blog post. He shows examples of deep learning used to generate new Shakespeare-style text or to produce source code that seems to be written by a human but actually doesn't do anything. These are great examples that show how powerful such a model can be, but there are also real-life business applications of these algorithms. Imagine you want to target clients with ads and you don't want them to be generic, with the same message copied and pasted to everyone. There is definitely no time for writing thousands of different versions of it, so an ad-generating tool may come in handy.

RNNs seem to perform reasonably well at producing text at the character level, which means that the network predicts consecutive letters (also spaces, punctuation, and so on) without actually being aware of the concept of a word. However, it turned out that those models really struggled with sound generation. That is because to produce a word you need only a few letters, but when producing high-quality sound, even at 16 kHz sampling, there are hundreds or maybe even thousands of points that form a spoken word. Again, researchers turned to CNNs, and again with great success. Researchers at DeepMind developed a very sophisticated convolutional generative model called WaveNet, which deals with the problem of a very large receptive field (the length of the actual raw input) by using so-called atrous convolutions, which increase the receptive field exponentially with each layer. This is currently the state-of-the-art model, significantly outperforming all other available baselines, but it is very expensive to use: it takes 90 seconds to generate 1 second of raw audio. This means that there is still a lot of room for improvement, but we're definitely on the right track.
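The exponential growth of the receptive field is simple arithmetic. The sketch below assumes a stack of filters of width 2 whose dilation doubles at every layer; these settings are assumptions made for illustration, and the 10-layer depth is arbitrary.

```python
def receptive_field(kernel_size, dilations):
    """Number of input samples that influence one output of a stacked 1D CNN."""
    field = 1
    for dilation in dilations:
        # Each layer widens the view by (kernel_size - 1) * dilation samples.
        field += (kernel_size - 1) * dilation
    return field


layers = 10
doubling = [2 ** i for i in range(layers)]  # 1, 2, 4, ..., 512

dilated = receptive_field(2, doubling)
plain = receptive_field(2, [1] * layers)

print(dilated, plain)
```

With the same number of layers and parameters, the dilated stack sees 1024 samples while the plain one sees 11. At 16 kHz, 1024 samples correspond to 64 milliseconds of audio.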

Recap

So, now you know. Deep Learning appeared in NLP relatively recently due to computational issues, and we needed to learn much more about Neural Networks to understand their capabilities. But once we did, it changed the game forever.

Read Source Article: Sigmoidal

The Global Natural Language Processing (NLP) Market Report presents a detailed analysis of the Natural Language Processing (NLP) industry. The report evaluates numerous influential factors, including an industry overview covering the historic and present situation, key manufacturers, product/service applications and types, key regions and marketplaces, and forecast estimates for global market share, revenue, and CAGR.

Get Global Natural Language Processing (NLP) Market Research Sample Report: https://www.marketresearchexplore.com/report/global-natural-language-processing-nlp-industry-market-research-report/172385#enquiry

The report also sheds light on growth opportunities, challenges, market threats, and constraining factors of the market. It studies local, regional, and global markets, as well as emerging segments and market dynamics. Additionally, it offers insight into the competitive landscape, market driving factors, the industrial environment, and the latest and upcoming technological advancements, to determine the overall state of the industry and help form business strategies.

Major Players in Natural Language Processing (NLP) Market are:

  • NetBase Solutions
  • Apple Incorporation
  • 3M
  • Microsoft Corporation
  • Verint Systems
  • IBM Incorporation
  • Dolbey Systems
  • SAS Institute Inc.
  • Google
  • HP

Most widely used downstream fields of Natural Language Processing (NLP) Market covered in this report are:

  • Automotive
  • Healthcare
  • Banking Financial Services and Insurance (BFSI)
  • IT and Telecom
  • Defense & Aerospace
  • Others

Key manufacturers are discussed in this report, along with their profiles, global market share, production volume, gross sales margin, revenue, manufacturing plants and their capacities, material sourcing strategies, and newly implemented technologies.

Browse Global Natural Language Processing (NLP) Market Report at: https://www.marketresearchexplore.com/report/global-natural-language-processing-nlp-industry-market-research-report/172385

The report focuses on active contenders in the Natural Language Processing (NLP) industry and provides analysis of their production methodologies, manufacturing plants and capacities, product cost, raw material sources, value chain, business plans, and product/service distribution patterns. Player profiles also include product specification, sales, gross margin, share in the global market, revenue, and CAGR.

If you have any customized requirements regarding Natural Language Processing (NLP) that need to be added, we will be happy to include them free of cost to enrich the final study.

  • Triangulate with your Own Data
  • Gain a Deeper Dive on a Specific Application, Geography, Customer or Competitor
  • Get Data as per your Format and Definition
  • Any level of Personalization

Read Source Article BY STANLEY DailyIndustryUpdates

#AI #NLP #NetBaseSolutions, #AppleIncorporation #DataAnalytics

A natural language processing system helped identify lumbar spine imaging findings, providing significant gains in model sensitivity, according to a study in Academic Radiology.

Researchers evaluated an NLP system built with open-source tools for identifying lumbar spine imaging findings related to low back pain on MRI and X-ray radiology reports from four health systems. The study authors selected 871 reports to form a reference-standard dataset, and four spine experts annotated the presence of 26 findings.

The researchers calculated inter-rater agreement and finding prevalence from the annotated data, which was split into development (80 percent) and testing (20 percent) sets. The study authors developed an NLP system from both rule-based and machine-learned models, and the system was validated using accuracy metrics such as sensitivity, specificity and area under the receiver operating characteristic curve.

The researchers concluded the NLP system performed well in identifying the 26 lumbar spine findings. Machine-learned models provided substantial gains in model sensitivity with only a slight loss of specificity and overall higher AUC. 
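For readers unfamiliar with the two metrics named above, here is a minimal sketch of how sensitivity and specificity are computed for one finding. The labels are invented for illustration and are not data from the study; the function assumes that both classes occur in the reference labels.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Invented labels: 1 means the finding is present in the report, 0 means absent.
expert_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
system_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sensitivity, specificity = sensitivity_specificity(expert_labels, system_labels)
print(sensitivity, round(specificity, 3))
```

A model that flags more reports as positive tends to gain sensitivity and lose some specificity, which is the trade-off the study reports for the machine-learned models.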


Written by Shayna Korol 

Read Source Article BeckersSpine

#AI #NLP #NaturalLanguageProcessing #Datasets


© copyright 2017 www.aimlmarketplace.com. All Rights Reserved.

A Product of HunterTech Ventures