Neural Networks

“Researchers have developed a new representation process on the rotational unit of RUM, a recurrent memory that can be used to solve a broad spectrum of the neural revolution in natural language processing.”

The sentence above was written by a natural language processing technique known as a rotational unit of memory, or RUM.

Welcome to the New New Journalism.

As AI algorithms are put to the test, most are found wanting. This is especially true in domains like natural language processing, where output tends to be repetitive, brittle and a long way from the mellifluous tones of HAL, the malevolent exascale computer of 2001: A Space Odyssey infamy (voiced by Canadian actor Douglas Rain).

The aforementioned research team toils at the Massachusetts Institute of Technology (which should have been mentioned in RUM’s summary!). Among them is Mićo Tatalović, a former Knight Science Journalism fellow at MIT and a former editor at New Scientist magazine.

While developing neural networks to assist with physics problems, the investigators realized their natural language processing approach to the physical world might also be useful for time-consuming tasks like scanning and summarizing scientific papers. What they came up with, and what they describe in the journal Transactions of the Association for Computational Linguistics, is RUM.

“We would notice that every once in a while there is an opportunity to add to the field of AI because of something that we know from physics — a certain mathematical construct or a certain law in physics,” MIT physics professor Marin Soljačić told human science writer David L. Chandler. “We noticed that, hey, if we use that, it could actually help with this or that particular AI algorithm.”

The traditional neural-network approach to NLP usually involves techniques like gated recurrent units (GRU) and long short-term memory networks (LSTM). While those techniques have been tweaked over the years, their mimicry of human learning continues to fall short for demanding applications like natural language processing.

Enter RUM. While GRU and LSTM rely on the multiplication of matrices to mimic the way humans learn, the MIT researchers point out that such neural networks continue to struggle to correlate information in large data sets.

In their physics research, they sought to overcome this neural net weakness by instead using a system based on what they described as “rotating vectors in multidimensional space.” In an NLP application, that means representing each word of text as a vector, with each subsequent word swinging the vector in a different direction in a theoretical space with many dimensions.

The final set of vectors can then be translated into a corresponding string of words. “RUM helps neural networks to do two things very well,” the researchers found. “It helps them to remember better, and it enables them to recall information more accurately.”

The output from RUM is the lead paragraph of this story.

We accept the challenge. Decide for yourself whether you prefer RUM’s summary to ours, presented in the true spirit of what we journalists call “burying the lead,” to wit:

MIT physics researchers have come across a way to improve AI algorithms through a variation on recurrent neural networks that promises to improve natural language processing for applications like scanning and summarizing scientific papers.

Source: Datanami

The work of a science writer, including this one, includes reading journal papers filled with specialized technical terminology, and figuring out how to explain their contents in language that readers without a scientific background can understand.

Now, a team of scientists at MIT and elsewhere has developed a neural network, a form of artificial intelligence (AI), that can do much the same thing, at least to a limited extent: It can read scientific papers and render a plain-English summary in a sentence or two.

Even in this limited form, such a neural network could be useful for helping editors, writers, and scientists scan a large number of papers to get a preliminary sense of what they're about. But the approach the team developed could also find applications in a variety of other areas besides language processing, including machine translation and speech recognition.

The work is described in the journal Transactions of the Association for Computational Linguistics, in a paper by Rumen Dangovski and Li Jing, both MIT graduate students; Marin Soljačić, a professor of physics at MIT; Preslav Nakov, a senior scientist at the Qatar Computing Research Institute, HBKU; and Mićo Tatalović, a former Knight Science Journalism fellow at MIT and a former editor at New Scientist magazine.

From AI for physics to natural language

The work came about as a result of an unrelated project, which involved developing new artificial intelligence approaches based on physical principles, aimed at tackling certain thorny problems in physics. However, the researchers soon realized that the same approach could be used to address other difficult computational problems, including natural language processing, in ways that might outperform existing neural network systems.

"We have been doing various kinds of work in AI for a few years now," Soljačić says. "We use AI to help with our research, basically to do physics better. And as we got to be more familiar with AI, we would notice that every once in a while there is an opportunity to add to the field of AI because of something that we know from physics—a certain mathematical construct or a certain law in physics. We noticed that hey, if we use that, it could actually help with this or that particular AI algorithm."


This approach could be useful in a variety of specific kinds of tasks, he says, but not all. "We can't say this is useful for all of AI, but there are instances where we can use an insight from physics to improve on a given AI algorithm."

Neural networks in general are an attempt to mimic the way humans learn certain new things: The computer examines many different examples and "learns" what the key underlying patterns are. Such systems are widely used for pattern recognition, such as learning to identify objects depicted in photos.

But neural networks in general have difficulty correlating information from a long string of data, such as is required in interpreting a research paper. Various tricks have been used to improve this capability, including techniques known as long short-term memory (LSTM) and gated recurrent units (GRU), but these still fall well short of what's needed for real natural-language processing, the researchers say.

The team came up with an alternative system, which instead of being based on the multiplication of matrices, as most conventional neural networks are, is based on vectors rotating in a multidimensional space. The key concept is something they call a rotational unit of memory (RUM).

Essentially, the system represents each word in the text by a vector in multidimensional space—a line of a certain length pointing in a particular direction. Each subsequent word swings this vector in some direction, represented in a theoretical space that can ultimately have thousands of dimensions. At the end of the process, the final vector or set of vectors is translated back into its corresponding string of words.
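The rotation idea can be sketched in a few lines of NumPy. This is a toy illustration, not the authors' RUM implementation: the words, the angles, and the two-dimensional space are invented for clarity, whereas the real system learns its rotations and works in a space with hundreds or thousands of dimensions.

```python
import numpy as np

def rotation_matrix(theta: float) -> np.ndarray:
    """2-D rotation by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Toy vocabulary: each word is assigned a rotation angle (made-up values).
word_angles = {"urban": 0.3, "raccoons": 1.1, "carry": -0.5, "parasites": 0.8}

# The memory starts as a unit vector; each word swings it in some direction.
state = np.array([1.0, 0.0])
for word in ["urban", "raccoons", "carry", "parasites"]:
    state = rotation_matrix(word_angles[word]) @ state

# Rotations preserve a vector's length, so the memory neither blows up
# nor fades away as the sequence grows.
print(np.linalg.norm(state))  # stays 1.0 (up to floating-point error)
```

Because rotations preserve length, the stored state neither explodes nor vanishes over long sequences, one intuition for why rotational updates can help a network remember and recall information accurately.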

"RUM helps neural networks to do two things very well," Nakov says. "It helps them to remember better, and it enables them to recall information more accurately."

After developing the RUM system to help with certain tough physics problems such as the behavior of light in complex engineered materials, "we realized one of the places where we thought this approach could be useful would be natural language processing," says Soljačić, recalling a conversation with Tatalović, who noted that such a tool would be useful for his work as an editor trying to decide which papers to write about. Tatalović was at the time exploring AI in science journalism as his Knight fellowship project.

"And so we tried a few natural language processing tasks on it," Soljačić says. "One that we tried was summarizing articles, and that seems to be working quite well."

The proof is in the reading

As an example, they fed the same research paper through a conventional LSTM-based neural network and through their RUM-based system. The resulting summaries were dramatically different.

The LSTM system yielded this highly repetitive and fairly technical summary: "Baylisascariasis," kills mice, has endangered the allegheny woodrat and has caused disease like blindness or severe consequences. This infection, termed "baylisascariasis," kills mice, has endangered the allegheny woodrat and has caused disease like blindness or severe consequences. This infection, termed "baylisascariasis," kills mice, has endangered the allegheny woodrat.

Based on the same paper, the RUM system produced a much more readable summary, and one that did not include the needless repetition of phrases: Urban raccoons may infect people more than previously assumed. 7 percent of surveyed individuals tested positive for raccoon roundworm antibodies. Over 90 percent of raccoons in Santa Barbara play host to this parasite.

Already, the RUM-based system has been expanded so it can "read" through entire research papers, not just the abstracts, to produce a summary of their contents. The researchers have even tried using the system on their own research paper describing these findings—the paper that this news story is attempting to summarize.

Here is the new neural network's summary: Researchers have developed a new representation process on the rotational unit of RUM, a recurrent memory that can be used to solve a broad spectrum of the neural revolution in natural language processing.

It may not be elegant prose, but it does at least hit the key points of information.

Çağlar Gülçehre, a research scientist at the British AI company DeepMind Technologies, who was not involved in this work, says this research tackles an important problem in neural networks, having to do with relating pieces of information that are widely separated in time or space. "This problem has been a very fundamental issue in AI due to the necessity to do reasoning over long time-delays in sequence-prediction tasks," he says. "Although I do not think this paper completely solves this problem, it shows promising results on the long-term dependency tasks such as question-answering, text summarization, and associative recall."

Gülçehre adds, "Since the experiments conducted and model proposed in this paper are released as open-source on Github, as a result many researchers will be interested in trying it on their own tasks. … To be more specific, potentially the approach proposed in this paper can have very high impact on the fields of natural language processing and reinforcement learning, where the long-term dependencies are very crucial."

Source: TechXplore

Machines do not naturally cope with complex data or extract information from compound structures the way the human brain does. To give computers some of that ability, Warren McCulloch and Walter Pitts came up with a mathematical model, now known as the Artificial Neural Network (ANN), which falls under artificial intelligence. An ANN is a computing system designed to replicate the way humans analyze information and work. Multiple data inputs are processed by different machine learning algorithms, which work together under a single framework: the neural network. Neural networks are inspired by the structure of the biological neural networks in the human brain. Input neurons act as the interface through which data enters the network, and output neurons collect the outputs of the other neurons.

Artificial neural networks are composed of many nodes, which are connected to each other and function together by passing information. A network built from several layers of such nodes is called a multi-layer perceptron (MLP). Each layer performs a different function on the data it receives. The layers include one input layer, one output layer, and one or more hidden layers. The input layer receives the raw data. The hidden layers take data from one set of neurons (starting with the input layer) and pass their results on toward another set (ending with the output layer), which is why they are called hidden. The deeper the hidden layers go, the finer the details they capture, uncovering relationships between different inputs. Finally, the output layer provides a result that is simple and understandable.
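A minimal forward pass through such a multi-layer perceptron can be written in a few lines. The layer sizes are arbitrary and the weights are untrained random placeholders; a real network would have learned them from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny MLP: 3 inputs -> 4 hidden units -> 2 outputs.
W_hidden = rng.normal(size=(4, 3))   # input layer -> hidden layer
W_output = rng.normal(size=(2, 4))   # hidden layer -> output layer

def relu(x):
    """A common nonlinearity: negative values are zeroed out."""
    return np.maximum(0.0, x)

def forward(x: np.ndarray) -> np.ndarray:
    hidden = relu(W_hidden @ x)      # hidden layer extracts features
    return W_output @ hidden         # output layer produces the result

y = forward(np.array([0.5, -1.0, 2.0]))
print(y.shape)  # (2,)
```

Each `@` is one layer's work: combine the previous layer's outputs, then (for hidden layers) apply a nonlinearity before passing the result onward.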

A neural network is first given a set of input data, which the system reads and analyzes. The system then detects the properties of the inputs, layer by layer. In the case of an image, the first layer may detect contrast and the next layer may detect texture; in the same way, different attributes are detected by different layers. If images of cats are provided, the system starts to recognize cats among the rest of the inputs. The network performs better when descriptive inputs or rules are also provided, such as "a cat has whiskers" or "a cat has spots." It also adjusts its internal weightings according to the answers provided to it; that is, the network gives more weight to the input sources that prove reliable. This helps it improve its performance each time.
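The weight-adjustment idea can be sketched with a single neuron and one labeled example. This is the classic perceptron update, used here as a stand-in for the training rules real networks use; the feature values and label are invented.

```python
import numpy as np

# One learning step: when the prediction is wrong, nudge each weight
# in proportion to the input that produced the error.
weights = np.zeros(3)
learning_rate = 0.1

def predict(x: np.ndarray) -> int:
    return 1 if weights @ x > 0 else 0

# A labeled example: input features and the correct answer (made-up data).
x, target = np.array([1.0, 0.5, -0.2]), 1

error = target - predict(x)           # 0 if correct, +1 or -1 if wrong
weights += learning_rate * error * x  # reinforce inputs that explain the target

print(weights)  # nudged toward the example: [0.1, 0.05, -0.02]
```

Repeated over many examples, this kind of update is how a network's "internal weightings" come to reflect which inputs are reliable.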

Types of Neural Networks:

Neural networks are divided into types based on the number of hidden layers they contain, or how deep the network goes. Each type has its own level of complexity and its own use cases. A few common types are feed-forward neural networks, recurrent neural networks, convolutional neural networks, and Hopfield networks.

  • Feed-forward neural networks:
    Feed-forward neural networks are the basic type of neural network. Information travels through them in one direction only: from input, to processing node, to output. Hidden layers may or may not be present in this type, which makes it easier to interpret.

  • Recurrent neural networks:
    Recurrent neural networks are more complex and among the most widely used. Data flows in multiple directions: these networks store the outputs of their processing nodes and learn from them to improve their functioning.
  • Convolutional neural networks:
    Convolutional neural networks are popular today because of their strength at tasks such as face recognition. They encode attributes of the input by treating it as an image.
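The difference between the feed-forward and recurrent types above can be seen in a minimal recurrent step: the hidden state produced at one step is fed back in at the next, which is how these networks "store" past outputs. Sizes and weights here are arbitrary placeholders, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(1)

W_in = rng.normal(size=(4, 3))   # current input -> hidden state
W_rec = rng.normal(size=(4, 4))  # previous hidden state -> hidden state (the "memory")

state = np.zeros(4)              # the stored output, empty at the start
sequence = [rng.normal(size=3) for _ in range(5)]

for x in sequence:
    # Unlike a feed-forward pass, the old state re-enters the computation.
    state = np.tanh(W_in @ x + W_rec @ state)

print(state.shape)  # (4,)
```

A feed-forward network would compute each step from `x` alone; the `W_rec @ state` term is what lets a recurrent network carry information across the sequence.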

Advantages of Neural Networks:

  • Neural networks have the ability to learn by themselves and produce output that is not limited to the input provided to them.
  • The input is stored in the network's own weights instead of a database, so a partial loss of data need not stop it from working.
  • These networks can learn from examples and apply that learning when a similar event arises, which lets them work through real-time events.
  • Even if a neuron is not responding or a piece of information is missing, the network can often detect the fault and still produce output.
  • They can perform multiple tasks in parallel without degrading system performance.

Applications of Neural Networks:

The artificial neural network has been in existence since 1943, when it was initially designed, but has only recently come to prominence under artificial intelligence thanks to the breadth of its applications.

Artificial neural networks are currently being used to solve many complex problems, and demand for them is increasing with time. A wide range of applications, from face recognition to decision-making, is now handled by neural networks.

The more a network is exposed to real-time examples, the more it adapts. Neural networks are capable of learning from their mistakes, which increases their capacity to perform well. Hence, neural networks are increasingly preferred for complex problem-solving.

Source: MarkTechPost 

A new area in artificial intelligence involves using algorithms to automatically design machine-learning systems known as neural networks that are more accurate and efficient than those developed by human engineers. But this so-called neural architecture search (NAS) technique is computationally expensive.

A state-of-the-art NAS algorithm recently developed by Google to run on a squad of graphical processing units (GPUs) took 48,000 GPU hours to produce a single convolutional neural network, which is used for image classification and detection tasks. Google has the wherewithal to run hundreds of GPUs and other specialized hardware in parallel, but that’s out of reach for many others.

In a paper being presented at the International Conference on Learning Representations in May, MIT researchers describe an NAS algorithm that can directly learn specialized convolutional neural networks (CNNs) for target hardware platforms — when run on a massive image dataset — in only 200 GPU hours, which could enable far broader use of these types of algorithms.

Resource-strapped researchers and companies could benefit from the time- and cost-saving algorithm, the researchers say. The broad goal is “to democratize AI,” says co-author Song Han, an assistant professor of electrical engineering and computer science and a researcher in the Microsystems Technology Laboratories at MIT. “We want to enable both AI experts and nonexperts to efficiently design neural network architectures with a push-button solution that runs fast on a specific hardware.”

Han adds that such NAS algorithms will never replace human engineers. “The aim is to offload the repetitive and tedious work that comes with designing and refining neural network architectures,” says Han, who is joined on the paper by two researchers in his group, Han Cai and Ligeng Zhu.

“Path-level” binarization and pruning

In their work, the researchers developed ways to delete unnecessary neural network design components, to cut computing times and use only a fraction of hardware memory to run a NAS algorithm. An additional innovation ensures each outputted CNN runs more efficiently on specific hardware platforms — CPUs, GPUs, and mobile devices — than those designed by traditional approaches. In tests, the researchers’ CNNs were 1.8 times faster when measured on a mobile phone than traditional gold-standard models with similar accuracy.

A CNN’s architecture consists of layers of computation with adjustable parameters, called “filters,” and the possible connections between those filters. Filters process image pixels in grids of squares — such as 3x3, 5x5, or 7x7 — with each filter covering one square. The filters essentially move across the image and combine all the colors of their covered grid of pixels into a single pixel. Different layers may have different-sized filters, and connect to share data in different ways. The output is a condensed image — from the combined information from all the filters — that can be more easily analyzed by a computer.
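The filter mechanics described above can be sketched directly. This is a minimal, illustrative convolution (no padding, stride 1), not the researchers' code; the 5x5 "image" and the averaging filter are made up.

```python
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a filter across an image, combining each covered grid of
    pixels into a single output pixel (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging filter

result = convolve2d(image, kernel)
print(result.shape)  # (3, 3): the condensed image
```

Each output pixel summarizes one 3x3 patch of the input, which is why the result is a smaller, condensed image that is easier for later layers to analyze.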

Because the number of possible architectures to choose from — called the “search space” — is so large, applying NAS to create a neural network on massive image datasets is computationally prohibitive. Engineers typically run NAS on smaller proxy datasets and transfer their learned CNN architectures to the target task. This generalization method reduces the model’s accuracy, however. Moreover, the same outputted architecture also is applied to all hardware platforms, which leads to efficiency issues.

The researchers trained and tested their new NAS algorithm on an image classification task directly in the ImageNet dataset, which contains millions of images in a thousand classes. They first created a search space that contains all possible candidate CNN “paths” — meaning how the layers and filters connect to process the data. This gives the NAS algorithm free rein to find an optimal architecture.

This would typically mean all possible paths must be stored in memory, which would exceed GPU memory limits. To address this, the researchers leverage a technique called “path-level binarization,” which stores only one sampled path at a time and saves an order of magnitude in memory consumption. They combine this binarization with “path-level pruning,” a technique that traditionally learns which “neurons” in a neural network can be deleted without affecting the output. Instead of discarding neurons, however, the researchers’ NAS algorithm prunes entire paths, which completely changes the neural network’s architecture.

In training, all paths are initially given the same probability for selection. The algorithm then traces the paths — storing only one at a time — to note the accuracy and loss (a numerical penalty assigned for incorrect predictions) of their outputs. It then adjusts the probabilities of the paths to optimize both accuracy and efficiency. In the end, the algorithm prunes away all the low-probability paths and keeps only the path with the highest probability — which is the final CNN architecture.
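The loop above can be sketched in miniature. Everything here is a hedged stand-in: the path names and their scores are invented placeholders for "train the sampled path and measure its accuracy and loss," not the paper's actual search space or update rule.

```python
import numpy as np

rng = np.random.default_rng(0)

paths = ["3x3_conv", "5x5_conv", "7x7_conv", "skip"]
logits = np.zeros(len(paths))  # all paths start equally probable

def path_score(path: str) -> float:
    # Placeholder for the measured accuracy/loss of the sampled path.
    return {"3x3_conv": 0.7, "5x5_conv": 0.6, "7x7_conv": 0.9, "skip": 0.2}[path]

for _ in range(300):
    probs = np.exp(logits) / np.sum(np.exp(logits))
    i = int(rng.choice(len(paths), p=probs))         # hold only one sampled path
    logits[i] += 0.1 * (path_score(paths[i]) - 0.5)  # reward above-average paths

best = paths[int(np.argmax(logits))]  # prune everything but the top path
print(best)
```

Sampling one path at a time is the memory trick: only the sampled path needs to be instantiated, while the probabilities gradually concentrate on the architectures that score well.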


Another key innovation was making the NAS algorithm “hardware-aware,” Han says, meaning it uses the latency on each hardware platform as a feedback signal to optimize the architecture. To measure this latency on mobile devices, for instance, big companies such as Google will employ a “farm” of mobile devices, which is very expensive. The researchers instead built a model that predicts the latency using only a single mobile phone.

For each chosen layer of the network, the algorithm samples the architecture on that latency-prediction model. It then uses that information to design an architecture that runs as quickly as possible, while achieving high accuracy. In experiments, the researchers’ CNN ran nearly twice as fast as a gold-standard model on mobile devices.
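The hardware-aware idea can be illustrated with a toy latency predictor. Everything here is an assumption for illustration: the linear latency model, the candidate layers, and the accuracy numbers are invented, not the researchers' predictor.

```python
def predicted_latency_ms(filter_size: int, channels: int) -> float:
    # Hypothetical model standing in for one fit from a few on-device timings.
    return 0.02 * filter_size * filter_size * channels

def score(accuracy: float, filter_size: int, channels: int,
          latency_weight: float = 0.01) -> float:
    # Higher accuracy is good; predicted latency acts as a penalty,
    # the "feedback signal" described above.
    return accuracy - latency_weight * predicted_latency_ms(filter_size, channels)

candidates = [
    {"filter": 3, "channels": 64, "accuracy": 0.74},
    {"filter": 7, "channels": 64, "accuracy": 0.76},
]
best = max(candidates, key=lambda c: score(c["accuracy"], c["filter"], c["channels"]))
print(best["filter"])  # 3: the latency penalty outweighs the accuracy gain
```

With this penalty the smaller filter wins; a smaller `latency_weight`, as might suit highly parallel hardware, could flip the choice toward the larger filter.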

One interesting result, Han says, was that their NAS algorithm designed CNN architectures that were long dismissed as being too inefficient — but, in the researchers’ tests, they were actually optimized for certain hardware. For instance, engineers have essentially stopped using 7x7 filters, because they’re computationally more expensive than multiple, smaller filters. Yet, the researchers’ NAS algorithm found architectures with some layers of 7x7 filters ran optimally on GPUs. That’s because GPUs have high parallelization — meaning they compute many calculations simultaneously — so can process a single large filter at once more efficiently than processing multiple small filters one at a time.

“This goes against previous human thinking,” Han says. “The larger the search space, the more unknown things you can find. You don’t know if something will be better than the past human experience. Let the AI figure it out.”

The work was supported, in part, by the MIT Quest for Intelligence, the MIT-IBM Watson AI lab, the MIT-Sensetime Alliance, and Xilinx.

Source: MIT News

In an effort "to democratize AI," researchers at MIT have found a way to use artificial intelligence to train machine-learning systems much more efficiently. Their hope is that the new time- and cost-saving algorithm will allow resource-strapped researchers and companies to automate neural network design. In other words, by bringing the time and cost down, they could make this AI technique more accessible.

Today, AI can design machine learning systems known as neural networks in a process called neural architecture search (NAS). But this technique requires a considerable amount of resources like time, processing power and money. Even for Google, producing a single convolutional neural network -- often used for image classification -- takes 48,000 GPU hours. Now, MIT researchers have developed a NAS algorithm that automatically learns a convolutional neural network in a fraction of the time -- just 200 GPU hours.

Speeding up the process in which AI designs neural networks could enable more people to use and experiment with NAS, and that could advance the adoption of AI. While this is certainly not uncomplicated, it could be a step toward putting AI and machine learning in the hands of more people and companies, freeing it from the towers of tech giants.

Source: engadget




© copyright 2017 All Rights Reserved.

A Product of HunterTech Ventures