Speech Recognition

Chinese conglomerate Alibaba is one of the world’s largest ecommerce companies, but it’s increasingly turning its attention to artificial intelligence (AI). In March 2017, it launched an AI services division for health care and manufacturing, and in September its public cloud division — Alibaba Cloud — unveiled plans to set up a dedicated subsidiary and produce a self-developed AI inference chip that could be used for logistics and autonomous driving.

Alibaba has its fingers in plenty of AI pies, needless to say. And during a presentation at NeurIPS 2018 in Montreal this morning, it delivered an update on those cross-company efforts.

 

 

 
“We’re solving … scenarios [with] unseen difficulties,” said Rong Jin, dean of the Alibaba Institute of Data Science. “AI together with innovation [is helping] to solve some interesting challenges.”

One of those challenges is speech recognition in noisy environments, like a crowded subway system or congested convention center. Alibaba’s solution is part hardware, part software: a far-field microphone array and sophisticated deep learning algorithms that isolate voices in a crowd, drastically reducing error rate.

Compared to the 84 percent accuracy the “best” speech recognition technologies are able to achieve with a mic array alone, Alibaba claims its model is between 94 and 95 percent accurate, even with heavily accented speakers. It has already been deployed as part of a voice-based subway ticketing system in Shanghai, and Alibaba is in talks to bring it to “a number of [additional] cities.”
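To put those accuracy figures in terms of error rate, the arithmetic can be sketched as follows. This is an illustration only: it assumes that "accuracy" here means one minus the error rate, which the source does not state, and the function name is hypothetical.

```python
def relative_error_reduction(baseline_accuracy, improved_accuracy):
    """Fraction of the baseline errors that the improved system removes.

    Assumes accuracy = 1 - error rate (an assumption, not stated in the article).
    """
    baseline_error = 1.0 - baseline_accuracy
    improved_error = 1.0 - improved_accuracy
    return (baseline_error - improved_error) / baseline_error


# Figures quoted in the article: 84 percent versus 94 to 95 percent.
low = relative_error_reduction(0.84, 0.94)
high = relative_error_reduction(0.84, 0.95)
print(f"Relative error reduction: {low:.1%} to {high:.1%}")
```

Under that assumption, going from 84 percent to 94 or 95 percent accuracy removes roughly 62 to 69 percent of the baseline errors.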

“Nothing can save you if you don’t get enough signal to be recognized in the first place,” Jin said.

The spoken word isn’t the only domain Alibaba is tackling with AI. Using natural language processing, it’s performing automatic translation in real time, in the cloud, so that Alibaba retail customers in countries such as Russia and the Malay region can converse with human agents in their native tongues. And it’s tapping algorithms to field a portion of the tens of thousands of calls its support centers receive each day with AliMe, Alibaba’s intelligent customer service engine.

AliMe, much like Google’s Duplex, can carry on a phone conversation and answer basic questions without involving a human agent. Perhaps more impressively, in a chatbot context, it’s able to automatically extract text and images from a supplied document with “better than human” performance.

In an onstage demo, a customer asked Dian Xiaomi — Alibaba’s answering bot — about sales promotions for a particular Bluetooth speaker, like what sort of free gifts they’d receive with their purchase and how the gifts would be delivered to their residence. (A version rolling out later this year will add sentiment analysis and automated alerts for priority cases.) Another demo showed a humanoid embodiment of the chatbot — a prototype, Jin told the audience — with coordinated eye, lip, and head movements.

It’s a boon for bustling Alibaba divisions like AliExpress, which has over 150 million users and millions of merchants, and Cainiao, whose human workers and robots fulfill more than a billion orders each year. On Singles’ Day — the November 11 Chinese shopping holiday that this year generated $30.8 billion — Alibaba’s agents receive 5 times the typical number of calls in a 24-hour period, which would have been nearly impossible to juggle without a helping hand from AI.

Dian Xiaomi currently serves almost 3.5 million users a day, Alibaba says.

But natural language processing is just the tip of Alibaba’s AI iceberg. On Xian Yu, the retailer’s used goods marketplace, the company deployed a negotiation bot that talks to buyers to settle on a price.

The bot’s development wasn’t a cakewalk — it needed to learn negotiating strategies and efficient ways to generate text that’d incentivize back-and-forth negotiation — but the end result is impressive. When published to 10 million users on the same platform, the bot had a 20 percent higher chance of making a deal than a typical human being.

“Most of the [users] are not professional sellers,” Jin said. “They don’t know how to set a price or talk to buyers.”

On the inventory management and image search front, Alibaba is leveraging a scalable computer vision architecture to sift through hundreds of millions of entities. Its Cloud Image Search algorithm can recognize objects and find images containing similar or identical ones, and one of its store management apps — which picks out multiple items on a shelf to generate a summary that includes a distribution of different brands — can detect more than 100,000 SKUs with “high accuracy.” (Alibaba’s working toward a goal of 10 million SKUs.)

Both complement Alibaba’s Ali Smart Supply Chain (ASSC), a suite of AI tools that help Alibaba merchants forecast product demand, allocate inventory, and select pricing strategies.

Alibaba’s machine vision work extends to satellite images. Using data gathered from AutoNavi, the largest map and navigation provider in China, with over 70 million users, its systems are able to identify recently constructed buildings, for example, and gather information related to road work and points of interest.

Alibaba is also using computer vision to prevent shoplifting. At its more than 66 Hema brick-and-mortar stores, offline algorithms at its self-checkout kiosks prevent ne’er-do-well customers from scanning only the first item in a basket, or concealing items from the overhead camera’s view.

“The goal is to … have a computer vision system figure out if a customer is intentionally or unintentionally scanning items,” Jin said. “The machine sees that things aren’t scanned.”

It’s powered by a deep learning algorithm, AliFPGA-X100, that runs on a field-programmable gate array (FPGA), a reconfigurable integrated circuit within the kiosks. Alibaba says it can process images up to 170 times faster than a comparable GPU-based system.

Alibaba is also applying AI to Youku, its video hosting service. Machine learning algorithms automatically generate thumbnails for the roughly 200,000 videos its tens of millions of active users upload each day. And it can target certain audience segments with said thumbnails. Female users might see a different preview image for a given video than male users, for example. This has led to a 15 percent improvement in click-through rates and a 12 percent uptick in dwell time.

Today’s survey comes just over a year after the debut of Alibaba’s new research organization, the Academy for Discovery, Adventure, Momentum, and Outlook (or DAMO), which is aimed at tackling emerging technologies like machine learning and network security, and the opening of labs in San Mateo, Seattle, Moscow, Tel Aviv, and Singapore. It also closely follows the launch of Alibaba’s Tmall Genie, an AI-powered voice assistant that’s sold over 5 million units since it hit store shelves in July 2017.

And the company is arguably just getting started. Alibaba plans to spend more than $15 billion on research and development by 2020, it told Quartz in October 2017.

Source article by @KYLE_L_WIGGERS, VentureBeat


Speech recognition is a technology that lets users speak to devices such as computers and mobile phones and give commands for the system to carry out. Dedicated software recognizes the spoken commands and converts them into a machine-readable format so that the requested action can be performed. Other input methods, such as typing and selecting options, have lost ground for some tasks since the introduction of speech-enabled virtual agents such as Microsoft’s Cortana and the voice search feature in Google Search.

How does it work?

A speech recognition system’s algorithms combine language modelling and acoustic modelling to distinguish words and improve accuracy. Acoustic modelling maps the audio signal to linguistic units such as phonemes, whereas language modelling estimates which word sequences are likely, which helps the system choose between words that sound similar.
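As a rough illustration of how the two models can work together, the sketch below picks between three similar-sounding words. All probabilities, the `ACOUSTIC_LOGP` and `BIGRAM_P` tables, and the `best_word` helper are made up for this example; real systems use far larger models and search over whole sentences.

```python
import math

# Hypothetical acoustic scores: log P(audio | word). The three words sound
# alike, so the acoustic model alone can barely separate them.
ACOUSTIC_LOGP = {
    "two": math.log(0.34),
    "too": math.log(0.33),
    "to": math.log(0.33),
}

# Hypothetical bigram language model: P(word | previous word).
BIGRAM_P = {
    ("want", "to"): 0.60,
    ("want", "two"): 0.05,
    ("want", "too"): 0.01,
}


def best_word(previous, candidates, lm_weight=1.0, floor=1e-6):
    """Pick the candidate with the highest combined log score.

    score = log P(audio | word) + lm_weight * log P(word | previous)
    Unseen word pairs fall back to a small floor probability.
    """
    def score(word):
        lm_p = BIGRAM_P.get((previous, word), floor)
        return ACOUSTIC_LOGP[word] + lm_weight * math.log(lm_p)

    return max(candidates, key=score)


candidates = ["two", "too", "to"]
acoustic_only = best_word("want", candidates, lm_weight=0.0)
combined = best_word("want", candidates)
print(acoustic_only, combined)
```

With the language model switched off (`lm_weight=0.0`), the slightly higher acoustic score wins; with it switched on, the context word "want" tips the decision toward "to".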

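Many current speech recognition systems are based on hidden Markov models (HMMs), which treat speech as a sequence of hidden states (such as phonemes) that produce the observed audio, and which help improve overall efficiency and accuracy.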
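A minimal sketch of how a hidden Markov model can be decoded is the Viterbi algorithm, which finds the most likely sequence of hidden states for a sequence of observations. The toy model below is purely illustrative: the three phoneme-like states, the frame labels "x", "y", and "z", and every probability are invented for this example and do not come from any real recognizer.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state path for the observations."""
    # best[t][s] = (log probability of the best path ending in s at time t,
    #               the state that path came from)
    first = observations[0]
    best = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Choose the predecessor that gives the best score for reaching s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)

    # Backtrack from the best final state to recover the full path.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy model (all values are assumptions made for illustration).
states = ["k", "ae", "t"]
start_p = {"k": 0.8, "ae": 0.1, "t": 0.1}
trans_p = {
    "k": {"k": 0.5, "ae": 0.4, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.5, "t": 0.4},
    "t": {"k": 0.1, "ae": 0.1, "t": 0.8},
}
emit_p = {
    "k": {"x": 0.7, "y": 0.2, "z": 0.1},
    "ae": {"x": 0.1, "y": 0.7, "z": 0.2},
    "t": {"x": 0.1, "y": 0.2, "z": 0.7},
}

decoded = viterbi(["x", "x", "y", "y", "z"], states, start_p, trans_p, emit_p)
print(decoded)
```

Log probabilities are used instead of raw products so that long observation sequences do not underflow to zero.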

Uses: -

Speech recognition has many applications across different industries. Some of them are listed below: -

  1. Military: - The military actively uses speech recognition in applications such as training air traffic controllers and in the cockpits of helicopters and fighter jets. Pilots use it to give commands to the autopilot, set steering coordinates, and adjust radio frequencies.
  2. Education: - Uses in the education sector include learning a second language, improving spoken proficiency, and hearing how new words are pronounced. Blind students can operate a computer by giving spoken commands and listening to spoken messages. Interacting with the computer about a particular topic can also help students understand the subject better.
  3. Day-to-Day life: - Voice search, speech-to-text, and voice calling have made everyday tasks easier and more efficient.

Positives and Negatives: -

Although the field sees continuous development, speech recognition still needs a lot of work before it appeals to an even wider public. Its biggest positive is that it is easy to use and is now readily available for the public to try for themselves.

On the negative side, support for many languages other than English is lacking, and differences in accent and pronunciation make it harder to capture words correctly, which leads to a higher degree of inaccuracy. In addition, for best results users need a quiet background with no noise other than their own voice, which is often impractical.

Conclusion: -

Overall, the industry has seen some major recent developments, and more are expected as the technology matures. Features such as background noise cancellation and support for non-English languages are required for it to appeal to a wider audience.

   
