Extracting Meaning from Sound — Computer Scientists and Hearing Scientists Come Together Right Now

Machines that listen to us, hear us, and act on what they hear are becoming common in our homes. So far, however, they are only interested in what we say, not how we say it, where we say it, or what other sounds they hear. Richard Lyon describes where we go from here.


Based on positive experiences of marrying auditory front ends to machine-learning back ends, and watching others do the same, I am optimistic that we will see an explosion of sound-understanding applications in coming years. At the same time, however, I see too many half-baked attempts that ignore important properties of sound and hearing, and that expect the machine learning to make up for poor front ends. This is one of the reasons that I wrote Human and Machine Hearing.

Machines that listen to us, hear us, and act on what they hear are becoming common in our homes, with Amazon Echo, Google Home, and a flurry of new introductions in 2017. So far, however, they are only interested in what we say, not how we say it, where we say it, or what other sounds they hear. I predict a trend, very soon, toward much more human-like hearing functions, integrating the “how”, “what”, and “where” aspects of sound perception to augment the current technology of speech recognition. As the meaning of sound comes to be better extracted, even the “why” is something we can expect machines to deal with.

Some of these abilities are becoming available already, for example in security cameras, which can alert you to people talking, dogs barking, and other sound categories. I have developed technologies to help this field of machine hearing develop, on and off over the last 40 years, based firmly in the approach of understanding and modeling how human hearing works. Recently, progress has been greatly accelerated by leveraging modern machine learning methods, such as those developed for image recognition, to map from auditory representations to answers to the “what” and “where” questions.

It is not just computer scientists who can benefit from this engineering approach to hearing. Within the hearing-specialized medical, physiology, anatomy, and psychology communities, there is a great wealth of knowledge and understanding about most aspects of hearing, but too often a lack of the sort of engineering understanding that would allow one to build machine models that listen and extract meaning as effectively as we do. I believe the only way to sort out the important knowledge is to build machine models that incorporate it. We should routinely run the same tests on models that we run on humans and animals, to test and refine our understanding and our models. And we should extend those tests to increasingly realistic and difficult scenarios, such as sorting out the voices in a meeting — or in the proverbial cocktail party.

To bring hearing scientists and computer scientists together, I target the engineering explanations in my book to both. A shared understanding of linear and nonlinear systems, continuous- and discrete-time systems, acoustic and auditory approaches, etc., will help them move forward together, rather than in orthogonal directions as has been too common in the past.

Find out more about the book and check out Richard Lyon’s commentary on, and errata for, Human and Machine Hearing.

Richard F. Lyon leads Google’s research and applications development in machine hearing as well as the team that developed camera systems for the Google Street View project. He is an engineer and scientist known for his work on cochlear models and auditory correlograms for the analysis and visualization of sound, and for implementations of these models, which he has also worked on at Xerox PARC, Schlumberger, and Apple. Lyon is a Fellow of the Institute of Electrical and Electronics Engineers and of the Association for Computing Machinery, and is among the world’s top 500 editors of Wikipedia. He has published widely in hearing, VLSI design, signal processing, speech recognition, computer architecture, photographic technology, handwriting recognition, computer graphics, and slide rules. He holds 58 issued United States patents for his inventions, including the optical mouse.

Checking in on grammar checking

‘Checking in on Grammar Checking’ by Robert Dale is the latest Industry Watch column to be published in the journal Natural Language Engineering.

Reflecting back to 2004, industry expert Robert Dale reminds us of a time when Microsoft Word was the dominant software used for grammar checking. Bringing us up to date in 2016, Dale discusses the evolution, capabilities and current marketplace for grammar checking and its diverse range of users: from academics and men on dating websites to the fifty top celebrities on Twitter.

Below is an extract from the article, which is available to read in full here.

An appropriate time to reflect
I am writing this piece on a very special day. It’s National Grammar Day, ‘observed’ (to use Wikipedia’s crowdsourced choice of words) in the US on March 4th. The word ‘observed’ makes me think of citizens across the land going about their business throughout the day quietly and with a certain reverence; determined, on this day of all days, to ensure that their subjects agree with their verbs, to not their infinitives split, and to avoid using prepositions to end their sentences with. I can’t see it, really. I suspect that, for most people, National Grammar Day ranks some distance behind National Hug Day (January 21st) and National Cat Day (October 29th). And, at least in Poland and Lithuania, it has to compete with St Casimir’s Day, also celebrated on March 4th. I suppose we could do a study to see whether Polish and Lithuanian speakers have poorer grammar than Americans on that day, but I doubt we’d find a significant difference. So National Grammar Day might not mean all that much to most people, but it does feel like an appropriate time to take stock of where the grammar checking industry has got to. I last wrote a piece on commercial grammar checkers for the Industry Watch column over 10 years ago (Dale 2004). At the time, there really was no alternative to the grammar checker in Microsoft Word. What’s changed in the interim? And does anyone really need a grammar checker when so much content these days consists of generated-on-a-whim tweets and SMS messages?

The evolution of grammar checking
Grammar checking software has evolved through three distinct paradigms. First-generation tools were based on simple pattern matching and string replacement, using tables of suspect strings and their corresponding corrections. For example, we might search a text for any occurrences of the string isnt and suggest replacing them by isn’t. The basic technology here was pioneered by Bell Labs in the UNIX Writer’s Workbench tools (Macdonald 1983) in the late 1970s and early 1980s, and was widely used in a range of more or less derivative commercial software products that appeared on the market in the early ’80s. Anyone who can remember that far back might dimly recall using programs like RightWriter on the PC and Grammatik on the Mac.

Second-generation tools embodied real syntactic processing. IBM’s Epistle (Heidorn et al. 1982) was the first really visible foray into this space, and key members of the team that built that application went on to develop the grammar checker that, to this day, resides inside Microsoft Word (Heidorn 2000). These systems rely on large rule-based descriptions of permissible syntax, in combination with a variety of techniques for detecting ungrammatical elements and posing potential corrections for those errors.

Perhaps not surprisingly, the third generation of grammar-checking software is represented by solutions that make use of statistical language models in one way or another. The most impressive of these is Google’s context-aware spell checker (Whitelaw et al. 2009)—when you start taking context into account, the boundary between spell checking and grammar checking gets a bit fuzzy. Google’s entrance into a marketplace is enough to make anyone go weak at the knees, but there are other third-party developers brave enough to explore what’s possible in this space. A recent attempt that looks interesting is Deep Grammar (www.deepgrammar.com).
We might expect to find that modern grammar checkers draw on techniques from each of these three paradigms. You can get a long way using simple table lookup for common errors, so it would be daft to ignore that fact, but each generation adds the potential for further coverage and capability.
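As a concrete illustration of the first-generation, table-lookup approach, here is a minimal sketch in Python. The correction table is an invented stand-in for the much larger tables that tools of the Writer's Workbench era shipped with:

```python
# A minimal sketch of a first-generation, table-driven checker:
# plain string lookup against a table of suspect strings, no syntax.
# The table entries below are illustrative, not from any real tool.
import re

CORRECTIONS = {
    "isnt": "isn't",
    "dont": "don't",
    "teh": "the",
    "alot": "a lot",
}

def check(text):
    """Return (suspect, suggestion, offset) tuples for known bad strings."""
    hits = []
    for match in re.finditer(r"[A-Za-z']+", text):
        word = match.group(0)
        if word.lower() in CORRECTIONS:
            hits.append((word, CORRECTIONS[word.lower()], match.start()))
    return hits

print(check("I dont think teh cat is here"))
```

Everything beyond this, such as detecting agreement errors, requires the syntactic or statistical machinery of the later generations.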

The remainder of the article discusses the following:

  • Today’s grammar-checking marketplace
  • Capabilities
  • Who needs a grammar checker?

‘Checking in on grammar checking’ is an Open Access article. You may also be interested in complimentary access to a collection of related articles about grammar published in Natural Language Engineering. These papers are fully available until 30th June 2016.

Other recent Industry Watch articles by Robert Dale:

How to make money in Machine Translation

Machine Translation across the world

Google’s translation service is used more than a billion times a day worldwide

Extract from the article ‘How to make money in the translation business’ by industry expert Robert Dale published in the journal Natural Language Engineering.

An anniversary year

2016 marks the fiftieth anniversary of an important event in the history of Machine Translation (MT). In 1966, after two years of work, the group of seven scientists who constituted the US National Science Foundation’s Automatic Language Processing Advisory Committee (ALPAC) handed down a 124-page report that was, well, somewhat negative about the state of MT research and its prospects. The ALPAC report is widely credited with causing the US government to drastically reduce funding in MT, and other countries to follow suit.

As it happens, 2016 also marks the tenth anniversary of the launch of the Google Translate web-based translation service, which was soon followed in 2007 by Microsoft’s Translator. Google says its translation service is used more than a billion times a day worldwide, by more than 500 million people a month. In mid-2015, one market research report estimated that, by 2020, the global MT market will be worth $10B.

Not a bad turnaround in outlook, even if it did take a few decades.

MT is special

In the portfolio of language technology applications that are the focus of interest of this journal’s readership, MT occupies a special place. MT was the goal of one of the very first experiments in Natural Language Processing. In 1954, the Georgetown–IBM MT system automatically translated sixty Russian sentences into English, leading its authors to claim that within three or five years, MT might be a solved problem. You can still find the original press release on the web; it’s a fascinating read, with its detailed description of a ‘brain’ that ‘dashed off its English translations . . . at the breakneck speed of two and a half lines per second.’

MT is also special because it’s one of the first areas of Natural Language Processing where statistical methods took hold in a big way. Although the idea of statistical MT was first raised by Warren Weaver in a 1949 memorandum, it was IBM’s influential statistical MT work in the late 1980s and early 1990s that caused researchers to sit up and take notice. I think it’s reasonable to claim that the perceived successes of Statistical Machine Translation (SMT) have been a major driver for the application of statistical techniques in other areas of Natural Language Processing since that time.

And MT is special because it’s possibly the most accessible form of language technology in terms of the popular understanding. It can be a struggle to explain to the layperson exactly what text analytics is, or why it is that grammar checkers and speech recognisers make mistakes. But most people get what MT is about, and can see that it might be a hard thing to do; many people have struggled with learning a second language. Nobody doubts the value of a technology that can take one human language as input and provide another as output.

In fact, universal translators have been a staple of science fiction, and thus part of the popular imagination, since at least 1945. Devices that can translate languages have played a role in many popular sci-fi TV shows. You can even guess someone’s age bracket by the movie or TV show whose name comes to mind when you mention the idea—for me, it’s Star Trek, where the back-story is that the Universal Translator was first used in the late twenty-second century for the translation of well-known Earth languages.

From where we stand now, Star Trek’s creator, Gene Roddenberry, looks to have been just a bit on the cautious side with his predictions. Perhaps he had read the ALPAC report: the Universal Translator first showed up in a 1967 episode of the show.

In the rest of the article, Robert Dale looks at where we are now, considers MT delivery models and humans versus machines, and gives his opinion on where the commercial potential lies.

Read the article How to make money in the translation business

You may also be interested in complimentary access* to a collection of related articles on Machine Translation from the journal Natural Language Engineering. *Free access available until 31 March 2016

Graphs and Natural Language Processing

Natural Language Engineering

Blog post written by Vivi Nastase based on the special issue ‘Graphs and Natural Language Processing’ in the journal Natural Language Engineering.

Graph structures naturally model connections. In natural language processing (NLP), connections are ubiquitous, at anything from small to web scale: between words, as structural/grammatical or semantic connections; between concepts in ontologies or semantic repositories; between web pages; between entities in social networks. Such connections are relatively obvious, and the parallel with graph structures is straightforward. Less obviously, with a little mathematical imagination, graphs can also be applied to typo correction, machine translation, document structuring, sentiment analysis and more.

Graphs can be extremely useful for revealing regularities and patterns in the data. Graph formalisms have been adopted as an unsupervised learning approach to numerous problems – such as language identification, part-of-speech (POS) induction, or word sense induction – and also in semi-supervised settings, where a small set of annotated seed examples is used together with the graph structure to spread their annotations throughout the graph. Graphs’ appeal is further enhanced by the fact that, as a representation, they can reveal characteristics of the data and lend themselves to human inspection, and thus provide insights and ideas for automatic methods.

We find not only the standard graphs — consisting of a set of nodes and edges that connect pairs of nodes — but also heterogeneous graphs (to model the network of tweeters and their tweets, or the network of articles, their authors and references), hypergraphs (which allow edges with more than two nodes, and could model grammatical rules, for example), and graphs with multi-layered edges, to fit more complex problems and data.

In the special issue we include a survey of graph-based methods in natural language processing, to show both the variety of graph formalisms and of tasks they can be useful for. The core of the issue consists of four articles, each of which showcases and exploits a different facet of graphs for different tasks in NLP: graphs as a framework for the organization of complex knowledge; using the graph structure of knowledge repositories for the computation of semantic relatedness between texts; revealing and exploiting sub-structures in word co-occurrence graphs for approximating word senses and performing sense-level translations; tracking changes in word co-occurrence graphs to identify diachronic sense changes.

Read the special issue ‘Graphs and Natural Language Processing’ in the journal Natural Language Engineering.

NLP meets the cloud

In his latest Industry Watch column, Robert Dale, Chief Technology Officer for Arria NLG, takes a look at what’s on offer in the NLP microservices space, reviewing five SaaS offerings as of June 2015.

Below is an extract from the column

With NLP services now widely available via cloud APIs, tasks like named entity recognition and sentiment analysis are virtually commodities. We look at what’s on offer, and make some suggestions for how to get rich.

Software as a service, or SaaS – the mode of software delivery where you pay a monthly or annual subscription to use a cloud-based service, rather than having a piece of software installed on your desktop – just gets more and more popular. If you’re a user of Evernote or CrashPlan, or in fact even Gmail or Google Docs, you’ve used SaaS. The biggest impact of the model is in the world of enterprise software, with applications like Salesforce, Netsuite and Concur now part of the furniture for many organisations. SaaS is big business: depending on which industry analyst you trust, the SaaS market will be worth somewhere between US$70 billion and US$120 billion by 2018. The benefits from the software vendor’s point of view are well known: you only have one instance of your software to maintain and upgrade, provisioning can be handled elastically, the revenue model is very attractive, and you get better control of your intellectual property. And customers like the hassle-free access from any web-enabled device without setup or maintenance, the ability to turn subscriptions on and off with no up-front licence fees, and not having to talk to the IT department to get what they want.

The SaaS model meets the NLP world in the area of cloud-based microservices: a specific form of SaaS where you deliver a small, well-defined, modular set of services through some lightweight mechanism. By combining NLP microservices in novel ways with other functionalities, you can easily build a sophisticated mashup that might just net you an early retirement. The economics of commercial NLP microservices offerings make these an appealing way to get your app up and running without having to build all the bits yourself, with your costs scaling comfortably with the success of your innovation. So what is out there in the NLP microservices space? That early retirement thing sounded good to me, so I decided to take a look. But here’s the thing: I’m lazy.

I want to know with minimal effort whether someone’s toolset is going to do the job for me; I don’t want to spend hours digging through a website to understand what’s on offer. So, I decided to evaluate SaaS offerings in the NLP space using, appropriately, the SAS (Short Attention Span) methodology: I would see how many functioning NLP service vendors I could track down in an afternoon on the web, and I would give each website a maximum of five minutes of exploration time to see what it offered up. If after five minutes on a site I couldn’t really form a clear picture of what was on offer, how to use it, or what it would cost me, I would move on. Expecting me to read more than a paragraph of text is so Gen X.

Before we get into specifics, some general comments about the nature of these services are in order, because what’s striking is the similarities that hold across the different providers. Taken together, these almost constitute a playbook for rolling out a SaaS offering in this space.

Read the rest of the article, including reviews of Alchemy API, TextRazor and more, in the journal Natural Language Engineering.

Machine learning helps computers predict near-synonyms

The article is published in Natural Language Engineering, a journal that meets the needs of professionals and researchers working in all areas of computerised language processing

Choosing the best word or phrase for a given context from among candidate near-synonyms, such as “slim” and “skinny”, is something that human writers, given some experience, do naturally; but for choices with this level of granularity, it can be a difficult selection problem for computers.

Researchers from Macquarie University in Australia have published an article in the journal Natural Language Engineering, investigating whether they could use machine learning to re-predict a particular choice among near-synonyms made by a human author – a task known as the lexical gap problem.

They used a supervised machine learning approach to this problem, in which the weights of different features of a document are learned computationally. Using this approach, the computers were able to predict synonyms with greater accuracy and reduce errors.

The initial approach solidly outperformed some standard baselines, and predictions of synonyms made using a small window around the word outperformed those made using a wider context (such as the whole document).
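As an illustration of the general window-based approach (not the authors' actual model or features), here is a toy perceptron that learns to choose between two near-synonyms from a bag of words in a small window around the gap. The training sentences are invented:

```python
# A toy sketch of window-based supervised near-synonym choice:
# bag-of-words features from a small window around the gap,
# scored by a simple multiclass perceptron.
from collections import Counter

def window_features(tokens, gap_index, size=2):
    """Bag of words within `size` tokens of the gap (excluding the gap itself)."""
    lo, hi = max(0, gap_index - size), gap_index + size + 1
    return Counter(t for i, t in enumerate(tokens) if i != gap_index and lo <= i < hi)

def train(examples, choices, epochs=20):
    """examples: list of (tokens, gap_index, correct_choice) triples."""
    weights = {c: Counter() for c in choices}
    for _ in range(epochs):
        for tokens, gap, gold in examples:
            feats = window_features(tokens, gap)
            scores = {c: sum(weights[c][f] * v for f, v in feats.items()) for c in choices}
            pred = max(scores, key=scores.get)
            if pred != gold:  # standard perceptron update on a mistake
                for f, v in feats.items():
                    weights[gold][f] += v
                    weights[pred][f] -= v
    return weights

def predict(weights, tokens, gap):
    feats = window_features(tokens, gap)
    return max(weights, key=lambda c: sum(weights[c][f] * v for f, v in feats.items()))

data = [
    (["a", "slim", "laptop", "design"], 1, "slim"),
    (["her", "skinny", "jeans", "look"], 1, "skinny"),
    (["the", "slim", "profile", "fits"], 1, "slim"),
    (["those", "skinny", "jeans", "fit"], 1, "skinny"),
]
w = train(data, ["slim", "skinny"])
print(predict(w, ["new", "?", "jeans", "here"], 1))  # the "?" marks the gap
```

Widening the window toward whole-document features is exactly the design choice the study examines, and for affective near-synonyms the broader 'tone' features turned out to matter.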

However, they found that this was not the case uniformly across all types of near-synonyms. Those that embodied connotational or affective differences — such as “slim” versus “skinny”, with differences in how positively the meaning is presented — behaved quite differently, in a way that suggested that broader features related to the ‘tone’ of the document could be useful, including document sentiment, document author, and a distance metric for weighting the wider lexical context of the gap itself. (For instance, if the chosen near-synonym was negative in sentiment, this might be linked to other expressions of negative sentiment in the document.)

The distance weighting was particularly effective, resulting in a 38% decrease in errors, and these models turned out to improve accuracy not just on affective word choice, but on non-affective word choice as well.

Read the full article ‘Predicting word choice in affective text’ online in the journal Natural Language Engineering

Can your phone make you laugh?

Funny Texting

Examples of humorous and sometimes awkward autocorrect substitutions happen all the time. Typing ‘funny autocorrect’ into Google brings up page upon page of examples where phones seem to have a mind of their own.

A group of researchers at the University of Helsinki, led by Professor Hannu Toivonen, have been examining word substitution and sentence formation, to see the extent to which they can implement a completely automatic form of humour generation. The results have been published online in the journal Natural Language Engineering.

Building on the ideas and methods of computational humour that Alessandro Valitutti has explored for several years, the researchers worked with short text messages, changing one word to another to turn the text into a pun, possibly using a taboo word. By isolating and manipulating the main components of such pun-based texts, they were able to generate humorous texts in a more controllable way.

For example, they showed that replacing a word at the end of a sentence surprises recipients, contributing to the humorous effect. They also found that a word replacement is funnier if the new word is phonetically similar to the original word and when it is a “humorously inappropriate” taboo word.

The experiment involved over 70,000 assessments in total, and used crowdsourcing to test the funniness of the texts. This is the largest experiment that Professor Toivonen knows of in this field of research.

How funny?

People were asked to assess individual messages for their funniness on a scale of 0 to 4, with 0 indicating the text wasn’t funny. And comedians can sigh with relief – the initial median score from the research was just 0.55, indicating that on average the texts could hardly be called funny. But by following a combination of rules, the researchers increased this median by 67%, showing that applying certain criteria could affect how funny a text message was.

Does this mean that in the future people will ‘rofl’ (roll on the floor laughing) in response to a funny quip or witty banter made by a phone?

Professor Toivonen sees a future where programs will be able to generate humorous automated responses and sentences:

“Some of the first applications of this type of research are likely to be seen in the automated production of funny marketing messages and help with creative writing. But who knows, maybe phones will one day be intelligent enough to make you laugh.”

Read the article ‘Computational generation and dissection of lexical replacement humor’ online in the journal Natural Language Engineering– please note that the article contains language that some may find offensive.

5 of the funniest texts*

Message | Original word | Replacement word
Okie, pee ya later | see | pee
How come u r back so fart? | fast | fart
Now u makin me more curious…smell me pls… | tell | smell
Dunno…My mum is kill bathing. | still | kill
No choice have to eat her | treat | eat

*There were funnier texts but due to offensive language we were not able to publish them on this blog