NLP meets the cloud

NLE blog post – July 2015. In his latest industry watch column, Robert Dale, Chief Technology Officer for Arria NLG, takes a look at what’s on offer in the NLP microservices space, reviewing five SaaS offerings as of June 2015.

Below is an extract from the column.

With NLP services now widely available via cloud APIs, tasks like named entity recognition and sentiment analysis are virtually commodities. We look at what’s on offer, and make some suggestions for how to get rich.

Software as a service, or SaaS – the mode of software delivery where you pay a monthly or annual subscription to use a cloud-based service, rather than having a piece of software installed on your desktop – just gets more and more popular. If you’re a user of Evernote or CrashPlan, or in fact even Gmail or Google Docs, you’ve used SaaS. The biggest impact of the model is in the world of enterprise software, with applications like Salesforce, NetSuite and Concur now part of the furniture for many organisations. SaaS is big business: depending on which industry analyst you trust, the SaaS market will be worth somewhere between US$70 billion and US$120 billion by 2018. The benefits from the software vendor’s point of view are well known: you only have one instance of your software to maintain and upgrade, provisioning can be handled elastically, the revenue model is very attractive, and you get better control of your intellectual property. And customers like the hassle-free access from any web-enabled device without setup or maintenance, the ability to turn subscriptions on and off with no up-front licence fees, and not having to talk to the IT department to get what they want.

The SaaS model meets the NLP world in the area of cloud-based microservices: a specific form of SaaS where you deliver a small, well-defined, modular set of services through some lightweight mechanism. By combining NLP microservices in novel ways with other functionalities, you can easily build a sophisticated mashup that might just net you an early retirement. The economics of commercial NLP microservices offerings make these an appealing way to get your app up and running without having to build all the bits yourself, with your costs scaling comfortably with the success of your innovation. So what is out there in the NLP microservices space? That early retirement thing sounded good to me, so I decided to take a look. But here’s the thing: I’m lazy.

I want to know with minimal effort whether someone’s toolset is going to do the job for me; I don’t want to spend hours digging through a website to understand what’s on offer. So, I decided to evaluate SaaS offerings in the NLP space using, appropriately, the SAS (Short Attention Span) methodology: I would see how many functioning NLP service vendors I could track down in an afternoon on the web, and I would give each website a maximum of five minutes of exploration time to see what it offered up. If after five minutes on a site I couldn’t really form a clear picture of what was on offer, how to use it, or what it would cost me, I would move on. Expecting me to read more than a paragraph of text is so Gen X.

Before we get into specifics, some general comments about the nature of these services are in order, because what’s striking is the similarities that hold across the different providers. Taken together, these almost constitute a playbook for rolling out a SaaS offering in this space.

Read the rest of the article, including reviews of AlchemyAPI, TextRazor and more, in the journal Natural Language Engineering.

A literary history of the strange expression ‘what is it like?’

ENG blog post – July 2015. Blog post written by Anne Seaton based on an article in the journal English Today.

It was when I was working on Chambers Universal Learners’ Dictionary in the late ’70s that I suddenly focused on the weirdness of the expression ‘what is it like?’ Why ask for a comparison when you want a description? I managed to squeeze it into the dictionary at W (for what), since it had missed the boat at L (for like). Desk dictionaries seemed not to bother with it. But the 1933 OED pinpointed its function with notable precision: ‘The question what is he (or it) like? means ‘What sort of man is he?’, ‘What sort of thing is it?’, the expected answer being a description, and not at all the mention of a resembling person or thing.’ However, it gave only two citations, the earlier dated 1878, whereas citations from my databases, when I began searching on ‘what … like’, showed that it was in use in the early 19th century. Earlier than that there was evidence that the question was indeed used literally to ask for a comparison.

I’m very aware that ‘what is it like?’ should be studied in conjunction with ‘like that’ (as in ‘He’s like that’, ‘It’s like that’), which can be understood as its counterpart in statement form. Citations for ‘people/things like that’ can be found as early as the 17th century, but the use of ‘like that’ as a complement after a linking verb seems to arrive in the mid 19th century. Trollope, who quibbles over ‘What is he like?’, seems OK with ‘like that’. In The Small House at Allington he puts it into the mouth of Johnny Eames:

‘My belief is, that a girl thinks nothing of a man till she has refused him half-a-dozen times.’

‘I don’t think Lily is at all like that.’

Read the full article ‘A literary history of the strange expression “what is it like?” A straightforward question that changed its function and took universal hold’.

Evolving and adapting to global changes regarding English

ENG blog post – July 2015. English language teaching in the Siberian city of Irkutsk.

Blog post written by Valerie Sartor based on a recent article in the journal English Today

The Russian Federation, established after the breakup of the USSR in the early 1990s, is the largest country in the world, and until recently, a nation that did not encourage foreigners to enter in order to teach English to the native population. Moscow and St Petersburg remain the two main intellectual and cultural capitals. During the Soviet era (1917-1990), however, cities in the western provinces, such as Kiev and Riga, were also held in high regard for education, with specialized universities dedicated to making contributions to science and technology, as well as the arts. Very little, however, was known about Siberian educational institutions, and little has been written recently about English in universities in the more remote areas of Siberia.

I served as a Fulbright Global TEFL Exchange Scholar for the 2014-2015 academic year in southeastern Siberia. My post was in English language teaching at the Eurasian Linguistic Institute (ELI), a new affiliate branch of the Moscow State Linguistic University (MGLU), located in the city of Irkutsk, Irkutsk Province. Formerly known as the Irkutsk State Pedagogical Institute of Foreign Languages, this facility was founded in 1948. Irkutsk has long hosted a diverse population. Historically it is known as one of the prosperous tea route cities, and it also welcomed the Decembrist exiles, along with other political and religious exiles from European Russia and Eastern Europe. Because of this, despite being provincial, Irkutsk has many universities, art galleries, theaters, and beautiful architecture modeled after the buildings in St Petersburg. The ELI building itself is striking.

Presently, at the Eurasian Linguistic Institute, Russian students continue to specialize in learning English in order to become English teachers and translators. Traditionally, females have held these jobs and the trend continues. Globalization has, however, impacted teaching methods as well as the ways in which students acquire fluency. ELI teachers now employ textbooks from the UK and the USA. Many teachers and students travel internationally to English speaking countries for study and work exchanges. Finally, the Internet has opened up a vast window to English language resources.

With these positive opportunities have also come some negative outcomes. Faculty at ELI complain that their students no longer read as extensively as their students did during Soviet times; moreover, with the fluctuating economic situation since the early 1990s, enrolments have dropped. Currently, funding for state universities and institutes is also problematic. Recently, ELI merged with Moscow State Linguistic University as part of Mr. Putin’s plans for streamlining educational institutions to make them sustainable. Funding problems and globalization have also impacted teacher perceptions. Some ELI teachers feel that they have lost “educational capital” as mentors and models in regard to students, who focus more on adjusting to the post-Soviet economic situation than to establishing themselves in the academy.

Yet at the same time, faculty at ELI reported that they were under the same pressure as in Soviet times. They were expected to better themselves academically: to write articles and to conduct extra-curricular activities, including the creation of textbooks and curricula. English teachers carry out tedious administrative functions and teach many classes. Nevertheless, the teachers I worked with were dedicated educators, spending many hours at the institute. Many also moonlighted as private tutors in order to enhance their economic situation.

Read the full article ‘Evolving and adapting to global changes regarding English: Contemporary English language teaching in a remote Siberian university’ by Valerie Sartor and Svetlana Bogdanova.


Talker familiarity and spoken word recognition in school-age children

JCL blog post – June 2015. Blog post written by Susannah Levi based on an article in the Journal of Child Language.

When people listen to speech, they hear two types of information: what is being said (such as “That’s a ball”) and who said it (such as MOMMY).

Prior studies have shown that adults understand speech better when it is spoken by a familiar voice. In this study, we tested whether school-age children also understand speech better when listening to a familiar voice. First, children learned the voices of three previously unknown speakers over five days. Following this voice familiarization, children listened to words mixed with background noise and were asked to tell us what they heard. These words were spoken both by the now familiar speakers and by another set of unfamiliar speakers.

Our results showed that, like adults, children understand speech better when it is produced by a familiar voice. Interestingly, this benefit of voice familiarity only occurred when listening to highly familiar words (such as “book” or “cat”) and not to words that are less familiar to school-age children (such as “fate” or “void”). We also found that the benefit of familiarity with a voice was most noticeable in children with the poorest performance, suggesting that familiarity with a voice may be especially useful for children who have difficulty understanding spoken language.

We invite you to read the full article ‘Talker familiarity and spoken word recognition in school-age children’ here

Measuring a very young child’s language and communication skills in developing countries

JCL blog post – June 2015. Blog post written by Katie Alcock based on an article in the Journal of Child Language.

The best way to find out about a very young child’s language and communication is to ask their parents – but in developing countries parents can’t always fill in a written questionnaire, so we have created a very successful interview technique to do this.

To start with, we wanted to visit homes close to the Wellcome Trust Unit in Kilifi Town, coastal Kenya, but even in town there are few paved roads. We went out in a four-wheel-drive from the Unit to a Kiswahili-speaking family home; homes here range from concrete to mud walls, from tin to thatch roofs.

Families in Kilifi are used to nosy questions from researchers but we weren’t sure how easy they’d find it to talk about their children’s language. Parents worldwide find it very difficult to answer open-ended questions about words their young child knows. When asked simple yes/no questions about individual words though, we think the information is accurate.

The family we saw that day had a 15-month-old boy. For children of this age we ask parents about words for animals and noises, foods, household objects, toys, and verbs. Children often know few words at this age, so we also ask about gestures such as waving and (very important in this culture) shaking hands.

“Can your child understand or say the following words… mee mee [what a goat says]; boo boo [what a cow says]… maji [water]… ndizi [banana]… taa [lamp]” “Yes! He says ‘taa’ and he thinks the moon is a lamp too, he says ‘taa’ when he sees the moon!”

Bingo! A classic example of overextension – a child using a word for one thing to refer to something similar. We were unsure when we started what kind of answers parents would give and how patient they would be with quite a lengthy questionnaire (over 300 words, even for 8-month-old babies). Our researchers didn’t know either whether local parents would be aware of children making animal noises – like “baa” and “moo”. It turned out this was very much a “thing” that parents noticed – the baby word for “cat” is “nyau”.

We do this research through the MRC unit because we need good tools to assess how child development is affected by factors such as HIV, cerebral malaria, and malnutrition.

We found that parents were very accurate in telling us how well their child communicates, and very patient! They told us about the same kinds of mistakes children make learning other languages. We also went on to use our questionnaire to look at whether children exposed to HIV had delayed language compared to their peers.

Read the full article ‘Developmental inventories using illiterate parents as informants: Communicative Development Inventory (CDI) adaptation for two Kenyan languages’ here

Machine learning helps computers predict near-synonyms

The article is published in Natural Language Engineering, a journal that meets the needs of professionals and researchers working in all areas of computerised language processing.

Choosing the best word or phrase for a given context from among candidate near-synonyms, such as “slim” and “skinny”, is something that human writers, given some experience, do naturally; but for choices with this level of granularity, it can be a difficult selection problem for computers.

Researchers from Macquarie University in Australia have published an article in the journal Natural Language Engineering, investigating whether they could use machine learning to re-predict a particular choice among near-synonyms made by a human author – a task known as the lexical gap problem.

They used a supervised machine learning approach to this problem, in which the weights of different features of a document are learned computationally. Using this approach, the computers were able to predict synonyms with greater accuracy and reduce errors.

The initial approach solidly outperformed some standard baselines, and predictions of synonyms made using a small window around the word outperformed those made using a wider context (such as the whole document).

However, they found that this was not the case uniformly across all types of near-synonyms. Those that embodied connotational or affective differences – such as “slim” versus “skinny”, with differences in how positively the meaning is presented – behaved quite differently, in a way that suggested that broader features related to the ‘tone’ of the document could be useful, including document sentiment, document author, and a distance metric for weighting the wider lexical context of the gap itself. (For instance, if the chosen near-synonym was negative in sentiment, this might be linked to other expressions of negative sentiment in the document.)

The distance weighting was particularly effective, resulting in a 38% decrease in errors, and these models turned out to improve accuracy not just on affective word choice, but on non-affective word choice also.
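To make the distance-weighting idea concrete, here is a minimal Python sketch. This is not the authors’ implementation – the candidate words, features, and weights are invented for illustration – but it shows the general shape of a linear classifier that scores each candidate near-synonym against context features weighted by their distance from the gap:

```python
from collections import defaultdict

def distance_weighted_features(tokens, gap_index):
    """Weight each context word by 1 / (1 + distance from the gap),
    so words near the choice point count more than distant ones."""
    features = defaultdict(float)
    for i, word in enumerate(tokens):
        if i != gap_index:
            features[word] += 1.0 / (1.0 + abs(i - gap_index))
    return dict(features)

def score(features, weights):
    """Linear score for one candidate: learned weight times
    distance-weighted count, summed over the context words."""
    return sum(weights.get(w, 0.0) * v for w, v in features.items())

def predict(tokens, gap_index, candidate_weights):
    """Pick the candidate (e.g. 'slim' vs 'skinny') whose feature
    weights best match the context around the gap."""
    feats = distance_weighted_features(tokens, gap_index)
    return max(candidate_weights, key=lambda c: score(feats, candidate_weights[c]))

# Toy weights, hand-set for illustration only: 'skinny' is associated
# with negatively tinged context words, 'slim' with positive ones.
toy_weights = {
    "slim":   {"elegant": 1.0, "healthy": 0.8},
    "skinny": {"worryingly": 1.0, "pale": 0.6},
}
tokens = "she looked worryingly ___ and pale".split()
print(predict(tokens, 3, toy_weights))  # prints: skinny
```

In a real system the per-candidate weights would be learned from training documents; the point of the sketch is only how the 1/(1 + distance) factor lets nearby context dominate the choice, consistent with the finding that a small window often beats the whole document.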

Read the full article ‘Predicting word choice in affective text’ online in the journal Natural Language Engineering.

How do children who have recently begun to learn English map new L2 words into their existing mental lexicon?

Blog post written by Greg Poarch based on an article in Bilingualism: Language and Cognition.

How do children who have recently begun to learn English map new L2 words into their existing mental lexicon? We tested the predictions of the Revised Hierarchical Model (Kroll & Stewart, 1994), originally introduced to explain language production processes and the relative strengths of the underlying connections between L1 and L2 word forms and the corresponding concepts. To examine how children map novel words to concepts during early stages of L2 learning, we tested fifth grade Dutch L2 learners with eight months of English instruction.

In Study 1, the children performed a translation recognition task, in which an English word (bike) was shown followed by a Dutch word, and the children had to indicate whether the Dutch word was the correct translation. The Dutch word could be one of three: the correct translation equivalent (fiets), a semantically related incorrect translation (wiel [wheel]), or an unrelated incorrect translation (melk [milk]). The critical stimuli here were the semantically related incorrect translations: the RHM predicts that beginning learners should not yet be sensitive to L2 semantics, and hence perform equally on both kinds of incorrect translations. The children, however, were already sensitive to L2 word meaning and took longer to decide that a word was an incorrect translation when it was semantically related than unrelated.

In Study 2 the children performed backward and forward translation production tasks, and were faster in the backward direction, indicating direct translation from the L2 word to the L1 word without the detour via the concept, as predicted by the RHM. Our results indicate that depending on the task, Dutch beginning L2 learners do exploit conceptual information during L2 processing and map L2 word-forms to concepts, but evidently more so in recognition tasks than in production tasks. Critically, the children in our study had learned L2 words in contexts enriched by pictures and listening/speaking exercises.

This is further evidence that the manner of L2 instruction may strongly influence the activation of lexical and conceptual information during translation.

Read the full article ‘Accessing word meaning in beginning second language learners: Lexical or conceptual mediation?’ here

Can your phone make you laugh?


Humorous and sometimes awkward autocorrect substitutions happen all the time. Typing ‘funny autocorrect’ into Google brings up page upon page of examples where phones seem to have a mind of their own.

A group of researchers at the University of Helsinki, led by Professor Hannu Toivonen, have been examining word substitution and sentence formation to see the extent to which they can implement a completely automatic form of humour generation. The results have been published online in the journal Natural Language Engineering.

Building on the ideas and methods of computational humour explored by Alessandro Valitutti over several years, the researchers worked with short text messages, changing one word to another to turn the text into a pun, possibly using a taboo word. By isolating and manipulating the main components of such pun-based texts, they were able to generate humorous texts in a more controllable way.

For example, they found that replacing a word at the end of the sentence surprised recipients, contributing to the humorous effect. They also found that a word replacement is funnier if the new word is phonetically similar to the original word, and when the word is a “humorously inappropriate” taboo word.
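As a rough illustration of that recipe – and not the researchers’ actual system, which used proper phonetic similarity measures and a curated taboo lexicon – here is a small Python sketch that replaces the latest sufficiently similar word in a message with a candidate from a hand-picked word list, using character-level similarity as a crude stand-in for phonetic similarity:

```python
import difflib

def similarity(a, b):
    """Crude stand-in for phonetic similarity: character-level
    ratio from difflib (0.0 = nothing shared, 1.0 = identical)."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def punnify(message, lexicon, threshold=0.5):
    """Replace the last word that is sufficiently similar to some
    candidate in `lexicon` -- replacements late in the sentence were
    found to be more surprising, hence funnier."""
    stripped = message.rstrip("?.!")
    words = stripped.split()
    # Scan from the end so the replacement lands late in the message.
    for i in range(len(words) - 1, -1, -1):
        best = max(lexicon, key=lambda c: similarity(words[i].lower(), c))
        if best != words[i].lower() and similarity(words[i].lower(), best) >= threshold:
            words[i] = best
            return " ".join(words) + message[len(stripped):]
    return message  # nothing similar enough to replace

# Reproduces one of the published examples ("fast" -> "fart"):
print(punnify("How come u r back so fast?", ["fart", "pee"]))
# prints: How come u r back so fart?
```

The word list here is deliberately mild; swapping in genuinely taboo words, as the paper did, is what pushes the funniness scores up.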

The experiment involved over 70,000 assessments in total and used crowdsourcing to test the funniness of the texts. This is the largest experiment related to this field of research that Professor Toivonen knows of.

How funny?

People were asked to assess individual messages for their funniness on a scale of 0 to 4, with 0 indicating the text wasn’t funny. And comedians can sigh with relief – the initial median score from the research was just 0.55, indicating that on average the texts can hardly be called funny. But by following a combination of rules, this median increased by 67%, showing that applying certain criteria could affect how funny the text message was.

Does this mean that in the future people will ‘rofl’ (roll on the floor laughing) in response to a funny quip or witty banter made by a phone?

Professor Toivonen sees a future where programs will be able to generate humorous automated responses and sentences:

“Some of the first applications of this type of research are likely to be seen in the automated production of funny marketing messages and help with creative writing. But who knows, maybe phones will one day be intelligent enough to make you laugh.”

Read the article ‘Computational generation and dissection of lexical replacement humor’ online in the journal Natural Language Engineering – please note that the article contains language that some may find offensive.

5 of the funniest texts*

Message | Original word | Replacement word
Okie, pee ya later | see | pee
How come u r back so fart? | fast | fart
Now u makin me more curious…smell me pls… | tell | smell
Dunno…My mum is kill bathing. | still | kill
No choice have to eat her | treat | eat

*There were funnier texts but due to offensive language we were not able to publish them on this blog

Caregivers provide more labeling responses to infants’ pointing than to infants’ object-directed vocalizations

JCL blog post – June 2015. Blog post written by Julie Gros-Louis based on an article in a recent issue of the Journal of Child Language.

One main context for language learning is in social interactions with parents and caregivers. Infants produce vocal and gestural behaviors and caregivers respond to these behaviors, which supports language development. Prior studies have shown a strong relationship between infants’ pointing gestures and language outcomes. One reason for this association is that parents translate the apparent meaning of infants’ points, thus providing infants with language input associated with their pointing behavior. In contrast to the relationship between pointing and language development, infants’ overall vocal production is not related to language outcomes. One possible explanation for the different association between pointing and language outcomes, compared to vocalizations and language outcomes, is that pointing may elicit more verbal responses from social partners that are facilitative for language learning.

To examine this possibility, we observed twelve-month-olds during free play interactions with their mothers and fathers. At this age, infants do not have many words in their vocabulary and thus communicate primarily with gestures and vocalizations. We compared parents’ verbal responses to infants’ pointing gestures and object-directed vocalizations. Results showed that infants’ pointing elicited more verbal responses from parents compared to object-directed vocalizations. Also, these verbal responses were mainly object labels. These results may help explain why pointing is associated with indices of language acquisition, but the production of vocalizations is not. Furthermore, the study highlights the importance of examining moment-to-moment interactions to uncover social mechanisms that support language development.

We invite you to read the full article ‘Caregivers provide more labeling responses to infants’ pointing than to infants’ object-directed vocalizations’ here


Research trends in mobile assisted language learning from 2000 to 2012

REC blog post – May 2015. Blog post written by Guler Duman based on an article in the latest issue of ReCALL.

The widespread ownership of sophisticated but affordable mobile technologies has extended opportunities for making language teaching and learning available beyond the traditional classroom. Researchers have therefore begun to investigate new uses for various mobile technologies to facilitate language learning. It is not surprising, then, that a growing body of research into using these technologies for language learning has been documented over the past several decades, making mobile assisted language learning (MALL) an emerging research field. We believe that a comprehensive analysis of MALL-related literature is necessary for those interested in MALL research to understand current practices and to direct future research in the field.

In order to trace how MALL has evolved in recent years and show to what extent mobile devices are being used to support language learning, in this article we analysed the MALL studies published from 2000 to 2012 with regard to the distribution of research topics and theoretical bases, the variety of mobile devices and their platforms and functions, and the diversity of methodological approaches.

A systematic and extensive review of MALL-related literature revealed that research in the field increased at a fast pace from 2008 and reached a peak in 2012. Teaching vocabulary with the use of cell phones and PDAs remained popular over this period, while the writing process and grammar acquisition tended to be neglected in MALL studies. Furthermore, the need for solid theoretical bases to help establish a link between theory and practice emerged, since a significant number of studies did not base their research on any theoretical framework. MALL research also remained limited in its methodological approaches. Applied and design-based research dominated the field, and these studies generally adopted quantitative research methods.

Ultimately, this study provides an important reference base for future research in the field of MALL with the identification of the most widely examined areas and issues.

Read the full article ‘Research trends in mobile assisted language learning from 2000 to 2012’ here.