Choosing the best word or phrase for a given context from among candidate near-synonyms, such as “slim” and “skinny”, is something that experienced human writers do naturally. For computers, however, a choice at this level of granularity can be a difficult selection problem.
Researchers from Macquarie University in Australia have published an article in the journal Natural Language Engineering, investigating whether they could use machine learning to re-predict a particular choice among near-synonyms made by a human author – a task known as the lexical gap problem.
They took a supervised machine learning approach to this problem, in which the weights of different features of a document are learned from data. With this approach, the models were able to predict the author's chosen near-synonym with greater accuracy and fewer errors.
The initial approach solidly outperformed some standard baselines, and predictions of synonyms made using a small window around the word outperformed those made using a wider context (such as the whole document).
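As a rough illustration of the idea, the sketch below learns feature weights from a small window of words around the gap and uses them to re-predict the chosen near-synonym. It is not the researchers' implementation: the multiclass perceptron, the function names (`window_features`, `predict`, `train`), the window size, and the toy sentences are all assumptions made for this example.

```python
from collections import defaultdict

def window_features(tokens, gap_index, size=2):
    """Return position-tagged words within `size` tokens of the gap."""
    feats = []
    for offset in range(-size, size + 1):
        pos = gap_index + offset
        if offset != 0 and 0 <= pos < len(tokens):
            feats.append(f"{offset:+d}:{tokens[pos].lower()}")
    return feats

def predict(weights, feats, candidates):
    """Pick the candidate whose learned feature weights sum highest."""
    return max(candidates, key=lambda c: sum(weights[c][f] for f in feats))

def train(examples, candidates, epochs=5):
    """Multiclass perceptron (an assumed learner, chosen for brevity).

    Each example is (tokens, gap_index, word_the_author_chose).
    """
    weights = {c: defaultdict(float) for c in candidates}
    for _ in range(epochs):
        for tokens, gap_index, gold in examples:
            feats = window_features(tokens, gap_index)
            guess = predict(weights, feats, candidates)
            if guess != gold:
                # Reward the author's choice, penalise the wrong guess.
                for f in feats:
                    weights[gold][f] += 1.0
                    weights[guess][f] -= 1.0
    return weights

# Toy training data (invented for illustration only).
candidates = ["slim", "skinny"]
examples = [
    ("she looked slim and elegant".split(), 2, "slim"),
    ("he looked skinny and unwell".split(), 2, "skinny"),
]
weights = train(examples, candidates)

# Re-predict the word for an unseen sentence with a gap at index 2.
test_tokens = "she seemed GAP and elegant".split()
print(predict(weights, window_features(test_tokens, 2), candidates))
```

Widening `size` until it covers the whole document would correspond to the broader-context setting, which the article reports as performing worse than a small window in the initial experiments.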
However, they found that this was not the case uniformly across all types of near-synonyms. Those that embodied connotational or affective differences, such as “slim” versus “skinny”, which differ in how positively the meaning is presented, behaved quite differently. Their behaviour suggested that broader features related to the ‘tone’ of the document could be useful, including document sentiment, document author, and a distance metric for weighting the wider lexical context of the gap itself. For instance, if the chosen near-synonym was negative in sentiment, this might be linked to other expressions of negative sentiment in the document.
The distance weighting was particularly effective, resulting in a 38% decrease in errors. The models using these broader features also turned out to improve accuracy not just on affective word choice, but on non-affective word choice as well.
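To make the intuition behind distance weighting concrete, the sketch below scores the sentiment of the whole document but scales each word's contribution by how far it is from the gap. Everything here is an assumption for illustration: the article does not specify the metric used, so the inverse-distance decay, the tiny hand-made `LEXICON`, and the function names are hypothetical. The score is also used directly as a decision rule for simplicity, whereas in a learned model it would more plausibly be one feature among many.

```python
# Hypothetical toy sentiment lexicon; a real system would use a full resource.
LEXICON = {"elegant": 1.0, "lovely": 1.0, "unwell": -1.0, "awful": -1.0}

def distance_weighted_sentiment(tokens, gap_index, lexicon=LEXICON):
    """Sum lexicon scores over the document, each scaled by 1 / distance to the gap."""
    total = 0.0
    for i, tok in enumerate(tokens):
        if i == gap_index:
            continue  # the gap itself carries no evidence
        total += lexicon.get(tok.lower(), 0.0) / abs(i - gap_index)
    return total

def choose(tokens, gap_index, positive="slim", negative="skinny"):
    """Pick the positive variant when the weighted sentiment is non-negative."""
    score = distance_weighted_sentiment(tokens, gap_index)
    return positive if score >= 0 else negative

# Two distant negative words, one nearby positive word (invented example).
tokens = "an awful day and an awful meal but she looked GAP and elegant".split()
gap = tokens.index("GAP")

unweighted = sum(LEXICON.get(t, 0.0) for t in tokens)
print(unweighted < 0)        # the plain document-level sum is negative
print(choose(tokens, gap))   # the distance-weighted score favours the nearby positive cue
```

In this toy case an unweighted document-level sum points towards the negative variant, while the distance-weighted score lets the positive word close to the gap dominate.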