Blog post written by Vivi Nastase based on the special issue ‘Graphs and Natural Language Processing’ in the journal Natural Language Engineering.
Graph structures naturally model connections. In natural language processing (NLP), connections are ubiquitous at every scale, from the word to the web: between words, as structural/grammatical or semantic relations; between concepts in ontologies or semantic repositories; between web pages; between entities in social networks. Such connections are relatively obvious, and the parallel with graph structures is straightforward. Less obviously, with a little mathematical imagination, graphs can also be applied to typo correction, machine translation, document structuring, sentiment analysis and more.
Graphs can be extremely useful for revealing regularities and patterns in the data. Graph formalisms have been adopted as an unsupervised learning approach to numerous problems, such as language identification, part-of-speech (POS) induction, or word sense induction, and also in semi-supervised settings, where a small set of annotated seed examples is used together with the graph structure to spread annotations throughout the graph. Graphs' appeal is further enhanced by the fact that, as a representation, they lend themselves to human inspection: visualizing the structure of the data can reveal its characteristics, and thus provide insights and ideas for automatic methods.
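To make the semi-supervised setting concrete, here is a minimal sketch of label propagation over a toy word graph: a handful of annotated seed nodes keep their labels, and every other node repeatedly takes the majority label of its neighbours until nothing changes. The graph, words and labels are invented for illustration; real systems use weighted edges and more careful propagation schemes.

```python
def propagate_labels(graph, seeds, iterations=20):
    """Spread seed labels through a graph: each unlabeled node repeatedly
    takes the majority label among its already-labeled neighbours."""
    labels = dict(seeds)  # node -> label; seed annotations are never overwritten
    for _ in range(iterations):
        changed = False
        for node, neighbours in graph.items():
            if node in seeds:
                continue
            counts = {}
            for n in neighbours:
                if n in labels:
                    counts[labels[n]] = counts.get(labels[n], 0) + 1
            if counts:
                best = max(counts, key=counts.get)
                if labels.get(node) != best:
                    labels[node] = best
                    changed = True
        if not changed:  # converged: no label moved in a full pass
            break
    return labels

# Toy undirected co-occurrence graph with two loosely separated clusters.
graph = {
    "bank":  ["money", "loan"],
    "money": ["bank", "loan"],
    "loan":  ["bank", "money"],
    "river": ["water", "shore"],
    "water": ["river", "shore"],
    "shore": ["river", "water"],
}
seeds = {"money": "FINANCE", "water": "NATURE"}
result = propagate_labels(graph, seeds)
# The two seed labels spread to cover their respective clusters.
```

With just one seed per cluster, the graph structure alone is enough to annotate every node, which is exactly the economy that makes these methods attractive.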
We find not only the standard graphs (a set of nodes and the edges that connect pairs of them) but also, to fit more complex problems and data, heterogeneous graphs (to model the network of tweeters and their tweets, or the network of articles, their authors and references), hypergraphs (whose edges may join more than two nodes, and could model grammatical rules, for example) and graphs with multi-layered edges.
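The hypergraph case can be sketched in a few lines: unlike a standard edge, a hyperedge groups an arbitrary number of nodes, so a grammar rule can be stored as a single edge over the parent category and all of its children. The class and the rules below are an illustrative assumption, not an implementation from the special issue.

```python
class Hypergraph:
    """A minimal hypergraph: edges are labeled tuples of any number of nodes."""

    def __init__(self):
        self.nodes = set()
        self.edges = []  # list of (label, tuple_of_nodes)

    def add_edge(self, *nodes, label=None):
        self.nodes.update(nodes)
        self.edges.append((label, tuple(nodes)))

    def edges_containing(self, node):
        """All hyperedges in which a given node participates."""
        return [(label, ns) for label, ns in self.edges if node in ns]

hg = Hypergraph()
# Each grammar rule becomes ONE hyperedge over parent and children,
# something a standard pairwise edge cannot express directly.
hg.add_edge("S", "NP", "VP", label="S -> NP VP")
hg.add_edge("NP", "Det", "N", label="NP -> Det N")
```

Querying `hg.edges_containing("NP")` returns both rules, since "NP" appears once as a child and once as a parent.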
In the special issue we include a survey of graph-based methods in natural language processing, to show the variety both of graph formalisms and of the tasks they can be useful for. The core of the issue consists of four articles, each of which showcases and exploits a different facet of graphs for a different task in NLP: graphs as a framework for the organization of complex knowledge; using the graph structure of knowledge repositories to compute semantic relatedness between texts; revealing and exploiting sub-structures in word co-occurrence graphs for approximating word senses and performing sense-level translations; tracking changes in word co-occurrence graphs to identify diachronic sense changes.
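As a taste of the third theme, here is a hedged sketch of one classic way sub-structures in a co-occurrence graph can approximate word senses: remove the ambiguous target word from its own neighbourhood, then treat each connected component among its former neighbours as one sense cluster. The graph below is invented; published methods add edge weights, clustering algorithms and much larger corpora.

```python
def sense_clusters(graph, target):
    """Cluster a target word's neighbours into connected components,
    with the target itself removed; each component suggests one sense."""
    neighbours = set(graph.get(target, ()))
    seen, clusters = set(), []
    for start in neighbours:
        if start in seen:
            continue
        component, frontier = set(), [start]
        while frontier:  # traversal restricted to the target's neighbourhood
            node = frontier.pop()
            if node in component:
                continue
            component.add(node)
            frontier.extend(n for n in graph.get(node, ())
                            if n in neighbours and n not in component)
        seen |= component
        clusters.append(component)
    return clusters

# Toy graph: "bank" co-occurs with both financial and riverside vocabulary,
# but those two neighbourhoods are only connected THROUGH "bank" itself.
graph = {
    "bank":  ["loan", "money", "river", "shore"],
    "loan":  ["bank", "money"],
    "money": ["bank", "loan"],
    "river": ["bank", "shore"],
    "shore": ["bank", "river"],
}
clusters = sense_clusters(graph, "bank")
# Two components emerge, one per sense of "bank".
```

Removing the target is the key step: it is precisely the hub that glues otherwise disconnected sense neighbourhoods together, so deleting it lets the graph's sub-structure fall apart into sense clusters.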