Learning Construction Grammars Computationally

Blog post by Jonathan Dunn, Ph.D.

Construction Grammar, or CxG, takes a usage-based approach to describing grammar. In practice, the term “usage-based” means two different things:

First, it means that idiomatic constructions belong in the grammar. For example, the ditransitive construction, as in “John sent Mary a letter,” has item-specific cases like “John gave Mary a hand” and “John gave Mary a hard time.” These idiomatic versions of the ditransitive have distinct meanings. While other grammatical paradigms consider these different meanings to be outside the scope of grammar, CxG argues that idiomatic constructions are an important part of grammar.

Second, CxG is usage-based because it argues that we learn grammar by observing actual idiomatic usage: language is more nurture than nature. The role of innate structure is limited to general cognitive constraints, such as limits on working memory and the ability to recognize and categorize differences. CxG views language learning as a bottom-up process in which systematicity spreads from idiomatic constructions to generalized constructions.

The problem is that the usage-based approach to grammar has struggled to live up to its own expectations. First, a very large number of idiomatic constructions could be posited to resolve any descriptive challenge. As a result, CxG has struggled to show that its grammars are falsifiable. Second, there are potentially large numbers of overlapping idiomatic constructions each with its own distinct meaning; thus, without relying on innate constraints, CxG has struggled to show that its grammars are learnable.

This paper takes a computational approach to learning CxGs in order to resolve these difficulties. Can stable, generalized grammars be learned from actual usage? Without innate structure to limit the space of possible constructions, this approach faces four challenges that make it difficult to learn the best grammar:

First, we do not know how many items or slots a construction contains, so the algorithm must be able to perform segmentation in order to find construction boundaries. Second, CxG allows multiple types of representation (lexical, semantic, syntactic), so the algorithm must be able to find the best way to describe each slot in a construction. Third, CxG allows unfilled slots, so the algorithm must be able to find constructions that do not appear to be continuous. Fourth, slots can have recursive internal structure, so the algorithm must be able to find complex fillers.
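To make the second challenge concrete, here is a minimal sketch of how a construction might be written as a sequence of slots, each constrained at the lexical, syntactic, or semantic level. It is an illustration only, not the algorithm from the article: the names `Token`, `Slot`, and `find_matches`, the tag labels, and the hand-written constructions are all assumptions made for this example. It matches contiguous, single-token slots and does not learn anything, so it leaves aside the unfilled slots and recursive fillers described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    word: str    # lexical form
    pos: str     # syntactic category (toy tag set, assumed)
    domain: str  # semantic category (toy labels, assumed)


@dataclass(frozen=True)
class Slot:
    kind: str   # "lex", "syn", or "sem"
    value: str  # the word, tag, or semantic label the slot requires

    def accepts(self, token: Token) -> bool:
        """Check the token at the level of representation this slot uses."""
        if self.kind == "lex":
            return token.word.lower() == self.value
        if self.kind == "syn":
            return token.pos == self.value
        if self.kind == "sem":
            return token.domain == self.value
        raise ValueError(f"unknown slot kind: {self.kind}")


def find_matches(construction: List[Slot], sentence: List[Token]) -> List[Tuple[int, int]]:
    """Return (start, end) spans where every slot accepts the aligned token."""
    n = len(construction)
    spans = []
    for start in range(len(sentence) - n + 1):
        window = sentence[start:start + n]
        if all(slot.accepts(tok) for slot, tok in zip(construction, window)):
            spans.append((start, start + n))
    return spans


# A generalized ditransitive, described with syntactic slots only.
GENERAL = [Slot("syn", "NOUN"), Slot("syn", "VERB"), Slot("syn", "NOUN"),
           Slot("syn", "DET"), Slot("syn", "NOUN")]

# An item-specific version that mixes semantic and lexical slots.
IDIOM = [Slot("sem", "person"), Slot("lex", "gave"), Slot("sem", "person"),
         Slot("lex", "a"), Slot("lex", "hand")]

LETTER = [Token("John", "NOUN", "person"), Token("sent", "VERB", "transfer"),
          Token("Mary", "NOUN", "person"), Token("a", "DET", "function"),
          Token("letter", "NOUN", "object")]

HAND = [Token("John", "NOUN", "person"), Token("gave", "VERB", "transfer"),
        Token("Mary", "NOUN", "person"), Token("a", "DET", "function"),
        Token("hand", "NOUN", "body-part")]

print(find_matches(GENERAL, LETTER), find_matches(IDIOM, LETTER))
print(find_matches(GENERAL, HAND), find_matches(IDIOM, HAND))
```

In this toy setup, “John gave Mary a hand” is covered by both the generalized and the item-specific construction, while “John sent Mary a letter” is covered only by the generalized one. A learner has to choose among such overlapping descriptions without being told in advance which level of representation suits each slot.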

The difficulty is that these challenges must be solved with as few language-specific assumptions as possible in order to qualify as usage-based in the senses described above. This paper shows that a learnable and falsifiable usage-based CxG is possible, the first step in reconciling the claims and the actuality of the Construction Grammar paradigm.

Jonathan Dunn, Ph.D., is Research Assistant Professor of Computer Science at Illinois Institute of Technology. His recent article, “Computational learning of construction grammars,” can be accessed without charge until March 15th. Explore all of Language and Cognition by clicking here.
