Multiword Expressions in Syntactic and Semantic Parsing Environments (MWE-SemPrE) is a DFG-funded research project led by Laura Kallmeyer that is being conducted at the Heinrich-Heine-Universität Düsseldorf from 07/2022 to 07/2025.
Project Description
Multiword expressions (MWEs) are combinations of multiple lexemes whose overall properties are not readily predictable from those of their components. A well-known example is the expression "kick the bucket", whose overall meaning (‘die’) cannot be derived by combining the meanings of its constituents. This idiosyncrasy, which is usually termed idiomaticity in the MWE context, makes MWEs a challenge for natural language processing (NLP), and their ubiquity forces us to find ways to account for them. Verbal MWEs (VMWEs) are particularly challenging because of properties like discontinuity, overlap, varying word order, and syntactic or semantic ambiguity. Recently, a considerable amount of NLP research has been done on identifying MWEs and on combining MWE identification with syntactic parsing (see, for example, the European PARSEME project). Identifying MWEs, together with representing their idiomatic meanings, is a prerequisite for successful semantic parsing of MWEs. Most current semantic parsers, however, either do not cover non-literal readings of MWEs at all or cover them only in a very limited way.
MWE-SemPrE addresses this problem and aims at improving MWE identification approaches, integrating MWE identification with syntactic parsing, and extending the approach to an MWE-aware semantic parser. The focus will be mainly on VMWEs and on architectures based on Tree Adjoining Grammar (TAG), Combinatory Categorial Grammar (CCG) and Discourse Representation Theory (DRT), in combination with an MWE-specific extension of the Parallel Meaning Bank (PMB). A central idea is to use similar methods for syntactic and semantic parsing, based on some notion of an extended domain of locality and on supertagging. We hypothesize that a separate supertagging step, which provides syntactic or semantic predicate-argument templates, yields useful units for the correct treatment of MWEs.