This work proposes a probabilistic basis for natural language understanding models. It has become apparent that syntax and semantics need to be highly integrated, especially to understand constructs like nominal compounds, but inadequate modelling tools have hindered efforts to replace the traditional parser-interpreter pipeline architecture. Qualitatively, associative frameworks like spreading activation and marker passing produce the desired interactions, but their reliance on ad hoc numeric weights makes it difficult to scale them up to increasingly large domains. Statistical approaches, on the other hand, ground numeric measures over large domains, but have thus far failed to incorporate the structural generalizations found in traditional models. A major reason for this is the inability of most statistical language models to represent compositional constraints; this is related to the variable binding problem in neural networks.

The proposed model attacks these issues from three directions. First, it distinguishes two fundamentally different mental processing modes: automatic and controlled inference. Automatic inference is pre-attentive, subconscious, reflexive, fairly instantaneous, associative, and highly heuristic; this delimits the domain of parallel interactive processing. Automatic inference is motivated by both resource bounds and empirical criteria, and is responsible for much if not most of parsing and semantic interpretation.

Second, the nature of mental representations is defined more precisely. The proposed cognitive ontology includes mental image, lexical semantic, conceptual, and lexicosyntactic modules. Automatic inference extends over all modules. The modular ontology approach accounts for a range of subtle meaning distinctions, is consistent with psycholinguistic and neural evidence, and helps reduce the complexity of the concept space.

Third, probability theory provides an elegant basis for evidential interpretation, and is used here to model automatic inference in language understanding. A uniform representation for all the modules is proposed, compatible with both feature-structures and semantic networks. Probabilistic, associative extensions are then made to those frameworks. Theoretical and approximate maximum entropy methods for evaluating probabilities are proposed, as well as the basis for a normative distribution for learning and generalization.
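
As a minimal illustration of the maximum entropy idea mentioned above, and not the specific method developed in this work, the following sketch estimates a joint distribution over two discrete features when only their marginal probabilities are known. It uses iterative proportional fitting from a uniform start, which yields the maximum entropy joint distribution consistent with the given marginals. The function name, the feature interpretation, and all numbers are hypothetical and chosen purely for illustration.

```python
def max_entropy_joint(row_marginal, col_marginal, iterations=50):
    """Fit a joint distribution to two marginals by iterative proportional
    fitting, starting from the uniform distribution.

    Hypothetical helper for illustration only; not taken from the source.
    """
    n_rows, n_cols = len(row_marginal), len(col_marginal)
    # Start from the uniform (highest-entropy) joint distribution.
    joint = [[1.0 / (n_rows * n_cols)] * n_cols for _ in range(n_rows)]
    for _ in range(iterations):
        # Rescale each row so that it sums to its target marginal.
        for i in range(n_rows):
            row_sum = sum(joint[i])
            if row_sum > 0:
                joint[i] = [v * row_marginal[i] / row_sum for v in joint[i]]
        # Rescale each column so that it sums to its target marginal.
        for j in range(n_cols):
            col_sum = sum(joint[i][j] for i in range(n_rows))
            if col_sum > 0:
                for i in range(n_rows):
                    joint[i][j] *= col_marginal[j] / col_sum
    return joint


# Assumed example: two binary features with made-up marginal probabilities.
feature_a = [0.7, 0.3]
feature_b = [0.4, 0.6]
joint = max_entropy_joint(feature_a, feature_b)
```

With marginal constraints alone, the fitted joint treats the two features as independent: each cell equals the product of the corresponding marginals. Further constraints, such as an observed co-occurrence probability, would pull the estimate away from independence.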



