Integrating experiential and distributional data to learn semantic representations.

Abstract
The authors identify 2 major types of statistical data from which semantic representations can be learned. These are denoted as experiential data and distributional data. Experiential data are derived by way of experience with the physical world and comprise the sensory-motor data obtained through sense receptors. Distributional data, by contrast, describe the statistical distribution of words across spoken and written language. The authors claim that experiential and distributional data represent distinct data types and that each is a nontrivial source of semantic information. Their theoretical proposal is that human semantic representations are derived from an optimal statistical combination of these 2 data types. Using a Bayesian probabilistic model, they demonstrate how word meanings can be learned by treating experiential and distributional data as a single joint distribution and learning the statistical structure that underlies it. The semantic representations that are learned in this manner are measurably more realistic-as verified by comparison to a set of human-based measures of semantic representation-than those available from either data type individually or from both sources independently. This is not a result of merely using quantitatively more data, but rather it is because experiential and distributional data are qualitatively distinct, yet intercorrelated, types of data. The semantic representations that are learned are based on statistical structures that exist both within and between the experiential and distributional data types.
Funding Information
  • European Union (028714)
  • Biotechnology and Biological Sciences Research Council (31/S18)
  • Economic and Social Research Council (RES-620-28-6001)

This publication has 0 references indexed in Scilit: