Constructions underlying theory of mind and language
Peter Ford F. Dominey


Moderators: Peter Ford F. Dominey, Anne Reboul, Gloria Origgi
 

1. Introduction

Both language and theory of mind (ToM) are uniquely human, both involve manipulation of complex embedded structures, and both are subject to debate concerning the articulation of genetic and developmental processes. Likewise, there appears to be a crucial co-evolutionary relation between them that is at the crux of this conference. Does this relation extend to the sharing of a common underlying processing mechanism? The objective of this paper is to argue that the notion of "construction" as a mapping between representations can be extended from language and grammatical constructions to the domain of social behavior and theory of mind. In this context, we can consider that language is about mapping sentences to meaning, and theory of mind is about mapping social/behavioral contexts onto others' behavior (in terms of their underlying mental states).

From the language perspective, the construction grammar framework has been an important component of the functionalist approach to linguistics (see Goldberg 1995, Croft 2001, Tomasello 2003). The essential claim in this framework is that language is learned as a structured inventory of form to meaning mappings. In making this claim the framework exploits the inherent richness of the structure of meaning in form to meaning relations, and thus reduces the requirements on an innate universal grammar. The question posed here is to what extent the benefits of this construction model can be applied to ToM. That is, can some or all of the ToM capability be provided by a learning mechanism that acquires the mappings between social/behavioral contexts and their outcomes in a progressively abstract and compositional manner?

Corcoran and Frith (2003) present results supporting the theory that at least part of a ToM capability might rely on analogical mapping between the current situation and autobiographical memory of similar situations in order to draw inferences about the mental states of others. This suggests the notion of ToM constructions, that is, mappings between behavioral scenario structure and the corresponding social/intentional outcome providing the point of departure for a construction based analysis of ToM, analogous to that in language. The rest of the paper will sketch out a brief overview of the construction grammar framework for language, and then a view of how the underlying structure mapping mechanism can be applied to provide a construction based theory of mind, ConTom.

2. Brief Review of Language in the Construction Grammar Framework

As mentioned above, a significant scholarly effort has gone into the elaboration of the construction grammar framework for language, which can be found in references including but clearly not limited to Goldberg (1995, 2003), Croft (2001), and Tomasello (2003). One of the most attractive elements of this framework is that the poverty of the stimulus argument, which has been so central in evoking the need for a highly pre-specified Universal Grammar (UG), is significantly weakened by reconsidering the powerful learning and attention sharing mechanisms and the richness of the stimulus (Goldberg 2003, Tomasello 2003, Pullum & Scholz 2002). In this context, the current analysis will attempt to identify perceptual primitives required to extract meaning from the environment, and structure mapping mechanisms that construct the mappings between language and this meaning. The goal will be to accommodate (sentence, meaning) pairs that embody some of the interesting aspects of language including its embedded recursive structure, as characterized in the relativised sentence “The boy kicked the ball that broke the window”.

In the technical sense, both words and larger phrasal patterns are constructions, as both pair form with meaning. The mapping from words to meaning retains a rather classical lexical mapping, and phrasal constructions can also have this holistic nature as in idiomatic phrases such as “Gimme that”, or “Don't beat around the bush”. Phrasal constructions can also be abstract and generative, as in the English transitive construction in which the arguments AGENT, ACTION and OBJECT can be instantiated by an open set of nouns and verbs to generate diverse sentences such as “John kicked the ball” and “The cat chased the dog”. The transition from these abstract argument constructions to generative, recursive compositionality results from the statistical pattern finding and extraction of phrasal and clausal structures such as noun phrases that can then occupy argument positions in existing constructions, thus yielding a flexible compositional capability. This type of capability has been demonstrated in a hybrid neural network architecture by Miikkulainen (1996).

As stated by Goldberg (2003) a characteristic aspect of the construction grammar framework is that it is a “what you see is what you get” approach to syntax, in which there are no underlying levels of syntax, nor phonologically empty elements posited. In contrast to formally oriented generative frameworks, this view of language places a large emphasis on general structural mapping mechanisms, and the importance of the structure of meaning. The next paragraphs outline a “mechanistic” view of how this could work.

2.1 Perceptual Primitives

Events, Agents and Objects: In order to learn sentence to meaning mappings, the child/system must have a capability to extract meaning from perception. This includes the ability to discriminate between distinct objects and to perceive and represent events in terms of their agents and objects in a predicate-argument format. Given these capabilities, the infant can represent the event described by the sentence “The boy kicked the ball” in some kind of predicate-argument format “kick(boy, ball)”. This corresponds to the first level of Leslie’s (1994) theory of agency, the Theory of Body mechanism (ToBy) that characterizes the “mechanical” aspects of agents and objects and their interactions, as characterized for example by physical collision events. From this perspective, already at 6 months of age, children are capable of processing causal events with agents, objects and actions and using these "naive physics" representations to understand simple action scenarios that involve goal-directed reaching for objects (e.g. Woodward 1998). Similarly, infants in this same age range display rather sophisticated knowledge of the physical properties of objects that allows them to "parse" and understand dynamic scenes with multiple objects (Carey and Xu 2001).

Demonstrating this functionality, computer vision systems now exist that can extract force dynamic information including physical contact and relative velocity of objects in order to parse events and their thematic arguments from visual scenes (Siskind 2001, Dominey 2003a).

Referential Ambiguity: While it seems apparent that infants can extract and construct meaning from the perceptual world (Mandler 1999), there will potentially be massive meaning to choose from in that perceptual world, and so in a language learning context, the infant must somehow zero-in on what the speaker is talking about. Joint attention refers to this capability for the infant to follow (and later direct) the gaze of the speaker in order to establish a shared frame of reference around a third object (reviewed in Tomasello 1999). This capability begins to emerge well before the first birthday, and allows the formation of a triadic relation between the speaker, the infant, and the object of shared attention. It should be made clear that this joint attention mechanism makes a substantial and crucial reduction in referential ambiguity that will be required for language acquisition. It is important to distinguish this joint attention from the shared attention mechanism (SAM) of Baron-Cohen (1995) that generates embedded relations such as “John sees(I see the girl)”, and instead to associate it with the eye direction detection (EDD) capability proposed by Baron-Cohen. It is also worth mentioning that even in blind children who learn language, joint attention still plays a crucial role, though not – of course – in the visual modality.

2.2 Structure mapping

Given these meaning extraction and attentional capabilities, the goal of the structure mapping system is to learn to associate different grammatical forms (e.g. active, passive, relative) with their associated meaning structures. Concretely, such a system is provided with (sentence; meaning) pairs, such as (John hit the ball; hit(John, ball)), (The ball was hit by John; hit(John, ball)), (The ball that John hit broke the window; hit(John, ball), broke(ball, window)). Based on corpus-scale exposure to such examples, the system should extract the underlying structural mapping from grammatical form to meaning. In order to achieve this, the system should exploit the cross-linguistic regularity (Bates et al. 1982) that cues including case and grammatical markings (fixed or free) and word order uniquely identify each grammatical form and thus allow each construction type to be uniquely associated with its corresponding form to meaning mapping. We have demonstrated that such a mechanism can accommodate a variety of grammatical constructions in English, including embedded relative clauses, that it can systematically generalize to new sentences (Dominey 2003b), and that it extends without modification to accommodate Japanese, despite the significant typological differences between these languages (Dominey & Inui submitted). This limited overview is not at all meant to be complete, but rather to outline the construction grammar (CG) framework for potential application to ToM.
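The form to meaning mapping described above can be illustrated with a small sketch. It is not the model reported in Dominey (2003b); it only assumes that a hypothetical set of closed-class words (`FUNCTION_WORDS`), together with word order, identifies a construction, and that each construction records which open-class word fills which slot of the meaning. All function names are illustrative.

```python
# Hypothetical closed-class vocabulary (an assumption made for this sketch).
FUNCTION_WORDS = {"the", "was", "by", "that"}


def form_key(sentence):
    """Keep function words and word order; replace open-class words by '_'."""
    words = sentence.lower().split()
    return tuple(w if w in FUNCTION_WORDS else "_" for w in words)


def open_class(sentence):
    """Return the open-class (content) words of the sentence, in order."""
    return [w for w in sentence.lower().split() if w not in FUNCTION_WORDS]


def learn(inventory, sentence, meaning):
    """Store a construction from one (sentence; meaning) pair.

    meaning is a list of (predicate, agent, object) tuples of lower-case words.
    Each word is replaced by its position among the open-class words.
    """
    content = open_class(sentence)
    mapping = [tuple(content.index(arg) for arg in event) for event in meaning]
    inventory[form_key(sentence)] = mapping


def interpret(inventory, sentence):
    """Apply the stored construction whose form matches the new sentence."""
    mapping = inventory[form_key(sentence)]
    content = open_class(sentence)
    return [tuple(content[i] for i in event) for event in mapping]


# Training pairs taken from the examples in the text.
inventory = {}
learn(inventory, "John hit the ball", [("hit", "john", "ball")])
learn(inventory, "The ball was hit by John", [("hit", "john", "ball")])
learn(inventory, "The ball that John hit broke the window",
      [("hit", "john", "ball"), ("broke", "ball", "window")])

# Generalization to sentences never seen in training.
print(interpret(inventory, "The cat was chased by Mary"))
# → [('chased', 'mary', 'cat')]
print(interpret(inventory, "The dog that Mary saw bit the man"))
# → [('saw', 'mary', 'dog'), ('bit', 'dog', 'man')]
```

Because each construction stores positions rather than words, any sentence with the same pattern of function words is interpreted by the same mapping, which is one simple reading of "systematic generalization to new sentences".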

3. A Construction Based Approach to Theory of Mind (ConTom)

Thus, given the above analysis of language in the construction context, we can now apply an analogous analysis to the domain of ToM. Again, from a functional perspective we will consider that ToM refers to the capacity to interpret, predict and explain the behavior of others (in terms of their underlying mental states). The inputs to the ToM system are behavioral situations, the processing is the recognition and analysis of these situations, and the output is the prediction/explanation of the future behavior. Among the desired behaviors will be to interpret the goals and behavior of others, as revealed by imitation, and to exploit embedded propositional attitudes, as revealed by interpreting others’ actions based on their false beliefs. As suggested by Corcoran and Frith (2003), ToM capacities appear to rely at least in part on reference to autobiographical memory that allows inferences to be made about the mental states of others. The next paragraphs provide an outline of how this could work in the construction framework.

3.1 Perceptual Primitives

In addition to the joint attention and event processing capabilities required for language processing as described above, it will be seen below that the child should also be equipped to detect “satisfaction” or “happiness” when an observed agent has successfully completed a goal. This will be important for post-hoc linkage of preconditions of actions with their goals.

3.2 Structure mapping

Here we will consider a capability that allows mapping of initial states or situations to behavior outcome states. Again, as suggested by Corcoran & Frith (2003) these mappings can be stored in an “autobiographical” memory that we can refer to as a “social construction inventory”. Then, new situations can be interpreted based on (analogical) reference to constructions in this inventory. These constructions will be of the form (initial observed behavioral state; outcome behavioral state). When confronted with a new situation, by finding the closest match to this new situation in the “initial state” component of the social construction inventory, the child can then use the associated outcome in order to infer or predict the outcome for the new situation. Pattern finding processes will operate on the contents of the social construction inventory in order to generate progressively more abstract and generalized ToM constructions in the same manner as in the grammatical construction framework. Interestingly, in a population of schizophrenic patients, Corcoran and Frith (2003) observed correlated impairments in autobiographical memory, and in theory of mind task performance. We now consider examples of how a construction based theory of mind system might operate.
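The lookup step of the social construction inventory described above can be sketched as follows. This is only an illustration under stated assumptions: situations are represented as sets of predicate-argument facts, and "closest match" is measured by Jaccard overlap, a similarity measure chosen here for simplicity and not taken from the source. The class and function names are hypothetical.

```python
def similarity(a, b):
    """Jaccard overlap between two sets of observed facts (an assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class SocialConstructionInventory:
    """Stores (initial observed behavioral state; outcome behavioral state) pairs."""

    def __init__(self):
        self.constructions = []

    def store(self, initial_state, outcome):
        self.constructions.append((frozenset(initial_state), outcome))

    def predict(self, situation):
        """Return the outcome of the stored construction closest to the situation."""
        if not self.constructions:
            return None
        best = max(self.constructions, key=lambda c: similarity(c[0], situation))
        return best[1]


memory = SocialConstructionInventory()
memory.store({("look-at", "mother", "bottle"), ("reach-for", "mother", "bottle")},
             ("pick-up", "mother", "bottle"))
memory.store({("has", "bill", "apple"), ("approach", "john", "bill")},
             ("give", "bill", "john", "apple"))

# A new situation that only partially overlaps a stored initial state.
new_situation = {("has", "bill", "apple"), ("approach", "john", "bill"),
                 ("look-at", "john", "apple")}
print(memory.predict(new_situation))
# → ('give', 'bill', 'john', 'apple')
```

A nearest-match lookup of this kind only retrieves stored outcomes; the progressive abstraction mentioned in the text would require an additional pattern finding step over the inventory.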

Goal attribution: Here we consider a behavioral scenario in which the child observes his Mother look at, reach for and pick up a bottle. Repeated exposure to this kind of event sequence will allow the child to learn to predict that Mother will subsequently take the bottle that she is currently looking at and reaching for. Technically, this corresponds to the mapping ((Look-at(Mother, bottle), Reach-for(Mother, bottle)); Pick-up(Mother, bottle)). As suggested by Leslie (1994), the outcome of the action can be entered into the construction representation as the goal or outcome state. Initially this may correspond to a “holo-construction” strictly linked to the Mother and bottle. As in grammatical constructions, with exposure and pattern finding, this holo-construction can be extended first to include Mother and other objects, and then to other agents, and actions as well. Functionally, this capability should allow the child to imitate an uncompleted action in which the adult tries and fails to achieve a well known action, thus indicating the child’s ability to infer the intended goal (Bellagamba & Tomasello 1999).
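The step from a holo-construction to a more abstract construction can be sketched as follows. The sketch assumes one simple form of pattern finding: two constructions with the same shape are compared, and every position where they differ is replaced by a shared variable. The names `generalize` and `predict`, and the `?X0` variable notation, are hypothetical and not drawn from the source.

```python
def generalize(c1, c2):
    """Replace the constants that differ between two constructions by variables."""
    (init1, out1), (init2, out2) = c1, c2
    if len(init1) != len(init2):
        raise ValueError("constructions must have the same number of facts")
    variables = {}  # maps a pair of differing constants to one variable name

    def merge(f1, f2):
        if len(f1) != len(f2):
            raise ValueError("facts must have the same number of arguments")
        merged = []
        for a, b in zip(f1, f2):
            if a == b:
                merged.append(a)
            else:
                merged.append(variables.setdefault((a, b), f"?X{len(variables)}"))
        return tuple(merged)

    return [merge(f1, f2) for f1, f2 in zip(init1, init2)], merge(out1, out2)


def predict(construction, observed):
    """Bind the variables of the initial state to the observed facts and
    return the instantiated outcome, or None if the construction does not apply."""
    init, outcome = construction
    if len(init) != len(observed):
        return None
    bindings = {}
    for pattern, fact in zip(init, observed):
        if len(pattern) != len(fact):
            return None
        for p, f in zip(pattern, fact):
            if p.startswith("?"):
                if bindings.setdefault(p, f) != f:
                    return None
            elif p != f:
                return None
    return tuple(bindings.get(term, term) for term in outcome)


def scene(agent, obj):
    """Build the look-at / reach-for / pick-up construction for one agent and object."""
    return ([("look-at", agent, obj), ("reach-for", agent, obj)],
            ("pick-up", agent, obj))


holo = scene("mother", "bottle")                     # holo-construction
step1 = generalize(holo, scene("mother", "cup"))     # Mother and any object
step2 = generalize(step1, scene("father", "ball"))   # any agent and any object

observed = [("look-at", "baby", "spoon"), ("reach-for", "baby", "spoon")]
print(predict(step1, observed))  # → None (still tied to Mother)
print(predict(step2, observed))  # → ('pick-up', 'baby', 'spoon')
```

The same comparison would also turn a differing action name into a variable, which corresponds to the final extension "to other agents, and actions as well" mentioned in the text.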

Attitude attribution: How can this attribution of goals be extended to behavior that reflects attribution of attitudes, allowing the child to have the meta-representation that “John wants Bill to give him an apple”? Let us work through an example: Suppose that the child observes a scene in which Bill takes an apple out of a sack, and John approaches, looks at the apple then at Bill. Bill then gives the apple to John, who takes it and smiles. If the infant is sensitive to the meaning of the smile, then she will conclude that John is happy to have the apple, and that a goal has been satisfied. In the construction framework, the initial state is characterized by Bill’s possession of the apple, and John’s approach behavior. The state transition is the giving of the apple from Bill to John, and the outcome is John’s possession of the apple, and his satisfaction. In a repetitive learning situation, the child will learn the mapping from the initial state to the outcome state. The backward linking of John’s final satisfaction to his approach behavior in the initial state allows the child/system to define a perceptual correlate of “wanting” in terms of goal satisfaction. Again, the resulting ability to determine that John wants Bill to give him the apple will start as a fixed holo-construction that can subsequently generalize to different objects and participants, based on the exposure of the child to training examples, resulting in a generalized approach-based goal-attribution construction.

Extension to false beliefs: Imagine now that one day, the child sees that Bill has no apple in his sack. As usual, John approaches Bill in the same manner, “wanting” Bill to give him the apple that is (not) in his sack. Though John approaches Bill in his standard manner, Bill has no apple to give, and John is left dissatisfied. John approaches Bill and expects an apple based on his false belief that there is an apple in the sack. This learning example for a false belief situation provides data for the autobiographical social construction inventory that John acts on his false belief that Bill has an apple, despite the fact that the child knows that there is no apple. Again, through the operation of statistically based pattern finding, this will result in the progressive development of a generalized false belief capability.

4. Discussion and Conclusions

From the perspective of co-evolution, one result of this exercise is the suggestion that the meta-representational aspect of theory of mind was not a precursor for language with recursive embedded structure (see Reboul, this conference, and discussion). Leslie likewise notes that structural linguistic knowledge and language processing mechanisms are essentially independent of ToMM, though ToMM’s development may impact on communicative language use (Leslie 1994). What is required, however, is a joint attention capability, visual or otherwise, that allows teacher and learner to talk about a common referential object.

With respect to the proposed construction based theory of mind (ConTom), we can observe that this is comparable to the teleological reasoning in infancy as described by Gergely and Csibra (2003). This is a non-mentalistic system that allows inference about others’ goal directed actions based on perceptual aspects of reality without attributing intentional mental states to the actor’s mind. The question then is whether this type of teleological system can extend naturally to a mentalistic system that allows meta-representations including propositional attitudes about propositional attitudes. There are two components to the response.

First, purely from the perspective of structural mapping, given a representation of the form (believes(John, wants(Mary, Apple))), can the ConTom system accommodate such a representation as a component of the outcome of a ToM construction? I believe that the response is an obvious yes: the construction framework is based on a generic analogical mapping capability that maps well formed inputs onto their corresponding outputs, whether these (input; output) pairs are (sentence; meaning) pairs in language, or (social context; behavioral/mental outcome) pairs in the ToM domain. As long as the outcome is well formed and reproducible, it can be paired with its initial state in the social construction inventory.
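The structural point can be made concrete with a small sketch, assuming that representations are encoded as nested tuples of the form (predicate, argument, ...). Under that assumed encoding, an embedded representation such as believes(John, wants(Mary, Apple)) is stored as an outcome in exactly the same way as a flat one. The helper names are hypothetical.

```python
def is_well_formed(term):
    """An atom is a non-empty string; a proposition is a tuple whose head is a
    predicate name followed by at least one well formed argument."""
    if isinstance(term, str):
        return bool(term)
    return (isinstance(term, tuple)
            and len(term) >= 2
            and isinstance(term[0], str)
            and all(is_well_formed(arg) for arg in term[1:]))


def embedding_depth(term):
    """Number of nested predicate levels: 0 for an atom, 1 for a flat proposition."""
    if isinstance(term, str):
        return 0
    return 1 + max(embedding_depth(arg) for arg in term[1:])


flat = ("pick-up", "mother", "bottle")
embedded = ("believes", "john", ("wants", "mary", "apple"))

# The inventory pairs an initial state with any well formed outcome,
# regardless of how deeply the outcome is embedded.
inventory = {}
for initial_state, outcome in [
    (frozenset({("look-at", "mother", "bottle")}), flat),
    (frozenset({("look-at", "john", "mary"), ("approach", "mary", "apple")}), embedded),
]:
    if is_well_formed(outcome):
        inventory[initial_state] = outcome

print(embedding_depth(flat))      # → 1
print(embedding_depth(embedded))  # → 2
print(len(inventory))             # → 2
```

The sketch shows only that storage and retrieval are indifferent to embedding; it says nothing about how such embedded outcomes would be generated in the first place, which is the second question taken up in the text.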

Second, from the perspective of generating meta-representations like (believes(John, wants(Mary, Apple))), can the ability to generate such representations be developed directly from the teleological system, or does it require an additional mind reading module corresponding to something like Leslie’s ToMM2? In a certain sense we approach issues related to Searle’s (1980) Chinese room, in that this construction based approach can likely be extended to yield behavior indicative of an m-representation theory of mind (as indicated in the false belief example above), with the open issue of whether “understanding” has really been captured in computational terms (Horst 2003). While the response is beyond the scope of this paper, we can speculate that in the developing child, if self beliefs (e.g. “I thought it was raining”) become accessible as elements contributing to social constructions, then they can provide the grounding for understanding such representations and for generalizing their application to others.

In conclusion, the stated goal of this exercise was to demonstrate that a structure mapping capability that has been described for grammatical constructions in language can generalize to explain aspects of human theory of mind. In agreement with the theory and results of Corcoran and Frith (2003) on the potential role of autobiographical memory in theory of mind, I have worked through a demonstration sketch that supports the stated goal. It is of significant interest that this further elevates the status of a generalized analogical structure mapping capability in the hierarchy of cognitive functions by demonstrating the effectiveness of this analogical mapping as a basis for language and theory of mind construction frameworks.

References:

Baron-Cohen S. (1995) Mindblindness, MIT Press.

Bates E, McNew S, MacWhinney B, Devescovi A, Smith S (1982) “Functional constraints on sentence processing: A cross-linguistic study” Cognition, 11, 245-299.

Bellagamba F, Tomasello M (1999) “Re-enacting intended acts: comparing 12- and 18-Month-Olds”, Infant Behavior and Development 22(2) 277-282.

Carey S, Xu F (2001) “Infants’ knowledge of objects: beyond object files and object tracking” Cognition, 80, 179-213.

Corcoran R, Frith CD (2003) "Autobiographical memory and theory of mind: evidence of a relationship in schizophrenia" Psychological Medicine, 33 897-905

Croft W (2001) Radical construction grammar: syntactic theory in typological perspective, Oxford: Oxford University Press.

Dominey P.F. (2003) “Learning Grammatical Constructions from Narrated Video Events for Human-Robot Interaction”, IEEE Conf. On Humanoid Robots, Karlsruhe Germany.

Dominey, P.F. (2003a) “Learning Grammatical Constructions in a Miniature Language from Narrated Video Events”, Proceedings of the 25th Annual Meeting of the Cognitive Science Society, Boston.

Dominey P.F., Inui T (2004) “Miniature Language Learning via Mapping of Grammatical Structure to Visual Scene Structure in English and Japanese”, submitted.

Frith C.D., Frith U. (1999) “Interacting minds - a biological basis” Science 286: 1692-1695.

Gergely G, Csibra G (2003) “Teleological reasoning in infancy: the naïve theory of rational action”, Trends in Cognitive Sciences, 7(3) 287-292.

Goldberg, A. (1995) Constructions, University of Chicago Press, Chicago and London.

Goldberg, A. (2003) “Constructions: a new theoretical approach to language”, Trends in Cognitive Sciences, 7(5) 219-224.

Horst, S. (2003) "The Computational Theory of Mind", The Stanford Encyclopedia of Philosophy (Fall 2003 Edition), Edward N. Zalta (ed.), URL = <http://plato.stanford.edu/archives/fall2003/entries/computational-mind/>.

Kotovsky L., Baillargeon R. (1998) “The development of calibration-based reasoning about collision events in young infants”, Cognition, 67, 311-351.

Leslie A.M. (1994) “ToMM, ToBy, and Agency: Core architecture and domain specificity”, in Mapping the mind: domain specificity in cognition and culture (LA Hirschfeld & SA Gelman, Eds.)

Mandler J. (1999) “Preverbal representation and language”, in Bloom et al. (eds) Language and Space, 365-384.

Miikkulainen R. (1996) “Subsymbolic case-role analysis of sentences with embedded clauses”. Cognitive Science, 20: 47-73.

Pullum G.K., Scholz B.C. (2002) “Empirical assessment of stimulus poverty arguments” The Linguistic Review 19, 9-50.

Reboul A (2004) “Evolution of Language from Theory of Mind or Coevolution of Language and Theory of Mind?” at: http://www.interdisciplines.org/coevolution .

Searle, John. (1980) "Minds, Brains and Programs" Behavioral and Brain Sciences 3:417-424.

Siskind JM (2001) “Grounding the Lexical Semantics of Verbs in Visual Perception Using Force Dynamics and Event Logic”, Journal of Artificial Intelligence Research, volume 15, pp. 31-90.

Tomasello, M. (2003) Constructing a language: A usage-based theory of language acquisition. Harvard University Press, Cambridge.

Tomasello, M. (1999) The cultural origins of human cognition Harvard University Press, Cambridge.

Woodward A.L. (1998) “Infants selectively encode the goal object of an actor's reach” Cognition 69 1-34.

© 2008 interdisciplines.