This note addresses the development of a generalized “if P then Q” schema based on accumulated real-world experience with the perception of causal events, in a constructionist context.
Background
The perhaps non-controversial context in which this work is set holds that a number of high-level social communicative and cognitive functions are based on a generalized mechanism for mapping structure in one domain onto structure in another domain (Dominey, in press). The initial domain of this method of inquiry has been language, as developed by the “construction grammar” (CxG) school (e.g. Fillmore 1988, Goldberg 1995). In CxG, grammatical constructions are mappings from sentence form to meaning, and language is considered to be made up of a structured inventory of these mappings. We have recently implemented a neurophysiologically grounded model of sentence processing that learns to perform these mappings (Dominey et al. 2003, Dominey & Boucher 2005). To summarize with an example, a ditransitive sentence form such as “X was Y by Z to Q” maps onto a predicate-argument representation of a ditransitive event of the form Y(Z, X, Q), where Y, Z, X and Q in the sentence correspond to a ditransitive verb, an agent noun, an object noun and a recipient noun, respectively, and Y, Z, X and Q in the predicate correspond to their respective lexical semantic meanings. For a variety of construction types, we demonstrated that a generalized structure mapping system could map sentences to meanings and thereby develop abstract representations of the form to meaning mappings, i.e. grammatical constructions (Dominey 2002; Dominey et al. 2003). To validate the psychological reality of this approach, using a human-robot interaction set-up, we asked naive human subjects to perform transitive actions with simple objects in the field of view of a video camera and, at the same time, to narrate these events. This provided (sentence, event) pairs from which the system could learn the meaning descriptions and their grammatical forms. Predicate-argument meanings were extracted from the video sequence based on a decomposition of the events into discrete sequences of physical contacts between the different entities.
Indeed, this provided concrete evidence that event descriptions, including the assignment of agent, object and recipient roles, could proceed in a fully mechanistic manner based on the extraction of low-level perceptual primitives. We thus demonstrated in a grounded robotic system that, by extracting these predicate-argument meaning representations from actual video scenes of causal transitive actions and pairing them with the corresponding human narration of these events, the system could learn a miniature event language and the associated set of grammatical constructions (Dominey & Boucher 2005).
Objective
The objective of the current exercise is to argue that this kind of form-meaning mapping capability can apply in the domain of conditional sentences and causally related events. Specifically, we consider the mapping between an “if X then Y” form and a corresponding meaning that can be extracted from the physical world. The corollary objective is to argue that the resulting “if-then” construction can then be applied to arbitrary relations that are not physically causal.
Method
The question immediately arises: what is the nature of this “causal” meaning, and where does it come from? Up to this point we have considered event predicates like give(John, ball, Mary), corresponding to “John gave the ball to Mary”. In order to capture the relations required for the current purposes, we must now take into consideration the states and state changes that occur as the result of events and actions. Before and after an event such as give, the world is in two distinct states, and the event can be considered as a transition between these states. This corresponds to a triplet (enabling-state, action, resulting-state). From a “good old-fashioned AI” perspective, this is an alternative manner of characterizing a production rule, in which the left hand side is the enabling state and the right hand side is the resulting state change after the action of the rule is applied. This triplet can be broken into two pairs, (enabling-state, action) and (action, resulting-state). In our example, these two pairs correspond to (has(John, ball), give(John, ball, Mary)) and (give(John, ball, Mary), has(Mary, ball)). Anticipating the mapping of language onto these meaning pairs yields the “enabling if-then”, “If John has the ball then he can give it to Mary”, and the “resulting if-then”, “If John gives the ball to Mary then she will have it”.
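The decomposition of the triplet into two pairs can be sketched in a few lines. This is only an illustration under stated assumptions: predicates are encoded as plain tuples, and the names `Transition`, `enabling_pair` and `resulting_pair` are hypothetical, not part of the system described in Dominey & Boucher (2005).

```python
from typing import NamedTuple, Tuple

# Assumption: a predicate is a tuple (name, arg1, arg2, ...).
Predicate = Tuple[str, ...]


class Transition(NamedTuple):
    """An event viewed as a transition between two world states."""
    enabling_state: Predicate
    action: Predicate
    resulting_state: Predicate

    def enabling_pair(self) -> Tuple[Predicate, Predicate]:
        # Meaning of the "enabling if-then": the state that allows the action.
        return (self.enabling_state, self.action)

    def resulting_pair(self) -> Tuple[Predicate, Predicate]:
        # Meaning of the "resulting if-then": the state the action produces.
        return (self.action, self.resulting_state)


# The give example from the text.
give = Transition(
    enabling_state=("has", "John", "ball"),
    action=("give", "John", "ball", "Mary"),
    resulting_state=("has", "Mary", "ball"),
)
```

Here `give.enabling_pair()` corresponds to “If John has the ball then he can give it to Mary”, and `give.resulting_pair()` to “If John gives the ball to Mary then she will have it”.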
In a scenario in which an infant (or a robot) observes numerous transitive and ditransitive physical events, it will be exposed not only to these events, but to the enabling initial conditions/states and resulting final conditions/states as well. We can thus propose that through the kind of mechanistic perception of events, a structured set of ordered pairs of the form (enabling-state, action) and (action, resulting-state) will develop. When paired with sentences such as “If you push that then it will fall” or “if you push that, it will fall” the structure-mapping mechanism will begin to create “If X then Y” conditional constructions.
Appeal to Behavioral Development
The concept of form to meaning mapping has a long history in CxG, and we have recently demonstrated how such mappings can be performed (Dominey 2002, Dominey et al. 2003, Dominey & Boucher 2005). The question is whether this approach can extend to conditional constructions, and in particular to the “if then” construction that involves mapping onto an (action, resulting-state) representation. For our purposes it would be useful to be able to assume that these (action, resulting-state) representations have some psychological reality. One likely avenue would be the link between actions and goals in goal directed behavior. Developmental research indicates that agentive experience contributes to the creation of goal directed action representations in infancy, as during the first year of life the production of goal directed actions becomes progressively more frequent, precise and refined (Sommerville et al. 2005). This implies the development of relations linking actions to goals, such as in the (action, goal) pair (take(me, ball), have(me, ball)). Another avenue for building up (action, resulting-state) representations would be from more basic physical regularities, corresponding to observations that if one pushes an object it moves, if one drops an object it falls, etc. Numerous studies of infants’ perception of physical events (e.g. Spelke 1991, Baillargeon 1994) demonstrate that early in the first year of life infants develop representations including those that would correspond to “if you push the block past the edge of the table, it will fall off”, as revealed by infants’ predictive anticipation of such consequent events.
It is not controversial, then, to assume that during the first year of life infants develop representations of the form (physical event, resulting state), and use them to interpret the world around them in a form of predictive relation.
Linking Causal Event Representations to Conditional Constructions
The proposition of a construction-based approach to linking “if-then” forms with predictive event structure is rather obvious and not new. Indeed, Dancygier (1998) has taken an approach in which conditionals are considered in the framework of constructions (as defined by Fillmore 1988, Goldberg 1995 and others) that pair form with meaning. Dancygier notes that the analysis of conditionals has a long and varied history, with logical, truth conditional analyses on one side and analyses based on the form of the conditional on the other, with the result that many of the accounts may not even share a common view of what a conditional is. Dancygier thus proposes that a construction based analysis that pairs form with interpreted meaning can help. She further notes that, in a move to clarify the situation, one can distinguish between content, epistemic and speech act domains for the interpretation of conditionals. In the content domain the clauses are linked causally, while in the epistemic and speech act domains they are linked by more abstract relations. Indeed, in the epistemic and speech act domains the presence of “if” indicates an instruction for the hearer to treat the assumption in its scope as not being asserted in the usual way. Our position will be that by grounding the development of if-then constructions in the content domain, the system provides itself with a basis for extrapolating to the epistemic and speech act domains. The current exercise, however, will be limited to the content domain.
Establishing the link
Here we will work through the process. As stated, we assume that the system will be capable of parsing event structure from the visual input, as we have previously demonstrated for physical events, as well as for the enabling and resulting states. We previously categorized the events in terms of contact and its parameters such that, for example, take(Agent, object, source) is recognized as a contact between that agent and the object, co-motion of the agent and object, and the ending of a contact between the object and source. Remaining within this domain of actions that include touch, push, take, and give, the state variable that will perhaps be most salient is that of “possession”. Indeed, goal directed behavior is often “possession oriented”. So how is possession recognized? Interestingly, the notion of possession can be represented without the requirement for any new perceptual primitives, as it is captured by the primitive “contact.” In this context, the actions give and take both produce clear changes in the possession status of different agents and objects. In the internal mechanistic representation extracted from the video sequence, there will thus be the contact (possession) status of elements in the world before a given event, then a representation of the event, followed by a representation of the resulting state. In this context, we can consider the generation of sentence meaning pairs such as:
- “John has the block”, contact(John, block)
- “John gives the block to Mary”, give(John, block, Mary)
- “Now Mary has the block”, contact(Mary, block)
The observations give(John, block, Mary) and contact(Mary, block) will enter into a predecessor-successor relation that will co-occur with statistical regularity. Indeed, give(Ag, Obj, Rec) and contact(Rec, Obj) will co-occur with high regularity, allowing a statistical learning mechanism to link them together in a relationship of temporal succession, such that the subsequent presence of the predecessor element will yield an internal prediction of the arrival of the successor element. Again, the human developmental manifestation of this type of phenomenon is observed in the studies of Spelke, Baillargeon and numerous others indicating that infants will anticipate the perceptual outcome of physical events.
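One way to picture such a statistical learning mechanism is a simple count of which abstract successor follows each abstract predecessor. The sketch below is an assumption made for illustration, not the mechanism of the implemented system: predicates are tuples, abstraction replaces each successor argument by its position in the predecessor, and the names `abstract_pair`, `SuccessionLearner` and the 0.8 threshold are hypothetical.

```python
from collections import Counter, defaultdict


def abstract_pair(predecessor, successor):
    """Abstract over entities: express each successor argument as the
    position of that entity among the predecessor's arguments."""
    pred_name, *pred_args = predecessor
    succ_name, *succ_args = successor
    if not all(arg in pred_args for arg in succ_args):
        return None  # successor mentions an entity absent from the predecessor
    roles = tuple(pred_args.index(arg) for arg in succ_args)
    return (pred_name, len(pred_args)), (succ_name, roles)


class SuccessionLearner:
    """Counts which abstract successor follows each abstract predecessor."""

    def __init__(self):
        self.pair_counts = defaultdict(Counter)
        self.predecessor_counts = Counter()

    def observe(self, sequence):
        # Each adjacent pair in an observed episode is one training example.
        for predecessor, successor in zip(sequence, sequence[1:]):
            key = (predecessor[0], len(predecessor) - 1)
            self.predecessor_counts[key] += 1
            abstracted = abstract_pair(predecessor, successor)
            if abstracted is not None:
                self.pair_counts[key][abstracted[1]] += 1

    def predict(self, predecessor, threshold=0.8):
        """Return the expected successor if it followed reliably enough."""
        key = (predecessor[0], len(predecessor) - 1)
        candidates = self.pair_counts.get(key)
        if not candidates:
            return None
        (succ_name, roles), count = candidates.most_common(1)[0]
        if count / self.predecessor_counts[key] < threshold:
            return None
        args = predecessor[1:]
        return (succ_name, *(args[i] for i in roles))


learner = SuccessionLearner()
learner.observe([("contact", "John", "block"),
                 ("give", "John", "block", "Mary"),
                 ("contact", "Mary", "block")])
learner.observe([("contact", "Anne", "ball"),
                 ("give", "Anne", "ball", "Paul"),
                 ("contact", "Paul", "ball")])
prediction = learner.predict(("give", "Sue", "cup", "Tom"))
```

After the two episodes, a give event with entities never seen before yields the prediction contact(Tom, cup), which is the give(Ag, Obj, Rec) to contact(Rec, Obj) regularity in abstract form.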
Now the groundwork is laid for establishing the form to meaning mapping between “if then” utterances and these temporally associated elements. The phrasal constituents “John gives the block to Mary” and “Mary has the block” can be inserted into the “if then” structure to yield “If John gives Mary the block then she will have the block.” Related (and perhaps more basic) forms could be “If you push that it moves” or “If you drop that it will fall,” both of which are similarly grounded in the corresponding perceptual event temporal sequences.
The result of binding the “If p then q” construction to phrases p and q that enter into temporal succession relations is that part of the semantics of this construction is the explicit coding of this temporal succession relation between p and q. That is, “If p then q” encodes that the prediction relation holds between p and q.
Generalization
Knowledge about prediction/succession relations of the world can acquire representational status in a developing system by at least two distinct pathways. First, via observation of physical events, the system can acquire sufficient data to extract a prediction relation between two events that reliably co-occur in a repeating temporal sequence, as outlined above. Further, as outlined above, this succession or prediction relation can become linked to a grammatical “if p then q” construction where p and q correspond to the respective predecessor and successor elements. This provides the basis for the second pathway for acquiring knowledge of prediction relations – via use of the “if p then q” construction. As formalized by Goldberg (1995), a new construction is created when the intended meaning cannot be derived purely from the combination of the constituents. Thus the “if p then q” construction encodes the added information corresponding to the prediction or succession relation that holds between p and q. Thus, this construction can be used to extend or project this relation onto constituents p and q which have not otherwise been observed to enter into this relation.
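The second pathway can be illustrated with a toy example in which the construction itself installs a prediction relation between two constituents that were never observed together. This is a surface-string sketch under strong assumptions: a regular expression stands in for the learned construction, constituents are kept as raw phrases rather than predicate-argument meanings, and `parse_conditional` and `PredictionStore` are hypothetical names.

```python
import re

# Assumption: the construction is approximated by a surface pattern
# covering "If p then q", "If p, then q" and "if p, q".
IF_THEN = re.compile(
    r"^if\s+(?P<p>.+?)(?:,?\s+then\s+|,\s*)(?P<q>.+?)\.?$",
    re.IGNORECASE,
)


def parse_conditional(sentence):
    """Return the (p, q) constituents of a conditional, or None."""
    match = IF_THEN.match(sentence.strip())
    if match is None:
        return None
    return match.group("p").strip(), match.group("q").strip()


class PredictionStore:
    """Holds prediction relations together with the pathway that produced them."""

    def __init__(self):
        self.relations = {}  # (p, q) -> "observed" or "told"

    def add_observed(self, p, q):
        # First pathway: relation extracted from perceived event sequences.
        self.relations[(p, q)] = "observed"

    def add_from_sentence(self, sentence):
        # Second pathway: the construction projects the relation onto p and q.
        parsed = parse_conditional(sentence)
        if parsed is not None:
            self.relations.setdefault(parsed, "told")
        return parsed

    def predicts(self, p):
        return {q for (antecedent, q) in self.relations if antecedent == p}


store = PredictionStore()
store.add_observed("you push that", "it will fall")
store.add_from_sentence("If it rains then the street gets wet.")
```

The relation between “it rains” and “the street gets wet” is held in the same form as the observed one, although it entered the store only through the sentence.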
Summary
Temporal succession or predictability relations that are robustly present in the physical world can be extracted by essentially mechanistic perceptual systems (e.g. as an extension to that implemented in Dominey & Boucher 2005). Via a form to meaning mapping mechanism, a grammatical “if p then q” form can become associated with the succession/prediction relation. Once this mapping is established, it can be generalized and applied, via the use of sentences built from the grammatical construction, to link previously unrelated pairs of constituents via this succession/prediction relation. This provides a basis for establishing the “content” domain (see Dancygier 1998) for interpreting if-then conditionals in terms of concrete causal relations. Future research should then examine whether this provides a stepping stone towards the use and interpretation of such conditionals in the epistemic and speech act domains.
References
Baillargeon, R. (1994). "How do infants learn about the physical world?" Current Directions in Psychological Science, 3, 133-140.
Dancygier, B. (1998) Conditionals and Prediction: Time, Knowledge and Causation in Conditional Constructions, Cambridge University Press.
Dominey, P.F. (2002) "Conceptual Grounding in Simulation Studies of Language Acquisition", Evolution of Communication (2000), 4 (1) 57-85.
Dominey, P.F. (2005) "Toward a construction-based account of shared intentions in social cognition", Comment on Tomasello et al. "Understanding and sharing intentions", Behavioral and Brain Sciences (2005) 28:5, 696.
Dominey, P.F. (In Press) "Towards a Construction-Based Framework for Development of Language, Event Perception and Social Cognition: Insights from Grounded Robotics and Simulation".
Dominey, P.F., Boucher J.D. (2005) "Learning To Talk About Events From Narrated Video in the Construction Grammar Framework", Artificial Intelligence, 167 (2005) 31–61.
Dominey, P.F., Hoen M., Blanc J.M., Lelekov-Boissard, T. (2003) "Neurological basis of language and sequential cognition: Evidence from Simulation, Aphasia and ERP Studies", Brain and Language, 86(2):207-25.
Fillmore, C.J. (1988) The mechanisms of “Construction Grammar,” Berkeley Linguistics Society, 14, 35-55.
Goldberg, A.E. (1995) Constructions: A Construction Grammar Approach to Argument Structure, University of Chicago Press.
Spelke, E.S. (1991). "Physical Knowledge in Infancy: Reflections on Piaget's theory". In S. Carey & R. Gelman (Eds.) The Epigenesis of Mind: Essays on Biology and Cognition (pp. 133-169). Hillsdale, NJ: Erlbaum.
Sommerville, J.A., Woodward A.L., Needham, A. (2005) "Action experience alters 3-month-old infants’ perception of others’ actions". Cognition 96, B1-B11.
Real Infants need more than observation 
Robert Stonjek
Feb 25, 2006 12:01 UT
Infants who are still developing their understanding of causation in sequences of events, such as the passing of a ball from one child to another, are also exposed to numerous counterintuitive and complex sequences that cannot be interpreted through simple observation alone.
Johnny, for instance, may drop the ball, lob the ball to Mary, or toss it directly up into the air via a hand action imperceptible (or confusing) to the observing infant. Johnny may offer the ball to Mary, who previously had no interest in the ball, then withdraw the ball to howls of protest from Mary who now takes an interest in the ball and demands sharing.
From the numerous possible real-world scenarios, the infant must select only the “Johnny gives the ball to Mary” event as a template for understanding possessional transactions? Unlikely!
The important additional step is the emulation of the behaviour, or some part of it, by the infant ahead of learning. In fact an infant at first learns almost exclusively by experience before observation and experiment and finally from observation alone (which never runs to completion, even in adults.)
Until the age of around four years old, the child references all events back to the self exclusively. Only after the development of ‘theory of mind’ can a child see events from the third person perspective. A good example of this is the so called ‘Sally-Anne’ test.
Sally and Anne are in a room together, observed by the child. Sally places an object in a box and leaves the room. Anne removes the object and places it in her pocket. Sally returns and the child is asked where she will look to find the object. Children under (on average) 4.5 years old will say “in Anne’s pocket”; older children can see the problem from Sally’s perspective and so answer “in the box”.
Thus there is substantial non-trivial information processing occurring in even the youngest infant’s brain when considering even seemingly obvious transactions, and only after repeated exposure and self experimentation do they learn even the simplest if p then q transactions.
Kind regards
Robert Karl Stonjek