Here is a rudimentary interactionally motivated algorithm that enables the agent to learn and exploit two-step regularities of interaction. Table 33-1 presents its main loop, and Tables 33-2 and 33-3 present subroutines.
In Table 33-1, we chose a set of valences and a particular environment to demonstrate this learning mechanism: this agent is pleased when it receives result r2, but it must learn that the environment returns r2 only if it alternates experiments e1 and e2 every second cycle. Your programming activities will consist of experimenting with other valences and other environments.
Table 33-1: Main loop of an interactionally motivated algorithm that learns two-step sequences of interaction.
01  createPrimitiveInteraction(e1, r1, -1)
02  createPrimitiveInteraction(e1, r2, 1)
03  createPrimitiveInteraction(e2, r1, -1)
04  createPrimitiveInteraction(e2, r2, 1)
05  while()
06    contextInteraction = enactedInteraction
07    anticipations = anticipate(enactedInteraction)
08    experiment = selectExperiment(anticipations)
09    if (experiment = previousExperiment)
10      result = r1
11    else
12      result = r2
13    previousExperiment = experiment
14    enactedInteraction = getInteraction(experiment, result)
15    if (enactedInteraction.valence ≥ 0)
16      mood = PLEASED
17    else
18      mood = PAINED
19    learnCompositeInteraction(contextInteraction, enactedInteraction)
Table 33-1, lines 01 to 04, initializes the primitive interactions (similar to Page 23) to specify the agent's preferences. In this particular configuration, interactions whose result is r1 have a negative valence, and interactions whose result is r2 have a positive valence. Line 06 memorizes the previously enacted interaction as the context interaction. Line 07 computes the anticipations in the context of the previously enacted interaction. Line 08 selects an experiment from these anticipations.
Lines 09 to 13 implement the environment. This new environment was designed to demonstrate the benefit of learning two-step regularities of interaction. If the experiment equals the previous experiment then result is r1, otherwise the result is r2.
Lines 14 to 18: the enacted interaction is retrieved from memory, and the agent is pleased if its valence is positive (similar to Page 23). Line 19: the agent records the composite interaction as a tuple ‹contextInteraction, enactedInteraction› in memory.
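The pieces of Table 33-1 that are fixed in advance, the four primitive interactions (lines 01 to 04), the environment (lines 09 to 12), and the mood computation (lines 15 to 18), can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the names `VALENCES`, `environment()`, and `mood()` are ours, and primitive interactions are represented as (experiment, result) tuples:

```python
# Lines 01-04: a primitive interaction maps (experiment, result) to a valence.
VALENCES = {("e1", "r1"): -1, ("e1", "r2"): 1,
            ("e2", "r1"): -1, ("e2", "r2"): 1}

def environment(experiment, previous_experiment):
    """Lines 09-12: repeating the previous experiment yields r1;
    alternating yields r2."""
    return "r1" if experiment == previous_experiment else "r2"

def mood(experiment, result):
    """Lines 15-18: PLEASED when the enacted interaction's valence
    is non-negative, PAINED otherwise."""
    return "PLEASED" if VALENCES[(experiment, result)] >= 0 else "PAINED"
```

With this environment, `environment("e1", "e1")` returns r1 (a negative interaction), while `environment("e2", "e1")` returns r2, so the agent can only stay PLEASED by alternating e1 and e2.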
Table 33-2 presents a simple version of the learnCompositeInteraction(), anticipate(), and selectExperiment() functions.
Table 33-2: Pseudocode of a simple version.
01  function learnCompositeInteraction(contextInteraction, enactedInteraction)
02    compositeInteraction = create new tuple(contextInteraction, enactedInteraction)
03    if compositeInteraction already in the list of known interactions
04      do nothing
05    else
06      add compositeInteraction to the list of known interactions

10  function anticipate(enactedInteraction)
11    for each interaction in the list of known interactions
12      if interaction.preInteraction = enactedInteraction
13        create new anticipation(interaction.postInteraction)
14    return the list of anticipations

20  function selectExperiment(anticipations)
21    sort the list anticipations by decreasing valence of their proposed interaction
22    if anticipation.interaction.valence ≥ 0
23      return anticipation.interaction.experiment
24    else
25      return another experiment than anticipation.interaction.experiment
The anticipate() function checks for known composite interactions whose pre-interactions match the last enacted primitive interaction; we call these the activated composite interactions. A new object, anticipation, is created for each activated composite interaction. The activated composite interaction's post-interaction is associated with this anticipation as the anticipation's proposed interaction. The selectExperiment() function sorts the list of anticipations by decreasing valence of their proposed interaction. Then, it takes the first anticipation, which has the highest valence in the list. If this valence is positive, then the agent wants to re-enact this proposed interaction, leading the agent to choose this proposed interaction's experiment.
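The simple version of Table 33-2, driven by the main loop and environment of Table 33-1, can be sketched as a short runnable Python program. All names are illustrative; primitive interactions are (experiment, result) tuples, composite interactions are (pre, post) pairs, and the first cycle (where no context exists yet) is handled by a `None` check that the pseudocode leaves implicit:

```python
VALENCES = {("e1", "r1"): -1, ("e1", "r2"): 1,
            ("e2", "r1"): -1, ("e2", "r2"): 1}
known = set()  # composite interactions as (preInteraction, postInteraction)

def learn_composite(context, enacted):
    # Recording an already-known composite interaction has no effect.
    known.add((context, enacted))

def anticipate(enacted):
    # Propose the post-interaction of each activated composite interaction.
    return [post for (pre, post) in known if pre == enacted]

def select_experiment(anticipations):
    if not anticipations:
        return "e1"  # arbitrary default when nothing is anticipated yet
    best = max(anticipations, key=lambda i: VALENCES[i])
    if VALENCES[best] >= 0:
        return best[0]  # re-enact the proposed interaction's experiment
    return "e2" if best[0] == "e1" else "e1"  # avoid the negative one

previous_experiment = None
enacted = None
moods = []
for _ in range(10):
    context = enacted
    experiment = select_experiment(anticipate(enacted))
    # The environment of Table 33-1: repeating an experiment yields r1.
    result = "r1" if experiment == previous_experiment else "r2"
    previous_experiment = experiment
    enacted = (experiment, result)
    moods.append("PLEASED" if VALENCES[enacted] >= 0 else "PAINED")
    if context is not None:
        learn_composite(context, enacted)
print(moods)
```

Running this sketch, the agent is PAINED during the first cycles while it repeats e1, then discovers the composite interactions that make alternation profitable and stays PLEASED from then on.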
This solution works in a very simple environment that generates no competing anticipations. However, for environments that may generate competing anticipations, we want the agent to be able to balance competing anticipations based on their probabilities of realization. We may have an environment that, in a given context, makes all four interactions possible but with different probabilities. For example, in the context in which e1r1 was enacted, both e1 and e2 may result sometimes in r1 and sometimes in r2, but e1 is more likely to result in r2 than e2. To handle this kind of environment, we associate a weight with composite interactions, as shown in Table 33-3.
Table 33-3: Pseudocode for weighted anticipations.
01  function learnCompositeInteraction(contextInteraction, enactedInteraction)
02    compositeInteraction = create new tuple(contextInteraction, enactedInteraction)
03    if compositeInteraction already in the list of known interactions
04      increment compositeInteraction's weight
05    else
06      add compositeInteraction to the list of known interactions with a weight of 1

10  function anticipate(enactedInteraction)
11    for each interaction in the list of known interactions
12      if interaction.preInteraction = enactedInteraction
13        proposedExperiment = interaction.postInteraction.experiment
14        proclivity = interaction.weight * interaction.postInteraction.valence
15        if an anticipation already exists for proposedExperiment
16          add proclivity to this anticipation's proclivity
17        else
18          create new anticipation(proposedExperiment) with proclivity proclivity
19    return the list of anticipations

20  function selectExperiment(anticipations)
21    sort the list anticipations by decreasing proclivity
22    if anticipation.proclivity ≥ 0
23      return anticipation.experiment
24    else
25      return another experiment than anticipation.experiment
Now, the learnCompositeInteraction() function either records or reinforces composite interactions. The anticipate() function generates one anticipation per proposed experiment. Anticipations have a proclivity value computed from the weight of the proposing activated composite interaction multiplied by the valence of the proposed interaction. As a result, the anticipations that are the most likely to result in the primitive interactions that have the highest valence receive the highest proclivity. In the example above, in the context in which e1r1 has been enacted, the agent learns to choose e1 because it will more likely result in a positive interaction than e2.
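The weighted mechanism of Table 33-3 can be sketched in Python as follows. Here anticipations are represented as a dict mapping each proposed experiment to its aggregated proclivity, and the observation counts at the end are hypothetical numbers chosen to illustrate the stochastic example from the text; all names are ours:

```python
VALENCES = {("e1", "r1"): -1, ("e1", "r2"): 1,
            ("e2", "r1"): -1, ("e2", "r2"): 1}
weights = {}  # composite interaction (pre, post) -> weight

def learn_composite(context, enacted):
    # Record a new composite interaction with weight 1, or reinforce it.
    key = (context, enacted)
    weights[key] = weights.get(key, 0) + 1

def anticipate(enacted):
    # Sum weight * valence over activated composite interactions,
    # grouped by the proposed experiment.
    proclivities = {}
    for (pre, post), w in weights.items():
        if pre == enacted:
            exp = post[0]  # the proposed interaction's experiment
            proclivities[exp] = proclivities.get(exp, 0) + w * VALENCES[post]
    return proclivities

def select_experiment(proclivities):
    if not proclivities:
        return "e1"  # arbitrary default
    best = max(proclivities, key=proclivities.get)
    if proclivities[best] >= 0:
        return best
    return "e2" if best == "e1" else "e1"  # avoid the worst option

# Hypothetical counts for the example: after e1r1, e1 yields r2 more
# often than e2 does.
for context, enacted, count in [
        (("e1", "r1"), ("e1", "r2"), 7),
        (("e1", "r1"), ("e1", "r1"), 3),
        (("e1", "r1"), ("e2", "r2"), 4),
        (("e1", "r1"), ("e2", "r1"), 6)]:
    for _ in range(count):
        learn_composite(context, enacted)

proclivities = anticipate(("e1", "r1"))
print(proclivities)                     # e1: 7*1 + 3*(-1) = 4, e2: 4 - 6 = -2
print(select_experiment(proclivities))  # the agent chooses e1
```

With these counts, e1 receives proclivity 4 and e2 receives -2, so the agent chooses e1 even though e1 sometimes results in the negative interaction e1r1.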