# Implementation of DEvelopmentAl Learning (IDEAL) Course

## Algorithm for learning regularities of interaction

Here is a rudimentary interactionally motivated algorithm that enables the agent to learn and exploit two-step regularities of interaction. Table 33-1 presents its main loop, and Tables 33-2 and 33-3 present subroutines.

In Table 33-1, we chose a set of valences and a particular environment to demonstrate this learning mechanism: this agent is pleased when it receives result r2, but it must learn that the environment returns r2 only if it alternates experiments e1 and e2 from one cycle to the next. Your programming activities will consist of experimenting with other valences and other environments.

Table 33-1: Main loop of an interactionally motivated algorithm that learns two-step sequences of interaction.

```
01  createPrimitiveInteraction(e1, r1, -1)
02  createPrimitiveInteraction(e1, r2, 1)
03  createPrimitiveInteraction(e2, r1, -1)
04  createPrimitiveInteraction(e2, r2, 1)
05  while()
06     contextInteraction = enactedInteraction
07     anticipations = anticipate(enactedInteraction)
08     experiment = selectExperiment(anticipations)

09     if (experiment = previousExperiment)
10        result = r1
11     else
12        result = r2
13     previousExperiment = experiment

14     enactedInteraction = getInteraction(experiment, result)
15     if (enactedInteraction.valence ≥ 0)
16        mood = PLEASED
17     else
18        mood = PAINED
19     learnCompositeInteraction(contextInteraction, enactedInteraction)
```

Table 33-1, lines 01 to 04 initialize the primitive interactions (similar to Page 23) to specify the agent's preferences. In this particular configuration, interactions whose result is r1 have a negative valence, and interactions whose result is r2 have a positive valence. Line 06 memorizes the previously enacted interaction as the context interaction. Line 07 computes anticipations in the context of this previously enacted interaction. Line 08 selects an experiment from these anticipations.

Lines 09 to 13 implement the environment. This new environment was designed to demonstrate the benefit of learning two-step regularities of interaction. If the experiment equals the previous experiment then result is r1, otherwise the result is r2.
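As a standalone sketch, the environment rule of lines 09 to 13 can be written as a single function (Python used here for illustration; the function name is ours, not the course's):

```python
# Sketch of the environment rule on lines 09-13: the result is r1 when the
# experiment repeats the previous one, and r2 when it changes.
def environment(experiment, previous_experiment):
    return "r1" if experiment == previous_experiment else "r2"
```

Repeating e1 thus yields the negative interaction e1r1, while switching from e1 to e2 yields the positive interaction e2r2.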

Lines 14 to 18: the enacted interaction is retrieved from memory, and the agent is pleased if its valence is not negative (similar to Page 23). Line 19: the agent records the composite interaction as the tuple ‹contextInteraction, enactedInteraction› in memory.

Table 33-2 presents a simple version of the learnCompositeInteraction(), anticipate(), and selectExperiment() functions.

Table 33-2: Pseudocode of a simple version.

```
01   function learnCompositeInteraction(contextInteraction, enactedInteraction)
02      compositeInteraction = create new tuple(contextInteraction, enactedInteraction)
03      if compositeInteraction already in the list of known interactions
04         do nothing
05      else
06         add compositeInteraction to the list of known interactions

10   function anticipate(enactedInteraction)
11      for each interaction in the list of known interactions
12          if interaction.preInteraction = enactedInteraction
13             create new anticipation(interaction.postInteraction)
14      return the list of anticipations

20   function selectExperiment(anticipations)
21      sort the list anticipations by decreasing valence of their proposed interaction.
22      if anticipation[0].interaction.valence ≥ 0
23         return anticipation[0].interaction.experiment
24      else
25         return another experiment than anticipation[0].interaction.experiment
```

The anticipate() function checks for known composite interactions whose pre-interaction matches the last enacted primitive interaction; we call these the activated composite interactions. A new anticipation object is created for each activated composite interaction. The activated composite interaction's post-interaction is attached to this anticipation as the anticipation's proposed interaction. The selectExperiment() function sorts the list of anticipations by decreasing valence of their proposed interactions. Then, it takes the first anticipation (index [0]), which proposes the interaction with the highest valence in the list. If this valence is not negative, the agent wants to re-enact this proposed interaction, so it chooses this proposed interaction's experiment.
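The whole simple mechanism (Tables 33-1 and 33-2) can be condensed into a short runnable program. This is an illustrative Python transcription, not the course's reference implementation: primitive interactions are represented as (experiment, result) pairs, and composite interactions as (pre, post) tuples stored in a set.

```python
# Valences of the four primitive interactions (lines 01-04 of Table 33-1).
VALENCES = {("e1", "r1"): -1, ("e1", "r2"): 1,
            ("e2", "r1"): -1, ("e2", "r2"): 1}

def run(cycles=10):
    known = set()               # learned composite interactions
    enacted = None
    previous_experiment = None
    moods = []
    for _ in range(cycles):
        context = enacted
        # anticipate(): propose the post-interactions of activated composites
        proposed = [post for (pre, post) in known if pre == context]
        # selectExperiment(): re-enact the best proposal if its valence >= 0
        proposed.sort(key=lambda i: VALENCES[i], reverse=True)
        if proposed and VALENCES[proposed[0]] >= 0:
            experiment = proposed[0][0]
        elif proposed:    # best proposal is negative: try the other experiment
            experiment = "e2" if proposed[0][0] == "e1" else "e1"
        else:             # no anticipation yet: default experiment
            experiment = "e1"
        # environment (lines 09-13): r1 if the experiment repeats, else r2
        result = "r1" if experiment == previous_experiment else "r2"
        previous_experiment = experiment
        enacted = (experiment, result)
        moods.append("PLEASED" if VALENCES[enacted] >= 0 else "PAINED")
        known.add((context, enacted))   # learnCompositeInteraction()
    return moods

print(run())
```

Running this sketch, the agent is PAINED on a couple of early cycles while it discovers the regularity, then settles into the e1/e2 alternation and stays PLEASED.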

This solution works in a very simple environment that generates no competing anticipations. However, for environments that may generate competing anticipations, we want the agent to be able to balance competing anticipations based on their probabilities of realization. We may have an environment that, in a given context, makes all four interactions likely to happen, but with different probabilities. For example, in the context in which e1r1 was enacted, both e1 and e2 may sometimes result in r1 and sometimes in r2, but e1 is more likely to result in r2 than e2 is. To handle this kind of environment, we associate a weight with composite interactions, as shown in Table 33-3.

Table 33-3: Pseudocode for weighted anticipations.

```
01   function learnCompositeInteraction(contextInteraction, enactedInteraction)
02      compositeInteraction = create new tuple(contextInteraction, enactedInteraction)
03      if compositeInteraction already in the list of known interactions
04         increment compositeInteraction's weight
05      else
06         add compositeInteraction to the list of known interactions with a weight of 1

10   function anticipate(enactedInteraction)
11      for each interaction in the list of known interactions
12         if interaction.preInteraction = enactedInteraction
13            proposedExperiment = interaction.postInteraction.experiment
14            proclivity = interaction.weight * interaction.postInteraction.valence
15            if an anticipation already exists for proposedExperiment
16               add proclivity to this anticipation's proclivity
17            else
18               create new anticipation(proposedExperiment) with proclivity proclivity
19      return the list of anticipations

20   function selectExperiment(anticipations)
21      sort the list anticipations by decreasing proclivity.
22      if anticipation[0].proclivity ≥ 0
23         return anticipation[0].experiment
24      else
25         return another experiment than anticipation[0].experiment
```

Now, the learnCompositeInteraction() function either records or reinforces composite interactions. The anticipate() function generates one anticipation for each proposed experiment. Anticipations have a proclivity value computed from the weight of the proposing activated composite interaction multiplied by the valence of the proposed interaction. As a result, the anticipations that are most likely to result in the primitive interactions with the highest valence receive the highest proclivity. In the example above, in the context in which e1r1 has been enacted, the agent learns to choose e1 because e1 is more likely than e2 to result in a positive interaction.
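The weighted version can be sketched in Python as follows (illustrative names, not the course's code; proclivities are keyed by proposed experiment as in Table 33-3). Using the weights from the example above, suppose that after e1r1, e1 was followed by r2 three times and by r1 once, while e2 was followed by r2 once and by r1 twice: e1 then gets proclivity 3×1 + 1×(−1) = 2 and e2 gets 1×1 + 2×(−1) = −1, so e1 is selected.

```python
from collections import defaultdict

VALENCES = {("e1", "r1"): -1, ("e1", "r2"): 1,
            ("e2", "r1"): -1, ("e2", "r2"): 1}

def learn_composite_interaction(known, context, enacted):
    # record the composite interaction, or reinforce it by incrementing its weight
    known[(context, enacted)] = known.get((context, enacted), 0) + 1

def anticipate(known, enacted):
    # one anticipation per proposed experiment, with summed proclivity
    proclivities = defaultdict(int)
    for (pre, post), weight in known.items():
        if pre == enacted:
            proclivities[post[0]] += weight * VALENCES[post]
    return proclivities

def select_experiment(proclivities):
    if not proclivities:
        return "e1"                          # no anticipation yet: default
    best = max(proclivities, key=proclivities.get)
    if proclivities[best] >= 0:
        return best
    return "e2" if best == "e1" else "e1"    # avoid the negative proposal

# After e1r1: e1 led to r2 three times and to r1 once; e2 to r2 once, r1 twice.
known = {}
context = ("e1", "r1")
observations = [("e1", "r2")] * 3 + [("e1", "r1")] + \
               [("e2", "r2")] + [("e2", "r1")] * 2
for post in observations:
    learn_composite_interaction(known, context, post)

proclivities = anticipate(known, context)
print(dict(proclivities))               # {'e1': 2, 'e2': -1}
print(select_experiment(proclivities))  # e1
```

Note that competing proclivities for the same experiment are summed, so an experiment proposed by several weak composites can still outweigh one proposed by a single strong composite.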