# The mathematics behind HeiDI

The HeiDI model has four major components: 1) the acquisition of reciprocal associations between stimuli, 2) the pooling of those associations into stimulus activations, 3) the distribution of those activations into stimulus-specific response units, and 4) the generation of responses.

## 1 - Acquiring reciprocal associations

Whenever a trial is given, HeiDI learns associations among stimuli. The association between two stimuli, $$i$$ and $$j$$, is denoted by $$v_{i,j}$$. The association $$v_{i,j}$$ represents a directional expectation: the expectation of $$j$$ after being presented with $$i$$. Its sign also conveys the nature of the effect that $$i$$ has on the representation of $$j$$: if positive, the presentation of $$i$$ “excites” the representation of $$j$$; if negative, it “inhibits” that representation.

HeiDI not only learns “forward” associations between stimuli, but also their reciprocal, or “backward” associations. Thus, if organisms are presented with $$i \rightarrow j$$, organisms not only learn about $$v_{i,j}$$, but also about $$v_{j, i}$$, or the expectation of receiving $$i$$ after being presented with $$j$$. Note that, for the sake of brevity, the learning equations below are only specified for forward associations.

### 1.1 - The stimulus expectation rule

HeiDI generates expectations about stimuli. The expectation of stimulus $$j$$ ($$e_j$$) is expressed as

$\tag{Eq. 1} e_j = \sum_{k}^{K}x_kv_{k,j}$

where $$K$$ is the set containing all stimuli in the experiment, and $$x_k$$ is a quantity denoting the presence or absence of stimulus $$k$$ (1 or 0, respectively)¹.
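As a concrete illustration, Eq. 1 can be computed with a single matrix-vector product. The sketch below assumes a hypothetical three-stimulus design (A, B, and a US) with made-up association values:

```python
import numpy as np

# A minimal sketch of Eq. 1, assuming three stimuli K = {A, B, US}.
# V[i, j] holds the association v_{i,j}; x marks which stimuli are present.
V = np.array([
    [0.0, 0.1, 0.6],   # associations from A
    [0.0, 0.0, 0.2],   # associations from B
    [0.3, 0.0, 0.0],   # associations from the US
])
x = np.array([1.0, 1.0, 0.0])  # A and B are present, the US is absent

# e_j = sum_k x_k * v_{k,j}  (Eq. 1), computed for every j at once
e = x @ V
print(e)  # expectation of the US is v_{A,US} + v_{B,US} = 0.8
```

Because absent stimuli have $$x_k = 0$$, they drop out of the sum automatically.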

### 1.2 - Learning rule

HeiDI learns the appropriate expectations via error-correction mechanisms. After trial $$t$$, the association between stimuli $$i$$ and $$j$$ is expressed as

$\tag{Eq. 2} v_{i,j, t} = v_{i,j, t-1} + \Delta v_{i,j, t}$

where $$v_{i,j, t-1}$$ is the forward association between $$i$$ and $$j$$ on trial $$t-1$$, and $$\Delta v_{i,j, t}$$ is the change in that association as a result of trial $$t$$. That delta term uses a pooled error term and is expressed as

$\tag{Eq. 3} \Delta v_{i,j} = x_i\alpha_i(x_jc\alpha_j - e_j)$

where $$\alpha_i$$ and $$\alpha_j$$ are parameters representing the salience of stimuli $$i$$ and $$j$$, respectively ($$0 \le \alpha \le 1$$), and $$c$$ is a scaling constant ($$c = 1$$). Note that the term denoting the trial, $$t$$, has been omitted here for simplicity.
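The learning rule can be sketched as a small simulation. The example below assumes a hypothetical two-stimulus design (a cue A paired with a US over ten trials) with made-up saliences, and applies Eqs. 1–3 on every trial:

```python
import numpy as np

# A sketch of Eqs. 2-3, assuming two stimuli: a cue A and a US.
# Saliences and trial counts are hypothetical values for illustration.
alpha = np.array([0.4, 0.8])   # salience of A and of the US
c = 1.0                        # scaling constant
V = np.zeros((2, 2))           # v_{i,j}, initially zero

for trial in range(10):        # ten A -> US pairings
    x = np.array([1.0, 1.0])   # both stimuli occur on the trial
    e = x @ V                  # pooled expectations (Eq. 1)
    # Delta v_{i,j} = x_i alpha_i (x_j c alpha_j - e_j)  (Eq. 3)
    dV = np.outer(x * alpha, x * c * alpha - e)
    np.fill_diagonal(dV, 0.0)  # no self-associations (hollow matrix)
    V = V + dV                 # Eq. 2

print(V[0, 1])  # forward v_{A,US} approaches c * alpha_US = 0.8
print(V[1, 0])  # backward v_{US,A} approaches c * alpha_A = 0.4
```

Note how the forward and backward associations approach different asymptotes (set by the salience of the predicted stimulus) at different rates (set by the salience of the predictor), which is the signature property of HeiDI's reciprocal learning.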

## 2 - Pooling the strength of associations

HeiDI pools its stimulus associations to activate stimulus-specific representations. The activation of the representation of stimulus $$j$$ in the presence of stimuli $$M$$, $$a_{j,M}$$, is defined as:

$\tag{Eq. 4} a_{j,M} = o_{j,M} + h_{j,M}$

where $$o_{j,M}$$ denotes the combined associative strength towards stimulus $$j$$ in the presence of stimuli $$M$$, and $$h_{j,M}$$ denotes the chained associative strength towards stimulus $$j$$ in the presence of stimuli $$M$$.

### 2.1 - Combined associative strength

The quantity $$o_{j,M}$$ is the result of combining the associative strength of forward and backward associations to and from stimulus $$j$$ as

$\tag{Eq. 5} o_{j,M} = \sum_{m \neq j}^{M}v_{m,j} + \left(\frac{\sum_{m \neq j}^{M}v_{m,j} \sum_{m \neq j}^{M}v_{j,m}}{c}\right)$

where each of the sums above runs over all stimuli $$M$$ presented in the trial, excluding stimulus $$j$$.² The left-hand term describes how the forward associations from stimuli $$M$$ to $$j$$ affect the representation of $$j$$, whereas the right-hand term describes how the backward associations that $$j$$ has with stimuli $$M$$ affect its representation (although these are modulated by the forward associations themselves).
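Eq. 5 reduces to two sums and a product. The sketch below assumes a hypothetical three-stimulus design in which A and B are present and the US is the target $$j$$, with made-up association values:

```python
import numpy as np

# A sketch of Eq. 5, assuming three stimuli K = {A, B, US}, with A and B
# present (the set M) and the US as target j. Values are hypothetical.
c = 1.0
V = np.array([
    [0.0, 0.1, 0.5],   # from A
    [0.0, 0.0, 0.3],   # from B
    [0.2, 0.1, 0.0],   # from the US (backward associations)
])
M = [0, 1]             # indices of the present stimuli (A and B)
j = 2                  # target stimulus (US)

forward = sum(V[m, j] for m in M if m != j)    # sum of v_{m,j}
backward = sum(V[j, m] for m in M if m != j)   # sum of v_{j,m}
o_jM = forward + (forward * backward) / c      # Eq. 5
print(o_jM)  # 0.8 + (0.8 * 0.3) / 1 = 1.04
```

Note that if the backward associations are zero, $$o_{j,M}$$ collapses to the plain sum of forward strengths.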

### 2.2 - Chained associative strength

The quantity $$h_{j,M}$$ captures the indirect associative strength that the stimuli $$M$$ have with $$j$$, via absent stimuli. As such, $$h_{j,M}$$ is defined as

$\tag{Eq. 6a} h_{j,M} = \sum_{m \neq j}^{M} \sum_{n}^{N}\frac{v_{m,n}o_{j,n}}{c}$

where $$N$$ is the set of stimuli not presented on the trial (i.e., $$K - M$$). Note the re-use of $$o$$, the quantity defined in Eq. 5. This equation allows absent stimuli $$N$$ to influence the representation of stimulus $$j$$, provided they have an association with the present stimuli $$M$$.
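To make the chaining concrete, the sketch below assumes a hypothetical sensory-preconditioning-like design: A is present, B is absent but was previously paired with A, and the target $$j$$ is a US associated with B. Reading $$o_{j,n}$$ as Eq. 5 applied with the single "presented" stimulus $$n$$ is an interpretive assumption, and all association values are made up:

```python
import numpy as np

# A sketch of Eq. 6a with three stimuli: A (present, M), B (absent, N),
# and a US as target j. Values and design are hypothetical.
c = 1.0
# index: 0 = A, 1 = B, 2 = US
V = np.array([
    [0.0, 0.4, 0.0],   # A -> B learned; no direct A -> US association
    [0.0, 0.0, 0.6],   # B -> US learned
    [0.0, 0.3, 0.0],   # US -> B backward association
])
M = [0]                # present stimuli
N = [1]                # absent stimuli
j = 2                  # target stimulus

def o(j, n):
    # Eq. 5 computed with the single stimulus n (interpretive assumption)
    return V[n, j] + (V[n, j] * V[j, n]) / c

h_jM = sum(V[m, n] * o(j, n) / c for m in M if m != j for n in N)
print(h_jM)  # 0.4 * (0.6 + 0.6 * 0.3) = 0.312
```

Here A activates the US representation only indirectly, through its association with the absent B.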

In Honey and Dwyer (2022), the authors specify a similarity-based mechanism that modulates the effect of associative chains according to the similarity between the nominal and retrieved saliences of stimuli³. As such, Eq. 6a is expanded as:

$\tag{Eq. 6b} h_{j,M} = \sum_{m \neq j}^{M} \sum_{n}^{N}S(\alpha_{n}, \alpha'_n)\frac{v_{m,n}o_{j,n}}{c}$

where $$S$$ is a similarity function that takes the nominal salience of stimulus $$n$$, $$\alpha_n$$ (as perceived when $$n$$ is presented on a trial), and its retrieved salience, $$\alpha'_n$$ (as perceived when $$n$$ is retrieved via other stimuli $$M$$; see below). This function is defined as:

$\tag{Eq. 7} S(\alpha_n, \alpha'_n) = \frac{\alpha_n}{\alpha_n + |\alpha_n-\alpha'_n|} \times \frac{\alpha'_n}{\alpha'_n+ |\alpha_n-\alpha'_n|}$

Notably, whenever there is more than one nominal salience for a given stimulus, then $$\alpha_n$$ is the arithmetic mean among all nominal values (see “heidi_similarity” vignette).
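Eq. 7 is straightforward to implement directly. The sketch below uses hypothetical salience values:

```python
# A sketch of Eq. 7: similarity between a stimulus's nominal salience
# and its retrieved salience. The values passed in are hypothetical.
def similarity(alpha, alpha_r):
    d = abs(alpha - alpha_r)
    return (alpha / (alpha + d)) * (alpha_r / (alpha_r + d))

print(similarity(0.5, 0.5))   # identical saliences -> 1.0
print(similarity(0.5, 0.2))   # a mismatch shrinks the chain's impact
```

When the two saliences match, both factors equal 1 and the chain passes through at full strength; the larger the mismatch, the more the chain is discounted.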

## 3 - Distributing strength into stimulus-specific response units

HeiDI then distributes the pooled stimulus-specific strength among all $$K$$ stimuli, according to their relative salience. The activation of response unit $$j$$ attributable to stimulus $$k$$, $$R_{j,k}$$, is expressed as

$\tag{Eq. 8} R_{j,k} = \frac{\theta(j)}{\sum_{k}^{K}\theta(k)}a_{k,M}$

where $$j \in K$$. As $$K$$ can include both present and absent stimuli, the $$\theta$$ function above depends on whether the stimulus $$k$$ is absent (i.e., $$k \in N$$) or not (i.e., $$k \in M$$), as:

$\tag{Eq. 9} \theta(k) = \begin{cases} \left |\sum_{m}^{M}\left( v_{m,k}+\sum_{n \neq k}^{N}\frac{v_{m,n}v_{n,k}}{c}\right) \right|,& \text{if } k \in N\\ \alpha_k, & \text{otherwise} \end{cases}$

Note that the quantity for absent stimuli is absolute, to prevent negative $$\theta$$ values due to inhibitory associations⁴. Also, note that a summation term is used on the left-hand side of the expression for an absent stimulus. It implies that all the present stimuli $$M$$ contribute to the salience of stimulus $$k$$. Finally, note on the right-hand side of the same expression that the present stimuli contribute not only via the direct association each of them has with $$k$$, $$v_{m,k}$$, but also through associative chains with other absent stimuli (cf. Eq. 6a).
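Eqs. 8 and 9 can be sketched together. The example below assumes a hypothetical three-stimulus design (A and B present, the US absent), with made-up saliences, associations, and pooled activations $$a_{k,M}$$ standing in for the output of Eq. 4:

```python
import numpy as np

# A sketch of Eqs. 8-9, assuming K = {A, B, US} with A and B present (M)
# and the US absent (N). All numeric values are hypothetical.
c = 1.0
alpha = np.array([0.4, 0.3, 0.8])   # saliences of A, B, US
V = np.array([
    [0.0, 0.1, 0.5],
    [0.0, 0.0, 0.3],
    [0.2, 0.1, 0.0],
])
M, N = [0, 1], [2]

def theta(k):
    # Eq. 9: salience for present stimuli; absolute (direct + chained)
    # strength from the present stimuli for absent ones.
    if k in N:
        total = sum(
            V[m, k] + sum(V[m, n] * V[n, k] / c for n in N if n != k)
            for m in M
        )
        return abs(total)
    return alpha[k]

thetas = np.array([theta(k) for k in range(3)])
# Stand-in pooled activations a_{k,M}, as if computed via Eq. 4:
a = np.array([0.2, 0.1, 1.0])
# Eq. 8: R[j, k] = theta(j) / sum_k theta(k) * a_{k,M}
R = np.outer(thetas / thetas.sum(), a)
print(thetas)    # theta for the absent US is |v_{A,US} + v_{B,US}| = 0.8
print(R[2, 2])   # share of the US activation assigned to the US unit
```

In this example the absent US earns a "salience" of 0.8 from its incoming associations, and therefore claims just over half of the total activation.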

## 4 - Generating responses

Finally, HeiDI responds. The response-generating mechanisms in HeiDI are currently underspecified. In its current version, HeiDI’s responses are the product of the activation of stimulus-specific response units and the connection that those units have with specific motor units. As such, the activation of motor unit $$q$$, $$r_q$$, is given by

$\tag{Eq. 10} r_q = R_jw_{j,q}$

where $$w_{j,q}$$ is a weight representing the association between stimulus-specific unit $$j$$ and motor unit $$q$$.
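Eq. 10 is a single product per motor unit. The sketch below uses a hypothetical response-unit activation and hypothetical weights to two motor units:

```python
import numpy as np

# A sketch of Eq. 10: a motor unit's activation is the product of a
# stimulus-specific response unit's activation and its weight to that
# motor unit. All values are hypothetical.
R_j = 0.53                  # activation of a response unit (from Eq. 8)
w = np.array([1.0, 0.2])    # weights w_{j,q} to two motor units q
r = R_j * w                 # r_q = R_j * w_{j,q}  (Eq. 10)
print(r)
```

With the weights left unspecified by the model, they act as free parameters mapping learned activations onto observable behavior.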

1. We go the extra length of specifying $$x$$ quantities because the stimulus expectation and learning rules can be vectorized, as $$\textbf{e} = \textbf{x}V$$ and $$\Delta V = (\textbf{x}\odot\boldsymbol{\alpha})' (c(\textbf{x}\odot\boldsymbol{\alpha})-\textbf{e})$$, respectively. Here, the matrix $$V$$ contains all associations between each pair of stimuli, the row vectors $$\textbf x$$ and $$\boldsymbol \alpha$$ denote the presence and salience of all stimuli $$K$$, the $$\odot$$ symbol specifies element-wise multiplication, and the $$'$$ symbol denotes transposition. Note further that the $$\Delta V$$ matrix must be made hollow before adding it to $$V$$.↩︎

2. An alternative formulation of this equation could be $$\sum_{m \neq j}^{M} v_{m,j} + (v_{m,j} v_{j,m})$$ but, although this alternative formulation is positively related to Eq. 5, we have not compared their behavior exhaustively.↩︎

3. This mechanism is present in model HD2022 but not in model HDI2020.↩︎

4. An alternative and perhaps more naturalistic parametrization of this rule would be to use $$max[0,\theta(n)]$$, where $$max$$ is the maximum function and $$n$$ is an absent stimulus; such rectified linear functions (ReLUs) are extensively used in neural networks. Another alternative that avoids the use of absolute values or a rectifying mechanism would be to use quantities of $$e^{\theta(k)}$$ instead of $$\theta(k)$$.↩︎