Welcome to the Lab!

In your lab, you have a collection $\{ S\}$ of sources and a collection $\{B\} $ of boxes. Each box has an input port, an output port, and some number $n>1$ of lights on it.

A source spits out stuff, which you can direct into the input port of a box, and stuff will come out the output port. When you send stuff into a box, sometimes one of the $n$ lights will blink. You can arrange the sources and the boxes in whatever pattern you like, sending stuff into boxes, and thence into other boxes. The question is: Can you predict the pattern of lights?

The Tools

First, for each box, figure out which sources always spit out stuff that makes a light blink: we'll say then that a given source $S$ is compatible with box $B$.

Probability Vectors

We can characterize the relationship between a compatible source $S$ and a box $B$ with a probability vector $p(B_{i})$. Label the lights on box $B$ by $i = 1 \dots n$, and send stuff from the same source $S$ into the same box $B$ many times, while keeping a tally of how many times each light blinks. Dividing each tally by the total amount of stuff you sent in gives you a probability vector such that $\sum_{i} p(B_{i}) = 1$.

Conditional Probability Matrices

Similarly, we can characterize the relationship between two boxes $A$ and $B$ with a conditional probability matrix: $p(B_{i}|A_{j})$. Make a table whose rows are labeled by lights of $B$ and whose columns are labeled by lights of $A$. Send stuff at random into box $A$, and then direct what comes out into box $B$, and record which lights $B_{i}, A_{j}$ blinked: put a tally mark at the $i^{th}$ row and $j^{th}$ column of your table. Finally, divide each column by the total of that column to get the conditional probability matrix.
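If you'd like to compute along, here's a minimal sketch in Python (NumPy) of the tallying and normalizing described above; the tallies are made-up numbers, purely for illustration:

```python
import numpy as np

# Hypothetical tallies: how often each of B's 3 lights blinked when stuff
# from a compatible source S was sent in 1000 times.
tallies_B = np.array([512, 301, 187])
p_B = tallies_B / tallies_B.sum()            # probability vector, sums to 1

# Hypothetical joint tallies: rows labeled by lights of B, columns by lights of A.
joint_tallies = np.array([[40, 10, 25],
                          [35, 60, 15],
                          [25, 30, 60]])
# Divide each column by its column total to get p(B_i | A_j).
B_given_A = joint_tallies / joint_tallies.sum(axis=0, keepdims=True)

print(p_B)                        # e.g. [0.512 0.301 0.187]
print(B_given_A.sum(axis=0))      # each column sums to 1
```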

The Law of Total Probability

Suppose instead we'd sent stuff from a single source $S$ into $A$ and then into $B$, kept tallies of the lights, and then divided the whole matrix by the total amount of stuff we'd sent in. E.g.:

$\begin{pmatrix}p(B_{0}A_{0}) &p(B_{0}A_{1}) &p(B_{0}A_{2}) \\ p(B_{1}A_{0}) &p(B_{1}A_{1}) &p(B_{1}A_{2}) \\p(B_{2}A_{0}) &p(B_{2}A_{1}) &p(B_{2}A_{2}) \\\end{pmatrix}$

Then the sum of each row $i$ of this matrix would give us probabilities $p(B_{i})$, and the sum of each column $j$ would give us the probabilities $p(A_{j})$, for this situation. If we divide each column by its column sum, which is $p(A_{j})$, we'd obtain the conditional probability matrix. Thus, we can interpret our present matrix's entries as: $p(B_{i}|A_{j})p(A_{j})$. In other words, $p(B_{i}A_{j}) = p(B_{i}|A_{j})p(A_{j})$.

If we sum each row, we obtain: $p(B_{i}) = \sum_{j} p(B_{i}|A_{j})p(A_{j})$. This is known as the "Law of Total Probability." In matrix notation: $\vec{b} = [B|A]\vec{a}$, where $\vec{a}$ is the probability vector for $A$'s lights given the source; $[B|A]$ is the conditional probability matrix for $B$'s lights given $A$'s lights; and $\vec{b}$ is the probability vector for $B$'s lights to blink after the stuff from $S$ has gone into $A$ and then into $B$. Naturally, if we know the conditional probability matrices for each pair in a chain of boxes, we can calculate probabilities for the lights of the final box, using the probabilities for the lights of the initial box.
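As a minimal sketch (Python/NumPy, with hypothetical numbers), the law of total probability in matrix form is just a matrix-vector product:

```python
import numpy as np

B_given_A = np.array([[0.7, 0.2, 0.1],       # [B|A]: columns sum to 1
                      [0.2, 0.5, 0.3],
                      [0.1, 0.3, 0.6]])
a = np.array([0.5, 0.3, 0.2])                # probabilities for A's lights

b = B_given_A @ a                            # law of total probability
print(b, b.sum())                            # b is again a probability vector
```

For a chain of boxes, you'd just keep multiplying conditional probability matrices on the left.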

Self Conditional Probability Matrices

Next, we can characterize a box $B$ in its own terms by looking at its self conditional probability matrix, in other words, the conditional probability matrix corresponding to sending stuff into the same box twice: $p(B_{i}|B_{j})$, or $[B|B]$. We'll call the columns of this matrix the output vectors of $B$.

Completely Reliable Boxes

Certain boxes have a special property, namely, that $[B|B] = I$, the identity matrix. In other words, if you send stuff into such a box $B$ and the $i^{th}$ light blinks, and then you direct the stuff into another such box $B$, then the $i^{th}$ light will always blink again. To wit, if you repeat the box, you'll always get the same result. We'll call these reliable boxes. In general, it might be that boxes are somewhat reliable, in that some outcomes are reliable, but not others.

The Dimension

To each kind of stuff we can associate a number $d$, its dimension. In general, stuff can light up a box with any number of lights: but each box is only compatible with stuff if the stuff has the right dimension. To determine the dimension, sift through all the boxes compatible with a given kind of stuff: you'll find there will be a maximum number of lights $d$ such that if a compatible box has more than $d$ lights, it can't be reliable. In other words, the dimension $d$ (of some stuff) is the maximum number of lights possible on a reliable box compatible with that stuff.

Informational Completeness

In general, if we repeatedly send stuff from two different sources into the same box, and keep track of their two probability vectors, we might get the same probabilities for each. In that case, the box can't tell the two sources of stuff apart. We'll call a box informationally complete, however, if each kind of stuff gives you a signature vector of probabilities, so that, having estimated the probabilities, you'll never confuse one source for another. For a box to be informationally complete, it must have at least $d^2$ lights, where $d$ is the dimension of the stuff the box is compatible with. Since a reliable box compatible with that stuff can have at most $d$ lights, a reliable box can't be informationally complete, and an informationally complete box can't be reliable. Finally, the more outcomes a box has, in principle, the less disturbing it may be to the stuff that goes through it.

Bias

A box may be biased or unbiased: in other words, some of the box's lights may be more intrinsically likely or unlikely to blink independently of the stuff you throw into it. To determine the bias, throw stuff from many different sources, completely at random, into the box, and collect the probability vector for the lights. We'll call it $\vec{c}$, the bias vector. If this vector is the completely even probability vector, e.g. $\begin{pmatrix} \frac{1}{n} \\ \frac{1}{n} \\ \vdots \end{pmatrix}$ for $n$ lights, then the box is unbiased. This means that if we're completely uncertain about what went in, then we're completely uncertain about what light will blink. If we get some other probability vector, however, then $\vec{c}$ tells us how the box is biased toward different outcomes. Either way, it'll be the case that $[B|B]\vec{c}_{B} = \vec{c}_{B}$.

The Metric

Finally, for each box, we can construct a metric. Take the self conditional probability matrix $[B|B]$, and the bias vector $\vec{c}_{B}$, and multiply each row of the matrix by $d\vec{c}_{B}$, where $d$ is the dimension. The resulting matrix $G_{B}$ will be a symmetric matrix, as will its inverse $G_{B}^{-1}$. We'll call $G_{B}^{-1}$ the metric.

Note that if a box is informationally overcomplete (being informationally complete, but having more than $d^2$ outcomes), then $G$ won't have a proper inverse. Instead take the pseudo-inverse: using the singular value decomposition, we can write $G = UDV$, where $U$ and $V$ are orthogonal matrices (whose inverse is their transpose), and $D$ is a diagonal matrix with the "singular values" along the diagonal. $D$ will have $d^2$ non-zero entries: take the reciprocals of them to form $D^{-1}$. Then: $G^{-1} = V^{T}D^{-1}U^{T}$.
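Here's a minimal sketch of the construction in Python (NumPy), using the $d=2$ SIC numbers that appear later in the text as a concrete check; `np.linalg.pinv` computes exactly the SVD-based pseudo-inverse described above, and reduces to the ordinary inverse when $G$ is invertible:

```python
import numpy as np

d = 2
# Self conditional probability matrix [B|B] of the d=2 SIC (see below):
BB = np.full((4, 4), 1/6); np.fill_diagonal(BB, 1/2)
c = np.full(4, 1/4)                  # bias vector of this unbiased 4-light box

G = BB * (d * c)                     # scale entry (i, j) by d * c_j
G_inv = np.linalg.pinv(G)            # SVD pseudo-inverse (= ordinary inverse here)
print(np.allclose(G_inv, 6*np.eye(4) - 1))    # True: diag 5, off-diagonal -1
```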

If you have an informationally complete box, each kind of compatible stuff can be characterized by a unique set of probabilities; but if you have an informationally overcomplete box, there will in general be multiple probability vectors which characterize the stuff, which, however, lead to equivalent predictions.

The Transition Probability

Suppose we have two sources $S_{a}$ and $S_{b}$, each characterized by dimension $d$, and an informationally complete box $R$ with $d^2$ lights. We collect the probabilities $\vec{a}$ and $\vec{b}$ for these two sources with respect to the box $R$. We could ask the question: Suppose I send stuff $a$ into any box: what's the probability that stuff $b$ will come out?

The metric provides us with the answer: $p_{a \rightarrow b} = p_{b \rightarrow a} = \vec{b}G_{R}^{-1}\vec{a}$.

In general, the actual probability that $b$-stuff comes out of a given box will depend on the number of lights on that box and its bias, but it will be proportional to the number above. But if $\vec{b}G_{R}^{-1}\vec{a} = 0$, then we can say that if $a$-stuff goes in, we should predict that $b$-stuff will never come out, and vice versa, for any box. If this is the case, we'll say $a$ and $b$ are mutually exclusive. If a box is completely reliable, then all its outcome vectors are mutually exclusive.

Any box will determine a metric in this way. If the box is informationally complete, then the transition probability determined in terms of its metric will always agree with the transition probability determined using any other informationally complete box's metric. E.g., $\vec{b}_{B}G_{B}^{-1}\vec{a}_{B} = \vec{b}_{C}G_{C}^{-1}\vec{a}_{C}$ for all informationally complete boxes $B$ and $C$.

Purity

We can consider $p_{a \rightarrow a} = \vec{a}G_{B}^{-1}\vec{a}$. Since this is itself a probability, $0 \leq \vec{a}G_{B}^{-1}\vec{a} \leq 1$.

If $p_{a \rightarrow a} = 1$, then we'll call $\vec{a}$ a pure probability vector. If $p_{a \rightarrow a} < 1$, we'll call it a mixed probability vector. Mixed probability vectors can always be written as weighted sums (convex combinations) of pure vectors.

The idea is that a pure vector characterizes one kind of stuff, but a mixed vector characterizes uncertainty about which kind of stuff you're dealing with. For example, if you, behind my back, fed stuff into $B$ from source $S_{a}$ $\frac{1}{3}$ of the time and stuff from source $S_{b}$ $\frac{2}{3}$ of the time, and I recorded the probabilities, I'd get a mixed probability vector: $\vec{s} = \frac{1}{3}\vec{a} + \frac{2}{3} \vec{b}$.

Similarly, we have, on the one hand, pure boxes and, on the other hand, mixed boxes. The output vectors of pure boxes (the columns of $[A|A]$) are all pure. Another way to check if a box is pure is to see whether the diagonal of $[A|A]$ equals $d\vec{c}_{A}$, where $\vec{c}_{A}$ is the bias vector.

In the case of a pure box, if you see the $i^{th}$ light blink, you should wager that the stuff afterwards will be characterized by the $i^{th}$ output vector of $A$. We shall come to the case of mixed boxes later.

Passive Reference Frame Switches

We've seen how, by means of the metric provided by an informationally complete box (which we'll call a reference box), we can calculate the transition probability between two kinds of stuff, which is applicable for all boxes.

Similarly, we can calculate the probabilities for the outcomes of any box in terms of a) the probabilities on the reference box, and b) the conditional probabilities between the reference box and the box in question.

Suppose we have a reference box $R$ and another box $E$, and we have the probabilities for some stuff coming out of a source $S$ on the reference box: $\vec{r}$. We also have the conditional probabilities for outcomes of $E$ given outcomes of $R$: $[E|R]$. Finally, we have the self conditional probability matrix for the reference box with itself: $[R|R]$. We invert this matrix: $[R|R]^{-1}$.

By the law of total probability, we can calculate the probability for one of $E$'s lights to blink if stuff characterized by $\vec{r}$ undergoes an $R$ box, and then an $E$ box. It's: $[E|R]\vec{r}$.

On the other hand, using $[R|R]^{-1}$, we can calculate the probability for one of $E$'s lights to blink supposing that stuff characterized by $\vec{r}$ goes right into $E$: $[E|R][R|R]^{-1}\vec{r}$.

Since $[R|R]\big([R|R]^{-1}\vec{r}\big) = \vec{r}$, we can think of $[R|R]^{-1}\vec{r}$ as the probability vector $\vec{r}$ "pulled back" to before the $R$ box was performed, such that going into another $R$ box would recover the original probabilities. From this extended vantage point, "what would have been $\vec{r}$" instead enters the $E$ box, characterized in terms of its relationship to $R$, giving us the probabilities for $E$'s lights to blink in terms of $R$'s, without assuming that the $R$ measurement was actually performed. We'll call this a passive transformation. We can think of it as a modification of the law of total probability.

Read $\vec{e}=[E|R][R|R]^{-1}\vec{r}$ as "what would have given probabilities $\vec{r}$ on $R$ gives probabilities $\vec{e}$ on $E$."
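A minimal sketch in Python (NumPy), using the $d=2$ SIC as $R$ and the $[Z|R]$ matrix that appears later in the text as the other box:

```python
import numpy as np

RR = np.full((4, 4), 1/6); np.fill_diagonal(RR, 1/2)   # [R|R] for the d=2 SIC
ER = np.array([[1, 1/3, 1/3, 1/3],                     # [E|R]; here E = Z (see below)
               [0, 2/3, 2/3, 2/3]])

r = np.array([1/2, 1/6, 1/6, 1/6])      # probabilities for R's lights

e = ER @ np.linalg.inv(RR) @ r          # "what would have given r on R gives e on E"
print(e)                                # [1. 0.]: this r corresponds to E's "up" outcome
```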

Active Transformations: In Between Boxes

In between two boxes, it may happen that stuff changes. You can capture this change using pairs of identical reference boxes $R$, and thus calculate the result of an active transformation on probability vectors. The basic principle is that what can happen in between two identical boxes is dual to reference frame switching between two different boxes.

Direct some stuff into a reference box $R$, let it transform, and afterwards send the stuff into $R$ again. Collect conditional probabilities. In a sense, we're making a new box, consisting of whatever happens after the first $R$ box, including the second $R$ box. We'll call it $R^{\leftarrow}$. So we'll end up with $[R^{\leftarrow}| R]$.

If stuff transforms, and then goes into a box, you'll get the same effect as if the original stuff went into a box that's been transformed in the opposite direction (i.e. its outcome vectors have all been transformed in the opposite direction). For instance, hold out your thumb, pointed up. You can either a) turn it to the right, or b) turn your head to the left. With respect to the relationship between your head and your thumb, it comes to the same effect.

Thus we can think of $R^{\leftarrow}$ as $R$ transformed in the opposite way to the stuff. Thus: $\vec{p}_{R}^{\rightarrow} = [R^\leftarrow |R][R|R]^{-1}\vec{p}_{R}$. This gives us the probabilities for the stuff with respect to $R$ after the transformation, but before the second $R$ box. This is an active transformation.

In other words, the way we should update our probability vector after a transformation, $\vec{p}_{R}^\rightarrow$, can be thought of as: "what would have given probabilities $\vec{r}$ on $R$ instead goes into $ R^{\leftarrow}$, a box evolved in the opposite direction, and gives probabilities $\vec{p}_{R}^\rightarrow$". In this sense, an active transformation is like "would" in reverse.

Reversing transformations

You may recall Bayes' Rule: $p(A|B) = \frac{p(B|A)p(A)}{p(B)}$.

Translated into matrix terms: $[A|B] = [B|A]^{T} \circ |\vec{c}_{A}\rangle\langle\frac{1}{\vec{c}_{B}}|$, where $\vec{c}_{A}$ is the bias vector for $A$, and similarly $\vec{c}_{B}$ is the bias vector for $B$. $|\rangle\langle|$ denotes the outer product, and $\circ$ denotes the entry-wise matrix product. Naturally, this simplifies to $[B|A] = [A|B]^{T}$ in the case of unbiased boxes with the same number of lights.
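Here's a minimal sketch in Python (NumPy) checking the matrix form of Bayes' rule, starting from a hypothetical joint distribution so that the conditionals and biases are all consistent:

```python
import numpy as np

# Hypothetical joint probabilities p(A_i, B_j): rows = lights of A, columns = lights of B.
joint = np.array([[0.10, 0.05, 0.15],
                  [0.20, 0.10, 0.05],
                  [0.05, 0.20, 0.10]])

c_A = joint.sum(axis=1)                  # bias vector of A
c_B = joint.sum(axis=0)                  # bias vector of B
B_given_A = (joint / c_A[:, None]).T     # [B|A]: columns labeled by A's lights
A_given_B = joint / c_B[None, :]         # [A|B]: columns labeled by B's lights

# Bayes' rule in matrix form: [A|B] = [B|A]^T, entrywise times outer(c_A, 1/c_B)
bayes = B_given_A.T * np.outer(c_A, 1/c_B)
print(np.allclose(bayes, A_given_B))     # True
```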


Symmetrically Informationally Complete Boxes

When it comes to reference measurements, the most beautiful kind are the most symmetrical: those with exactly $d^2$ pure elements such that the transition probabilities between them are all equal. They're called SIC's.

For $d=2$, we'd have:

$[R|R] = \left[\begin{matrix}\frac{1}{2} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6}\\\frac{1}{6} & \frac{1}{2} & \frac{1}{6} & \frac{1}{6}\\\frac{1}{6} & \frac{1}{6} & \frac{1}{2} & \frac{1}{6}\\\frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{2}\end{matrix}\right]$

The columns are the output vectors of $R$. If you do $R$ twice, half the time you'll get the same answer; the other half of the time, you'll get one of the other three answers, each with probability $\frac{1}{6}$.

We have:

$[R|R]^{-1} = \left[\begin{matrix}\frac{5}{2} & - \frac{1}{2} & - \frac{1}{2} & - \frac{1}{2}\\- \frac{1}{2} & \frac{5}{2} & - \frac{1}{2} & - \frac{1}{2}\\- \frac{1}{2} & - \frac{1}{2} & \frac{5}{2} & - \frac{1}{2}\\- \frac{1}{2} & - \frac{1}{2} & - \frac{1}{2} & \frac{5}{2}\end{matrix}\right]$

Recall how to invert a matrix: place the identity matrix to the right of your matrix:

$\left[\begin{matrix}\frac{1}{2} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6}\\\frac{1}{6} & \frac{1}{2} & \frac{1}{6} & \frac{1}{6}\\\frac{1}{6} & \frac{1}{6} & \frac{1}{2} & \frac{1}{6}\\\frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{2}\end{matrix}\right] \left[\begin{matrix}1 & 0 & 0 & 0\\0 & 1 & 0 & 0\\0 & 0 & 1 & 0\\0 & 0 & 0 & 1\end{matrix}\right]$

Then perform elementary row operations across the whole "augmented" matrix: swap rows, multiply or divide a row by a constant, add or subtract a multiple of a row to a row, until you get the identity matrix on the left: then $[R|R]^{-1} $ will be on the right.
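If you'd rather not row-reduce by hand, here's a minimal NumPy check that the displayed inverse is right, and that it matches the general-$d$ formulas given just below:

```python
import numpy as np

d = 2
n = d * d
RR = np.full((n, n), 1/(d*(d+1))); np.fill_diagonal(RR, 1/d)

RR_inv = np.linalg.inv(RR)
expected = np.full((n, n), -1/d); np.fill_diagonal(expected, (d*(d+1) - 1)/d)
print(np.allclose(RR_inv, expected))        # True: diag 5/2, off-diagonal -1/2
print(np.allclose(RR @ RR_inv, np.eye(n)))  # True
```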

Notice that a SIC is unbiased:

$\left[\begin{matrix}\frac{1}{2} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6}\\\frac{1}{6} & \frac{1}{2} & \frac{1}{6} & \frac{1}{6}\\\frac{1}{6} & \frac{1}{6} & \frac{1}{2} & \frac{1}{6}\\\frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{2}\end{matrix}\right]\left[\begin{matrix}\frac{1}{4}\\\frac{1}{4}\\\frac{1}{4}\\\frac{1}{4}\end{matrix}\right] = \left[\begin{matrix}\frac{1}{4}\\\frac{1}{4}\\\frac{1}{4}\\\frac{1}{4}\end{matrix}\right] $

Thus we can get $G_{R}$ by multiplying the rows of $[R|R]$ by $2\left[\begin{matrix}\frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4}\end{matrix}\right]$.

$G_{R} = \left[\begin{matrix}\frac{1}{4} & \frac{1}{12} & \frac{1}{12} & \frac{1}{12}\\\frac{1}{12} & \frac{1}{4} & \frac{1}{12} & \frac{1}{12}\\\frac{1}{12} & \frac{1}{12} & \frac{1}{4} & \frac{1}{12}\\\frac{1}{12} & \frac{1}{12} & \frac{1}{12} & \frac{1}{4}\end{matrix}\right]$ and $G_{R}^{-1} =\left[\begin{matrix}5 & -1 & -1 & -1\\-1 & 5 & -1 & -1\\-1 & -1 & 5 & -1\\-1 & -1 & -1 & 5\end{matrix}\right] $.

Notice that for each of these four matrices, the off diagonal entries are equal. In general, for any $d$, the relevant matrices for a SIC are:

$[R|R]_{i=j} = \frac{1}{d}, \quad [R|R]_{i\neq j} = \frac{1}{d(d+1)}$

$[R|R]^{-1}_{i=j} = \frac{d(d+1)-1}{d}, \quad [R|R]^{-1}_{i\neq j} = -\frac{1}{d}$

$(G_{R})_{i=j} = \frac{1}{d^2}, \quad (G_{R})_{i\neq j} = \frac{1}{d^2(d+1)}$

$(G^{-1}_{R})_{i=j} = d(d+1)-1, \quad (G^{-1}_{R})_{i\neq j} = -1$

As a consequence, we can make some dramatic simplifications:

$p_{a \rightarrow b} = \vec{a}G_{R}^{-1}\vec{b} = [d(d+1)-1]\sum_{i} a_{i}b_{i} - \sum_{i\neq j} a_{i}b_{j}$.

In the case of $d=2$:

$p_{a \rightarrow b} = 5\sum_{i} a_{i}b_{i} - \sum_{i\neq j} a_{i}b_{j}$.

So we can say, when $p_{a \rightarrow b} = 0$:

$5\sum_{i} a_{i}b_{i} = \sum_{i\neq j} a_{i}b_{j}$.

In other words, the transition probability is $0$ if: when you send $a$-stuff and $b$-stuff into the reference measurement $R$, you're likely to get different answers five times more often than you get the same answer. In other words, if the probability of different lights blinking when you send $a$ and $b$ into a SIC is five times the probability of the same lights blinking, then you ought to assign probability $0$ for $a$ transitioning into $b$ or vice versa, for any box.

Of course, when we're talking about the probabilities of the same or different lights blinking, we're talking about the probabilities irrespective of the ordering of the outcomes, in other words, the tensor product of probability vectors.
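A minimal NumPy check that the simplified $d=2$ expression agrees with $\vec{b}G_{R}^{-1}\vec{a}$, taking two SIC outcome vectors as the hypothetical $\vec{a}$ and $\vec{b}$:

```python
import numpy as np

G_inv = 6*np.eye(4) - 1              # d=2 SIC metric inverse: diag 5, off-diag -1
a = np.array([1/2, 1/6, 1/6, 1/6])   # two pure SIC probability vectors
b = np.array([1/6, 1/2, 1/6, 1/6])

direct = b @ G_inv @ a
same = np.sum(a * b)                             # sum_i a_i b_i
different = np.sum(np.outer(a, b)) - same        # sum_{i != j} a_i b_j
simplified = 5*same - different
print(direct, simplified)            # both 1/3: two SIC outcome states never exclude each other
```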

The Tensor Product

We could write $\vec{a} G_{R}^{-1} \vec{b} = (\vec{a} \otimes \vec{b}) \cdot \vec{G}_{R}^{-1}$, where $\otimes$ is the tensor product and $\vec{G}_{R}^{-1}$ is the vectorized version of $G_{R}^{-1}$ (all the rows of the matrix laid next to each other as a vector).

For intuition about the tensor product: suppose each box had three lights, and you recorded the following four runs of outcomes from the two boxes.

$$ \begin{matrix} A & B \\ 1 & 1 \\ 1 & 2 \\ 2 & 2 \\ 3 & 3 \end{matrix} $$

For $A$, we have $\frac{2}{4}$ of the time $1$; $\frac{1}{4}$ of the time $2$; and $\frac{1}{4}$ of the time $3$. For $B$, we have $\frac{1}{4}$ of the time $1$, $\frac{2}{4}$ of the time $2$, and $\frac{1}{4}$ of the time $3$.

We can then consider $\vec{a} \otimes \vec{b}$: $\begin{pmatrix} \frac{2}{4} \\ \frac{1}{4} \\ \frac{1}{4} \end{pmatrix} \otimes \begin{pmatrix} \frac{1}{4} \\ \frac{2}{4} \\ \frac{1}{4} \end{pmatrix} = \begin{pmatrix} \frac{2}{16} \\ \frac{4}{16} \\ \frac{2}{16} \\ \frac{1}{16} \\ \frac{2}{16} \\ \frac{1}{16} \\ \frac{1}{16} \\ \frac{2}{16} \\ \frac{1}{16} \end{pmatrix}$.

On the other hand, we can get the same result directly from the table of outcomes, by considering them in such a way that their ordering doesn't matter.

Pair each actual outcome of $A$ with each actual outcome of $B$: you'll get a data set of 16 pairs: $11, 12, 12, 13; 11, 12, 12, 13; 21, 22, 22, 23; 31, 32, 32, 33$. Then, calculating the probabilities of each possible pair from this list, we get $\frac{2}{16}$ of the time 11, $\frac{4}{16}$ of the time $12$, $\dots$, just the same as in the tensor product $\vec{a} \otimes \vec{b}$ above.
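The same bookkeeping in a minimal NumPy sketch: the Kronecker product of the two estimated vectors agrees with the probabilities tallied over all 16 order-free pairings of the recorded outcomes:

```python
import numpy as np

A_outcomes = [1, 1, 2, 3]            # recorded lights, as in the table above
B_outcomes = [1, 2, 2, 3]

a = np.bincount(A_outcomes, minlength=4)[1:] / len(A_outcomes)   # [2/4, 1/4, 1/4]
b = np.bincount(B_outcomes, minlength=4)[1:] / len(B_outcomes)   # [1/4, 2/4, 1/4]

# Pair every recorded A outcome with every recorded B outcome: 16 pairs in all.
counts = np.zeros((3, 3))
for i in A_outcomes:
    for j in B_outcomes:
        counts[i-1, j-1] += 1

print(np.allclose(counts.flatten() / 16, np.kron(a, b)))   # True
```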

The Geometry of SIC's

Returning to the transition probability for a SIC, because it must be between $0$ and $1$, we have:

$0 \leq [d(d+1)-1]\sum_{i} a_{i}b_{i} - \sum_{i\neq j} a_{i}b_{j} \leq 1$

$$ 0 \leq d(d+1)\sum_{i} a_{i}b_{i} - \sum_{i} a_{i}b_{i} - \sum_{i \neq j} a_{i}b_{j} \leq 1 $$
$$ 0 \leq d(d+1)\sum_{i} a_{i}b_{i} - \sum_{i,j} a_{i}b_{j} \leq 1 $$
$$ 0 \leq d(d+1)\sum_{i} a_{i}b_{i} - 1 \leq 1 $$
$$ \frac{1}{d(d+1)} \leq \sum_{i} a_{i}b_{i} \leq \frac{2}{d(d+1)} $$
$$ \frac{1}{d(d+1)} \leq \vec{a} \cdot \vec{b} \leq \frac{2}{d(d+1)} $$

In particular, $\vec{a} \cdot \vec{a} \leq \frac{2}{d(d+1)} $, so that we have $|\vec{a}| \leq \sqrt{\frac{2}{d(d+1)}}$. In other words, in addition to $\sum_{i} a_{i} = 1$, we have $\sum_{i} a_{i}^2 \leq \frac{2}{d(d+1)}$ constraining our SIC probability vectors. In the case of a pure vector, we have equality: the length of any pure probability vector is a constant fixed by the dimension. Therefore the pure vectors live on the surface of a sphere of that radius, and mixed vectors live within the sphere.

Indeed, this sphere is a subset of the space of probabilities, the latter called the probability simplex. For $d=2$, this sphere is inscribed in the probability simplex, and just touches it. In higher dimensions, the sphere bubbles out of the edges of the simplex, so that not all of the sphere is in the probability simplex. The outcome vectors of a SIC form a mini-simplex whose vertices are pure states that lie on the surface of the pure sphere in the part of the sphere contained by the probability simplex. One implication of all this is that a SIC probability vector won't have an element that exceeds $\frac{1}{d}$, and the total number of zero entries can't be greater than $\frac{d(d-1)}{2}$.

If $R$ is a SIC, we can also simplify our modified law of total probability, i.e., our rule for a passive reference frame switch, from $\vec{e} = [E|R][R|R]^{-1}\vec{r}$ to:

$ p(E_{i}) = \sum_{j} [(d+1)p(R_{j}) - \frac{1}{d}]p(E_{i}|R_{j})$.

In the special case where $E$ is a reliable box, having mutually exclusive outcomes, this becomes $\vec{e}= (d+1)[E|R]\vec{r} - \vec{1}$, where $\vec{1}$ is the vector of all ones.

Informational Completeness with Reliable Boxes

For a given dimension $d$, there are informationally complete boxes with at least $d^2$ elements. These single boxes can be used as reference boxes to compute probabilities for what will happen if the stuff goes into any other box. But because they are informationally complete, they must be unreliable. On the other hand, one can achieve informational completeness using reliable boxes, but this requires sending stuff from the same source into multiple reliable boxes, whose probabilities all together will characterize the stuff.

For example, in the case of $d=2$, consider three boxes we'll call $X, Y$, and $Z$. Each has two lights, which we'll denote $\uparrow$ and $\downarrow$. Their metrics are all the identity matrix: they are reliable boxes. We can characterize them with reference to a SIC in $d=2$, which has $4$ outcomes.

For example:

$ [Z|R] = \left[\begin{matrix}1 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\0 & \frac{2}{3} & \frac{2}{3} & \frac{2}{3}\end{matrix}\right]$ , $[R|Z]=\left[\begin{matrix}\frac{1}{2} & 0\\\frac{1}{6} & \frac{1}{3}\\\frac{1}{6} & \frac{1}{3}\\\frac{1}{6} & \frac{1}{3}\end{matrix}\right]$

$ [Y|R] = \left[\begin{matrix}\frac{1}{2} & \frac{1}{2} & \frac{1}{\sqrt{6}} + \frac{1}{2} & \frac{1}{2} - \frac{1}{\sqrt{6}}\\\frac{1}{2} & \frac{1}{2} & \frac{1}{2} - \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} + \frac{1}{2}\end{matrix}\right]$ , $[R|Y] = \left[\begin{matrix}\frac{1}{4} & \frac{1}{4}\\\frac{1}{4} & \frac{1}{4}\\\frac{\sqrt{6}}{12} + \frac{1}{4} & \frac{1}{4} - \frac{\sqrt{6}}{12}\\\frac{1}{4} - \frac{\sqrt{6}}{12} & \frac{\sqrt{6}}{12} + \frac{1}{4}\end{matrix}\right]$

$[X|R]=\left[\begin{matrix}\frac{1}{2} & \frac{\sqrt{2}}{3} + \frac{1}{2} & \frac{1}{2} - \frac{\sqrt{2}}{6} & \frac{1}{2} - \frac{\sqrt{2}}{6}\\\frac{1}{2} & \frac{1}{2} - \frac{\sqrt{2}}{3} & \frac{\sqrt{2}}{6} + \frac{1}{2} & \frac{\sqrt{2}}{6} + \frac{1}{2}\end{matrix}\right]$, $[R|X] = \left[\begin{matrix}\frac{1}{4} & \frac{1}{4}\\\frac{\sqrt{2}}{6} + \frac{1}{4} & \frac{1}{4} - \frac{\sqrt{2}}{6}\\\frac{1}{4} - \frac{\sqrt{2}}{12} & \frac{\sqrt{2}}{12} + \frac{1}{4}\\\frac{1}{4} - \frac{\sqrt{2}}{12} & \frac{\sqrt{2}}{12} + \frac{1}{4}\end{matrix}\right]$

So: the columns of $[Z|R]$ tell us the probabilities that a $Z$ box will give $\uparrow$ or $\downarrow$ for each of the SIC outcome vectors. Given some probability vector with respect to $R$, we can calculate the probabilities for $Z$ outcomes via $[Z|R][R|R]^{-1}\vec{r}$.

In contrast, the columns of $[R|Z]$ characterize the stuff that comes out of the $Z$ box with reference to $R$: outcome vectors $Z_{\uparrow}$ and $Z_{\downarrow}$. (In this case, things are aligned so that one of the outcome vectors of $Z$, in fact, corresponds to one of the SIC outcome vectors.)

We can calculate the probabilities for $R$'s lights to blink given a vector of $Z$ probabilities with:

$[R|Z][Z|Z]^{-1}\vec{z} = [R|Z]\vec{z}$

Since the box is reliable, $[Z|Z] = I$. Thus we can use the unmodified law of total probability to calculate $R$ probabilities from $Z$ probabilities: no woulds necessary.
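A minimal NumPy check of this section, using the $[Z|R]$ and $[R|Z]$ matrices above: the modified rule $[Z|R][R|R]^{-1}\vec{r}$ agrees with the SIC shortcut $(d+1)[Z|R]\vec{r} - \vec{1}$, while going from $Z$ probabilities to $R$ probabilities needs no correction at all:

```python
import numpy as np

d = 2
RR = np.full((4, 4), 1/6); np.fill_diagonal(RR, 1/2)
ZR = np.array([[1, 1/3, 1/3, 1/3],
               [0, 2/3, 2/3, 2/3]])
RZ = np.array([[1/2, 0],
               [1/6, 1/3],
               [1/6, 1/3],
               [1/6, 1/3]])

r = np.array([0.4, 0.2, 0.2, 0.2])     # some hypothetical SIC probabilities
z1 = ZR @ np.linalg.inv(RR) @ r        # modified law of total probability
z2 = (d + 1) * (ZR @ r) - 1            # SIC shortcut for a reliable box
print(np.allclose(z1, z2))             # True

z = np.array([0.7, 0.3])               # hypothetical probabilities for Z's lights
print(RZ @ z)                          # R probabilities: plain law of total probability
```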

Mutually Unbiased Reliable Boxes

$X$, $Y$, and $Z$ have a special property. If we look at the conditional probability matrices $[X|Y], [X|Z], [Y|Z]$, and so on, we'll find:

$[X|Y] = [X|R][R|R]^{-1}[R|Y] = \left[\begin{matrix}\frac{1}{2} & \frac{1}{2}\\\frac{1}{2} & \frac{1}{2}\end{matrix}\right]$.

In other words, if something comes out of an $X$ box, and you send it through another $X$ box, it'll always give the same result. But if you instead send the stuff through an $X$ box and then a $Y$ box, you'll get $Y_{\uparrow}$ or $Y_\downarrow$ with equal probability. The same goes if you send the stuff through a $Z$ box and then a $Y$ box, or any other pair. Thus, these three reliable boxes, each associated with two mutually exclusive outcomes, are also unbiased or complementary with respect to each other.

The Expected Value

Suppose we assigned a valuation, a number to each outcome of our three boxes $X, Y, Z$: $\{\uparrow, \downarrow\} \rightarrow \{1,-1\}$, so that each set of lights is associated with a "weight vector" $\vec{\lambda} = \left[ \begin{matrix}1 \\ -1 \end{matrix}\right] $. We can then assign an expected value for $X$ in terms of our reference probabilities.

$\langle X \rangle = 1\cdot p(X_\uparrow) + (-1)\cdot p(X_\downarrow) = \vec{\lambda}[X|R][R|R]^{-1}\vec{r}$
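A minimal NumPy sketch of the expectation value, using the $[X|R]$ matrix above and a SIC outcome vector as the hypothetical stuff:

```python
import numpy as np

s2 = np.sqrt(2)
XR = np.array([[1/2, 1/2 + s2/3, 1/2 - s2/6, 1/2 - s2/6],
               [1/2, 1/2 - s2/3, 1/2 + s2/6, 1/2 + s2/6]])
RR = np.full((4, 4), 1/6); np.fill_diagonal(RR, 1/2)
lam = np.array([1, -1])                        # weights for up / down

r = np.array([1/2, 1/6, 1/6, 1/6])             # first SIC outcome vector (Bloch +z)
exp_X = lam @ XR @ np.linalg.inv(RR) @ r
print(exp_X)                                   # ~0: "+z" stuff has <X> = 0
```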

It turns out that stuff with $d=2$ can also be fully characterized by the vector of expectation values $( \langle X \rangle,\langle Y \rangle,\langle Z \rangle )$. In fact, just as in the SIC representation, pure states live on the surface of a sphere, now in three dimensions, and mixed states live in the interior of the sphere. Each box $X, Y,$ or $Z$ has two outcome vectors, and in each case, the two outcome vectors correspond to antipodal points on the sphere, fixing three orthogonal axes.

Moreover, if you calculate the $(\langle X\rangle, \langle Y \rangle, \langle Z \rangle)$ vectors corresponding to our SIC outcome vectors, you'll find that they make a tetrahedron that fits snugly in the sphere. It's worth mentioning, then, that the SIC box for a given dimension is not unique: in this case, there's a whole sphere's worth of them, since rotating the tetrahedron as a whole won't change the angles between the vectors. Thus if you find two SIC's in $d=2$ that give different probabilities for the same stuff, they must be related by a 3D rotation.

Geometrically, if you have two kinds of stuff to which you've assigned 3-vectors $\vec{a}_{xyz}$ and $\vec{b}_{xyz}$, you can calculate the transition probability between them via $p_{a \rightarrow b} = \frac{1}{2}(1 + \vec{a}_{xyz} \cdot \vec{b}_{xyz})$. This means that if two points are antipodal on the sphere, they have 0 probability of transitioning from one to another.

Finally, the $X, Y, Z$ coordinates can be given in terms of our SIC probabilities:

$\left[\begin{matrix}\langle X\rangle \\ \langle Y \rangle \\ \langle Z \rangle\end{matrix}\right]=\left[\begin{matrix}\sqrt{2} \left(2 p_{2} - p_{3} - p_{4}\right) \\ \sqrt{6} \left(p_{3} - p_{4}\right) \\ 3 p_{1} - p_{2} - p_{3} - p_{4}\end{matrix}\right]$
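A minimal NumPy check that the four SIC outcome vectors land on the vertices of a regular tetrahedron, and that the sphere formula $\frac{1}{2}(1 + \vec{a}_{xyz}\cdot\vec{b}_{xyz})$ reproduces $\vec{a}G_{R}^{-1}\vec{b}$:

```python
import numpy as np

def p_to_xyz(p):
    """Map a d=2 SIC probability vector to (<X>, <Y>, <Z>), per the formula above."""
    p1, p2, p3, p4 = p
    return np.array([np.sqrt(2)*(2*p2 - p3 - p4),
                     np.sqrt(6)*(p3 - p4),
                     3*p1 - p2 - p3 - p4])

RR = np.full((4, 4), 1/6); np.fill_diagonal(RR, 1/2)
G_inv = 6*np.eye(4) - 1

verts = np.array([p_to_xyz(col) for col in RR.T])   # xyz of the 4 SIC outcome vectors
print(np.round(verts @ verts.T, 3))   # 1 on the diagonal, -1/3 off: a regular tetrahedron

a, b = RR[:, 0], RR[:, 1]
print(a @ G_inv @ b, 0.5*(1 + p_to_xyz(a) @ p_to_xyz(b)))   # both 1/3
```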

Time Reversal (for SIC's)

Suppose I have a second SIC rotated around the $X$ axis 90 degrees from the first, so that $[0,0,1]$ goes to $[0,1,0]$. If I collect the conditional probabilities for outcomes of the rotated SIC given outcomes of the original SIC, I'd end up with:

$[R^\prime|R] = \left[\begin{matrix}\frac{1}{4} & \frac{1}{4} & \frac{\sqrt{6}}{12} + \frac{1}{4} & \frac{1}{4} - \frac{\sqrt{6}}{12}\\\frac{1}{4} & \frac{17}{36} & \frac{5}{36} - \frac{\sqrt{6}}{36} & \frac{\sqrt{6}}{36} + \frac{5}{36}\\\frac{1}{4} - \frac{\sqrt{6}}{12} & \frac{\sqrt{6}}{36} + \frac{5}{36} & \frac{11}{36} & \frac{\sqrt{6}}{18} + \frac{11}{36}\\\frac{\sqrt{6}}{12} + \frac{1}{4} & \frac{5}{36} - \frac{\sqrt{6}}{36} & \frac{11}{36} - \frac{\sqrt{6}}{18} & \frac{11}{36}\end{matrix}\right]$

It's worth observing, like many of the matrices we've dealt with, that this is a stochastic matrix, in fact a doubly stochastic matrix: its rows and columns all sum to 1. (A right stochastic matrix has rows which sum to 1, and a left stochastic matrix has columns that sum to 1). The point of stochastic matrices is that they preserve the fact that probabilities must sum to 1.

As we've noted, we can calculate the probabilities with respect to the rotated SIC via: $\vec{r}^\prime = [R^\prime|R][R|R]^{-1}\vec{r}$.

We can also consider the opposite rotation, its time reverse: e.g. a rotation from $[0,1,0]$ to $[0,0,1]$, by considering $[R|R^\prime]$. You'll get:

$[R|R^\prime] = \left[\begin{matrix}\frac{1}{4} & \frac{1}{4} & \frac{1}{4} - \frac{\sqrt{6}}{12} & \frac{\sqrt{6}}{12} + \frac{1}{4}\\\frac{1}{4} & \frac{17}{36} & \frac{\sqrt{6}}{36} + \frac{5}{36} & \frac{5}{36} - \frac{\sqrt{6}}{36}\\\frac{\sqrt{6}}{12} + \frac{1}{4} & \frac{5}{36} - \frac{\sqrt{6}}{36} & \frac{11}{36} & \frac{11}{36} - \frac{\sqrt{6}}{18}\\\frac{1}{4} - \frac{\sqrt{6}}{12} & \frac{\sqrt{6}}{36} + \frac{5}{36} & \frac{\sqrt{6}}{18} + \frac{11}{36} & \frac{11}{36}\end{matrix}\right]$

Notice that $[R|R^\prime] = [R^\prime|R]^{T}$. As we know, this will be true if you're using an unbiased reference box.

Interestingly, for general unbiased boxes, even though $[R|R] = [R^\prime | R^\prime ]$, we have:

$[R|R^\prime] [R^\prime|R^\prime]^{-1} \neq \left( [R^\prime|R][R|R]^{-1} \right)^{T}$.

For a SIC, however, the two sides are equal.
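A minimal NumPy check of that claim for the $d=2$ SIC, hard-coding the $[R^\prime|R]$ matrix above:

```python
import numpy as np

s6 = np.sqrt(6)
RpR = np.array([
    [1/4,          1/4,           1/4 + s6/12,   1/4 - s6/12],
    [1/4,          17/36,         5/36 - s6/36,  5/36 + s6/36],
    [1/4 - s6/12,  5/36 + s6/36,  11/36,         11/36 + s6/18],
    [1/4 + s6/12,  5/36 - s6/36,  11/36 - s6/18, 11/36]])
RR = np.full((4, 4), 1/6); np.fill_diagonal(RR, 1/2)     # [R|R] = [R'|R']

print(np.allclose(RpR.sum(axis=0), 1), np.allclose(RpR.sum(axis=1), 1))  # doubly stochastic

RRp = RpR.T                      # [R|R'] = [R'|R]^T, since the reference is unbiased
lhs = RRp @ np.linalg.inv(RR)    # [R|R'][R'|R']^{-1}
rhs = (RpR @ np.linalg.inv(RR)).T
print(np.allclose(lhs, rhs))     # True: for a SIC the two sides agree
```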


Entanglement

Next, we consider correlations between the lights of two boxes, where we keep track of the order in which we feed stuff into the two boxes. In other words, we keep count as we feed stuff into the $A$ box, and the same for $B$, so we can pair up the results:

$\left[\begin{matrix} & A & B \\ 1) & 1 & 2 \\ 2) & 3 & 0 \\ \vdots & \vdots & \vdots\end{matrix} \right]$

In this way, we can assign probabilities to each combination of outcomes of $A$ and $B$: $p(A_{1}B_{1}), p(A_{1}B_{2}), p(A_{1}B_{3}), \dots$, which we can treat as a joint probability vector $\vec{p}_{AB}$.

As we noted above, $\vec{p}_{A} \otimes \vec{p}_{B}$ gives us the probabilities over pairs of outcomes in such a way that the order doesn't matter. In the case that $\vec{p}_{AB}$ equals some $\vec{p}_{A} \otimes \vec{p}_{B}$, we'll call $\vec{p_{AB}}$ separable: the probabilities for individual outcomes of $A$ and $B$ aren't different from what you would have gotten if you hadn't kept track of the ordering of the outcomes at all, i.e., if you'd taken them independently of temporal ordering. But in general, $\vec{p}_{AB}$ can represent correlations in the case that the individual outcomes of $A$ and $B$ depend on each other.

Rearrange $\vec{p}_{AB}$ into a matrix $\hat{p_{AB}}$, such that the rows are labeled with lights of $A$ and columns are labeled with lights of $B$. If you sum each row, you'll get a vector $\vec{p}_{A|AB}$; and if you sum each column, you'll get a vector $\vec{p}_{B|AB}$. These are the "partial" probability vectors for what went into $A$ (on the left) and $B$ (on the right). You can use $\vec{p}_{A|AB}$ to calculate probabilities for the stuff on the left, and you can use $\vec{p}_{B|AB}$ to calculate probabilities for the stuff on the right, considered independently of each other.

Nicely, $G_{AB} = G_{A} \otimes G_{B}$, and if you have two vectors $\vec{a}$ and $\vec{b}$, their joint state with respect to $A \otimes B$ will be $\vec{a} \otimes \vec{b}$. Moreover, $[A\otimes B|A\otimes B] = [A|A] \otimes [B|B]$. Note, however, that if $A$ and $B$ are informationally overcomplete, then $G^{-1}_{AB} \neq G^{-1}_{A} \otimes G^{-1}_{B}$ and $[A\otimes B|A\otimes B]^{-1} \neq [A|A]^{-1} \otimes [B|B]^{-1}$.

Using the joint metric, we can calculate $\vec{p}_{AB} G^{-1}_{AB} \vec{p}_{AB}$, the self transition probability. Recall that this is $1$ if the joint probability vector is pure. Similarly, $\vec{p}_{A|AB}G^{-1}_{A}\vec{p}_{A|AB}$ and $\vec{p}_{B|AB}G^{-1}_{B}\vec{p}_{B|AB}$ will tell us if the partial probability vectors are also pure. If the overall joint probability vector and the partial probability vectors are all pure, then the joint probability vector must be separable: the outcomes of $A$ and $B$ won't be correlated in their ordering. If, however, the joint probability vector is pure while any of the partial probability vectors is mixed, then the joint vector is entangled. In other words, what appears as one kind of stuff, considered as a whole, has parts for which it is uncertain what kind of stuff it is: i.e., there are correlations across the whole, yet freedom in the parts.

We can of course consider the tensor product of many boxes. For example, if we characterize stuff with three boxes $A, B, C$, to work with $A$'s partial vector, you'll want to reshape $\vec{p}_{ABC}$ so it has as many rows as $A$'s lights, and as many columns as the product of the number of $B$ and $C$'s lights, so that the sum of the rows will give you $\vec{p}_{A|ABC}$, and so forth.

Example: An "Anticorrelated" Joint Probability Vector

For example, consider this joint probability vector with respect to the tensor product of two $d=2$ SIC's: $A$ and $B$.

$\hat{p_{AB}} = \left[\begin{matrix}0 & \frac{1}{12} & \frac{1}{12} & \frac{1}{12}\\\frac{1}{12} & 0 & \frac{1}{12} & \frac{1}{12}\\\frac{1}{12} & \frac{1}{12} & 0 & \frac{1}{12}\\\frac{1}{12} & \frac{1}{12} & \frac{1}{12} & 0\end{matrix}\right], \vec{p}_{A|AB} = \left[ \begin{matrix} \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \end{matrix}\right], \vec{p}_{B|AB} = \left[ \begin{matrix} \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \end{matrix}\right]$

The two partial states are states of maximal ignorance: even probabilities for all the outcomes. Nevertheless: if you got outcome $A_{i}$, you should assign probabilities to outcomes $B_{j}$ using the $i^{th}$ row of $\hat{p_{AB}}$, normalized. Similarly, if you got outcome $B_{j}$, you should assign probabilities to outcomes of $A_{i}$ using the $j^{th}$ column of $\hat{p_{AB}}$, normalized. We can see, then, that the outcomes of the two boxes are anticorrelated, in the sense that, if you got outcome $A_{i}$, then you should assign $0$ probability to $B_{i}$, and an even chance ($\frac{1}{3}$ each) to the others. In other words: the same lights never blink on $A$ and $B$.
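A minimal NumPy check that this joint vector is entangled in the sense just described: it's pure with respect to $R \otimes R$, while its partial vectors are maximally mixed:

```python
import numpy as np

G_inv = 6*np.eye(4) - 1                      # d=2 SIC metric inverse
p_hat = (np.ones((4, 4)) - np.eye(4)) / 12   # the anticorrelated joint matrix
p_joint = p_hat.flatten()                    # rows laid side by side

G_inv_joint = np.kron(G_inv, G_inv)          # metric inverse for R (x) R
print(p_joint @ G_inv_joint @ p_joint)       # ~1: the joint vector is pure

p_left = p_hat.sum(axis=1)                   # partial vector for the stuff on the left
p_right = p_hat.sum(axis=0)                  # partial vector for the stuff on the right
print(p_left @ G_inv @ p_left)               # 0.5: maximally mixed (the floor for d=2)
```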

This is a special case of a more general rule. Suppose we have some joint stuff characterized by $\vec{p}_{AB}$; and after we throw the stuff on the left into some other box, we find out that we should assign probability vector $\vec{a}$ to it. How should we update our expectations for $\vec{b}$ on the right?

Simply: $\vec{b} = \frac{\hat{p_{AB}}G_{A}^{-1}\vec{a}}{\vec{a}G^{-1}_{A}\vec{p}_{A|AB}} $.

The denominator is just the transition probability for the partial vector on the left and the final probability vector $\vec{a}$.

Symmetrically, if we end up with $\vec{b}$ on the right, we'll have on the left: $\vec{a} = \frac{\vec{b}G^{-1}_{B} \hat{p_{AB}}}{\vec{b}G^{-1}_{B}\vec{p}_{B|AB}}$.

If you like, in terms of the joint state, we have:

$\hat{p_{AB}} \rightarrow \frac{\hat{p_{AB}}G^{-1}_{A}\vec{a}\vec{a}^{T}}{\vec{a}G^{-1}_{A}\vec{p}_{A|AB}}$

$\hat{p_{AB}} \rightarrow \frac{\vec{b}\vec{b}^{T}G^{-1}_{B}\hat{p_{AB}}}{\vec{b}G^{-1}_{B}\vec{p}_{B|AB}}$

Returning to our $X, Y, $ and $Z$ boxes, as noted, we have the relationship:

$\left[\begin{matrix}\langle X\rangle \\ \langle Y \rangle \\ \langle Z \rangle\end{matrix}\right]=\left[\begin{matrix}\sqrt{2} \left(2 p_{2} - p_{3} - p_{4}\right) \\ \sqrt{6} \left(p_{3} - p_{4}\right) \\ 3 p_{1} - p_{2} - p_{3} - p_{4}\end{matrix}\right]$

We can represent this as a matrix: $T = \left[\begin{matrix}0 & 2 \sqrt{2} & - \sqrt{2} & - \sqrt{2}\\0 & 0 & \sqrt{6} & - \sqrt{6}\\3 & -1 & -1 & -1\end{matrix}\right]$, which takes SIC probability vectors to $X, Y, Z$ coordinates: $\vec{xyz} = T\vec{p}_{xyz}$.

Note that $TT^{T} = 12I$, so we can define a right inverse for $T$: $T^{-1} = \frac{1}{12}T^{T}$, which satisfies $TT^{-1} = I$. We'll use $T^{-1}$ to go from $X,Y,Z$ coordinates to SIC probability vectors.

We end up with: $T^{-1}\vec{xyz} = \left[\begin{matrix}\frac{z}{4}\\\frac{\sqrt{2} x}{6} - \frac{z}{12}\\- \frac{\sqrt{2} x}{12} + \frac{\sqrt{6} y}{12} - \frac{z}{12}\\- \frac{\sqrt{2} x}{12} - \frac{\sqrt{6} y}{12} - \frac{z}{12}\end{matrix}\right]$.

But note that $\sum_{i} [T^{-1}\vec{xyz}](i) = 0$. We can fix this via: $\vec{p}_{xyz} = T^{-1}\vec{xyz} + \left[\begin{matrix}\frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \\ \frac{1}{4} \end{matrix}\right]$ . Now we have $\sum_{i} p_{xyz}(i) = 1$.

Thus we can express the SIC probabilities as: $\vec{p}_{xyz} = \left[\begin{matrix}\frac{z}{4} + \frac{1}{4}\\\frac{\sqrt{2} x}{6} - \frac{z}{12} + \frac{1}{4}\\- \frac{\sqrt{2} x}{12} + \frac{\sqrt{6} y}{12} - \frac{z}{12} + \frac{1}{4}\\- \frac{\sqrt{2} x}{12} - \frac{\sqrt{6} y}{12} - \frac{z}{12} + \frac{1}{4}\end{matrix}\right]$.

You can check that: $\left[\begin{matrix}0 & 2 \sqrt{2} & - \sqrt{2} & - \sqrt{2}\\0 & 0 & \sqrt{6} & - \sqrt{6}\\3 & -1 & -1 & -1\end{matrix}\right]\left[\begin{matrix}\frac{z}{4} + \frac{1}{4}\\\frac{\sqrt{2} x}{6} - \frac{z}{12} + \frac{1}{4}\\- \frac{\sqrt{2} x}{12} + \frac{\sqrt{6} y}{12} - \frac{z}{12} + \frac{1}{4}\\- \frac{\sqrt{2} x}{12} - \frac{\sqrt{6} y}{12} - \frac{z}{12} + \frac{1}{4}\end{matrix}\right] = \left[ \begin{matrix} x \\ y \\ z\end{matrix} \right]$

Then if we have our entangled anticorrelated joint probability vector, and we assign $\vec{p}_{xyz}$ on the left, then we should infer on the right:

$\frac{\hat{p_{AB}}G^{-1}_{A}\vec{p_{xyz}}}{\vec{p_{xyz}}G_{A}^{-1}\vec{p}_{A|AB}} = \left[\begin{matrix}\frac{1}{4} - \frac{z}{4}\\- \frac{\sqrt{2} x}{6} + \frac{z}{12} + \frac{1}{4}\\\frac{\sqrt{2} x}{12} - \frac{\sqrt{6} y}{12} + \frac{z}{12} + \frac{1}{4}\\\frac{\sqrt{2} x}{12} + \frac{\sqrt{6} y}{12} + \frac{z}{12} + \frac{1}{4}\end{matrix}\right] = \vec{p}_{-xyz} $.

In other words, $[x, y, z]$ has been sent to $[-x, -y, -z]$.
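A minimal NumPy check of this calculation for a randomly chosen direction: convert $[x,y,z]$ to SIC probabilities, apply the update rule for the anticorrelated joint vector, and convert back:

```python
import numpy as np

T = np.array([[0, 2*np.sqrt(2), -np.sqrt(2), -np.sqrt(2)],
              [0, 0,             np.sqrt(6), -np.sqrt(6)],
              [3, -1,           -1,          -1]])
T_inv = T.T / 12

G_inv = 6*np.eye(4) - 1
p_hat = (np.ones((4, 4)) - np.eye(4)) / 12         # anticorrelated joint matrix
p_A = p_hat.sum(axis=1)                            # uniform partial vector on the left

xyz = np.random.randn(3); xyz /= np.linalg.norm(xyz)    # a random pure direction
p_xyz = T_inv @ xyz + 1/4                               # its SIC probability vector

p_right = (p_hat @ G_inv @ p_xyz) / (p_xyz @ G_inv @ p_A)   # update the right side
print(np.allclose(T @ p_right, -xyz))                       # True: [x,y,z] -> [-x,-y,-z]
```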

So: if a SIC tensor product box $A \otimes B$ would characterize a joint pair of stuff by this anticorrelated probability vector, then if we update on the left to $\vec{p}_{xyz}$, then we should update on the right to $\vec{p}_{-xyz}$: the antipodal point. Recalling that antipodal points have transition probability $0$, we predict that if we send the stuff on the left and right each through reliable boxes with two outcomes corresponding to $[x, y, z]$ and $[-x,-y,-z]$, then they will always be found pointing in opposite directions (although which is up and which is down is a coin flip)$-$for any choice of $x,y,z$.

This is remarkable in that the particular direction along which the vectors are always found opposite couldn't have been determined beforehand, since it depends on the choice of reliable box.

Correlations Between Reliable Boxes

Suppose we send our anticorrelated stuff into two $X$ boxes ($A$ and $B$), and collect joint probabilities. We'll get $\uparrow\downarrow$ or $\downarrow\uparrow$, each half the time, and ought to assign $\vec{p}_{AB} = [ \begin{matrix} 0 & \frac{1}{2} & \frac{1}{2} & 0 \end{matrix}]$.

Since $G^{-1}_{X\otimes X}= I$, we have: $\vec{p}_{AB}G^{-1}_{X\otimes X}\vec{p}_{AB} = \vec{p}_{AB} \cdot \vec{p}_{AB} = \frac{1}{2}$.

In other words, $\vec{p}_{AB}$ is not a pure vector with respect to $X \otimes X$. This is in contrast to the probability vector with respect to $R \otimes R$, where $R$ is informationally complete, which is pure. One consequence is that $\vec{p}_{AB}$ can't capture the full entanglement at play in this situation, although we can use it to reason about correlations with respect to $X$.

For example, we have $\hat{p}_{AB} = \begin{pmatrix} 0 & \frac{1}{2} \\ \frac{1}{2} & 0 \end{pmatrix}$, so $\vec{p}_{A|AB} = \vec{p}_{B|AB} = \begin{pmatrix} \frac{1}{2} \\ \frac{1}{2} \end{pmatrix}$. In other words, as we know, there's an even chance for each $X$ box to give $\uparrow$ or $\downarrow$.

Moreover, our formulas for how to update our probabilities for stuff on the left given stuff on the right, and vice versa, simplify:

$\vec{b} = \frac{\hat{p_{AB}}G_{A}^{-1}\vec{a}}{\vec{a}G^{-1}_{A}\vec{p}_{A|AB}} = \frac{\hat{p_{AB}}\vec{a}}{\vec{a} \cdot \vec{p}_{A|AB}}$

$\vec{a} = \frac{\vec{b}G^{-1}_{B} \hat{p_{AB}}}{\vec{b}G^{-1}_{B}\vec{p}_{B|AB}} = \frac{\vec{b}\hat{p_{AB}}}{\vec{b} \cdot \vec{p}_{B|AB}}$

In fact, let's unpack these formulas:

Since $\hat{p}_{AB} = p(X_{i}X_{j})$, we have:

$\vec{p}_{A|AB} = \sum_{j} p(X_{i}X_{j})$ and $\vec{p}_{B|AB} = \sum_{i} p(X_{i}X_{j})$.

Since $\vec{a} = p(X_{i})$ and $\vec{b} = p(X_{j})$, we have:

$\vec{a} \cdot \vec{p}_{A|AB} = \sum_{i,j} p(X_{i}X_{j})p(X_{i}) $, and $\vec{b} \cdot \vec{p}_{B|AB} = \sum_{i,j} p(X_{i}X_{j})p(X_{j})$.

So that:

$\vec{b} = \frac{\hat{p_{AB}}\vec{a}}{\vec{a} \cdot \vec{p}_{A|AB}} = p(X_{j}) = \frac{\sum_{i} p(X_{i}X_{j})p(X_{i})}{ \sum_{i,j} p(X_{i}X_{j})p(X_{i})}$

$\vec{a} = \frac{\vec{b} \hat{p_{AB}}}{\vec{b} \cdot \vec{p}_{B|AB}} = p(X_{i})= \frac{\sum_{j} p(X_{i}X_{j})p(X_{j})}{ \sum_{i,j} p(X_{i}X_{j})p(X_{j})}$

Indeed, if we update the probabilities on the left to $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, we ought to update the probabilities on the right to $\begin{pmatrix} 0 \\ 1\end{pmatrix}$.

Moreover, in the case we've been discussing, of perfect anticorrelation, we would collect the exact same probability vector for two $Y$ boxes or two $Z$ boxes.

But as we've seen, there's more going on under the surface here. The issue is that while we could use these vectors to calculate the probabilities for, say, $X$ measurements followed by $Y$ measurements, we couldn't use the $X$ probabilities to calculate the $Y$ probabilities in the case where we haven't done the $X$ measurement. In other words, these probability vectors on reliable boxes can capture correlations, but not the full entanglement, which as we've seen is intimately related to 3D geometry.

Non-Tensor Product Joint Boxes

Considered as a single box, $A \otimes B$ is compatible with stuff of dimension $d_{A} \times d_{B}$. And indeed, afterwards, we could send the output of $A \otimes B$ into any box compatible with stuff of dimension $d_{A} \times d_{B}$.

For instance, we could characterize the relationship between the tensor product of reference SIC's $R_{d} \otimes R_{d}$ and a reference SIC $R_{d^2}$. Whereas the outcome vectors of tensor product boxes are always separable, this won't be true for a general box compatible with dimension $d_{A} \times d_{B}$.


Opening the box

Finally, let's open up a box, and take a look inside.

Memory Stuff

It turns out that in each box $B$ with $n$ lights, there's a little source of $n$ dimensional stuff which we'll call memory stuff. The memory stuff interacts with the incoming stuff, before the latter exits the output port. The memory stuff then enters a reliable box $M$ with $n$ outcomes, and it's these outcomes which determine which light on the outer box blinks.

Indeed, if you delay putting the memory stuff into the reliable box $M$, even after the outgoing stuff has left, you'll find whatever outcome you finally get on the reliable box will be correlated with the outgoing stuff. If you collect a joint probability vector for the outgoing stuff and the memory stuff using a tensor product reference box, ideally a SIC, you'll find that their joint probability vector is entangled.

Update Rule for Mixed Boxes

We've said that if the diagonal of $[A|A]$ equals $d\vec{c}_{A}$, where $\vec{c}_{A}$ is the bias vector, then the box $A$ is a pure box. Otherwise, the box is a mixed box. If you have a pure box, and the $i^{th}$ light blinks, then you should update your probability vector to the $i^{th}$ outcome vector of $A$ (the $i^{th}$ column of $[A|A]$). This is just a special case of the general rule for updating partial vectors on the basis of entanglement.

Indeed, more generally, if you have a mixed box, and the $i^{th}$ light blinks, then you should employ the overall entangled joint probability vector of the memory and the stuff, condition on the $i^{th}$ light of $M$ on the left, and update your vector for the stuff on the right, according to the rule.

Alternatives

Specializing for simplicity to the case of a pure box $B$, after the interaction between the memory stuff and the incoming stuff, you'll assign an entangled joint probability vector such that: if you get the $i^{th}$ outcome of $M$, then you should update the probability vector for the stuff to the $i^{th}$ outcome vector of $B$. Thus you can infer from the lights what vector to assign to the outgoing stuff, as we already know.

Alternatively, however, you could send the memory stuff through some other pure box $Q$. Afterwards, you'll end up characterizing the memory stuff with an outcome vector of $Q$, and then you'll update your probability vector for the outgoing stuff using the general rule.

And vice versa: if you send the outgoing stuff through some other pure box $P$, and observe what light blinks, then from the joint state, you'll infer a particular probability vector for the memory stuff, and from that assign probabilities for outcomes of $M$. And rather than assigning probability 1 to a particular light blinking, as it would if you'd sent the stuff through a copy of the original box $B$, you'll get a spread of probabilities for outcomes of $M$.

To say it again, if you send the stuff into $B$, then from what light blinks you can infer with certainty the outcome of the $M$ box on the memory stuff. But if you send one (or the other) through a different box, there will be a degree of uncertainty on the other end.

Entangled Memories

What's more, you could arrange a whole sequence of boxes, but disconnect the interior $M$ boxes, and collect the memory stuff from each of them, careful not to disturb it. Stuff would enter boxes, interact with memory stuff, and thence traverse other boxes, interacting with other memory stuff. Afterwards, you gather up all the memory stuff from all the boxes and put it side by side: so far, you've delayed putting any of them into boxes. If you repeat this procedure many, many times and collect probabilities with respect to a reference box for the entire joint vector of the memories, then you'll find that the memories are all entangled with each other. And you can measure them (with their original reliable boxes) in any order you like, and you'll get a consistent story about what happened to the stuff at each box.

Then again, as we've seen, you could alternatively send the memories into whatever other boxes you like, and then infer whatever vectors for all the outgoing stuff. But this will preclude you from telling a story about what happened to the original stuff in terms of the original boxes, since for one thing, the stuff may be characterized by probability vectors that aren't necessarily outcome vectors of the (pure) boxes they went through$-$and now we have that, across time.

Entering the box

Suppose there's some memory stuff inside you, and when you see a light blink, this means that some stuff has interacted with your memory. In other words, place yourself inside the pure box $B$, as the memory. And imagine there's someone outside the box. This person saw you and the stuff go in, and they'll see you and the stuff come out. This outsider wants to predict how the two of you will respond to future boxes.

Suppose that being asked what you saw is effectively the $M$ box for you. And suppose the outsider asks you what you saw, and you say light $i$. Then they'll predict that the stuff will be characterized by the $i^{th}$ outcome vector of $B$. Conversely, if they send the stuff into a $B$ box, and observe light $i$, then they'll predict you'll definitely answer: light $i$.

On the other hand, if they send the stuff into some other pure box $P$, and update on an outcome vector, then they'll assign a spread of probabilities for what you'll say about what light you saw.

It's not unreasonable that this outsider will have some uncertainty about how you'll answer. You and the stuff had an "agreement" written, as it were, in terms of the outcomes of the original box $B$. What this means is that if the outsider respects this agreement, there's a way of asking you and the stuff questions such that the correspondence between your answers is strict. But there are other ways of asking questions, such that the other will appear to have some wiggle room in how they'll answer, the shape of which depends on the particular question, and the particular agreement. Indeed, if the person outside the box asks the "wrong" question of the stuff, there's no way you can "agree" with it: what would agreement mean, on an individual outcome basis, if you're asked question $M$, but the stuff is asked question $P$?

What's perhaps confusing, however, is that in repetitions of this scenario, you'll assign probabilities to outcomes of $B$ that depend entirely on the initial probability vector of the stuff. But on the other hand, in repetitions of the scenario, the outsider will assign probabilities for your answers (about which outcome of $B$ you observed), which will depend on the final probability vector of the stuff (after the $P$ box).

In other words, it makes a difference in the probabilities whether you place yourself first person or third person into the picture.

We could say: when the outsider assigns an entangled joint vector to you and the stuff, it's a recognition that you and the stuff have secured a particular relationship, which the outsider, beyond the horizon of the box, is not a part of. Nevertheless, the assignment of an entangled joint vector is a wager that if the outsider puts you in $M$ and/or the stuff in $B$, then they can with certainty predict how the one will respond from the other's answer; but if the stuff is sent through a different box $P$, you'll answer in a way that depends on $P$'s outcome, while necessarily surprising the outsider$-$despite them having incorporated all possible information into their predictions.

Of course, after sending the stuff into $P$, and supposing they can infer a probability vector for you, the outsider can then try to find a reliable box to send you through which has that vector as an outcome vector, and then they'd predict the outcome of that box with certainty, thus compensating for their earlier choice of $P$, which introduced uncertainty into your answer to $M$. But otherwise, in terms of $M$, there's just a fundamental ambiguity for the outsider. After all, for the outsider, if the stuff isn't characterized by an outcome vector of $B$, there's no more "right" answer in terms of $M$: the outsider can only predict a spread of probabilities, depending on how strange the new stuff is relative to the fact that if the stuff were hypothetically in an outcome state of $B$, they'd have certainty. It's as if: the outsider didn't respect the relationship between you and the stuff, and the price of doing so was paid in certainty. As if, because of their choice of box, there's something missing from the world of the outsider, a gap of subjunctivity, so that your answer must be to them as much an act of creation as the original light you saw blink.

