In this section, we provide the main result of this work, which is an operational interpretation of partial information decomposition.
4.1. PID via Sato’s Outer Bound
Researchers in the information theory community have made numerous efforts to identify a computable characterization of the capacity region of general broadcast channels (see the textbook [33] for a historical summary); yet, at this time, a complete solution is still elusive. Nevertheless, significant progress has been made toward this goal. In particular, Sato [34] provided an outer bound for the capacity region, which can be specialized to yield an upper bound on the sum-rate capacity of the general broadcast channel as follows:

$$R_1 + R_2 \;\le\; \min_{Q_{X,Y|T} \in \mathcal{Q}} \max_{P_T} I_Q(T; X, Y), \qquad (5)$$
where the set $\mathcal{Q}$ is defined as

$$\mathcal{Q} \;\triangleq\; \left\{ Q_{X,Y|T} \,:\, Q_{X|T} = P_{X|T},\; Q_{Y|T} = P_{Y|T} \right\},$$

i.e., the set of conditional distributions for which the marginal conditional distributions $P_{X|T}$ and $P_{Y|T}$ are preserved. The inner maximization is over the possible marginal distribution $P_T$ of the random variable $T$ in the alphabet $\mathcal{T}$. The form already bears a certain similarity to (4). Note that for channels on general alphabets (i.e., not necessarily optimized on a compact space), the maximization should be replaced by the supremum and the minimization by the infimum. Due to the minimax form, the meaning is not yet clear, but the max–min inequality (weak duality) implies that

$$\min_{Q_{X,Y|T} \in \mathcal{Q}} \max_{P_T} I_Q(T; X, Y) \;\ge\; \max_{P_T} \min_{Q_{X,Y|T} \in \mathcal{Q}} I_Q(T; X, Y) \;=\; \max_{P_T} \min_{Q \in \Delta_P} I_Q(T; X, Y),$$

where the equality is by the definition of $\mathcal{Q}$, and the set $\Delta_P$ is exactly the one defined in (4) with $Q_{T,X} = P_{T,X}$ and $Q_{T,Y} = P_{T,Y}$. The inner minimization of this form is exactly the same as the second term in (3). Though the max–min form does not yield a true upper bound on the sum-rate capacity, in the PID setting we consider, $P_T$ is always fixed; therefore, the max–min and min–max forms are in fact equivalent in this setting. The equivalence in mathematical forms does not fully explain the significance of this connection, and we will need to consider Sato's bound more carefully. Let us define the following quantity for notational simplicity:

$$C_{\mathrm{NC}}(P_T) \;\triangleq\; \min_{Q_{X,Y|T} \in \mathcal{Q}} I_Q(T; X, Y).$$
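To make the inner minimization concrete, the following is a minimal numerical sketch (our illustration, not code from the literature) that computes the second term in (3) for a small discrete distribution; the variable names and the choice of a generic SLSQP solver are assumptions of the sketch.

```python
# Sketch: compute min_{Q in Delta_P} I_Q(T; X, Y), the second term in (3),
# for a small discrete P_{T,X,Y}, optimizing over joint distributions Q
# whose pairwise marginals Q_{T,X} and Q_{T,Y} match those of P.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
nT, nX, nY = 2, 2, 2
P = rng.random((nT, nX, nY))
P /= P.sum()                                   # an arbitrary joint P_{T,X,Y}

def I_T_XY(q_flat):
    """Mutual information I_Q(T; X, Y) in nats for a flattened joint Q."""
    q = q_flat.reshape(nT, nX, nY).clip(1e-12)
    qT = q.sum(axis=(1, 2), keepdims=True)     # marginal Q_T
    qXY = q.sum(axis=0, keepdims=True)         # marginal Q_{X,Y}
    return float((q * np.log(q / (qT * qXY))).sum())

# Linear equality constraints Q_{T,X} = P_{T,X} and Q_{T,Y} = P_{T,Y}; one
# y-constraint per t is implied by the others and dropped to keep full rank.
A, b = [], []
for t in range(nT):
    for x in range(nX):
        m = np.zeros((nT, nX, nY)); m[t, x, :] = 1.0
        A.append(m.ravel()); b.append(P[t, x, :].sum())
    for y in range(nY - 1):
        m = np.zeros((nT, nX, nY)); m[t, :, y] = 1.0
        A.append(m.ravel()); b.append(P[t, :, y].sum())
A, b = np.array(A), np.array(b)

res = minimize(I_T_XY, P.ravel(), method="SLSQP",
               bounds=[(0.0, 1.0)] * P.size,
               constraints=[{"type": "eq", "fun": lambda q: A @ q - b}])
print("I_P(T;X,Y)          :", I_T_XY(P.ravel()))
print("min over Delta_P    :", res.fun)
print("synergy (difference):", I_T_XY(P.ravel()) - res.fun)
```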
Sato's outer bound was derived using the following argument. For a channel that has a single input signal $T$ and a single output signal $Y$ on a channel $P_{Y|T}$, Shannon's channel coding theorem [28] states that the channel capacity is given by $\max_{P_T} I(T; Y)$. Moreover, for any fixed distribution $P_T$, the rate $I(T; Y)$ is achievable, where the probability distribution $P_T$ represents the statistical signaling pattern of the underlying codes. Turning our attention back to the broadcast channel with a transition probability $P_{X,Y|T}$, if the receivers are allowed to cooperate fully and share the two output signals $X$ and $Y$, i.e., they become a single virtual receiver (see Figure 1b), then clearly the maximum rate achievable would be $\max_{P_T} I(T; X, Y)$. However, Sato further observed that the error probability of any code should not differ on any broadcast channel $Q_{X,Y|T} \in \mathcal{Q}$, even if the transition distribution $Q_{X,Y|T}$ is different from the true broadcast channel transition probability $P_{X,Y|T}$. This is because the channel outputs only depend on the marginal transition probabilities $P_{X|T}$ and $P_{Y|T}$, respectively, and the decoders only use their respective channel outputs to decode. Therefore, we can obtain an upper bound by choosing the worst channel configuration $Q_{X,Y|T} \in \mathcal{Q}$, i.e., the outer minimization in (5).
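In the notation above, the argument condenses into a single chain: any code for the true channel attains the same error probability on every $Q_{X,Y|T} \in \mathcal{Q}$, so the fully cooperative bound applies under each such $Q$, giving

$$R_1 + R_2 \;\le\; \max_{P_T} I_Q(T; X, Y) \quad \text{for every } Q_{X,Y|T} \in \mathcal{Q}, \qquad \text{hence} \qquad R_1 + R_2 \;\le\; \min_{Q_{X,Y|T} \in \mathcal{Q}} \max_{P_T} I_Q(T; X, Y).$$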
With the interpretation of Sato's upper bound above, it becomes clear that $C_{\mathrm{NC}}(P_T)$ is essentially an upper bound on the sum rate of the broadcast channel where the receivers are not allowed to cooperate, when the input signaling pattern is fixed to follow $P_T$. On the other hand, the quantity $I_P(T; X, Y)$ is the rate that can be achieved by allowing the two receivers to fully cooperate, also with $P_T$ being the input signaling pattern. In this sense, the synergistic information $S(T; X, Y)$ defined in (3) is a lower bound on the difference between the sum rate with full cooperation and that without any cooperation, with the input signaling pattern following $P_T$.
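Spelled out in symbols, with $C_{\mathrm{sum}}(P_T)$ denoting the exact non-cooperative sum-rate capacity under the signaling pattern $P_T$ (a symbol introduced here only for this display):

$$S(T; X, Y) \;=\; I_P(T; X, Y) - C_{\mathrm{NC}}(P_T) \;\le\; I_P(T; X, Y) - C_{\mathrm{sum}}(P_T),$$

where the right-hand side is the cooperative gain at $P_T$, and the inequality holds since $C_{\mathrm{NC}}(P_T) \ge C_{\mathrm{sum}}(P_T)$ by Sato's bound.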
This connection provides an operational interpretation of PID for general distributions. Essentially, synergistic information can be viewed as a surrogate for the cooperative gain. When this lower bound is in fact also achievable, $S(T; X, Y)$ would be exactly equal to the cooperative gain. In the corresponding learning setting, it is the difference between what can be inferred about $T$ by using $X$ and $Y$ jointly, and what can be inferred by using them in a non-cooperative manner. This indeed matches our expectations for the synergistic information. In the next subsection, we consider a special case when Sato's bound is indeed achievable, and the lower bound mentioned above becomes exact.
In one sense, this operational interpretation is quite intuitive, as explained above; on the other hand, it is also quite surprising. For example, in broadcast channels, a more general setup allows the transmitter to also send a common message to both receivers [35,36], in addition to the two individual messages to the two respective receivers. It would appear plausible to expect this generalized setting to be more closely connected to the PID setting, with the common message related to the common information, yet this turns out not to be the case here. Moreover, a dual communication problem studied in information theory is the multiple access channel (see, e.g., [28]), where two transmitters wish to communicate to the same receiver simultaneously. The readers may also wonder whether an operational meaning should be extracted on this channel, instead of on the broadcast channel. However, note that in the PID setting, we are inferring $T$ from $X$ and $Y$, which is similar to the decoding process in the broadcast channel, not in the multiple access channel. Moreover, in the multiple access channel, the two transmitters' inputs are always independent when the transmitters cannot cooperate, and this does not match the PID setting under consideration. Another seemingly related problem studied in the information theory literature is the common information between two random variables [37,38]; however, for the PID defined in [13], this approach also does not yield a meaningful interpretation.
4.2. Gaussian MIMO Broadcast Channel and Gaussian PID
One setting where a full capacity region characterization is indeed known is the Gaussian multiple-input multiple-output (MIMO) channel [32,39]. In the two-user Gaussian MIMO broadcast channel, the channel transition probabilities $P_{X|T}$ and $P_{Y|T}$ are given, with $T$ being the transmitter input variable, and $X$ and $Y$ the channel outputs at the two individual receivers. The channel is usually defined as follows:

$$X = H_1 T + N_1, \qquad Y = H_2 T + N_2,$$

where $H_1$ and $H_2$ are two channel matrices, the additive noise vector $N_1$ is independent of $T$, and similarly, $N_2$ is independent of $T$. For a fixed input signaling distribution $P_T$, the pairwise marginal distributions $P_{T,X}$ and $P_{T,Y}$ are well specified. Conversely, for any joint distribution $P_{T,X,Y}$, where the marginals $P_{T,X}$ and $P_{T,Y}$ are jointly Gaussian, respectively, we can represent their relation in the form above via a Gram–Schmidt orthogonalization. Note that the joint distribution of $(T, X, Y)$ is not fully specified here, as the noise vectors $N_1$ and $N_2$ are not necessarily jointly Gaussian, but can be dependent in a more sophisticated manner. The standard Gaussian MIMO broadcast problem usually specifies the noises as zero-mean Gaussian with certain fixed covariances, and there is also a covariance constraint on the transmitter's signaling $T$. The problem can be further simplified using certain linear transformations, as discussed below.
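As an illustration of this Gram–Schmidt step, the following sketch (our own; the covariance-block names are assumptions) recovers the channel matrix and noise covariance of one branch from a jointly Gaussian pairwise marginal; applying it to the $(T, Y)$ marginal yields $H_2$ and the covariance of $N_2$ in the same way.

```python
# Sketch: recover X = H1 @ T + N1, with N1 uncorrelated with T (hence
# independent of T in the Gaussian case), from the jointly Gaussian
# marginal of (T, X) specified by the blocks Sigma_T, Sigma_XT, Sigma_X.
import numpy as np

def channel_from_pairwise_marginal(Sigma_T, Sigma_XT, Sigma_X):
    """Gram-Schmidt / linear-MMSE decomposition of the (T, X) marginal."""
    H1 = Sigma_XT @ np.linalg.inv(Sigma_T)   # regression of X on T
    Sigma_N1 = Sigma_X - H1 @ Sigma_XT.T     # residual (noise) covariance
    return H1, Sigma_N1
```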
Let us assume the noise covariance matrices $\Sigma_1$ and $\Sigma_2$ are full rank for now (when $\Sigma_1$ and $\Sigma_2$ are not full rank, a limiting argument can be invoked to show the same conclusion holds), and in this case, it is clearly without loss of generality to assume that the $\Sigma_i$'s are in fact identity matrices, since otherwise we can perform receiver-side linear transforms to make them so, i.e., through a transformation based on the eigenvalue decomposition of $\Sigma_1$ and $\Sigma_2$, respectively. For the same reason, we can assume the input covariance $\Sigma_T$ is an identity matrix, through a linear transformation at the transmitter. These reductions to independent noise and independent channel input are often referred to as the transmitter precoding transformation and the receiver precoding transformation in the communication literature; see, e.g., [39].
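A compact sketch of these precoding reductions (our illustration; the shapes and names are assumptions, and all covariance matrices are assumed positive definite):

```python
# Sketch: whiten the two receivers' noises and the channel input via
# eigendecompositions, returning the effective channel matrices after
# the reductions (identity noise covariances, identity input covariance).
import numpy as np

def inv_sqrt_psd(S):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def precode(H1, H2, Sigma1, Sigma2, Sigma_T):
    W1, W2 = inv_sqrt_psd(Sigma1), inv_sqrt_psd(Sigma2)  # receiver side
    w, V = np.linalg.eigh(Sigma_T)
    A = V @ np.diag(w ** 0.5) @ V.T                      # transmitter side
    return W1 @ H1 @ A, W2 @ H2 @ A
```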
For the two-user Gaussian broadcast channel, the worst channel configuration problem we discussed in the general setting is essentially the least favorable noise problem considered by Yu and Cioffi [39] with the simplification above, where the noise relation between $N_1$ and $N_2$ needs to be identified for a channel that makes it the hardest to communicate. It was shown in [39] that the least favorable noise problem can be recast as an optimization problem:

$$\min_{\Sigma}\ \frac{1}{2} \log \frac{\det\!\left(H H^{\mathrm{T}} + \Sigma\right)}{\det(\Sigma)},$$

where $H = [H_1^{\mathrm{T}}, H_2^{\mathrm{T}}]^{\mathrm{T}}$ is the stacked channel matrix, and $\Sigma$ is the joint noise covariance matrix whose diagonal blocks are fixed to be identity matrices, when $\Sigma$ is nonsingular. It can be shown that the problem is convex.
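For concreteness, here is a direct numerical sketch of this minimization (our illustration; [39] develops a dedicated algorithm). It parameterizes $\Sigma$ by its off-diagonal block and relies on the objective diverging as $\Sigma$ approaches singularity, so an unconstrained search remains feasible; the "true" noise correlation block used at the end is hypothetical.

```python
# Sketch: least favorable noise after the whitening reductions. Minimize
#   f(B) = 1/2 * [log det(H H^T + Sigma(B)) - log det Sigma(B)]
# over the off-diagonal correlation block B, with
#   Sigma(B) = [[I, B], [B^T, I]].
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n1, n2, nt = 2, 2, 2
H = rng.standard_normal((n1 + n2, nt))   # stacked channel [H1; H2]
K = H @ H.T                              # identity input covariance assumed

def f(b_flat):
    B = b_flat.reshape(n1, n2)
    Sigma = np.block([[np.eye(n1), B], [B.T, np.eye(n2)]])
    if np.linalg.eigvalsh(Sigma).min() <= 0:
        return np.inf                    # Sigma not positive definite: reject
    _, logdet_S = np.linalg.slogdet(Sigma)
    _, logdet_KS = np.linalg.slogdet(K + Sigma)
    return 0.5 * (logdet_KS - logdet_S)  # = I(T; X, Y) in nats for this Sigma

res = minimize(f, np.zeros(n1 * n2), method="Nelder-Mead")
print("C_NC value (least favorable noise):", res.fun)

B_true = 0.3 * np.eye(n1, n2)            # hypothetical true noise correlation
print("synergistic information:", f(B_true.ravel()) - res.fun)
```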
Yu and Cioffi also showed that in this setting, Sato's upper bound is achievable, i.e., it is exactly the sum-rate capacity. Moreover, for any input signaling $P_T$ that is Gaussian-distributed, the corresponding sum rate $C_{\mathrm{NC}}(P_T)$ can be achieved on this broadcast channel through a more sophisticated scheme known as dirty-paper coding [40]. Therefore, when the pairwise marginals $P_{T,X}$ and $P_{T,Y}$ are jointly Gaussian, respectively, the synergistic information $S(T; X, Y)$ is exactly the cooperative gain of the corresponding Gaussian broadcast channel using this specific input signaling pattern $P_T$. The connection in the Gaussian setting is of particular interest, given the practical importance of the Gaussian PID, which was thoroughly explored in [41].