Shearer’s lemma

infoTheory
Published

02 10 2023

The Shearer’s lemma (Chung et al. 1986) is concerned with a generalization of the sub-additivity of the Shannon Entropy,

\[ H(X_1, \ldots, X_N) \; \leq \; H(X_1) + \ldots + H(X_N). \]

Instead, consider an integer \(t \geq 1\) and a family \(S_1, \ldots, S_K\) of subsets of \(\{1, \ldots, N\}\) such that any index \(1 \leq n \leq N\) appears in at least \(t\) of these subsets. Note that for a subset \(S_i = \{ \alpha_1, \ldots, \alpha_{r_i}\}\) with \(\alpha_1 < \ldots < \alpha_{r_i}\) we have

\[ \begin{align} H(X_{S_i}) &\equiv H(X_{\alpha_1}, \ldots, X_{\alpha_{r_i}})\\ &= H(X_{\alpha_1}) + H(X_{\alpha_2} | X_{\alpha_1}) + \ldots + H(X_{\alpha_{r_i}} | X_{\alpha_{r_i-1}}, \ldots, X_{\alpha_1} ) \\ &\geq H(X_{\alpha_1}) + H(X_{\alpha_2} | X_{1:(\alpha_2-1)}) + \ldots + H(X_{\alpha_{r_i}} | X_{1:\alpha_{r_i}-1}). \end{align} \tag{1}\]

Since each index appears in at least \(t\) of the subsets, summing Equation 1 over all the subset \(S_i\) yields

\[ \sum_{i=1}^K H(X_{S_k}) \geq t \, \sum_{i=1}^N H(X_i \, | X_{1:(i-1)}) = t \, H(X). \]

This means that the following inequality holds,

\[ H(X) \leq \frac{1}{t} \, \sum_{k=1}^K H(X_{S_k}) \]

Indeed, the standard sub-additivity property of the entropy corresponds to the set \(S_k = [k]\) for \(1 \leq k \leq N\) and \(t=1\).

Application: projection on hyperplanes

Consider a measurable set \(A \subset \mathbb{R}^n\) and call \(A_k\) the projection of \(A\) on the hyperplane \(\{x=(x_1, \ldots, x_n) \in \mathbb{R}^n \, : \, x_k=0\}\). A Theorem of Loomis and Whitney (Loomis and Whitney 1949) states that the lebesgue measure \(|A|\) of the set \(A\) satisfies

\[ |A| \; \leq \; \prod_{k=1}^n |A_k|^{1/(n-1)}. \]

In other words, if all the projections \(A_k\) of the set \(A\) are small then, necessarily, the set \(A\) itself is small. To proceed, one can approximate this set \(A\) with the union \(A_{\varepsilon}\) of small cubes of side \(\varepsilon\) centred on \(\varepsilon\, \mathbb{Z}^n\). If one can prove the statement for \(A^{[\varepsilon]}\), the results follows from a standard approximation argument (ie. outer measure). Now, each cube can be indexed with a \(n\)-uple of integers \((x_1, \ldots, x_n) \in \mathbb{Z}^n\), and one can consider the random variable \(X=(X_1, \ldots, X_n)\) that is uniformly distributed on the set of cubes coordinates. Because \(2^{H(X)} = |A^{[\varepsilon]}| / \varepsilon^n\) and \(2^{H(X_2, \ldots, X_n)} = |A_1^{[\varepsilon]}| / \varepsilon^n\) etc…, choosing the subsets \(S_i=[1:n] \setminus \{i\}\) and \(t = (n-1)\) in Shearer’s Lemma immediately gives the conclusion.

References

Chung, Fan RK, Ronald L Graham, Peter Frankl, and James B Shearer. 1986. “Some Intersection Theorems for Ordered Sets and Graphs.” Journal of Combinatorial Theory, Series A 43 (1). Academic Press: 23–37.
Loomis, Lynn H, and Hassler Whitney. 1949. “An Inequality Related to the Isoperimetric Inequality.”