Proof of the Joint Entropy Formula using Conditional Entropy

Goal: Prove that:

$$ H(X, Y) = H(X) + H(Y | X) $$

where \( H(X, Y) \) is the joint entropy of the pair \( (X, Y) \), \( H(X) \) is the entropy of \( X \), and \( H(Y | X) \) is the conditional entropy of \( Y \) given \( X \). All random variables are assumed discrete.

Step 1: Express Joint Entropy Using the Chain Rule

The joint entropy of \( X \) and \( Y \) is defined as:

$$ H(X, Y) = - \sum_{x \in X, y \in Y} p(x, y) \log p(x, y) $$
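As a quick numerical sanity check (not part of the proof), this definition can be evaluated directly on a small made-up joint distribution; the logs below are base 2, so entropies are in bits:

```python
import math

# A hypothetical 2x2 joint distribution p(x, y), chosen only for
# illustration; rows index x, columns index y, and all entries sum to 1.
p_xy = [[0.25, 0.25],
        [0.10, 0.40]]

def joint_entropy(p):
    """H(X, Y) = -sum over (x, y) of p(x, y) * log2 p(x, y), skipping zeros."""
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)

h_xy = joint_entropy(p_xy)  # about 1.861 bits for this table
```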

Now, using the fact that \( p(x, y) = p(x) p(y | x) \), we can rewrite this as:

$$ H(X, Y) = - \sum_{x \in X, y \in Y} p(x) p(y | x) \log \left( p(x) p(y | x) \right) $$

Using the logarithmic property \( \log(ab) = \log(a) + \log(b) \), we get:

$$ H(X, Y) = - \sum_{x \in X, y \in Y} p(x) p(y | x) \left( \log p(x) + \log p(y | x) \right) $$

Step 2: Split the Sum

Distribute the terms in the sum:

$$ H(X, Y) = - \sum_{x \in X, y \in Y} p(x) p(y | x) \log p(x) - \sum_{x \in X, y \in Y} p(x) p(y | x) \log p(y | x) $$

Step 3: Simplify the First Term

The first term is:

$$ - \sum_{x \in X, y \in Y} p(x) p(y | x) \log p(x) $$

Since \( p(x) \) does not depend on \( y \), we can factor it out:

$$ - \sum_{x \in X} p(x) \log p(x) \sum_{y \in Y} p(y | x) $$

Now, for each fixed \( x \), the inner sum \( \sum_{y \in Y} p(y | x) \) runs over the full conditional distribution of \( Y \) given \( X = x \). Since \( p(y | x) \) is a probability distribution in \( y \), it is normalized:

$$ \sum_{y \in Y} p(y | x) = 1 $$

So, the first term simplifies to:

$$ - \sum_{x \in X} p(x) \log p(x) \times 1 = - \sum_{x \in X} p(x) \log p(x) = H(X) $$
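This simplification can also be confirmed numerically: with a hypothetical marginal \( p(x) \) and conditional table \( p(y | x) \) (both made up for illustration), the full double sum of the first term matches \( H(X) \) computed directly:

```python
import math

# Hypothetical p(x) and p(y | x), for illustration (rows: x, columns: y).
p_x = [0.5, 0.5]
p_y_given_x = [[0.5, 0.5],
               [0.2, 0.8]]

# First term of the split sum: -sum_{x,y} p(x) p(y|x) log2 p(x).
first_term = -sum(px * pyx * math.log2(px)
                  for px, cond in zip(p_x, p_y_given_x)
                  for pyx in cond)

# Marginal entropy H(X) = -sum_x p(x) log2 p(x), computed directly.
h_x = -sum(px * math.log2(px) for px in p_x if px > 0)

assert math.isclose(first_term, h_x)
```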

Step 4: Simplify the Second Term

The second term is:

$$ - \sum_{x \in X, y \in Y} p(x) p(y | x) \log p(y | x) $$

Since \( p(x) \) does not depend on \( y \), we can factor it out of the sum over \( y \) to get:

$$ - \sum_{x \in X} p(x) \sum_{y \in Y} p(y | x) \log p(y | x) $$

By definition, the conditional entropy of \( Y \) given \( X \) is:

$$ H(Y | X) = - \sum_{x \in X} p(x) \sum_{y \in Y} p(y | x) \log p(y | x) $$

so the second term is exactly \( H(Y | X) \).
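The double-sum definition of conditional entropy translates directly into code; the marginal and conditional tables below are hypothetical, chosen only to exercise the formula (base-2 logs, so the result is in bits):

```python
import math

# Hypothetical marginal p(x) and conditional rows p(y | x), for illustration.
p_x = [0.5, 0.5]
p_y_given_x = [[0.5, 0.5],   # p(y | x = 0)
               [0.2, 0.8]]   # p(y | x = 1)

def conditional_entropy(p_x, p_y_given_x):
    """H(Y | X) = -sum_x p(x) * sum_y p(y | x) * log2 p(y | x)."""
    return -sum(px * sum(pyx * math.log2(pyx) for pyx in cond if pyx > 0)
                for px, cond in zip(p_x, p_y_given_x))

h_y_given_x = conditional_entropy(p_x, p_y_given_x)  # about 0.861 bits here
```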

Step 5: Combine the Results

Now we can combine the results from Step 3 and Step 4:

$$ H(X, Y) = H(X) + H(Y | X) $$

Conclusion: We have shown that:

$$ H(X, Y) = H(X) + H(Y | X) $$
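The identity can be verified end to end numerically: starting from a single hypothetical joint table, compute \( H(X, Y) \), the marginal \( H(X) \), and \( H(Y | X) \), and confirm the formula holds (base-2 logs throughout):

```python
import math

def h(probs):
    """Shannon entropy in bits of a probability vector, skipping zeros."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y); rows index x, columns index y.
p_xy = [[0.25, 0.25],
        [0.10, 0.40]]

# Joint entropy H(X, Y) over all cells of the table.
h_xy = h([p for row in p_xy for p in row])

# Marginal p(x) by summing rows, then H(X).
p_x = [sum(row) for row in p_xy]
h_x = h(p_x)

# H(Y | X) = sum_x p(x) * H(Y | X = x), with p(y | x) = p(x, y) / p(x).
h_y_given_x = sum(px * h([p / px for p in row])
                  for px, row in zip(p_x, p_xy) if px > 0)

assert math.isclose(h_xy, h_x + h_y_given_x)  # H(X, Y) = H(X) + H(Y | X)
```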