Goal: Prove that:
$$ H(X, Y) = H(X) + H(Y | X) $$
where \( H(X, Y) \) is the joint entropy of \( X \) and \( Y \), \( H(X) \) is the entropy of \( X \), and \( H(Y | X) \) is the conditional entropy of \( Y \) given \( X \).
Start from the definition of the joint entropy of \( X \) and \( Y \):
$$ H(X, Y) = - \sum_{x \in X, y \in Y} p(x, y) \log p(x, y) $$
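To make the definition concrete, here is a minimal numerical sketch. The joint table `p_xy`, the function name `joint_entropy`, and the choice of base-2 logarithms are illustrative assumptions, not part of the proof (the identity holds for any logarithm base).

```python
import math

# Hypothetical joint distribution p(x, y) over X = {0, 1}, Y = {0, 1};
# the numbers are made up for illustration and sum to 1.
p_xy = {
    (0, 0): 0.25, (0, 1): 0.25,
    (1, 0): 0.40, (1, 1): 0.10,
}

def joint_entropy(p_xy):
    """H(X, Y) = -sum over (x, y) of p(x, y) * log2 p(x, y), skipping zero entries."""
    return -sum(p * math.log2(p) for p in p_xy.values() if p > 0)

print(joint_entropy(p_xy))  # about 1.861 bits for this example table
```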
Now, using the fact that \( p(x, y) = p(x) p(y | x) \), we can rewrite this as:
$$ H(X, Y) = - \sum_{x \in X, y \in Y} p(x) p(y | x) \log \left( p(x) p(y | x) \right) $$
Using the logarithmic property \( \log(ab) = \log(a) + \log(b) \), we get:
$$ H(X, Y) = - \sum_{x \in X, y \in Y} p(x) p(y | x) \left( \log p(x) + \log p(y | x) \right) $$
Distributing the product over the sum and splitting it into two terms:
$$ H(X, Y) = - \sum_{x \in X, y \in Y} p(x) p(y | x) \log p(x) - \sum_{x \in X, y \in Y} p(x) p(y | x) \log p(y | x) $$
The first term is:
$$ - \sum_{x \in X, y \in Y} p(x) p(y | x) \log p(x) $$
Since neither \( p(x) \) nor \( \log p(x) \) depends on \( y \), both can be factored out of the sum over \( y \):
$$ - \sum_{x \in X} p(x) \log p(x) \sum_{y \in Y} p(y | x) $$
Now consider the inner sum \( \sum_{y \in Y} p(y | x) \). For each fixed \( x \), \( p(y | x) \) is a probability distribution over \( y \), so it sums to 1:
$$ \underline{\sum_{y \in Y} p(y | x) = 1} $$
So, the first term simplifies to:
$$ - \sum_{x \in X} p(x) \log p(x) \times 1 = - \sum_{x \in X} p(x) \log p(x) = H(X) $$
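As a quick check against the same hypothetical table used above, this first term can be computed by marginalizing out \( y \) first; `entropy_x` is again only an illustrative name.

```python
def entropy_x(p_xy):
    """H(X) = -sum_x p(x) * log2 p(x), where p(x) = sum_y p(x, y)."""
    p_x = {}
    for (x, _y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
    return -sum(p * math.log2(p) for p in p_x.values() if p > 0)

print(entropy_x(p_xy))  # exactly 1.0 bit here, since p(X=0) = p(X=1) = 0.5
```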
The second term is:
$$ - \sum_{x \in X, y \in Y} p(x) p(y | x) \log p(y | x) $$
Since \( p(x) \) does not depend on \( y \), we can factor it out of the sum over \( y \) to get:
$$ - \sum_{x \in X} p(x) \sum_{y \in Y} p(y | x) \log p(y | x) $$
This is the definition of the conditional entropy \( H(Y | X) \), so we have:
$$ H(Y | X) = - \sum_{x \in X} p(x) \sum_{y \in Y} p(y | x) \log p(y | x) $$
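The same quantity can be evaluated numerically for the hypothetical table above. Since \( p(x)\,p(y | x) = p(x, y) \), each term of the double sum can be weighted by the joint probability directly; the function name is, as before, just an illustrative assumption.

```python
def conditional_entropy_y_given_x(p_xy):
    """H(Y | X) = -sum_x p(x) sum_y p(y | x) * log2 p(y | x)."""
    p_x = {}
    for (x, _y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
    h = 0.0
    for (x, _y), p in p_xy.items():
        if p > 0:
            p_y_given_x = p / p_x[x]
            # p(x) * p(y | x) equals p(x, y), so weight each log term by p(x, y).
            h -= p * math.log2(p_y_given_x)
    return h

print(conditional_entropy_y_given_x(p_xy))  # about 0.861 bits for the example table
```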
Combining the simplified first and second terms, we obtain:
$$ H(X, Y) = H(X) + H(Y | X) $$
Conclusion: We have shown that:
$$ H(X, Y) = H(X) + H(Y | X) $$
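As a sanity check, the sketches above can be combined to confirm the identity numerically on the hypothetical example distribution; any valid joint distribution would do, up to floating-point error.

```python
h_xy = joint_entropy(p_xy)
h_x = entropy_x(p_xy)
h_yx = conditional_entropy_y_given_x(p_xy)

print(f"H(X, Y)         = {h_xy:.6f}")
print(f"H(X) + H(Y | X) = {h_x + h_yx:.6f}")
assert abs(h_xy - (h_x + h_yx)) < 1e-9  # the chain rule for entropy holds
```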