Conditional Distributions (measure theory)
Suppose $(\Omega, \Sigma_\Omega)$ is a measurable space equipped with a (probability) measure $\mu$. Data are defined by the values the measurable map $X(\omega)$ takes on. The data space relative to $X : \Omega \to \mathcal{X}$ is $(\mathcal{X}, \Sigma_X, \mu_X)$, where $\mu_X$ is the pushforward of $\mu$ by $X$, defined as $\mu_X := \mu \circ X^{-1}$. Similarly, the parameter space $(\Theta, \Sigma_\Theta, \mu_\theta)$ is defined relative to the measurable map $\theta : \Omega \to \Theta$.
A conditional (probability) distribution of $X$ given $\theta$ is a function $\mu_{X \mid \theta} : \Theta \times \Sigma_X \to [0, 1]$ such that
- for fixed $A_X \in \Sigma_X$, the map $\vartheta \mapsto \mu_{X \mid \theta}(\vartheta, A_X)$ is measurable in $\vartheta \in \Theta$, and
- for fixed $\vartheta \in \Theta$, the map $A_X \mapsto \mu_{X \mid \theta}(\vartheta, A_X)$ is a probability measure on $(\mathcal{X}, \Sigma_X)$.
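As a concrete illustration (a hypothetical example, not from the source), take $\Theta = \mathcal{X} = \mathbb{R}$ and let $\mu_{X \mid \theta}(\vartheta, \cdot)$ be the $N(\vartheta, 1)$ law. The two defining properties can then be spot-checked numerically; the helper names below (`Phi`, `mu_X_given_theta`) are invented for this sketch.

```python
# Sketch: the Gaussian family N(theta, 1) viewed as a conditional
# distribution mu_{X|theta}(theta, A), using only the standard library.
from math import erf, sqrt

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def mu_X_given_theta(theta, a, b):
    """N(theta, 1) probability of the interval A = (a, b)."""
    return Phi(b - theta) - Phi(a - theta)

# Property 1: with A = (-1, 1) fixed, this is a function of theta.
probs = [mu_X_given_theta(t, -1.0, 1.0) for t in (-2.0, 0.0, 2.0)]

# Property 2: with theta = 0 fixed, it behaves like a probability
# measure: total mass 1 and additivity over disjoint intervals.
total = mu_X_given_theta(0.0, -40.0, 40.0)
additive = (mu_X_given_theta(0.0, -1.0, 0.0)
            + mu_X_given_theta(0.0, 0.0, 1.0))
```

(Measurability in $\vartheta$ cannot be verified by evaluation, of course; the point is only that the same two-argument function is a function of $\vartheta$ for fixed $A_X$ and a measure for fixed $\vartheta$.)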
The existence of such a function is proved in Theorem 33.3 of Billingsley (2012). We list some properties of such a function next.
If there exists a measure $\lambda$ on $(\mathcal{X}, \Sigma_X)$ such that $\mu_{X \mid \theta}(\vartheta, \cdot) \ll \lambda$ for all $\vartheta \in \Theta$, then the Radon-Nikodym derivative (density function) of $\mu_{X \mid \theta}$ exists and is written as $d\mu_{X \mid \theta}/d\lambda = \rho_{X \mid \theta}$. In other words,

$$\mu_{X \mid \theta}(\vartheta, A_X) = \int_{A_X} \rho_{X \mid \theta}(x) \, d\lambda(x)$$

for every $\vartheta \in \Theta$ and for all $A_X \in \Sigma_X$.
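Continuing the hypothetical Gaussian example, one may take $\lambda$ to be Lebesgue measure and $\rho_{X \mid \theta}$ the $N(\vartheta, 1)$ density; the display above can then be checked by quadrature. The `integrate` helper below is a plain trapezoidal rule written for this sketch.

```python
# Check mu_{X|theta}(theta, A) = integral of rho_{X|theta} over A
# with respect to lambda = Lebesgue measure, for the N(theta, 1) family.
from math import erf, exp, pi, sqrt

def rho_X_given_theta(x, theta):
    """N(theta, 1) density with respect to Lebesgue measure."""
    return exp(-0.5 * (x - theta) ** 2) / sqrt(2.0 * pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def integrate(f, a, b, n=10_000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

theta, a, b = 0.5, -1.0, 1.0
kernel_value = Phi(b - theta) - Phi(a - theta)   # mu_{X|theta}(theta, A)
density_integral = integrate(lambda x: rho_X_given_theta(x, theta), a, b)
```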
The joint measure $\mu_{X, \theta}$ is defined as

$$\mu_{X, \theta}(A_X \times B_\Theta) = \int_{B_\Theta} \mu_{X \mid \theta}(\vartheta, A_X) \, d\mu_\theta(\vartheta).$$

Replacing the conditional distribution with its density function gives

$$\mu_{X, \theta}(A_X \times B_\Theta) = \int_{B_\Theta} \int_{A_X} \rho_{X \mid \theta}(x) \, d\lambda(x) \, d\mu_\theta(\vartheta).$$

By Tonelli's theorem, the version of Fubini's theorem for non-negative measurable maps, we can switch the order of integration:

$$\mu_{X, \theta}(A_X \times B_\Theta) = \int_{A_X} \int_{B_\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta) \, d\lambda(x). \tag{1}$$
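The interchange of integrals can be spot-checked numerically in a hypothetical model, say $\theta \sim N(0, 1)$ with density $\rho_\theta$ and $X \mid \theta \sim N(\vartheta, 1)$; here $d\mu_\theta(\vartheta) = \rho_\theta(\vartheta)\, d\vartheta$, so both sides become iterated Lebesgue integrals of a non-negative map. All helper names are invented for this sketch.

```python
# Tonelli spot-check: the joint mass mu_{X,theta}(A_X x B_Theta) computed
# with either order of integration agrees.  Model: theta ~ N(0, 1),
# X | theta ~ N(theta, 1); A_X = (-1, 1), B_Theta = (0, 2).
from math import exp, pi, sqrt

def npdf(x, mean=0.0):
    """N(mean, 1) density."""
    return exp(-0.5 * (x - mean) ** 2) / sqrt(2.0 * pi)

def integrate(f, a, b, n=400):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

aX, bX = -1.0, 1.0
aT, bT = 0.0, 2.0

# Integrate over A_X first, then B_Theta ...
x_then_t = integrate(
    lambda t: integrate(lambda x: npdf(x, t), aX, bX) * npdf(t), aT, bT)
# ... and in the opposite order.
t_then_x = integrate(
    lambda x: integrate(lambda t: npdf(x, t) * npdf(t), aT, bT), aX, bX)
```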
The marginal distribution $\mu_X$ can be recovered by taking $B_\Theta = \Theta$:

$$\mu_X(A_X) = \int_{A_X} \int_{\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta) \, d\lambda(x).$$

This identifies a version of the Radon-Nikodym derivative of the marginal distribution:

$$\frac{d\mu_X}{d\lambda} = \int_{\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta).$$
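In the same hypothetical Normal-Normal model ($\theta \sim N(0, 1)$, $X \mid \theta \sim N(\vartheta, 1)$) the marginal of $X$ is known in closed form to be $N(0, 2)$, which gives a way to sanity-check the display above numerically. The helpers below are written for this sketch.

```python
# Marginal density of X at a point:  d(mu_X)/d(lambda)(x) equals the
# prior-weighted integral of the conditional density.  In the
# Normal-Normal model this should match the N(0, 2) density.
from math import exp, pi, sqrt

def npdf(z, mean=0.0, var=1.0):
    """N(mean, var) density."""
    return exp(-0.5 * (z - mean) ** 2 / var) / sqrt(2.0 * pi * var)

def integrate(f, a, b, n=4000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

x = 0.7
# Truncate the theta-integral to [-40, 40]; the tails are negligible.
marginal_numeric = integrate(lambda t: npdf(x, mean=t) * npdf(t), -40.0, 40.0)
marginal_exact = npdf(x, mean=0.0, var=2.0)
```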
Similarly, there exists a conditional distribution of $\theta$ given $X$, $\mu_{\theta \mid X} : \mathcal{X} \times \Sigma_\Theta \to [0, 1]$. The joint distribution is recovered as

$$\mu_{X, \theta}(A_X \times B_\Theta) = \int_{A_X} \mu_{\theta \mid X}(x, B_\Theta) \, d\mu_X(x). \tag{2}$$
Bayes' Theorem
Building on the context above, the conditional distribution of $\theta$ given $X$ can be written as

$$\mu_{\theta \mid X}(x, B_\Theta) = \int_{B_\Theta} \frac{\rho_{X \mid \theta}(x)}{\int_{\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta)} \, d\mu_\theta(\vartheta),$$

which means the density function is

$$\frac{d\mu_{\theta \mid X}}{d\mu_\theta} = \frac{\rho_{X \mid \theta}}{\int_{\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta)}.$$

Below we sketch a proof of Bayes' theorem following Schervish (2012).
Notice that there are two ways to represent the joint distribution $\mu_{X, \theta}$. By (1), we have

$$\mu_{X, \theta}(A_X \times B_\Theta) = \int_{A_X} \int_{B_\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta) \, d\lambda(x).$$

By (2), together with the expression for $d\mu_X / d\lambda$ above, we have

$$\mu_{X, \theta}(A_X \times B_\Theta) = \int_{A_X} \mu_{\theta \mid X}(x, B_\Theta) \, d\mu_X(x) = \int_{A_X} \mu_{\theta \mid X}(x, B_\Theta) \int_{\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta) \, d\lambda(x).$$
Since the last two displays agree for every $A_X \in \Sigma_X$, the integrands (over $A_X$) must be equal $\lambda$-almost everywhere:

$$\mu_{\theta \mid X}(x, B_\Theta) \int_{\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta) = \int_{B_\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta).$$
Rearranging, we find an expression for the conditional distribution of $\theta$ given $X$:

$$\mu_{\theta \mid X}(x, B_\Theta) = \int_{B_\Theta} \frac{\rho_{X \mid \theta}(x)}{\int_{\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta)} \, d\mu_\theta(\vartheta).$$
The density function with respect to $\mu_\theta$ is then

$$\frac{d\mu_{\theta \mid X}}{d\mu_\theta} = \frac{\rho_{X \mid \theta}}{\int_{\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta)}.$$
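In the hypothetical Normal-Normal model ($\theta \sim N(0, 1)$, $X \mid \theta \sim N(\vartheta, 1)$) the posterior of $\theta$ given $X = x$ is known in closed form to be $N(x/2, 1/2)$, so the display above can be verified numerically. This is a sketch under those assumptions, with helper names invented here.

```python
# Posterior probability of B_Theta = (0, 1) given X = x, two ways:
#   (i)  integrate the density d(mu_{theta|X})/d(mu_theta) against the
#        prior over B_Theta, as in the formula above;
#   (ii) use the closed-form conjugate posterior N(x/2, 1/2).
from math import erf, exp, pi, sqrt

def npdf(z, mean=0.0, var=1.0):
    """N(mean, var) density."""
    return exp(-0.5 * (z - mean) ** 2 / var) / sqrt(2.0 * pi * var)

def ncdf(z, mean=0.0, var=1.0):
    """N(mean, var) CDF."""
    return 0.5 * (1.0 + erf((z - mean) / sqrt(2.0 * var)))

def integrate(f, a, b, n=4000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

x = 1.3
# Numerator: likelihood-weighted prior mass of B_Theta = (0, 1).
numerator = integrate(lambda t: npdf(x, mean=t) * npdf(t), 0.0, 1.0)
# Denominator: the marginal density of X (tails beyond [-40, 40] negligible).
denominator = integrate(lambda t: npdf(x, mean=t) * npdf(t), -40.0, 40.0)
posterior_numeric = numerator / denominator
posterior_exact = ncdf(1.0, x / 2, 0.5) - ncdf(0.0, x / 2, 0.5)
```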
If we further assume that there exists a measure $\lambda$ on $(\Theta, \Sigma_\Theta)$ (overloading the symbol $\lambda$; this measure is distinct from the one on $\mathcal{X}$) such that $\mu_\theta \ll \lambda$, then there exists a version of the Radon-Nikodym derivative that recovers what is commonly taught as Bayes' theorem to undergraduates, written in terms of density functions.
By the chain rule for Radon-Nikodym derivatives,

$$\rho_{\theta \mid X} = \frac{d\mu_{\theta \mid X}}{d\lambda} = \frac{d\mu_{\theta \mid X}}{d\mu_\theta} \frac{d\mu_\theta}{d\lambda}.$$

Next, expand the Radon-Nikodym derivatives:

$$\rho_{\theta \mid X} = \frac{d\mu_{\theta \mid X}}{d\mu_\theta} \frac{d\mu_\theta}{d\lambda} = \frac{\rho_{X \mid \theta} \, \rho_\theta}{\int_{\Theta} \rho_{X \mid \theta}(x) \, d\mu_\theta(\vartheta)}.$$
Last, expand the term $d\mu_\theta$ inside the integral:

$$\rho_{\theta \mid X} = \frac{\rho_{X \mid \theta} \, \rho_\theta}{\int_{\Theta} \rho_{X \mid \theta}(x) \, \rho_\theta(\vartheta) \, d\lambda(\vartheta)}.$$

References
- Billingsley, P. (2012). Probability and measure. John Wiley & Sons.
- Schervish, M. J. (2012). Theory of statistics. Springer Science & Business Media.