joint work with Barbara Grigg and Chase Wilson
\[ \begin{align} \theta^* & \sim \text{Normal}(\theta^{(m)}, \Sigma) \\ u & \sim U(0, 1) \\ r & = \min\left(1, \frac{p(\theta^*) \cdot \text{normal}(\theta^{(m)} | \theta^*, \Sigma)}{p(\theta^{(m)}) \cdot \text{normal}(\theta^* | \theta^{(m)}, \Sigma)} \right) \\ a & = \mathbb{I}[u \leq r] \\ \theta^{(m + 1)} & = a \cdot \theta^* + (1 - a) \cdot \theta^{(m)} \\ \end{align} \]
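The update above can be sketched in NumPy. Working on the log scale avoids underflow, and because the Gaussian proposal is symmetric, the two `normal` terms in \(r\) cancel:

```python
import numpy as np

def metropolis_step(theta, log_p, chol_sigma, rng):
    """One random-walk Metropolis step with proposal Normal(theta, Sigma).

    The symmetric proposal makes the two normal densities in r cancel,
    leaving only the ratio of target densities."""
    theta_star = theta + chol_sigma @ rng.standard_normal(theta.shape)
    log_r = log_p(theta_star) - log_p(theta)
    if np.log(rng.uniform()) <= min(0.0, log_r):
        return theta_star
    return theta

# usage: sample a 2-D standard normal target
rng = np.random.default_rng(0)
log_p = lambda th: -0.5 * th @ th
theta = np.zeros(2)
draws = []
for _ in range(5000):
    theta = metropolis_step(theta, log_p, 0.8 * np.eye(2), rng)
    draws.append(theta)
draws = np.asarray(draws)
```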
\[ \begin{align} \rho & \sim G \quad \# \in \mathbb{R}^D; \text{ direction} \\ \xi^* & \sim f(\xi) = \frac{p(\rho\xi + \theta^{(m)})}{\int p(\rho \xi + \theta^{(m)}) d\xi} \\ \theta^{(m + 1)} & = \rho \xi^* + \theta^{(m)} \\ \end{align} \]
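Sampling the one-dimensional conditional \(f(\xi)\) exactly is rarely tractable, which is the problem the next slides address. Purely for illustration, here is a hit-and-run step that approximates the draw with an inverse CDF on a discretized grid (the grid and its range are arbitrary choices, not part of the method):

```python
import numpy as np

def hit_and_run_step(theta, log_p, rng, xi_grid=None):
    """One approximate hit-and-run step: draw a random direction rho,
    then sample the 1-D conditional p(rho*xi + theta) along it via a
    discretized inverse CDF on xi_grid."""
    if xi_grid is None:
        xi_grid = np.linspace(-8, 8, 401)
    rho = rng.standard_normal(theta.shape)
    rho /= np.linalg.norm(rho)          # unit direction
    log_f = np.array([log_p(theta + rho * xi) for xi in xi_grid])
    w = np.exp(log_f - log_f.max())     # unnormalized density on the grid
    cdf = np.cumsum(w)
    cdf /= cdf[-1]
    xi_star = np.interp(rng.uniform(), cdf, xi_grid)
    return theta + rho * xi_star

# usage: sample a 2-D standard normal target
rng = np.random.default_rng(1)
log_p = lambda th: -0.5 * th @ th
theta = np.zeros(2)
draws = []
for _ in range(1000):
    theta = hit_and_run_step(theta, log_p, rng)
    draws.append(theta)
draws = np.asarray(draws)
```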
Kullback–Leibler Hit and Run
\[ \begin{align} \rho & \sim \text{Normal}(...) \\ \rho & = \rho / ||\rho|| \quad \# \text{ unit direction}\\ \eta^* & = \operatorname{argmin}_{\eta} \text{KL}(\, q(\xi | \eta) \, || \, p(\rho \xi + \theta^{(m)}) \,) \\ \xi^* & \sim Q(\eta^*) \\ \theta^* &= \rho \xi^* + \theta^{(m)} \\ u & \sim U(0, 1) \\ r & = \min\left(1, \frac{p(\theta^*) \cdot \text{q}(0 | \eta^*)}{p(\theta^{(m)}) \cdot \text{q}(\xi^* | \eta^*)} \right) \\ a & = \mathbb{I}[u \leq r] \\ \theta^{(m + 1)} & = a \cdot \theta^* + (1 - a) \cdot \theta^{(m)} \\ \end{align} \]
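One full transition can be sketched as follows, for the normal-surrogate case \(\eta = (\mu, \sigma)\). This is an illustrative sketch, not the reference implementation: it uses Gauss-Hermite quadrature for the KL objective and Nelder-Mead for the inner argmin, both of which are assumptions here:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.optimize import minimize

def klhr_step(theta, log_p, rng, n_nodes=16):
    """One KL hit-and-run step with a normal surrogate q(. | mu, sigma):
    fit (mu, sigma) along a random unit direction by minimizing a
    quadrature approximation of KL(q || p), propose from q, and accept
    with the Metropolis-Hastings ratio from the slide."""
    x, w = hermgauss(n_nodes)
    w = w / np.sqrt(np.pi)                     # normalized GH weights
    rho = rng.standard_normal(theta.shape)
    rho /= np.linalg.norm(rho)                 # unit direction

    def L(eta):                                # approx KL(q || p) up to constants
        mu, log_sigma = eta                    # unconstrained parameterization
        sigma = np.exp(log_sigma)
        xi = mu + np.sqrt(2.0) * sigma * x     # GH change of variables
        return -np.log(sigma) - w @ np.array([log_p(theta + rho * t) for t in xi])

    mu, log_sigma = minimize(L, np.zeros(2), method="Nelder-Mead").x
    sigma = np.exp(log_sigma)

    log_q = lambda xi: -0.5 * ((xi - mu) / sigma) ** 2   # unnormalized; constants cancel in r
    xi_star = mu + sigma * rng.standard_normal()
    theta_star = theta + rho * xi_star
    log_r = (log_p(theta_star) + log_q(0.0)) - (log_p(theta) + log_q(xi_star))
    if np.log(rng.uniform()) <= min(0.0, log_r):
        return theta_star
    return theta

# usage: a short chain on a 2-D standard normal target, where the fitted
# surrogate matches the conditional and acceptance is near 1
rng = np.random.default_rng(0)
log_p = lambda th: -0.5 * th @ th
theta = np.zeros(2)
draws = []
for _ in range(200):
    theta = klhr_step(theta, log_p, rng)
    draws.append(theta)
draws = np.asarray(draws)
```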
When \(Q\) represents the family of one-dimensional Normal distributions with density function \(q\), where \(\eta = (\mu, \sigma)\),
\[ \begin{align} \text{KL}&(\, q(\xi | \eta) \, || \, p(\rho \xi + \theta^{(m)}) \,) \\ & = \int_{\mathbb{R}} (\log{q(\xi | \eta)} - \log{p(\rho \xi + \theta^{(m)})}) q(\xi | \eta) d\xi\\ & \approx -\log{\sigma} - \sum_{n} w_n \log{p( \, l(x_n | \eta) \, )} \\ & =: L(\eta) \end{align} \]
where \(x_n, w_n\) are the roots of the Hermite polynomials and associated weights from Gauss-Hermite quadrature, and \(l(x_n \mid \eta) = \rho(\mu + \sqrt{2}\,\sigma x_n) + \theta^{(m)}\) is the corresponding change of variables (weights normalized by \(1/\sqrt{\pi}\); constants independent of \(\eta\) are dropped).
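As a sanity check (not from the slides): when the target is itself a one-dimensional standard normal, \(\text{KL}(q \,||\, p)\) has a closed form, and the quadrature approximation of \(L(\eta)\) should match it up to the dropped constants. Differences of \(L\) at two parameter values therefore match differences of the exact KL:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

# 1-D check: take p = Normal(0, 1), i.e. rho = 1 and theta = 0, so the
# exact KL(Normal(mu, sigma^2) || p) is available in closed form.
x, w = hermgauss(32)
w = w / np.sqrt(np.pi)                       # normalized GH weights

def L(mu, sigma, log_p):
    xi = mu + np.sqrt(2.0) * sigma * x       # change of variables
    return -np.log(sigma) - w @ log_p(xi)

log_p = lambda t: -0.5 * t**2 - 0.5 * np.log(2 * np.pi)
kl = lambda mu, sigma: -np.log(sigma) + 0.5 * (mu**2 + sigma**2 - 1)

# constants dropped in L cancel when taking differences
d_approx = L(1.0, 2.0, log_p) - L(0.0, 1.0, log_p)
d_exact = kl(1.0, 2.0) - kl(0.0, 1.0)
```

Since \(\log p\) is quadratic here, Gauss-Hermite quadrature integrates it exactly, so the two differences agree to floating-point precision.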
When \(Q\) represents the family of sinh-arcsinh distributions, where \(\eta = (\mu, \sigma, \delta, \epsilon)\):
Let \(Z \sim \text{Normal}(0, 1)\). Then \(\xi \sim Q(\eta)\) can be written as \(\xi = T(Z)\), where
\[T(z) = \mu + \sigma \sinh\left( \frac{\operatorname{arsinh}(z) + \epsilon}{\delta} \right)\]
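The transform makes sampling trivial: push standard normal draws through \(T\). A minimal sketch; with \((\mu, \sigma, \delta, \epsilon) = (0, 1, 1, 0)\) the transform is the identity, recovering \(\text{Normal}(0, 1)\):

```python
import numpy as np

def sinh_arcsinh_sample(eta, size, rng):
    """Draw from the sinh-arcsinh family by transforming standard normal
    draws: T(z) = mu + sigma * sinh((arcsinh(z) + eps) / delta)."""
    mu, sigma, delta, eps = eta
    z = rng.standard_normal(size)
    return mu + sigma * np.sinh((np.arcsinh(z) + eps) / delta)

# with (0, 1, 1, 0) the transform is the identity
rng = np.random.default_rng(0)
x = sinh_arcsinh_sample((0.0, 1.0, 1.0, 0.0), 100_000, rng)
```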
\(\eta \in \mathbb{R}^d\) where \(d \in \{2, 4\}\). Implemented with SciPy, JAX, and BridgeStan.
funnel.stan: 11-dimensional funnel; \(10^7\) iterations
earnings.stan: poorly scaled linear regression
normal.stan: 100-dimensional isotropic Gaussian
vs. hand-tuned MH
ar1.stan: 100-dimensional AR(1); \(10^5\) iterations
Any thoughts?