---
title: "4. Method"
output: rmarkdown::html_vignette
bibliography: biblio.bib
vignette: >
%\VignetteIndexEntry{4. Method}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## General setup
Let us first set up the notation. Suppose there exists a partition of a
region $\mathrm{D} \subset \mathcal{R}^2$ (e.g., a city). This partition is denoted by
$A_i$, $i = 1, \ldots, n$. Moreover, there exists another partition of the same
city, denoted $B_j$, where $j = 1, \ldots, m$. These partitions can be seen as
two different administrative divisions within the same city. It is common for
different government agencies to release data for different divisions of the same
city, country, or state.
## Model-based approach
Assume we observe a random variable $Y(\cdot)$ over each region $A_i$, and we
are interested in predicting this variable over each of the regions $B_j$.
Suppose the random variable $Y(\cdot)$ varies continuously over $\mathrm{D}$
and is defined as follows:
$$Y(\mathbf{s}) = \mu + S(\mathbf{s}) + \varepsilon(\mathbf{s}), \, \mathbf{s} \in
\mathrm{D} \subset \mathcal{R}^2,$$
where
$$
S(\cdot) \sim \mathrm{GP}(0, \sigma^2 \rho(\cdot; \, \phi, \kappa)) \;
\text{ and } \;
\varepsilon(\cdot) \overset{\mathrm{i.i.d.}}{\sim} \mathrm{N}(0, \tau),
$$
and $S$ and $\varepsilon$ are independent. For now, let us make the
unrealistic assumption that all these parameters are known. Then, our
assumption is that the observed data are of the form
\begin{align*}
Y(A_i) & = \frac{1}{\lvert A_i \rvert} \int_{A_i} Y(\mathbf{s}) \, \mathrm{d}
\mathbf{s} \\
& = \frac{1}{\lvert A_i \rvert} \int_{A_i} [\mu + S(\mathbf{s}) +
\varepsilon(\mathbf{s})] \, \mathrm{d} \mathbf{s} \\
& = \mu + \frac{1}{\lvert A_i \rvert} \int_{A_i} S(\mathbf{s}) \, \mathrm{d}
\mathbf{s} + \frac{1}{\lvert A_i \rvert} \int_{A_i} \varepsilon(\mathbf{s}) \,
\mathrm{d} \mathbf{s},
\end{align*}
where $\lvert \cdot \rvert$ returns the area of a polygon.
Consequently,
$$\mathrm{E}[Y(A_i)] = \frac{1}{\lvert A_i \rvert} \int_{A_i}
\mathrm{E}[Y(\mathbf{s})] \, \mathrm{d} \mathbf{s} = \frac{1}{\lvert A_i \rvert}
\int_{A_i} \mu \, \mathrm{d} \mathbf{s} = \mu$$
and
\begin{align*}
\mathrm{Cov}[Y(A_i), Y(A_j)] & = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert}
\int_{A_i \times A_j} \mathrm{Cov}[Y(\mathbf{s}), Y(\mathbf{s}')] \, \mathrm{d}
\mathbf{s} \, \mathrm{d} \mathbf{s}' \\
& = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \mathrm{C}(
\lVert \mathbf{s} - \mathbf{s}' \rVert ; \boldsymbol{\theta}) \, \mathrm{d}
\mathbf{s} \, \mathrm{d} \mathbf{s}',
\end{align*}
where $\lVert \mathbf{s} - \mathbf{s}' \rVert$ is the Euclidean distance between
the coordinates $\mathbf{s}$ and $\mathbf{s}'$, and $\mathrm{C}( \lVert
\mathbf{s} - \mathbf{s}' \rVert ; \boldsymbol{\theta})$ is an isotropic
covariance function depending on the parameter vector $\boldsymbol{\theta}$.
Furthermore, it can be shown (using Fubini's theorem and some algebraic
manipulation) that
$$
\mathrm{Cov}(Y(A_i), Y(A_j)) = \frac{\sigma^2}{\lvert A_i \rvert \lvert A_j \rvert}
\int_{A_i \times A_j} \rho( \lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi,
\kappa ) \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}' + \mathbf{I}(i = j)
\frac{\tau}{\lvert A_i \rvert},
$$
where $\rho(\cdot ; \, \phi, \kappa)$ is a positive definite correlation
function. Now, let $\mathrm{R}_{\kappa}(\phi)$ be a correlation matrix such that
$$
\mathrm{R}_{\kappa}(\phi)_{ij} = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert}
\int_{A_i \times A_j} \rho( \lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi,
\kappa ) \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}',
$$
thus,
$$
Y(A_1, \cdots, A_n) \sim \mathrm{N}( \mu \mathbf{1}_n, \sigma^2
\mathrm{R}_{\kappa}(\phi) + \tau \mathrm{diag}(\lvert A_1 \rvert^{-1}, \ldots,
\lvert A_n \rvert^{-1})).
$$
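
The entries of $\mathrm{R}_{\kappa}(\phi)$ involve double integrals over pairs
of polygons, which rarely have a closed form. Below is a minimal sketch of a
Monte Carlo approximation, sampling points uniformly within each polygon with
`sf`; the geometry column `a_polys` and the use of an exponential correlation
(i.e., $\kappa$ fixed) are illustrative assumptions, not necessarily the
implementation used by the package.

```{r, eval = FALSE}
library(sf)

## approximate R_kappa(phi)_{ij} by Monte Carlo integration
mc_block_corr <- function(polys, phi, n_pts = 500) {
  ## coordinates of points sampled uniformly within each A_i
  pts <- lapply(seq_along(polys), function(i)
    sf::st_coordinates(sf::st_sample(polys[i], size = n_pts)))
  n <- length(polys)
  R <- matrix(1, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(i)) {
      ## distances between the samples from A_i and A_j
      d <- as.matrix(dist(rbind(pts[[i]], pts[[j]])))
      d <- d[seq_len(nrow(pts[[i]])), -seq_len(nrow(pts[[i]]))]
      ## the sample mean of rho(||s - s'||; phi) approximates the double integral
      R[i, j] <- R[j, i] <- mean(exp(-d / phi))
    }
  }
  R
}

## covariance matrix of (Y(A_1), ..., Y(A_n)) for given parameter values
## Sigma_AA <- sigma2 * mc_block_corr(a_polys, phi) +
##   tau * diag(1 / as.numeric(sf::st_area(a_polys)))
```
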
Then, if we assume $(Y^{\top}(A_1, \cdots, A_n), Y^{\top}(B_1, \cdots,
B_m))^{\top}$ to be jointly normal, we can use the conditional mean of
$Y(B_1, \cdots, B_m)$ given $Y(A_1, \cdots, A_n)$ to
predict the random variable over the partition $B_1, \ldots, B_m$.
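
Concretely, this conditional-normal step can be sketched as follows; the
covariance blocks `Sigma_AA`, `Sigma_BA`, and `Sigma_BB` (built, e.g., as in
the previous chunk, with `Sigma_BA` holding the cross-covariances between the
$B_j$'s and the $A_i$'s), the scalar mean `mu`, and the observed vector `y_A`
are assumed inputs.

```{r, eval = FALSE}
## conditional mean and variance of Y(B) given Y(A) = y_A under joint normality
predict_blocks <- function(y_A, mu, Sigma_AA, Sigma_BA, Sigma_BB) {
  Q <- solve(Sigma_AA)
  mean_B <- mu + Sigma_BA %*% Q %*% (y_A - mu)        # E[Y(B) | Y(A)]
  var_B  <- Sigma_BB - Sigma_BA %*% Q %*% t(Sigma_BA) # Var[Y(B) | Y(A)]
  list(mean = drop(mean_B), var = var_B)
}
```
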
---
Now, suppose the parameters $\boldsymbol{\theta} = (\mu, \sigma^2, \phi, \tau)$
are unknown. The likelihood of $Y(A_1, \ldots, A_n)$ can still be computed.
In particular, if we use the parametrization $\alpha = \tau / \sigma^2$, we have
closed-form maximum likelihood estimators for both $\mu$ and
$\sigma^2$. Thus, we can optimize the profile likelihood for $\phi$ and $\alpha$
numerically. Then, we resort to the properties of the conditional normal
distribution again to compute predictions for a new set of regions.
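
A minimal sketch of this profiling scheme, assuming a (hypothetical) function
`R_fun(phi)` that returns $\mathrm{R}_{\kappa}(\phi)$ (e.g., the Monte Carlo
approximation above) and the vector of polygon areas `a`:

```{r, eval = FALSE}
prof_loglik <- function(par, y, R_fun, a) {
  phi   <- exp(par[1])   # optimized on the log scale
  alpha <- exp(par[2])   # alpha = tau / sigma^2
  V   <- R_fun(phi) + alpha * diag(1 / a)
  Vi  <- solve(V)
  one <- rep(1, length(y))
  ## closed-form (generalized least squares) estimate of mu
  mu_hat <- drop(crossprod(one, Vi %*% y) / crossprod(one, Vi %*% one))
  r <- y - mu_hat
  ## closed-form ML estimate of sigma^2
  s2_hat <- drop(crossprod(r, Vi %*% r)) / length(y)
  ## profile log-likelihood (up to an additive constant)
  -0.5 * (length(y) * log(s2_hat) +
            as.numeric(determinant(V)$modulus) + length(y))
}

## fit <- optim(c(0, 0), prof_loglik, y = y_A,
##              R_fun = function(phi) mc_block_corr(a_polys, phi),
##              a = as.numeric(sf::st_area(a_polys)),
##              control = list(fnscale = -1))
```
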
## Areal Interpolation (AI)
Areal interpolation is a nonparametric approach that interpolates the
$Y(A_j)$'s to construct the $Y(B_i)$'s. Define an $m \times n$ matrix
$\mathbf{W} = \{ w_{ij} \}$, where $w_{ij}$ is the weight associated with the
polygon $A_j$ in constructing $Y(B_i)$. The weights are
$w_{ij} = \lvert B_i \cap A_j \rvert / \lvert B_i \rvert$ [@goodchild1980areal;
@gotway2002combining]. The interpolated values
$\hat Y(B_1, \ldots, B_m)$ are constructed as
\begin{equation}
\label{eq:np-est}
\hat{Y}(B_1, \ldots, B_m) = \mathbf{W}
Y(A_1, \ldots, A_n).
\end{equation}
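
A minimal sketch of the weight matrix $\mathbf{W}$ using `sf`; the geometry
columns `a_polys` and `b_polys`, holding the $A_j$'s and $B_i$'s, are assumed
inputs.

```{r, eval = FALSE}
library(sf)

areal_weights <- function(a_polys, b_polys) {
  n <- length(a_polys)
  m <- length(b_polys)
  W <- matrix(0, nrow = m, ncol = n)
  for (i in seq_len(m)) {
    b_area <- as.numeric(sf::st_area(b_polys[i]))
    for (j in seq_len(n)) {
      ## w_ij = |B_i intersect A_j| / |B_i|
      overlap <- sf::st_intersection(b_polys[i], a_polys[j])
      if (length(overlap) > 0)
        W[i, j] <- sum(as.numeric(sf::st_area(overlap))) / b_area
    }
  }
  W
}

## W <- areal_weights(a_polys, b_polys)
```
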
The expectation and variance of the predictor are, respectively,
\[
\mathrm{E}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W}
\mathrm{E}[Y(A_1, \ldots, A_n)]
\]
and
\begin{equation}
\label{eq:np-matcov}
\textrm{Var}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W}
\textrm{Var}[Y(A_1, \ldots, A_n)] \mathbf{W}^{\top}.
\end{equation}
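
Both moments are one matrix product each; a minimal sketch, assuming `W` from
the previous chunk, the observed vector `y_A`, and some estimate `Sigma_A` of
$\textrm{Var}[Y(A_1, \ldots, A_n)]$:

```{r, eval = FALSE}
## plug-in moments of the interpolated values
y_B_hat <- W %*% y_A              # hat{Y}(B_1, ..., B_m)
var_B   <- W %*% Sigma_A %*% t(W) # Var[hat{Y}(B_1, ..., B_m)]
```
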
In practice, the covariance matrix $\textrm{Var}[Y(A_1, \ldots, A_n)]$
is unknown and, consequently, needs to be estimated.
The variance of each predictor, $\text{Var}[\hat Y(B_i)]$, is needed as an
uncertainty measure. It relies on both the variances of the $Y(A_j)$'s and
their covariances:
\begin{align}
\label{eq:np-single-var}
\textrm{Var}[\hat{Y}(B_i)]
= \sum_{j = 1}^n w^2_{ij} \textrm{Var} \left [ Y(A_j) \right ] + 2 \sum_{j <
l} w_{ij} w_{il} \textrm{Cov} \left[ Y(A_j), Y(A_l) \right].
\end{align}
The variances are often observed in survey data, but the covariances are
not. For practical purposes, we propose an approximation for
$\textrm{Cov}[ Y(A_j), Y(A_l)]$ based on Moran's I, a global measure of spatial
autocorrelation. Specifically, let $\rho_I$ be the Moran's I calculated with a
weight matrix constructed from first-degree neighbors. That is, $\rho_I$ is the
average of the pairwise correlations over all neighboring pairs. For regions
$A_j$ and $A_l$ that are neighbors of each other, our approximation is
\begin{align}
\label{eq:cova}
\textrm{Cov} \left[ Y(A_j), Y(A_l) \right]
= \rho_I \sqrt{\text{Var}[Y(A_j)] \text{Var}[Y(A_l)]}.
\end{align}
The covariance between non-neighboring $A_j$ and $A_l$ is discarded, so the
final uncertainty approximation for $\textrm{Var}[\hat{Y}(B_i)]$ will be an
underestimate. Alternatively, we can derive, at least, an upper bound for the
variance of the estimates by a simple application of the
Cauchy--Schwarz inequality, in which case $\rho_I$ is replaced with 1.
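
A minimal sketch of this approximation, assuming an `sf` object `a_sf` with the
$A_j$ polygons, their survey variances `v`, and a precomputed Moran's I `rho_I`
(obtainable, e.g., with `spdep`):

```{r, eval = FALSE}
library(spdep)

approx_cov <- function(a_sf, v, rho_I) {
  nb <- poly2nb(a_sf)                               # first-degree (contiguity) neighbors
  N  <- nb2mat(nb, style = "B", zero.policy = TRUE) # binary adjacency matrix
  ## rho_I * sqrt(v_j * v_l) for neighboring pairs, zero otherwise
  S <- rho_I * N * sqrt(outer(v, v))
  diag(S) <- v                                      # observed survey variances
  S
}

## Var[hat{Y}(B_i)]'s: diagonal of W S W^T
## var_B_hat <- diag(W %*% approx_cov(a_sf, v, rho_I) %*% t(W))
## Cauchy-Schwarz bound on the retained covariance terms: set rho_I = 1
## var_B_upper <- diag(W %*% approx_cov(a_sf, v, 1) %*% t(W))
```
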
## References