Besag-York-Mollié (BYM) model

An introduction to BYM model in R with examples

Background

The BYM model, also known as the Besag-York-Mollié model, is a spatial statistical model used for analyzing disease mapping and other spatially structured data. Named after its developers Julian Besag, Jeremy York, and Annie Mollié, it was introduced in their 1991 paper titled “Bayesian Image Restoration, with Two Applications in Spatial Statistics.”

The BYM model is a hierarchical Bayesian model that incorporates both spatially structured and unstructured random effects to account for spatial autocorrelation and overdispersion in the data. The model can be represented as:

\[Y_i \sim Poisson(E_i * exp(α + U_i + V_i))\quad\quad (1)\]

where:

$Y_i$ is the observed number of cases in area i $E_i$ is the expected number of cases in area i α is the overall log-relative risk $U_i$ is the spatially unstructured random effect for area i $V_i$ is the spatially structured random effect for area i The spatially structured component ($V_i$) captures the spatial dependence between neighboring areas, while the unstructured component ($U_i$) accounts for any remaining variability not explained by the structured component.

It is a special case of the conditional autoregressive (CAR) model. In a CAR model, the response variable in each area is modeled as a linear combination of its neighboring areas, conditional on the values of the response variable in the other areas. This means that the spatial dependence is modeled indirectly through a conditional distribution, and the weights used to define the spatial relationship between areas are conditional on the values of the response variable.

Application

The BYM model is useful in various applications where spatially structured data is of interest. Here are a few real-world examples where the BYM model can be applied:

Disease mapping and epidemiology: The BYM model can be used to analyze the spatial distribution of diseases, identify areas with unusually high or low disease incidence, and estimate relative risks. This information can help public health authorities allocate resources and target interventions more effectively. For example, the BYM model can be applied to study the spatial distribution of cancer cases, malaria, or COVID-19.

Environmental risk assessment: The BYM model can be applied to assess environmental risks associated with air pollution, water contamination, or soil pollution. By understanding the spatial patterns of pollutants or hazardous substances, researchers can identify areas with increased risk and inform policy decisions aimed at reducing exposure.

Crime analysis: The BYM model can be used to study the spatial distribution of crime incidents and identify hotspots. This information can help law enforcement agencies allocate resources and develop targeted strategies to reduce crime in specific areas.

Biodiversity and species distribution: The BYM model can be applied to study the spatial distribution of different species and identify areas with high or low biodiversity. This can help guide conservation efforts, habitat restoration, and wildlife management strategies.

Advantages

Spatial autocorrelation: The BYM model accounts for spatial autocorrelation by incorporating a spatially structured random effect. This enables the model to capture spatial patterns in the data more accurately than non-spatial models.

Overdispersion: The BYM model includes an unstructured random effect to account for overdispersion in the data, which is common in count data, such as disease cases or crime incidents.

Hierarchical Bayesian framework: The BYM model is a hierarchical Bayesian model, which allows for the incorporation of prior information and provides a coherent framework for uncertainty quantification. This makes the model robust and interpretable.

Flexibility: The BYM model is flexible and can be extended to incorporate additional covariates, random effects, or more complex spatial structures. This makes it suitable for a wide range of applications and data types.

Implementation in R

To illustrate the use of the BYM model in R, we will use the INLA package, which provides an efficient approach to fitting spatial models using Integrated Nested Laplace Approximation (INLA). In this example, we’ll simulate some spatial data and fit the BYM model to it.

First, install and load the necessary packages:

install.packages("INLA", repos=c("https://inla.r-inla-download.org/R/stable", getOption("repos")))
install.packages("spdep")  # For spatial data manipulation
library(INLA)
library(spdep)

The following example will show how to create a synthetic spatial dataset, generate the BYM model components, fit the model, and visualize the results:

#------------------------------------------------------------------------------
#-
#-  North Carolina SIDS data
#-
#-    From Roger Bivand's web site  
#-
#-    https://r-spatial.github.io/spdep/articles/sids.html
#-
#------------------------------------------------------------------------------
library( spdep )
library( tmap )
nc = st_read( system.file("shapes/sids.shp" , package="spData" )[1] , quiet=TRUE )
st_crs(nc) = "+proj=longlat +datum=NAD27"
row.names(nc) = as.character(nc$FIPSNO)
class( nc )
nc$SIDSrate79 = nc$SID79 / nc$BIR79


tm_shape( nc ) +
  tm_borders( lwd=2 ) +
  tm_fill( col="SIDSrate79" )

nb = poly2nb( nc )
nb2INLA( "nc.adj" , nb )
g = inla.read.graph( filename="nc.adj" )
hyperPriors = list( prec.unstruct = list(prior="loggamma",param=c(1,0.01)) ,
                    prec.spatial  = list(prior="loggamma",param=c(1,0.01))  ) 
nc$NWBIRrate79 = nc$NWBIR79 / nc$BIR79

nc$ID = 1:nrow(nc)
formula =  SID79 ~ NWBIRrate79 + f( ID , model="bym" , graph=g , 
                                    hyper = hyperPriors )

mod = inla( formula , family="poisson" , data=nc , offset=log(BIR79) )
summary( mod )

mod$summary.fixed$mean[1]
mod$summary.fixed$mean[2]

Figure 1: SIDS Rate 79