Using StratoBayes with real-world data

For a basic introduction to StratoBayes, see the StratoBayes vignette. For more information, or to install the software, please visit https://stratobayes.github.io/.
Resources:
- 📥 Download tutorial as ZIP
- 📄 View .qmd on GitHub
- 📦 View data on GitHub

This tutorial is to take stratigraphic data from two sections and shift and stretch or squeeze one section relative to the other, to align them:

Figure 1: Sulphur isotope data from two sites on their original depth scale.

Figure 2: North Sea data has been shifted and squeezed to align with the Staithes data.

In the figure above, the depths associated with the sulphur isotope data from a North Sea well (blue) have been transformed to match the data from the Staithes well (red).

Below is a step-by-step guide to correlate the two wells using the StratoBayes R package, starting with an introduction to the data sets involved.

Permian – Triassic sulphur isotope data

The Permian-Triassic strata of the United Kingdom are difficult to date due to a scarcity of biostratigraphic constraints. Salisbury et al. (2023) have used sulphur isotope ratios from evaporites (\(\delta^{34}S_{evap}\)) to correlate a a well from the southern North Sea Basin (42/28-2) with a drillcore from Staithes (Cleveland Basin, Yorkshire, UK). The \(\delta^{34}S_{evap}\) of evaporites is assumed to reflect the ceoval isotopic composition of seawater.

Figure 3: Palaeogeographic context of the two wells, Salisbury et al. (2023)

The authors correlated the two curves by visually matching several peaks and troughs of the \(\delta^{34}S_{evap}\) curves:

Figure 4: Correlation of \(\delta^{34}S_{evap}\) from the two wells, Salisbury et al. (2023)

We will now try to recreate that correlation using StratoBayes.

Preparing the data

📥 Download the tutorial as ZIP and unpack or extract it. Open the StratoBayes-Tutorial-1.qmd file to run the code chunks in this tutorial.

We have downloaded the data from the North Sea well from the Supplementary Materials of Salisbury et al. (2023), and received the data from Staithes from the authors of Salisbury et al. (2022). These data were saved as .xlsx files. To facilitate reading the data in R, we copied the relevant data (depth and \(\delta^{34}S_{evap}\) measurements) into new Excel spreadsheets with clean column names, and saved them as .csv files.

For reading the files, we first check what our working directory is:

getwd()

[1] "D:/Github/stratobayes.github.io/tutorial-1"

If you are running the code chunks in the StratoBayes1_Correlation.qmd file, your working directory will be wherever that file is saved. Otherwise, if you are running the code from a regular R script, you can set your desired working directory using setwd.

Assuming the data sets are in a folder named “data” in your working directory, we can start reading the data with R:

# set random seed to ensure consistency of results
set.seed(1)
# read the .csv files
north_sea <- read.csv("data/North_Sea-d34S.csv")
staithes <- read.csv("data/Staithes-S20-d34S.csv")
# display the beginning of the data.frames
head(north_sea)

  depth d34S
1  4960 12.2
2  5000 14.0
3  5100 13.6
4  5200 14.7
5  5440 15.8
6  5500 15.8

head(staithes)

    depth  d34S
1 1280.17 15.86
2 1280.67 16.13
3 1300.00 18.44
4 1300.17 15.72
5 1303.00 17.05
6 1303.92 15.60

The StratoBayes package expects all chemostratigraphic data to be in a single spreadsheet or data.frame. We can achieve this by combining the two data.frames:

# add site identifiers:
north_sea$site <- "North Sea"
staithes$site <- "Staithes"
# combine data.frames
dataset <- rbind(north_sea, staithes)
# transform depth measurements from feet to m
dataset$depth <- dataset$depth * 0.3048
# check n data points per site:
table(dataset$site)


North Sea  Staithes 
       47       364

The StratoBayes workflow

Reading data with StratoBayes

Now we can process this data with the StratoBayes package. First, we need to load the package:

# load the StratoBayes library
library("StratoBayes")

We start by reading it with the StratData() function. We specify “d34S” as the name of the column holding our stratigraphic signal. As the Staithes section has a lot more data than the North Sea section, we specify “Staithes” as our reference site. Because the data are organized along depth – representing the vertical (or “z”) stratigraphic dimension – and this information is stored in the “depth” column, we set both zScale and zColumn to “depth”.

# read data with the StratData function
stratDat <- StratData(signal = dataset, signalColumn = "d34S", 
                      referenceSite = "Staithes", 
                      zScale = "depth", zColumn = "depth")
# check the class of this object
class(stratDat)

[1] "StratData" "list"

stratDat is now an object of class “StratData”. The print() function will recognise this and print some information on the dataset:

# this is equivalent to print(stratDat) or print.StratData(stratDat)
stratDat

Stratigraphic data comprising 1 signal from 2 sites

The plot() function visualises the signal data from both sites:

# visualise stratDat using plot.StratData(stratDat)
plot(stratDat)

Figure 5: Sulphur isotope data from the Staithes and the North Sea well

Notice that depths have been internally multiplied by -1 to allow for calculations as if measurements were on the height scale, and for easier plotting.

Building a stratigraphic model

We can now think about how we want to align the two sections. A simple option would be to shift the North Sea section and apply a “stretch” factor that compresses the section to match the Staithes section. The Staithes section is thus our reference section and remains unchanged. We need a single “stretch” factor or sedimentation rate (\(\gamma\)) for the aligned section, as well as an offset or shift that corresponds to the depth in the reference section to which the bottom of the aligned section will be shifted (\(\alpha\)).

Calculating the depth at the reference site (Staithes) that corresponds to a depth at the North Sea site is done as follows: \[depth_{Staithes} = \alpha + \gamma * \Delta_{North Sea}~,\]
where \(\Delta_{North Sea}\) is the distance from the bottom of the North Sea section to the depth of interest.

Note that depths are internally multiplied by -1 to allow for calculations as if measurements were on the height scale, and for easier plotting.

The Bayesian framework

How do we find the \(\alpha\) and \(\gamma\) that lead to the best alignment?

StratoBayes estimates those parameters in the Bayesian framework by

placing priors on \(\alpha\) and \(\gamma\)
defining a likelihood based on the deviations of the sulphur isotope data from a cubic spline fitted to the (depth-shifted) sulphur isotope data from both sites
running a Markov-chain Monte Carlo (MCMC) simulation to find the posterior distribution of \(\alpha\) and \(\gamma\)

Priors

We will need to place priors on the shift and the stretch factor. To find out what the priors need to look like, we use the StratModelTemplate() function. We specify that the sections will be aligned on a height or depth scale (alignmentScale = “height”), rather than on an age scale. We further specify that our sedimentation rate model has one rate per site (sedModel = “site”).

# get a template for the priors
StratModelTemplate(stratData = stratDat, alignmentScale = "height", sedModel = "site")


priors <- structure(list(
  "alpha_North Sea" = UniformPrior(min = , max = ),
  "gammaLog_North Sea" = NormalPrior(mean = , sd = )),
  class = c("StratPrior", "list"))

model <- StratModel(stratData = stratDat,
                    priors = priors,
                    alignmentScale = "height",
                    sedModel = "site",
                    alphaPosition = "middle",
                    nKnots = 25)

result <- RunStratModel(stratObject = stratDat,
                        stratModel = model,
                        nRun = 1,
                        nIter = 1000)

`stratData` consists of depth-scale measurements. To facilitate data processing, depths have been converted to heights by multiplying them by `-1`. Please use negative depths (i.e. heights) to specify the priors on `alpha` parameters.

The template suggests using a uniform prior for the shift \(\alpha\), and to use a normal prior for the stretch factor on the log scale, \(\log \gamma\). The log scale is used for \(\gamma\) as this ensures that ratios are treated symmetrically, and that \(\gamma\) cannot be negative.

Because we are working on the depth scale, we need to specify priors on \(\alpha\) on the negative depth scale.

To allow the possibility for partial overlap with the Staithes section, we can for example allow \(\alpha\) to range from -2000 to -500. For the relative sedimentation rate \(\gamma\), we might initially assume that it may be around 1, which would imply equivalent sedimentation rates in the Staithes and the North Sea section. We use a mean of \(\log(1) = 0\), and a standard deviation of \(1/2\).

# fill the prior template
priors <- structure(list(
  `alpha_North Sea` = UniformPrior(min = -2000, max = -500),
  `gammaLog_North Sea` = NormalPrior(mean = 0, sd = 0.5)),
class = c("list", "StratPrior"))

Running the model

To run the MCMC to estimate the posterior of the model parameters, we need to pass stratDat and the priors to the RunStratModel() function. We also need to specify that the model should be run on the “height” scale (as opposed to the “age” scale), and that our sedimentation rate model equals one rate per “site”. We run a single model run (nRun = 1) for 4000 iterations (nIter = 4000).

# run model and save results to an R object named "result"
result <- RunStratModel(stratObject = stratDat, alignmentScale = "height", 
                        sedModel = "site", priors = priors,  
                        nRun = 1, nIter = 4000)

Processing the model output

Plotting an alignment

We can visualise the correlation resulting from the model run with the plot function. Again, the function will recognise that result is an object of class “StratPosterior” and visualise it accordingly. We can use standard plot arguments such as “xlab” to modify the resulting figure.

plot(result, xlab = expression(delta ^ 34 * S))

Figure 6: Most likely alignment estimated by the model run. The North Sea data are plotted at the median reference section depths.

Per default, plot() shows the data from the reference site (Staithes) along with the shifted section (North Sea) on the depth scale of the reference section. The heights of the data points from the North Sea curve are drawn using the iteration that is closest to median reference section depths (\(50^{th}\) percentile) of a subset of reference depths (see Eichenseer et al. in review), calculated from the posterior draws corresponding to the most likely alignment (alignment 1/2). In this case, the result suggests more than one (i.e. two) possible alignments. The different, discrete alignments are determined by performing a clustering analysis on the posterior samples of the model parameters, \(\alpha_{North Sea}\) and \(\log \gamma_{North Sea}\) in our case. Per default, the first half of all samples (iterations of the MCMC) are discarded as burn-in, and are not included in the clustering analysis or in the display of the results.

Let us visualise the other possible alignments alongside the first:

par(mar = c(4, 4, 1, 1), mfrow = c(1, 2))
plot(result, xlab = expression(delta^34*S), overridePar = FALSE, separateSites = F)
plot(result, xlab = expression(delta^34*S), overridePar = FALSE, alignment = 2, separateSites = F)

Figure 7: Three distinct alignments found in the posterior of the model run.

The two alignments don’t look very different – the major peak at -600 m is always aligned.

The parameter estimates

To understand why the results suggest these different alignments, we can inspect the parameter estimates. Visualising the posterior samples in a histogram works with the hist() function. We want to show both the \(\alpha_{North Sea}\) and \(\log \gamma_{North Sea}\) parameters, and colour the parameter values by the alignment they have been classified in the cluster analysis:

par(mar = c(4, 4, 0.5, 0.5))
hist(result, parameters = c(1, 2), colourBy = "alignment", prior = F)

Figure 8: Histograms of the posterior samples of the alpha and gamma parameter.

Now it is more clear why the results suggest two distinct alignments. The clustering algorithm has cut the left tail of the distributions, and a few posterior samples have not been assigned to any cluster (“alignment 0”).

We can also visualise the parameters in 2D:

ScatterPlot(result, colourBy = "alignment")

Figure 9: Cross plot of the posterior samples of the alpha and gamma parameter.

If we are unhappy with the clustering analysis, we could try to change it. For example, we could use the hierarchical dbscan (“hdbscan”) method instead of the default protoclust (“proto”) method, using the Cluster() function:

newClust <- Cluster(result, clusterMethod = "hdbscan", minPts = 10)
ScatterPlot(result, colourBy = "alignment", stratCluster = newClust)

Figure 10: Cross plot of the posterior samples of the alpha and gamma parameter, using hdbscan clustering

This looks like a more natural separation into two clusters.

Assessing convergence

An important step in evaluating the model results is to check whether the chain(s) of our Markov chain Monte Carlo simulation have converged. Trace plots, in which the posterior samples of model parameters are shown in the order in which they were obtained, are a great tool for that:

par(mar = c(4, 4, 0.5, 0.5))
TracePlot(result, parameters = c(1, 2))

Figure 11: Trace plot showing the posterior samples of the alpha and gamma parameter in the sequence in which they were obtained during the MCMC (after burn-in).

Here we can see that the chain looks like it may have converged (the posterior samples don’t seem to shift into new, unexplored areas over time). There still seems to be a lot of autocorrelation in the chain, and if we had let the model run for longer, we would probably get somewhat different estimates for the model parameters and the probabilities of different alignments.

Another tool to assess convergence is to to look at the posterior probabilities associated with each of the samples obtained from the MCMC:

TracePlot(result, parameters = "posterior")

Figure 12: Evolution of the log posterior density during the MCMC (after burn-in).

The log posterior being relatively stable is a good sign. If it would increase with increasing iteration number, this would be a tell that the chain hasn’t converged yet.

Ideally, more than one independent model run is conducted. If they all give a similar answer, that is a good sign.

Depths on reference scale

To get the depth in the reference section (Staithes) that correspond to the shifted depths of the aligned North Sea section, the StratMap() function can be used. For example, the depths at the Staithes site that correspond to \(-1900 \, \text{m}\) and \(-2400 \, \text{m}\) at the North Sea site (site = 2) can be computed as follows:

StratMap(result, heights = c(-1900, -2400), site = 2, alignment = "all")

  height      mean       sd      2.5%       50%     97.5%
1  -1900 -636.8809 2.080428 -640.8139 -636.8918 -633.0691
2  -2400 -956.8522 6.191525 -969.7813 -955.6859 -948.1615

A depth of \(-1900 \, \text{m}\) from the North Sea well corresponds to a mean depth of \(-637 \, \text{m}\) in the Staithes well. Note that this depth comes with uncertainty, in this case, the 95% credible interval, spanned by the \(0.025\) and the \(0.975\) quantiles ranges from \(-640.8 \, \text{ to } -633.1 \, \text{m}\). This is quite a low uncertainty, as the the prominent \(\delta^{34}S_{evap}\) peak allows for precise alignment.

The uncertainty around the reference depth corresponding to \(-2400 \, \text{m}\) in the North Sea well is much higher because each of the two distinct alignments shown earlier results in a different reference depth for the lower part of the section, and there is considerable variation also within the alignment clusters.

If we want to just use one alignment to compute the reference depth and uncertainty, we can specify for example alignment = 1 to use alignment 1:

StratMap(result, heights = c(-1900, -2400), site = 2, alignment = 1)

  height      mean       sd      2.5%       50%     97.5%
1  -1900 -636.9152 2.101876 -640.8139 -636.9495 -633.0691
2  -2400 -955.4025 3.829975 -963.2279 -955.1406 -948.0990

Here, the uncertainty of the reference depth corresponding to \(-2400 \, \text{m}\) is much lower.

We can visualise the reference depths corresponding to North Sea depths by using the StratMapPlot() function:

StratMapPlot(result, alignment = "all")

Figure 13: Median depths (line) with 95% credible intervals (shading) in the reference section (Staithes) corresponding to depths in the North Sea section, using all alignments from the posterior (after burn-in).