getwd()
[1] "D:/Github/stratobayes.github.io/tutorial-1"
For a basic introduction to StratoBayes, see the StratoBayes vignette. For more information, or to install the software, please visit https://stratobayes.github.io/.
Resources:
- 📥 Download tutorial as ZIP
- 📄 View.qmd
on GitHub
- 📦 Viewdata
on GitHub
This tutorial is to take stratigraphic data from two sections and shift and stretch or squeeze one section relative to the other, to align them:
In the figure above, the depths associated with the sulphur isotope data from a North Sea well (blue) have been transformed to match the data from the Staithes well (red).
Below is a step-by-step guide to correlate the two wells using the StratoBayes
R package, starting with an introduction to the data sets involved.
The Permian-Triassic strata of the United Kingdom are difficult to date due to a scarcity of biostratigraphic constraints. Salisbury et al. (2023) have used sulphur isotope ratios from evaporites (\(\delta^{34}S_{evap}\)) to correlate a a well from the southern North Sea Basin (42/28-2) with a drillcore from Staithes (Cleveland Basin, Yorkshire, UK). The \(\delta^{34}S_{evap}\) of evaporites is assumed to reflect the ceoval isotopic composition of seawater.
The authors correlated the two curves by visually matching several peaks and troughs of the \(\delta^{34}S_{evap}\) curves:
We will now try to recreate that correlation using StratoBayes.
📥 Download the tutorial as ZIP and unpack or extract it. Open the StratoBayes-Tutorial-1.qmd
file to run the code chunks in this tutorial.
We have downloaded the data from the North Sea well from the Supplementary Materials of Salisbury et al. (2023), and received the data from Staithes from the authors of Salisbury et al. (2022). These data were saved as .xlsx files. To facilitate reading the data in R, we copied the relevant data (depth and \(\delta^{34}S_{evap}\) measurements) into new Excel spreadsheets with clean column names, and saved them as .csv files.
For reading the files, we first check what our working directory is:
getwd()
[1] "D:/Github/stratobayes.github.io/tutorial-1"
If you are running the code chunks in the StratoBayes1_Correlation.qmd
file, your working directory will be wherever that file is saved. Otherwise, if you are running the code from a regular R script, you can set your desired working directory using setwd
.
Assuming the data sets are in a folder named “data” in your working directory, we can start reading the data with R:
# set random seed to ensure consistency of results
set.seed(1)
# read the .csv files
<- read.csv("data/North_Sea-d34S.csv")
north_sea <- read.csv("data/Staithes-S20-d34S.csv")
staithes # display the beginning of the data.frames
head(north_sea)
depth d34S
1 4960 12.2
2 5000 14.0
3 5100 13.6
4 5200 14.7
5 5440 15.8
6 5500 15.8
head(staithes)
depth d34S
1 1280.17 15.86
2 1280.67 16.13
3 1300.00 18.44
4 1300.17 15.72
5 1303.00 17.05
6 1303.92 15.60
The StratoBayes package expects all chemostratigraphic data to be in a single spreadsheet or data.frame. We can achieve this by combining the two data.frames:
# add site identifiers:
$site <- "North Sea"
north_sea$site <- "Staithes"
staithes# combine data.frames
<- rbind(north_sea, staithes)
dataset # transform depth measurements from feet to m
$depth <- dataset$depth * 0.3048
dataset# check n data points per site:
table(dataset$site)
North Sea Staithes
47 364
Now we can process this data with the StratoBayes package. First, we need to load the package:
# load the StratoBayes library
library("StratoBayes")
We start by reading it with the StratData()
function. We specify “d34S” as the name of the column holding our stratigraphic signal. As the Staithes section has a lot more data than the North Sea section, we specify “Staithes” as our reference site. Because the data are organized along depth – representing the vertical (or “z”) stratigraphic dimension – and this information is stored in the “depth” column, we set both zScale
and zColumn
to “depth”.
# read data with the StratData function
<- StratData(signal = dataset, signalColumn = "d34S",
stratDat referenceSite = "Staithes",
zScale = "depth", zColumn = "depth")
# check the class of this object
class(stratDat)
[1] "StratData" "list"
stratDat
is now an object of class “StratData”. The print()
function will recognise this and print some information on the dataset:
# this is equivalent to print(stratDat) or print.StratData(stratDat)
stratDat
Stratigraphic data comprising 1 signal from 2 sites
The plot()
function visualises the signal data from both sites:
# visualise stratDat using plot.StratData(stratDat)
plot(stratDat)
Notice that depths have been internally multiplied by -1 to allow for calculations as if measurements were on the height scale, and for easier plotting.
We can now think about how we want to align the two sections. A simple option would be to shift the North Sea section and apply a “stretch” factor that compresses the section to match the Staithes section. The Staithes section is thus our reference section and remains unchanged. We need a single “stretch” factor or sedimentation rate (\(\gamma\)) for the aligned section, as well as an offset or shift that corresponds to the depth in the reference section to which the bottom of the aligned section will be shifted (\(\alpha\)).
Calculating the depth at the reference site (Staithes) that corresponds to a depth at the North Sea site is done as follows: \[depth_{Staithes} = \alpha + \gamma * \Delta_{North Sea}~,\]
where \(\Delta_{North Sea}\) is the distance from the bottom of the North Sea section to the depth of interest.
Note that depths are internally multiplied by -1 to allow for calculations as if measurements were on the height scale, and for easier plotting.
How do we find the \(\alpha\) and \(\gamma\) that lead to the best alignment?
StratoBayes estimates those parameters in the Bayesian framework by
placing priors on \(\alpha\) and \(\gamma\)
defining a likelihood based on the deviations of the sulphur isotope data from a cubic spline fitted to the (depth-shifted) sulphur isotope data from both sites
running a Markov-chain Monte Carlo (MCMC) simulation to find the posterior distribution of \(\alpha\) and \(\gamma\)
We will need to place priors on the shift and the stretch factor. To find out what the priors need to look like, we use the StratModelTemplate()
function. We specify that the sections will be aligned on a height or depth scale (alignmentScale = “height”), rather than on an age scale. We further specify that our sedimentation rate model has one rate per site (sedModel = “site”).
# get a template for the priors
StratModelTemplate(stratData = stratDat, alignmentScale = "height", sedModel = "site")
priors <- structure(list(
"alpha_North Sea" = UniformPrior(min = , max = ),
"gammaLog_North Sea" = NormalPrior(mean = , sd = )),
class = c("StratPrior", "list"))
model <- StratModel(stratData = stratDat,
priors = priors,
alignmentScale = "height",
sedModel = "site",
alphaPosition = "middle",
nKnots = 25)
result <- RunStratModel(stratObject = stratDat,
stratModel = model,
nRun = 1,
nIter = 1000)
`stratData` consists of depth-scale measurements. To facilitate data processing, depths have been converted to heights by multiplying them by `-1`. Please use negative depths (i.e. heights) to specify the priors on `alpha` parameters.
The template suggests using a uniform prior for the shift \(\alpha\), and to use a normal prior for the stretch factor on the log scale, \(\log \gamma\). The log scale is used for \(\gamma\) as this ensures that ratios are treated symmetrically, and that \(\gamma\) cannot be negative.
Because we are working on the depth scale, we need to specify priors on \(\alpha\) on the negative depth scale.
To allow the possibility for partial overlap with the Staithes section, we can for example allow \(\alpha\) to range from -2000 to -500. For the relative sedimentation rate \(\gamma\), we might initially assume that it may be around 1, which would imply equivalent sedimentation rates in the Staithes and the North Sea section. We use a mean of \(\log(1) = 0\), and a standard deviation of \(1/2\).
# fill the prior template
<- structure(list(
priors `alpha_North Sea` = UniformPrior(min = -2000, max = -500),
`gammaLog_North Sea` = NormalPrior(mean = 0, sd = 0.5)),
class = c("list", "StratPrior"))
To run the MCMC to estimate the posterior of the model parameters, we need to pass stratDat
and the priors
to the RunStratModel()
function. We also need to specify that the model should be run on the “height” scale (as opposed to the “age” scale), and that our sedimentation rate model equals one rate per “site”. We run a single model run (nRun = 1
) for 4000 iterations (nIter = 4000
).
# run model and save results to an R object named "result"
<- RunStratModel(stratObject = stratDat, alignmentScale = "height",
result sedModel = "site", priors = priors,
nRun = 1, nIter = 4000)
We can visualise the correlation resulting from the model run with the plot
function. Again, the function will recognise that result
is an object of class “StratPosterior” and visualise it accordingly. We can use standard plot arguments such as “xlab” to modify the resulting figure.
plot(result, xlab = expression(delta ^ 34 * S))
Per default, plot()
shows the data from the reference site (Staithes) along with the shifted section (North Sea) on the depth scale of the reference section. The heights of the data points from the North Sea curve are drawn using the iteration that is closest to median reference section depths (\(50^{th}\) percentile) of a subset of reference depths (see Eichenseer et al. in review), calculated from the posterior draws corresponding to the most likely alignment (alignment 1/2). In this case, the result suggests more than one (i.e. two) possible alignments. The different, discrete alignments are determined by performing a clustering analysis on the posterior samples of the model parameters, \(\alpha_{North Sea}\) and \(\log \gamma_{North Sea}\) in our case. Per default, the first half of all samples (iterations of the MCMC) are discarded as burn-in, and are not included in the clustering analysis or in the display of the results.
Let us visualise the other possible alignments alongside the first:
par(mar = c(4, 4, 1, 1), mfrow = c(1, 2))
plot(result, xlab = expression(delta^34*S), overridePar = FALSE, separateSites = F)
plot(result, xlab = expression(delta^34*S), overridePar = FALSE, alignment = 2, separateSites = F)
The two alignments don’t look very different – the major peak at -600 m is always aligned.
To understand why the results suggest these different alignments, we can inspect the parameter estimates. Visualising the posterior samples in a histogram works with the hist()
function. We want to show both the \(\alpha_{North Sea}\) and \(\log \gamma_{North Sea}\) parameters, and colour the parameter values by the alignment they have been classified in the cluster analysis:
par(mar = c(4, 4, 0.5, 0.5))
hist(result, parameters = c(1, 2), colourBy = "alignment", prior = F)
Now it is more clear why the results suggest two distinct alignments. The clustering algorithm has cut the left tail of the distributions, and a few posterior samples have not been assigned to any cluster (“alignment 0”).
We can also visualise the parameters in 2D:
ScatterPlot(result, colourBy = "alignment")
If we are unhappy with the clustering analysis, we could try to change it. For example, we could use the hierarchical dbscan (“hdbscan”) method instead of the default protoclust (“proto”) method, using the Cluster()
function:
<- Cluster(result, clusterMethod = "hdbscan", minPts = 10)
newClust ScatterPlot(result, colourBy = "alignment", stratCluster = newClust)
This looks like a more natural separation into two clusters.
An important step in evaluating the model results is to check whether the chain(s) of our Markov chain Monte Carlo simulation have converged. Trace plots, in which the posterior samples of model parameters are shown in the order in which they were obtained, are a great tool for that:
par(mar = c(4, 4, 0.5, 0.5))
TracePlot(result, parameters = c(1, 2))
Here we can see that the chain looks like it may have converged (the posterior samples don’t seem to shift into new, unexplored areas over time). There still seems to be a lot of autocorrelation in the chain, and if we had let the model run for longer, we would probably get somewhat different estimates for the model parameters and the probabilities of different alignments.
Another tool to assess convergence is to to look at the posterior probabilities associated with each of the samples obtained from the MCMC:
TracePlot(result, parameters = "posterior")
The log posterior being relatively stable is a good sign. If it would increase with increasing iteration number, this would be a tell that the chain hasn’t converged yet.
Ideally, more than one independent model run is conducted. If they all give a similar answer, that is a good sign.
To get the depth in the reference section (Staithes) that correspond to the shifted depths of the aligned North Sea section, the StratMap()
function can be used. For example, the depths at the Staithes site that correspond to \(-1900 \, \text{m}\) and \(-2400 \, \text{m}\) at the North Sea site (site = 2
) can be computed as follows:
StratMap(result, heights = c(-1900, -2400), site = 2, alignment = "all")
height mean sd 2.5% 50% 97.5%
1 -1900 -636.8809 2.080428 -640.8139 -636.8918 -633.0691
2 -2400 -956.8522 6.191525 -969.7813 -955.6859 -948.1615
A depth of \(-1900 \, \text{m}\) from the North Sea well corresponds to a mean depth of \(-637 \, \text{m}\) in the Staithes well. Note that this depth comes with uncertainty, in this case, the 95% credible interval, spanned by the \(0.025\) and the \(0.975\) quantiles ranges from \(-640.8 \, \text{ to } -633.1 \, \text{m}\). This is quite a low uncertainty, as the the prominent \(\delta^{34}S_{evap}\) peak allows for precise alignment.
The uncertainty around the reference depth corresponding to \(-2400 \, \text{m}\) in the North Sea well is much higher because each of the two distinct alignments shown earlier results in a different reference depth for the lower part of the section, and there is considerable variation also within the alignment clusters.
If we want to just use one alignment to compute the reference depth and uncertainty, we can specify for example alignment = 1
to use alignment 1:
StratMap(result, heights = c(-1900, -2400), site = 2, alignment = 1)
height mean sd 2.5% 50% 97.5%
1 -1900 -636.9152 2.101876 -640.8139 -636.9495 -633.0691
2 -2400 -955.4025 3.829975 -963.2279 -955.1406 -948.0990
Here, the uncertainty of the reference depth corresponding to \(-2400 \, \text{m}\) is much lower.
We can visualise the reference depths corresponding to North Sea depths by using the StratMapPlot()
function:
StratMapPlot(result, alignment = "all")