Here is a vignette to help you select a list of disparity metrics
tailored to your own dataset and your own biological question. We will
be doing that using the dispRity package (Guillerme 2018). You can find the slides
introducing this vignette here, and a recording of the whole workshop here.
## Installing the package
install.packages("dispRity")
## Loading the package
library(dispRity)
Feel uncomfortable with R? You can also follow this tutorial using the moms graphical interface.
Note that for the purpose of the exercise, steps 1 and 2 are reversed here. I would argue that having a biological question should be the first step but, in practice, it’s often totally fine to start with the dataset and then figure out which question would be cool to study.
For the purpose of this exercise, we will use one of the demo datasets
from the dispRity package, but feel free to choose one of
the other demo datasets or, even better, use your own dataset!
## Loading the demo datasets
data(demo_data)
Here is a summary of the demo datasets you can choose from:
study | field | taxonomic group | traits | trait space | size | groups | question |
---|---|---|---|---|---|---|---|
Beck and Lee (2014) | Palaeontology | Mammalia | discrete morphological phylogenetic data | Ordination of a distance matrix (PCO) | 106x105 | 52 crown vs. 54 stem | Are living mammals and their ancestors more disparate than stem mammals? |
Wright (2017) | Palaeontology | Crinoidea | discrete morphological phylogenetic data | Ordination of a distance matrix (PCO) | 42x41 | 16 before vs. 23 after | Is there an effect of the Ordovician mass extinction on crinoid disparity? |
Marcy et al. (2016) | Evolution | Rodentia | skull 2D landmark coordinates | Ordination of a Procrustes Superimposition (PCA) | 454x134 | 225 Megascapheus vs. 229 Thomomys | Is there convergence in skull shape between these two genera of gophers? |
Hopkins and Pearson (2016) | Evolution | Trilobita | 3D landmark coordinates | Ordination of a Procrustes Superimposition (PCA) | 46x46 | 36 adults vs. 10 juveniles | How are trilobites growing? |
Jones et al. (2015) | Ecology | Plantae | Community species compositions | Ordination of a Jaccard distance matrix (PCO) | 48x47 | 24 aspens vs. 24 grasslands | Are aspens and grasslands dispersing differently? |
Healy et al. (2019) | Ecology | Animalia | Life history traits | Ordination of continuous traits (PCA) | 285x6 | 83 ectotherms vs. 202 endotherms | Do endotherms have more diversified life history strategies than ectotherms? |
For this example I will be using the dataset from Beck and Lee (2014).
## Selecting a dataset
my_data <- demo_data$beck
If you want to use your own data, you’ll have to make it into a
dispRity object to follow this tutorial easily. This can be done using `custom.subsets`
or `chrono.subsets`, depending on whether you want to group your data by a certain variable or through time. You can refer to the documentation of each function (and of all the other ones) using `?custom.subsets`
or `?chrono.subsets`.
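As a minimal sketch of that conversion, the block below builds a small simulated matrix and groups its rows with `custom.subsets`. The matrix, the species names (`sp1` to `sp20`) and the two group names are hypothetical placeholders, not part of the demo data; swap in your own.

```r
library(dispRity)

## A hypothetical trait space: 20 species (rows) and 5 dimensions (columns)
set.seed(42)
my_matrix <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5,
                    dimnames = list(paste0("sp", 1:20), NULL))

## A hypothetical grouping variable: a named list of row names
my_groups <- list(group_A = paste0("sp", 1:10),
                  group_B = paste0("sp", 11:20))

## Making the dispRity object
my_own_data <- custom.subsets(my_matrix, group = my_groups)

## Printing a short description of the object
my_own_data
```

Grouping through time with `chrono.subsets` works along the same lines but typically needs extra information, such as a dated tree; see `?chrono.subsets` for the details.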
In moms, go to the “Select the type of space to use:” menu (top left), choose “Demo” and select your favourite dataset in the “Select a demo matrix” list. You can also input your own matrix by choosing “Input” in the same menu and then choosing a file.
The rest of the choice of metrics will depend on the properties of your dataset. Although there are advanced ways to correctly measure the properties of multidimensional datasets (homoscedasticity, normality, etc.), visualising the data can already tell you a lot.
## Visualising the two first dimensions
plot(my_data)
Note however that the dataset has 103 other dimensions, so a 2D visualisation is often only superficial.
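One rough way to gauge how much a 2D plot leaves out is to look at the proportion of variance carried by each axis. The sketch below does this in base R on a simulated matrix, assuming the trait space comes from a PCA; the `traits` object is a hypothetical stand-in, not the Beck and Lee (2014) data.

```r
## A hypothetical dataset: 50 observations of 10 continuous traits
set.seed(123)
traits <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)

## Ordinating the traits (PCA)
ordination <- prcomp(traits)

## Proportion of the total variance carried by each axis
variance_per_axis <- ordination$sdev^2
proportion <- variance_per_axis / sum(variance_per_axis)

## Proportion of variance visible on a plot of the first two axes
sum(proportion[1:2])
```

If that proportion is low, most of the structure in the data sits in the dimensions that the plot does not show.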
With the dataset in mind, we can then identify the mechanism and the process at hand (or the other way around). In the chosen example, we can ask the question “Do modern mammals (crown) evolve more disparate body shapes than archaic ones?”. The mechanism here would simply be evolution and the process would be the age of the group. In other words, “does evolution have an effect (e.g. a different outcome) on the age of the group?”.
We can then choose a pattern: the disparity metric, functional diversity metric, dissimilarity metric, space occupancy metric, etc.
In terms of disparity here, we might be interested in two aspects: the diversity of body shapes can be expressed as changes in the size or in the density of each group in the trait space.
In a very contrasting scenario, we’d have a group that has a big size and high density against one with a small size and low density.
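To make that contrast concrete, here is a small base R sketch on simulated data. The two groups and the two helper functions (`space.size` and `nearest.neighbour`) are hypothetical illustrations: the sum of ranges stands in for size and the mean nearest neighbour distance stands in for density, which is only one of many possible choices.

```r
set.seed(1)
## Group A: many points spread over a large area (big size, high density)
group_A <- matrix(runif(1000 * 2, min = -2, max = 2), ncol = 2)
## Group B: few points in a small area (small size, low density)
group_B <- matrix(runif(8 * 2, min = -0.5, max = 0.5), ncol = 2)

## Size proxy: the sum of the ranges of each dimension
space.size <- function(matrix) {
    sum(apply(matrix, 2, function(x) diff(range(x))))
}

## Density proxy: the mean distance to the nearest neighbour
## (a smaller distance means a denser group)
nearest.neighbour <- function(matrix) {
    distances <- as.matrix(dist(matrix))
    diag(distances) <- Inf
    mean(apply(distances, 1, min))
}

## Comparing the two groups
c(A = space.size(group_A), B = space.size(group_B))
c(A = nearest.neighbour(group_A), B = nearest.neighbour(group_B))
```

Group A should score higher on the size proxy and lower on the nearest neighbour distance, showing that the two aspects can be measured separately.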
There is no miracle recipe for making a list of metrics. One easy way is to first look at what has been done before (e.g. what is used in your favourite paper). For example, we have tested and played around with a diversity of metrics in Guillerme et al. (2020) and Guillerme et al. (2024) (TL;DR for both papers: different things measure things differently).
Note here that the sum and product pairs for the ranges and variances are effectively measuring either the “perimeter” or the “surface/volume” in n dimensions.
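The “perimeter” versus “surface/volume” distinction is easiest to see in two dimensions. The base R sketch below uses a hypothetical toy space (four points on the corners of a 2 by 3 rectangle), chosen so that the values can be checked by hand.

```r
## A toy 2D trait space: the four corners of a 2 x 3 rectangle
space <- matrix(c(0, 2, 0, 2,
                  0, 0, 3, 3), ncol = 2)

## The range and the variance of each dimension
dim_ranges <- apply(space, 2, function(x) diff(range(x)))
dim_variances <- apply(space, 2, var)

## Sum of ranges: half the perimeter of the rectangle (2 + 3 = 5)
sum(dim_ranges)
## Product of ranges: the area of the rectangle (2 * 3 = 6)
prod(dim_ranges)

## The same logic applies to the variances
sum(dim_variances)
prod(dim_variances)
```

In dispRity, these pairs are written by combining the two functions, for example `c(sum, ranges)` or `c(prod, variances)`.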
`dtt`)? The average pairwise distance - but squared?

Alternatively, you can design your very own metric! We’ll see a how-to example later on.
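As a small taste of what a home-made metric can look like, here is one possible reading of the “average pairwise distance, but squared” idea, written in base R. The function name `avg.sq.dist` and the toy matrix are hypothetical, and this is a sketch of the general idea rather than a copy of any package's implementation.

```r
## A hypothetical custom metric: the mean of the squared pairwise
## Euclidean distances between all the rows of a matrix
avg.sq.dist <- function(matrix) {
    pairwise_distances <- as.vector(dist(matrix))
    return(mean(pairwise_distances^2))
}

## Trying it on a toy space: the four corners of a 2 x 3 rectangle
toy_space <- matrix(c(0, 2, 0, 2,
                      0, 0, 3, 3), ncol = 2)
avg.sq.dist(toy_space)
```

Functions of this shape (a matrix in, a single value out) can then be passed to the `metric` argument of the `dispRity` function; check `?dispRity` for the exact requirements.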
Once we have our list, we can test it using the dispRity package.
Testing a metric is relatively easy: you can just use the
`test.metric` function. This function will gradually
transform your trait space following one of the implemented
algorithms and show how your metric changes in response to the changes in
trait space.
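Here is a minimal sketch of such a test, run on a simulated space rather than on the demo data so that it stays quick. The `space` object and the choice of metric (sum of variances) are illustrative assumptions, and the two shifts used here are among those described just below.

```r
library(dispRity)

## A hypothetical trait space: 100 elements in 3 normally distributed dimensions
set.seed(1)
space <- space.maker(elements = 100, dimensions = 3, distribution = rnorm)

## Testing how the sum of variances responds to two types of shifts
metric_test <- test.metric(space, metric = c(sum, variances),
                           shifts = c("random", "size"),
                           replicates = 3)

## Summarising and plotting the results
summary(metric_test)
plot(metric_test)
```

A metric that captures size should respond more strongly to the `"size"` shift than to the `"random"` one, where the changes reflect only the loss of data.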
The different transformations (called “shifts”) that are currently implemented are:

- `"random"`: just randomly removing data;
- `"size"`: removing data from the edges of the trait space;
- `"density"`: removing data with the larger nearest neighbour distances;
- `"evenness"`: pseudo-randomly removing data in proportions with higher density (“flattening the curve”).