Here is a vignette to help you select a list of disparity metrics tailored to your own dataset and your own biological question. We will be doing that using the dispRity package (Guillerme 2018). You can find the slides introducing this vignette here, and a recording of the whole workshop here.

## Installing the package
install.packages("dispRity")
## Loading the package
library(dispRity)

Not comfortable with R? You can also follow this tutorial using the moms graphical interface.

2 - Select a dataset

Note that for the purpose of the exercise, steps 1 and 2 are swapped here. I would argue that having a biological question should be the first step but, in practice, it’s often totally fine to start with the dataset and then figure out which question would be cool to study.

For the purpose of this exercise, we will use one of the demo datasets from the dispRity package, but feel free to choose one of the other demo datasets or, even better, use your own dataset!

## Loading the demo datasets
data(demo_data)

Here is a summary of the demo datasets you can choose from:

| Study | Field | Taxonomic group | Traits | Trait space | Size | Groups | Question |
|---|---|---|---|---|---|---|---|
| Beck and Lee (2014) | Palaeontology | Mammalia | discrete morphological phylogenetic data | Ordination of a distance matrix (PCO) | 106x105 | 52 crown vs. 54 stem | Are living mammals and their ancestors more disparate than their stem mammals? |
| Wright (2017) | Palaeontology | Crinoidea | discrete morphological phylogenetic data | Ordination of a distance matrix (PCO) | 42x41 | 16 before vs. 23 after | Is there an effect of the Ordovician mass extinction on crinoid disparity? |
| Marcy et al. (2016) | Evolution | Rodentia | skull 2D landmark coordinates | Ordination of a Procrustes Superimposition (PCA) | 454x134 | 225 Megascapheus vs. 229 Thomomys | Is there convergence in skull shape between these two genera of gophers? |
| Hopkins and Pearson (2016) | Evolution | Trilobita | 3D landmark coordinates | Ordination of a Procrustes Superimposition (PCA) | 46x46 | 36 adults vs. 10 juveniles | How are trilobites growing? |
| Jones et al. (2015) | Ecology | Plantae | Communities species compositions | Ordination of a Jaccard distance matrix (PCO) | 48x47 | 24 aspens vs. 24 grasslands | Are aspens and grasslands dispersing differently? |
| Healy et al. (2019) | Ecology | Animalia | Life history traits | Ordination of continuous traits (PCA) | 285x6 | 83 ectotherms vs. 202 endotherms | Do endotherms have more diversified life history strategies than ectotherms? |

For this example I will be using the dataset from Beck and Lee (2014).

## Selecting a dataset
my_data <- demo_data$beck

If you want to use your own data, you’ll have to make it into a dispRity object to follow this tutorial easily. This can be done using custom.subsets or chrono.subsets, depending on whether you want to group your data by a certain variable or through time. You can check how to use each function (and all the other ones) using ?custom.subsets or ?chrono.subsets.

In moms, in the “Select the type of space to use:” menu (top left), choose “Demo” and select your favourite dataset in the “Select a demo matrix” list. You can also input your own matrix by choosing “Input” in the same menu and then choosing a file.
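As a rough illustration of the grouping case, here is a minimal sketch that turns a matrix into a dispRity object with custom.subsets. The matrix, the species names and the two groups are all made up for the example; with your own data you would replace them with your ordinated trait space and your grouping variable.

```r
library(dispRity)

set.seed(42)
## A hypothetical trait space: 10 observations (rows) in 3 dimensions (columns)
my_matrix <- matrix(rnorm(30), nrow = 10, ncol = 3,
                    dimnames = list(paste0("sp", 1:10), NULL))

## Hypothetical groups, defined as lists of row names
my_groups <- list(groupA = paste0("sp", 1:5),
                  groupB = paste0("sp", 6:10))

## Making the dispRity object
my_own_data <- custom.subsets(my_matrix, group = my_groups)
my_own_data
```

For data grouped through time, chrono.subsets plays the same role but also needs a dated phylogeny or age information, so it is not sketched here.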

Have a look at your dataset

The rest of the choice of metrics will depend on the properties of your dataset. Although there are advanced ways to correctly measure the properties of multidimensional datasets (homoscedasticity, normality, etc.), visualising the data can already tell you a lot.

## Visualising the two first dimensions
plot(my_data)

Note however that the dataset has 103 other dimensions, so a 2D visualisation is often only superficial.
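One quick way to get a feel for how much a 2D plot hides is to check the share of the total variance carried by the first two axes. The sketch below uses base R only and a simulated space (the 50 observations and 10 axes are arbitrary values chosen for the example, not the Beck and Lee data).

```r
set.seed(1)
## A hypothetical ordinated space: 50 observations, 10 axes with decreasing spread
toy_space <- sapply(10:1, function(sd) rnorm(50, sd = sd))

## Variance on each axis and its share of the total
axis_variance <- apply(toy_space, 2, var)
prop_variance <- axis_variance / sum(axis_variance)

## Share of the total variance visible when plotting only the first two axes
round(sum(prop_variance[1:2]), 2)
```

If that share is small, patterns seen in the 2D plot should be interpreted with extra care.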

1 - Identify the mechanism and the process

With the dataset in mind, we can then identify the mechanism and the process at hand (or the other way around). In the chosen example, we can ask the question “Do modern mammals (crown) evolve more disparate body shapes than archaic ones?”. The mechanism here would simply be evolution and the process would be the age of the group. In other words: “does evolution have an effect (e.g. a different outcome) depending on the age of the group?”.

We can then choose a pattern: the disparity metric, functional diversity metric, dissimilarity metric, space occupancy metric, etc.

3 - Select an aspect of the trait space that will answer your question

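In terms of disparity here, we might be interested in two aspects: the diversity of body shapes can be expressed as changes in the size of the trait space (how much of it a group occupies) or as changes in the density of the trait space (how closely packed the observations are).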

In a very contrasting scenario, we’d have a group that has a big size and high density against one with a small size and low density.

4 - Make a list of potential metrics

There is no miracle recipe for making a list of metrics. One easy way is to first look at what has been done before (e.g. what is used in your favourite paper). For example, we have tested and played around with a diversity of metrics in Guillerme et al. (2020) and Guillerme et al. (2024) (TL;DR for both papers: different things measure things differently).

Changes in size of the trait space:

  1. sum of ranges? The sum of the spreads of the data (spreads perimeter?).
  2. product of ranges? The surface/volume/hypervolume of the square/cube/hypercube that contains all the data.
  3. sum of variances? Same as for the sum of ranges but using the squared standard deviation in the data rather than the spread.
  4. product of variances? Same as for the product of ranges but using the squared standard deviation in the data rather than the spread.
  5. convex hull surface? The surface of the smallest convex polytope that contains all the data.
  6. convex hull volume? The volume of the smallest convex polytope that contains all the data.

Note here that the sum and product pairs for the ranges and variances are effectively measuring either the “perimeter” or the “surface/volume” in n dimensions.
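To make the first four size metrics concrete, here is a minimal base R sketch on a made-up toy space (4 observations in 2 dimensions). It only illustrates what the sums and products are computed from; it is not how the dispRity package implements them.

```r
## A toy trait space: 4 observations (rows) in 2 dimensions (columns), made-up values
toy_space <- matrix(c(0, 2, 4, 6,
                      1, 1, 3, 3), ncol = 2)

## Spread on each dimension, measured as a range or as a variance
axis_ranges    <- apply(toy_space, 2, function(x) diff(range(x)))
axis_variances <- apply(toy_space, 2, var)

## The four size metrics: sums behave like a "perimeter", products like a "surface/volume"
size_metrics <- c(sum_ranges     = sum(axis_ranges),
                  prod_ranges    = prod(axis_ranges),
                  sum_variances  = sum(axis_variances),
                  prod_variances = prod(axis_variances))
size_metrics
```

Note that products shrink or explode quickly as the number of dimensions grows, which is one reason to compare several candidates rather than pick one by default.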

Changes in density of the trait space:

  1. mean distance to centroid? The average distance between the group center and each observation.
  2. mean nearest neighbor distance? The average distance between each observation and its closest neighbor.
  3. mean squared pairwise distance (like in dtt)? The average of the squared pairwise distances.
  4. minimum spanning tree average length? The average branch length of the shortest tree that connects all observations.
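The first three density metrics can also be sketched in a few lines of base R on a made-up toy space. The minimum spanning tree is left out because it needs a dedicated algorithm or an extra package. As before, this is only an illustration of the definitions, not the dispRity implementation.

```r
## A toy trait space: 4 observations (rows) in 2 dimensions (columns), made-up values
toy_space <- matrix(c(0, 2, 4, 6,
                      1, 1, 3, 3), ncol = 2)

## 1. Mean distance to centroid
centroid <- colMeans(toy_space)
dist_to_centroid <- sqrt(rowSums(sweep(toy_space, 2, centroid)^2))
mean_dist_centroid <- mean(dist_to_centroid)

## 2. Mean nearest neighbor distance (ignoring the distance of a point to itself)
pairwise <- as.matrix(dist(toy_space))
diag(pairwise) <- NA
mean_nearest_neighbor <- mean(apply(pairwise, 1, min, na.rm = TRUE))

## 3. Mean squared pairwise distance
mean_sq_pairwise <- mean(as.vector(dist(toy_space))^2)

c(mean_dist_centroid, mean_nearest_neighbor, mean_sq_pairwise)
```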

Alternatively, you can design your very own metric! We’ll see a how-to example later on.

5 - Test which metric would best work for your dataset and question

Once we have our list, we can test it using the dispRity package.

Testing a metric is relatively easy: you can just use the test.metric function. This function will gradually transform your trait space following one of the implemented algorithms and show how your metric changes in response to the changes in trait space.
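As a first taste, here is a minimal sketch of such a test on a simulated space rather than on the demo data. The space (300 observations, 2 uniform dimensions), the choice of the sum of ranges, and the number of replicates are arbitrary values picked for the example; the two shift names used here are assumed from the package documentation.

```r
library(dispRity)

set.seed(1)
## A hypothetical trait space: 300 observations in 2 uniform dimensions
toy_space <- space.maker(elements = 300, dimensions = 2, distribution = runif)

## How does the sum of ranges respond when the space is reduced randomly or by size?
sum_ranges_test <- test.metric(toy_space, metric = c(sum, ranges),
                               shifts = c("random", "size"),
                               replicates = 3)

## Summarising and visualising the test
summary(sum_ranges_test)
plot(sum_ranges_test)
```

A metric that is useful for detecting changes in size should respond more strongly to the size shift than to the random removal of observations.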

The different transformations (called “shifts”) that are currently implemented are: