Primary Biodiversity Data
Observations of the occurrence of a species are a fundamental unit of biodiversity data. In this unit we will explore where to look for open-access occurrence data, how to access those sources from R, and tools for visualizing the point distributions of species.
Library ‘spocc’
spocc is a tool from the rOpenSci consortium (a group of developers building R capacity for open science) for querying several species occurrence databases, GBIF among them, through a single interface.
Package details on GitHub
Tutorial here
You should already have spocc installed; if not, run:
install.packages('spocc')
With spocc installed, we can try a simple query of the GBIF database, which we have seen briefly before.
library(spocc)
spdist <- occ(query='Crotalus horridus', from='gbif')
The data are returned as an S3-class object (of class “occdat”). Somewhere inside it is a tidyverse tibble (similar to a data frame, but not quite the same thing).
print(spdist)  # prints a summary of the query; where the records live is not obvious
View(spdist)
It may still not be obvious how to get at the records. To view an element of the returned object, use the “$” operator and call each element by name (for example, spdist$gbif). In general it is easier to convert the results to a regular R data frame, since not everything we want to do with these data is compatible with the tidyverse/spocc formatting.
df <- as.data.frame(occ2df(spdist$gbif))  # flatten the GBIF results into a plain data frame
# Also try:
# head(df)
# colnames(df)  # That's a lot of columns!
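To see what “$” navigation looks like without a network connection, the sketch below builds a plain list that imitates the layout we understand occ() to return: a gbif element holding meta and a data list named after the query species. This is an approximation for illustration only. The real object carries its own classes and more fields, every value here is invented, and the field names meta$found and data$Crotalus_horridus are assumptions you should confirm with names() on your own spdist.

```r
# Hypothetical mock that approximates how occ() nests its results:
# one element per data source, each holding 'meta' and a 'data' list
# keyed by the query name (assumed: spaces replaced by underscores).
# All values are made up for illustration.
mock <- list(
  gbif = list(
    meta = list(source = "gbif", found = 1200, returned = 2),
    data = list(
      Crotalus_horridus = data.frame(
        name = "Crotalus horridus",
        longitude = c(-77.1, -84.3),
        latitude = c(38.9, 35.6)
      )
    )
  )
)

# Use "$" to step down one level at a time
names(mock$gbif)                        # the two parts: "meta" and "data"
mock$gbif$meta$found                    # total the source reports
recs <- mock$gbif$data$Crotalus_horridus
nrow(recs)                              # rows actually returned
```

On the real object, the analogous calls would be names(spdist$gbif) and head(spdist$gbif$data$Crotalus_horridus), assuming the field names match your spocc version.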
mapr: Exploratory interactive mapping of species distribution data.
To create interactive graphics showing species occurrence locations and some metadata, we can use ‘mapr’. This package uses the JavaScript library Leaflet and OpenStreetMap tiles (among other map services) to create interactive maps that you can pan and zoom, clicking on points to pop up metadata about each occurrence.
If it is not already installed:
install.packages('mapr')
Then call map_leaflet() on either the spocc object:
library(mapr)
map_leaflet(spdist)
or on the data frame:
map_leaflet(df)
‘mapr’ shows the values of the first few columns in each pop-up. We can control what is shown there by passing only selected columns to map_leaflet().
map_leaflet(df[,c('name', 'longitude', 'latitude', 'stateProvince', 'country', 'year', 'occurrenceID')])
Specifying columns makes it much easier to sift through large amounts of data to check sources and look for patterns of bias.
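One practical wrinkle: if a column you name is absent from df (not every record source fills every field), subsetting with df[, c(...)] stops with an "undefined columns selected" error. The sketch below keeps only the requested columns that actually exist, in the order requested. The small data frame is a hypothetical stand-in for df with invented values, deliberately missing two of the requested columns.

```r
# Columns we would like to show in the map pop-ups
wanted <- c("name", "longitude", "latitude", "stateProvince",
            "country", "year", "occurrenceID")

# Hypothetical stand-in for df; note it lacks 'country' and 'occurrenceID'
toy <- data.frame(
  name = "Crotalus horridus",
  longitude = -77.1,
  latitude = 38.9,
  stateProvince = "Virginia",
  year = 2015
)

# Keep only the wanted columns that actually exist, preserving their order;
# drop = FALSE keeps the result a data frame even if one column survives
present <- intersect(wanted, colnames(toy))
popup_df <- toy[, present, drop = FALSE]
print(colnames(popup_df))
```

With the real data this would be map_leaflet(df[, intersect(wanted, colnames(df)), drop = FALSE]). Listing name, longitude and latitude first keeps the columns mapr needs to place points in the result.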
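NOTE: mapr expects data in the formats produced by spocc and related rOpenSci libraries; a plain data frame also works if it has at least name, longitude and latitude columns.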
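Occurrence records without coordinates cannot be placed on a map, so before mapping it can help to check what share of records carry both latitude and longitude. A minimal sketch follows, using a hypothetical three-row stand-in for df with made-up values. We believe occ() also offers a has_coords argument to request only georeferenced records at query time, but confirm this in ?occ for your version.

```r
# Hypothetical stand-in for df (values made up); one record lacks a longitude
toy <- data.frame(
  name = rep("Crotalus horridus", 3),
  longitude = c(-77.1, NA, -84.3),
  latitude = c(38.9, 37.2, 35.6)
)

# TRUE for records that have both coordinates
has_xy <- !is.na(toy$longitude) & !is.na(toy$latitude)
mean(has_xy)   # share of records that can be mapped

# Keep only records that can be placed on a map
mappable <- toy[has_xy, ]
nrow(mappable)
```

On real data, replace toy with df and pass the filtered data frame to map_leaflet().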
More with spocc queries
Do you notice something odd when you run:
nrow(df)
## [1] 500
Compare this with the number of records the GBIF website reports for the same search.
Our query returned only the first 500 records because 500 is the default value of the limit argument of occ().
We can raise the limit:
spdist2 <- occ(query='Crotalus horridus', from='gbif', limit=2500)
map_leaflet(spdist2)
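To catch this kind of truncation early, compare the total a source reports with the number of rows you actually received. The helper below is hypothetical (it is not part of spocc) and takes plain numbers. We assume the GBIF total is stored as spdist$gbif$meta$found, the figure we believe print(spdist) reports as "Found"; check that field name in your version of spocc.

```r
# Hypothetical helper: summarize how much of a search was retrieved.
# 'found'    = total number of records the data source reports for the query
# 'returned' = number of rows actually received
coverage_report <- function(found, returned) {
  pct <- if (found > 0) 100 * returned / found else 100
  hint <- if (returned < found) "; raise 'limit' to get more" else ""
  sprintf("%d of %d records retrieved (%.1f%%)%s", returned, found, pct, hint)
}

# Example with made-up numbers
coverage_report(found = 2000, returned = 500)
coverage_report(found = 500, returned = 500)
```

With the real objects, the call would look like coverage_report(spdist$gbif$meta$found, nrow(df)).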