The neotoma2 R Package

closeup of several Neotoma sites in the Caribbean.
closeup of several Neotoma sites in the Caribbean.

Neotoma Resources

The Neotoma Paleoecology Database is a domain-specific data resource containing millions of fossil records from around the globe, covering the last 5.4 million years. The neotoma2 R package simplifies some of the data structures and concepts to facilitate statistical analysis and visualization. Users may wish to gain a deeper understanding of the resource itself, or build more complex data objects and relationships. For those users a partial list is provided here, including a table of code examples focusing on different geographic regions, languages and dataset types.

Resources

Neotoma Data Structure

c. b. site a. bounding box site collection unit
Three panels showing context for Neotoma’s geographic representation of sites. In panel a a site is defined by the boundaries of a lake. The site also has a bounding box, and the core location is defined by a collection unit within the site that is defined with precise coordinates. In panel b a site is defined as a single point, for example, from a textual reference indicating the site is at the intersection of two roads. Here the site and collection unit share the unique point location. In panel c we show how that site location may be obfuscated using a bounding box as the site delimiter. In this case the collection unit would not be defined (but is represented as the triangle for illustration). Figure obtained from the Neotoma Database Manual.

Data in Neotoma is associated with sites, specific locations with lat/long coordinates. Within a site, there may be one or more collection units – locations at which samples are physically collected within the site. For example, an archaeological site may have one or more collection units, pits within a broader dig site; a pollen sampling site on a lake may have multiple collection units – core sites within the lake basin. Collection units may have higher resolution GPS locations, but are considered to be part of the broader site. Within a collection unit data is collected at various [analysis units] from which samples are obtained.

Because Neotoma is made up of a number of constituent databases (e.g., the Indo-Pacific Pollen Database, NANODe, FAUNMAP), a set of samples associated with a collection unit are assigned to a single dataset associated with a particular dataset type (e.g., pollen, diatom, vertebrate fauna) and constituent database.

Site: Nath Sagar Collection Unit 1 Collection Unit 2 Chronology Chronology AnalysisUnit Sample CharcoalDataset PollenDataset DiatomDataset
Figure. The structure of sites, collection units and datasets within Neotoma. A site contains one or more collection units. Chronologies are associated with collection units. Data of a common type (pollen, diatoms, vertebrate fauna) are assigned to a dataset.

Researchers often begin by searching for sites within a particular study area, whether that is defined by geographic or political boundaries. From there they interrogate the available datasets for their particular dataset type of interest. When they find records of interest, they will then often call for the data and associated chronologies.

The neotoma2 R package is intended to act as the intermediary to support these research activities using the Neotoma Paleoecology Database. Because R is not a relational database, we needed to modify the data structures of the objects. To do this the package uses a set of S4 objects to represent different elements within the database.

A diagram showing the different major classes within the neotoma2 R package, and the way the elements are related to one another. Individual boxes represent the major classes (sites, site, collectionunits, etc.). Each box then has a list of the specific metadata contained within the class, and the variable type (e.g., siteid: integer). Below these are the functions that can be applied to the object (e.g., [[<-).

It is important to note, here and elsewhere: Almost everything you will interact with is a sites object. A sites object is the general currency of this package. sites may have more or less metadat