Glossary of Terms

All complex subjects have their own terminology, which can make it hard for newcomers to break into the field. This sometimes includes uncommon words, but more often a subject gives very specific meanings to common words. The discussion of errors vs mistakes in this video is a good example.

This glossary is a reference for some of the uncommon terms, and the specific definitions of more common words, that you will encounter throughout Data Tree and in your broader dealings with data.

Many of these definitions come from the course materials and experts that helped develop Data Tree. Others come from the CASRAI Dictionary. Those definitions are kindly made available under a Creative Commons Attribution 4.0 International License.

Range

The range is the difference between the maximum and the minimum values. It is a simple measure of the spread of the data.
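As a small illustration of this definition, the sketch below computes the range of a list of numbers. The function name `value_range` and the temperature readings are made up for the example and are not part of Data Tree.

```python
def value_range(values):
    """Return the maximum minus the minimum of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("the range is undefined for an empty dataset")
    return max(values) - min(values)


# Hypothetical temperature readings, for illustration only.
temperatures = [12.5, 9.0, 15.5, 11.0]
spread = value_range(temperatures)
print(spread)  # 15.5 - 9.0 = 6.5
```

Because the range depends only on the two most extreme values, a single outlier can change it considerably.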

Raw Data

Raw data are data that have not been processed for meaningful use. A raw dataset is exactly what is collected, before any data cleaning, processing or analysis has been completed. 

It is often useful to store raw data as well as the cleaned, processed data, as this makes your work easier to reproduce. If another researcher has your raw data and the steps you used to process and analyse them, they can recreate your results. This has to be balanced against the cost of storing raw data, and the likelihood of the raw data being useful compared to data that have undergone an initial process of data cleaning.
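A minimal sketch of this idea, assuming a handful of hypothetical raw readings stored as text: the raw values are never modified, and the processing and analysis steps are written down as functions so that anyone holding the raw data can re-run them. The names `RAW_READINGS`, `clean` and `analyse` are invented for the example.

```python
# Hypothetical raw values, kept exactly as collected (a tuple cannot be edited in place).
RAW_READINGS = ("12.5", " 9.0", "n/a", "15.5", "")


def clean(raw):
    """Processing step: strip whitespace and drop entries that are not numbers."""
    cleaned = []
    for entry in raw:
        try:
            cleaned.append(float(entry.strip()))
        except ValueError:
            # Not a number (for example "n/a" or an empty string), so skip it.
            continue
    return cleaned


def analyse(values):
    """Analysis step: the mean of the cleaned values."""
    return sum(values) / len(values)


processed = clean(RAW_READINGS)
result = analyse(processed)
print(processed)  # [12.5, 9.0, 15.5]
```

Re-running `analyse(clean(RAW_READINGS))` on the stored raw data gives the same result every time, which is what allows another researcher to check the work.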

Reference research data

A static or organic conglomeration or collection of smaller (peer reviewed) datasets, most probably published and curated, e.g. UK Tide Gauge Network, IUCN Red List of Threatened Species.

Research Data

The evidence that underpins the answer to the research question

 - UK Concordat on Open Research Data (2016)

Recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings.

 - EPSRC Policy Framework on Research Data

Data that are used as primary sources to support technical or scientific enquiry, research, scholarship, or artistic activity, and that are used as evidence in the research process and/or are commonly accepted in the research community as necessary to validate research findings and results. All other digital and non-digital content have the potential of becoming research data. Research data may be experimental data, observational data, operational data, third party data, public sector data, monitoring data, processed data, or repurposed data.

 - CASRAI Dictionary

Research Data Lifecycle

A model to conceptualise the different stages through which data pass during the research process, and the data management activities that relate to those stages.

The model used throughout Data Tree has six stages, corresponding to different activities during the life of a research project. Other institutions or paradigms have slight variations on these stages, but the broad concepts are applicable no matter how you choose to categorise your research activities. 

Our model is based on the UK Data Service model from 2011, and has the following stages: 

  • Re-using Data: Often considered both the start and the end of the cycle. Your research might start by gathering secondary data, and your own research outputs might be later used by yourself or others in different sectors.
  • Creating Data: Data collection or generation activities.
  • Processing Data: The tasks of turning raw data into analysis-ready data. This includes quality control checks, data cleaning and documentation.
  • Analysing Data: Includes data visualisations and statistical analysis; tasks that involve the process of getting information out of your data.
  • Preserving Data: The tasks of putting your data into a location for long-term storage and access, such as a data repository.
  • Making Data Accessible: This includes not just ensuring your data can be accessed, but also making others aware of your data. This might include publishing in a data journal and adding appropriate licences to your preserved data.


RGB

In satellite imagery, the satellite's sensors record three channels (red, green and blue) separately. These channels can be combined to give a colour image.
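As a rough illustration of combining channels, the sketch below merges three small grids of brightness values into one grid of (red, green, blue) pixels. The function name `combine_channels` and the 2 x 2 sample values are assumptions made for the example. Real satellite products usually need further steps, such as scaling and alignment, that are not shown here.

```python
def combine_channels(red, green, blue):
    """Merge three equally sized 2D grids into one grid of (r, g, b) pixels."""
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("channels must have the same number of rows")
    image = []
    for r_row, g_row, b_row in zip(red, green, blue):
        if not (len(r_row) == len(g_row) == len(b_row)):
            raise ValueError("channel rows must have the same length")
        # Pair up the three values recorded for each pixel position.
        image.append(list(zip(r_row, g_row, b_row)))
    return image


# Hypothetical 2 x 2 channel values in the range 0 to 255.
red = [[255, 0], [10, 200]]
green = [[0, 255], [10, 200]]
blue = [[0, 0], [10, 50]]

image = combine_channels(red, green, blue)
print(image[0][0])  # (255, 0, 0), a pure red pixel
```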