Wednesday, 29 June 2022, 9:15 PM
Site: Datatree - Data Training Engaging End-users
Course: Introduction to Data Tree (Intro)
Glossary: Glossary of Terms

Machine learning

The study and practice of designing systems that can learn, adjust, and improve automatically, based on the data fed to them. This typically involves implementation of predictive and statistical algorithms that focus on 'correct' behaviour and insights as data flows through the system.


A big data algorithm for scheduling work on a computing cluster. The process involves splitting the problem set up, mapping it to different nodes (map), and computing over them to produce intermediate results, shuffling the results to align like sets, and then reducing the results by outputting a single value for each set (reduce).


Prefix denoting a factor of 106 or a million


Background or contextual data about a dataset. Literally "data about data". Metadata is required to enable someone to properly understand and interpret a main dataset


  • The research questions that the data was collected to address
  • Any relevant environmental conditions affecting the main variables
  • The instruments used to collect or generate the data, including specification and calibration details.
  • The methodology used to collect or generate the data,
  • Definitions of variables, including units - also called a data dictionary


METeorological Aviation Report, a weather observation taken at a certain location, most likely an airfield, for use by pilots and weather forecasters. The METAR coding standard is agreed between civil aviation and weather authorities.


Representation of a real world situation. The word “model” is used in many ways and means different things, depending on the discipline. For example a meteorologist might think of a global climate model, used for weather forecasting, while an agronomist might think of a crop simulation model, used to estimate crop growth and yields. Statistical models form the bedrock of data analysis. A statistical model is a simple description of a process that may have given rise to observed data.