Glossary of Terms

Every complex subject has its own terminology, which can make it hard for newcomers to break into the field. This sometimes includes uncommon words, but more often a subject gives very specific meanings to common words; the discussion of errors vs mistakes in this video is a good example of this.

This glossary is a reference for some of the uncommon terms, and the specific definitions of more common words, that you will encounter throughout Data Tree and in your broader dealings with data.

Many of these definitions come from the course materials and experts that helped develop Data Tree. Others come from the CASRAI Dictionary. Those definitions are kindly made available under a Creative Commons Attribution 4.0 International License.

C

Catalogue

A type of collection that describes and points to features of another collection.

- CASRAI Dictionary

Cataloguing

An intellectual process of describing objects in accordance with accepted library principles, particularly those of subject and classification order.

- CASRAI Dictionary

Categorical Variable

A variable with values that range over categories, rather than being numerical. Examples include gender (male, female), paint colour (red, white, blue), type of animal (elephant, leopard, lion). Some categorical variables are ordinal.
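
A minimal Python sketch of the difference between nominal and ordinal categorical variables. The animal types come from the examples above; the drought severity levels (low, medium, high) are a hypothetical ordinal example added for illustration, and the class names are invented. Counting frequencies is the usual summary for either kind.

```python
from collections import Counter
from enum import Enum, IntEnum


class Animal(Enum):
    """Nominal categories: there is no natural order between them."""
    ELEPHANT = "elephant"
    LEOPARD = "leopard"
    LION = "lion"


class DroughtSeverity(IntEnum):
    """Hypothetical ordinal categories: the order is meaningful."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


sightings = [Animal.LION, Animal.ELEPHANT, Animal.LION, Animal.LEOPARD]
# Frequencies are a natural summary for any categorical variable
animal_counts = Counter(sightings)

seasons = [DroughtSeverity.LOW, DroughtSeverity.HIGH, DroughtSeverity.MEDIUM]
# Only ordinal categories support comparisons such as "worst season"
worst_season = max(seasons)

print(animal_counts[Animal.LION])  # 2
print(worst_season.name)           # HIGH
```

Comparing two `Animal` members with `<` raises a `TypeError`, which is the behaviour you want for nominal data.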

Causation

The capacity of one variable to influence another. The first variable may bring the second into existence or may cause the incidence of the second variable to fluctuate. Related term: Correlation.

- CASRAI Dictionary

Change log

A log that tracks the progress of each change from submission through review, approval, implementation and closure. The log can be managed manually using a document or spreadsheet, or automatically with software or a web-based tool.

- CASRAI Dictionary
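
As a rough sketch of the "managed automatically with software" option, the hypothetical Python class below (the names `ChangeLog`, `advance` and `status` are invented for illustration) records the stages each change has reached and refuses to skip one. A real tool would also store dates, authors and comments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Stages named in the definition above, in the order a change passes through them
STAGES = ["submission", "review", "approval", "implementation", "closure"]


@dataclass
class ChangeLog:
    """Hypothetical minimal change log: change id -> stages reached so far."""
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def advance(self, change_id: str, stage: str) -> None:
        """Record that a change has reached the next stage, enforcing the order."""
        history = self.entries.setdefault(change_id, [])
        if len(history) == len(STAGES):
            raise ValueError(f"{change_id} is already closed")
        expected = STAGES[len(history)]
        if stage != expected:
            raise ValueError(f"{change_id}: expected {expected!r}, got {stage!r}")
        history.append(stage)

    def status(self, change_id: str) -> Optional[str]:
        """Return the latest stage a change has reached, or None if unknown."""
        history = self.entries.get(change_id)
        return history[-1] if history else None


log = ChangeLog()
for stage in STAGES[:3]:
    log.advance("CHG-001", stage)
print(log.status("CHG-001"))  # approval
```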

Checksum

A value computed from the contents of a file or other digital object, most commonly used to test whether the object has changed over time. A checksum is a type of metadata and an important property of a data object, because it allows the object's identity and integrity to be verified. Checksums are usually computed with a hash function, so a checksum is sometimes also called a hash. The same content always produces the same checksum, so recomputing it and comparing it with a stored value verifies the fixity, or stability, of a digital object. Checksums are associated with persistent identifiers (PIDs) but can be computed and tested independently of PID systems.

- CASRAI Dictionary
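
A minimal Python sketch using the standard `hashlib` module. SHA-256 is an assumption here: the definition does not name an algorithm, and MD5 and SHA-1 are also widely used for fixity checks. The helper names are hypothetical. The key property is that identical bytes always give the same checksum, while a change to the content will, in practice, give a different one.

```python
import hashlib


def checksum_bytes(data: bytes) -> str:
    """Return the SHA-256 checksum of some bytes as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def checksum_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks to save memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


original = b"station,rainfall_mm\nA,12.5\n"
stored = checksum_bytes(original)  # recorded when the data were archived

# Later: recompute and compare with the stored value to check fixity
unchanged = checksum_bytes(original) == stored
edited = checksum_bytes(original.replace(b"12.5", b"12.6")) == stored
print(unchanged, edited)  # True False
```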

Citable data

A type of referable data that has undergone quality assessment and can be cited in publications and as part of research objects.

- CASRAI Dictionary

Climate

Long-term weather patterns for a location or area, measured in averages, maxima and minima. Typically, a minimum of 30 years of weather records is considered the basis for describing a climate.
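
A minimal Python sketch of the idea, with an assumed input format (a mapping from year to a list of daily temperatures) and a hypothetical function name. It refuses to summarise fewer than 30 years and reports the average, maximum and minimum named in the definition. Climate statistics are often computed per month or season as well, which this sketch ignores.

```python
import statistics
from typing import Dict, List


def climate_summary(temps_by_year: Dict[int, List[float]], min_years: int = 30) -> Dict[str, float]:
    """Summarise daily temperatures (hypothetical input: year -> list of values)."""
    if len(temps_by_year) < min_years:
        raise ValueError(f"need at least {min_years} years, got {len(temps_by_year)}")
    # Pool every daily value across all years
    values = [t for year_values in temps_by_year.values() for t in year_values]
    return {
        "mean": statistics.fmean(values),
        "maximum": max(values),
        "minimum": min(values),
    }


# Hypothetical record: 30 years, each with the same three daily readings
record = {year: [18.0, 24.0, 30.0] for year in range(1991, 2021)}
print(climate_summary(record))  # {'mean': 24.0, 'maximum': 30.0, 'minimum': 18.0}
```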

Climate simulation

Using computer models and quantitative methods to represent the atmosphere, oceans, land, ice and energy budget of the Earth.

Cloud computing

A large-scale distributed computing paradigm, driven by economies of scale, in which a pool of abstracted, virtualised, dynamically scalable, managed computing power, storage, platforms and services is delivered on demand to external customers over the Internet.

Key elements:

it is a specialised distributed computing paradigm;
it is massively scalable;
it can be encapsulated as an abstract entity that delivers different levels of services to customers outside the Cloud;
it is driven by economies of scale; and,
the services can be dynamically configured (via virtualisation or other approaches) and delivered on demand.

- CASRAI Dictionary

Cluster computing

Using multiple machines linked together and managing their collective capabilities to complete tasks. Computer clusters require a cluster management layer which handles communication between the individual nodes and coordinates work assignment.

Comma separated values

A file that contains the values in a table as a series of ASCII text lines organized so that each column value is separated by a comma from the next column's value and each row starts a new line.

- CASRAI Dictionary
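
A short sketch with Python's standard `csv` module, using a hypothetical rainfall table. One detail the definition leaves out: a value that itself contains a comma is wrapped in double quotes so it is not split into two columns. Writing to an in-memory `io.StringIO` buffer keeps the example self-contained; with a real file you would open it with `newline=''`.

```python
import csv
import io

rows = [
    ["station", "date", "rainfall_mm"],
    ["Kisumu, Kenya", "2020-01-01", "12.5"],
    ["Mombasa", "2020-01-01", "0.0"],
]

# Write the table as comma-separated lines, one row per line
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)
# station,date,rainfall_mm
# "Kisumu, Kenya",2020-01-01,12.5
# Mombasa,2020-01-01,0.0

# Read it back: the quoted comma stays inside its field
parsed = list(csv.reader(io.StringIO(text)))
assert parsed == rows
```

Note that every value comes back as text; converting `rainfall_mm` to numbers is up to the reader.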

Compute intensive

Any computer application that requires a lot of computation, such as meteorology programs and other scientific applications.

- CASRAI Dictionary

Computer code

1. Computer code, or source code: a series of computer instructions written in a human-readable computer language, usually stored in a text file. Computer code should include explanatory comments.

2. Machine code: the computer-executable code produced when source code is 'compiled' or 'interpreted'.

3. A code is a collection of mandatory standards that has been codified by a governmental authority and has thus become part of the law for the jurisdiction represented by that authority. Examples include the National Building Code and the National Electrical Code.

Synonyms: Code; Source code; Script

- CASRAI Dictionary

Confidence interval

A confidence interval gives an estimated range of values that is likely to include an unknown population parameter. For example, suppose a study of planting dates for maize in which the interest is in estimating the upper quartile, i.e. the date by which a farmer will be able to plant in ¾ of the years. Suppose the estimate from the sample is day 332, i.e. 27th November, and the 95% confidence interval is from day 325 to 339, i.e. 20th November to 4th December. The interpretation is then that the true upper quartile is highly likely to lie within this period. The width of the confidence interval gives an idea of how uncertain we are about the unknown parameter (see precision). A very wide interval (in the example it is ± 7 days) may indicate that more data needs to be collected before an effective analysis can be undertaken.
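
The glossary does not say how the interval in the maize example was obtained. One common method for a quantile is the percentile bootstrap, sketched below in Python with invented planting dates (these are not the data behind the example above); the function names are hypothetical. Increasing the number of seasons typically narrows the interval, in line with the point above about collecting more data.

```python
import random
import statistics
from typing import Callable, List, Tuple


def upper_quartile(values: List[float]) -> float:
    """Third quartile (75th percentile) of the values."""
    return statistics.quantiles(values, n=4)[2]


def bootstrap_ci(values: List[float], stat: Callable[[List[float]], float],
                 level: float = 0.95, n_boot: int = 2000, seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for any statistic."""
    rng = random.Random(seed)
    # Re-estimate the statistic on many resamples drawn with replacement
    estimates = sorted(stat(rng.choices(values, k=len(values))) for _ in range(n_boot))
    alpha = (1 - level) / 2
    # Take the lower and upper percentiles of the bootstrap estimates
    lower = estimates[round(alpha * (n_boot - 1))]
    upper = estimates[round((1 - alpha) * (n_boot - 1))]
    return lower, upper


# Hypothetical planting dates (day of year) for 20 seasons
planting_days = [318, 325, 330, 322, 335, 328, 340, 319, 331, 327,
                 336, 324, 333, 321, 338, 329, 326, 334, 323, 332]
estimate = upper_quartile(planting_days)
lower, upper = bootstrap_ci(planting_days, upper_quartile)
print(f"upper quartile: day {estimate:.1f}, 95% CI: day {lower:.1f} to {upper:.1f}")
```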

Confidential information

Any information obtained by a person on the understanding that they will not disclose it to others, or obtained in circumstances where it is expected that they will not disclose it. For example, the law assumes that whenever people give personal information to health professionals caring for them, it is confidential as long as it remains personally identifiable.

- CASRAI Dictionary

Confidentiality

1. The duties and practices of people and organizations to ensure that individuals' personal information only flows from one entity to another according to legislated or otherwise broadly accepted norms and policies.

2. In the context of health data: confidentiality is breached whenever personal information is communicated in a way that is not authorized by legislation, professional obligations or contractual duties.

- CASRAI Dictionary

Continuous variable

A numeric variable is continuous if the observations may take any value within an interval. Variables such as height, weight and temperature are continuous. In descriptive statistics the distinction between discrete and continuous variables is not very important: the same summary measures, like the mean, median and standard deviation, can be used. There is often a bigger difference once inferential methods are used in the analysis. The model that is assumed to generate a discrete variable is different to models that are appropriate for a continuous variable, so different parameters are estimated and used. (See also discrete variable, mixed variable.)
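
A small Python illustration of the descriptive-statistics point, using hypothetical data: the same functions from the standard `statistics` module summarise a continuous variable (height) and a discrete count variable (household size) in exactly the same way.

```python
import statistics

# Hypothetical data: a continuous variable and a discrete (count) variable
heights_cm = [162.4, 170.1, 158.9, 175.3, 168.0]
household_size = [3, 5, 2, 4, 6]

for name, values in [("height (cm)", heights_cm), ("household size", household_size)]:
    # The same descriptive summaries apply to both kinds of variable
    print(name, statistics.mean(values), statistics.median(values),
          round(statistics.stdev(values), 2))
```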

Copernicus

The European Union's Earth Observation programme, implemented in partnership with the European Space Agency and primarily using the Sentinel series of satellites, which aims to improve the understanding and management of the environment.

Correlation

A statistical measure that indicates the extent to which two or more variables fluctuate together. Correlation does not imply causation. There may be, for example, an unknown factor that influences both variables similarly.

- CASRAI Dictionary
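
The "unknown factor" case can be illustrated with simulated data. In the Python sketch below (hypothetical variables, standard library only), ice cream sales and sunburn cases are both generated from temperature and never from each other, yet their Pearson correlation comes out close to 1. The `pearson` helper is written out by hand; on Python 3.10 and later, `statistics.correlation` computes the same coefficient.

```python
import math
import random
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)
# Hypothetical hidden factor: daily temperature
temperature = [rng.uniform(10, 35) for _ in range(200)]
# Two variables that each depend on temperature but not on each other
ice_cream_sales = [2 * t + rng.gauss(0, 1) for t in temperature]
sunburn_cases = [3 * t + rng.gauss(0, 1) for t in temperature]

print(round(pearson(ice_cream_sales, sunburn_cases), 3))  # close to 1
```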

Cryosphere

The part of the Earth system where water is frozen, including glaciers and sea ice.

Curation

The activity of managing and promoting the use of data from their point of creation to ensure that they are fit for contemporary purpose and available for discovery and reuse. For dynamic datasets this may mean continuous enrichment or updating to keep them fit for purpose. Higher levels of curation will also involve links with annotation and with other published materials.

- CASRAI Dictionary