Glossary of Terms
Every complex subject has its own terminology, which can make it hard for newcomers to break into the field. Sometimes this means uncommon words, but more often a subject gives very specific meanings to common words. The discussion of errors vs mistakes in this video is a good example.
This glossary is a reference of some of the uncommon terms and specific definitions of more common words that you will encounter throughout Data Tree and your broader dealings with data.
Many of these definitions come from the course materials and experts that helped develop Data Tree. Others come from the CASRAI Dictionary. Those definitions are kindly made available under a Creative Commons Attribution 4.0 International License.
C |
---|
Catalogue: A type of collection that describes, and points to features of, another collection. | |
Cataloguing: An intellectual process of describing objects in accordance with accepted library principles, particularly those of subject and classification order. | |
Categorical variable: A variable with values that range over categories, rather than being numerical. Examples include gender (male, female), paint colour (red, white, blue), type of animal (elephant, leopard, lion). Some categorical variables are ordinal. | |
Causation: The capacity of one variable to influence another. The first variable may bring the second into existence or may cause the incidence of the second variable to fluctuate. RELATED TERM. Correlation | |
Change log: Tracks the progress of each change from submission through review, approval, implementation and closure. The log can be managed manually by using a document or spreadsheet, or it can be managed automatically with a software or Web-based tool. | |
Checksum: A value used to test whether a file has changed over time. A checksum is a type of metadata and an important property of a data object that allows its identity and integrity to be verified. Also called a hash, a checksum is a short piece of data computed deterministically from the contents of a digital object and used to verify the fixity or stability of that object. It is most commonly used to detect whether some representation of a digital object has changed over time. Checksums are associated with persistent identifiers (PIDs) but can be found and tested independently of PID systems. | |
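As a minimal sketch of the fixity check described in the Checksum entry, the Python below computes a SHA-256 checksum with the standard `hashlib` module and compares it with a stored value. The helper name `sha256_checksum` and the sample byte strings are assumptions made for illustration; SHA-256 is only one of several algorithms in common use.

```python
import hashlib


def sha256_checksum(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents, used only for illustration.
original = b"station,temp_c\nA1,12.5\n"
altered = b"station,temp_c\nA1,12.6\n"

# Checksum recorded when the object is first deposited.
stored_checksum = sha256_checksum(original)

# Identical bytes always give the same checksum, so the stored value
# can be recomputed and compared at any later date.
unchanged = sha256_checksum(original) == stored_checksum

# A single changed character gives a different checksum.
changed = sha256_checksum(altered) != stored_checksum
```

For real files, the same function can be applied to the bytes read from disk in binary mode.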
Citable data: A type of referable data that has undergone quality assessment and can be referred to as citations in publications and as part of research objects. | |
Climate | |
Climate simulation: Using computer models and quantitative methods to represent the atmosphere, oceans, land, ice and energy budget of the Earth. | |
Cloud computing: A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualised, dynamically scalable, managed computing power, storage, platforms and services are delivered on demand to external customers over the Internet. Key elements:
| |
Comma separated values: A file that contains the values in a table as a series of ASCII text lines organized so that each column value is separated by a comma from the next column's value and each row starts a new line. | |
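To illustrate the layout described in the Comma separated values entry, the sketch below parses a small table with Python's standard `csv` module. The column names and values are hypothetical, and the text is read from a string rather than a file to keep the example self-contained.

```python
import csv
import io

# Hypothetical table contents, used only for illustration:
# one header line, then one line per row, with commas between columns.
csv_text = "site,year,rainfall_mm\nA,2019,812\nB,2019,640\n"

# csv.DictReader treats the first line as column names and
# yields each later line as a dictionary keyed by those names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

sites = [row["site"] for row in rows]
total_rainfall = sum(int(row["rainfall_mm"]) for row in rows)
```

Every value is read as text, which is why the rainfall figures are converted with `int` before they are summed.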
Compute intensive: Any computer application that requires a lot of computation, such as meteorology programs and other scientific applications. | |
Computer code: 1. Computer code, or source code: A series of computer instructions written in some human readable computer language, usually stored in a text file. Computer code should include explanatory comments. 2. Machine code: Source code is 'compiled' or 'interpreted' to produce computer executable code. 3. A code is a collection of mandatory standards, which has been codified by a governmental authority and thus become part of the law for the jurisdiction represented by that authority. Examples include the National Building Code and the National Electrical Code. SYNONYM. Code; Source code; Script | |
Confidence interval: A confidence interval gives an estimated range of values that is likely to include an unknown population parameter.
For example, suppose a study of planting dates for maize aims to estimate the upper quartile, i.e. the date by which a farmer will be able to plant in ¾ of the years. Suppose the estimate from the sample is day 332, i.e. 27th November, and the 95% confidence interval is from day 325 to day 339, i.e. 20th November to 4th December. The interpretation is then that the true upper quartile is highly likely to lie within this period.
The width of the confidence interval gives an idea of how uncertain we are about the unknown parameter (see precision). A very wide interval (in the example it is ± 7 days) may indicate that more data needs to be collected before an effective analysis can be undertaken. | |
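One way to obtain an interval like the one in the planting-date example is a percentile bootstrap, sketched below with the Python standard library. This is an illustration under stated assumptions, not the method used in the study described above: the planting days are invented, and the function names `upper_quartile` and `bootstrap_ci` are hypothetical.

```python
import random
import statistics


def upper_quartile(values):
    """Return the third quartile (75th percentile) of values."""
    return statistics.quantiles(values, n=4)[2]


def bootstrap_ci(values, stat, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Recompute the statistic on many resamples drawn with replacement.
    estimates = sorted(
        stat(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[int((1 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical planting days (day of year) for 20 years; not real data.
planting_days = [318, 321, 323, 324, 325, 326, 327, 328, 329, 330,
                 331, 331, 332, 333, 334, 335, 336, 338, 340, 343]

estimate = upper_quartile(planting_days)
lower, upper = bootstrap_ci(planting_days, upper_quartile)
width = upper - lower  # a wider interval signals more uncertainty
```

Collecting more years of data would normally narrow `width`, which is the point made in the entry about wide intervals.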
Confidential information: Any information obtained by a person on the understanding that they will not disclose it to others, or obtained in circumstances where it is expected that they will not disclose it. For example, the law assumes that whenever people give personal information to health professionals caring for them, it is confidential as long as it remains personally identifiable. | |
Confidentiality: 1. The duties and practices of people and organizations to ensure that individuals' personal information only flows from one entity to another according to legislated or otherwise broadly accepted norms and policies. 2. In the context of health data: Confidentiality is breached whenever personal information is communicated that is not authorized by legislation, professional obligations, or under contractual duties. | |
Continuous variable: A numeric variable is continuous if the observations may take any value within an interval. Variables such as height, weight and temperature are continuous.
In descriptive statistics the distinction between discrete and continuous variables is not very important. The same summary measures, like mean, median and standard deviation can be used.
There is often a bigger difference once inferential methods are used in the analysis. The model that is assumed to generate a discrete variable is different to models that are appropriate for a continuous variable. Hence different parameters are estimated and used. (See also discrete variable, mixed variable.) | |
Copernicus: The Earth observation programme of the European Union, implemented in partnership with the European Space Agency and primarily using the Sentinel series of satellites, to improve the understanding and management of the environment. | |
Correlation: A statistical measure that indicates the extent to which two or more variables fluctuate together. Correlation does not imply causation. There may be, for example, an unknown factor that influences both variables similarly. | |
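The sketch below computes the Pearson correlation coefficient, one common measure of correlation, for two invented series that are both driven by a third factor. The variable names and figures are hypothetical; the point is only that a strong correlation can appear without either variable causing the other.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical monthly figures: temperature influences both other series.
temperature = [10, 14, 18, 22, 26, 30]
ice_cream_sales = [2 * t + 5 for t in temperature]
sunburn_cases = [3 * t - 10 for t in temperature]

# The two series rise and fall together, yet neither causes the other.
r = pearson_r(ice_cream_sales, sunburn_cases)
```

Here `r` is close to its maximum value even though the only causal influence in the invented data comes from `temperature`.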
Cryosphere: The part of the Earth-system where water is frozen, including glaciers and sea-ice. | |
Curation: The activity of managing and promoting the use of data from their point of creation to ensure that they are fit for contemporary purpose and available for discovery and reuse. For dynamic datasets this may mean continuous enrichment or updating to keep them fit for purpose. Higher levels of curation will also involve links with annotation and with other published materials. | |