Glossary of Terms


All complex subjects have their own terminology, which can make it hard for newcomers to break into the field. Sometimes this means uncommon words, but more often a subject gives very specific meanings to common words. The discussion of errors vs mistakes in this video is a good example.

This glossary is a reference for the uncommon terms, and the specific definitions of more common words, that you will encounter throughout Data Tree and in your broader dealings with data.

Many of these definitions come from the course materials and experts that helped develop Data Tree. Others come from the CASRAI Dictionary. Those definitions are kindly made available under a Creative Commons Attribution 4.0 International License.




S

Sample

A sample is a group of units, selected from a larger group (the population). By studying the sample it is hoped to draw valid conclusions (inferences) about the population. A sample is usually used because the population is too large to study in its entirety. The sample should be representative of the population. This is best achieved by random sampling. The sample is then called a random sample.
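As a minimal sketch of random sampling, the Python snippet below draws a sample without replacement so that every unit has the same chance of selection. The population of 1,000 numbered units, the sample size of 50 and the function name are all assumptions made for illustration.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Return n units chosen without replacement, each equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population: 1,000 units identified by the numbers 1..1000.
population = list(range(1, 1001))
sample = draw_random_sample(population, 50, seed=42)

# The sample mean is used as an estimate of the population mean.
sample_mean = sum(sample) / len(sample)
population_mean = sum(population) / len(population)
```

Comparing sample_mean with population_mean shows how close the estimate from this particular sample happens to be.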

Sampling Distribution

A sampling distribution describes the probabilities associated with an estimator, when a random sample is drawn from a population. The random sample is considered as one of the many samples that might have been taken. Each would have given a different value for the estimator. The distribution of these different values is called the sampling distribution of the estimator. Deriving the sampling distribution is the first step in calculating a confidence interval, or in conducting a hypothesis test.
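The idea of "the many samples that might have been taken" can be approximated by simulation. The sketch below repeatedly draws random samples from a made-up population and records the sample mean each time; the population, sample size and number of repeats are illustrative assumptions, and the resulting list is only an approximation to the true sampling distribution.

```python
import random
import statistics

def sampling_distribution_of_mean(population, n, repeats, seed=0):
    """Approximate the sampling distribution of the mean by repeated sampling."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

# Hypothetical population: the whole numbers 1..100 (true mean 50.5).
population = list(range(1, 101))
means = sampling_distribution_of_mean(population, n=10, repeats=2000)

# Centre and spread of the simulated sampling distribution.
centre = statistics.mean(means)
spread = statistics.stdev(means)
```

The values in means vary from sample to sample, but they cluster around the population mean and are less spread out than the individual values in the population.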

Satellite imagery

An image of part of the Earth taken by an artificial satellite in orbit around it. These images have a variety of uses.

Secondary Data

Existing data which is being reused for a purpose other than the one for which it was collected.

Sentinel satellites

A family of Earth Observation satellite missions by the European Space Agency. See http://m.esa.int/Our_Activities/Observing_the_Earth/Copernicus/Overview4

Signal to noise ratio

A measure of how much useful information there is in a system. The phrase is applied generally but originates in electrical systems, where it indicates the strength of the information (signal) compared to unwanted interference (noise). A low signal to noise ratio means that it is difficult to determine the useful information.
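One common way to put a number on this, which goes beyond the definition given here, is to divide the signal power by the noise power and quote the result in decibels. The sketch below assumes both powers are measured in the same units; the function name and the example values are hypothetical.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal to noise ratio in decibels: 10 * log10(signal / noise)."""
    return 10 * math.log10(signal_power / noise_power)

# A signal 100 times more powerful than the noise.
strong = snr_db(100.0, 1.0)

# A signal no more powerful than the noise: hard to pick out.
weak = snr_db(1.0, 1.0)
```

Under this convention a higher figure means the signal stands out more clearly, and a figure near or below zero means the noise is as strong as the signal or stronger.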

Simulation research data

Research data generated from test models where the model and metadata may be more important than the output data from the model.

Examples: Climate or ocean circulation models.

Skew

If the distribution (or “shape”) of a variable is not symmetrical about the median or the mean it is said to be skew. The distribution has positive skewness if the tail of high values is longer than the tail of low values, and negative skewness if the reverse is true.
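Skewness can also be given a numerical value. The sketch below uses one common choice, the average cubed standardised deviation from the mean (an assumption here, since the definition above does not name a formula). It is positive when the tail of high values is longer, negative when the tail of low values is longer, and zero for a symmetrical set of values. The example data are made up.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: mean of ((x - mean) / sd) cubed."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum(((x - m) / sd) ** 3 for x in values) / len(values)

symmetric = skewness([1, 2, 3, 4, 5])        # no skew
long_high_tail = skewness([1, 2, 2, 3, 20])  # positive skew
long_low_tail = skewness([-20, -3, -2, -2, -1])  # negative skew
```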

Smart Meter

A new kind of energy meter that can digitally send meter readings to your energy supplier. It comes with an in-home display unit that shows in real time how much energy the household is using.

Software developer

A person who researches, designs, programs and tests computer code.

Stakeholder

Individuals, groups or organisations that have an interest or share in an undertaking or relationship and its outcome - they may be affected by it, impact or influence it, and in some way be accountable for it.

- CASRAI Dictionary

Standard deviation

The standard deviation (s.d.) is a commonly used summary measure of the variation or spread of a set of data. It is a “typical” distance from the mean. For data that are roughly normally distributed, about 68% of the observations lie within 1 standard deviation of the mean and most (about 95%) lie within 2 s.d. of the mean.
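The rule of thumb above can be checked on simulated data. The sketch below generates 10,000 values from a normal distribution (the mean of 50 and s.d. of 10 are arbitrary assumptions), then counts the fraction of values within 1 and 2 standard deviations of the mean. The rule applies to roughly bell-shaped data and need not hold for strongly skewed data.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
data = [rng.gauss(50, 10) for _ in range(10_000)]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation

def fraction_within(values, centre, spread, k):
    """Fraction of values no further than k * spread from centre."""
    return sum(abs(x - centre) <= k * spread for x in values) / len(values)

within_1 = fraction_within(data, mean, sd, 1)  # roughly two-thirds
within_2 = fraction_within(data, mean, sd, 2)  # roughly 95%
```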

Standard error

The standard error (s.e.) is a measure of precision and a key component of statistical inference. The standard error of an estimator is a measure of how close the estimator is likely to be to the parameter it is estimating.
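As an illustration for one particular estimator, the sample mean, a widely used formula for the standard error is the sample standard deviation divided by the square root of the sample size. The sketch below assumes a simple random sample of independent observations; the data values are made up.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical sample of eight measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error_of_mean(sample)
```

Because the sample size appears under a square root, roughly four times as many observations are needed to halve the standard error.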

Stream processing

The practice of computing over individual data items as they move through a system. This allows for real-time analysis of the data being fed to the system and is useful for time-sensitive operations using high velocity metrics.
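A minimal sketch of the idea in Python: the generator below updates a running mean each time a new item arrives, without storing the items or waiting for the stream to finish. The function name and the list of readings standing in for a live feed are assumptions for illustration; real stream processing systems add concerns such as windowing and fault tolerance that are not shown.

```python
from typing import Iterable, Iterator

def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all items seen so far, one result per incoming item."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count  # result is available as soon as the item arrives

# Hypothetical readings standing in for a live data feed.
readings = iter([2.0, 4.0, 6.0, 8.0])
results = list(running_mean(readings))
```

Only the count and the running total are kept in memory, so the same code works however long the stream runs.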