Glossary of Terms

All complex subjects have their own terminology, which can make it hard for newcomers to break into the field. This sometimes includes uncommon words, but more often a subject gives very specific meanings to common words; the discussion of errors vs mistakes in this video is a good example of this.

This glossary is a reference of some of the uncommon terms and specific definitions of more common words that you will encounter throughout Data Tree and your broader dealings with data.

Many of these definitions come from the course materials and experts that helped develop Data Tree. Others come from the CASRAI Dictionary. Those definitions are kindly made available under a Creative Commons Attribution 4.0 International License.

P

p-value

The probability value (p-value) of a hypothesis test is the probability of getting a value of the test statistic as extreme as, or more extreme than, the one observed, if the null hypothesis is true. A small p-value indicates that the observed data would be unusual if the null hypothesis were true; the smaller it is, the more convincing the evidence against the null hypothesis. In the pre-computer era it was common to select a fixed significance level (often 0.05, or 5%) and reject H0 if (and only if) the calculated p-value was less than this fixed value. Now it is much more common to calculate the exact p-value and interpret the data accordingly.
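
As a rough illustration of the definition, the sketch below computes a one-sided p-value for a hypothetical coin-tossing experiment: 9 heads in 10 tosses, with the null hypothesis that the coin is fair. The function name `binomial_p_value` and the example figures are assumptions made for illustration; they do not come from the course materials.

```python
from math import comb


def binomial_p_value(successes, trials, null_prob=0.5):
    """One-sided p-value: P(X >= successes) for X ~ Binomial(trials, null_prob).

    This is the probability, under the null hypothesis, of a result
    as extreme as, or more extreme than, the one observed.
    """
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 9 heads in 10 tosses of a supposedly fair coin.
p = binomial_p_value(9, 10)
print(p)  # 11/1024, about 0.0107
```

Here only results in one direction (9 or 10 heads) count as "more extreme"; a two-sided test would also count results that are equally unusual in the other direction.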

Parameter

A parameter is a numerical characteristic of a population, such as the population mean. The population values are often modelled by a distribution, and the shape of that distribution depends on its parameters. For example, the parameters of the normal distribution are the mean, μ, and the standard deviation, σ. For the binomial distribution, the parameters are the number of trials, n, and the probability of success, θ.

Percentile

The pth percentile of a list is the number such that at least p% of the values in the list are no larger than it. So the lower quartile is the 25th percentile and the median is the 50th percentile. One definition used to give percentiles is that the pth percentile is the p/100*(n+1)th observation in the sorted list. For example, with 7 observations, the 25th percentile is the 25/100*8 = 2nd observation in the sorted list. Similarly, the 20th percentile is the 20/100*8 = 1.6th observation, that is, a value between the 1st and 2nd observations.
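
The position rule above can be sketched in a few lines of Python. The glossary does not say how to handle a fractional position such as the 1.6th observation, so this sketch assumes linear interpolation between the two neighbouring observations; the function name `percentile` and the sample data are likewise hypothetical.

```python
def percentile(values, p):
    """Return the pth percentile using the p/100 * (n + 1) position rule.

    Fractional positions are resolved by linear interpolation between
    neighbouring sorted observations (an assumption of this sketch).
    """
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1) / 100  # 1-based position in the sorted list
    lower = int(position)         # whole part of the position
    fraction = position - lower   # fractional part of the position
    if lower < 1:                 # position falls before the first observation
        return ordered[0]
    if lower >= n:                # position falls at or after the last observation
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [12, 3, 20, 8, 15, 5, 10]  # 7 hypothetical observations
print(percentile(data, 25))  # 5.0, the 2nd value in the sorted list
print(percentile(data, 20))  # about 4.2, 60% of the way from 3 to 5
```

Statistical packages implement several slightly different percentile rules, so results from other tools may not match this one exactly.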

Peta-

Prefix denoting a factor of 10^15, or a million billion.

Physical data

Data in the form of physical samples.

Examples: Soil samples, ice cores.

Polar orbiting

A satellite orbit that passes above, or nearly above, both poles on each revolution. Polar orbiting satellites fly at a lower altitude above the Earth's surface than geostationary satellites and can therefore provide higher spatial resolution.

Population

A population is a collection of units being studied. This might be the set of all people in a country. Units can be people, places, objects, years, drugs, or many other things. The term population is also used for the infinite population of all possible results of a sequence of statistical trials, for example, tossing a coin. Much of statistics is concerned with estimating numerical properties (parameters) of an entire population from a random sample of units from the population.

Precision

Precision is a measure of how close an estimator is expected to be to the true value of a parameter. Precision is usually expressed in terms of the standard error of the estimator. Less precision is reflected by a larger standard error.
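
To make the link between precision and standard error concrete, the sketch below uses the sample mean as the estimator. Its standard error is commonly estimated as the sample standard deviation divided by the square root of the sample size. The function name and the two samples are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))


# Two hypothetical samples with the same spread of values but different sizes.
small_sample = [4, 6, 5, 7]
large_sample = small_sample * 4  # 16 observations

print(standard_error_of_mean(small_sample))
print(standard_error_of_mean(large_sample))
```

The larger sample gives the smaller standard error, so its mean is the more precise estimate of the population mean.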

Primary Data

Data that has been created or collected first hand to answer the specific research question.

Proportion

For a variable with n observations, of which r have a particular characteristic, the proportion is r/n. For example, if replanting was needed in 11 years out of 55, then the proportion was 11/55 = 0.2 of the years, or one fifth of the years. (See also percentages.)

Provenance

In the case of data, the process of tracing and recording the origins of data and its movements between databases; the data's full history, including how and why it got to its present place.

Proxy

In the case of data, other data that you may use and/or transform when you do not have a direct measurement of the data you require.