Glossary of Terms
Every complex subject has its own terminology, which can make it hard for newcomers to break into the field. Sometimes this means uncommon words, but more often a subject gives very specific meanings to common words; the discussion of errors vs mistakes in this video is a good example.
This glossary is a reference for some of the uncommon terms, and the specific definitions of more common words, that you will encounter throughout Data Tree and in your broader dealings with data.
Many of these definitions come from the course materials and experts that helped develop Data Tree. Others come from the CASRAI Dictionary. Those definitions are kindly made available under a Creative Commons Attribution 4.0 International License.
Big data
1. An evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that have the potential to be mined for information.
2. Data that would take too much time and cost too much money to load into relational databases for analysis (typically petabytes and exabytes of data).
3. Extensive datasets/collections/linked data primarily characterized by big volume, extensive variety, high velocity (creation and use), and/or variability that together require a scalable architecture for efficient data storage, manipulation, and analysis. In general, the size is beyond the ability of typical database software tools to capture, store, manage and analyze. It is assumed that as technology advances over time, the size of datasets that qualify as big data will increase. Also the definition can vary by sector, depending on what kind of software tools are commonly available and what sizes of datasets are common in a particular industry. With those caveats, big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes).
Black box
Any device whose workings are not understood by or accessible to its user.
Born digital
Digital materials which are not intended to have an analogue equivalent, either as the originating source or as a result of conversion to analogue form.
This term is used to differentiate them from
- CASRAI Dictionary
Boxplot
A graphical representation of numerical data, based on the five-number summary and introduced by John Wilder Tukey in 1970. The diagram has a scale in one direction only. A rectangular box is drawn, extending from the lower quartile to the upper quartile, with the median shown dividing the box. ‘Whiskers’ are then drawn extending from the ends of the box to the greatest and least values. Multiple boxplots, arranged side by side, can be used for the comparison of several samples.
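As a rough illustration, the five-number summary behind the box and whiskers can be computed in a few lines of Python. This is only a sketch: the function name `five_number_summary` and the sample values are made up for the example, and the choice of the "inclusive" quartile method is an assumption, since textbooks and software use several slightly different quartile conventions.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # Quartile conventions differ between textbooks and tools; the
    # 'inclusive' method is an assumption made for this sketch.
    lower_q, median, upper_q = quantiles(data, n=4, method="inclusive")
    # The box spans lower_q to upper_q and is divided at the median;
    # the whiskers reach out to the least and greatest values.
    return data[0], lower_q, median, upper_q, data[-1]


sample = [7, 1, 9, 3, 5, 2, 8, 4, 6]  # hypothetical measurements
print(five_number_summary(sample))
```

For the sample above, the box would run from 3 to 7 with the median line at 5, and the whiskers would extend to 1 and 9.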