Introduction to Statistics
Definitions
The quantities measured in a study are called random variables, and a particular outcome is called an observation. Several observations are collectively known as data. The collection of all possible outcomes is called the population.
In practice, we cannot usually observe the whole population. Instead we observe a subset of the population, known as a sample. To ensure that the sample is representative of the whole population, we usually take a random sample, in which all members of the population are equally likely to be selected for inclusion. For example, suppose we are interested in conducting a survey of the amount of physical exercise undertaken by the general public. Surveying people entering and leaving a gymnasium would provide a biased sample of the population, because regular exercisers would be over-represented, and the results obtained would not generalise to the population at large.
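The gymnasium example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the proportion of gym members, and the ranges of weekly exercise hours are all hypothetical figures chosen for illustration, not data from any study.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 people, 10% of whom are gym members.
# Gym members are ASSUMED to exercise 4-10 hours a week, everyone else 0-4.
population = []
for i in range(10_000):
    gym_member = i < 1_000
    hours = rng.uniform(4, 10) if gym_member else rng.uniform(0, 4)
    population.append((gym_member, hours))

population_mean = statistics.mean(h for _, h in population)

# Random sample: every member of the population is equally likely to be chosen.
random_sample = rng.sample(population, 500)
random_mean = statistics.mean(h for _, h in random_sample)

# Biased sample: 500 gym members, standing in for people surveyed at the gymnasium.
gym_sample = [p for p in population if p[0]][:500]
gym_mean = statistics.mean(h for _, h in gym_sample)

print(f"population mean:    {population_mean:.2f} hours/week")
print(f"random sample mean: {random_mean:.2f} hours/week")
print(f"gym sample mean:    {gym_mean:.2f} hours/week")
```

Under these assumptions the random sample mean lands close to the population mean, while the gymnasium sample overstates it considerably.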
Variables are either qualitative or quantitative. Qualitative variables have non-numeric outcomes, with no natural ordering. For example, gender, disease status, and type of car are all qualitative variables. Quantitative variables have numeric outcomes. For example, survival time, height, age, number of children, and number of faults are all quantitative variables.
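One practical consequence of this distinction is the kind of summary that makes sense for each type of variable. The sketch below uses made-up observations (the car types and child counts are hypothetical) to show that qualitative outcomes can be counted by category, whereas quantitative outcomes can also be averaged.

```python
from collections import Counter
from statistics import mean

# Hypothetical observations of one qualitative and one quantitative variable.
car_type = ["hatchback", "saloon", "hatchback", "estate", "hatchback"]
number_of_children = [0, 2, 1, 3, 2]

# Qualitative: tally how often each category occurs.
car_type_counts = Counter(car_type)

# Quantitative: arithmetic on the outcomes is meaningful, e.g. the average.
mean_children = mean(number_of_children)

print(car_type_counts)
print(mean_children)
```

Taking the mean of the car types would be meaningless, which is one reason to identify the type of a variable before analysing it.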
Quantitative variables can be discrete or continuous.
Discrete random variables have outcomes which can take only a countable number of possible values. These possible values are usually taken to be integers, but need not be. For example, number of children and number of faults are discrete random variables which take only integer values, but your score in a quiz where “half” marks are awarded is a discrete quantitative random variable which can take on non-integer values.
Continuous random variables can take any value over some continuous scale. For example, survival time and height are continuous random variables. Often, continuous random variables are rounded to the nearest integer, but they are still considered to be continuous variables if there is an underlying continuous scale. Age is a good example of this.
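The two cases can be made concrete with a short sketch. The quiz length (10 marks) and the exact age below are hypothetical values chosen purely for illustration.

```python
# Discrete: a quiz marked out of 10 (an assumed length) with half marks.
# The possible scores can be listed one by one, so they are countable,
# even though some of them are not integers.
possible_scores = [k / 2 for k in range(21)]  # 0.0, 0.5, 1.0, ..., 10.0

# Continuous: age lies on an underlying continuous scale. Recording it as a
# whole number discards detail but does not make the variable discrete.
exact_age = 34.62               # hypothetical exact age in years
recorded_age = round(exact_age)

print(len(possible_scores))
print(recorded_age)
```

The list of quiz scores is complete, whereas no such list could enumerate every possible exact age between, say, 34 and 35 years.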