# Basic Statistical Concepts

## What is Statistics?

Statistics is the science of learning from data and of measuring, controlling, and communicating uncertainty. In doing so, it provides the navigation essential for controlling the course of scientific and societal advance. The objective of statistics is to make inferences about a population based on information contained in a sample taken from that population. The theory of statistics is a theory of information: it is concerned with quantifying information, designing experiments or procedures for data collection, and analyzing data. A practical goal is to obtain a specified quantity of information at minimum cost and to use that information to make inferences.

Statistics plays a pivotal role in economics, business, politics, and the other social and natural sciences. It is also applied in sports and the military, among other fields. A person well versed in statistics can “play the background” to every profession by providing the facts and figures required to make informed decisions.

## Different Branches of Statistics

The methods a statistician applies to a problem fall into two broad categories: descriptive statistics and inferential statistics.

### 1. Descriptive statistics

Descriptive statistics involves collecting, organizing, summarizing, and presenting data. It is often the first stage of statistical analysis. This stage is important because it includes designing the tools that will enable you to answer your questions correctly, and designing experiments or studies that are not prone to errors that can lead to costly mistakes in decision making. It also includes the actual collection of data.

### 2. Inferential statistics

Inferential statistics involves drawing sound conclusions from the data. It usually builds on what was presented as descriptive statistics, interpreting those results and drawing conclusions from them. Most social scientists study a small part of the population (a sample) and then generalize to the entire population. These generalizations guide decisions about the characteristics of the entire population.
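The contrast between the two branches can be sketched in a few lines of Python. The sample values below are hypothetical and are not from the text, and the multiplier 2.365 is the tabulated t value for 95% confidence with 7 degrees of freedom, hard-coded here as an assumption to keep the example free of external libraries. The first half only summarizes the sample (descriptive); the second half uses the sample to make a statement about the unknown population mean (inferential).

```python
import math
import statistics

# Hypothetical sample: daily sales (units) at 8 shops drawn from a larger population.
sample = [12, 15, 11, 14, 13, 16, 12, 15]

# Descriptive statistics: summarize the sample itself.
n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: use the sample to estimate the population mean.
standard_error = sample_sd / math.sqrt(n)
t_multiplier = 2.365  # assumed table value: 95% confidence, n - 1 = 7 degrees of freedom
lower = sample_mean - t_multiplier * standard_error
upper = sample_mean + t_multiplier * standard_error

print(f"sample mean = {sample_mean}, sample sd = {sample_sd:.3f}")
print(f"approximate 95% interval for the population mean: ({lower:.2f}, {upper:.2f})")
```

The interval is only as trustworthy as the assumptions behind it, chiefly that the sample was drawn at random from the population of interest.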

## Definitions Common in Statistics

### 1. Element

Elements are the entities on which observations are made, that is, the entities about which data are collected.

### 2. Variable

A variable is a characteristic of interest for the elements. It may describe a person, a quantity, or even an idea. The key attribute of a variable is that its value varies from one element to another.

### 3. Measurements and data

The measurements collected on each variable for every element in a study provide the data. Data are the raw information from which statistics are computed. The set of measurements obtained for a particular element is called an observation.
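To make these terms concrete, here is a minimal sketch using a small, invented data set of farms. The field names (`farm_id`, `district`, `hectares`, `workers`) and all values are hypothetical. Each dictionary is one observation, each farm is an element, and each remaining key is a variable.

```python
# Hypothetical data set: one dict per observation (all measurements for one element).
farms = [
    {"farm_id": "F1", "district": "North", "hectares": 2.5, "workers": 3},
    {"farm_id": "F2", "district": "South", "hectares": 4.0, "workers": 5},
    {"farm_id": "F3", "district": "North", "hectares": 1.2, "workers": 2},
]

# Elements: the entities on which data are collected (identified here by farm_id).
elements = [row["farm_id"] for row in farms]

# Variables: the characteristics recorded for each element.
variables = [name for name in farms[0] if name != "farm_id"]

# Observation: the full set of measurements for one particular element.
observation_f2 = next(row for row in farms if row["farm_id"] == "F2")

# The data: one measurement per variable per element.
total_measurements = len(elements) * len(variables)

print(elements)
print(variables)
print(observation_f2)
print(total_measurements)
```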

## Levels of Measurement

### 1. Nominal scale

When the data for a variable consist of labels or names used to identify an attribute of the element, the scale of measurement is considered a nominal scale.

### 2. Ordinal scale

The scale of measurement for a variable is called an ordinal scale if the data exhibit the properties of nominal data and the order or rank of the data is meaningful. Ordinal data can be arranged in an ordering scheme. For example, satisfaction can be ordered as most preferred, less preferred, and not preferred at all; income levels can be classified as lower, middle, and upper; pain can be classified as none, low, moderate, and severe.

### 3. Interval scale

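The scale of measurement is an interval scale if the data have the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measurement. Differences between values are meaningful, but the zero point is arbitrary, so ratios are not. A standard example is temperature in degrees Celsius: 20 °C is 10 degrees warmer than 10 °C, but it is not twice as hot. (A count, such as the number of goods deliveries at a depot between 10am and 5pm, has a true zero and is therefore better treated as ratio data.)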

### 4. Ratio scale

The scale of measurement for a variable is a ratio scale if the data have all the properties of interval data and the ratio of two values is meaningful, which requires a true zero point. An example is the grams of fat consumed per day by adults: 80 grams is twice as much as 40 grams.
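The practical consequence of the four scales is that each one permits a different set of summaries. The sketch below illustrates this with small invented data sets; the crop names, the temperature readings, and the fat intakes are assumptions chosen for illustration, while the pain categories follow the ordinal example above.

```python
import statistics

# Nominal: labels only, so counting and the mode are the meaningful summaries.
crops = ["maize", "rice", "maize", "beans"]
most_common_crop = statistics.mode(crops)

# Ordinal: order matters, so a median category is meaningful.
pain_levels = ["none", "low", "moderate", "severe"]  # ordered from least to most
reported = ["low", "severe", "moderate", "low", "none"]
ranks = sorted(pain_levels.index(r) for r in reported)
median_pain = pain_levels[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, ratios are not (arbitrary zero).
temp_morning_c, temp_noon_c = 10.0, 20.0
difference_c = temp_noon_c - temp_morning_c
# The same two temperatures in kelvin give a very different "ratio",
# which shows that 20 / 10 = 2 in Celsius has no physical meaning.
ratio_in_kelvin = (temp_noon_c + 273.15) / (temp_morning_c + 273.15)

# Ratio: a true zero makes ratios meaningful.
fat_grams_a, fat_grams_b = 40.0, 80.0
fat_ratio = fat_grams_b / fat_grams_a

print(most_common_crop, median_pain, difference_c, round(ratio_in_kelvin, 3), fat_ratio)
```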

Variables can be classified into two broad categories: numerical (quantitative) and categorical (qualitative).

## Numerical or Quantitative Data

Numerical data carry meaning as measurements, for instance prices, indices, age, or the number of farmers in an area. They are also referred to as quantitative data. Numerical data can be further divided into two types: discrete and continuous.

### 1. Discrete data

Discrete data are measurements that can be counted, for example the number of farming families in an area. When the possible values can be counted and the count ends, the discrete data are said to be finite. When the counting could in principle go on without end (0, 1, 2, and so on), the possible values are said to be countably infinite.

### 2. Continuous data

Continuous data are also measurements. The key distinction is that their possible values cannot be counted; they can take any value in an interval on the real number line. For example, an exchange rate can in principle be stated to any number of decimal places, so a recorded value is only a rounded approximation of a point on the number line. Another example is the growth rate of a tree.

### 3. Qualitative data

Qualitative data are labels or names used to identify attributes of elements; they are measured on a nominal or ordinal scale. Quantitative data are numeric values that indicate how much or how many; they are measured on an interval or ratio scale. A qualitative variable is a variable with qualitative data, and a quantitative variable is a variable with quantitative data.

### Cross-sectional and time series data

Further, there are two major types of economic data: cross-sectional and time series data. Cross-sectional data are collected at the same, or approximately the same, point in time. A cross-sectional data set consists of a sample of individuals, households, farms, cities, districts, countries, or other units, taken at a given point in time.

Time series data are collected over several time periods. A time series data set consists of observations on one or several variables over time. Panel data combine the two: the same cross-sectional units are observed over several time periods.
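The relationship between these data types is easy to see in a small, invented panel. In the sketch below, the districts, years, and prices are hypothetical. Fixing the year yields a cross-section; fixing the unit yields a time series.

```python
# Hypothetical panel: a price observed for two districts over two years.
panel = [
    {"district": "A", "year": 2020, "price": 1.10},
    {"district": "A", "year": 2021, "price": 1.25},
    {"district": "B", "year": 2020, "price": 0.95},
    {"district": "B", "year": 2021, "price": 1.05},
]

# Cross-sectional data: many units at one point in time.
cross_section_2021 = [row for row in panel if row["year"] == 2021]

# Time series data: one unit followed over several periods.
time_series_a = sorted(
    (row for row in panel if row["district"] == "A"),
    key=lambda row: row["year"],
)

print([row["district"] for row in cross_section_2021])
print([(row["year"], row["price"]) for row in time_series_a])
```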

## Data Sources

There are several ways to obtain data. Data sources are of two kinds: primary and secondary. To collect primary data, the researcher goes to the field and collects the data directly. Secondary data are data that have already been collected by others, such as institutions.

## Experimental and Survey Data

In an experiment, the researcher collects data on treated and control units. The defining feature of experimental research is the ability to hold some variables fixed while deliberately varying others. In the real world, where everything varies and some variables are difficult to control, researchers often conduct surveys instead. A survey is an examination of a system in operation in which the investigator does not have an opportunity to assign different conditions to the objects of the study. Experimental data tend to be cleaner and easier to handle, while survey data often contain many anomalies.
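What distinguishes an experiment from a survey is that the investigator assigns the conditions. One common way to do this, shown here as a minimal sketch rather than a prescribed procedure, is random assignment of units to treatment and control groups. The plot names and the seed are hypothetical.

```python
import random

# Hypothetical experimental units.
plots = [f"plot_{i}" for i in range(1, 9)]

# A fixed seed makes the assignment reproducible; the value itself is arbitrary.
rng = random.Random(42)

# Shuffle a copy so the original list of units is left untouched.
shuffled = plots[:]
rng.shuffle(shuffled)

# The investigator, not the units, decides who receives the treatment.
half = len(shuffled) // 2
treatment_group = sorted(shuffled[:half])
control_group = sorted(shuffled[half:])

print("treatment:", treatment_group)
print("control:  ", control_group)
```

Because chance alone decides the groups, other characteristics of the plots are balanced across the two groups on average, which is what a survey cannot guarantee.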

## Steps of Statistical Work and the Scientific Method

The scientific method is a collection of techniques for examining occurrences, acquiring new knowledge, or modifying, refuting and integrating old knowledge. The scientific method plays an important role in statistical analysis. In the scientific method the scientist has to:

### 1. Make observations

In general, this involves asking questions about what you are observing around you and examining whether your observations change from one circumstance to another.

### 2. Think of interesting questions

Ask yourself questions about why some things happen. Why does the observation take a certain pattern?

### 3. Formulate hypotheses

Attempt to state some general causes of the pattern that you observed. Provide general explanations for the phenomenon.

### 4. Develop testable predictions

In this stage, the scientist derives from the hypotheses specific predictions that can be tested against data.

### 5. Gather data and test the hypotheses

In this stage, the scientist collects data and uses statistical procedures to decide whether to reject the hypotheses or fail to reject them.
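As an illustration of this step, the sketch below runs a permutation test, one of many possible statistical procedures and not necessarily the one a given study would use. The yields, the group sizes, the number of shuffles, and the 0.05 significance level are all assumptions made for the example. The hypothesis being tested is that the treatment has no effect, in which case the group labels are interchangeable.

```python
import random
import statistics

# Hypothetical measurements from a small experiment.
treatment = [23.1, 25.4, 24.8, 26.0, 25.1]
control = [21.9, 22.5, 23.0, 22.1, 23.4]

observed_diff = statistics.mean(treatment) - statistics.mean(control)

# Under "no effect", any relabelling of the pooled values is equally likely.
rng = random.Random(0)  # fixed, arbitrary seed for reproducibility
pooled = treatment + control
n_treat = len(treatment)
n_shuffles = 2000
as_extreme = 0

for _ in range(n_shuffles):
    rng.shuffle(pooled)
    diff = statistics.mean(pooled[:n_treat]) - statistics.mean(pooled[n_treat:])
    if abs(diff) >= abs(observed_diff):
        as_extreme += 1

# Two-sided Monte Carlo p-value; the +1 terms count the observed labelling itself.
p_value = (as_extreme + 1) / (n_shuffles + 1)

alpha = 0.05  # assumed significance level
decision = "reject" if p_value < alpha else "fail to reject"
print(f"observed difference = {observed_diff:.2f}, p-value = {p_value:.4f}: {decision}")
```

A small p-value means a difference this large would rarely arise from relabelling alone, so the "no effect" hypothesis is rejected; a large one means the data do not contradict it, which is weaker than showing it to be true.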

### 6. Refine, alter, expand or reject hypotheses

This stage repeats steps 4 and 5 in an attempt to refine, alter, expand the scope of, or reject the hypotheses.

### 7. Formulate general theories

Once the hypotheses have held up through stage 6, the scientist formulates general theories, which remain open to revision as new evidence arrives.