[ad_1]

You can say the measurement of dispersion is an **up-gradation of the measurement of central tendency**. When you have two data sets then the mean can be the same for the two datasets. But both datasets are different. (Different Category). That’s why the measurement of dispersion comes to exist. There are three types of Measurement of Dispersion

**Range****Interquartile Range****Standard Deviation**

In this method of measurement, you will know how the data is spread out through **range variance** and **Standard Deviation. **Let’s know each of them in detail.

## Range

It is the difference between the **largest and smallest** values in the dataset. You can easily calculate. It is affected by outliers and also not consider all the data values in the dataset. It is mostly used to find the **minimum and maximum values** in the dataset. For example, consider the following values in a set.

`{ 10,3,50,45,30,100,75 } `

Here minimum value is 3 and the maximum value is 100. Therefore range will be the difference between the maximum and minimum value. So here range is 100-3 that is 97.

## Standard Deviation

Standard Deviation or SD is a commonly used measure of dispersion. It tells how the data values are spread out about the mean of the dataset. Before calculation standard deviation you have to first find the mean, variance of the values. The formulae for the SD is the s**quare root of the sum** of **squared deviations** from the **mean divided by the number of observations**.

## How to calculate the SD?

### Step 1: Find the sample mean.

### Step 2: Calculate Sample Standard Deviation using the result form the Step1.

The above formulae are for the Sample Standard Deviation. If you want to calculate Population SD then You will use N ( Number of Elements ) in the denominator.

## How to calculate Standard Deviation (SD ) in Excel?

In Excel, you can use The following formulae for finding the Standard Deviation using the following syntax.

`= STDEV(number1, [number2] ,....)`

The number2 is optional. For example when I have put the = STDEV(B2:B) then I get the result 27.81. It means our data values are spread from the mean by 27.81.

### Where you can use Standard Deviation in Real Life?

**Stock Market -To measure the Risk of the market. More risk more Standard Deviation****Financial Prediction****Weather Forecasting****Manufacturing Plants – For testing and quality control.****Real Estate – Buying risk of the household.**

### Interquartile Range

Interquartile Range is the measurement of variability in the dataset. Here you divide the dataset into four parts using **quartiles**. The value that divides the set into halves is known as quartiles. Q1, Q2, and Q3.

Initially, Q2 will divide the datasets into two half datasets. Now you have two sub-datasets upper sub-dataset and the lower sub-dataset.

Q1 is the median for the upper half of the sub-dataset and Q3 is the median for the lower half of the sub-dataset.

The interquartile range is equal to **Q3 -Q1**. In this measurement of dispersion, for considering the outlier you have to determine the fence for it. The formulae for the fence is **1.5 times the IQ** (Interquartile Range). In addition, If you will take any number outside the fence then it will be an outlier. The fence determines the value that can be maximum and all the values outside it will be an outlier.

*IQR = Q3-Q1*

*Fence = 1.5 IQR*

Identifying Outliers in machine learning is an important task. You can find outliers in your data using the IQR techniques. Why we remove outliers? Its because it affects the training process of the dataset that leads to weak model building and less accuracy to predict the inputs. You have to carefully handle the outliers to remove this problem.

## How to calculate Interquartile Range (IQR) in Excel?

In excel you can use the following functions for finding the Quartile.

`=QUARTILE(data, quartile number)`

`=QUARTILE(data,1), for Q1`

`=QUARTILE(data,2), for Q2`

`=QUARTILE(data,3), for Q3`

After finding all the quartiles you can use the mathematical calculation for finding the fence and IQR.

You can see the Q2 is the median of the whole data. The value of the IQR is 23 and Fence is 34.5. When you add the Fence value with Q3 then you will get the value 57.5. All the values outside 57.5 will be Outliers.

You can also draw Box plot like the above data.

## Conclusion

Measurement of dispersion tells how each value of the datasets is spread. In fact, the advantage of using the measurement of dispersion is that it considers all the values of the dataset. Now you must have understood the Measurement of Dispersion and how to calculate and use it in real life. If you have any questions then contact us. In the meantime, you can subscribe and like our Data Science Learner Page.

#### Join our list

Subscribe to our mailing list and get interesting stuff and updates to your email inbox.

[ad_2]

Source link