Normal Distribution, also called Gaussian distribution, is arguably the most important distribution from a statistical analysis perspective. As a Lean Six Sigma practitioner, one needs to understand this distribution, its characteristics and applications in the projects. We will look at all the details pertaining to Normal distribution and its application in Lean Six Sigma. Read on!
In the last posts, we discussed basic probability concepts and probability distributions at length, covering both discrete and continuous distributions. We also looked at histograms and how to build one to identify the shape, or distribution, of the data. Do read those related posts, as we will build on them here.
What is Normal Distribution?
Normal distribution is the data distribution that you get when the data is clustered around the center (mean) of the data and tapers off towards both sides almost symmetrically. This means that more data points lie near the center of the range of your data set than near either end of the range.
Weights of students in a class
Let us take a simple example of weights of students in a particular class with 50 students. Most of the students will normally have approximately the same weight. This would be close to the average or mean weight of the class. There would be a few students who are overweight and a few who are underweight.
If you actually collect such data, you will get something similar to the below table.
42.9 | 46.5 | 48.1 | 48.8 | 49.7 |
44.4 | 47.3 | 48.2 | 48.9 | 49.7 |
44.5 | 47.4 | 48.3 | 49.1 | 50.0 |
44.6 | 47.6 | 48.4 | 49.2 | 50.4 |
46.4 | 47.7 | 48.7 | 49.3 | 50.4 |
50.4 | 51.0 | 51.2 | 51.7 | 53.1 |
50.4 | 51.0 | 51.4 | 51.9 | 53.4 |
50.6 | 51.1 | 51.4 | 52.1 | 53.6 |
50.7 | 51.2 | 51.6 | 52.2 | 54.8 |
50.7 | 51.2 | 51.6 | 53.1 | 56.4 |
Observations from the Weights Data
If you carefully observe or, even better, plot a histogram using the above data, you will see the below points:
- The average weight of the class is 49.9 kgs, very close to 50.
- The weights are clustered around this mean – 32 of the 50 students weigh between 48 and 52 kgs
- There are only 4 students with weight less than 46 kgs, just one less than 44 kgs
- Similarly, there are only 2 students who weigh more than 54 kgs, just one above 56 kgs
The table below shows the frequency of data points, and the graph shows the histogram of the above data. Each interval includes its upper limit, so the student weighing exactly 50.0 kgs is counted in the 48 kgs to 50 kgs interval.
Intervals | Weight Frequency |
42 kgs to 44 kgs | 1 |
44 kgs to 46 kgs | 3 |
46 kgs to 48 kgs | 6 |
48 kgs to 50 kgs | 13 |
50 kgs to 52 kgs | 19 |
52 kgs to 54 kgs | 6 |
54 kgs to 56 kgs | 1 |
56 kgs to 58 kgs | 1 |
Such a data set is said to be normally distributed.
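If you would rather tally the intervals with a script than by hand, the sketch below shows one way to do it. It counts the 50 listed weights into 2 kg intervals and prints a rough text histogram. The names `BIN_START`, `BIN_WIDTH` and `bin_lower_edge` are made up for this illustration, and the rule that a value sitting exactly on a boundary goes into the lower interval is an assumption.

```python
import math
from collections import Counter

# The 50 student weights (kgs) listed in the data table above.
weights = [
    42.9, 46.5, 48.1, 48.8, 49.7,
    44.4, 47.3, 48.2, 48.9, 49.7,
    44.5, 47.4, 48.3, 49.1, 50.0,
    44.6, 47.6, 48.4, 49.2, 50.4,
    46.4, 47.7, 48.7, 49.3, 50.4,
    50.4, 51.0, 51.2, 51.7, 53.1,
    50.4, 51.0, 51.4, 51.9, 53.4,
    50.6, 51.1, 51.4, 52.1, 53.6,
    50.7, 51.2, 51.6, 52.2, 54.8,
    50.7, 51.2, 51.6, 53.1, 56.4,
]

BIN_START = 42  # lower edge of the first interval (kgs)
BIN_WIDTH = 2   # width of each interval (kgs)


def bin_lower_edge(value):
    """Return the lower edge of the interval (lower, upper] containing value."""
    # Assumption: a value exactly on a boundary (such as 50.0) is counted
    # in the lower interval, i.e. each interval includes its upper limit.
    index = math.ceil((value - BIN_START) / BIN_WIDTH) - 1
    return BIN_START + BIN_WIDTH * max(index, 0)


frequency = Counter(bin_lower_edge(w) for w in weights)
mean_weight = sum(weights) / len(weights)

print(f"Mean weight: {mean_weight:.1f} kgs")
for lower in sorted(frequency):
    count = frequency[lower]
    # One '#' per student gives a quick sideways histogram.
    print(f"{lower} kgs to {lower + BIN_WIDTH} kgs | {count:2d} | {'#' * count}")
```

The row of `#` marks is tallest around the mean and shrinks towards both ends, which is the bell shape we are looking for.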
Parameters of Normal Distribution
Any probability distribution usually is defined by some key parameters. These parameters go on to define the shape and hence the distribution of the data.
For Normal distribution, Mean and Standard deviation are such parameters. The shape of your normal distribution will depend on the mean and the standard deviation of your data set. Let’s see how!
Mean
Mean is the mathematical average of the data. It is the sum of all data values in your data set divided by the total number of values you have in the data set.
Mean is used to represent the central tendency of the data. As discussed earlier, most of the values in normally distributed data will be clustered around the mean. In terms of probabilities, a value is far more likely to fall close to the mean than far away from it.
The position of the normal curve changes when the mean of the data changes. The whole curve moves to the left or right depending on the change in the mean, while its shape stays the same.
Standard Deviation
Standard deviation is the measure for variation in your data. It essentially represents how close or distant the data values are spread from the mean value. Hence, standard deviation dictates the width of your normal distribution curve.
When the standard deviation is small, the curve is tall and narrow. And when the standard deviation is big, the curve is short and wide. Look at the below image depicting the same.
Thus, the actual shape of your normal distribution curve will depend on these 2 parameters, mean and standard deviation.
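To see the effect of the two parameters in numbers, here is a minimal sketch that evaluates the standard formula for the normal curve. The function name `normal_pdf` and the three parameter pairs are hypothetical, chosen only for illustration.

```python
import math


def normal_pdf(x, mean, std_dev):
    """Height of the normal curve at x for the given mean and standard deviation."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2 * math.pi))


# Hypothetical parameter pairs, used only to compare curve shapes.
for mean, std_dev in [(50, 2), (50, 4), (60, 2)]:
    peak = normal_pdf(mean, mean, std_dev)  # the curve is tallest at the mean
    one_sd_away = normal_pdf(mean + std_dev, mean, std_dev)
    print(
        f"mean={mean}, std dev={std_dev}: "
        f"peak height={peak:.4f}, height at mean + 1 std dev={one_sd_away:.4f}"
    )
```

Doubling the standard deviation halves the peak height (a shorter, wider curve), while changing only the mean moves the peak without changing its height.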
Now that we understand what normal distribution is and how the shape of the curve varies based on mean and standard deviation, let us take a look at some of its characteristics.
Characteristics of a Perfect Normal Distribution
A perfect normal distribution will always follow the below characteristics. Do not confuse this with standard normal distribution. It is different and we will talk about it later in this post.
1. It will always look like a bell shaped curve
2. The mean will always be at the center of the curve
3. Half of the data points will always be greater than the mean, that is, on the right side of the mean
4. Other half of the data points will always be smaller than the mean, that is, on the left side of the mean
5. The mean, mode and median of the perfectly normally distributed data will always be equal
6. 68.2% of the data values will always be between +/- 1 standard deviation from the mean
7. 95.4% of the data values will always be between +/- 2 standard deviations from the mean
8. 99.7% of the data values will always be between +/- 3 standard deviations from the mean
The normal distribution curve below shows the percentage of values within each standard deviation range. Take a minute to go through it.
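The percentages in characteristics 6 to 8 can be checked with a few lines of code. This sketch uses the error function from Python's standard library. The helper name `fraction_within` is made up for illustration. The exact values are about 68.27%, 95.45% and 99.73%, which are usually quoted in rounded form.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+/- {k} std dev: {fraction_within(k) * 100:.2f}%")
```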
What do these probabilities mean?
The weights data above has a mean of 49.9 kgs and a standard deviation of 2.7 kgs, and it approximately follows a normal distribution. This means that if you pick a random student from the class, there is roughly a 68.2% probability that this student weighs between 47.2 kgs and 52.6 kgs (+/- 1 std dev). There is also roughly a 95.4% chance that this student weighs between 44.5 kgs and 55.3 kgs (+/- 2 std dev).
Thus, once you know that a particular data set follows normal distribution and know the parameters (mean and standard deviation), you can predict the probability of the random variable taking a value within a range.
The same is true for any normally distributed data set.
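The same kind of prediction can be sketched in code for any range, not just whole multiples of the standard deviation. The example below uses `NormalDist` from Python's standard `statistics` module with the rounded mean (49.9) and standard deviation (2.7) quoted above. The helper `probability_between` and the 54 kgs cut-off are hypothetical, added only for illustration.

```python
from statistics import NormalDist

# Rounded parameters quoted in the text for the class weights (kgs).
class_weights = NormalDist(mu=49.9, sigma=2.7)


def probability_between(dist, low, high):
    """Probability that a value drawn from dist falls between low and high."""
    return dist.cdf(high) - dist.cdf(low)


p_one_sd = probability_between(class_weights, 47.2, 52.6)  # +/- 1 std dev
p_two_sd = probability_between(class_weights, 44.5, 55.3)  # +/- 2 std dev
p_above_54 = 1 - class_weights.cdf(54)  # hypothetical cut-off of 54 kgs

print(f"P(47.2 to 52.6 kgs) = {p_one_sd:.3f}")
print(f"P(44.5 to 55.3 kgs) = {p_two_sd:.3f}")
print(f"P(above 54 kgs)     = {p_above_54:.3f}")
```

These are probabilities under the fitted normal curve, so the actual counts in a class of only 50 students will not match them exactly.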
One important point to remember here.
Normal distribution is not the same as symmetrical distribution. All normal distributions are symmetrical; however, not all symmetrical distributions are normal. A uniform distribution, for example, is symmetrical but not bell shaped.
Standard Normal Distribution
We saw that the normal distribution curve can take various shapes depending on the mean and standard deviation of the data. There can be one with a mean of 50 and a standard deviation of 3, and there can be another with a mean of 100 and a standard deviation of 5. The next question that we need to answer is: how do we compare such different normally distributed data sets or processes?
Converting the data into a Standard normal distribution is the answer.
This is a distribution with a mean of ‘Zero’ and a standard deviation of 1.
All normal distributions can be converted to the standard normal distribution. This is done by calculating the standard score, or Z score, for each data value in your data set. Since the converted data sets are then on the same scale, we can compare them. This distribution is also called the Z-distribution.
Essentially, a Z score of a data point represents how far the said data point is from the mean. If you have a Z Score of 0, it means the data point is the mean of the data. A Z score of 1 means that the data point is 1 standard deviation on the right side of the mean (mean + 1 std dev). A Z score of -1 means that the data point is 1 standard deviation on the left side of the mean (mean – 1 std dev).
How to calculate the Z Scores?
It is quite simple. All you need to know is the mean and the standard deviation of your data set.
To calculate the Z score for a data point, simply subtract the mean of the data from the data value and divide the result by the standard deviation. That’s it.
The mean weight of our example class is 49.9 kgs and the standard deviation is 2.7 kgs. The weight of the first student was 42.9 kgs, the first value in our data set. To calculate the Z score of this value, first subtract the mean from this value.
We get 42.9 – 49.9 = -7
Next, divide this difference by the standard deviation.
We get -7/2.7 = -2.59
This is the Z Score for 42.9 kgs.
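Similarly, for 54.8 kgs, another value in our data set, the Z score is (54.8-49.9)/2.7 = 1.81. You can calculate the Z scores for all the values using the same method. See the table below. Its values are calculated from the unrounded mean and standard deviation, so they differ slightly from the hand calculations above (for example, -2.63 rather than -2.59 for 42.9 kgs).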
Weights | Weight – Mean | Z Score |
50.40 | 0.51 | 0.1938 |
52.10 | 2.21 | 0.8349 |
50.60 | 0.71 | 0.2692 |
51.40 | 1.51 | 0.5709 |
48.80 | -1.09 | -0.4095 |
50.40 | 0.51 | 0.1938 |
50.70 | 0.81 | 0.3069 |
47.70 | -2.19 | -0.8243 |
44.60 | -5.29 | -1.9933 |
48.10 | -1.79 | -0.6735 |
42.90 | -6.99 | -2.6343 |
48.30 | -1.59 | -0.5981 |
50.70 | 0.81 | 0.3069 |
48.20 | -1.69 | -0.6358 |
44.50 | -5.39 | -2.0310 |
49.70 | -0.19 | -0.0701 |
51.00 | 1.11 | 0.4201 |
47.40 | -2.49 | -0.9374 |
53.60 | 3.71 | 1.4005 |
49.70 | -0.19 | -0.0701 |
51.10 | 1.21 | 0.4578 |
51.20 | 1.31 | 0.4955 |
50.40 | 0.51 | 0.1938 |
48.90 | -0.99 | -0.3718 |
50.40 | 0.51 | 0.1938 |
51.60 | 1.71 | 0.6463 |
51.40 | 1.51 | 0.5709 |
56.40 | 6.51 | 2.4563 |
50.00 | 0.11 | 0.0430 |
51.70 | 1.81 | 0.6840 |
53.10 | 3.21 | 1.2119 |
46.50 | -3.39 | -1.2768 |
48.70 | -1.19 | -0.4472 |
51.20 | 1.31 | 0.4955 |
47.60 | -2.29 | -0.8620 |
51.60 | 1.71 | 0.6463 |
53.40 | 3.51 | 1.3251 |
47.30 | -2.59 | -0.9751 |
53.10 | 3.21 | 1.2119 |
48.40 | -1.49 | -0.5603 |
49.30 | -0.59 | -0.2210 |
54.80 | 4.91 | 1.8530 |
49.10 | -0.79 | -0.2964 |
51.20 | 1.31 | 0.4955 |
44.40 | -5.49 | -2.0687 |
49.20 | -0.69 | -0.2587 |
51.90 | 2.01 | 0.7594 |
52.20 | 2.31 | 0.8726 |
46.40 | -3.49 | -1.3145 |
51.00 | 1.11 | 0.4201 |
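For anyone who prefers a script to a spreadsheet, the Z score calculation can be sketched as below. The helper name `z_score` is made up for illustration. Using the population standard deviation (`pstdev`) is an assumption, made because it appears to match the values in the table above. The sample standard deviation would give slightly smaller Z scores.

```python
from statistics import fmean, pstdev

# The 50 student weights (kgs) from the example class.
weights = [
    42.9, 46.5, 48.1, 48.8, 49.7,
    44.4, 47.3, 48.2, 48.9, 49.7,
    44.5, 47.4, 48.3, 49.1, 50.0,
    44.6, 47.6, 48.4, 49.2, 50.4,
    46.4, 47.7, 48.7, 49.3, 50.4,
    50.4, 51.0, 51.2, 51.7, 53.1,
    50.4, 51.0, 51.4, 51.9, 53.4,
    50.6, 51.1, 51.4, 52.1, 53.6,
    50.7, 51.2, 51.6, 52.2, 54.8,
    50.7, 51.2, 51.6, 53.1, 56.4,
]

mean_weight = fmean(weights)
# Assumption: population standard deviation (divide by n, not n - 1).
std_dev = pstdev(weights)


def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


z_scores = [z_score(w, mean_weight, std_dev) for w in weights]

print(f"Mean = {mean_weight:.2f} kgs, std dev = {std_dev:.2f} kgs")
for w in (42.9, 54.8):
    print(f"Z score for {w} kgs = {z_score(w, mean_weight, std_dev):.4f}")
```

After the conversion, the Z scores themselves have a mean of 0 and a standard deviation of 1, which is what puts different data sets on the same scale.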
The distribution of the Z scores will look as shown below. This is the standardized distribution, with a mean of 0 and a standard deviation of 1, for the weights data of our class.
Now, if you convert the weights data of multiple classes into Standard Normal Distributions, with a mean of 0 and a standard deviation of 1, you can easily compare the weights of all the classes. You can also see if a particular student is overweight or underweight with respect to his or her class. And you can also make statements such as “student A from class VI is doing better than student B from class X” based on how far from the mean, and on which side of it, each student stands. Thus, standard normal distribution helps us to “compare apples and oranges” as well 🙂
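A comparison across classes might look like the sketch below. All the numbers here (class means, standard deviations and the two students' weights) are hypothetical and invented purely for illustration. They are not taken from the data in this post.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


# Hypothetical class summaries (kgs), invented for illustration only.
classes = {
    "VI": {"mean": 40.0, "std_dev": 3.0},
    "X": {"mean": 55.0, "std_dev": 5.0},
}
# Hypothetical students: (name, class, weight in kgs).
students = [("A", "VI", 46.0), ("B", "X", 60.0)]

z_by_student = {}
for name, class_name, weight in students:
    stats = classes[class_name]
    z_by_student[name] = z_score(weight, stats["mean"], stats["std_dev"])
    print(
        f"Student {name} (class {class_name}): "
        f"{weight} kgs, Z = {z_by_student[name]:+.2f}"
    )
```

In this made-up case, student B is heavier in absolute terms, but student A sits further above his or her own class mean. That is the kind of like-for-like statement the Z scale makes possible.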
Importance of Normal Distribution
Based on what we have discussed above, it is quite evident how important the normal distribution is in statistically analyzing data, especially for drawing conclusions about the population based on sample data. Since this is exactly what Lean Six Sigma practitioners do, it is critical for every practitioner to understand the normal distribution.
Apart from what we already discussed, there are other reasons why the Normal Distribution is important.
In a Lean Six Sigma project, your data will follow either a normal or a non-normal distribution. Quite often, you will see that process data approximately follows the Normal Distribution.
A lot of the tests that you will do in your Analyze phase, as well as some in the Measure phase, assume that your data follows a normal distribution.
Even if you have data which does not follow a normal distribution, if you select multiple samples from the same data set, the means of such samples tend to follow a normal distribution. This is especially important because, in our processes, it is usually not possible to capture the data for the whole population. We usually pick multiple samples from the population. When the samples are reasonably large, the distribution of their means will be approximately normal irrespective of the distribution of the population data. More on this when we discuss the Central Limit Theorem.
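A small simulation makes this behaviour of sample means easier to believe. The sketch below is only an illustration. It assumes a strongly right-skewed exponential population with a mean of 1, and the sample size of 30, the 2000 samples and the fixed random seed are all arbitrary choices.

```python
import random
from statistics import fmean, pstdev

random.seed(42)  # fixed seed so that the run is repeatable

SAMPLE_SIZE = 30    # values in each sample (arbitrary choice)
NUM_SAMPLES = 2000  # number of samples drawn (arbitrary choice)


def draw_sample_mean():
    """Mean of one sample drawn from a right-skewed exponential population (mean 1)."""
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    return fmean(sample)


sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

center = fmean(sample_means)
spread = pstdev(sample_means)
# Share of sample means lying within 1 standard deviation of their own mean.
within_one = sum(abs(m - center) <= spread for m in sample_means) / NUM_SAMPLES

print(f"Mean of sample means:    {center:.3f}")
print(f"Std dev of sample means: {spread:.3f}")
print(f"Share within +/- 1 std dev: {within_one:.1%}")
```

Even though the individual values are far from normal, the share of sample means within one standard deviation should come out close to the 68% expected for a normal distribution.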
This covers what a Lean Six Sigma practitioner needs to understand about the Normal Distribution. Do let me know in the comments section below if there is anything I missed or if you have any comments or questions. I will surely get back to you. Don’t forget to subscribe so you won’t miss the latest posts!
Sachin Naik
Passionate about improving processes and systems | Lean Six Sigma practitioner, trainer and coach for 14+ years consulting giant corporations and fortune 500 companies on Operational Excellence | Start-up enthusiast | Change Management and Design Thinking student | Love to ride and drive