Time for another installment of our series covering the key data visualizations to know for data scientists!

In the past month, we’ve gone into box plots, scatter plots and histograms… this week, we’re featuring arguably the most recognizable graph of all time: **bar graphs.**

Good ol’ bar graphs are some of the simplest forms of data visualization, but let’s take them beyond what we learned in fourth-grade math and get into some of the more advanced questions: when to use bar graphs, how to use them, and how to make them in Python.

Plus, let’s also talk through the differences between bar graphs, histograms, and box plots, and when to use each type of visualization.

## What are Bar Graphs?

**A bar graph (usually) has an x-axis and a y-axis and is made up of several bars. These bars should ideally allow you to compare your data across different categories.**

Let’s make this a little clearer with an example: say we split my monthly personal budget into categories and plot how much I spend on each, to see which categories are my biggest expenses.

The bar graph makes it very easy to compare the different categories.

And yes, my coffee expenses are truly astronomical.

In our bar graph, we’ve ordered the expenses from largest to smallest, and we show each bar’s value just above it.

### Difference Between Bar Graphs vs Histograms

Now you may be asking, *“Hold up, Max. That looks exactly like the histograms we talked about last week…”*

Histograms and bar graphs are notorious for being brothers from another mother, but they serve different purposes and have clear distinctions.

In our bar graph above, we’re comparing the total dollar amounts of the different categories of my monthly budget.

**For bar graphs, we are always looking at categorical data** and basically *measuring the amount of each category*, and then we show the results with a bar representing each individual category. **Histograms, on the other hand, show numerical data** and look at the *occurrences of different values within the same category*.

Let’s work through an example, using the same situation as above.

In the bar graph above, we plotted the different categories within my budget and how much I spend per expense category.

A histogram, instead, would only plot the data from within one of those categories.

So if we took just my ‘grocery shopping’ category, we might use a histogram to show how much I tend to spend on each grocery trip; the histogram could be divided into bins $5 wide, as shown below.

*(That one expensive trip is from the holiday party that my girlfriend insists we host.)*
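If you’d like to see that difference in code, here’s a minimal sketch using NumPy’s np.histogram. The grocery amounts are made up for illustration (they are not my real spending data): the bar-graph view collapses the whole category into one total, while the histogram view counts how many trips land in each $5-wide bin.

```python
import numpy as np

# Hypothetical per-trip grocery spending in dollars (made up for illustration)
grocery_trips = np.array([23.5, 31.0, 27.8, 24.2, 38.9, 29.5, 26.1, 87.4])

# Bar-graph view: the whole category collapses to one number (total spend)
grocery_total = grocery_trips.sum()
print(f"Total grocery spend: ${grocery_total:.2f}")

# Histogram view: count how many trips fall into each $5-wide bin
bin_edges = np.arange(0, grocery_trips.max() + 5, 5)  # 0, 5, 10, ..., 90
counts, edges = np.histogram(grocery_trips, bins=bin_edges)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    if n > 0:  # skip empty bins to keep the printout short
        print(f"${left:.0f}-${right:.0f}: {n} trip(s)")
```

The single total is what one bar in the budget graph would show; the per-bin counts are what the bars of a histogram would show.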

**So the key difference between histograms and bar graphs is whether you are plotting numerical or categorical data.**

## When to Use Bar Graphs

You’ll want to use bar graphs when your data is split across different categories (like the different expense categories we looked at above).

You’ll want to use histograms when your data is numerical and you want to show the different values that one category can take on (such as the distribution of grocery costs per trip, or the distribution of the number of steps college students take in a day).

If you want to learn more about histograms, **you can read all about them in my in-depth blog post here.**

## How to Use Bar Graphs

Fortunately, well-done bar graphs are one of the most intuitive types of visualizations to read.

**All you have to do is compare the bars for the categories that you’re interested in.**

Let’s say you’re working at a local car dealership, and you want to quickly understand which types of cars are selling best. You group your cars into different categories: minivans, sports cars, SUVs, luxury cars, muscle cars, and ‘Other’. Then, you visualize how many cars of each category you sold this year.

*You can now quickly see that minivans and SUVs are your top performers.*

Although the ‘Other’ category is also fairly large, it groups together many car types that you didn’t think needed their own category.

You’ll see this often. It can be worth checking whether a single car type dominates the ‘Other’ category, but usually ‘Other’ is large simply because it combines many car types that each contribute only a little.

Interestingly, you see that sports, luxury, and muscle cars don’t sell as well as you thought.

Just based on this bar graph, **you can learn that your customers seem to prefer buying cars that provide some utility.** You may determine that what minivans and SUVs have in common is plenty of room for passengers and a large trunk for carrying cargo.
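To put numbers on that impression, here’s a quick sketch using the sales counts from the Python section further down (167 minivans, 40 sports cars, 112 SUVs, 20 luxury cars, 19 muscle cars, and 80 ‘Other’). It simply computes each category’s share of total sales; the grouping into "utility" and "fun" cars is my own illustrative choice.

```python
carsSold = [167, 40, 112, 20, 19, 80]
carCategories = ["mini van", "sport", "SUV", "luxury", "muscle", "other"]

# Each category's fraction of all cars sold this year
total = sum(carsSold)
share = {cat: sold / total for cat, sold in zip(carCategories, carsSold)}

# Print categories from largest share to smallest
for cat, frac in sorted(share.items(), key=lambda item: item[1], reverse=True):
    print(f"{cat:>8}: {frac:.1%}")

# Illustrative grouping: roomy "utility" cars vs. sport/luxury/muscle cars
utility = share["mini van"] + share["SUV"]
fun = share["sport"] + share["luxury"] + share["muscle"]
print(f"minivans + SUVs: {utility:.1%}")
print(f"sport + luxury + muscle: {fun:.1%}")
```

Minivans and SUVs together make up close to two-thirds of the units sold, which backs up what the bar graph shows at a glance.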

From this simple bar graph, you can already start to formulate a marketing plan in which you shift some of your advertising and marketing budget toward pushing your best-performing products further. For example, you can highlight how spacious minivans and SUVs are and talk about some of the great deals you’re offering on them.

This bar graph alone, of course, shouldn’t be the only research you do on your customers and your business, but **it is a very quick and easy visualization to guide you in the right direction of what to focus on.**

## How to Make a Bar Graph in Python

Creating bar graphs in Python is very straightforward. We’ll use the Python data visualization library, Matplotlib, to help us create all the visualizations.

Your data just needs to be a simple list, where each element contains the count for a specific category.

If we use the car types example from above, our data would look like:

```python
carsSold = [167, 40, 112, 20, 19, 80]
```

It’s a good idea to create a separate list that also stores the names of each category. For our current example, that would be:

```python
carCategories = ["mini van", "sport", "SUV", "luxury", "muscle", "other"]
```

Because the two lists share the same order, the count and the name at each index belong together; for example, index 0 tells us that 167 minivans were sold. Make sure you keep your order consistent so that you don’t accidentally assign the wrong name to a bar.
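One way to keep the two lists in step is a minimal sketch like this one: check that both lists are the same length, then use Python’s built-in zip() to walk them together. The dictionary name sales_by_category is just a hypothetical helper for illustration.

```python
carsSold = [167, 40, 112, 20, 19, 80]
carCategories = ["mini van", "sport", "SUV", "luxury", "muscle", "other"]

# Both lists must have the same length, or some bars would end up unlabeled
assert len(carsSold) == len(carCategories)

# zip() pairs elements at the same index, so each count stays with its label
for category, sold in zip(carCategories, carsSold):
    print(f"{category}: {sold}")

# A dict makes the pairing explicit and easy to look up
sales_by_category = dict(zip(carCategories, carsSold))
print(sales_by_category["mini van"])
```

Printing the pairs like this is a cheap way to catch a mismatched label before it ends up on a chart.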

To create the bar graph, all you have to do now is the following:

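```python
import matplotlib.pyplot as plt

# One bar per category, placed at x = 0, 1, 2, ...
plt.bar(x=range(0, len(carsSold)), height=carsSold)

# Replace the numeric x positions with the category names
plt.xticks(range(0, len(carsSold)), carCategories)

plt.show()
```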

This will give us the following graph:

If we look at the code above, for each bar, we had to give the x position and the height of the bar. In our case, the height of the bar is just the number of units sold, and for the x positions, we just created numbers from 0 up to, but not including, the number of car types we have, using the Python range function.

In the next line, we then replaced these numbers with the appropriate name for each bar. And that’s it!
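As an alternative, Matplotlib 2.1 and newer can treat strings as categories, so you can pass the category names directly as the x values and skip the xticks step. Here’s a minimal sketch, assuming a recent Matplotlib; the Agg backend line is only there so the sketch runs without a display (drop it and call plt.show() to open a window).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line and use plt.show() interactively
import matplotlib.pyplot as plt

carsSold = [167, 40, 112, 20, 19, 80]
carCategories = ["mini van", "sport", "SUV", "luxury", "muscle", "other"]

# Passing strings as x builds a categorical axis in the order given
fig, ax = plt.subplots()
bars = ax.bar(carCategories, carsSold)
ax.set_ylabel("Units sold")
fig.savefig("cars_categorical.png", bbox_inches="tight")
```

The bars come out in the same order as the list, so the same caution about keeping labels and counts aligned still applies.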

Of course, we can still easily add some customization, like making our bars red with a black outline, adding a title and axis labels, showing the count above each bar, and making our graph square, like this:

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 8))  # square figure

# Red bars with a black outline
plt.bar(x=range(0, len(carsSold)), height=carsSold, color="red", edgecolor="black")
plt.xticks(range(0, len(carsSold)), carCategories)

# Write each count just above its bar
for i in range(len(carsSold)):
    plt.text(x=i, y=carsSold[i] + 2, s=str(carsSold[i]), ha="center", va="center", fontsize=12)

plt.title("Car types sold this year, N = " + str(sum(carsSold)))
plt.ylabel("Units sold")
plt.xlabel("Car type")

plt.show()
```

*Ta-dah!*

And that’s honestly just the tip of the iceberg for customization in Python.
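One more customization worth knowing: if you’re on Matplotlib 3.4 or newer, Axes.bar_label can write the value labels for you instead of the plt.text loop. This is a minimal sketch of that approach using the object-oriented fig, ax interface; the file name is just an example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line and use plt.show() interactively
import matplotlib.pyplot as plt

carsSold = [167, 40, 112, 20, 19, 80]
carCategories = ["mini van", "sport", "SUV", "luxury", "muscle", "other"]

fig, ax = plt.subplots(figsize=(8, 8))
bars = ax.bar(range(len(carsSold)), carsSold, color="red", edgecolor="black")
ax.set_xticks(range(len(carsSold)))
ax.set_xticklabels(carCategories)

# bar_label writes each bar's height just above it (Matplotlib 3.4+)
labels = ax.bar_label(bars, padding=3, fontsize=12)

ax.set_title("Car types sold this year, N = " + str(sum(carsSold)))
ax.set_ylabel("Units sold")
ax.set_xlabel("Car type")
fig.savefig("cars_bar_label.png", bbox_inches="tight")
```

The padding argument is measured in points, so the labels keep the same gap above the bars no matter what the y-axis range is.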

In case you’re wondering whether Matplotlib can sort the bars for you in increasing or decreasing order: *unfortunately, the answer is no, at least not right now.*

If you want your bars to appear in a specific order, you’ll need to set that order yourself by sorting your data.

When you do this, make sure you also sort your labels into the new order, so that each bar is still labeled correctly.
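If you’d rather not bring in another library, here’s a plain-Python sketch: sort the (count, label) pairs together with sorted() and zip(), so each label travels with its count. The names sortedSold and sortedCategories are just illustrative.

```python
carsSold = [167, 40, 112, 20, 19, 80]
carCategories = ["mini van", "sport", "SUV", "luxury", "muscle", "other"]

# Sort (count, label) pairs by count, largest first; labels move with their counts
pairs = sorted(zip(carsSold, carCategories), key=lambda pair: pair[0], reverse=True)

# Split the sorted pairs back into two parallel lists for plt.bar / plt.xticks
sortedSold = [sold for sold, _ in pairs]
sortedCategories = [category for _, category in pairs]

print(sortedSold)
print(sortedCategories)
```

Sorting on the count alone (the key argument) means ties keep their original order instead of being broken by the label text.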

The easiest way to sort the data (here in descending order, so the best seller comes first) is to use a Pandas *(another Python programming library)* dataframe, like this:

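```python
import pandas as pd

# Sort by units sold, largest first; reset_index(drop=True) renumbers rows 0, 1, 2, ...
df = pd.DataFrame({"sold": carsSold, "type": carCategories}).sort_values("sold", ascending=False).reset_index(drop=True)
```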

You can then use the dataframe values directly in your code by replacing carsSold with df["sold"] and carCategories with df["type"], like so:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sort by units sold (largest first); the labels move together with their counts
df = pd.DataFrame({"sold": carsSold, "type": carCategories}).sort_values("sold", ascending=False).reset_index(drop=True)

plt.figure(figsize=(8, 8))

plt.bar(x=range(0, len(df["sold"])), height=df["sold"], color="red", edgecolor="black")
plt.xticks(range(0, len(df["sold"])), df["type"])

# After reset_index, row i is the i-th bar from the left
for i in range(len(df["sold"])):
    plt.text(x=i, y=df["sold"][i] + 2, s=str(df["sold"][i]), ha="center", va="center", fontsize=12)

plt.title("Car types sold this year, N = " + str(sum(df["sold"])))
plt.ylabel("Units sold")
plt.xlabel("Car type")

plt.savefig("example basic bar graph colored and ordered.png", bbox_inches="tight")
plt.show()
```

This outputs the following graph:

*And there you go!*

## Limitations of Bar Graphs

Bar graphs can be very powerful because they’re straightforward and simple to read. However, **this simplicity can also cause problems.**

Since bar graphs only show one value for each category, they don’t allow you to dive into the individual categories in more detail.

For example, imagine a survey where respondents had to rate different types of beverages on a scale of 1 – 10 based on how much they like the beverage.

If we wanted to make a bar graph with this data, we would have to pick a single value to show for each beverage, since each bar can only represent one number.

Because of that, in an effort to pick a value that is representative of the data, we might decide to show the mean rating for each beverage. But that can lead to a faulty analysis, because we’re only considering the average and not the full range of ratings given to each drink.
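Here’s a small illustration of how the mean alone can hide what’s going on. The ratings below are made up for illustration: both drinks average exactly 5, so a bar graph of the means would show two identical bars, even though one drink is consistently "okay" and the other splits people into fans and haters.

```python
from statistics import mean, pstdev

# Hypothetical 1-10 ratings for two drinks (made up for illustration)
ratings = {
    "beverage A": [5, 5, 6, 4, 5, 5],
    "beverage B": [1, 10, 2, 9, 1, 7],
}

for name, values in ratings.items():
    # Same mean for both drinks, but very different spread
    print(f"{name}: mean={mean(values)}, min={min(values)}, max={max(values)}, "
          f"std dev={pstdev(values):.2f}")
```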

My advice? **If each categorical value comes from a distribution of numbers and you’re just plotting one of those numbers (like the mean, median or mode), it may be better to compare your categories using box plots rather than using bar graphs.**

### When to Use Bar Graphs vs Box Plots

If your categories are just showing a count, or you have a simple response like ‘yes’ or ‘no’, then bar graphs are a great tool to use. There’s no variability to show, because the count is fully captured by the height of the bar.
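For the simple count case, a minimal sketch could tally the answers with collections.Counter and then plot them the same way as the car example. The survey responses here are hypothetical, made up for illustration.

```python
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line and use plt.show() interactively
import matplotlib.pyplot as plt

# Hypothetical yes/no survey answers (made up for illustration)
responses = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]

# Counter tallies how many times each answer appears
counts = Counter(responses)
labels = list(counts.keys())  # in order of first appearance
heights = [counts[label] for label in labels]

# Same pattern as the car example: numeric x positions, then name the ticks
plt.bar(x=range(len(labels)), height=heights)
plt.xticks(range(len(labels)), labels)
plt.ylabel("Number of responses")
plt.savefig("yes_no_counts.png", bbox_inches="tight")
```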

For the cases where each category can take on a range of values, such as a rating between 1-10, it may be a good idea to use box plots instead. This way, you can still compare different categories, and you also get a sense of the spread of values within each category.
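As a rough sketch of that idea, here’s how the (made-up) beverage ratings could be compared with Matplotlib’s plt.boxplot instead of a bar graph of means. plt.boxplot draws one box per list at x positions 1, 2, …, so the sketch names those ticks with plt.xticks.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line and use plt.show() interactively
import matplotlib.pyplot as plt

# Hypothetical 1-10 ratings for two drinks (made up for illustration)
ratings = {
    "beverage A": [5, 5, 6, 4, 5, 5],
    "beverage B": [1, 10, 2, 9, 1, 7],
}
names = list(ratings.keys())
data = [ratings[name] for name in names]

plt.figure(figsize=(6, 6))

# One box per beverage; boxes are drawn at x = 1, 2, ...
result = plt.boxplot(data)
plt.xticks(range(1, len(names) + 1), names)

plt.ylabel("Rating (1-10)")
plt.title("Beverage ratings")
plt.savefig("beverage_ratings_boxplot.png", bbox_inches="tight")
```

Where a bar graph of the means would show two equal bars, the box plot makes it obvious that beverage B’s ratings are spread across almost the whole scale.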

If you want to read more about box plots and how to use them, **you can read more about them in my in-depth blog post here.**

And that’s about it on bar graphs!

The simple bar graph is definitely the most straightforward of all the data visualization types that we’ve covered thus far, and I hope this blog post has shed a little more light on this type of graph for you!

## Want more free help on getting started with data science?

If becoming a data scientist sounds like something you’d like to do, and you’d like to learn more about how you can get started, **check out my free “How To Get Started As A Data Scientist” Workshop.**

We go through everything we’ve covered in this blog post in more detail, dispel some common misconceptions, and give you a roadmap and checklist of what you need to do to start working as a Data Scientist.