An Introduction to Matplotlib – Python’s Data Visualization Library

What is matplotlib?

Matplotlib is a popular open-source library for data visualization in Python. It provides a variety of functions and tools for creating a wide range of plots and charts, including line plots, scatter plots, bar plots, histograms, pie charts, and more, most commonly through its PyPlot interface (the matplotlib.pyplot module).

One of the main features of matplotlib is its ability to create high-quality plots and charts with a simple and intuitive interface. Users can easily define the data to be plotted, the type of plot to be created, and various formatting options such as colors, line styles, and plot titles. Matplotlib also provides a number of customization options, allowing users to fine-tune the appearance of their plots and charts.
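Matplotlib offers two styles for doing this, and the examples in this article use both. The minimal sketch below, with made-up data, draws the same chart twice: once with plt.* functions that act on the "current" figure and axes, and once by calling methods on an explicit Axes object.

```python
import matplotlib.pyplot as plt

# Made-up sample data for illustration
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# 1) Implicit pyplot style: plt.* functions act on the current figure/axes
plt.figure()
plt.plot(x, y, color='green', linestyle='--', marker='o')
plt.title('Squares (pyplot style)')
ax_implicit = plt.gca()  # grab the current Axes that pyplot drew into

# 2) Explicit object-oriented style: call methods on the Axes directly
fig, ax = plt.subplots()
ax.plot(x, y, color='green', linestyle='--', marker='o')
ax.set_title('Squares (Axes style)')

plt.show()
```

The pyplot style is quick for one-off charts; the Axes style tends to be clearer once a figure has several subplots, because every call names the Axes it changes.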

In addition to creating static plots and charts, matplotlib also provides tools for creating interactive plots and visualizations. Users can use matplotlib’s event handling and animation functions to create interactive plots that respond to user input or change over time.
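As a rough sketch of both ideas, the code below animates a sine wave with matplotlib.animation.FuncAnimation and registers a mouse-click handler with fig.canvas.mpl_connect. The update and on_click functions are illustrative names, not part of matplotlib. The animation and the click handler only do something visible in an interactive backend (for example a desktop window or a Jupyter widget backend); on a headless machine matplotlib falls back to a non-interactive backend and the script simply runs to completion.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))

def update(frame):
    # Shift the sine wave a little further to the right on each frame
    line.set_ydata(np.sin(x - frame / 10))
    return (line,)

# Call update() for frames 0..59, one every 50 ms. Keep a reference to the
# animation object; otherwise it can be garbage-collected before it runs.
anim = FuncAnimation(fig, update, frames=60, interval=50)

def on_click(event):
    # event.inaxes is None when the click lands outside the plotting area
    if event.inaxes is ax:
        print(f'clicked at x={event.xdata:.2f}, y={event.ydata:.2f}')

# Register the handler; the returned id can later be passed to mpl_disconnect
cid = fig.canvas.mpl_connect('button_press_event', on_click)

plt.show()
```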

Matplotlib is widely used in a variety of fields, including data analysis, scientific computing, and machine learning. It is often used in conjunction with other libraries such as NumPy and Pandas for data manipulation and analysis, and with libraries such as SciPy and scikit-learn for statistical analysis and machine learning.

PyPlot: Customization

PyPlot (the matplotlib.pyplot module) is the main plotting interface in the matplotlib library. It lets you create visualizations directly in environments like Jupyter Notebook, Google Colab, or whatever coding environment you work in. One of the basic skills worth mastering is customizing a chart, and PyPlot lets you customize many of the chart elements. Below is an example of the basic functionality.

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Create the figure and axes objects
fig, ax = plt.subplots()

# Plot the data
ax.plot(x, y, 'o-', color='blue', linewidth=2, markersize=10)

# Customize the x and y axis labels
ax.set_xlabel('Stuff along the X')
ax.set_ylabel('Things along the Y')

# Customize the title
ax.set_title('Customized Chart')

# Customize the grid
ax.grid(True, linestyle='--', color='gray', alpha=0.7)

# Show the plot
plt.show()

This code creates a simple line chart with sample data. The chart is customized by setting the labels for the x and y axes, the title of the chart, the color and style of the line and markers, and the appearance of the grid. You can adjust properties such as color, linewidth, markersize, label, and title to customize the chart further. For more information on chart customization in PyPlot, check out this article with code examples over on the Python Graph Gallery.
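A few more customizations come up often: a line label shown in a legend, fixed axis ranges and tick positions, font settings for the title, and saving the result to an image file. The sketch below reuses the sample data from the example above; the style choices (square markers, dashed orange line) are arbitrary.

```python
import io
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

fig, ax = plt.subplots(figsize=(6, 4))

# label= sets the text that the legend displays for this line
ax.plot(x, y, 's--', color='darkorange', linewidth=1.5, markersize=8, label='y = 2x')
ax.legend(loc='upper left')

# Fix the axis ranges and tick positions instead of relying on autoscaling
ax.set_xlim(0, 6)
ax.set_ylim(0, 12)
ax.set_xticks(range(0, 7))

# Font size and weight can be passed straight to set_title()
ax.set_title('Customized Chart', fontsize=14, fontweight='bold')

# Save the chart as a PNG. It goes to an in-memory buffer here, but a
# filename such as 'chart.png' works the same way.
buf = io.BytesIO()
fig.savefig(buf, format='png', dpi=100, bbox_inches='tight')

plt.show()
```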

PyPlot: Plot Types

PyPlot can produce a variety of chart types as well. This is useful for looking at your dataset in different ways or for making different kinds of comparisons. Most of us probably remember from grade school that line graphs show trends over time or relationships between two variables, and that bar or column graphs are meant for comparisons. In data work we also use less common charts, including histograms, scatter plots, and various metric plots that show how well a model performs. Here is an example of code that demonstrates several plot types the PyPlot library can produce:

import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Create the figure and axes objects
fig, axs = plt.subplots(2, 2)

# Line plot
axs[0, 0].plot(x, y1, '-', color='blue', label='sin(x)')
axs[0, 0].plot(x, y2, '-', color='red', label='cos(x)')
axs[0, 0].set_title('Line Plot')
axs[0, 0].legend(loc="best")

# Scatter plot
axs[0, 1].scatter(x, y1, color='blue', label='sin(x)')
axs[0, 1].scatter(x, y2, color='red', label='cos(x)')
axs[0, 1].set_title('Scatter Plot')
axs[0, 1].legend(loc="best")

# Bar plot: the x values are only ~0.1 apart, so narrow the bars (default width is 0.8)
# and make them semi-transparent so the overlapping series stay visible
axs[1, 0].bar(x, y1, width=0.1, color='blue', alpha=0.5, label='sin(x)')
axs[1, 0].bar(x, y2, width=0.1, color='red', alpha=0.5, label='cos(x)')
axs[1, 0].set_title('Bar Plot')
axs[1, 0].legend(loc="best")

# Histogram plot: distribution of the sin(x) and cos(x) values
axs[1, 1].hist(y1, bins=20, color='blue', alpha=0.5, histtype='bar', label='sin(x)')
axs[1, 1].hist(y2, bins=20, color='red', alpha=0.5, histtype='bar', label='cos(x)')
axs[1, 1].set_title('Histogram Plot')
axs[1, 1].legend(loc="best")

plt.tight_layout()
plt.show()

This code creates a 2×2 grid of subplots. The top-left subplot shows a line plot of the sin(x) and cos(x) functions, the top-right subplot shows a scatter plot of the same data, the bottom-left subplot shows a bar plot, and the bottom-right subplot shows a histogram of the sin(x) and cos(x) values. Each subplot has its own title and legend. You can experiment with different plot types and customize them as demonstrated above.
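The grid above doesn't include a pie chart, or the kind of metric plot used to evaluate a model. The sketch below adds one of each; all numbers in it are hypothetical and exist only for illustration. The confusion matrix is drawn with imshow() as a heatmap, with each count written onto its cell.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical numbers, made up purely for illustration
shares = [45, 30, 25]
labels = ['Product A', 'Product B', 'Product C']
confusion = np.array([[50, 10],
                      [5, 35]])  # rows: actual class, columns: predicted class

fig, (ax_pie, ax_cm) = plt.subplots(1, 2, figsize=(10, 4))

# Pie chart; autopct writes each wedge's percentage onto the wedge
wedges, texts, autotexts = ax_pie.pie(shares, labels=labels,
                                      autopct='%1.0f%%', startangle=90)
ax_pie.set_title('Pie Chart')

# Confusion-matrix heatmap, one common way to show how a classifier performs
im = ax_cm.imshow(confusion, cmap='Blues')
for i in range(confusion.shape[0]):
    for j in range(confusion.shape[1]):
        # imshow places column j at x=j and row i at y=i
        ax_cm.text(j, i, str(confusion[i, j]), ha='center', va='center')
ax_cm.set_xticks([0, 1])
ax_cm.set_xticklabels(['Predicted 0', 'Predicted 1'])
ax_cm.set_yticks([0, 1])
ax_cm.set_yticklabels(['Actual 0', 'Actual 1'])
ax_cm.set_title('Confusion Matrix')
fig.colorbar(im, ax=ax_cm)

plt.tight_layout()
plt.show()
```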

PyPlot: Annotation

PyPlot can also annotate a chart, which helps when you want to highlight certain information while presenting it to stakeholders. The chart below uses the annotate() method to call out the maximum and minimum values on the graph.

import matplotlib.pyplot as plt
import numpy as np

# Sample data
x = np.linspace(0, 2*np.pi, 100)
y = np.sin(x)

# Create the figure and axes objects
fig, ax = plt.subplots()

# Plot the data
ax.plot(x, y, '-', color='blue')

# Annotate the maximum value
max_val = max(y)
max_ind = np.argmax(y)
ax.annotate(f'Max: {max_val:.2f}', xy=(x[max_ind], max_val), xytext=(x[max_ind]+0.1, max_val+0.2),
            arrowprops=dict(facecolor='red', shrink=0.05))

# Annotate the minimum value
min_val = min(y)
min_ind = np.argmin(y)
ax.annotate(f'Min: {min_val:.2f}', xy=(x[min_ind], min_val), xytext=(x[min_ind]-0.3, min_val-0.2),
            arrowprops=dict(facecolor='green', shrink=0.05))

# Customize the x and y axis labels
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')

# Show the plot
plt.show()

Figure: a standard sine wave chart with two arrows marking the maximum and minimum values of the data.
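In the example above, xytext is given in data coordinates, so the label's distance from the point changes with the axis scale. One alternative, sketched below, is textcoords='offset points', which positions the label a fixed number of points away from the annotated point. The sketch also shows a few other highlighting tools: a boxed label, a shaded region with axvspan(), a reference line with axhline(), and free-standing text with text(). The offsets and colors are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, '-', color='blue')

# Place the label 30 points right and 20 points above the data point;
# offsets in points don't depend on the axis scale
peak = np.argmax(y)
ann = ax.annotate(f'Max: {y[peak]:.2f}', xy=(x[peak], y[peak]),
                  xytext=(30, 20), textcoords='offset points',
                  bbox=dict(boxstyle='round', fc='lightyellow'),
                  arrowprops=dict(arrowstyle='->', color='red'))

# Shade the second half-cycle and draw a dotted reference line at y = 0
ax.axvspan(np.pi, 2 * np.pi, color='gray', alpha=0.2)
ax.axhline(0, color='black', linewidth=0.8, linestyle=':')
ax.text(1.5 * np.pi, 0.5, 'negative half-cycle', ha='center')

plt.show()
```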

Visualization with pandas

Matplotlib can also be used alongside other libraries like pandas, which I covered in a previous article here. By default, the pandas DataFrame.plot() method uses matplotlib under the hood, so you can combine the extremely versatile and user-friendly DataFrame structure with matplotlib's plotting functions. Here’s an example of how to get started with matplotlib and pandas together:

import matplotlib.pyplot as plt
import pandas as pd

# Create a sample dataframe
data = {'name': ['John', 'Jane', 'Mike', 'Emily', 'Adam'],
        'age': [35, 28, 32, 42, 25],
        'income': [50000, 60000, 55000, 70000, 35000]}
df = pd.DataFrame(data)

# Use the 'plot' function of the dataframe to create a bar chart
df.plot(kind='bar', x='name', y='income', color='blue')

# Add labels and title
plt.xlabel('Name')
plt.ylabel('Income')
plt.title('Income by Name')

# Show the plot
plt.show()

Figure: a bar graph comparing income by name.
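Because DataFrame.plot() returns and accepts matplotlib Axes, you can also have pandas draw into subplots you created yourself and then finish them with regular matplotlib calls. The sketch below reuses the sample data from the example above, sorting it for a horizontal bar chart and adding an age-versus-income scatter plot next to it.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as the example above
df = pd.DataFrame({'name': ['John', 'Jane', 'Mike', 'Emily', 'Adam'],
                   'age': [35, 28, 32, 42, 25],
                   'income': [50000, 60000, 55000, 70000, 35000]})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Passing ax= tells pandas to draw into an existing matplotlib Axes
df.sort_values('income').plot(kind='barh', x='name', y='income',
                              ax=ax1, legend=False, color='blue')
ax1.set_xlabel('Income')
ax1.set_title('Income by Name (sorted)')

# A scatter plot needs both x and y columns; pandas labels the axes with them
df.plot(kind='scatter', x='age', y='income', ax=ax2)
ax2.set_title('Income vs. Age')

fig.tight_layout()
plt.show()
```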

Takeaway: Matplotlib is a powerful library for data visualization.

Overall, matplotlib is a powerful and widely used library for data visualization in Python. Its simple and intuitive interface, along with a wide range of customization options, makes it a popular choice for creating high-quality plots and charts.

Get more information and code examples on matplotlib in the official documentation here.

The notebook with all the code examples I wrote for this article can be found here.

All the code examples from my articles are also available via my GitHub.
