NumPy – Fast and Powerful Math in Python

What is NumPy?

NumPy is a Python library designed for numerical computing. It is best suited to storing data in arrays and running operations over those arrays quickly. NumPy arrays are also accepted by a number of machine learning algorithms, where they can improve processing speed compared with DataFrames in pandas.

Check out my article on pandas here to learn more about DataFrames. You can learn more about the origins of the NumPy library here. The documentation page for NumPy is here for those looking for more technical information or details on any of the functions below. The Google Colab notebook for this article can be found here.

Dealing with Data in NumPy – Data Structures, Data Types, and Array Math

The real power of the NumPy library is its ability to handle data in an array structure. It can create both arrays and matrices, which look similar but behave differently, and it has built-in functions for operations between arrays, including arithmetic, comparisons, and data manipulation.

Array Structure

Arrays are built from a series of lists, written in square brackets and passed to the np.array() function. To understand this a bit more, let’s look at lists in Python. Lists are a native Python data structure, and their purpose is fairly self-explanatory: they hold an ordered collection of items. Here’s the code to declare and print a list:

In:
# Avoid naming the variable "list", which would shadow Python's built-in list type
my_list = ['put', 'stuff', 'here']
print(my_list)

Out:
['put', 'stuff', 'here']

Learn more about lists here and here

One major benefit of NumPy is that it uses significantly less memory than a standard Python list. A one-dimensional NumPy array can hold the same data as a Python list, but NumPy stores every element as the same data type in one contiguous block of memory, while a Python list stores a reference to a separate Python object for each item. See a more detailed explanation of that here.
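As a rough illustration of the memory difference, the sketch below compares a Python list of 1,000 integers with a NumPy array holding the same values. The variable names are illustrative, and the exact byte counts depend on your platform and Python version, so treat the printed numbers as approximate.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values)

# A list stores references, and each integer is also its own Python object
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# An array stores the raw values in one contiguous buffer
array_bytes = arr.nbytes

print(f"list:  about {list_bytes} bytes")
print(f"array: about {array_bytes} bytes")
```

The array figure counts only the data buffer, so it slightly understates the total, but the gap between the two is still large.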

NumPy adds another layer to this by allowing us to store a series of lists in a single array. Here is the code for declaring a two-dimensional array:

In:
import numpy as np

test_array = np.array([['if', 'you', 'put'], ['stuff', 'here', 'it'],
                       ['shows', 'up', 'here']])
print(test_array)

Out:
[['if' 'you' 'put']
 ['stuff' 'here' 'it']
 ['shows' 'up' 'here']]

Arrays can handle a variety of data types, too. You can see a complete list here. You can nest lists inside of np.array() to create arrays of various dimensions, all while preserving the memory savings over lists in Python.
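To make the idea of dimensions and data types concrete, here is a small sketch that builds one-, two-, and three-dimensional arrays from nested lists and inspects them. The array names are made up for illustration.

```python
import numpy as np

one_d = np.array([1, 2, 3])
two_d = np.array([[1, 2, 3], [4, 5, 6]])
three_d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# ndim is the number of dimensions, shape is the size of each dimension
for name, arr in [("one_d", one_d), ("two_d", two_d), ("three_d", three_d)]:
    print(name, "ndim:", arr.ndim, "shape:", arr.shape, "dtype:", arr.dtype)

# Mixing integers and floats gives the whole array a float data type
floats = np.array([1.5, 2, 3])
print(floats.dtype)
```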

Slicing Data – Selecting Data by Index Reference

NumPy array data can be sliced or referenced using row and column indexes. Indexes in Python start at 0 for the first row or column in an array, and the ending index of a slice is not inclusive. Let’s take a look at the code to slice NumPy array data, and then I’ll explain more:

test_slice = test_array[:,0:1]

This code takes a slice from test_array as defined above. Slicing by index follows the pattern array_name[starting_row:ending_row, starting_column:ending_column]. If you want a slice to start at the beginning or run to the end of a dimension, you can leave that index blank. Here are some examples:

test_slice = test_array[:,:]     # Select everything
test_slice = test_array[:,0:1]   # Select all rows of the first column (index 0)
test_slice = test_array[0:1]     # Select the first row (index 0), all columns
test_slice = test_array[0:2,2:3] # Select the first two rows (indexes 0 and 1),
                                 # but only the last column (index 2)

See the outputs in the Colab Notebook for this article.
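For readers without the notebook open, here is a short sketch of what a few of these slices return. It rebuilds test_array so that it runs on its own, and the names first_column, first_row, and corner are illustrative.

```python
import numpy as np

test_array = np.array([['if', 'you', 'put'], ['stuff', 'here', 'it'],
                       ['shows', 'up', 'here']])

first_column = test_array[:, 0:1]   # all rows, column index 0
first_row = test_array[0:1]         # row index 0, all columns
corner = test_array[0:2, 2:3]       # rows 0 and 1, column index 2

print(first_column.shape)
print(first_row.shape)
print(corner)
```

Because every index here is a range, each result is still a two-dimensional array, even when it holds a single row or column.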

You can also find more detail on slicing and array indexing in this article here.

Other Data Structures

Matrix – numpy.matrix()

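A matrix in NumPy is a specialized form of an array that is always two-dimensional and retains that structure through operations. It also has operators specifically for use with matrices; for example, * performs matrix multiplication instead of element-by-element multiplication. I’ll cover operations on arrays later on in this article.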
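The difference in operators is easiest to see side by side. This is a minimal sketch with made-up 2x2 values, comparing * on a regular array with * on an np.matrix.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix(a)

elementwise = a * a      # arrays: multiply matching elements
matrix_product = m * m   # np.matrix: true matrix multiplication
same_product = a @ a     # arrays can do the same with the @ operator

print(elementwise)
print(matrix_product)
```

NumPy's own documentation now recommends regular arrays with @ over np.matrix for new code, so the matrix class is mostly worth knowing for reading older code.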

Vectors

A vector is essentially a single row (or column) in an array. An array is a series of vectors that can indicate an overall trend or pattern as a whole, while a vector is one part of that larger whole. Here’s the code for creating a matrix and slicing a vector out of the test_array defined above.

In:
# Matrix
test_matrix = np.matrix(test_array)
print(test_matrix)

# Vector
test_vector = test_array[1:2] # Select the second row (index 1); the result is still 2-D, with shape (1, 3)
print(test_vector)

Out:
[['if' 'you' 'put']
 ['stuff' 'here' 'it']
 ['shows' 'up' 'here']]
[['stuff' 'here' 'it']]
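Note the double brackets in the last line of output: a range slice such as [1:2] keeps the result two-dimensional. If you want a true one-dimensional vector, a single integer index is one way to get it. A small sketch, with illustrative variable names:

```python
import numpy as np

test_array = np.array([['if', 'you', 'put'], ['stuff', 'here', 'it'],
                       ['shows', 'up', 'here']])

row_2d = test_array[1:2]    # range slice keeps both dimensions: shape (1, 3)
row_1d = test_array[1]      # integer index drops a dimension: shape (3,)
col_1d = test_array[:, 0]   # first column as a one-dimensional vector

print(row_2d.shape, row_1d.shape, col_1d.shape)
print(row_1d)
```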

Dealing with DataTypes

NumPy data structures can handle a variety of data types. You can also change the data type of an array relatively easily. You do need to be careful when doing this to make sure you are converting to a data type that won’t cause errors.

For example, if we were to try to convert test_array to a numerical data type, it would cause an error. This is specifically what would happen:

In:
example = test_array.astype('int64')
print(example)

Out:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-35edaf16e7d3> in <module>()
----> 1 example = test_array.astype('int64')
      2 print(example)

ValueError: invalid literal for int() with base 10: 'if'

It should be obvious that strings of text such as words won’t convert into numbers. The array’s astype() method can, however, convert numbers that were mistakenly stored as strings into numerical values. Here is a snippet of code showing this:

In:
to_int = np.array([['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']])
print(to_int)
print()
to_int = to_int.astype('int64')
print(to_int)

Out:
[['1' '2' '3']
 ['4' '5' '6']
 ['7' '8' '9']]

[[1 2 3]
 [4 5 6]
 [7 8 9]]
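If you are not sure whether an array will convert cleanly, one option is to wrap the conversion in a try/except block. The helper below, to_int_or_none, is a hypothetical name used only for this sketch. It is not part of NumPy.

```python
import numpy as np

def to_int_or_none(arr):
    """Return arr converted to int64, or None if any value is not a valid integer."""
    try:
        return arr.astype('int64')
    except ValueError:
        return None

numbers_as_text = np.array([['1', '2'], ['3', '4']])
words = np.array([['if', 'you'], ['put', 'stuff']])

print(numbers_as_text.dtype)          # a string data type before conversion
converted = to_int_or_none(numbers_as_text)
print(converted.dtype)                # int64 after conversion
print(to_int_or_none(words))          # the words cannot be converted
```

Checking the dtype attribute before and after is a quick way to confirm that the conversion did what you expected.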

Structured Data

You can also use structured arrays to get something a bit closer to pandas DataFrames. Basically, this allows us to create a custom data type for an array, then store values in it. This article from Jake VanderPlas covers the topic very well in an easy-to-read form, so please check it out to learn more about structured arrays in NumPy. Here’s a short snippet from his article:

In:
# from https://jakevdp.github.io/PythonDataScienceHandbook/02.09-structured-data-numpy.html

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

x = np.zeros(4, dtype=int)

# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)

# Get all names
data['name']

Out:
[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]
array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')
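Once the data is in a structured array, you can select records by position or filter on one field and read another. The following sketch rebuilds the same data so it runs on its own. The variables first_record, under_30, and mean_weight are illustrative names.

```python
import numpy as np

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

# Select one whole record by position
first_record = data[0]

# Filter on one field, then read another field from the matching records
under_30 = data[data['age'] < 30]['name']

# Each field behaves like a regular array
mean_weight = data['weight'].mean()

print(first_record)
print(under_30)
print(mean_weight)
```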

Array Math

You can do a whole variety of mathematical operations on arrays as well. These can be used to combine arrays, divide one entire array by another, and much more, with each operation applied element by element. There is a great article on pluralsight.com that provides an in-depth review of all the basic NumPy operations. There is also a whole set of more complicated mathematical functions that can be applied to NumPy arrays. Check out this page in the NumPy documentation to learn more.
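As a minimal sketch of the basics, the example below uses two made-up 2x2 arrays to show addition, division, a comparison, and a sum along one axis.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

total = a + b                 # add matching elements
ratio = b / a                 # divide matching elements (result is float)
larger = b > 15               # comparison returns an array of booleans
column_sums = a.sum(axis=0)   # sum down each column

print(total)
print(ratio)
print(larger)
print(column_sums)
```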

Scaling is a good example: given an array of food quantities in grams and a vector of calories per gram, a single multiplication gives the calories for every item.
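Here is one way the food example could look. The meal data is invented for illustration, and the calories-per-gram figures are the commonly used rounded values for carbohydrate, protein, and fat, so treat the results as approximate.

```python
import numpy as np

# Hypothetical data: each row is a meal, and the columns are
# grams of carbohydrate, protein, and fat
grams = np.array([[30.0, 10.0, 5.0],
                  [50.0, 20.0, 10.0]])

# Approximate calories per gram for carbohydrate, protein, and fat
calories_per_gram = np.array([4.0, 4.0, 9.0])

# The vector is applied to every row automatically (broadcasting)
calories = grams * calories_per_gram

# Total calories per meal
meal_totals = calories.sum(axis=1)

print(calories)
print(meal_totals)
```

NumPy stretches the three-element vector across both rows, so no loop is needed to scale each meal.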

NumPy can do a lot – but how is it used every day?

Image Analysis and prediction

The memory benefits of NumPy make NumPy arrays a great data structure to feed into machine learning. I recently completed a project using Python, NumPy, and a few other libraries to predict the age of a person in an image. NumPy arrays held the numerical values representing the shade and color of each pixel in the images. We then trained a model using each person’s real age as the target. The model ended up being able to accurately predict the age of people in other images. The Python Imaging Library (PIL) reads an image and exposes its pixel values, which we can then store in a NumPy array. Here’s some code showing that example:

In:
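import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Load the image and convert its pixel data to a NumPy array
image = Image.open('/content/drive/MyDrive/Data Business/Blog/Notebooks for Articles/Datasets/Male-Face-Transparent.png')
array = np.array(image)

# Scale the 0-255 channel values to the 0-1 range
array = array / 255
print(array)

plt.imshow(array, cmap='gray')
plt.colorbar()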

Out:
[[[0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]
  ...
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]]

 [[0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]
  ...
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]]

 [[0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]
  ...
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]]

 ...

 [[0.11372549 0.11372549 0.14901961 0.9372549 ]
  [0.10980392 0.10980392 0.14509804 1.        ]
  [0.10980392 0.10980392 0.14117647 1.        ]
  ...
  [0.21568627 0.21176471 0.23137255 0.59215686]
  [0.12941176 0.12941176 0.1372549  0.47843137]
  [0.02352941 0.02352941 0.02352941 0.36078431]]

 [[0.12941176 0.1254902  0.15686275 0.96862745]
  [0.1254902  0.1254902  0.15686275 1.        ]
  [0.12156863 0.1254902  0.16078431 1.        ]
  ...
  [0.20784314 0.2        0.22352941 0.60784314]
  [0.12156863 0.11372549 0.12941176 0.49411765]
  [0.02352941 0.01960784 0.01960784 0.37647059]]

 [[0.12941176 0.1254902  0.16078431 0.83529412]
  [0.12941176 0.12941176 0.16470588 0.9372549 ]
  [0.12941176 0.1372549  0.17254902 0.96862745]
  ...
  [0.16862745 0.16078431 0.18039216 0.59215686]
  [0.09411765 0.09019608 0.10196078 0.49411765]
  [0.01568627 0.01568627 0.01568627 0.38431373]]]
<matplotlib.colorbar.Colorbar at 0x7f9a217092d0>

See the full output in the article notebook here, under “NumPy and PIL”. NumPy arrays can also be fed to any machine learning algorithm that accepts them as input. Chances are that many facial recognition systems either use NumPy arrays today or relied on their speed during development. This can be especially helpful for more complex models used for deep learning, such as convolutional neural networks.
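If you don’t have an image file handy, the same scaling step can be tried on a synthetic array. This sketch uses random values as a stand-in for a tiny 4x4 RGBA image. It is not the data from my project, and flattening the pixels into one feature vector is just one possible way to prepare input for a model.

```python
import numpy as np

# Stand-in for a tiny 4x4 RGBA image: random 0-255 values, not a real photo
rng = np.random.default_rng(seed=0)
fake_image = rng.integers(0, 256, size=(4, 4, 4), dtype=np.uint8)

# Scale every channel value into the 0-1 range, as in the example above
scaled = fake_image / 255

# Flatten height x width x channels into one feature vector per image
features = scaled.reshape(-1)

print(fake_image.dtype, scaled.dtype)
print(scaled.min(), scaled.max())
print(features.shape)
```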

Go out and learn what else NumPy offers!

NumPy is an essential library when you need to run mathematical operations across large-scale datasets. If it exists in math, it can likely be done with a NumPy function, with speed and efficiency, and without needing much more than an understanding of which calculations you need to run. This is what makes NumPy one of the best libraries in Python.

To learn more about NumPy, check out the documentation here.
