June 16, 2026 11 min to read

NumPy in One Post

The invisible foundation that every AI library is built on

In 2005, the Python scientific computing community was at war with itself.

Not a dramatic war. A quiet, frustrating, expensive one — the kind that wastes years of talented people’s time. There were two competing libraries that did essentially the same thing: Numeric, built in 1995 by Jim Hugunin, and Numarray, built in 2001 by a team at the Space Telescope Science Institute. Both gave Python fast numerical arrays. Both had passionate users. Neither was compatible with the other.

If you wrote code using Numeric, it didn’t work with Numarray. If a library you depended on used Numarray, you were stuck. The community was splitting in two, and neither half had enough momentum to win.

Travis Oliphant, then an assistant professor at Brigham Young University who had used Numeric since his graduate work at Mayo Clinic, decided he had seen enough. He took time away from his own research, read the source code of both libraries front to back, and spent the better part of a year merging them into a single unified package.

He called it NumPy.

That decision — one person, one year, one unified library — became the foundation on which nearly every major data science and AI tool in existence is built today.

Why Python Needed NumPy

Python is a beautiful language for writing code that humans can read. It is a terrible language for doing math at scale.

Here is why. When you create a Python list and put numbers in it, Python wraps each number in a full Python object. The number 42 is not just 42 — it is an object with a type, a reference count, methods you can call on it. That overhead is fine when you have 10 numbers. It becomes catastrophic when you have 10 million.

# Python list — slow for math
python_list = [1, 2, 3, 4, 5]

# Each element is a full Python object
# Iterating and doing math means Python overhead on every single one
result = [x * 2 for x in python_list]

NumPy solves this with one idea: store numbers as raw numbers, in a block of contiguous memory, all of the same type. No wrappers. No overhead. Just the numbers, packed together the way C does it.

import numpy as np

# NumPy array — fast for math
numpy_array = np.array([1, 2, 3, 4, 5])

# This runs in C, not Python — orders of magnitude faster
result = numpy_array * 2

The difference is not small. On large arrays, vectorized NumPy operations are often tens to hundreds of times faster than equivalent Python loops, depending on the operation, the data type, and the hardware. That gap is the entire reason the scientific computing world moved to NumPy and never looked back.
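A rough way to see the overhead described above is to compare bytes per element. The sketch below assumes CPython on a 64-bit machine. `sys.getsizeof` reports only the object itself, so treat the list figure as an approximation; the variable names are illustrative.

```python
import sys
import numpy as np

n = 1_000_000

# A list stores pointers; each int is a separate Python object.
# Start at 1000 because CPython reuses cached objects for small ints.
py_list = list(range(1000, 1000 + n))
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray stores the raw values back to back, 8 bytes per int64.
np_arr = np.arange(1000, 1000 + n, dtype=np.int64)
array_bytes = np_arr.nbytes

print(f"list:    about {list_bytes / n:.0f} bytes per element")
print(f"ndarray: {array_bytes / n:.0f} bytes per element")
print("contiguous:", np_arr.flags["C_CONTIGUOUS"])
```

The list pays for a pointer plus a full integer object for every value, while the array pays only for the value itself.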

The Core Idea: The ndarray

Everything in NumPy is built around one data structure: the ndarray — short for N-dimensional array.

An ndarray can be 1-dimensional (a list of numbers), 2-dimensional (a grid — like a spreadsheet), 3-dimensional (a stack of grids), or as many dimensions as you need.

import numpy as np

# 1D array — a single row of numbers
v = np.array([1, 2, 3, 4, 5])
print(v.shape)  # (5,)

# 2D array — rows and columns, like a matrix
m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(m.shape)  # (3, 3)

# 3D array — a stack of matrices
# Think: 10 images, each 28x28 pixels
images = np.zeros((10, 28, 28))
print(images.shape)  # (10, 28, 28)

Those shapes — (5,), (3, 3), (10, 28, 28) — are how NumPy describes the structure of your data. You will see this shape notation everywhere in AI. When someone says “I have a tensor of shape (batch_size, sequence_length, hidden_dim)”, they are describing an ndarray.
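To make the shape notation concrete, the sketch below builds an array with a (batch_size, sequence_length, hidden_dim) layout and inspects it. The sizes 2, 4 and 8 are arbitrary values chosen for illustration.

```python
import numpy as np

batch_size, sequence_length, hidden_dim = 2, 4, 8  # illustrative sizes

# One float per (example, position, feature) triple
tensor = np.zeros((batch_size, sequence_length, hidden_dim))

print(tensor.ndim)      # 3
print(tensor.shape)     # (2, 4, 8)
print(tensor.size)      # 64
print(tensor.dtype)     # float64

# Indexing with one integer drops the leading axis
first_example = tensor[0]
print(first_example.shape)   # (4, 8)

# Indexing two axes leaves a single feature vector
one_position = tensor[0, 1]
print(one_position.shape)    # (8,)
```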

Broadcasting: NumPy’s Superpower

The feature that takes the most getting used to — and becomes indispensable once you understand it — is broadcasting.

Broadcasting is NumPy’s ability to do math between arrays of different shapes without you writing a loop.

import numpy as np

# Add 10 to every element without a loop
arr = np.array([1, 2, 3, 4, 5])
result = arr + 10
print(result)  # [11 12 13 14 15]

# Add a different value to each column of a 2D array
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

column_add = np.array([10, 20, 30])  # shape (3,) broadcasts across rows
result = matrix + column_add
print(result)
# [[11 22 33]
#  [14 25 36]]

In a Python loop this would be two nested for loops. In NumPy it is one line, running at C speed.

This matters enormously in AI because you constantly need to apply the same operation across thousands of data points at once. Broadcasting makes that natural.
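The rule behind broadcasting is short: NumPy lines the shapes up from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. Here is a small sketch of that rule with made-up values, including a case that fails.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])            # shape (2, 3)

# Shape (2, 1): one value per row, stretched across the columns
row_add = np.array([[100],
                    [200]])
print(matrix + row_add)
# [[101 102 103]
#  [204 205 206]]

# A common use: center each column by subtracting its mean
column_means = matrix.mean(axis=0)        # shape (3,)
centered = matrix - column_means
print(centered)
# [[-1.5 -1.5 -1.5]
#  [ 1.5  1.5  1.5]]

# Shapes that cannot be aligned raise an error instead of guessing
try:
    matrix + np.array([1, 2])             # (2, 3) vs (2,)
except ValueError as err:
    print("cannot broadcast:", err)
```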

The Math That Powers Neural Networks

Every neural network, at its most basic level, is doing one thing over and over: matrix multiplication. You take an input, multiply it by a matrix of weights, add a bias, apply an activation function, repeat.

NumPy makes this trivial:

import numpy as np

# Simulate a single layer of a neural network
np.random.seed(42)

# Input: 3 features for 5 data points  
X = np.random.randn(5, 3)

# Weights: 3 inputs -> 4 outputs
W = np.random.randn(3, 4)

# Bias: one per output
b = np.random.randn(4)

# Forward pass through one layer
# Matrix multiplication + bias
Z = X @ W + b   # @ is the matrix multiply operator

# Apply a simple activation (ReLU: max(0, x))
output = np.maximum(0, Z)

print("Input shape: ", X.shape)      # (5, 3)
print("Weight shape:", W.shape)      # (3, 4)
print("Output shape:", output.shape) # (5, 4)

This is, stripped to its essentials, what happens inside every dense layer of a neural network — in PyTorch, in TensorFlow, in scikit-learn. The frameworks give you the gradient computation and the training loop. Underneath, they are doing exactly this.
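To see what the single line `X @ W + b` is standing in for, the sketch below spells the same layer out with explicit loops and checks that both versions agree. The function name `dense_layer_loops` is made up for this illustration, and the random generator is `np.random.default_rng`, so the values differ from the seeded example above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))   # 5 data points, 3 features
W = rng.standard_normal((3, 4))   # 3 inputs -> 4 outputs
b = rng.standard_normal(4)        # one bias per output

def dense_layer_loops(X, W, b):
    """Same computation as np.maximum(0, X @ W + b), spelled out."""
    n_samples, n_inputs = X.shape
    n_outputs = W.shape[1]
    out = np.zeros((n_samples, n_outputs))
    for i in range(n_samples):
        for j in range(n_outputs):
            total = b[j]
            for k in range(n_inputs):
                total += X[i, k] * W[k, j]
            out[i, j] = max(0.0, total)   # ReLU
    return out

vectorized = np.maximum(0, X @ W + b)
looped = dense_layer_loops(X, W, b)
print(np.allclose(vectorized, looped))    # True
```

Three nested loops collapse into one expression, and the bias is added to every row through broadcasting.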

The Speed Difference, Made Visible

It is worth seeing the performance gap directly:

import numpy as np
import time

size = 1_000_000

# Python loop
python_list = list(range(size))
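start = time.perf_counter()  # perf_counter is the better clock for short timings
python_result = [x ** 2 for x in python_list]
python_time = time.perf_counter() - start

# NumPy (explicit int64 so the squares cannot overflow on any platform)
numpy_array = np.arange(size, dtype=np.int64)
start = time.perf_counter()
numpy_result = numpy_array ** 2
numpy_time = time.perf_counter() - start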

print(f"Python loop: {python_time:.3f}s")
print(f"NumPy:       {numpy_time:.4f}s")
print(f"NumPy is {python_time / numpy_time:.0f}x faster")

On a typical machine you will see something like:

Python loop: 0.187s
NumPy:       0.0014s
NumPy is 134x faster

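In this run, at a million elements, NumPy is 134 times faster. The exact ratio varies with the machine, the Python version, and the NumPy version. Scale that to the billions of operations a neural network performs during training and the difference is a large part of the reason deep learning is practical on ordinary hardware at all.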
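A single timed run is noisy. If you want a steadier number, one option is to take the best of several runs with the standard library's `timeit` module. This is a sketch of that approach, not a rigorous benchmark, and the ratio it prints will differ from machine to machine.

```python
import timeit
import numpy as np

size = 1_000_000
python_list = list(range(size))
numpy_array = np.arange(size, dtype=np.int64)

# Best of several runs is less sensitive to background noise
python_time = min(timeit.repeat(lambda: [x ** 2 for x in python_list],
                                number=1, repeat=5))
numpy_time = min(timeit.repeat(lambda: numpy_array ** 2,
                               number=1, repeat=5))

print(f"Python loop: {python_time:.4f}s")
print(f"NumPy:       {numpy_time:.4f}s")
print(f"Ratio:       {python_time / numpy_time:.0f}x")
```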

Essential NumPy in Ten Lines

The functions you will use constantly:

import numpy as np

# Create arrays
np.zeros((3, 4))           # all zeros
np.ones((3, 4))            # all ones
np.arange(0, 10, 2)        # [0, 2, 4, 6, 8]
np.linspace(0, 1, 5)       # 5 evenly spaced points from 0 to 1
np.random.randn(3, 3)      # random numbers, normal distribution

# Reshape — same data, different shape
arr = np.arange(12)
grid = arr.reshape(3, 4)   # 12 elements as a 3x4 matrix (arr itself is unchanged)
arr.reshape(2, 2, 3)       # or as a 3D array

# Statistics
arr.mean()                 # average
arr.std()                  # standard deviation
arr.min(), arr.max()       # extremes
grid.sum(axis=0)           # sum down columns (axis=0); axis=1 sums across rows

# Indexing — cleaner than Python lists
matrix = np.random.randn(5, 3)
matrix[0]                  # first row
matrix[:, 1]               # second column, all rows
matrix[matrix > 0]         # all positive values
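The listing above only names the calls. The short worked example below runs a few of them on a small, predictable array so you can see what comes back, especially for the `axis` argument and boolean indexing. The name `grid` is just a label for the reshaped array.

```python
import numpy as np

arr = np.arange(12)
grid = arr.reshape(3, 4)
print(grid)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(grid.sum(axis=0))    # [12 15 18 21]  one total per column
print(grid.sum(axis=1))    # [ 6 22 38]     one total per row

print(grid[0])             # [0 1 2 3]      first row
print(grid[:, 1])          # [1 5 9]        second column
print(grid[grid > 8])      # [ 9 10 11]     boolean mask, flattened result

print(np.arange(0, 10, 2))      # [0 2 4 6 8]
print(np.linspace(0, 1, 5))     # [0.   0.25 0.5  0.75 1.  ]
```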

Why This Is the Language of AI

Here is the part that surprises people when they first learn it: you can work with PyTorch or TensorFlow for months without realizing that you are basically using NumPy with extra features bolted on.

A PyTorch tensor is, in effect, a NumPy-style array with two additions:

GPU support (so operations run on a graphics card)
Automatic differentiation (so it can compute gradients for training)

A TensorFlow tensor is the same idea. Scikit-learn runs on NumPy arrays. Pandas is built on NumPy arrays. OpenCV uses NumPy arrays for images. When you load an image and a computer vision model processes it, that image is a 3D NumPy array: height × width × color channels.
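To make the image claim concrete, here is a sketch that builds a tiny synthetic image instead of loading a real file, so it needs nothing beyond NumPy. It assumes RGB channel order; OpenCV, for example, defaults to BGR. The sizes and colors are arbitrary.

```python
import numpy as np

height, width = 4, 6   # a tiny synthetic image, not a real file

# uint8 values 0-255, laid out as height x width x color channels (RGB)
image = np.zeros((height, width, 3), dtype=np.uint8)
image[:, :, 0] = 255           # fill the red channel
image[1:3, 2:4] = [0, 0, 255]  # paint a 2x2 blue square

print(image.shape)             # (4, 6, 3)
print(image[0, 0])             # [255   0   0]
print(image[1, 2])             # [  0   0 255]

# Naive grayscale: average the three channels for every pixel
gray = image.mean(axis=2)
print(gray.shape)              # (4, 6)

# Models usually take batches, so add a leading batch axis
batch = image[np.newaxis, ...]
print(batch.shape)             # (1, 4, 6, 3)
```

Every step here is ordinary indexing, broadcasting, and reduction along an axis, the same operations used throughout this post.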

NumPy is not one tool in the AI stack. It is the material the entire AI stack is made of.

Travis Oliphant spent a year merging two competing libraries because he was tired of the community being divided. He had no idea he was building the substrate on which the entire field of modern AI would run.

That is usually how the most important things get built — not as grand visions, but as solutions to problems someone was simply too frustrated to ignore any longer.

Next in this series: Pandas — the library that made working with real-world messy data bearable. Where it came from, the story behind its name, and why “dataframes” changed everything.


Henry Bernreuter
