NumPy Complete Guide (2025 Edition)

1 · Overview & Latest Version

NumPy (Numerical Python) is the foundational package for n‑dimensional array computing & scientific computation in Python. ⚑ Current stable version: 2.2.5 (19 Apr 2025)

Why a 2.x branch?
Breaking API and ABI changes introduced in 2.0.0 (June 2024) modernised the dtype system, removed legacy behaviour, and cleaned up the public namespace. Key highlights:

New StringDType & extensible DType API
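Removal of long-deprecated names (np.float_, np.complex_, np.NaN …); np.int and np.float had already been removed in 1.24
NEP 50 type promotion: Python scalars no longer change the result dtype based on their value
Cleaner C-API with an ABI break, so compiled extensions must be rebuilt against 2.x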

2 · Installation & Environment Setup

2.1 Quick install

# NumPy 2.2.x requires CPython >= 3.10
python -m pip install -U numpy
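
To confirm which version ended up in the active environment, a quick check along these lines can help. It is a minimal sketch; the 2.0.0 threshold is only an illustration of how to compare versions.

```python
import numpy as np
from numpy.lib import NumpyVersion

installed = np.__version__
print("NumPy", installed)

# NumpyVersion compares numerically, so "2.10.0" sorts after "2.9.0".
is_2x = NumpyVersion(installed) >= "2.0.0"
print("2.x API available:", is_2x)
```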

2.2 Performance‑optimised wheels

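Official PyPI wheels bundle OpenBLAS (Accelerate on recent macOS arm64 builds) and pick AVX2/AVX-512/NEON kernels at runtime, so no separate wheel is needed for SIMD tuning. NumPy publishes no MKL wheels on PyPI; an MKL-linked build has to come from a third-party distribution, for example conda-forge:

conda install -c conda-forge numpy "libblas=*=*mkl"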

2.3 From source (advanced)

git clone --recurse-submodules https://github.com/numpy/numpy.git
cd numpy
python -m pip install -r requirements/build_requirements.txt
python -m pip install -e . --no-build-isolation

3 · ndarray Fundamentals

3.1 What is an `ndarray`?

Homogeneous block of memory, contiguous or strided
Metadata: shape, dtype, strides, ndim, itemsize
Vectorised operations via SIMD‑friendly ufuncs

3.2 Creating arrays


import numpy as np
arr = np.array([1, 2, 3], dtype=np.int32)
z   = np.zeros((2, 3))
iden = np.eye(4, k=0)    # identity
rnd = np.random.default_rng().normal(size=(2, 2))

3.3 Key attributes


arr.shape    # (3,)
arr.ndim     # 1
arr.dtype    # int32
arr.strides  # (4,) bytes between elements
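
The same attributes are more revealing on a 2-D array. The sketch below (array names are illustrative) shows that a transpose only swaps shape and strides; no data is copied.

```python
import numpy as np

m = np.arange(12, dtype=np.int32).reshape(3, 4)   # C-ordered, 4-byte items
t = m.T                                           # transpose: a view on the same buffer

print(m.shape, m.strides)                          # (3, 4) (16, 4)
print(t.shape, t.strides)                          # (4, 3) (4, 16)
print(np.shares_memory(m, t))                      # True
print(m.flags.c_contiguous, t.flags.c_contiguous)  # True False
```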

4 · Data Types (`DType`)

4.1 Built‑in numerics

NumPy ships signed/unsigned ints (8–64 bit), floats (16–64 bit, plus a platform-dependent extended-precision np.longdouble), complex, bool, and datetime/timedelta.

4.2 `StringDType` & Flexible types (2.x)

The new StringDType (created with np.dtypes.StringDType()) stores variable-length UTF-8 strings, playing a role similar to the dedicated string dtype in pandas.

4.3 Custom dtypes


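import numpy as np

# The extensible DType API is a C-level API: new DTypes cannot be created
# by subclassing np.dtype in Python. A structured dtype covers simple cases.
RGB8 = np.dtype([("r", "u1"), ("g", "u1"), ("b", "u1")])
pixels = np.zeros((2, 2), dtype=RGB8)   # each element is 3 bytes: r, g, b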

5 · Indexing, Slicing & Iterating

5.1 Basic slicing syntax

arr[start:stop:step]   # general form; every part is optional
arr[::-1]              # reversed view
m = np.arange(6).reshape(2, 3)
m[:, 1]                # column 1 of a 2-D array

5.2 Boolean & fancy indexing


mask = arr % 2 == 0
arr[mask]                # even numbers
idx = [0, 2]
arr[idx]                 # pick elements 0 & 2; result stays 1-D (a copy)

5.3 Memory views vs copies

Slices are views; fancy indexing returns a copy. Setting arr.flags.writeable = False makes the array itself read-only; it does not create a new view.
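
The difference is easy to check with np.shares_memory. The following is a small sketch with illustrative variable names:

```python
import numpy as np

arr = np.arange(6)
view = arr[1:4]           # basic slice -> view on arr's buffer
copy = arr[[1, 2, 3]]     # integer-array (fancy) index -> independent copy

view[0] = 100             # writes through to arr
copy[1] = -1              # leaves arr untouched
print(arr.tolist())       # [0, 100, 2, 3, 4, 5]
print(np.shares_memory(arr, view), np.shares_memory(arr, copy))  # True False

read_only_error = None
arr.flags.writeable = False   # marks arr itself as read-only
try:
    arr[0] = 7
except ValueError as exc:
    read_only_error = str(exc)
print(read_only_error is not None)   # True
```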

6 · Broadcasting Rules

Broadcasting expands arrays of smaller shape to match larger ones without copying. Rules: align from the trailing dimension; dimensions of size 1 can be stretched; mismatch leads to ValueError.


a = np.arange(3)          # shape (3,)
b = np.arange(6).reshape(2,3) # (2,3)
a + b                     # result shape (2,3)
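
A second sketch shows a size-1 dimension being stretched and an incompatible pair failing. np.broadcast_shapes computes the result shape without allocating anything.

```python
import numpy as np

col = np.arange(3).reshape(3, 1) * 10   # shape (3, 1): [[0], [10], [20]]
row = np.arange(4)                      # shape (4,)
table = col + row                       # (3, 1) + (4,) -> (3, 4)

print(np.broadcast_shapes(col.shape, row.shape))  # (3, 4)
print(table[2].tolist())                          # [20, 21, 22, 23]

try:
    np.arange(3) + np.arange(4)         # trailing sizes 3 and 4: neither is 1
    mismatch = None
except ValueError as exc:
    mismatch = exc
print(type(mismatch).__name__)          # ValueError
```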

7 · Universal Functions (`ufunc`)

7.1 Vectorised math


angles = np.linspace(0, 2*np.pi, 360)
y = np.sin(angles)        # SIMD‑optimised

7.2 Reduction & accumulation


np.add.reduce(arr)
np.multiply.accumulate(arr)
np.subtract.outer(a, b)
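
With concrete inputs the three ufunc methods look like this (a self-contained sketch; the arrays are redefined here so the results are easy to verify by hand):

```python
import numpy as np

arr = np.array([1, 2, 3])
total = np.add.reduce(arr)                           # 6, same as arr.sum()
running = np.multiply.accumulate(arr)                # [1, 2, 6], same as arr.cumprod()
diffs = np.subtract.outer(arr, np.array([10, 20]))   # shape (3, 2): arr[i] - other[j]

grid = np.arange(6).reshape(2, 3)
col_max = np.maximum.reduce(grid, axis=0)            # [3, 4, 5]

print(total, running.tolist(), diffs.tolist(), col_max.tolist())
```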

7.3 Writing custom ufuncs (numpy.frompyfunc)


def deg2rad(x): return x*np.pi/180
rad_ufunc = np.frompyfunc(deg2rad, 1, 1)
rad_ufunc([0, 90, 180])
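
Note that np.frompyfunc still calls the Python function once per element and returns an object-dtype array, so it adds broadcasting convenience rather than speed. The sketch below compares it with the built-in np.deg2rad:

```python
import numpy as np

def deg2rad(x):
    return x * np.pi / 180

rad_ufunc = np.frompyfunc(deg2rad, 1, 1)      # 1 input, 1 output

degrees = np.array([0.0, 90.0, 180.0])
out = rad_ufunc(degrees)
print(out.dtype)                              # object

as_float = out.astype(np.float64)             # convert before further numeric work
print(np.allclose(as_float, np.deg2rad(degrees)))   # True
```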

8 · Vectorisation & Performance Tuning

Avoid Python for‑loops; rely on ufuncs.
Use np.einsum for complex contractions.
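SIMD kernels are selected automatically at runtime; set the NPY_DISABLE_CPU_FEATURES environment variable only to switch specific instruction sets off (e.g. when debugging).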
Leverage np.lib.stride_tricks for windowing/rolling computations.
Profile with %timeit, the timeit module, or py-spy.


x = np.arange(10.0)
rolling = np.lib.stride_tricks.sliding_window_view(x, window_shape=4)   # shape (7, 4), read-only view
means   = rolling.mean(axis=-1)                                         # 7 rolling means
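
For the np.einsum tip, a short sketch of two common contractions follows. The shapes and names are arbitrary, and each result is checked against an equivalent expression.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3, 4))
B = rng.normal(size=(5, 4, 2))

# Batched matrix product: contract over j for every batch index b.
C = np.einsum("bij,bjk->bik", A, B)
print(np.allclose(C, A @ B))                         # True

# Row-wise dot products without building the full X @ Y.T matrix.
X = rng.normal(size=(6, 3))
Y = rng.normal(size=(6, 3))
row_dots = np.einsum("ij,ij->i", X, Y)
print(np.allclose(row_dots, (X * Y).sum(axis=1)))    # True
```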

9 · Linear Algebra (`numpy.linalg`)

9.1 Core routines


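rng = np.random.default_rng()
A = rng.uniform(size=(3, 3))
b = rng.uniform(size=3)                      # right-hand side: one entry per row of A
eigvals, eigvecs = np.linalg.eig(A)
q, r = np.linalg.qr(A)
x = np.linalg.solve(A, b)                    # solves A @ x = b
u, s, vt = np.linalg.svd(A, full_matrices=False)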
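
It is good practice to verify a factorisation or solve by reconstructing the input. A minimal sketch, in which the matrix is made diagonally dominant purely so that it is guaranteed to be invertible:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(size=(3, 3)) + 3 * np.eye(3)   # diagonally dominant -> invertible
b = rng.uniform(size=3)

x = np.linalg.solve(A, b)
q, r = np.linalg.qr(A)
u, s, vt = np.linalg.svd(A, full_matrices=False)

print(np.allclose(A @ x, b))          # True: x solves the system
print(np.allclose(q @ r, A))          # True: QR reconstructs A
print(np.allclose((u * s) @ vt, A))   # True: SVD reconstructs A
```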

9.2 BLAS/LAPACK back‑end

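NumPy delegates heavy linear algebra to BLAS/LAPACK. Official wheels already link a multi-threaded OpenBLAS; build from source against MKL, BLIS or another implementation only if you need a specific back-end. np.show_config() reports which library is in use.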

10 · Random Number Generation

10.1 PCG64 bit‑generator


rng = np.random.default_rng(seed=42)
rng.integers(0, 10, size=5)
rng.normal(loc=0, scale=1, size=(2,2))
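
Seeding makes results reproducible, and np.random.SeedSequence can derive independent child streams, for example one per worker. A small sketch follows; the seed value and the number of children are arbitrary.

```python
import numpy as np

# Same seed, same bit generator -> identical draws.
a = np.random.default_rng(seed=42).integers(0, 10, size=5)
b = np.random.default_rng(seed=42).integers(0, 10, size=5)
print(np.array_equal(a, b))                      # True

# Independent streams derived from one root seed.
children = np.random.SeedSequence(42).spawn(2)
streams = [np.random.default_rng(child) for child in children]
draws = [gen.normal(size=3) for gen in streams]
print(np.allclose(draws[0], draws[1]))           # False
```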

10.2 Statistical helpers


rng.binomial(n=10, p=0.5, size=5)               # Generator.binomial
rng.choice([1, 2, 3], size=2, replace=False)    # Generator.choice

11 · Statistics & Descriptive Methods

For CPU‑vectorised descriptive statistics NumPy provides:


data = np.random.default_rng(0).normal(size=(100, 3))   # sample data: 100 rows, 3 columns
data.mean(axis=0)
np.nanmean(data)
data.std(ddof=1)
np.percentile(data, [25, 50, 75], method="nearest")

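Since NumPy 1.22, np.percentile, np.quantile and their nan-aware variants take a method argument (it replaced the older interpolation keyword) whose options include "hazen" and "weibull".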
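
The effect of NaN handling and ddof is easiest to see on a tiny array. The values below are chosen so the results can be checked by hand.

```python
import numpy as np

data = np.array([[1.0, 2.0],
                 [3.0, np.nan],
                 [5.0, 6.0]])

col_mean = data.mean(axis=0)                      # NaN propagates: [3., nan]
col_nanmean = np.nanmean(data, axis=0)            # NaN ignored:    [3., 4.]
sample_std = data[:, 0].std(ddof=1)               # 2.0 (divides by n - 1)
quartiles = np.nanpercentile(data, [25, 50, 75])  # over the 5 non-NaN values

print(col_nanmean.tolist(), sample_std, quartiles.tolist())
# [3.0, 4.0] 2.0 [2.0, 3.0, 5.0]
```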

12 · Structured & Record Arrays


dt = np.dtype([("time","f8"),("lat","f4"),("lon","f4")])
gps = np.zeros(1_000_000, dtype=dt)   # zero-initialised; np.empty would leave arbitrary values
gps["lat"] += 1.0        # vectorised field operation

Structured arrays behave like typed tables; each field is a view into the underlying buffer, enabling memory‑efficient column operations.
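
A smaller sketch makes the view semantics visible: modifying a field modifies the parent array, and np.sort can order records by a named field. The sample values are made up.

```python
import numpy as np

dt = np.dtype([("time", "f8"), ("lat", "f4"), ("lon", "f4")])
gps = np.zeros(3, dtype=dt)
gps["time"] = [2.0, 0.0, 1.0]
gps["lat"] = [10.0, 20.0, 30.0]

lat = gps["lat"]                  # a strided view into the records, not a copy
lat += 1.0                        # updates gps in place

by_time = np.sort(gps, order="time")
print(dt.itemsize)                # 16 bytes per record: 8 + 4 + 4
print(np.shares_memory(lat, gps)) # True
print(by_time["lat"].tolist())    # [21.0, 31.0, 11.0]
```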

13 · I/O & Memory Mapping

np.loadtxt, genfromtxt for CSVs
np.save / np.load for .npy format
np.memmap for out‑of‑core arrays
np.savez / np.savez_compressed bundle several arrays into one .npz archive (optionally compressed); the .npy format itself has no chunking or compression.


m = np.memmap("big.dat", dtype="float32", mode="r", shape=(10000, 10000))
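
The following sketch round-trips an array through .npy, reopens it lazily with mmap_mode, and writes a compressed .npz bundle. File names are illustrative and everything lives in a temporary directory.

```python
import os
import tempfile
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.npy")
    original = np.arange(12, dtype=np.float32).reshape(3, 4)
    np.save(path, original)

    loaded = np.load(path)                  # reads the whole array into memory
    mapped = np.load(path, mmap_mode="r")   # read-only memmap, data read on access
    is_memmap = isinstance(mapped, np.memmap)
    row_sum = float(mapped[1].sum())        # 4 + 5 + 6 + 7
    del mapped                              # release the file before cleanup

    npz_path = os.path.join(tmp, "bundle.npz")
    np.savez_compressed(npz_path, a=original, b=original * 2)
    with np.load(npz_path) as bundle:
        names = sorted(bundle.files)
        b_copy = bundle["b"]

print(is_memmap, row_sum, names)            # True 22.0 ['a', 'b']
```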

14 · Interoperability & Ecosystem

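NumPy's ndarray is the common baseline of the ecosystem: SciPy and scikit-learn consume ndarrays directly, pandas, xarray and Polars objects convert through __array__, and GPU drop-ins such as CuPy and torch.Tensor mirror much of the API and exchange data via __cuda_array_interface__ or DLPack.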

NumPy's main namespace targets the array API standard: 2.0 is compatible with the 2022.12 edition, and 2.1 adds support for 2023.12.
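
The __array__ protocol is what lets NumPy functions accept foreign containers. Below is a minimal sketch with a hypothetical Temperatures class, written only for illustration; the copy keyword in the signature is the form NumPy 2 passes.

```python
import numpy as np

class Temperatures:
    """Hypothetical container, used only to illustrate __array__."""

    def __init__(self, celsius):
        self._celsius = list(celsius)

    def __array__(self, dtype=None, copy=None):
        # The data lives in a Python list, so a new array is always built.
        if copy is False:
            raise ValueError("Temperatures cannot be converted without a copy")
        return np.array(self._celsius, dtype=dtype)

readings = Temperatures([20.5, 22.0, 19.5])
as_array = np.asarray(readings)           # triggers readings.__array__()
mean_c = float(np.mean(readings))         # NumPy functions convert implicitly
fahrenheit = as_array * 9 / 5 + 32

print(as_array.dtype, round(mean_c, 3))   # float64 20.667
```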

15 · Migrating 1.x → 2.x

15.1 Deprecation checker


# NumPy ships no built-in checker; the Ruff linter provides a NumPy 2.0 migration rule
ruff check project/ --select NPY201

15.2 Common breakers

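Replace names removed in 2.0: np.float_ → np.float64, np.complex_ → np.complex128, np.NaN → np.nan, np.Inf → np.inf (np.int and np.float were already removed in 1.24; np.bool is valid again in 2.x)
Prefer np.random.default_rng over the legacy np.random.RandomState (still available, but frozen)
Rebuild C and Cython extensions against NumPy 2.x; binaries built against 1.x do not load under the 2.0 ABI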
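
A short sketch of spellings that work on both 1.x and 2.x. The getattr fallback for the np.trapz to np.trapezoid rename is one possible pattern, not an official recipe.

```python
import numpy as np

values = np.array([1.0, np.nan, np.inf])   # np.nan / np.inf, not np.NaN / np.Inf
as_f64 = values.astype(np.float64)         # np.float64, not np.float_
as_c128 = values.astype(np.complex128)     # np.complex128, not np.complex_

# np.trapz was renamed to np.trapezoid in 2.0; use whichever exists.
trapezoid = getattr(np, "trapezoid", None) or np.trapz
area = trapezoid([0.0, 1.0, 2.0])          # unit spacing: 0.5 + 1.5

print(as_f64.dtype, as_c128.dtype, area)   # float64 complex128 2.0
```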

16 · Best Practices & Pitfalls

Avoid dtype=object unless absolutely necessary
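Check array memory with arr.nbytes, tracemalloc or an external profiler; np.show_config() reports build and BLAS details, not memory use
Use __array_ufunc__ (or the older __array_priority__) in subclasses and duck arrays to control op dispatch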
Prefer np.isclose over direct floating‑point equality
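Document dtypes with numpy.typing.NDArray[np.float64]; shape typing in the PEP 646 style (np.ndarray[tuple[int, int], np.dtype[np.float64]]) is still only partially supported by type checkers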
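
The np.isclose advice is worth seeing with numbers. The sketch below also shows a caveat: the default absolute tolerance (atol=1e-8) treats any two values near zero as equal, so tolerances should match the scale of the data.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0, 1e-12])
b = np.array([0.3, 1.0 + 1e-9, 0.0])

exact = a == b                                    # rounding breaks equality
close = np.isclose(a, b)                          # defaults: rtol=1e-5, atol=1e-8
strict = np.isclose(a, b, rtol=0.0, atol=1e-15)   # much tighter tolerance

print(exact.tolist())    # [False, False, False]
print(close.tolist())    # [True, True, True]
print(strict.tolist())   # [True, False, False]
```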

17 · Further Resources

Happy vectorising!