array object can take many concrete types. It might be a one-dimensional (1D) array of Booleans, or a three-dimensional (3D) array of 8-bit unsigned integers. As the built-in function isinstance() will show, every array is an instance of np.ndarray, regardless of shape or the type of elements stored in the array, i.e., the dtype. Similarly, many type-annotated interfaces still only specify np.ndarray:
import numpy as np
def process(
x: np.ndarray,
y: np.ndarray,
) -> np.ndarray: ...
Such type annotations are insufficient: most interfaces have strong expectations of the shape or dtype of passed arrays. Most code will fail if a 3D array is passed where a 1D array is expected, or an array of dates is passed where an array of floats is expected.
Taking full advantage of the generic np.ndarray, array shape and dtype characteristics can now be fully specified:
def process(
x: np.ndarray[tuple[int], np.dtype[np.bool_]],
y: np.ndarray[tuple[int, int, int], np.dtype[np.uint8]],
) -> np.ndarray[tuple[int], np.dtype[np.float64]]: ...
With such detail, recent versions of static analysis tools like mypy and pyright can find issues before code is even run. Further, run-time validators specialized for NumPy, like StaticFrame's sf.CallGuard, can re-use the same annotations for run-time validation.
Generic Types in Python
Generic built-in containers such as list and dict can be made concrete by specifying, for each interface, the contained types. A function can declare that it takes a list of str with list[str]; or a dict of str to bool can be specified with dict[str, bool].
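As a minimal sketch of concrete built-in generics in function signatures (the function names join_names and count_enabled are hypothetical, chosen only for illustration):

```python
def join_names(names: list[str]) -> str:
    """Join a list of strings with a comma separator."""
    return ", ".join(names)


def count_enabled(flags: dict[str, bool]) -> int:
    """Count the keys whose value is True (True counts as 1 in a sum)."""
    return sum(flags.values())


joined = join_names(["a", "b"])
enabled = count_enabled({"x": True, "y": False, "z": True})
```

A type checker can reject a call such as join_names([1, 2]) because list[int] is not compatible with list[str]; the same principle is what the generic np.ndarray makes available for arrays.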
The Generic np.ndarray
An np.ndarray is an N-dimensional array of a single element type (or dtype). The np.ndarray generic takes two type parameters: the first defines the shape with a tuple, the second defines the element type with the generic np.dtype. While np.ndarray has taken two type parameters for some time, the definition of the first parameter, shape, was not fully specified until NumPy 2.1.
The Shape Type Parameter
When creating an array with interfaces like np.empty or np.full, a shape argument is given as a tuple. The length of the tuple defines the array's dimensionality; the magnitude of each position defines the size of that dimension. Thus a shape (10,) is a 1D array of 10 elements; a shape (10, 100, 1000) is a three-dimensional array of size 10 by 100 by 1000.
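The relationship between the shape tuple and dimensionality can be checked at run time. This is a small illustrative sketch; the variable names are arbitrary and the fill value 0 is an assumption made for the example:

```python
import numpy as np

# A shape tuple of length one gives a 1D array of 10 elements.
a1 = np.empty((10,))

# A shape tuple of length three gives a 3D array of size 10 by 100 by 1000.
a3 = np.full((10, 100, 1000), 0, dtype=np.uint8)

# ndim is always the length of the shape tuple.
dims = (a1.ndim, a3.ndim)
```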
When using a tuple to define shape in the np.ndarray generic, at present only the number of dimensions can generally be used for type checking. Thus, a tuple[int] can specify a 1D array; a tuple[int, int, int] can specify a 3D array; a tuple[int, ...], specifying a tuple of zero or more integers, denotes an N-dimensional array. It might be possible in the future to type-check an np.ndarray with specific magnitudes per dimension (using Literal), but this is not yet broadly supported.
The dtype Type Parameter
The NumPy dtype object defines element types and, for some types, other characteristics such as size (for Unicode and string types) or unit (for np.datetime64 types). The dtype itself is generic, taking a NumPy "generic" type as a type parameter. The most narrow types specify specific element characteristics, for example np.uint8, np.float64, or np.bool_. Beyond these narrow types, NumPy provides more general types, such as np.integer, np.inexact, or np.number.
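The narrow and general types form an ordinary Python class hierarchy, so the relationships can be inspected with issubclass() or np.issubdtype(). The following sketch only illustrates that hierarchy; the variable names are arbitrary:

```python
import numpy as np

# Narrow scalar types are subclasses of the more general types.
uint8_is_integer = issubclass(np.uint8, np.integer)
uint8_is_signed = issubclass(np.uint8, np.signedinteger)
float64_is_inexact = issubclass(np.float64, np.inexact)

# Booleans are NumPy generics but are not part of the np.number branch.
bool_is_number = issubclass(np.bool_, np.number)

# np.issubdtype performs the same kind of test starting from a dtype.
int16_is_signed = np.issubdtype(np.dtype(np.int16), np.signedinteger)
```

A parameter annotated with np.dtype[np.integer] therefore accepts arrays of np.uint8 or np.int64, while np.dtype[np.signedinteger] accepts only the latter.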
Making np.ndarray Concrete
The following examples illustrate concrete np.ndarray definitions:
A 1D array of Booleans:
np.ndarray[tuple[int], np.dtype[np.bool_]]
A 3D array of unsigned 8-bit integers:
np.ndarray[tuple[int, int, int], np.dtype[np.uint8]]
A two-dimensional (2D) array of Unicode strings:
np.ndarray[tuple[int, int], np.dtype[np.str_]]
A 1D array of any numeric type:
np.ndarray[tuple[int], np.dtype[np.number]]
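Because these definitions are verbose, one option is to bind them to module-level aliases and reuse them across signatures. The sketch below assumes Python 3.9 or later and a NumPy version in which np.ndarray and np.dtype can be subscripted at run time; the alias names and the function plane_means are hypothetical. Whether a static checker can infer the 1D return shape depends on the NumPy stubs in use.

```python
import numpy as np

# Hypothetical aliases for concrete array types.
Bool1D = np.ndarray[tuple[int], np.dtype[np.bool_]]
UInt8_3D = np.ndarray[tuple[int, int, int], np.dtype[np.uint8]]
Float1D = np.ndarray[tuple[int], np.dtype[np.float64]]


def plane_means(mask: Bool1D, image: UInt8_3D) -> Float1D:
    """Return the mean of each plane of image selected by mask.

    mask must have one Boolean per plane along the first axis of image.
    """
    return image[mask].mean(axis=(1, 2))


mask = np.array([True, False, True])
image = np.arange(12, dtype=np.uint8).reshape(3, 2, 2)
means = plane_means(mask, image)
```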
Static Type Checking with Mypy
Once the generic np.ndarray is made concrete, mypy or similar type checkers can, for some code paths, identify values that are incompatible with an interface.
For example, the function below requires a 1D array of signed integers. As shown below, unsigned integers, or dimensionalities other than one, fail mypy checks.
def process1(x: np.ndarray[tuple[int], np.dtype[np.signedinteger]]): ...
a1 = np.empty(100, dtype=np.int16)
process1(a1) # mypy passes
a2 = np.empty(100, dtype=np.uint8)
process1(a2) # mypy fails
# error: Argument 1 to "process1" has incompatible type
# "ndarray[tuple[int], dtype[unsignedinteger[_8Bit]]]";
# expected "ndarray[tuple[int], dtype[signedinteger[Any]]]" [arg-type]
a3 = np.empty((100, 100, 100), dtype=np.int64)
process1(a3) # mypy fails
# error: Argument 1 to "process1" has incompatible type
# "ndarray[tuple[int, int, int], dtype[signedinteger[_64Bit]]]";
# expected "ndarray[tuple[int], dtype[signedinteger[Any]]]"
Runtime Validation with sf.CallGuard
Not all array operations can statically define the shape or dtype of a resulting array. For this reason, static analysis will not catch all mismatched interfaces. Rather than writing redundant validation code across many functions, type annotations can be re-used for run-time validation with tools specialized for NumPy types.
The StaticFrame CallGuard interface offers two decorators, check and warn, which raise exceptions or warnings, respectively, on validation errors. These decorators validate type annotations against the characteristics of run-time objects.
For example, by adding sf.CallGuard.check to the function below, the arrays fail validation with expressive CallGuard exceptions:
import static_frame as sf
@sf.CallGuard.check
def process2(x: np.ndarray[tuple[int], np.dtype[np.signedinteger]]): ...
b1 = np.empty(100, dtype=np.uint8)
process2(b1)
# static_frame.core.type_clinic.ClinicError:
# In args of (x: ndarray[tuple[int], dtype[signedinteger]]) -> Any
# └── In arg x
# └── ndarray[tuple[int], dtype[signedinteger]]
# └── dtype[signedinteger]
# └── Expected signedinteger, provided uint8 invalid
b2 = np.empty((10, 100), dtype=np.int8)
process2(b2)
# static_frame.core.type_clinic.ClinicError:
# In args of (x: ndarray[tuple[int], dtype[signedinteger]]) -> Any
# └── In arg x
# └── ndarray[tuple[int], dtype[signedinteger]]
# └── tuple[int]
# └── Expected tuple length of 1, provided tuple length of 2
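To make the idea of re-using annotations at run time more concrete, here is a much-simplified sketch that needs only NumPy and the standard library. It is not how sf.CallGuard is implemented; the helper name matches is hypothetical, and it handles only hints of the form np.ndarray[tuple[...], np.dtype[T]].

```python
import typing

import numpy as np


def matches(array: np.ndarray, hint: typing.Any) -> bool:
    """Return True if array satisfies an np.ndarray[shape, np.dtype[T]] hint.

    Simplified illustration: only the number of dimensions and the
    scalar type hierarchy are compared.
    """
    shape_hint, dtype_hint = typing.get_args(hint)

    # tuple[int, ...] contains Ellipsis and accepts any dimensionality;
    # otherwise the number of tuple entries must equal array.ndim.
    dims = typing.get_args(shape_hint)
    if Ellipsis not in dims and len(dims) != array.ndim:
        return False

    # np.dtype[T] carries one scalar type; compare via the class hierarchy.
    (scalar_type,) = typing.get_args(dtype_hint)
    return issubclass(array.dtype.type, scalar_type)


Signed1D = np.ndarray[tuple[int], np.dtype[np.signedinteger]]
AnyNumber = np.ndarray[tuple[int, ...], np.dtype[np.number]]

ok = matches(np.empty(100, dtype=np.int16), Signed1D)
bad_dtype = matches(np.empty(100, dtype=np.uint8), Signed1D)
bad_shape = matches(np.empty((10, 100), dtype=np.int8), Signed1D)
any_dim = matches(np.empty((2, 3)), AnyNumber)
```

A dedicated tool such as sf.CallGuard additionally reports where in the annotation the mismatch occurred, as in the error trees above, which a Boolean result like this cannot convey.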
Conclusion
More can be done to improve NumPy typing. For example, the np.object_ type could be made generic such that Python types contained in an object array could be defined. For example, a 1D object array of pairs of integers could be annotated as:
np.ndarray[tuple[int], np.dtype[np.object_[tuple[int, int]]]]
Further, units of np.datetime64 cannot yet be statically specified. For example, date units could be distinguished from nanosecond units with annotations like np.dtype[np.datetime64[Literal['D']]] or np.dtype[np.datetime64[Literal['ns']]].
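The gap is visible at run time: arrays with different units have distinct dtype instances but share the same scalar type, which is the only part an annotation can currently name. A brief sketch, with arbitrary example dates and variable names:

```python
import numpy as np

days = np.array(["2024-01-01", "2024-01-02"], dtype="datetime64[D]")
nanos = days.astype("datetime64[ns]")

# The unit is stored on the dtype instance and can be read at run time.
day_unit = np.datetime_data(days.dtype)[0]
ns_unit = np.datetime_data(nanos.dtype)[0]

# Both arrays share the scalar type np.datetime64, so an annotation of
# np.dtype[np.datetime64] cannot tell them apart.
same_scalar_type = days.dtype.type is nanos.dtype.type
```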
Even with limitations, fully-specified NumPy type annotations catch errors and improve code quality. As shown, static analysis can identify mismatched shape or dtype, and validation with sf.CallGuard can provide strong run-time guarantees.