
The Beauty of Space-Filling Curves: Understanding the Hilbert Curve



0. Introduction

Space-filling curves (SFCs) are fascinating mathematical constructs with many practical applications in data science and data engineering. While they may sound abstract, they are often hiding in plain sight, behind terms like Z-ordering or Liquid Clustering (used, for example, in platforms like Databricks). If you have worked with large-scale data platforms, chances are you have already used SFCs without realizing it.

Despite their relevance in modern systems, knowledge on this topic is often fragmented, making it difficult to bridge theory and practice. This article aims to close that gap, with a focus on the Hilbert curve.

My goal is to provide a condensed and accessible overview of SFCs: starting with their mathematical origins, moving through practical implementation techniques, and ending with real-world applications in data processing and optimization. The intention is not to replace existing resources but rather to point to them for more detailed information. Further resources for terminology and details are referenced throughout the text.

You may ask: what is so fascinating about curves? After all, an ordinary curve is easy to understand and probably not the first topic I would pick up a book about. But SFCs are different. They traverse every point in a continuous space, have fractal properties, and produce visually striking patterns when plotted in 2D or 3D, especially at lower iterations. So let us take a closer look.

(If you want to start with visualizations and animations right away, take a look at my GitHub repository.)

1. History and Theory of Space-Filling Curves

The study of SFCs dates back to the 19th century, when Georg Cantor made a groundbreaking discovery. He showed that "two finite-dimensional smooth manifolds have the same cardinality, regardless of their dimensions." [1]

To illustrate this, consider the unit interval [0, 1] ⊂ R and the unit square [0, 1]² ⊂ R². Intuitively, one might expect the square to have a larger cardinality than the line segment. However, Cantor demonstrated that both sets in fact have the same cardinality, using his technique of interleaving decimals.

This result implies the existence of a bijection between the interval and the square, meaning there is a one-to-one correspondence between their elements. Following Cantor's discovery, a natural question arose: is there also a continuous bijection between these sets? Eugen Netto answered this question in the negative.

In this context, continuity can be interpreted geometrically: a continuous mapping would allow one to "draw" the image in 2D or 3D without lifting the pen, forming a curve. This insight laid the groundwork for the later development of SFCs: curves that, while continuous, can come arbitrarily close to filling a higher-dimensional space.

2. Peano Curve: The Discovery of Space-Filling Curves

After Netto's sobering result, the question arose whether such a mapping, if not bijective, could at least be surjective. The first person able to define such a mapping was G. Peano, constructing the so-called Peano curve.

The Peano curve is defined recursively. Its domain is the unit interval [0, 1] ⊂ R, and its image lies in the unit square [0, 1]² ⊂ R². By repeatedly subdividing the interval [0, 1] into thirds, and correspondingly partitioning the square in R² into a 3 × 3 grid, the construction converges to the actual space-filling curve as the number of iterations tends to infinity. [1]

Figure 1: Peano curve of order 1, 2 and 3 (from left to right).
The image of the first-order Peano curve is copied and mirrored in higher orders. It can be observed that the basic pattern of the first-order curve reappears in higher orders, but is mirrored in every second iteration. This alternating process of mirroring and rotating the basic element is a feature shared by other SFCs as well.
(Image from Wikipedia under public domain license, modified by author)

Thus, the graphs of the Peano curve at finite iterations (Figure 1) do not represent the "final" SFC. Only in the limit, as the number of iterations of this recursive mapping approaches infinity, does the actual SFC emerge, which traverses every point in [0, 1]². Visually, in this limit, the curve would essentially appear as a filled square spanning from (0, 0) to (1, 1).

This observation raises an initially counterintuitive question: by definition, a curve is one-dimensional. While it can be embedded in a higher-dimensional space (n > 1), its intrinsic parameter space remains one-dimensional. Yet, if the Peano curve passes through every point in [0, 1]² and thus completely fills the plane, can its image still be regarded as one-dimensional? The answer is no: the image of the Peano curve has Hausdorff dimension 2. Another characteristic of an SFC is that its image has positive Jordan content (Peano-Jordan measure). These facts may seem surprising, but they align with the properties of fractals: many such sets have Hausdorff dimensions greater than 1, and some even have non-integer Hausdorff dimensions.

3. The Hilbert Curve – Still Popular Today!

Although Peano was the first to construct an SFC, a much better-known example is the Hilbert curve, defined by David Hilbert in 1891. Its definition is slightly simpler and starts with a 2 × 2 grid. Like the Peano curve, the mapping of the Hilbert curve recursively subdivides each interval in [0, 1] and each square in [0, 1]² into four smaller intervals/squares at every step. As with the Peano curve, the Hilbert curve converges to a true SFC in the limit as the number of iterations approaches infinity.

Figure 2: The basic unit on the left (order 1) is repeated to build higher-order Hilbert curves. However, the required transformations (such as mirroring and rotation) are more complex than in the case of the Peano curve.
(Image by author)

For the purposes of this article, we will focus on the Hilbert curve, as its properties make it a valuable tool in modern data platforms.

3.1 Formal Definition of the Hilbert Curve

Starting with the interval [0, 1] as the domain of the Hilbert curve, each recursion step divides the current interval into four equal subintervals. If a is the left endpoint and h the width of the current interval, the subintervals are [a, a + h/4], [a + h/4, a + h/2], [a + h/2, a + 3h/4], and [a + 3h/4, a + h]:

Splitting intervals in [0, 1]. (Formula from [2], image by author)

For any chosen point in [0, 1], exactly one of these subintervals contains the point. This interval can then be subdivided again using the same rule, producing a finer interval that still contains the point. This process can be continued indefinitely, yielding an arbitrarily precise location of the point along the curve. The same recursive subdivision is applied in [0, 1]² in parallel, splitting each square into four smaller squares:

Splitting quadrants in [0, 1]². (Formula from [2], image by author)
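To make the recursive subdivision concrete, here is a minimal Python sketch (my own illustration, not part of the formal definition) that repeatedly applies the interval-splitting rule above to locate the subinterval containing a given point t. The matching subdivision of the square proceeds analogously, but additionally needs the orientation rules discussed in Section 4.

```python
def locate_subinterval(t, depth):
    """Repeatedly split the current interval into four equal parts and keep
    the one containing t; returns its left endpoint a and width h."""
    a, h = 0.0, 1.0
    for _ in range(depth):
        h /= 4                          # four subintervals of width h/4 each
        k = min(int((t - a) // h), 3)   # which of the four parts contains t (0..3)
        a += k * h                      # move to that part's left endpoint
    return a, h

# Example: after two subdivisions, t = 0.3 lies in the interval [0.25, 0.3125]
print(locate_subinterval(0.3, 2))       # (0.25, 0.0625)
```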

General properties:

  • Surjective: From its recursive definition it follows that the Hilbert curve is surjective: every point in [0, 1]² is covered in the limit. The nested intervals are compact, and adjacent intervals share boundary points (e.g., a + h/4 is both the right endpoint of the first subinterval and the left endpoint of the second). Thus the entire square is filled. The mapping, however, is not injective; attempts to enforce bijectivity (e.g., by opening the intervals) break continuity.
  • Continuous: This property is evident from visual representations: the curve can be drawn without lifting the pen. Formally, it can be established by showing that the Hilbert curve arises as the uniform limit of continuous functions, and uniform convergence preserves continuity.
  • Nowhere differentiable: Looking at graphs of the Hilbert curve, it seems apparent that the curve is not differentiable. A proof of this property was given by H. Sagan using the difference quotient.
  • Locality preserving: In contrast to simpler mappings such as the Z-order curve, the Hilbert curve tends to preserve locality: points that are close in the one-dimensional parameter are usually mapped to nearby points in space. This aspect is crucial for applications in big data platforms.
  • Positive Jordan content: In the limit of infinitely many iterations, the image of the Hilbert curve has positive Jordan measure, meaning that it occupies a nonzero area of the plane (Peano-Jordan measure).
  • Hausdorff dimension of 2: Correspondingly, the Hilbert curve does not behave like a typical one-dimensional line, but has Hausdorff dimension 2, reflecting that it completely fills the unit square.

Even though the early definitions of the Hilbert curve were given in 2D, higher dimensions are also feasible. The algorithm we discuss in the next section works in any finite dimension.

4 Computing the Hilbert Curve With Skilling’s Algorithm

The definition of the Hilbert curve was given in a geometric manner, without an algebraic rule for computing coordinates on a given grid for a given point in I. It took almost 100 years after Hilbert introduced his idea before mathematicians considered ways to compute points for a given Hilbert index. Who could blame them? After all, for a long time there were no computers that could draw curves with hundreds or thousands of points. While researching, I discovered several ways to compute the Hilbert curve, from complex numbers to L-systems. While some are very extensive, others preserve the iterative approach for computing single points of the curve. What I was looking for was something simple:

  • A function that takes a Hilbert index (i.e., any number like 1, 2, 3 in 1D space) and returns its coordinates. You can think of the Hilbert index as the number of the interval, counted from left to right, for a Hilbert curve of order < infinity.
  • A function that does the inverse, mapping a coordinate back to its Hilbert index.

While searching the web for possible implementations, I came across a GitHub repository from Princeton University implementing the algorithm of John Skilling, published in a 2004 paper called Programming the Hilbert Curve. Unfortunately, this paper is not freely accessible to the public, so I decided to analyze the code from the Princeton repository.

4.1 Skilling’s Algorithm – Overview

Skilling observed that mapping Hilbert indices to coordinates can be expressed elegantly by means of binary operations. For example, consider the indices 0, 1, 2, 3 in one dimension. These correspond to the coordinates (0, 0), (1, 0), (1, 1), (0, 1) in a 2 × 2 grid. Here, the values 0, 1, 2, 3 no longer represent fractional points in the unit interval (like 1/3), but instead discrete interval numbers. With a 2 × 2 grid, there are exactly four intervals in [0, 1] and four corresponding squares in [0, 1]². Skilling's algorithm generalizes this idea. It computes the mapping from a Hilbert index to its corresponding coordinate (and vice versa) in any finite dimension using binary transformations. The essential steps are:

  1. Convert the Hilbert index from decimal to binary.
  2. Transform the binary number into its Gray code representation.
  3. Disentangle the Gray code into a coordinate structure.
  4. Apply rotations and reflections using XOR operations.
  5. Convert the binary coordinates back to decimal.

4.2 Binary Representation

To understand why binary numbers are much better suited for computing points of the Hilbert curve from Hilbert indices (and vice versa), the following examples might help (we discuss everything in 2D, but the algorithm works in any dimension):
The Hilbert curve is defined on a 2×2, 4×4, 8×8, 16×16, etc. grid (remember the definition above and its recursive approach).
Looking at these numbers, one can see that the number of grid cells per axis grows as 2^n, where n is the order of the curve. This matches perfectly with binary encoding: for an n-th order curve, we need exactly n bits per axis to describe the grid.
Take the 4 × 4 grid (second order) as an example. Two bits per axis are sufficient:

  1. The first two bits identify the major quadrant (lower left, upper left, lower right, or upper right).
  2. The second two bits specify the position within that quadrant.

For instance, Hilbert index 2 has the binary form 0010. Interpreting this:

  • 00 selects the lower-left quadrant.
  • 10 selects the upper-right subquadrant within it.

Figure 3: Mapping binaries to grid cells. The first two bits encode the major quadrant, the last two bits the subquadrant. Note the repeating pattern of 00, 01, 10, 11 in every quadrant, forming a Hilbert curve of order 1. (Image by author)

However, if we continue this process for indices greater than 3, we encounter a problem: the orientation of the curve changes from one quadrant to the next. Correctly handling these rotations and reflections is exactly where Gray code and XOR operations (as in Skilling's algorithm) become essential.
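The naive bit-pair reading from the example above can be sketched in a few lines of Python (the helper name is my own, for illustration only). It works for the first few indices but ignores exactly the orientation changes just described:

```python
def split_bit_pairs(index, order=2):
    """Write a Hilbert index as `order` two-bit groups (2D case): the first
    group names the major quadrant, the following groups the sub-quadrants.

    Naive reading for illustration only: it ignores the per-quadrant rotations
    and reflections, which is why Gray code and XOR corrections follow."""
    bits = format(index, f"0{2 * order}b")             # e.g. index 2 -> "0010"
    return [bits[i:i + 2] for i in range(0, len(bits), 2)]

print(split_bit_pairs(2))   # ['00', '10']: lower-left quadrant, then its upper-right cell
```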

4.3 Gray Code Representation

The next step in Skilling's algorithm is a transformation from binary to Gray code. The key difference is that in Gray code, consecutive numbers differ in exactly one bit. This property is crucial: it ensures that the curve moves smoothly from one quadrant to the next (although the orientation of the curve within each quadrant is still not correct).

Looking at the binary numbers and the orientation of the different sections of the curve, we can see that the curve is still not correct, but the end of each quadrant now connects to the beginning of the next.

Figure 4: After transforming binary values to Gray code, the last cell of a given quadrant has the same value as the first cell of the next. (Image by author)
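The conversion itself is a one-liner. The sketch below (with to_gray as my own helper name) shows the standard binary-reflected Gray code and checks the single-bit-change property:

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code: consecutive values differ in exactly one bit."""
    return n ^ (n >> 1)

for i in range(8):
    print(i, format(to_gray(i), "03b"))   # 000, 001, 011, 010, 110, 111, 101, 100

# Neighbouring Gray codes differ in exactly one bit position.
assert all(bin(to_gray(i) ^ to_gray(i + 1)).count("1") == 1 for i in range(7))
```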

4.4 Disentanglement of the Bits

The real "magic" of Skilling's method begins with a reordering of the Gray-coded bits, a step known as disentanglement. In our 4 × 4 example, we initially interpreted the four bits as (bitx1, bity1, bitx2, bity2), where the first pair encodes the major quadrant and the second pair the sub-quadrant. However, for coordinate computation we need a structure of the form (bitx1, bitx2, bity1, bity2), so that all x-bits and y-bits can later be combined into the respective decimal coordinates (x, y). This step is called the disentanglement of the bits.

Figure 5: Orientation of subquadrants in a 4×4 grid after Gray code disentanglement (Image by author)
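As a small NumPy sketch (assuming the interleaved order (bitx1, bity1, bitx2, bity2) described above; the helper is my own, not the repository's API), the regrouping is just a reshape and a transpose:

```python
import numpy as np

def disentangle(interleaved_bits, ndims=2):
    """Regroup interleaved Gray-coded bits (x1, y1, x2, y2, ...) into one row
    of bits per axis, i.e. shape [ndims, nbits]: [[x1, x2, ...], [y1, y2, ...]]."""
    bits = np.asarray(interleaved_bits)
    return bits.reshape(-1, ndims).T     # column i collects the bits of axis i

# Gray code of index 4 is 0110, interleaved as (x1, y1, x2, y2) = (0, 1, 1, 0):
print(disentangle([0, 1, 1, 0]))         # [[0 1]  -> x-bits
                                         #  [1 0]] -> y-bits
```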

4.5 Corrective Transformations

After disentangling the bits, the final step of Skilling's algorithm is to rotate and mirror the subcurves within each quadrant so that they connect seamlessly into the Hilbert curve of order n.

Figure 6 illustrates this process for the 4 × 4 case. The table on the left shows how Gray-coded coordinates are converted into standard binary numbers by applying simple transformations: swaps and bit flips.

The diagram on the right visualizes the effect: the upper quadrants are rotated by 180°, the lower quadrants are mirrored along the diagonal, and in some cases (e.g. the yellow quadrant) no transformation is required at all.

The key insight is that after these corrective transformations, the coordinates are once again in standard binary form. This means that the output of Skilling's algorithm can be converted directly to decimal coordinates in the format (x, y), without further adjustment.

Figure 6: Final transformations to convert Gray code to binary coordinates (Image by author)

Skilling's algorithm, key transformations. Input: Gray-code-formatted bits (bitx1, bitx2, bity1, bity2); in Python the array shape would be [-1, ndims, nbits]. Example: the number 4 would be represented as the following list/NumPy array: [[0, 1], [1, 0]]. For the x-dimension, position 1 holds the least significant bit (LSB) and position 0 the most significant bit (MSB).

  1. Outer loop: from the most significant bit (MSB) to the least significant bit (LSB).
  2. Inner loop: from the highest dimension (y in 2D) to the lowest dimension.
  3. Look at the current bit. If it is 1: flip every lower bit in dimension 0 (usually x). If it is 0: swap the values of the lower bits between the current dimension and dimension 0 (where they differ).

Step 3 can be computed easily with NumPy using XOR operations. The whole process of flipping and swapping bits in each iteration is visualized in the following animations.

Figure 7: Creation process of a 2D Hilbert curve using the algorithm of John Skilling (Image by author)
Figure 8: Creation process of a 3D Hilbert curve using the algorithm of John Skilling (Image by author)
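For readers who prefer code to animations, here is a compact, non-vectorized sketch of the index-to-coordinate direction, written from the steps above in the style of Skilling's published routine. The function name and loop conventions are my own, not those of the Princeton repository, so treat it as an illustration rather than a drop-in replacement:

```python
def hilbert_index_to_coords(h, ndims=2, nbits=4):
    """Map a Hilbert index h to grid coordinates on a (2**nbits)**ndims grid."""
    gray = h ^ (h >> 1)                       # steps 1-2: binary index -> Gray code

    # Step 3: disentangle the Gray-coded bits into one integer per axis.
    x = [0] * ndims
    for level in range(nbits):                # level 0 = most significant bit group
        for dim in range(ndims):
            src = (nbits - 1 - level) * ndims + (ndims - 1 - dim)
            if (gray >> src) & 1:
                x[dim] |= 1 << (nbits - 1 - level)

    # Step 4: corrective rotations/reflections via XOR bit flips and swaps.
    q = 2
    while q != (1 << nbits):
        p = q - 1
        for dim in range(ndims - 1, -1, -1):  # highest dimension first
            if x[dim] & q:
                x[0] ^= p                     # flip the lower bits of axis 0
            else:                             # swap lower bits of axis dim and axis 0
                t = (x[0] ^ x[dim]) & p
                x[0] ^= t
                x[dim] ^= t
        q <<= 1
    return x                                  # step 5: ordinary integer coordinates

# On the 4x4 grid, the first four indices reproduce the order-1 pattern:
print([hilbert_index_to_coords(h, ndims=2, nbits=2) for h in range(4)])
# [[0, 0], [1, 0], [1, 1], [0, 1]]
```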

If you want to analyze the algorithm in more detail or simply generate your own animations in 2D or 3D, take a look at my GitHub repository.

5 Applications of Space-Filling Curves

After discussing theoretical aspects and implementation details of the Hilbert curve, the question arises where it can be applied. During the implementation we saw how to transform Hilbert indices into coordinates. For the following application, the inverse of this process is the more interesting direction.

One valuable aspect of the Hilbert curve is that it maps a 1D ordered set (i.e., 1, 2, 3, ...) to coordinates in an n-dimensional space. It gives an order to the points it traverses, and it can live in vector spaces of arbitrary dimension. Thus, the Hilbert curve is used for data partitioning and clustering, image compression, and also for building features in machine learning when dealing with spatial data.

5.1 Data Partitioning/Clustering Using SFCs

One of the most prominent applications of SFCs is data partitioning. For example, in Databricks, Z-ordering is based on the Z-curve, while Liquid Clustering relies on the Hilbert curve. The reason is simple: the Hilbert curve preserves locality better than the Z-curve, which is crucial when indexing and partitioning multidimensional data. Figure 9 shows how some exemplary data points are mapped to points of the Hilbert curve, by assigning each point to one partition given by the curve.

Figure 9: Mapping of data to points of the Hilbert curve. The red dashed arrows indicate some of the mappings as examples. (Image by author)

When a query is applied to the data (e.g. SELECT * FROM table WHERE x IN (1,2) AND y IN (2,3)), all points in this range ((1,2), (1,3), (2,2), (2,3)) are converted to Hilbert indices and the system can directly retrieve all matching entries. The key advantage is that this mapping allows fast and flexible data retrieval. Unlike traditional indexing, Hilbert-based partitioning adapts naturally to updates or growth in the dataset, without requiring the entire index to be recomputed.
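To illustrate this lookup path, here is a small sketch that reuses hilbert_index_to_coords from Section 4 together with a brute-force inverse that is purely for demonstration (a direct encoding, as provided in my repository, would be used in practice). On this small 4×4 grid the example query happens to map to one run of consecutive indices:

```python
def coords_to_hilbert_index(point, ndims=2, nbits=2):
    """Inverse mapping (coordinates -> Hilbert index) by brute force, purely
    for illustration; real implementations invert the bit transformations."""
    for h in range(1 << (ndims * nbits)):
        if hilbert_index_to_coords(h, ndims, nbits) == list(point):
            return h
    raise ValueError("point outside the grid")

# The example query WHERE x IN (1, 2) AND y IN (2, 3) on a 4x4 grid:
query_points = [(x, y) for x in (1, 2) for y in (2, 3)]
indices = sorted(coords_to_hilbert_index(p) for p in query_points)
print(indices)   # [6, 7, 8, 9] -> one consecutive run, i.e. a single sequential read
```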

5.2 Data Indexing: Hilbert Curve vs. Z-Curve

To highlight the practical advantages of the Hilbert curve, I compared its performance with the Z-curve on a set of synthetic range queries.

For the experiment, I generated 100 random range queries of fixed size. For each query, I computed the Hilbert and Z-curve indices and counted the number of clusters, where a cluster is a set of consecutive indices. For example, if a query returned the indices [1,2,3,5,6,8,9], this would form three clusters: [1,2,3], [5,6], and [8,9].
If the data is stored in index order, clusters correspond to sequential reads, while gaps between clusters imply costly jumps to new storage addresses.
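A small sketch of how such clusters can be computed from the sorted indices of a query (my own helper, reproducing the example above):

```python
def clusters(indices):
    """Split sorted Hilbert (or Z-curve) indices into runs of consecutive values."""
    runs = []
    for i in sorted(indices):
        if runs and i == runs[-1][-1] + 1:
            runs[-1].append(i)       # still consecutive: extend the current run
        else:
            runs.append([i])         # gap: a new cluster starts here
    return runs

example = [1, 2, 3, 5, 6, 8, 9]
print(clusters(example))                      # [[1, 2, 3], [5, 6], [8, 9]]
print(len(clusters(example)))                 # cluster count: 3
print(len(example) / len(clusters(example)))  # average cluster size: ~2.33
```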

Figure 10: 100 random queries for a 2D setup using the Hilbert curve and the Z-curve. As you can see, you can't see anything! 😉 (Image by author)

To quantify performance, I used two metrics:

  1. Cluster count: fewer clusters imply less fragmentation and fewer storage jumps.
  2. Intra-cluster spread: the average number of indices per cluster.

The worst-case scenario would be extreme fragmentation: every point forming a cluster of its own. Figure 11 compares the performance of the Z-curve and the Hilbert curve for two, three and four dimensions, a query size of 7 (7×7 in 2D, 7×7×7 in 3D, etc.) and 6 bits per axis (i.e. 64 values per axis).

Figure 11: Comparison of the Hilbert and Z curves based on the number of clusters and intra-cluster spread for 2, 3 and 4 dimensions. The results clearly show that the Hilbert curve preserves locality much better than the Z-curve. (Image by author)

The results clearly show that the Hilbert curve preserves locality much better than the Z-curve. Across all tested dimensions, queries result in fewer clusters, and thus higher intra-cluster density, with Hilbert indices. In practice, this translates into more efficient data retrieval and reduced I/O costs, particularly for multidimensional range queries.

6 Beyond Space-Filling Curves

The aim of this article was to illustrate the elegance of SFCs and to give a glimpse into their applications in data indexing. However, the latest research in this field goes beyond classical SFCs.

The main limitation of all space-filling curves is their fixed mechanism. Once defined, their structure offers little room for adaptation to different datasets or workload patterns. In practice, this rigidity can limit performance.

To overcome this, researchers such as Chen et al. (University of Electronic Science and Technology of China & Huawei) have proposed AdaCurve, a machine learning–based approach. Instead of relying on a predetermined mapping, AdaCurve trains a model to generate a one-dimensional index directly from high-dimensional data points, optimized according to both the dataset and the query workload. [3]

This idea is highly promising: while Hilbert and other SFCs offer elegant but rigid mappings, AdaCurve adapts dynamically, producing an indexing scheme that is tailored to the data and queries at hand. Such adaptability could pave the way for significantly more efficient indexing in large-scale data platforms in the future.

References

[1] H. Sagan, Space-Filling Curves. Springer-Verlag, 1994.

[2] M. Bader, Space-Filling Curves: An Introduction with Applications in Scientific Computing. Springer-Verlag, 2013.

[3] X. Chen, "Optimizing Block Skipping for High-Dimensional Data with Learned Adaptive Curve," SIGMOD, vol. 3, 2025. [Online]. Available: https://zheng-kai.com/paper/2025_sigmod_chen.pdf
