Why Is My Code So Slow? A Guide to Py-Spy Python Profiling

The most frustrating issues to debug in data science code aren’t syntax errors or logical errors. Rather, they come from code that does exactly what it’s supposed to do, but takes its sweet time doing it.

Functional but inefficient code can be a massive bottleneck in a data science workflow. In this article, I’ll provide a brief introduction to and walk-through of py-spy, a powerful tool designed to profile your Python code. It can pinpoint exactly where your program is spending the most time so that inefficiencies can be identified and corrected.

Example Problem

Let’s set up a simple research question to write some code for:

“For all flights going between US states and territories, which departing airport has the longest flights on average?”

Below is a simple Python script to answer this research question, using data retrieved from the Bureau of Transportation Statistics (BTS). The dataset consists of data from every flight within US states and territories between January and June of 2025, with information on the origin and destination airports. It’s roughly 3.5 million rows.

It calculates the Haversine distance (the shortest distance between two points on a sphere) for each flight. Then, it groups the results by departing airport to find the average distance and reports the top 5.
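For reference, the Haversine distance can be written out as a formula. The symbols here are illustrative notation rather than the article's: φ is latitude and λ is longitude, both in radians, and R is the radius of the sphere (the script uses R = 6371 km for the Earth).

```latex
d = 2R \arcsin\left( \sqrt{ \sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right) } \right)
```

Each term maps directly onto the return statement of the haversine() function in the script.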

import pandas as pd  
import math  
import time  
  
  
def haversine(lat_1, lon_1, lat_2, lon_2):
    """Calculate the Haversine distance between two latitude and longitude points"""
    lat_1_rad = math.radians(lat_1)  
    lon_1_rad = math.radians(lon_1)  
    lat_2_rad = math.radians(lat_2)  
    lon_2_rad = math.radians(lon_2)  
  
    delta_lat = lat_2_rad - lat_1_rad  
    delta_lon = lon_2_rad - lon_1_rad  
  
    R = 6371  # Radius of the earth in km  
  
    return 2*R*math.asin(math.sqrt(math.sin(delta_lat/2)**2 + math.cos(lat_1_rad)*math.cos(lat_2_rad)*(math.sin(delta_lon/2))**2))  
  
  
if __name__ == '__main__':  
    # Load the flight data into a dataframe
    flight_data_file = r"./data/2025_flight_data.csv"
    flights_df = pd.read_csv(flight_data_file)

    # Start timer to see how long the analysis takes
    start = time.time()

    # Calculate the haversine distance between each flight's start and end airport
    haversine_dists = []
    for i, row in flights_df.iterrows():
        haversine_dists.append(haversine(lat_1=row["LATITUDE_ORIGIN"],
                                         lon_1=row["LONGITUDE_ORIGIN"],
                                         lat_2=row["LATITUDE_DEST"],
                                         lon_2=row["LONGITUDE_DEST"]))

    flights_df["Distance"] = haversine_dists

    # Get result by grouping by origin airport, taking the average flight distance and printing the top 5
    result = (
        flights_df
        .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))
        .sort_values('avg_dist', ascending=False)
    )

    print(result.head(5))

    # End timer and print analysis time
    end = time.time()
    print(f"Took {end - start} s")

Running this code gives the following output:

                                        avg_dist
DISPLAY_AIRPORT_NAME_ORIGIN                     
Pago Pago International              4202.493567
Guam International                   3142.363005
Luis Munoz Marin International       2386.141780
Ted Stevens Anchorage International  2246.530036
Daniel K Inouye International        2211.857407
Took 169.8935534954071 s

These results make sense, since the airports listed are in American Samoa, Guam, Puerto Rico, Alaska, and Hawaii, respectively. These are all locations outside of the contiguous United States, where one would expect long average flight distances.

The problem here isn’t the results, which are valid, but the execution time: almost three minutes! While three minutes might be tolerable for a one-off run, it becomes a productivity killer during development. Imagine this as part of a longer data pipeline. Every time a parameter is tweaked, a bug is fixed, or a cell is re-run, you are forced to sit idle while the program runs. That friction breaks your flow and turns a quick analysis into an all-afternoon affair.

Now let’s see how py-spy can help us diagnose exactly which lines are taking so long.

What Is Py-Spy?

To understand what py-spy is doing and the benefits of using it, it helps to compare py-spy to the built-in Python profiler cProfile.

cProfile: This is a Tracing Profiler, working like a stopwatch on each function call. The time between each function call and return is measured and reported. While highly accurate, this adds significant overhead, since the profiler has to constantly pause and record data, which can slow down the script considerably.
py-spy: This is a Sampling Profiler, working like a high-speed camera looking at the whole program at once. py-spy sits completely outside the running Python script and takes high-frequency snapshots of the program’s state. It looks at the entire “Call Stack” to see exactly what line of code is being run and what function called it, all the way up to the top level.
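To make the tracing approach concrete, here is a minimal sketch of cProfile measuring a toy workload. The slow_sum() and profile_with_cprofile() functions are hypothetical stand-ins invented for illustration, not part of the flight analysis.

```python
import cProfile
import io
import math
import pstats


def slow_sum(n):
    """Toy workload: sum of square roots computed in a plain Python loop."""
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


def profile_with_cprofile(n=200_000):
    """Run slow_sum under cProfile and return the text report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()    # start tracing every function call and return
    slow_sum(n)
    profiler.disable()   # stop tracing

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_with_cprofile())
```

Every call to math.sqrt is recorded individually, which is where the tracing overhead comes from. A sampling profiler only looks at the stack at fixed intervals, no matter how many calls happen in between.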

Running Py-spy

In order to run py-spy on a Python script, the py-spy package must be installed in the Python environment.

pip install py-spy

Once py-spy is installed, our script can be profiled by running the following command in the terminal:

py-spy record -o profile.svg -r 100 -- python main.py

Here is what each part of this command is actually doing:

py-spy: Calls the tool.
record: This tells py-spy to use its “record” mode, which continuously monitors the program while it runs and saves the data.
-o profile.svg: This specifies the output filename and format, telling it to output the results as an SVG file called profile.svg.
-r 100: This specifies the sampling rate, setting it to 100 times per second. This means that py-spy will check what the program is doing 100 times per second.
--: This separates the py-spy command from the Python script command. It tells py-spy that everything following this flag is the command to run, not arguments for py-spy itself.
python main.py: This is the command to run the Python script to be profiled with py-spy, in this case running main.py.

Note: If running on Linux, sudo privileges are often required to run py-spy, for security reasons.

After this command finishes running, an output file profile.svg will appear, which will allow us to dig deeper into which parts of the code are taking the longest.
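If the profiling run needs to be launched from another Python script (for example, as a step in a pipeline), the same command can be assembled as an argument list. This is only a sketch: build_py_spy_command() is a hypothetical helper, and it uses only the flags described above.

```python
import shlex


def build_py_spy_command(script, output="profile.svg", rate=100):
    """Return the argument list for profiling a script with `py-spy record`."""
    return [
        "py-spy", "record",
        "-o", output,       # output file for the SVG graph
        "-r", str(rate),    # sampling rate in samples per second
        "--",               # everything after this is the command to profile
        "python", script,
    ]


if __name__ == "__main__":
    cmd = build_py_spy_command("main.py")
    print(shlex.join(cmd))
    # To actually launch the profiler (requires py-spy to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting issues when script paths contain spaces.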

Py-spy Output

Opening up the output profile.svg reveals the visualization that py-spy has created for how much time our program spent in different parts of the code. This is known as an Icicle Graph (or sometimes a Flame Graph if the y-axis is inverted) and is interpreted as follows:

Bars: Each colored bar represents a particular function that was called during the execution of the program.
X-axis (Population): The horizontal axis represents the collection of all samples taken during the profiling. They are grouped so that the width of a particular bar represents the proportion of the total samples in which the program was in the function represented by that bar. Note: This is not a timeline; the ordering does not represent when the function was called, only the total amount of time spent.
Y-axis (Stack Depth): The vertical axis represents the call stack. The top bar labeled “all” represents the entire program, and the bars below it represent functions called from “all”. This continues down recursively, with each bar broken down into the functions that were called during its execution. The very bottom bar shows the function that was actually running on the CPU when the sample was taken.
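The relationship between samples and bar widths can be sketched in a few lines of Python. This is a simplified model, not py-spy's actual implementation: the sampled stacks are made up, bar_widths() is a hypothetical helper, and functions are grouped by name rather than by their full call path as a real flame graph would.

```python
from collections import Counter


def bar_widths(samples):
    """Return, for each function, the fraction of samples whose stack contains it."""
    if not samples:
        return {}
    counts = Counter()
    for stack in samples:
        # Count each function once per sample, even if it appears recursively.
        for func in set(stack):
            counts[func] += 1
    total = len(samples)
    return {func: n / total for func, n in counts.items()}


# Hypothetical samples: each tuple is one call stack, outermost call first.
samples = [
    ("<module>", "iterrows"),
    ("<module>", "iterrows"),
    ("<module>", "haversine"),
    ("<module>", "groupby"),
]

widths = bar_widths(samples)
print(widths["<module>"])  # 1.0: present in every sample, so it spans the full width
print(widths["iterrows"])  # 0.5: present in two of the four samples
```

The order of the samples never enters the calculation, which is why the x-axis cannot be read as a timeline.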

Interacting with the Graph

While the image above is static, the actual .svg file generated by py-spy is fully interactive. When you open it in a web browser, you can:

Search (Ctrl+F): Highlight specific functions to see where they appear in the stack.
Zoom: Click on any bar to zoom in on that specific function and its children, allowing you to isolate complex parts of the call stack.
Hover: Hovering over any bar displays the specific function name, file path, line number, and the exact percentage of time it consumed.

The most important rule for reading the icicle graph is simple: the wider the bar, the more frequent the function. If a function bar spans 50% of the graph’s width, it means that the program was executing that function in 50% of the samples, or roughly 50% of the total runtime.

Diagnosis

From the icicle graph above, we can see that the bar representing the Pandas iterrows() function is noticeably wide. Hovering over that bar when viewing the profile.svg file reveals that the true proportion for this function was 68.36%. So over 2/3 of the runtime was spent in the iterrows() function. Intuitively this bottleneck makes sense, as iterrows() creates a Pandas Series object for every single row in the loop, causing massive overhead. This reveals a clear target for optimizing the runtime of the script.
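The per-row Series creation is easy to confirm on a tiny DataFrame. In this sketch the column names and values are made up, and itertuples() appears only as a point of contrast, not as the fix used in this article.

```python
import pandas as pd

# Tiny made-up frame standing in for the flight data
df = pd.DataFrame({"lat": [10.0, 20.0, 30.0], "lon": [1.0, 2.0, 3.0]})

# iterrows() yields (index, Series) pairs: a new Series is built for every row
row_types = {type(row) for _, row in df.iterrows()}
print(row_types)

# itertuples() yields lightweight namedtuples instead of Series objects
tuple_rows = list(df.itertuples(index=False))
print(tuple_rows[0].lat, tuple_rows[0].lon)
```

Building a full Series for each row is cheap for three rows, but repeating it 3.5 million times is where the cost accumulates.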

Optimizing The Script

The clearest path to optimizing this script, based on what was learned from py-spy, is to stop using iterrows() to loop over every row to calculate the haversine distance. Instead, it should be replaced with a vectorized calculation using NumPy that does the calculation for every row with just one function call. So the changes to be made are:

Rewrite the haversine() function to use vectorized and efficient C-level NumPy operations that allow whole arrays to be passed in rather than one set of coordinates at a time.
Replace the iterrows() loop with a single call to this newly vectorized haversine() function.

import pandas as pd  
import numpy as np  
import time  
  
  
def haversine(lat_1, lon_1, lat_2, lon_2):
    """Calculate the Haversine distance between two latitude and longitude points"""
    lat_1_rad = np.radians(lat_1)  
    lon_1_rad = np.radians(lon_1)  
    lat_2_rad = np.radians(lat_2)  
    lon_2_rad = np.radians(lon_2)  
  
    delta_lat = lat_2_rad - lat_1_rad  
    delta_lon = lon_2_rad - lon_1_rad  
  
    R = 6371  # Radius of the earth in km  
  
    return 2*R*np.arcsin(np.sqrt(np.sin(delta_lat/2)**2 + np.cos(lat_1_rad)*np.cos(lat_2_rad)*(np.sin(delta_lon/2))**2))
  
  
if __name__ == '__main__':  
    # Load the flight data into a dataframe
    flight_data_file = r"./data/2025_flight_data.csv"
    flights_df = pd.read_csv(flight_data_file)

    # Start timer to see how long the analysis takes
    start = time.time()

    # Calculate the haversine distance between each flight's start and end airport
    flights_df["Distance"] = haversine(lat_1=flights_df["LATITUDE_ORIGIN"],
                                       lon_1=flights_df["LONGITUDE_ORIGIN"],
                                       lat_2=flights_df["LATITUDE_DEST"],
                                       lon_2=flights_df["LONGITUDE_DEST"])

    # Get result by grouping by origin airport, taking the average flight distance and printing the top 5
    result = (
        flights_df
        .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))
        .sort_values('avg_dist', ascending=False)
    )

    print(result.head(5))

    # End timer and print analysis time
    end = time.time()
    print(f"Took {end - start} s")

Running this code gives the following output:

                                        avg_dist
DISPLAY_AIRPORT_NAME_ORIGIN                     
Pago Pago International              4202.493567
Guam International                   3142.363005
Luis Munoz Marin International       2386.141780
Ted Stevens Anchorage International  2246.530036
Daniel K Inouye International        2211.857407
Took 0.5649983882904053 s

These results are identical to the results from before the code was optimized, but instead of taking nearly three minutes to process, it took just over half a second!
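One way to gain confidence that a vectorized rewrite preserves the results is to compare it against the scalar version on a handful of points before running it on the full dataset. A minimal sketch: the coordinates are made up (not real airports), and haversine_scalar() and haversine_vectorized() are hypothetical names for the two versions of the function shown in this article.

```python
import math

import numpy as np

R = 6371  # Radius of the earth in km


def haversine_scalar(lat_1, lon_1, lat_2, lon_2):
    """Haversine distance for one pair of points, using the math module."""
    lat_1, lon_1, lat_2, lon_2 = map(math.radians, (lat_1, lon_1, lat_2, lon_2))
    a = (math.sin((lat_2 - lat_1) / 2) ** 2
         + math.cos(lat_1) * math.cos(lat_2) * math.sin((lon_2 - lon_1) / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))


def haversine_vectorized(lat_1, lon_1, lat_2, lon_2):
    """Haversine distance for whole arrays of points, using NumPy."""
    lat_1, lon_1, lat_2, lon_2 = map(np.radians, (lat_1, lon_1, lat_2, lon_2))
    a = (np.sin((lat_2 - lat_1) / 2) ** 2
         + np.cos(lat_1) * np.cos(lat_2) * np.sin((lon_2 - lon_1) / 2) ** 2)
    return 2 * R * np.arcsin(np.sqrt(a))


# Made-up coordinates in degrees
lat_1 = np.array([0.0, 40.0, -14.3, 61.2])
lon_1 = np.array([0.0, -75.0, -170.7, -150.0])
lat_2 = np.array([0.0, 34.0, 21.3, 47.4])
lon_2 = np.array([90.0, -118.0, -157.9, -122.3])

vectorized = haversine_vectorized(lat_1, lon_1, lat_2, lon_2)
scalar = np.array([haversine_scalar(*point) for point in zip(lat_1, lon_1, lat_2, lon_2)])

# The two implementations should agree up to floating-point rounding
print(np.allclose(vectorized, scalar))
```

Comparing with np.allclose rather than exact equality allows for tiny floating-point differences between the math and NumPy implementations.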

Looking Ahead

If you are reading this from the future (late 2026 or beyond), check whether you are running Python 3.15 or newer. Python 3.15 is expected to introduce a native sampling profiler in the standard library, offering similar functionality to py-spy without requiring external installation. For anyone on Python 3.14 or older, py-spy remains the gold standard.

This article explored a tool for tackling a common frustration in data science: a script that functions as intended, but is inefficiently written and takes a long time to run. An example script was presented to find which US departure airports have the longest average flight distance according to the Haversine distance. This script worked as expected, but took almost three minutes to run.

Using the py-spy Python profiler, we were able to learn that the cause of the inefficiency was the use of the iterrows() function. By replacing iterrows() with a more efficient vectorized calculation of the Haversine distance, the runtime was reduced from almost three minutes to just over half a second.

See my GitHub Repository for the code from this article, including the preprocessing of the raw data from BTS.

Thanks for reading!

Data Sources

Data from the Bureau of Transportation Statistics (BTS) is a work of the U.S. Federal Government and is in the public domain under 17 U.S.C. § 105. It’s free to use, share, and adapt without copyright restriction.
