, myself included, begin their coding journey using a Jupyter Notebook. These files have the extension .ipynb, which stands for Interactive Python Notebook. As the extension name suggests, it has an intuitive and interactive user interface. The notebook is broken down into ‘cells’, small blocks of separated code or markdown (text). Outputs are displayed beneath each cell once the code within that cell has been executed. This promotes a flexible and interactive environment for coders to build their coding skills and start working on data science projects.
A typical example of a Jupyter Notebook is below:

This all sounds great. And don’t get me wrong, for use cases such as conducting solo research or exploratory data analysis (EDA), Jupyter Notebooks are great. The issues arise when you ask the following questions:
- How do you turn a Jupyter Notebook into code that can be leveraged by a business?
- Can you collaborate with other developers on the same project using a version control system?
- How can you deploy code to a production environment?
Pretty soon, the limitations of exclusively using Jupyter Notebooks within a commercial context will start to cause problems. They are simply not designed for these purposes. The general solution is to organise code in a modular fashion.
By the end of this article, you should have a clear understanding of how to structure a small data science project as a Python program and appreciate the advantages of transitioning to a programming approach. You can check out an example template to supplement this article on my GitHub here.
Disclaimer
The contents of this article are based on my experience of migrating away from solely using Jupyter Notebooks to write code. Do notebooks still have a purpose? Yes. Are there alternative ways to organise and execute code beyond the methods I discuss in this article? Yes.
I wanted to share this information to help anyone wanting to make the move away from notebooks and towards writing scripts and programs. If I’ve missed any features of Jupyter Notebooks that mitigate the limitations I’ve mentioned, please drop a comment!
Let’s get back to it.
Programming: what’s the big deal?
For the purpose of this article, I’ll be focusing on the Python programming language as this is the language I use for data science projects. Structuring code as a Python program unlocks a range of functionality that is difficult to achieve when working exclusively within a Jupyter Notebook. These benefits include collaboration, versatility and portability; you’re simply able to do more with your code. I’ll explain these benefits further down, so stick with me a little longer!
Python programs are typically organised into modules and packages. A module is a Python script (a file with a .py extension) that contains Python code which can be imported into other files. A package is a directory that contains Python modules. I’ll discuss the purpose of the file __init__.py later in the article.

Anytime you import a Python library into your code, such as built-in libraries like os or third-party libraries like pandas, you are interacting with a Python program that’s been organised into a package and modules.
For example, let’s say you want to use the randint function from numpy. This function allows you to generate a random integer based on specified parameters. You might write:
from numpy.random import randint
Let’s annotate that import statement to show what you’re actually importing.

In this instance, numpy is a package; random is a module and randint is a function.
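The same package / module / function split can be checked from Python itself. The sketch below is only an illustration: it swaps in the standard-library import urllib.parse.urlparse for the numpy example so that it runs without installing anything, and the helper name describe is made up for this article. It relies on the fact that packages are modules that also carry a __path__ attribute.

```python
import inspect
import urllib
import urllib.parse
from urllib.parse import urlparse


def describe(obj):
    """Return 'package', 'module', 'function' or 'other' for an imported object."""
    if inspect.ismodule(obj):
        # Packages are modules that also expose a __path__ attribute.
        return "package" if hasattr(obj, "__path__") else "module"
    if inspect.isfunction(obj):
        return "function"
    return "other"


if __name__ == "__main__":
    print("urllib       ->", describe(urllib))
    print("urllib.parse ->", describe(urllib.parse))
    print("urlparse     ->", describe(urlparse))
```

Pointing describe at numpy, numpy.random and randint in your own environment is a quick way to explore how a library you already use is organised.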
So, it turns out you probably interact with Python programs on a regular basis. This poses the question, what does the journey towards becoming a Python programmer look like?
The great transition: where do you even start?
The trick to building a functional Python program is all in the file structure and organisation. It sounds boring but it plays a super important part in setting yourself up for success!
Let me use an analogy to explain: every house has a drawer that has just about everything in it; tools, elastic bands, medicine, your hopes and dreams, the lot. There’s no rhyme or reason, it’s a dumping ground of just about everything. Think of this as a Jupyter Notebook. This one file typically contains all stages of a project, from importing data, exploring what the data looks like, visualising trends, extracting features, training a model etc. For a project that’s destined to be deployed on a production system or co-developed with colleagues, it’s going to cause chaos. What’s needed is some organisation, to put all the tools in one compartment, the medicine in another and so on.
A great way to do that with code is to use a project template. One that I use frequently is the Cookie Cutter Data Science template. You can create a whole directory for your project with all the relevant files needed to do just about anything in a few simple operations in a terminal window. See the link above for information on how to install and run Cookie Cutter.
Below are some of the key features of the project template:
- package or src directory: directory for Python scripts/modules, equipped with examples to get you started
- readme.md: file to describe usage, setup and how to run the package
- docs directory: containing files that enable seamless autodocumentation
- Makefile: for writing OS-agnostic bespoke run commands
- pyproject.toml/requirements.txt: for dependency management

Top tip. Make sure to keep Cookie Cutter up to date. With every release, new features are added in line with the ever-evolving data science universe. I’ve learnt quite a few things from exploring a new file or feature in the template!
Alternatively, you can use other templates to build your project, such as that provided by Poetry. Poetry is a package manager which you can use to generate a project template that’s more lightweight than Cookie Cutter.
The best way to interact with your project is through an IDE (Integrated Development Environment). This software, such as Visual Studio Code (VS Code) or PyCharm, encompasses a variety of features and processes that enable you to code, test, debug and package your work efficiently. My personal preference is VS Code!
From cells to scripts: let’s get coding
Now that we have a development environment and a nicely structured project template, how exactly do you write code in a Python script if you’ve only ever coded in a Jupyter Notebook? To answer that question, let’s first consider a few industry-standard coding best practices.
- Modular: follow the software engineering philosophy of the ‘Single Responsibility Principle’. All code should be encapsulated in functions, with each function performing a single task. The Zen of Python states: ‘Simple is better than complex’.
- Readable: if code is readable, then there’s a good chance it will be maintainable. Ensure the code is full of docstrings and comments!
- Stylish: format code in a consistent and clear manner. The PEP 8 guidelines are designed for this purpose and advise how code should be presented. You can install autoformatters such as Black in an IDE so that code is automatically formatted in compliance with PEP 8 every time the Python script is saved. For example, the right level of indentation and spacing will be applied so you don’t even have to think about it!
- Versatile: if code is encapsulated into functions or classes, these can be reused throughout a project.
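To make those four points concrete, here is a minimal sketch of a function written with them in mind. The function name min_max_scale and its behaviour for constant input are illustrative choices for this article, not a standard; the point is the shape: one task, a docstring, a comment where the logic is not obvious, and PEP 8 style layout.

```python
def min_max_scale(values):
    """Scale a sequence of numbers to the range [0, 1].

    Args:
        values: A non-empty sequence of ints or floats.

    Returns:
        A list of floats where the smallest input maps to 0.0 and the
        largest to 1.0. If all inputs are equal, every output is 0.0.
    """
    lowest, highest = min(values), max(values)
    spread = highest - lowest
    if spread == 0:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(value - lowest) / spread for value in values]
```

Because the function does one thing and documents its inputs and outputs, it can be imported and reused anywhere in the project without the reader needing to open the file.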
For a deeper dive into coding best practice, this article is a fantastic overview of principles to adhere to as a Data Scientist, make sure to check it out!
With these best practices in mind, let’s return to the question: how do you write code in a Python script?
Module structure
First, separate the different stages of your notebook or project into different Python files. And make sure to name them according to the task. For example, you might have the following scripts in a typical machine learning package: data.py, preprocess.py, features.py, train.py, predict.py, evaluate.py etc. Depending on your project structure, these would sit within the package or src directory.
Within each script, code should be organised or ‘encapsulated’ into classes and/or functions. A function is a reusable block of code that performs a single, well-defined task. A class is a blueprint for creating an object, with its own set of attributes (variables) and methods (functions). Encapsulating code in this manner allows reusability and avoids duplication, thus keeping code concise.
A script might only need one function if the task is simple. For example, a data loading module (e.g. data.py) may only contain a single function ‘load_data’ which loads data from a csv file into a pandas DataFrame. Other scripts, such as a data processing module (e.g. preprocess.py), will inherently involve more tasks and hence require more functions or a class to encapsulate these tasks.
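As a rough sketch of that split, the block below shows what the contents of data.py and preprocess.py might look like, placed side by side in one listing so it can be run as-is. Two assumptions to flag: it reads the csv with the standard-library csv module rather than pandas (purely to avoid a dependency here; in a real project load_data would more likely call pandas.read_csv), and the Preprocessor class with its drop_missing and to_float methods is hypothetical.

```python
import csv
import io


# --- data.py: one simple task, so one function is enough ---
def load_data(source):
    """Read CSV rows from an open file-like object into a list of dicts."""
    return list(csv.DictReader(source))


# --- preprocess.py: several related tasks, grouped in a class ---
class Preprocessor:
    """Hypothetical preprocessor holding the rows it is cleaning."""

    def __init__(self, rows):
        self.rows = rows

    def drop_missing(self, column):
        """Remove rows where `column` is empty."""
        self.rows = [row for row in self.rows if row[column] != ""]
        return self

    def to_float(self, column):
        """Convert `column` from text to float in every row."""
        for row in self.rows:
            row[column] = float(row[column])
        return self


if __name__ == "__main__":
    raw = io.StringIO("name,height\nAda,1.65\nBob,\nCy,1.80\n")
    rows = load_data(raw)
    clean = Preprocessor(rows).drop_missing("height").to_float("height").rows
    print(clean)
```

Each method returns self, which is a design choice that lets the preprocessing steps be chained in the order they should run.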

Top tip. Transitioning from Jupyter Notebooks to scripts may take some time and everyone’s personal journey will look different. Some Data Scientists I know write code as Python scripts straight away and don’t touch a notebook. Personally, I use a notebook for EDA, then encapsulate the code into functions or classes before porting it to a script. Do whatever feels right for you.
There are a few tools that can help with the transition. 1) In VS Code, you can select one or more lines, right click and select Run Python > Run Selection/Line in Python Terminal. This is similar to running a cell in a Jupyter Notebook. 2) You can convert a notebook to a Python script by clicking File > Download as > Python (.py). I wouldn’t recommend that approach with large notebooks for fear of creating monster scripts, but the option is there!
The ‘__main__’ event
At this point, we’ve established that code should be encapsulated into functions and stored within clearly named scripts. The next logical question is, how can you tie all these scripts together so code gets executed in the right order?
The answer is to import these scripts into a single entry point and execute the code in one place. Within the context of developing a simple project, this entry point is typically a script named main.py (but it can be called anything). At the top of main.py, just as you would import necessary built-in packages or third-party packages from PyPI, you’ll import your own modules or specific classes/functions from modules. Any classes or functions defined in these modules will be available to use by the script they’ve been imported into.
To do this, the package directory within your project needs to contain an __init__.py file, which is typically left blank for simple projects. This file tells the Python interpreter to treat the directory as a package, meaning that any files with a .py extension get treated as modules and can therefore be imported into other files.
The structure of main.py is project dependent, but it will generally be dictated by the necessary order of code execution. For a typical machine learning project, you would first need to use the load_data function from the module data.py. You then might instantiate the preprocessor class that’s imported from the module preprocess.py and apply a variety of class methods to the preprocessor object. You would then move on to feature engineering and so on until you have the whole workflow written out. This workflow would typically be contained or referenced within a conditional statement at the bottom of main.py.
Wait….. who mentioned anything about a conditional statement? The conditional statement is as follows:
if __name__ == '__main__':
    # add code here
    pass
__name__ is a special Python variable that can have two different values depending on how the script is run:
- If the script is run directly in terminal, the interpreter assigns the __name__ variable the value '__main__'. Because the statement if __name__ == '__main__': is true, any code that sits within this statement is executed.
- If the script is run as an imported module, the interpreter assigns the name of the module as a string to the __name__ variable. Because the statement if __name__ == '__main__': is false, the contents of this statement are not executed.
Some more information on this can be found here.
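If you’d like to see the two values for yourself, the sketch below writes a one-line throwaway script to a temporary folder and then executes it both ways using the standard library: runpy.run_path to mimic running it from the terminal, and importlib to import it as a module. The file name demo_module.py and the function name observe_name are made up for this illustration.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# The throwaway script simply records whatever __name__ it is given.
SCRIPT = "seen_name = __name__\n"


def observe_name():
    """Return the __name__ seen when run as a script and when imported."""
    with tempfile.TemporaryDirectory() as folder:
        path = Path(folder) / "demo_module.py"
        path.write_text(SCRIPT)

        # Case 1: execute the file as if it were run directly.
        as_script = runpy.run_path(str(path), run_name="__main__")

        # Case 2: import the very same file as a module.
        spec = importlib.util.spec_from_file_location("demo_module", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

    return as_script["seen_name"], module.seen_name


if __name__ == "__main__":
    run_value, import_value = observe_name()
    print(run_value)     # __main__
    print(import_value)  # demo_module
```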
Given this process, you’ll need to reference the master function within the if __name__ == '__main__': conditional statement so that it’s executed when main.py is run. Alternatively, you can place the code underneath if __name__ == '__main__': to achieve the same result.
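Putting those pieces together, a main.py might be shaped roughly like the sketch below. To keep it runnable on its own, the three helper functions are defined in the same file as simple stand-ins; in a real project they would live in their own modules and be imported at the top. All function names and the toy data here are assumptions made for illustration.

```python
# main.py (sketch). In a real project the three helpers below would live in
# their own modules and be imported, e.g. `from src.data import load_data`.


def load_data():
    """Stand-in for data.load_data: return some raw records."""
    return [" 3", "1 ", "2"]


def preprocess(records):
    """Stand-in for the preprocessing step: strip whitespace, cast to int."""
    return [int(record.strip()) for record in records]


def build_features(values):
    """Stand-in for feature engineering: pair each value with its square."""
    return [(value, value ** 2) for value in values]


def main():
    """Master function: run every stage in the required order."""
    records = load_data()
    values = preprocess(records)
    return build_features(values)


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    print(main())
```

Keeping the workflow inside a main() function, rather than writing it directly under the conditional, means another script or a test can import main and call it without triggering anything on import.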

main.py (or any Python script) can be executed in terminal using the following syntax:
python3 main.py
Upon running main.py, code will be executed from all the imported modules in the specified order. This is the same as clicking the ‘run all’ button on a Jupyter Notebook where each cell is executed in sequential order. The difference now is that the code is organised into individual scripts in a logical manner and encapsulated within classes and functions.
You can also add CLI (command-line interface) arguments to your code using tools such as argparse and typer, allowing you to toggle specific variables when running main.py in the terminal. This provides a great deal of flexibility during code execution.
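As a minimal sketch of the argparse route, the block below defines three options. The option names (--input, --test-size, --verbose) and their defaults are hypothetical examples rather than anything required by argparse or by the project template.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; argv=None means 'read sys.argv'."""
    parser = argparse.ArgumentParser(description="Run the project pipeline.")
    parser.add_argument("--input", default="data/raw.csv",
                        help="path to the input file (hypothetical default)")
    parser.add_argument("--test-size", type=float, default=0.2,
                        help="fraction of rows held out for testing")
    parser.add_argument("--verbose", action="store_true",
                        help="print extra progress information")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(args.input, args.test_size, args.verbose)
```

With this in main.py, a run such as python3 main.py --test-size 0.3 --verbose changes the behaviour without touching the code, and python3 main.py --help lists the available options.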
So we’ve now reached the best part. The pièce de résistance. The real reasons why, beyond having beautifully organised and readable code, you should go to the effort of programming.
The end game: what’s the point of programming?
Let’s walk through some of the key benefits of moving beyond Jupyter Notebooks and transitioning to writing Python scripts instead.

- Packaging & distribution: you can package and distribute your Python program so it can be shared, installed and run on another computer. Package managers such as pip, poetry or conda can be used to install the package, just as you would install packages from PyPI, such as pandas or numpy. The trick to successfully distributing your package is to ensure that the dependencies are managed correctly, which is where the files pyproject.toml or requirements.txt come in. Some useful resources can be found here and here.
- Deployment: whilst there are multiple methods and platforms to deploy code, using a modular approach will put you in good stead to get your code production ready. Tools such as Docker enable the deployment of programs or applications in isolated environments called containers, which can be easily managed through CI/CD (continuous integration & deployment) pipelines. It’s worth noting that while Jupyter Notebooks can be deployed using JupyterLab, this approach lacks the flexibility and scalability of adopting a modular, script-based workflow.
- Version control: moving away from Jupyter Notebooks opens up the wonderful worlds of version control and collaboration. Version control systems such as Git are very much industry standard and offer a wealth of benefits, providing you use them correctly! Follow the motto ‘incremental changes are key’ and ensure that you make small, regular commits with logical commit messages in imperative language whenever you make functional changes whilst developing. This will make it far easier to keep track of changes and test code. Here is a super useful guide to using git as a data scientist.
Fun fact. It’s generally discouraged to commit Jupyter Notebooks to version control systems as it is difficult to track changes!
- (Auto)Documentation: we all know that documenting code increases its readability, thus helping the reader understand what the code is doing. It’s considered best practice to add docstrings to functions and classes within Python scripts. What’s really cool is that we can use these docstrings to build an index of formatted documentation of your whole project in the form of html files. Tools such as Sphinx enable you to do this in a quick and easy way. You can read my previous article which takes you through this process step by step.
- Reusability: adopting a modular approach promotes the reuse of code. There are many common tasks within data science projects, such as cleansing data or scaling features. There’s little point in reinventing the wheel, so if you can reuse functions or classes with minor modification from previous projects, as long as there are no confidentiality restrictions, then save yourself that time! You might have a utils.py or classes.py module which contains generic code that can be used across modules.
- Configuration management: whilst this is possible with a Jupyter Notebook, it is common practice to use configuration management for a Python program. Configuration management refers to organising and managing a project’s parameters and variables in a centralised way. Instead of defining variables throughout the code, they are stored in a file that sits within the project directory. This means that you don’t need to interrogate the code to change a parameter. An overview of this can be found here.
Note. If you use a YAML file (.yml) for configuration, this requires the Python module yaml, which is provided by the pyyaml package. Make sure to install pyyaml (not ‘yaml’) using pip install pyyaml. Forgetting this can lead to “package not found” errors. I’ve made this mistake, maybe more than once..
- Logging: using loggers within a Python program enables you to easily track code execution, provide debugging information and monitor a program or application. Whilst this functionality is possible within a Jupyter Notebook, it is often considered overkill and is fulfilled with the print() statement instead. By using Python’s logging module, you can format a logging object to your liking. It has five different messaging levels (debug, info, warning, error, critical) relative to the severity of the events being logged. You can include logging messages throughout the code to provide insight into code execution, which can be printed to terminal and/or written to a file. You can learn more about logging here.
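Two of the items above, configuration management and logging, often go hand in hand, so here is one small sketch covering both. Please treat it as an illustration under a few assumptions: the configuration is parsed from JSON text using the standard library purely so the example runs without extra installs (a YAML file read with pyyaml would follow the same pattern), and the keys log_level and test_size, along with the helper names load_config and get_logger, are invented for this example.

```python
import json
import logging


def load_config(text):
    """Parse configuration text into a dict (JSON here, as a stand-in for YAML)."""
    return json.loads(text)


def get_logger(name, level_name):
    """Return a logger that writes formatted messages to the terminal."""
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, level_name.upper()))
    if not logger.handlers:  # avoid attaching a second handler on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # In a project this text would be read from a file in the project directory.
    config = load_config('{"log_level": "info", "test_size": 0.2}')
    logger = get_logger("pipeline", config["log_level"])
    logger.debug("not shown, because the level is INFO")
    logger.info("test_size=%s", config["test_size"])
```

Changing log_level to debug in the configuration is then enough to surface the extra messages, with no edits to the code itself.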
When are Jupyter Notebooks useful?
As I alluded to at the start of this article, Jupyter Notebooks still have their place in data science projects. Their easy-to-use interface makes them great for exploratory and interactive tasks. Two key use cases are listed below:
- Conducting exploratory data analysis on a dataset during the initial stages of a project.
- Creating an interactive resource or report to demonstrate analytical findings. Note there are plenty of tools out there that you can use for this purpose, but a Jupyter Notebook can also do the trick.
Final thoughts
Thanks for sticking with me to the very end! I hope this discussion has been insightful and has shed some light on how and why to start programming. As with most things in Data Science, there isn’t a single ‘correct’ way to solve a problem, but a considered multi-faceted approach depending on the task at hand.
Shout out to my colleague and fellow data scientist Hannah Alexander for reviewing this article 🙂
Thanks for reading!