Each firm I labored for till as we speak, there it was: the resilient MS Excel.
Excel was first launched in 1985 and has remained robust till as we speak. It has survived the rise of relational databases, the evolution of many programming languages, the Web with its infinite variety of on-line functions, and eventually, it’s also surviving the period of the AI.
Phew!
Do you’ve any doubts about how resilient Excel is? I don’t.
I feel the rationale for that’s its practicality to start out and manipulate a doc rapidly. Take into consideration this case: we’re at work, in a gathering, and immediately the management shares a CSV file and asks for a fast calculation or a number of calculated numbers. Now, the choices are:
1. Open an IDE (or a pocket book) and begin coding like loopy to generate a easy matplotlib graphic;
2. Open Energy BI, import the info, and begin making a report with dynamic graphics.
3. Open the CSV in Excel, write a few formulation, and create a graphic.
I can’t communicate for you, however many instances I am going for choice 3. Particularly as a result of Excel recordsdata are appropriate with all the pieces, simply shareable, and beginner-friendly.
I’m saying all of this as an Introduction to make my level that I don’t suppose that Excel recordsdata are going away anytime quickly, even with the quick growth of AI. Many will love that, many will hate that.
So, my motion right here was to leverage AI to make Excel recordsdata higher documented. One of many fundamental complaints of information groups about Excel is the dearth of finest practices and reproducibility, on condition that the names of the columns can have any names and information varieties, however zero documentation.
So, I’ve created an AI Agent that reads the Excel file and creates this small documentation. Right here is the way it works:
- The Excel file is transformed to CSV and fed into the Massive Language Mannequin (LLM).
- The AI Agent generates the info dictionary with column info (variable title, information kind, description).
- The info dictionary will get added as feedback to the Excel file’s header.
- Output file saved with feedback.
Okay. Arms-on now. Let’s get that executed on this tutorial.
Code

We are going to start by organising a digital surroundings. Create a venv
with the software of your selection, comparable to Poetry, Python Venv, Anaconda, or UV. I actually like UV, as it’s the quickest and the only, in my view. When you have UV put in [5], open a terminal and create your venv
.
uv init data-docs
cd data-docs
uv venv
uv add streamlit openpyxl pandas agno mcp google-genai
Now, allow us to import the mandatory modules. This venture was created with Python 3.12.1, however I consider Python 3.9 or larger may do the trick already. We are going to use:
- Agno: for the AI Agent administration
- OpenPyxl: for the manipulation of Excel recordsdata
- Streamlit: for the front-end interface.
- Pandas, OS, JSON, Dedent and Google Genai as help modules.
# Imports
import os
import json
import streamlit as st
from textwrap import dedent
from agno.agent import Agent
from agno.fashions.google import Gemini
from agno.instruments.file import FileTools
from openpyxl import load_workbook
from openpyxl.feedback import Remark
import pandas as pd
Nice. The subsequent step is creating the features we’ll have to deal with the Excel recordsdata and to create the AI Agent.
Discover that every one the features have detailed docstrings. That is intentional as a result of LLMs use docstrings to know what a given operate does and determine whether or not to make use of it or not as a software.
So, in case you’re utilizing Python features as Instruments for an AI Agent, make sure that to make use of detailed docstrings. These days, with free copilots comparable to Windsurf [6] it’s even simpler to create them.
Changing the file to CSV
This operate will:
- Take the Excel file and browse solely the primary 10 rows. That is sufficient for us to ship to the LLM. Doing that, we’re additionally stopping sending too many tokens as enter and making this agent too costly.
- Save the file as CSV to make use of as enter for the AI Agent. The CSV format is less complicated for the mannequin to soak up, as it’s a bunch of textual content separated by commas. And we all know LLMs shine working with textual content.
Right here is the operate.
def convert_to_csv(file_path:str):
"""
Use this software to transform the excel file to CSV.
* file_path: Path to the Excel file to be transformed
"""
# Load the file
df = pd.read_excel(file_path).head(10)
# Convert to CSV
st.write("Changing to CSV... :leftwards_arrow_with_hook:")
return df.to_csv('temp.csv', index=False)
Let’s transfer on.
Creating the Agent
The subsequent operate creates the AI agent. I’m utilizing Agno
[1], as it is vitally versatile and simple to make use of. I additionally selected the mannequin Gemini 2.0 Flash
. Throughout the check part, this was the best-performing mannequin producing the info docs. To make use of it, you’ll need an API Key from Google. Don’t overlook to get one right here [7].
The operate:
- Receives the CSV output from the earlier operate.
- Passes via the AI Agent, which generates the info dictionary with column title, description, and information kind.
- Discover that the
description
argument is the immediate for the agent. Make it detailed and exact. - The info dictionary shall be saved as a
JSON
file utilizing a software known asFileTools
that may learn and write recordsdata. - I’ve arrange
retries=2
so we will work round any error on a primary strive.
def create_agent(apy_key):
agent = Agent(
mannequin=Gemini(id="gemini-2.0-flash", api_key=apy_key),
description= dedent("""
You're an agent that reads the temp.csv dataset introduced to you and
primarily based on the title and information kind of every column header, decide the next info:
- The info forms of every column
- The outline of every column
- The primary column numer is 0
Utilizing the FileTools offered, create an information dictionary in JSON format that features the under info:
{: {ColName: , DataType: , Description: }}
In case you are unable to find out the info kind or description of a column, return 'N/A' for that column for the lacking values.
"""),
instruments=[ FileTools(read_files=True, save_files=True) ],
retries=2,
show_tool_calls=True
)
return agent
Okay. Now we’d like one other operate to save lots of the info dictionary to the file.
Including Information Dictionary to the File’s Header
That is the final operate to be created. It can:
- Get the info dictionary
json
from the earlier step and the unique Excel file. - Add the info dictionary to the file’s header as feedback.
- Save the output file.
- As soon as the file is saved, it shows a obtain button for the consumer to get the modified file.
def add_comments_to_header(file_path:str, data_dict:dict="data_dict.json"):
"""
Use this software so as to add the info dictionary {data_dict.json} as feedback to the header of an Excel file and save the output file.
The operate takes the Excel file path as argument and provides the {data_dict.json} as feedback to every cell
Begin counting from column 0
within the first row of the Excel file, utilizing the next format:
* Column Quantity:
* Column Title:
* Information Kind:
* Description:
Parameters
----------
* file_path : str
The trail to the Excel file to be processed
* data_dict : dict
The info dictionary containing the column quantity, column title, information kind, description, and variety of null values
"""
# Load the info dictionary
data_dict = json.load(open(data_dict))
# Load the workbook
wb = load_workbook(file_path)
# Get the lively worksheet
ws = wb.lively
# Iterate over every column within the first row (header)
for n, col in enumerate(ws.iter_cols(min_row=1, max_row=1)):
for header_cell in col:
header_cell.remark = Remark(dedent(f"""
ColName: {data_dict[str(n)]['ColName']},
DataType: {data_dict[str(n)]['DataType']},
Description: {data_dict[str(n)]['Description']}
"""),'AI Agent')
# Save the workbook
st.write("Saving File... :floppy_disk:")
wb.save('output.xlsx')
# Create a obtain button
with open('output.xlsx', 'rb') as f:
st.download_button(
label="Obtain output.xlsx",
information=f,
file_name='output.xlsx',
mime='software/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
)
Okay. The subsequent step is to attach all of this collectively on a Streamlit front-end script.
Streamlit Entrance-Finish
On this step, I might have created a unique file for the front-end and imported the features in there. However I made a decision to make use of the identical file, so let’s begin with the well-known:
if __name__ == "__main__":
First, a few traces to configure the web page and messages displayed within the Internet Utility. We are going to use the content material centered
on the web page, and there may be some details about how the App works.
# Config web page Streamlit
st.set_page_config(format="centered",
page_title="Information Docs",
page_icon=":paperclip:",
initial_sidebar_state="expanded")
# Title
st.title("Information Docs :paperclip:")
st.subheader("Generate an information dictionary in your Excel file.")
st.caption("1. Enter your Gemini API key and the trail of the Excel file on the sidebar.")
st.caption("2. Run the agent.")
st.caption("3. The agent will generate an information dictionary and add it as feedback to the header of the Excel file.")
st.caption("ColName: | DataType: | Description: ")
st.divider()
Subsequent, we’ll arrange the sidebar, the place the consumer can enter their API Key from Google and choose a .xlsx
file to be modified.
There’s a button to run the applying, one other to reset the app state, and a progress bar. Nothing too fancy.
with st.sidebar:
# Enter your API key
st.caption("Enter your API key and the trail of the Excel file.")
api_key = st.text_input("API key: ", placeholder="Google Gemini API key", kind="password")
# Add file
input_file = st.file_uploader("File add",
kind='xlsx')
# Run the agent
agent_run = st.button("Run")
# progress bar
progress_bar = st.empty()
progress_bar.progress(0, textual content="Initializing...")
st.divider()
# Reset session state
if st.button("Reset Session"):
st.session_state.clear()
st.rerun()
As soon as the run button is clicked, it triggers the remainder of the code to run the Agent. Right here is the sequence of steps carried out:
- The primary operate is named to rework the file to CSV
- The progress is registered on the progress bar.
- The Agent is created.
- Progress bar up to date.
- A immediate is fed into the agent to learn the
temp.csv
file, create the info dictionary, and save the output todata_dictionary.json
. - The info dictionary is printed on the display screen, so the consumer can see what was generated whereas it’s being saved to the Excel file.
- The Excel file is modified and saved.
# Create the agent
if agent_run:
# Convert Excel file to CSV
convert_to_csv(input_file)
# Register progress
progress_bar.progress(15, textual content="Processing CSV...")
# Create the agent
agent = create_agent(api_key)
# Begin the script
st.write("Working Agent... :runner:")
# Register progress
progress_bar.progress(50, textual content="AI Agent is operating...")
# Run the agent
agent.print_response(dedent(f"""
1. Use FileTools to learn the temp.csv as enter to create the info dictionary for the columns within the dataset.
2. Utilizing the FileTools software, save the info dictionary to a file named 'data_dict.json'.
"""),
markdown=True)
# Print the info dictionary
st.write("Producing Information Dictionary... :page_facing_up:")
with open('data_dict.json', 'r') as f:
data_dict = json.load(f)
st.json(data_dict, expanded=False)
# Add feedback to header
add_comments_to_header(input_file, 'data_dict.json')
# Take away non permanent recordsdata
st.write("Eradicating non permanent recordsdata... :wastebasket:")
os.take away('temp.csv')
os.take away('data_dict.json')
# If file exists, present success message
if os.path.exists('output.xlsx'):
st.success("Achieved! :white_check_mark:")
os.take away('output.xlsx')
# Progress bar finish
progress_bar.progress(100, textual content="Achieved!")
That’s it. Here’s a demonstration of the agent in motion.

Stunning consequence!
Strive It
You possibly can strive the deployed app right here: https://excel-datadocs.streamlit.app/
Earlier than You Go
In my humble opinion, Excel recordsdata aren’t going away anytime quickly. Loving or hating them, we’ll have to stay with them for some time.
Excel recordsdata are versatile, straightforward to deal with and share, thus they’re nonetheless very helpful for the routine ad-hoc duties at work.
Nonetheless, now we will leverage AI to assist us deal with these recordsdata and make them higher. Synthetic Intelligence is touching so many factors of our lives. The routine and instruments at work are solely one other one.
Let’s reap the benefits of AI and work smarter day by day!
If you happen to preferred this content material, discover extra of my work in my web site and GitHub, shared under.
GitHub Repository
Right here is the GitHub Repository for this venture.
https://github.com/gurezende/Information-Dictionary-GenAI
Discover Me
You will discover extra about my work on my web site.
References
[1. Agno Docs] https://docs.agno.com/introduction/brokers
[2. Openpyxl Docs] https://openpyxl.readthedocs.io/en/secure/index.html
[3. Streamlit Docs] https://docs.streamlit.io/
[4. Data-Docs Web App] https://excel-datadocs.streamlit.app/
[5. Installing UV] https://docs.astral.sh/uv/getting-started/set up/
[6. Windsurf Coding Copilot] https://windsurf.com/vscode_tutorial
[7. Google Gemini API Key] https://ai.google.dev/gemini-api/docs/api-key