If you have ever tried to use AI to build a mathematical optimization model for a real business problem, you’ve probably run into the same wall: the AI works beautifully on textbook examples and falls apart the moment you hand it your actual data and your actual problem.
That gap isn’t a coincidence. It’s by design, and it’s the reason I built ORPilot.
The Promise of AI-Powered Optimization
Operations Research (OR) has been quietly powering some of the most impactful decisions in business for decades: routing delivery trucks, scheduling factory production, designing supply chains, allocating cargo to carriers. The math is mature and the solvers are excellent. The bottleneck has always been the human expertise required to translate a business problem into a mathematical model.
Large Language Models (LLMs) seemed like the perfect solution. A growing body of research, including the OptiMUS series, OR-LLM, and others, has shown that state-of-the-art LLMs can generate correct solver code for well-specified linear programming (LP) and mixed integer programming (MIP) problems. The results looked impressive. The demos were compelling.
Then you try to use one of these tools on a real problem, and the cracks appear immediately.
Where Existing Tools Break Down
Almost every LLM-for-OR tool built so far shares a hidden assumption: the problem description is complete, unambiguous, and handed to the AI in a single, well-formatted prompt with all the data neatly embedded inline.
That isn’t how real OR problems work. Not even close.
Consider what actually happens when a supply chain team wants to build an optimization model:
- The problem description is incomplete and ambiguous. A business analyst will say “we want to minimize transportation costs” and forget to mention that each distribution center has a throughput limit, that some routes don’t exist, or that opening a facility incurs a one-time fixed cost. These omissions aren’t carelessness. They’re assumptions the analyst considers obvious, which is exactly why they’re dangerous. An AI system that starts modeling before these details are nailed down produces a model that’s technically correct but practically wrong.
- The data is too large to fit in a prompt. A real supply chain problem might involve hundreds of production sites, distribution centers, and customers, and thousands of products over multiple periods. The demand table alone might have millions of entries. You cannot embed that in a prompt. Even if you could, flooding the context window with raw data dramatically increases the risk of hallucinations.
- The data you have is not the data the model needs. The model might need a distance matrix between all pairs of locations. What you have is a table of GPS coordinates. The model might need aggregate demand by product and period. What you have is a transaction ledger with one row per order. Bridging this gap, namely computing derived parameters from raw data, is a significant engineering step that no existing LLM-for-OR tool handles automatically.
- Once you have a working model, portability and reproducibility matter. If you want to re-run the model on updated data, switch from Gurobi to an open-source solver, or hand the model off to a colleague on a different machine, you’re back to square one unless the tool produces a durable, solver-agnostic artifact. Most tools produce solver-specific code and nothing else.
These aren’t edge cases. They’re the standard conditions for any real-world OR deployment. Existing LLM-for-OR tools were built for a different world, a textbook world, and they show their seams the moment they leave it.
Introducing ORPilot
ORPilot is an open-source AI agent built from the ground up for production conditions. It is, to my knowledge, the first LLM-based OR tool designed explicitly for the messy, large-scale, data-heavy reality of industrial optimization.
Most AI tools for optimization jump straight to writing code the moment you describe your problem. ORPilot does something different: it asks questions first.
That design decision, prioritizing understanding over speed, reflects a single guiding principle: an AI agent should work the same way a skilled human OR consultant would.
A good consultant doesn’t walk into a client meeting and start writing a mathematical model on the whiteboard. They ask questions. They listen carefully. They push back when something is ambiguous. They make sure the data is in the right shape before the modeling starts. Only after all of that do they pick up the pen.
ORPilot’s pipeline reflects this discipline through five sequentially connected stages.
Stage 1: Interview Agent
The interview agent is the entry point. It receives your initial description of the business problem, which can be vague, incomplete, or even self-contradictory, and engages you in a structured conversation to fill in the gaps. The key design principle is that no modeling starts until the interview is complete.
The agent is prompted to identify information gaps in the current description, ask at most one targeted clarifying question per turn (to avoid overwhelming you), and terminate once the objective function, decision variables, constraints, and data requirements are all unambiguously specified.
In practice, this means conversations like:
ORPilot: “Once a facility is opened, does it remain open for all subsequent periods, or can it be closed later?”
ORPilot: “Does this model handle a single product type or multiple products?”
ORPilot: “You mentioned a transportation cost. Is this cost per unit shipped, per shipment regardless of quantity, or something else?”
Before ending the interview, the agent presents a full structured summary (objective function, decision variables, constraints, parameters, and indices) and gives you the chance to correct anything before that summary is passed downstream. This is the guard against the most common failure mode in LLM-for-OR tools: modeling the wrong problem.
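To make the termination rule concrete, here is a minimal sketch of an interview loop. It is not ORPilot’s actual implementation: in the real agent an LLM decides what is missing and phrases the question, whereas here a fixed checklist stands in for that judgment. The names `ProblemSpec`, `run_interview`, and `ask_user` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical checklist of what must be pinned down before modeling starts.
REQUIRED_FIELDS = (
    "objective",
    "decision_variables",
    "constraints",
    "data_requirements",
)


@dataclass
class ProblemSpec:
    """Accumulates the user's answers, one field per required item."""
    fields: Dict[str, str] = field(default_factory=dict)

    def missing(self) -> List[str]:
        return [name for name in REQUIRED_FIELDS if not self.fields.get(name)]


def run_interview(
    spec: ProblemSpec,
    ask_user: Callable[[str], str],  # stand-in for the chat turn with the user
    max_turns: int = 20,
) -> ProblemSpec:
    """Ask one clarifying question per turn until nothing is missing."""
    for _ in range(max_turns):
        gaps = spec.missing()
        if not gaps:
            return spec  # fully specified: the interview may end
        target = gaps[0]  # at most one question per turn
        answer = ask_user(f"Please clarify the {target.replace('_', ' ')}.")
        if answer.strip():
            spec.fields[target] = answer.strip()
    raise RuntimeError(f"still unspecified after {max_turns} turns: {spec.missing()}")
```

The point of the sketch is the exit condition: the loop returns only when every required item is filled in, so nothing downstream can start from a half-specified problem.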
Stage 2: Data Collection Agent
This stage has no counterpart in most existing LLM-for-OR tools. It is one of the most important structural innovations in ORPilot.
Most existing LLM-for-OR tools assume the data is embedded in the problem text, small enough to fit in a prompt. For textbook problems, this works. For real problems, it breaks down in two ways. First, real datasets are too large. For example, a 500-customer, 500-product, 12-period supply chain problem would have 3,000,000 demand entries. Second, embedding data in the prompt inflates hallucination risk and burns through the context window unnecessarily.
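The entry count is simply the product of the index set sizes, assuming a dense demand table with one entry per (customer, product, period) combination. A quick check, with a helper name of my own choosing:

```python
import math


def dense_table_entries(*set_sizes: int) -> int:
    """Number of rows in a dense table indexed by sets of the given sizes."""
    return math.prod(set_sizes)


# 500 customers x 500 products x 12 periods
print(f"{dense_table_entries(500, 500, 12):,}")  # 3,000,000
```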
ORPilot’s answer is to treat data as entirely separate from the prompt. Data lives in CSV files. The AI accesses it only by writing and executing code. The data collection agent’s job is to determine exactly what those CSV files need to look like.
Based on the problem specification from the interview agent, the data collection agent determines:
- Which entities (sets) exist in the model
- What attributes (parameters) each entity needs
- The precise schema for each required table: column names, types, semantics
It presents this specification to you and waits until you’ve supplied all the files in the correct format. It validates completeness before proceeding.
Crucially, the agent is flexible: if you don’t have a particular piece of model-ready data (say, the model needs a distance matrix but you only have GPS coordinates), you tell the agent what you do have, and it updates the schema accordingly, passing the gap to the next stage to handle.
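As an illustration of what validating a supplied file against a schema can look like, here is a small sketch using only the standard library. The schema format, the column names, and the `validate_table` helper are assumptions for this example, not ORPilot’s internal representation.

```python
import csv
import io
from typing import Callable, Dict, List, TextIO

# Hypothetical schema for a demand table: column name -> type caster.
DEMAND_SCHEMA: Dict[str, Callable[[str], object]] = {
    "customer_id": str,
    "product_id": str,
    "period": int,
    "demand": float,
}


def validate_table(handle: TextIO, schema: Dict[str, Callable[[str], object]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the table passes."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing column: {column}" for column in schema if column not in header]
    if problems:
        return problems  # no point checking values without the right columns
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, cast in schema.items():
            value = row.get(column)
            if value is None or value == "":
                problems.append(f"line {line_no}: empty {column!r}")
                continue
            try:
                cast(value)
            except ValueError:
                problems.append(f"line {line_no}: bad {column!r} value {value!r}")
    return problems
```

For example, `validate_table(io.StringIO("customer_id,product_id,period\nC1,P1,1\n"), DEMAND_SCHEMA)` reports the missing `demand` column instead of letting the gap surface later as a modeling error.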
Stage 3: Parameter Computation Agent
Almost every existing LLM-for-OR tool assumes the numerical quantities needed by the model appear directly in the user-supplied data. In practice, this is almost never true. Two examples that come up constantly in real OR problems:
- A vehicle routing model needs a pairwise distance matrix. The user has GPS coordinates. Computing Euclidean or geographic distances is a transformation entirely outside the scope of LP/MIP formulation.
- A multi-period production model needs aggregate demand per period. The user has a transaction ledger with one row per order. The model parameter is a sum-aggregation that must be computed from the raw data.
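The first example can be sketched in a few lines. This version uses the haversine great-circle formula, which is one reasonable choice for geographic distance and not necessarily what ORPilot generates; road distances would need a routing service instead. The function names and the dictionary layout are illustrative.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def distance_matrix(coords: Dict[str, Tuple[float, float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise distances keyed by (origin, destination), including the zero diagonal."""
    return {(i, j): haversine_km(coords[i], coords[j]) for i in coords for j in coords}
```

The result is keyed by location pairs, which is the shape a routing or transportation model typically indexes its cost parameter by.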
The parameter computation agent bridges this gap automatically. It receives the problem specification and the raw CSV files, then:
- Identifies which model parameters cannot be read directly from the raw tables
- Generates a Python script to compute these derived parameters
- Executes the script in a sandboxed environment
- Writes the results as additional CSV files, passed to the modeling step
This ensures that by the time the modeling agent sees the data, it’s clean, correctly typed, correctly indexed, and model-ready. In our experiments, this step significantly reduced code generation failures and retry counts.
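For the demand example, the kind of script this stage might produce could look like the following sketch. The column names (`product_id`, `period`, `quantity`) and both function names are assumptions; a real ledger would have its own schema from the data collection stage.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Mapping, TextIO, Tuple


def aggregate_demand(ledger_rows: Iterable[Mapping[str, str]]) -> Dict[Tuple[str, int], float]:
    """Sum order quantities into one demand value per (product, period)."""
    totals: Dict[Tuple[str, int], float] = defaultdict(float)
    for row in ledger_rows:
        totals[(row["product_id"], int(row["period"]))] += float(row["quantity"])
    return dict(totals)


def write_parameter_csv(totals: Dict[Tuple[str, int], float], handle: TextIO) -> None:
    """Write the derived parameter as a model-ready table."""
    writer = csv.writer(handle)
    writer.writerow(["product_id", "period", "demand"])
    for (product, period), demand in sorted(totals.items()):
        writer.writerow([product, period, demand])
```

In use, `ledger_rows` would come from `csv.DictReader` over the raw transaction file, and the output handle would be a new CSV that the code generation stage reads instead of the ledger.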
Another common scenario where the parameter computation agent is useful is computing big-M values. In some of my experiments with ORPilot, the parameter computation agent computed a big-M value needed for constraints linking continuous shipment variables to binary facility-opening decisions. This is a derived parameter that would be impractical to ask the user to supply directly.
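One common way to pick such a value, sketched below, is to bound the flow through each facility by the smaller of its capacity and the total demand in the system. This is a standard textbook choice and an assumption on my part here, not a description of what ORPilot computed in those experiments. It presumes a linking constraint of the form "total shipments out of facility i ≤ M_i × open_i" and that shipments never exceed demand.

```python
from typing import Dict


def big_m_per_facility(capacity: Dict[str, float], demand: Dict[str, float]) -> Dict[str, float]:
    """Tight big-M per facility: flow cannot exceed capacity or total demand."""
    total_demand = sum(demand.values())
    return {facility: min(cap, total_demand) for facility, cap in capacity.items()}


# Hypothetical data: two candidate facilities, two customers.
print(big_m_per_facility({"A": 100.0, "B": 1000.0}, {"c1": 200.0, "c2": 150.0}))
# {'A': 100.0, 'B': 350.0}
```

A tighter M generally gives the solver a stronger LP relaxation than an arbitrary large constant such as 1e9, which is why it is worth deriving from the data.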
Stage 4: Code Generation Agent
With a complete problem specification, raw data, and derived parameters all in hand, the code generation agent produces a complete Python solver script for your chosen backend. ORPilot currently supports five backends: Gurobi, CPLEX, PuLP, Pyomo, and OR-Tools.
The generated code is immediately executed in a sandbox. If anything goes wrong (a syntax error, a runtime exception, or an infeasible or unbounded solver result), the full error message and traceback are fed back to the LLM along with the previously generated code. The agent retries, up to a user-configurable maximum number of attempts.
In practice, the majority of failures are resolved within one or two retries. The key reason ORPilot’s retry loop is effective is that the upstream stages have already done the hard work: the problem is correctly specified, the data is model-ready, and the agent only needs to fix a code-level mistake rather than rethink the entire model structure.
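The generate, execute, and retry cycle can be sketched as follows. This is a simplified illustration, not ORPilot’s code: `generate` is a hypothetical callable standing in for the LLM call, a plain subprocess is not a real sandbox, and the sketch only treats a non-zero exit code or a timeout as failure, so it assumes the generated script itself exits with an error when the solver reports an infeasible or unbounded model.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_script(code: str, timeout: float = 60.0) -> Tuple[bool, str]:
    """Run code in a fresh interpreter; return (succeeded, stdout or error text)."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "model.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True, text=True, timeout=timeout, cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return False, f"timed out after {timeout} seconds"
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr  # includes the traceback


def generate_with_retries(
    generate: Callable[[Optional[str], Optional[str]], str],  # (previous code, previous error) -> new code
    max_attempts: int = 3,
) -> Tuple[str, str]:
    """Call the generator until its script runs cleanly or attempts run out."""
    code: Optional[str] = None
    error: Optional[str] = None
    for _ in range(max_attempts):
        code = generate(code, error)  # the first call sees (None, None)
        ok, output = run_script(code)
        if ok:
            return code, output
        error = output  # feed the failure back into the next attempt
    raise RuntimeError(f"no working script after {max_attempts} attempts; last error:\n{error}")
```

Passing both the previous code and the error text to the generator is what lets the next attempt be a targeted fix instead of a fresh start.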
Stage 5: Reporter Agent
After a successful solve, a reporter agent translates the numerical results into plain English, explaining which facilities to open, what routes to use, and what quantities to produce, in the domain language of the original business problem, for consumption by a business user rather than an OR expert.
Why This Order Matters
The pipeline is deliberately sequential. Each stage is gated on the previous one completing successfully. The interview must finish before data collection starts. Data must be validated before parameter computation runs. Parameters must be ready before code is generated.
This sequencing prevents the most common failure mode in LLM-based OR tools: cascading errors where an ambiguous problem description propagates through the pipeline and produces code that’s syntactically valid but models the wrong objective.
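The gating idea itself is simple enough to show in a few lines. The sketch below is a generic pattern with hypothetical names (`run_pipeline`, `StageFailed`), not ORPilot’s orchestration code: each stage receives the state produced by the previous one, and the first failure stops everything after it.

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Dict], Dict]]  # (stage name, function from state to new state)


class StageFailed(Exception):
    """Raised when a stage fails, so no later stage runs on bad input."""


def run_pipeline(stages: List[Stage], state: Dict) -> Dict:
    """Run stages in order, passing each one the previous stage's output."""
    for name, stage in stages:
        try:
            state = stage(state)
        except Exception as exc:
            raise StageFailed(f"{name} failed: {exc}") from exc
    return state
```

Because a failure in, say, data validation raises before parameter computation is ever called, an error cannot silently flow into generated solver code.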
What This Looks Like at Scale
I tested ORPilot on a few OR problems, one of which is a supply chain network design problem with 50 production sites, 50 distribution centers, 500 customers, 500 products, and 12 periods. The resulting model had more than 9.7 million decision variables and 963,000 constraints. ORPilot successfully handled the full pipeline end to end, from the initial conversation through data collection, parameter computation, code generation, and solution reporting, producing an optimal solution with Gurobi. See my paper at https://arxiv.org/abs/2605.02728 for results on more test problems.
Getting Started
ORPilot is open source and available now:
GitHub: https://github.com/GuangruiXieVT/ORPilot
Paper: https://arxiv.org/abs/2605.02728
Installation takes a few minutes. ORPilot supports OpenAI, Anthropic, Google, and DeepSeek as LLM providers, and Gurobi, CPLEX, PuLP, Pyomo, and OR-Tools as solver backends.
In the next post in this series, we’ll take a deep dive into the Intermediate Representation (IR), the solver-agnostic JSON artifact that makes ORPilot’s results reproducible and portable across backends without ever calling the LLM again. Stay tuned!

