We Completed a Major Application Migration in 78 Minutes Using Agentic AI. Here Is What We Learned.

An afternoon that changed our delivery calculus

On a Friday in mid-March, one of our lead AI engineers set up a controlled test on a real internal application and pressed go. Seventy-eight minutes later, the migration was complete: nine commits, forty-seven files touched, automated checks green across the board. The run added just over ten thousand lines of code, removed nearly thirty thousand, and cleared every review area and test assertion we defined for success. This was not a demo. It ran against production-grade code with the same constraints our delivery teams face every week.

To be clear about what "production-grade" means here: this was not a simple starter project, and the migration was not a search-and-replace operation. The application is part of our project billing and timekeeping system, meaning downtime or regressions directly affect how we track and bill client work. It runs React 16 with the classic JSX runtime, uses antd v4 with LESS theming, styled-components v4, and Tailwind CSS. It has nineteen feature modules and three different state management libraries (Redux, reactn, and Zustand). Migrating from CRA to Vite 8 meant changing the build system, the dev server, the module resolution strategy, the environment variable conventions, and the plugin architecture all at once, while keeping every feature module, styling layer, and state management integration working exactly as before.
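For readers unfamiliar with what such a move involves, the center of gravity is a new Vite configuration that takes over the responsibilities CRA handled implicitly. The following is an illustrative sketch only, not the configuration from this run; the plugin choice, option names, and env-prefix approach are assumptions based on common practice for this stack:

```typescript
// vite.config.ts — hedged sketch of a CRA-to-Vite migration config.
// Assumptions: @vitejs/plugin-react for the classic JSX runtime and
// LESS theming for antd v4; names and paths are illustrative.
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'

export default defineConfig({
  plugins: [
    // CRA's Babel pipeline is replaced by the React plugin; the classic
    // JSX runtime can be requested explicitly for React 16.
    react({ jsxRuntime: 'classic' }),
  ],
  // CRA exposes env vars as REACT_APP_*; Vite defaults to VITE_*.
  // Widening the prefix avoids renaming every variable at once.
  envPrefix: ['VITE_', 'REACT_APP_'],
  css: {
    preprocessorOptions: {
      // antd v4 LESS themes rely on JavaScript expressions in LESS.
      less: { javascriptEnabled: true },
    },
  },
})
```

Each of these lines stands in for a convention the old build system enforced silently, which is why a migration like this touches far more than the build script.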

A senior engineer scoping this migration as manual work would typically estimate two to three weeks: dependency research, build configuration, incremental testing across all those integration points, careful regression testing for a business-critical application, and the inevitable surprises that surface when framework assumptions change under a complex codebase. For an application that touches billing, you test twice and deploy carefully. The agent completed it in seventy-eight minutes.

That result changed how we think about technical debt. Work that would normally sit in a backlog for weeks, competing for senior engineering time against feature delivery, moved from backlog to done in a single session. The agent did not operate on instinct. It followed a mission that our engineer designed, with consistent verification at each step. The engineer stayed in control of the objectives, the constraints, and the definition of done. The agent handled the execution volume.

For context, agentic AI is not simply another way to generate code. It is a pattern where systems act toward a goal, take steps, and make bounded decisions under human direction. We are putting that pattern to work on real delivery problems, and the results speak for themselves.

What we asked the agent to do and how we framed success

We started by writing down the mission in plain terms: the objectives, the constraints, the definition of done, and the boundaries of what the agent was allowed to touch. We collected all of this in a project directory that serves as a contract between the human who defines the work and the system that executes it.

Inside that directory sat a concise, one-page skill document. In one hundred and three lines, it spelled out each step the agent should take, how to check its own work, and what to do when a check failed. It was simple by design and specific to the application in front of us.

We tied success to objective checks. A scripted review tool scored the results against preset code review areas, and we required the run to pass every category before merging. We exercised the application in a browser and at the command line, and recorded the assertions so that the agent and the humans were verifying against the same expectations. That evidence was retained with the run, so our team had a complete record of what passed, when, and why.
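Mechanically, a pass-every-category gate is simple to express, which is part of why it is trustworthy. This is a minimal hypothetical sketch, not our actual review tool; the category names and the function shape are invented for illustration:

```typescript
// Hypothetical all-categories-must-pass merge gate (names invented).
type ReviewResult = { category: string; passed: boolean };

// Merging is allowed only when every defined review category passed;
// an empty result set counts as a failure, not a pass.
function canMerge(results: ReviewResult[]): boolean {
  return results.length > 0 && results.every((r) => r.passed);
}

console.log(canMerge([
  { category: 'build', passed: true },
  { category: 'tests', passed: true },
  { category: 'style', passed: true },
]));
```

The important design choice is the empty-set rule: a run that produced no review results fails the gate rather than sliding through it.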

The run was not frictionless. About eighteen minutes into the migration, the agent hit a wall: Vite 8's new Oxc-based transform does not process JSX inside .js files, and styled-components v4 has a transpilation bug under Vite's ESM dev mode. These are not well-documented issues, and they required the agent to diagnose, test, and resolve two non-obvious problems before it could continue. That twenty-one-minute stretch on a single commit is the honest part of the story. The agent is fast at volume work, but it still has to work through real debugging when it encounters something unexpected. The difference is that it did so within the structured mission, documented what it found, and kept moving.
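For readers who hit the same wall: under Vite's esbuild-era options, the widely documented workaround is to force plain .js files to be parsed as JSX. The sketch below uses those esbuild-era option names as an assumption; the Oxc-based transform in newer Vite may expose different knobs, so treat this as a starting point rather than a recipe:

```typescript
// vite.config.ts fragment — hedged sketch for JSX inside .js files.
// These are the esbuild-era option names; the Oxc transform may differ.
import { defineConfig } from 'vite'

export default defineConfig({
  esbuild: {
    // Parse plain .js files under src/ as JSX during transforms.
    loader: 'jsx',
    include: /src\/.*\.js$/,
  },
  optimizeDeps: {
    esbuildOptions: {
      // Apply the same treatment during dependency pre-bundling.
      loader: { '.js': 'jsx' },
    },
  },
})
```

The longer-term fix is usually to rename JSX-bearing .js files to .jsx, which removes the need for transform-specific configuration entirely.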

None of this replaced judgment. It amplified it. The engineer defined the mission, set the verification criteria, and reviewed the results. The system handled the execution volume within those guardrails. This mirrors what research on human-in-the-loop programming recommends: keeping people in control of AI-assisted code generation, especially when correctness and security matter.

The structured practice that made speed safe

Speed without safety is not useful. What made this run valuable was the operating practice around the agent. We follow a straightforward approach: give the system project-specific context so it understands the shape of the codebase, express the work as a sequence of verifiable steps, insert human checkpoints at transitions that carry risk, and hold every change to the same quality gates that a human-only process would face.

This is how our AI Engineers operate. They are software engineers equipped with AI tools for coding and process automation, and the discipline around the tools is what produces reliable results. We treat the agent as an execution engine inside a process that is already accountable.

Our experience aligns with what others are finding. Google Research has reported that combining multiple AI-driven tasks to assist migrations works effectively when the workflow integrates prediction, diff generation, and test validation. Independent research on agent-driven migrations recommends environment-driven, test-centric loops where agents plan, run, and verify within a continuous feedback cycle. Cross-disciplinary workshops on automated programming emphasize the same fundamentals: robust validation, transparent decision logs, and practices that bridge prototypes to production.

We arrived at these practices through our own work. The external research confirms the direction.

Verification, audit trails, and why the record matters

When leadership sees a result like this, two questions follow immediately: how do we know it is right, and how will we account for it later? We built the run so that both answers were straightforward.

Every change landed through commits that explicitly recorded the agent's co-authorship alongside the engineer. That gives reviewers a clear signal that an agent participated and gives anyone looking at the record later a reliable trace of where, when, and how the system contributed. The run passed every automated test assertion we defined at the outset, and those results are preserved with the mission logs. We can reconstruct exactly what happened because we planned to keep the record from the start.
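The mechanism behind this is Git's standard Co-authored-by commit trailer, which platforms such as GitHub render as co-authorship on the commit. A hypothetical example, with invented names and addresses:

```
Migrate build tooling from CRA to Vite

Co-authored-by: Migration Agent <agent@example.com>
Co-authored-by: Jane Engineer <jane@example.com>
```

Because the trailer lives in the commit message itself, the attribution travels with the history through every clone, fork, and audit export.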

This matters beyond our own team. Regulatory frameworks are moving toward exactly these expectations. NIST's Generative AI profile calls for change-management controls, auditable assessment, and clear role definitions when humans and AI work together. The EU AI Act requires traceability, human control, and logs that support monitoring and investigation. The way we structured this run already aligns with that direction.

AI handles the volume; we provide the judgment and the record.

Where this approach fits, and where it does not

We are direct about the boundaries because credibility depends on knowing them.

The migration worked because the conditions were right. The application had tests we could trust. We could run it locally and in the pipeline, observe behavior quickly, and decide whether to proceed. Good testing and rapid verification cycles are the prerequisites. When those are in place, this approach delivers.

We would not pick this approach for a system with heavyweight change control where every modification requires extensive documentation, multiple approval boards, and long lead times. The coordination overhead can outrun the speed gains. The practical guidance from our experience is straightforward: favor targets with good test coverage and fast deploy-verify cycles. Be cautious where tests are thin, rollback is slow, or compliance requires extensive pre-approval.

That maps to what others are finding. Google Research has reported that AI-assisted migration pays off when model suggestions integrate with diff generation and test validation inside the developer workflow. Research on agent-driven migrations emphasizes the same prerequisites.

This is a powerful option in the right conditions. It is not a universal replacement for engineering judgment.

What changes inside teams: roles, skills, and trust

The real shift is in how our engineers spend their time. On this run, our AI Engineer did not type every change. They wrote the mission, encoded the checks, watched the gates, and decided when to step in. The skill is part engineering, part orchestration: writing small, precise instructions that a system can follow, and structuring validation so that a pass is meaningful and a failure is informative.

We staff AI Engineers on our teams for exactly this reason. The role shifts from repetitive execution to directing and validating: the engineer stays in control, treats system outputs as proposals, and applies context and judgment at the points that matter most.

Most developers are either already using AI tools or plan to do so. The question for leaders is not whether their teams will use these tools, but whether they have clear operating procedures, human checkpoints, and accountable processes in place when they do. That is the investment we are making.

What we recommend for leaders considering this approach

Start where the odds of success are highest and the lessons will transfer. That is what we did. Pick a well-understood internal system with tests you trust and a deployment path that lets you verify quickly. Define a mission in writing so that your team and your tooling are aligned on scope and success. Make the skill document small and specific: just enough instruction to guide the system through the steps with clear, measurable checks. Retain the review results and test outputs with the rest of the record.

Set expectations for governance at the outset. Require commits that clearly record where the system contributed and where the human approved. Keep a log that captures the agent's actions and the checkpoints where a person made a call. Those habits will help you answer questions from your risk office later, and they align with the direction NIST and other frameworks are heading.

Measure what matters. We tracked elapsed time, commit and file counts, and pass rates on objective review and test criteria. Those are the metrics that let you compare agent-assisted runs to your baseline, and they are the evidence leadership will want before deciding to expand.

This is a fast-moving space, but it does not reward shortcuts. We are gaining hands-on experience now so we can help our clients navigate it with evidence, not aspiration. The management conversation about agentic AI is already underway. We intend to be in it with real results to point to.

Sources

  1. Agentic AI, explained. MIT Sloan Ideas Made to Matter. 2025.
  2. HiLDe: Intentional Code Generation via Human-in-the-Loop Decoding. arXiv. 2025.
  3. Environment-in-the-Loop: Rethinking Code Migration with LLM-based Agents. arXiv. 2026.
  4. Accelerating code migrations with AI. Google Research Blog. 2024.
  5. Automated Programming and Program Repair. Dagstuhl Reports, Vol. 14, Issue 10. 2024.
  6. Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile. NIST AI 600-1. 2024.
  7. Article 14: Human Oversight. EU Artificial Intelligence Act.
  8. Article 12: Record-keeping. European Commission AI Act Service Desk.
  9. 2025 Stack Overflow Developer Survey. Stack Overflow. 2025.