Introducing Span's AI Effectiveness suite, powered by agent traces
Stack Trace Podcast
Where Does Engineering Work Go When AI Writes the Code?
Span Team
Expel VP of Engineering Jeffrey Wescott on the month-long experiment that challenged a group of engineers to stop writing code by hand, the metric he thinks is a dead end, and why the real work is moving to the ends of the process. A full audio version of this interview is on Spotify.
Three Takeaways
The work is relocating from the middle to the ends. As models absorb the act of writing code, an engineer's leverage shifts to the two decisions on either side of it: getting precise about what to build before any line of code is generated, and proving rigorously that the result does what was intended. Quality engineering returns as a discipline.
AI adoption and effectiveness scale through experiments, not mandates. Expel's fastest gains came from handing a small group of senior engineers, skeptics included, the room to test the tools for themselves.
Tokenmaxxing is a vanity metric. Burning tokens is trivially gameable and measures nothing that matters. The number worth watching is value relative to token spend, and beneath that, business impact.
Ask engineering leaders what AI changes about their teams, and the conversation often stalls on the wrong question: should AI be writing the code? In practice, that one is mostly settled, because on many teams agents already write code every day. The better question is what engineers should focus on once they do.
Jeffrey Wescott, VP of Engineering at Expel, a managed detection and response (MDR) provider, has a clear answer. In a conversation with our Field CTO Stephen Poletto on the first episode of Span’s podcast, Stack Trace, he argues that the value in software is moving away from writing code and toward the two decisions that sit on either side of it: knowing exactly what to build and proving that what got built works.
This middle, where someone sat and typed the implementation, is the part that compresses. Judgment moves to the ends.
Lower the Floor, Raise the Ceiling: AI's Impact on Software Development
Why is the middle collapsing? Wescott uses a framing worth borrowing: AI lowers the floor and raises the ceiling at the same time.
Lowering the floor means picking up the long tail of work nobody bothered to automate because the math never worked out. A task that costs four minutes a day but three and a half hours to script never seemed worth scripting, so it stayed manual forever. Now the script is nearly free, so it gets written, and the set of things worth automating grows until much of the friction that slows teams down disappears.
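The floor-lowering argument is break-even arithmetic. The minimal sketch below uses the interview's figures for the hand-written case (four minutes a day, three and a half hours to script). The 10-minute cost for an agent-written script is a hypothetical assumption chosen only to show how the payback period moves, not a number from the interview.

```python
# Break-even for automating a recurring manual task.
# The 10-minute agent-assisted cost is a hypothetical assumption.


def break_even_days(script_cost_minutes: float, minutes_saved_per_day: float) -> float:
    """Working days until the time a script saves repays the time spent writing it."""
    if minutes_saved_per_day <= 0:
        raise ValueError("minutes_saved_per_day must be positive")
    return script_cost_minutes / minutes_saved_per_day


DAILY_SAVINGS = 4  # minutes per day the task costs when done by hand

by_hand = break_even_days(script_cost_minutes=3.5 * 60, minutes_saved_per_day=DAILY_SAVINGS)
with_agent = break_even_days(script_cost_minutes=10, minutes_saved_per_day=DAILY_SAVINGS)

print(f"hand-written script pays off after {by_hand:.1f} working days")
print(f"agent-written script pays off after {with_agent:.1f} working days")
```

At four minutes a day, the hand-written script needs 52.5 working days, roughly two and a half months, to pay for itself. That is long enough that it tends to lose out to other work. If an agent really can produce it in minutes, the payback drops to a few days.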
Raising the ceiling runs the other way: people taking on work that used to sit outside their expertise, leaning on the model for the specialized knowledge while supplying enough judgment to steer it. The ceiling lifts not by replacing expertise but by extending the reach of whoever holds the intent.
Both are good for output. The open question is what all that extra volume does to the long-term quality and maintainability of a system, and Wescott admits it is too early to tell. That raises the obvious follow-up for any leader: if engineers write less of the code, what should they prioritize? Expel ran an experiment to find out.
Expel's Experiment: One Month Without Writing Code By Hand
When Wescott joined Expel, AI usage spanned the full range, from daily power users to vocal skeptics. Instead of mandating adoption from the top, he pulled together about ten engineers, deliberately including the loudest skeptics, and called the group the AI Coding Crew. The rule for the next month was simple: stop writing code by hand and delegate it to agents. They could still spec the work, review it, and iterate on what came back; they just could not type the implementation themselves.
Two weeks in, the verdict was unanimous, skeptics included: none of them had realized how far the tools had come, or how much it would change what they spent time on. One engineer raced an agent on a familiar, mechanical task he had done by hand many times and lost, badly enough that he stopped competing on implementation and reoriented around being precise about what he wanted built. The pattern repeated across the group. The clearer the engineer was about the goal, the better the agent performed.
This is the front end of the shift in practice. As the middle compresses, the time does not vanish. It moves forward into defining the problem well enough that an agent can execute it.
The New Bottleneck: Proving What You Built Works
If the front of the process is about specifying intent, the back is about verifying it, and that is where Wescott expects the bigger change.
His read, having watched this pendulum swing before, is that quality engineering makes a comeback. In the packaged-software era, when shipping was expensive, QA was a primary discipline. Continuous deployment dissolved it into "engineers test their own code."
Now the discipline comes down to one question: how do I prove to myself that this system does what I intended? Increasingly, the model builds the system, and the code an engineer writes by hand is the code that validates it.
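Here is a minimal sketch of what that hand-written validation can look like. `merge_intervals` is a hypothetical stand-in for agent-generated code. The engineer's contribution is the intent, stated as checks (coverage is preserved, and the output is well-formed, sorted, and non-overlapping), plus a seeded randomized test that exercises those checks. The names and the example problem are illustrative assumptions.

```python
import random


# Hypothetical stand-in for agent-generated code: merge overlapping
# closed integer intervals, e.g. (1, 3) and (2, 6) become (1, 6).
def merge_intervals(intervals):
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


# Hand-written by the engineer: the intent, stated as checks.
def covered_points(intervals):
    """Every integer point covered by at least one interval (slow but obviously right)."""
    return {p for start, end in intervals for p in range(start, end + 1)}


def check_merge(intervals):
    result = merge_intervals(intervals)
    # Intent 1: same coverage as the input, so nothing is lost or invented.
    assert covered_points(result) == covered_points(intervals), (intervals, result)
    # Intent 2: every output interval is well-formed.
    assert all(start <= end for start, end in result), result
    # Intent 3: output is sorted and no two intervals overlap or share an endpoint.
    for (_, prev_end), (next_start, _) in zip(result, result[1:]):
        assert prev_end < next_start, result
    return result


def random_intervals(rng, max_count=8, max_start=20, max_length=5):
    """Generate small random inputs, including empty and overlapping cases."""
    intervals = []
    for _ in range(rng.randint(0, max_count)):
        start = rng.randint(0, max_start)
        intervals.append((start, start + rng.randint(0, max_length)))
    return intervals


rng = random.Random(0)  # fixed seed so any failure is reproducible
for _ in range(1000):
    check_merge(random_intervals(rng))
print("1000 random cases passed")
```

The checks describe what the code must guarantee, not how it works. If the agent later rewrites the implementation, the same validation still applies unchanged.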
Placement Beats Adoption
Nowhere is the urge to bolt AI onto everything stronger than in security, where the narrative is that attackers now operate at AI speed and defenders just need to match them. Expel shows why it is not that simple.
Expel ingests hundreds of billions of events a day, millions a minute, and Wescott still believes that compressing that noise into signal is a job for human-engineered systems, not an LLM. An LLM cannot sit in that path: the latency is prohibitive and the economics are worse, subsidized tokens or not. So the question is never whether to adopt AI, but where in the funnel an agent should sit. Wescott points to two places where one fits:
Triage, where false positives are survivable and false negatives are not.
Detection authoring, where a model studies the events that slipped through and writes the rule that catches them next time, turning hours of hand-work into minutes.
The discipline is in the placement, not the adoption.
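The placement argument can be sketched as a two-stage funnel. This is a minimal illustration under stated assumptions, not Expel's pipeline. Cheap deterministic rules (`detect`) run on every event. A triage agent, stubbed here as a scoring function, sees only the resulting alerts. The event fields, rules, scores, and the 0.98 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    id: str
    process: str
    dest_port: int


Rule = Callable[[Event], bool]


def detect(events: list[Event], rules: list[Rule]) -> list[Event]:
    """Stage 1: cheap deterministic rules over every event. No model call here,
    because per-event LLM latency and cost do not fit a high-volume stream."""
    return [e for e in events if any(rule(e) for rule in rules)]


def triage(
    alerts: list[Event],
    score_benign: Callable[[Event], float],
    auto_close_threshold: float = 0.98,
) -> tuple[list[str], list[str]]:
    """Stage 2: an agent scores only the much smaller alert set.

    The threshold is deliberately high: anything the agent is not very sure is
    benign goes to a human. Extra escalations (false positives) cost analyst
    time; auto-closing a real threat (a false negative) must not happen.
    """
    auto_closed, escalated = [], []
    for alert in alerts:
        if score_benign(alert) >= auto_close_threshold:
            auto_closed.append(alert.id)
        else:
            escalated.append(alert.id)
    return auto_closed, escalated


events = [
    Event("e1", "chrome.exe", 443),
    Event("e2", "powershell.exe", 4444),
    Event("e3", "backup.exe", 22),
]
rules: list[Rule] = [
    lambda e: e.process == "powershell.exe",
    lambda e: e.dest_port == 22,
]
alerts = detect(events, rules)  # e2 and e3 match a rule; e1 is dropped

# Stub standing in for an LLM triage call; the scores are made up.
stub_scores = {"e2": 0.40, "e3": 0.99}
closed, escalated = triage(alerts, lambda a: stub_scores[a.id])
print(closed, escalated)  # ['e3'] ['e2']
```

One way to read detection authoring in this picture is that the model produces a new deterministic rule that can be appended to `rules`. The rule then runs at stream speed, and no model sits in the hot path.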
Why Tokenmaxxing Is a Dead End
The metrics follow the same logic. Asked about tokenmaxxing, the fixation of the moment, Wescott called it the most gameable thing imaginable. Tell someone to burn tokens, and they will burn tokens.
What matters is value relative to token spend and, underneath that, the thing every engineering org is really trying to measure: impact. The questions worth asking are whether the work:
saved the business money,
shipped something customers will pay for, or
retired a long-standing source of wasted developer time.
For Wescott, moving people up the AI-native maturity curve matters more than policing spend right now, especially in the narrow window while premium-plan tokens are still subsidized. He expects that window to close: costs will rise, open models will catch up, and the market will equalize. What replaces tokenmaxxing then is a real budgeting discipline: deciding whether a task needs the frontier model at all, or whether a cheaper, older, good-enough model will do. Model selection becomes a capital-allocation decision instead of a default.
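Treating model choice as an allocation decision can be sketched as two steps. First, set a quality bar for each task from that task's own evaluation cases. Then pick the cheapest model that clears the bar. The model names, prices, and pass rates below are invented for illustration. The eval harness that would produce `eval_pass_rate` is assumed, not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    usd_per_million_tokens: float  # blended price; illustrative numbers only
    eval_pass_rate: float          # share of this task's eval cases the model passes


def pick_model(options, required_pass_rate):
    """Return the cheapest model that clears the task's quality bar, or None."""
    eligible = [m for m in options if m.eval_pass_rate >= required_pass_rate]
    return min(eligible, key=lambda m: m.usd_per_million_tokens, default=None)


# Hypothetical catalog; none of these are real model names or prices.
catalog = [
    ModelOption("frontier-large", 15.00, 0.96),
    ModelOption("previous-gen", 3.00, 0.91),
    ModelOption("small-open", 0.50, 0.78),
]

print(pick_model(catalog, 0.90).name)  # previous-gen: good enough, 5x cheaper
print(pick_model(catalog, 0.95).name)  # frontier-large: only option that clears the bar
```

The frontier model becomes something a task has to justify. It is chosen when no cheaper option meets that task's bar, not by default.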
What This Means for Engineers
For anyone earlier in their career, Wescott's advice is simple: the highest-leverage thing an individual contributor can do right now is upskill. Reaching AI-native fluency matters more than almost anything else for a career, and the same holds for leaders. In his view, the people who compound the most value in the next phase of the industry will be the ones who learn to use these tools early.
Judgment is moving to the ends, and that is where engineers should follow.
For more episodes of Stack Trace, subscribe to Span's YouTube channel or to the Stack Trace podcast on Spotify.
Everything you need to unlock engineering excellence