What if AI can skip straight to the machine code our computers actually run?
Before you dismiss this as hype, consider: every major shift in software development followed the same pattern. A new abstraction layer emerged. Skeptics said it would never work at scale. Early adopters proved them wrong. The economics shifted. And suddenly, what seemed impossible became the only way anyone built software.
That has happened four times in computing history. We might be watching the fifth—and the difference this time is that we no longer have to speculate. The research exists. The question is what it actually shows, and what it doesn't.
1952: When Programming Meant Rewiring
In the beginning, there was no "code" to write. Programming the ENIAC meant physically reconnecting cables and setting switches. If you wanted the computer to do something different, you literally rewired it.
The first major abstraction came with stored-program computers and assembly language. Instead of rewiring hardware, you could write symbolic instructions. LOAD A instead of flipping switch combinations. ADD B instead of reconfiguring circuits.
Hardware engineers hated it. Real programmers, they argued, understood the machine at the physical level. Symbolic abstraction was for people who didn't truly grasp what a computer was doing. The argument sounds familiar because it has been made at every transition since.
But the economics were undeniable. Assembly language meant changing programs in hours instead of days. Sharing them as written instructions instead of wiring diagrams. Programming became something a far wider range of people could do—and that shift in who could build things mattered more than the loss of direct hardware control.
The pattern was set: abstraction trades directness for scale.[1][2]
1972: The C Revolution
For two decades, assembly ruled. But it was completely tied to hardware—programming for an IBM mainframe looked nothing like programming for a DEC minicomputer. Software couldn't travel.
C changed that. Write once, compile for different architectures. The machine-specific details became the compiler's problem, not yours.
The skepticism was fierce and, to be fair, technically grounded: C was slower, more memory-hungry, and more abstracted from how hardware actually worked. Real systems programmers wanted to control every instruction, every register, every memory access. That level of control genuinely does produce more efficient code.
Unix proved that the trade-off was worth it anyway. Written in C, it could run on multiple architectures. Portability and shared codebases overwhelmed the performance costs—especially as compilers got smarter and hardware got faster. By the 1980s, writing operating systems in assembly was the exception. The skeptics had been right about the costs and wrong about whether they were disqualifying.
The pattern held: higher abstraction enabled portability, and portability won.[3][4]
Brooks Was Right—Just Not in the Way Usually Cited
During this transition, Fred Brooks documented something worth keeping in mind. His 1975 The Mythical Man-Month showed that productivity gains from new abstractions materialise more slowly than promised.[5] His later essay "No Silver Bullet" (1987) sharpens the point: abstractions reduce accidental complexity (tooling friction) but cannot touch essential complexity (the inherent difficulty of the problem itself).[6] That distinction will matter throughout this series. Part 2 examines it in full.
1989–1995: The Scripting Era
C dominated systems programming, but it was still too unforgiving for a growing class of problems. Manual memory management meant a single pointer error could crash your program—or create a security vulnerability that wouldn't surface for years.
Between 1989, when work on Python began, and 1995, a new generation of languages arrived: Python (1991), PHP (1994), Ruby (1995), and JavaScript (1995). They offered automatic memory management, dynamic typing, and interpreted execution. The backlash was predictable. These languages were toys. Too slow for real applications. Too loose with types to build reliable systems.
The web proved them wrong—not by eliminating the trade-offs, but by changing the question being asked. Suddenly, developer productivity mattered more than raw performance. Rapid iteration mattered more than perfect optimisation. Instagram, early Twitter, Netflix—major systems ran on Django, Rails, and Node.js.
The pattern accelerated: convenience and velocity trumped control and efficiency.[7]
One caveat worth noting: grouping Python, Ruby, JavaScript, and PHP as a single "scripting era" flattens real differences in design philosophy. They're connected less by shared architecture than by the economic context that made them all viable—cheap hardware, the web, and a massive new market of developers who weren't systems programmers.
The Abstraction Ladder: 70 Years of Programming Evolution

Each layer traded directness for scale — the fifth is underway, not yet complete:

- Assembly (1950s): symbolic instructions replace hardware wiring
- C (1970s): portable code across architectures
- Scripting (1990s): automatic memory management, rapid iteration — Python, Ruby, JS, PHP
- AI-assisted coding (2020s): LLMs generate source code from descriptions — Copilot, GPT-4, Claude
- Neural compilation (2024–): AI generates LLVM-IR and assembly directly — Meta LLM Compiler, 546B-token training, 77% of autotuning potential achieved
The Pattern:
- New abstraction emerges
- Experts resist (too slow, too limiting)
- Economics shift (productivity gains compound)
- Democratisation wins (more people can build)
- Previous layer becomes legacy
Note: The democratisation effect is strongest from the scripting era onward — C narrowed access relative to contemporary hobbyist environments before eventually expanding it.
Research Benchmark
Meta LLM Compiler (Cummins et al., 2024) — trained on 546B tokens of LLVM-IR and assembly — achieved 77% of the optimising potential of traditional autotuning search. Direct C-to-assembly generation has been demonstrated at research scale (Zhang et al., EMNLP 2024). Production-grade reliability remains an open problem.
The Pattern Recognition
Look at what happened at each layer:
Assembly (1950s): Expanded programming from electrical engineers to mathematicians and logicians. The competitive advantage was speed of software iteration over hardware redesign.
High-level languages (1970s): Expanded to computer science graduates, eventually to self-taught developers. Portable systems, faster development cycles. Note: C initially narrowed access compared to the BASIC and hobbyist environments of the same era—the democratisation story is strongest from the scripting era onward.
Scripting languages (1990s): Massively expanded—bootcamps, online courses, hobbyists. Rapid-iteration web services, data analysis pipelines, automation tools. Speed to market, ability to pivot quickly.
Every time, the pattern was identical: new abstraction hides complexity → experts resist → economics shift → democratisation wins → previous layer becomes legacy. Three times. In a row. Over seventy years.
2021: AI Learned to Code
GitHub Copilot launched in technical preview in 2021, with full public release in June 2022, but the underlying shift began with GPT-3 (Brown et al., 2020)[8]—the first demonstration that large language models could generate meaningful programs across multiple languages, not just autocomplete simple functions. The transformer architecture that made this possible was introduced in "Attention Is All You Need" (Vaswani et al., 2017).[9]
The skepticism followed its usual script: AI code is buggy, lacks architectural judgment, doesn't understand context. Some of that remains true—AI coding tools make confident errors in ways human programmers typically don't. But the productivity gains are real, documented, and compounding.
The pattern continued: AI abstracted away syntax mastery, and value shifted toward problem definition and system design.
The Fifth Step: This Is No Longer Just a Theory
Large language models can generate LLVM intermediate representation—the layer between high-level code and the machine code your processor actually executes. In research settings, they've demonstrated the ability to generate assembly directly from source code. The logical extension is obvious: if AI can go from natural language to assembly, why maintain Python or C as intermediary steps at all?
The field even has a name: Neural Compilation. And it now has a serious empirical benchmark.
In 2024, Meta released LLM Compiler—a model trained on 546 billion tokens of LLVM-IR and assembly code. It achieved 77% of the optimising potential of traditional autotuning search.[10] Separately, Zhang et al. (EMNLP 2024) demonstrated direct C-to-x86 assembly generation, bypassing the traditional compiler pipeline entirely.[11]
This isn't speculation. The research exists, it's peer-reviewed, and it's from serious institutions. The question is what it honestly shows—and where the gap to production currently sits.
What the Research Also Shows
The same papers that demonstrate the capability are clear about its limits. Current models are largely confined to simple programs. The failure modes are specific and persistent: instruction errors, invalid register usage, incorrect symbol handling, and memory access faults that would crash production systems. On complex real-world code, compilation success rates remain low.[11]
Most significantly: matching the output quality of mature compilers like clang at -O3 remains a substantial challenge. Consistently surpassing them is a greater hurdle still.[10]
The honest summary: Neural Compilation is real, active, and further along than most people outside the field realise. It is not yet a replacement for the compiler toolchains running the world's production software. Research-grade proof of concept and production-grade reliability are separated by a gap that has swallowed more than one promising technology.
Why This Step Is Different from All the Others
Every previous abstraction layer added something while removing something else. Assembly added symbolic programming, removed direct hardware control. C added portability, removed architecture-specific optimisation. Scripting languages added productivity, removed performance predictability. AI coding added accessibility, removed the need for syntax expertise.
But humans could still read and understand what was happening at each layer. You could look at assembly and see what the CPU would do. Read C and understand the algorithm. Debug Python and trace the logic. The entire history of software tooling—debuggers, profilers, security audits, code review—assumes a human can open the hood.
Binary code generated directly by AI removes that entirely. This isn't just another rung on the abstraction ladder. It's potentially the removal of the ladder itself.
The implications aren't trivial: security auditing, regulatory compliance, debugging production failures, and the basic ability to understand what your systems are doing all depend on some form of human-readable source code existing somewhere in the chain. Whether the economics will eventually overwhelm those concerns—as they did at every prior layer—is a genuine question. The research shows we are moving in that direction. It does not show we have arrived.
What Seventy Years Actually Tell Us
The skeptics at every transition were right about the costs. They were wrong to assume those costs would be disqualifying. That's the real lesson—not that abstraction always wins, but that the costs of abstraction have consistently been smaller than the costs of not abstracting.
We now have evidence that the fifth transition has begun. Meta's LLM Compiler, the Neural Compilation research, the direct assembly generation experiments—these are published, peer-reviewed results showing that AI systems can operate at the level of machine code with meaningful competence.
What we don't yet have is evidence that the gap to production has closed, that the auditability problem has been solved, or that the economics favour eliminating source code for the kinds of software enterprises actually depend on. The historical pattern is a strong prior. The new research makes it stronger. Neither is a guarantee.
The abstraction ladder took 70 years to climb. The research suggests AI may be able to skip it. Whether that transition follows the historical pattern or stalls at the research stage is the question this series will examine.
This is Part 1 of 7: The Last Abstraction — What Happens When AI Skips the Source Code
Next: Part 2 — What We Gave Up at Each Layer
Referenced Readings
- [1] "Code: The Hidden Language of Computer Hardware and Software" by Charles Petzold (2000) — Traces the transition from physical switches to symbolic abstractions, revealing exactly what hardware intimacy meant and how assembly created the first layer of separation.
- [2] "Hackers: Heroes of the Computer Revolution" by Steven Levy (1984) — Captures the cultural dimension: hardware hackers saw themselves as artists working directly with the medium. Assembly felt like working through an intermediary.
- [3] "The Rise of Worse is Better" by Richard P. Gabriel (1991) — Explains the paradox: C won because it sacrificed optimal performance. The ability to write portable code beat architecture-specific optimisation in the marketplace.
- [4] "The Development of the C Language" by Dennis Ritchie (1993) — First-hand account of the conscious trade-offs made in C's design: simplicity and portability over optimal performance.
- [5] "The Mythical Man-Month" by Fred Brooks (1975) — Primarily about team coordination and schedule estimation during large software projects. Useful for understanding why productivity gains from abstraction materialise more slowly than promised — but not the primary source for abstraction trade-off analysis.
- [6] "No Silver Bullet — Essence and Accident in Software Engineering" by Fred Brooks (1987) — The correct reference for evaluating abstraction trade-offs. Introduces the essential vs. accidental complexity distinction: new abstractions reduce accidental complexity (tooling friction) but cannot reduce essential complexity (the inherent difficulty of the problem).
- [7] "The Soul of a New Machine" by Tracy Kidder (1981) — Captures the engineering culture that valued explicit resource control and took pride in understanding exactly what code did and what it cost.
- [8] "Language Models are Few-Shot Learners" (GPT-3) by Brown et al. (2020) — Demonstrates LLM capabilities for code generation across multiple languages, marking the shift from autocomplete to generative programming assistance.
- [9] "Attention Is All You Need" by Vaswani et al. (2017) — The transformer architecture paper that underpins modern LLMs, including AI coding tools. Relevant to the AI-assisted coding section; not connected to the 1990s scripting language era.
- [10] "Meta Large Language Model Compiler (LLM Compiler)" by Cummins et al., Meta (2024) — Trained on 546B tokens of LLVM-IR and assembly code. Achieved 77% of the optimising potential of traditional autotuning search, and demonstrated disassembly round-trip from x86_64 and ARM assembly back into LLVM-IR. The benchmark that puts Neural Compilation on firm empirical footing.
- [11] "Towards AI-Native Software Development: C-to-Assembly Generation via LLM" by Zhang et al. (Findings of EMNLP 2024) — Studied direct source-to-assembly translation, bypassing traditional compiler pipelines. Documents both the capability and its current limits: models handle simple programs but face persistent failures with register allocation, symbol handling, and memory access on complex real-world code. Note: published in Findings of EMNLP, not the main proceedings.
