The Definitional Problem
Timeline forecasts for AGI span from 2027 to never, yet forecasters are often looking at identical data. The disagreement is less about technology than about what question is being asked. "AGI" has been used to mean everything from "a model that reliably books a dinner reservation" to "a system that ends human mortality." Until the target is fixed, the timeline is meaningless.
Three Thresholds That Actually Matter
The most productive framing separates three distinct and increasingly demanding capability thresholds, each with a genuinely different expected timeline:
| Threshold | Definition | Also called | Central estimate |
|---|---|---|---|
| Digital AGI | Human-level on all screen-based cognitive tasks — email, code, law, analysis — without the jagged failure modes of current models | White-collar AGI, Proto-AGI | 2031–2034 |
| Transformative AGI | AI deployment that precipitates economic restructuring on the scale of the Industrial Revolution — accelerating R&D, reshaping labor markets, compressing decades of scientific progress | TAI, Economic AGI | 2038–2048 |
| Full HLMI | Unaided machines that accomplish every task better and more cheaply than human workers across all domains, including physical, scientific, and creative work | High-level machine intelligence | 2043–2050+ |
Table 1. The three capability thresholds used throughout this analysis. The large variance in public AGI timelines traces almost entirely to conflation of these levels — not genuine disagreement about the pace of technical progress.
The DeepMind tiered framework classifies systems from Level 1, "Emerging" (comparable to or slightly better than an unskilled adult), through Level 2, "Competent" (50th percentile of skilled adults), and Level 3, "Expert" (90th percentile), up to Level 4, "Virtuoso" (99th percentile), and Level 5, "Superhuman." As of early 2026, frontier models sit firmly in the "Competent" band generally, while reaching "Expert" or "Virtuoso" in narrow domains like Python or mathematical translation. Digital AGI corresponds to reaching Level 3 ("Expert") across all cognitive domains; Full HLMI corresponds to Level 5.
Switching the expert elicitation from "all tasks" to "all occupations fully automatable" shifts the 50th-percentile forecast from 2047 to 2116 in the ESPAI 2023 survey — a nearly 70-year gap produced by definitional choice alone, with no technological disagreement involved.
The Forecasting Record
The Expert Survey on Progress in AI (ESPAI 2023) — the largest systematic elicitation of AI researchers to date, with 1,714 respondents — implies a 10% probability of HLMI by 2027 and a 50% probability by 2047. Metaculus community forecasters (1,800 participants) place "weakly general AI" at February 2028 and "general AI" at July 2033. Professional superforecasters are more conservative still: a 43-member panel assigned just 12% probability to AGI by 2043 and 40% by 2070.
These estimates must all be read against a critical empirical corrective: a retrospective analysis of 2022–2025 AI benchmark milestones found that both domain experts and superforecasters systematically underestimated realized AI progress. Math-competition-level performance arrived earlier than median forecasts predicted. ARC-AGI-1 scores jumped from under 10% to over 90% in under two years. The raw survey medians have therefore been adjusted upward throughout this analysis to account for this documented bias.
The probability distributions in Section 4 incorporate an upward correction from raw survey anchors — accounting for documented underestimation bias in near-term AI forecasting, real-time 2025–2026 benchmark trajectories, and the definitional correction mapping "Digital AGI" to the lower end of the HLMI distribution. The result is more optimistic than raw surveys and more conservative than frontier lab leadership.
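One simple way to make such an adjustment concrete is to compress the years-remaining horizon by a multiplicative bias factor. The sketch below is illustrative only — the 0.8 factor and the 2026 reference year are assumptions for demonstration, not the parameters actually used in this analysis:

```python
def adjust_forecast(survey_year, reference_year=2026, bias_factor=0.8):
    """Pull a survey median earlier by compressing the years-remaining
    horizon by a bias factor (< 1 means forecasters ran too slow).
    All parameter values here are illustrative assumptions."""
    years_remaining = survey_year - reference_year
    return reference_year + bias_factor * years_remaining

# ESPAI 2023 median for HLMI: 2047 -> adjusted to ~2043
adjusted = adjust_forecast(2047)
```

A bias factor of 0.8 pulls the 2047 survey median to roughly 2043, in line with the Full HLMI central estimate in Table 1.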
Three Scenarios
- Fast takeoff: Recursive self-improvement compounds faster than infrastructure friction. Agentic systems displace research engineers by Q4 2026, triggering a capability explosion.
- Central case: Digital AGI arrives this decade. Transformative AGI follows in the 2038–2048 window as infrastructure and algorithmic efficiency co-evolve.
- Long slog: Economic bubble dynamics, regulatory intervention, or hard capability ceilings push broad-domain AGI well past mid-century.
Fast Takeoff: Technically Coherent, Industrially Fragile
The accelerationist thesis rests on a specific and falsifiable mechanistic claim: that current models are already accelerating AI research via code generation and synthetic data loops, creating a recursive self-improvement dynamic that will compound rapidly. The evidence is not speculative. Benchmark progress on ARC-AGI-1 traces a trajectory from under 10% in 2023 to over 90% by early 2026 — consistent with superlinear acceleration in abstract reasoning. Anthropic and OpenAI report measurable engineering productivity gains from agentic coding systems. The recursive loop is real and already running.
But three hard constraints prevent a confident 2027 central estimate. The 19 GW gap between projected AI data center demand and available US grid capacity is a physical constraint — nuclear plants and transmission lines do not move at the speed of software. Inference costs for reasoning-first architectures are orders of magnitude higher than prior-generation LLMs, creating an economic viability ceiling for mass deployment. And prediction markets, which have real money at stake, assign only 9–10% probability to AGI by 2027 — pricing in exactly the industrial friction that optimistic statements tend to elide.
The fast-takeoff scenario carries approximately 10–15% probability. The most important leading indicators: ARC-AGI-3 results in March 2026, nuclear compute campus groundbreakings, and whether agentic systems demonstrably replace — rather than merely assist — research engineers by Q4 2026.
The Central Case: 2029–2035 for Digital AGI
The institutional scenario is where the evidence is most convergent. The SWE-bench trajectory, rising from 2% in 2023 to roughly 40% in 2026, METR's documented doubling of long-horizon agent task completion every seven months, and Epoch AI's Capabilities Index extrapolations collectively suggest that reliable software engineering will be achieved within this decade. However, the move from roughly 40% to 90% and beyond on SWE-bench is not just curve extrapolation. It represents a transition from stochastic generation to high-reliability agency.
The "last mile" is fundamentally about integration, context-window stability, deployment tooling, and cost compression. Reaching the "five nines" level of reliability required to remove human oversight entirely means solving complex multi-system coordination problems, preventing context drift over long execution chains, and reducing inference costs enough to make autonomous agents economically viable at scale. These are difficult challenges, but they are engineering and systems challenges that appear to scale along relatively predictable trajectories.
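The seven-month doubling of agent task horizons noted above lends itself to simple extrapolation. A hedged sketch, assuming the trend continues unchanged — a strong assumption — with illustrative starting and target horizons:

```python
import math

def months_until_horizon(current_hours, target_hours, doubling_months=7):
    """Months until the agent task horizon reaches the target length,
    assuming the METR-style doubling trend continues unchanged.
    The horizons passed in are illustrative assumptions."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_months

# e.g. from a ~4-hour horizon to a ~160-working-hour (one-month) task:
# log2(40) ~ 5.3 doublings, so roughly 37 months on-trend
m = months_until_horizon(4, 160)
```

On these assumptions the one-month-task threshold lands around three years out — consistent with the 2029–2035 central window, provided the doubling trend holds.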
For Transformative AGI, the strongest central estimate remains in the 2038–2048 window. This range aligns with inside-view transformative AI models at the 50th percentile, the Metaculus community median, and expert survey medians once definitional framing artifacts are removed. Even if Digital AGI emerges in a lab by the early 2030s, the constraint shifts from raw capability to absorption capacity.
At this stage, the bottleneck is not model performance alone. It is physical infrastructure buildout, institutional adaptation, regulatory friction, and deployment into domains where feedback loops are slow and capital intensive, such as biology, physical science, and governance. Transformative impact depends as much on how quickly societies integrate these systems as on when they are first demonstrated.
The Long Slog: Non-Negligible and Underweighted
A one-third probability on much later timelines is not negligible. Scenarios that support it include a severe AI investment bubble correction that materially slows frontier training runs, regulatory responses triggered by a major AI-attributed biosecurity or cybersecurity incident, or model collapse in non-verifiable domains that prevents generalization beyond benchmarkable tasks. It also includes the possibility, raised by researchers such as Demis Hassabis and Yann LeCun, that moving from large-scale pattern matching to genuine hypothesis generation in physical science requires architectural innovations not yet visible in the current scaling-law paradigm.
Professional superforecasters provide a calibrated external signal for this slower trajectory. Their estimates, roughly 40 percent by 2070 and 60 percent by 2100, reflect historical base rates for transformative technology transitions rather than laboratory benchmarks. Their conservatism is not dismissal of progress but an expression of uncertainty grounded in long-run technological diffusion dynamics.
The Rolling Singularity
Treating AGI as a single binary event is the most common and costly analytical mistake in this domain. Intelligence is not a monolith — it is a collection of capabilities with vastly different data requirements, feedback-loop speeds, and physical-world dependencies. The more accurate picture is a rising tide: domains being submerged sequentially, over years and decades, each reaching human-level parity on its own schedule.
Software engineering is already at or near the waterline. Knowledge work — law, financial analysis, research summarization — will follow within a few years. Scientific discovery, which requires physical experiments and access to failed-experiment data that barely exists in training corpora, lags by a decade. Physical robotics lags by two decades, governed by Moravec's Paradox: reasoning is easy, walking is hard. Full labor automation — what survey respondents mean when they say "all occupations" — may be a mid-century phenomenon at best, contingent on social and institutional changes orthogonal to model capability.
| Domain | Estimated Range | Confidence | Primary Constraints |
|---|---|---|---|
| Software Engineering | 2026–2028 | High | Integration reliability, security edge-cases, long-context consistency |
| Knowledge Work | 2028–2033 | Moderate-High | Regulatory acceptance, liability frameworks, specialised domain data |
| Scientific Discovery | 2030–2038 | Moderate | Wet-lab feedback loops, hypothesis generation taste, failed-experiment data |
| Physical Robotics | 2033–2045 | Low-Moderate | Moravec's Paradox, embodied training data scarcity, hardware reliability |
| Full Labor Automation | 2045–2116+ | Low / Very Uncertain | Social, institutional, regulatory, and physical-world complexity |
Table 2. Domain-sequential AGI arrival estimates. The wide variance in public AGI forecasts largely reflects different domains being forecast — not genuine technological disagreement within any single domain.
Probability Distributions
The following distributions are constructed by triangulating across large expert surveys (ESPAI 2022–2023, n > 1,700), professional superforecaster panels (Good Judgment, n = 43), community prediction markets (Metaculus, n ≈ 1,800), and compute-centric inside-view models. Raw survey anchors have been adjusted upward to account for the documented underestimation of near-term AI benchmark progress. These distributions should be treated as informed probability estimates, not predictions.
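One standard way to triangulate across heterogeneous sources is a weighted linear opinion pool over their cumulative probabilities. The sketch below uses made-up probabilities and weights, not the actual values behind the tables that follow:

```python
def linear_pool(cdfs, weights):
    """Weighted average of cumulative probabilities for one year, pooled
    across sources (e.g. expert survey, superforecasters, prediction
    market). Inputs here are illustrative, not this analysis's data."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, cdfs)) / total

# Hypothetical P(Digital AGI by 2035) from three sources,
# with the market-based estimate weighted somewhat more heavily
p = linear_pool(cdfs=[0.55, 0.30, 0.45], weights=[1.0, 1.0, 1.5])
```

Linear pooling preserves each source's calibration better than averaging point estimates of arrival years, since it aggregates probabilities rather than dates.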
Percentile Summary
| Threshold | 10th pct — Best Case | 50th pct — Median | 90th pct — Upper Tail |
|---|---|---|---|
| Digital AGI | ~2028 | ~2031–2034 | ~2042 |
| Transformative AGI | ~2029 | ~2038–2045 | ~2070–2120 |
| Full HLMI | ~2029 | ~2043–2050 | ~2120–2200 |
Table 3. Percentile summary across thresholds. The clustering of 10th-percentile estimates around 2028–2029 reflects broad agreement that arrival before the late 2020s carries only about a 10% probability; the divergence in the upper tails reflects genuine uncertainty about physical-world constraints and institutional barriers.
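Percentile anchors like those in Table 3 can be turned into a full cumulative curve by fitting a simple parametric form. The sketch below fits a lognormal over years-after-2026 to the 10th/50th/90th percentiles; the lognormal choice and the reference year are modeling assumptions for illustration, not the method used to build the tables:

```python
import math
from statistics import NormalDist

def cdf_from_percentiles(p10, p50, p90, ref=2026):
    """Fit a lognormal over years-after-ref to three percentile anchors
    (averaging the two implied sigmas) and return P(arrival by year).
    A simple illustrative model, not this analysis's actual method."""
    mu = math.log(p50 - ref)
    z90 = NormalDist().inv_cdf(0.9)  # ~1.2816
    sigma = ((math.log(p90 - ref) - mu) + (mu - math.log(p10 - ref))) / (2 * z90)
    return lambda year: NormalDist(mu, sigma).cdf(math.log(year - ref))

# Digital AGI anchors from Table 3: ~2028 / ~2032 / ~2042
p_by = cdf_from_percentiles(2028, 2032, 2042)
```

The right-skew of the lognormal matches the shape of the tables: tightly clustered early percentiles with long upper tails.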
Cumulative Probability by Year
(Charts omitted: cumulative probability curves by year for Digital AGI and Transformative AGI.)
Key Drivers & Watchpoints
AGI is not a single measurable event but a cluster of capabilities: broad generalization, robust autonomy over long tasks, and economically meaningful performance across many domains. The most informative signals over the next five years are less about any individual model release and more about whether compounding trends continue across three simultaneous bottlenecks: scaling inputs, scaling efficiency, and scaling autonomy.
The operative question has shifted from "Can the model do X?" to "Can the model do X reliably, without human intervention, at a cost lower than a human expert?" That shift in framing, from capability to reliability and agency, defines the 2026–2031 evaluation era.
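The "at a cost lower than a human expert" test reduces to a break-even comparison. A sketch with purely hypothetical token counts, prices, and wage rates:

```python
def agent_cheaper_than_human(tokens_per_task, price_per_mtok,
                             human_hours, human_hourly_rate):
    """True when an agent's inference bill for a task undercuts the
    human expert's time cost. All figures passed in are hypothetical."""
    agent_cost = tokens_per_task / 1_000_000 * price_per_mtok
    return agent_cost < human_hours * human_hourly_rate

# e.g. 5M tokens at $15/Mtok ($75) vs. 4 expert hours at $150/hr ($600)
agent_cheaper_than_human(5_000_000, 15.0, 4, 150.0)
```

The comparison is sensitive to token consumption: reasoning-first agents that burn an order of magnitude more tokens per task can flip the inequality, which is exactly the economic viability ceiling flagged in the fast-takeoff discussion.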
The Geopolitics of Intelligence
AGI development is no longer a unipolar US-led endeavor. Nations are aggressively building independent intelligence infrastructure. The UAE's G42 is deploying toward 5 GW of capacity, while South Korea, India, and Vietnam mobilize sovereign AI backbones. This "Intelligence Grid" dynamic broadens the investment base but intensifies export control risks. The winners will not just be those with the best models, but those who can secure the energy, supply chains, and sovereign partnerships required to run them at gigawatt scale.
Critical Watchpoints (2026–2031)
The "evangelism" era is over; the evaluation era has begun. The highest-signal sources for the next five years include Epoch AI (compute thresholds), the IEA (data center energy demand), METR (autonomous task horizons), and supply chain disclosures from TSMC and SK Hynix. Confidence in timelines relies on triangulation, combining scaling inputs, algorithmic efficiency, and physical constraints, rather than on any single lab's announcements.
Conclusion
The disagreement between those who predict AGI by 2027 and those who predict it after 2060 has always been, at its core, a definitional disagreement — one side forecasting knowledge work, the other forecasting all occupations, all domains, all physical tasks. Fixing that definitional problem yields a cleaner picture.
Software engineering is already at or near the human-level threshold. Knowledge work follows before 2035, and scientific discovery in the 2030s. Physical robotics and full labor automation trail by a decade or more, governed by the rate at which physical-world feedback loops can be sped up — and those loops are not governed by Moore's Law.
The most important implication is not the specific years but the epistemic posture they imply: plan for the median scenario, monitor actively for the leading indicators in Section 5, and resist both the complacency of long-timeline conservatism and the recklessness of fast-takeoff certainty. Forecasters have consistently underestimated near-term AI progress. That error should make us faster to update on new evidence, not slower.
The ship is heavy. The friction of the real world — power grids, supply chains, economic cycles, regulatory bodies — is a genuine governor of speed. But the ship is moving, and the current is strong. The horizon is visible. The only honest disagreement is about how long it takes to reach it.