Futures of Intelligence February 2026 Vol. IV

The Most Likely Timeline
to AGI

Timelines, drivers, and bottlenecks on the path to artificial general intelligence — grounded in empirical evidence and calibrated probability

Central Estimate — Digital AGI: 2031–34 (knowledge work & software engineering)
Transformative AGI Median: 2038–48 (economic-revolution scale)
Fast Takeoff (by 2027): ~12% (technically plausible, industrially fragile)
Abstract

Most debates about AGI timelines are not technological disagreements — they are definitional ones. When expert surveys switch elicitation from "all cognitive tasks" to "all occupations fully automatable," the median forecast shifts by roughly seventy years. This paper resolves that framing problem by reasoning across three distinct capability thresholds: Digital AGI (knowledge work and software), Transformative AGI (industrial-revolution-scale disruption), and Full HLMI (all tasks, all domains). Drawing on benchmark trajectories, large-scale expert surveys, prediction market data, and hard physical infrastructure constraints verified as of early 2026, the analysis concludes that Digital AGI most likely arrives between 2030 and 2035, and Transformative AGI between 2038 and 2048. Neither the 2027 fast-takeoff scenario nor the post-2060 sceptical scenario receives the strongest evidential support. The singularity is better understood as a rolling tide — domains falling sequentially, not simultaneously.

01

The Definitional Problem

Timeline forecasts for AGI span from 2027 to never, yet forecasters are often looking at identical data. The disagreement is less about technology than about what question is being asked. "AGI" has been used to mean everything from "a model that reliably books a dinner reservation" to "a system that ends human mortality." Until the target is fixed, the timeline is meaningless.

Three Thresholds That Actually Matter

The most productive framing separates three distinct and increasingly demanding capability thresholds, each with a genuinely different expected timeline:

| Threshold | Definition | Also called | Central estimate |
| --- | --- | --- | --- |
| Digital AGI | Human-level on all screen-based cognitive tasks — email, code, law, analysis — without the jagged failure modes of current models | White-collar AGI, Proto-AGI | 2031–2034 |
| Transformative AGI | AI deployment that precipitates economic restructuring on the scale of the Industrial Revolution — accelerating R&D, reshaping labor markets, compressing decades of scientific progress | TAI, Economic AGI | 2038–2048 |
| Full HLMI | Unaided machines that accomplish every task better and more cheaply than human workers across all domains, including physical, scientific, and creative work | High-level machine intelligence | 2043–2050+ |

Table 1. The three capability thresholds used throughout this analysis. The large variance in public AGI timelines traces almost entirely to conflation of these levels — not genuine disagreement about the pace of technical progress.

The DeepMind tiered framework classifies systems from Level 1 "Emerging" (slightly better than an unskilled adult) through Level 3 "Expert" (90th percentile) to Level 5 "Superhuman." As of early 2026, frontier models sit firmly in the Level 2 "Competent" band overall, while reaching "Expert" or even "Virtuoso" performance in narrow domains such as Python coding or mathematical translation. Digital AGI corresponds to reaching Level 3 (Expert) across all cognitive domains; Full HLMI corresponds to Level 5.

In the ESPAI 2023 survey, switching the expert elicitation from "all tasks" to "all occupations fully automatable" shifts the 50th-percentile forecast from 2047 to 2116 — a roughly 70-year gap produced by definitional choice alone, with no change in the underlying technical evidence.

The Forecasting Record

The Expert Survey on Progress in AI (ESPAI 2023) — the largest systematic elicitation of AI researchers to date, with 1,714 respondents — implies a 10% probability of HLMI by 2027 and a 50% probability by 2047. Metaculus community forecasters (1,800 participants) place "weakly general AI" at February 2028 and "general AI" at July 2033. Professional superforecasters are more conservative still: a 43-member panel assigned just 12% probability to AGI by 2043 and 40% by 2070.

These estimates must all be read against a critical empirical corrective: a retrospective analysis of 2022–2025 AI benchmark milestones found that both domain experts and superforecasters systematically underestimated realized AI progress. Math-competition-level performance arrived earlier than median forecasts predicted. ARC-AGI-1 scores jumped from under 10% to over 90% in under two years. The raw survey medians have therefore been adjusted upward throughout this analysis to account for this documented bias.

Methodological Note

The probability distributions in Section 4 incorporate an upward correction from raw survey anchors — accounting for documented underestimation bias in near-term AI forecasting, real-time 2025–2026 benchmark trajectories, and the definitional correction mapping "Digital AGI" to the lower end of the HLMI distribution. The result is more optimistic than raw surveys and more conservative than frontier lab leadership.

02

Three Scenarios

Accelerationist: The Fast Takeoff (2026–27, probability ~12%)

Recursive self-improvement compounds faster than infrastructure friction. Agentic systems displace research engineers by Q4 2026, triggering a capability explosion.

Central Case: The Institutional Scenario (2029–35, probability ~55%)

Digital AGI arrives this decade. Transformative AGI follows in the 2038–48 window as infrastructure and algorithmic efficiency co-evolve.

Sceptical: The Long Slog (2045+, probability ~33%)

Economic bubble dynamics, regulatory intervention, or hard capability ceilings push broad-domain AGI well past mid-century.

Fast Takeoff: Technically Coherent, Industrially Fragile

The accelerationist thesis rests on a specific and falsifiable mechanistic claim: that current models are already accelerating AI research via code generation and synthetic data loops, creating a recursive self-improvement dynamic that will compound rapidly. The evidence is not speculative. Benchmark progress on ARC-AGI-1 traces a trajectory from under 10% in 2023 to over 90% by early 2026 — consistent with superlinear acceleration in abstract reasoning. Anthropic and OpenAI report measurable engineering productivity gains from agentic coding systems. The recursive loop is real and already running.

But three hard constraints prevent a confident 2027 central estimate. The 19 GW gap between projected AI data center demand and available US grid capacity is a physical constraint — nuclear plants and transmission lines do not move at the speed of software. Inference costs for reasoning-first architectures are orders of magnitude higher than prior-generation LLMs, creating an economic viability ceiling for mass deployment. And prediction markets, which have real money at stake, assign only 9–10% probability to AGI by 2027 — pricing in exactly the industrial friction that optimistic statements tend to elide.

Scenario Assessment

The fast-takeoff scenario carries approximately 10–15% probability. The most important leading indicators: ARC-AGI-3 results in March 2026, nuclear compute campus groundbreakings, and whether agentic systems demonstrably replace — rather than merely assist — research engineers by Q4 2026.

The Central Case: 2029–2035 for Digital AGI


The institutional scenario is where the evidence is most convergent. The SWE-bench trajectory, rising from 2 percent in 2023 to roughly 40 percent in 2026, METR’s documented doubling of long-horizon agent task completion every seven months, and Epoch AI’s Capabilities Index extrapolations collectively suggest that reliable software engineering will be achieved within this decade. However, the move from roughly 40 percent to 90 percent and beyond on SWE-bench is not just curve extrapolation. It represents a transition from stochastic generation to high-reliability agency.
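The compounding arithmetic behind METR's seven-month doubling trend is worth making explicit. The sketch below extrapolates an autonomous-agent task horizon under steady doubling; the one-hour baseline at the start of 2026 is an illustrative assumption, not a METR figure:

```python
import math

# Illustrative extrapolation of METR's reported ~7-month doubling of
# autonomous-agent task horizons. The 1-hour horizon at the start of
# 2026 is an assumed baseline for illustration, not a measured value.
DOUBLING_MONTHS = 7.0
BASELINE_HOURS = 1.0

def horizon_hours(months_elapsed: float) -> float:
    """Task horizon after a given number of months of steady doubling."""
    return BASELINE_HOURS * 2.0 ** (months_elapsed / DOUBLING_MONTHS)

def months_to_reach(target_hours: float) -> float:
    """Months of steady doubling needed to grow from baseline to target."""
    return DOUBLING_MONTHS * math.log2(target_hours / BASELINE_HOURS)

# A working day (8 h) and a working week (40 h) under the trend:
print(f"8 h horizon:  ~{months_to_reach(8):.0f} months")   # ~21 months
print(f"40 h horizon: ~{months_to_reach(40):.0f} months")  # ~37 months
```

Under these assumptions, day-scale autonomy lands in late 2027 and week-scale autonomy around early 2029 — sensitive, of course, to both the baseline and whether the doubling rate holds.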

The “last mile” is fundamentally about integration, context window stability, deployment tooling, and cost compression. Reaching the “five nines” level of reliability required to remove human oversight entirely means solving complex multi-system coordination problems, preventing context drift over long execution chains, and reducing inference costs enough to make autonomous agents economically viable at scale. These are difficult challenges, but they are engineering and systems challenges that appear to scale along relatively predictable trajectories.

For Transformative AGI, the strongest central estimate remains in the 2038–2048 window. This range aligns with inside-view transformative AI models at the 50th percentile, the Metaculus community median, and expert survey medians once definitional framing artifacts are removed. Even if Digital AGI emerges in a lab by the early 2030s, the constraint shifts from raw capability to absorption capacity.

At this stage, the bottleneck is not model performance alone. It is physical infrastructure buildout, institutional adaptation, regulatory friction, and deployment into domains where feedback loops are slow and capital intensive, such as biology, physical science, and governance. Transformative impact depends as much on how quickly societies integrate these systems as on when they are first demonstrated.

The Long Slog: Non-Negligible and Underweighted


A one-third probability on much later timelines is not negligible. Scenarios that support it include a severe AI investment bubble correction that materially slows frontier training runs, regulatory responses triggered by a major AI-attributed biosecurity or cybersecurity incident, or model collapse in non-verifiable domains that prevents generalisation beyond benchmarkable tasks. It also includes the possibility, raised by researchers such as Demis Hassabis and Yann LeCun, that moving from large-scale pattern matching to genuine hypothesis generation in physical science requires architectural innovations not yet visible in the current scaling-law paradigm.

Professional superforecasters provide a calibrated external signal for this slower trajectory. Their estimates, roughly 40 percent by 2070 and 60 percent by 2100, reflect historical base rates for transformative technology transitions rather than laboratory benchmarks. Their conservatism is not dismissal of progress but an expression of uncertainty grounded in long-run technological diffusion dynamics.

03

The Rolling Singularity

Treating AGI as a single binary event is the most common and costly analytical mistake in this domain. Intelligence is not a monolith — it is a collection of capabilities with vastly different data requirements, feedback-loop speeds, and physical-world dependencies. The more accurate picture is a rising tide: domains being submerged sequentially, over years and decades, each reaching human-level parity on its own schedule.

Software engineering is already at or near the waterline. Knowledge work — law, financial analysis, research summarisation — will follow within a few years. Scientific discovery, which requires physical experiments and access to failed-experiment data that barely exists in training corpora, lags by a decade. Physical robotics lags by two decades, governed by Moravec's Paradox: reasoning is easy, walking is hard. Full labor automation — what survey respondents mean when they say "all occupations" — may be a mid-century phenomenon at best, contingent on social and institutional changes orthogonal to model capability.

Domain-Level AGI Arrival Estimates
Expected range for human-level parity by capability area
| Domain | Estimated Range | Confidence | Primary Constraints |
| --- | --- | --- | --- |
| Software Engineering | 2026–2028 | High | Integration reliability, security edge-cases, long-context consistency |
| Knowledge Work | 2028–2033 | Moderate-High | Regulatory acceptance, liability frameworks, specialised domain data |
| Scientific Discovery | 2030–2038 | Moderate | Wet-lab feedback loops, hypothesis-generation taste, failed-experiment data |
| Physical Robotics | 2033–2045 | Low-Moderate | Moravec's Paradox, embodied training-data scarcity, hardware reliability |
| Full Labor Automation | 2045–2116+ | Low / Very Uncertain | Social, institutional, regulatory, and physical-world complexity |

Table 2. Domain-sequential AGI arrival estimates. The wide variance in public AGI forecasts largely reflects different domains being forecast — not genuine technological disagreement within any single domain.

04

Probability Distributions

The following distributions are constructed by triangulating across large expert surveys (ESPAI 2022–2023, n > 1,700), professional superforecaster panels (Good Judgment, n = 43), community prediction markets (Metaculus, n ≈ 1,800), and compute-centric inside-view models. Raw survey anchors have been adjusted upward to account for the documented underestimation of near-term AI benchmark progress. These distributions should be treated as informed probability estimates, not predictions.
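The triangulation described above can be sketched as a weighted mixture of source-level cumulative forecasts. The weights and per-source probabilities below are illustrative assumptions for one threshold, not the paper's actual inputs:

```python
# Minimal sketch of triangulating P(threshold achieved by year) across
# source classes. All numbers here are hypothetical placeholders chosen
# for illustration, not the figures used in this analysis.
sources = {
    "expert_survey":     {2028: 0.10, 2032: 0.25, 2040: 0.55, 2050: 0.75},
    "prediction_market": {2028: 0.15, 2032: 0.45, 2040: 0.70, 2050: 0.85},
    "superforecasters":  {2028: 0.05, 2032: 0.15, 2040: 0.35, 2050: 0.55},
}
# Assumed reliability weights (sum to 1); tuning these is where the
# documented underestimation bias would be applied in practice.
weights = {"expert_survey": 0.40, "prediction_market": 0.35,
           "superforecasters": 0.25}

def mixture_cdf(year: int) -> float:
    """Weighted average of the source CDFs at a given year."""
    return sum(weights[name] * cdf[year] for name, cdf in sources.items())

for y in (2028, 2032, 2040, 2050):
    print(y, round(mixture_cdf(y), 3))
```

A mixture of valid CDFs is itself a valid CDF (non-decreasing, bounded by 0 and 1), which is why this simple averaging is a defensible aggregation baseline before any bias corrections.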

Figure 1. Cumulative probability by year across three thresholds: P(threshold achieved by year) for Digital AGI, Transformative AGI, and Full HLMI.

Percentile Summary

| Threshold | 10th pct — Best Case | 50th pct — Median | 90th pct — Upper Tail |
| --- | --- | --- | --- |
| Digital AGI | ~2028 | ~2031–2034 | ~2042 |
| Transformative AGI | ~2029 | ~2038–2045 | ~2070–2120 |
| Full HLMI | ~2029 | ~2043–2050 | ~2120–2200 |

Table 3. Percentile summary across thresholds. Convergence of 10th-percentile estimates around 2028–2029 reflects broad agreement that meaningful general capability is unlikely to arrive sooner; divergence in the upper tails reflects genuine uncertainty about physical-world constraints and institutional barriers.

Figure 2. Probability mass accumulation: P(threshold achieved by year) for Digital AGI vs. Transformative AGI.

05

Key Drivers & Watchpoints

AGI is not a single measurable event but a cluster of capabilities: broad generalization, robust autonomy over long tasks, and economically meaningful performance across many domains. The most informative signals over the next five years are less about any individual model release and more about whether compounding trends continue across three simultaneous bottlenecks: scaling inputs, scaling efficiency, and scaling autonomy.

The operative question has shifted from "Can the model do X?" to "Can the model do X reliably, without human intervention, at a cost lower than a human expert?" That shift in framing, from capability to reliability and agency, defines the evaluation era of 2026–2031.

↑ Accelerating Factors

The Reasoning Revolution: The shift to "System 2" architectures (e.g., o-series, DeepSeek-R1) decouples performance from training scale. Smaller models given time to "think" can outperform larger models, making intelligence elastic and on-demand.

Frontier Compute Scaling: Nvidia's Vera Rubin architecture (late 2026) integrates 144 GPUs into a single rack domain. Coupled with HBM4 memory, this removes bandwidth bottlenecks, allowing clusters to operate as a unified memory space.

Recursive Self-Improvement: Anthropic reports that ~90% of Claude Code is model-generated. If "research engineer" agents demonstrably replace, rather than merely assist, human engineers, the feedback loop becomes self-sustaining.

Post-Transformer Architectures: SSMs like Mamba and "Hope" architectures enable constant memory requirements for million-token contexts. This allows agents to maintain coherent memory over days or weeks, a prerequisite for true agency.

Inference Cost Collapse: Costs dropped from $20 per million tokens (2022) to under $0.10 (2024). If frontier reasoning becomes economically viable at scale by 2027, deployment-driven feedback loops accelerate sharply.
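As a back-of-envelope check on that cost-collapse figure, a fall from $20 to $0.10 per million tokens over two years implies roughly a 14x annual decline. A minimal sketch of the arithmetic:

```python
# Implied annual cost-decline factor from the figures cited above:
# $20 per million tokens in 2022 falling to $0.10 by 2024.
start_cost = 20.0    # $/M tokens, 2022
end_cost = 0.10      # $/M tokens, 2024
years = 2.0

# Constant-rate assumption: total decline spread evenly across years.
annual_factor = (start_cost / end_cost) ** (1.0 / years)
print(round(annual_factor, 1))  # -> 14.1 (i.e., ~14x cheaper per year)
```

Whether reasoning-heavy inference follows the same curve is the open question; the decelerating factors below are the main reasons it might not.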
↓ Decelerating Factors

The Power and Grid Bottleneck: Global data center demand is projected to double to ~945 TWh by 2030. With interconnection queues stretching 3–5 years and SMRs unlikely before the 2030s, energized capacity lags behind theoretical compute potential.

Chip Supply Chain Chokepoints: HBM4 memory is reportedly sold out through 2026. Packaging capacity (CoWoS) and high-bandwidth-memory shortages will pace training expansion to supply, not willingness to spend.

Regulation & Compliance: Full enforcement of the EU AI Act begins in August 2026. Compliance burdens and potential "systemic risk" thresholds (10^25 FLOPs) could slow deployment or force region-specific fragmentation.

The Data Wall: Exhaustion of high-quality human text by 2028 poses a risk of "model collapse" if synthetic-data loops produce models that are excellent bureaucrats but poor inventors.

Geopolitics & Export Controls: Tightening controls on gate-all-around transistors and memory could bifurcate the global supply chain, leading to incompatible compute ecosystems and reduced research liquidity.

The Geopolitics of Intelligence

AGI development is no longer a unipolar US-led endeavor. Nations are aggressively building independent intelligence infrastructure. The UAE's G42 is deploying toward 5 GW of capacity, while South Korea, India, and Vietnam mobilize sovereign AI backbones. This "Intelligence Grid" dynamic broadens the investment base but intensifies export control risks. The winners will not just be those with the best models, but those who can secure the energy, supply chains, and sovereign partnerships required to run them at gigawatt scale.

Critical Watchpoints (2026–2031)

2026
GW-Scale Energization. The first major infrastructure test. Significant delays in actual vs. announced energized capacity impose a hard ceiling on compute scaling through 2028.
Aug 2026
EU AI Act Enforcement. Full enforcement for general-purpose models begins. Early actions will reveal the true compliance burden and whether it forces geographic shifts in training.
Late 2026
Agentic Displacement. The signal is displacement, not assistance. If systems run independent research pipelines without human intervention, the recursive loop has activated.
2027
Billion-Dollar Training Runs. If cost growth continues, training runs will cross the $1B threshold, likely consolidating the frontier to a very small number of actors and changing experimentation diversity.
2027-28
Vera Rubin & HBM4. Whether Rubin-generation hardware deploys on schedule, and whether HBM4 yield issues materialize, will determine the pace of inference scaling.
2028
Human Text Scarcity. If leading labs report sharply increased synthetic-data reliance without robustness regressions, the capability runway extends significantly.
2029
Day-Scale Autonomous Tasks. The single most consequential threshold for fast-takeoff: systems completing day-scale projects (debugging, refactoring) with minimal oversight.
What to Watch and Where

The "evangelism" era is over; the evaluation era has begun. The highest-signal sources for the next five years include Epoch AI (compute thresholds), IEA (data center energy demand), METR (autonomous task horizons), and supply chain disclosures from TSMC and SK Hynix. Confidence in timelines relies on triangulation, combining scaling inputs, algorithmic efficiency, and physical constraints, rather than on any single lab's announcements.

06

Conclusion

The disagreement between those who predict AGI by 2027 and those who predict it after 2060 has always been, at its core, a definitional disagreement — one side forecasting knowledge work, the other forecasting all occupations, all domains, all physical tasks. Fixing that definitional problem yields a cleaner picture.

Software engineering is already at or near the human-level threshold. Knowledge work follows before 2035, and scientific discovery in the 2030s. Physical robotics and full labor automation trail by a decade or more, governed by the rate at which physical-world feedback loops can be sped up — and those loops are not governed by Moore's Law.

Digital AGI — human-level performance on knowledge work, software engineering, and analysis — most likely arrives between 2030 and 2035. Transformative AGI — the kind that restructures economies, accelerates scientific discovery, and demands wholesale institutional adaptation — most likely arrives between 2038 and 2048. Full automation of all human labour is a mid-to-late-century phenomenon. The singularity is not a single date. It is a rising tide — and software engineering is already underwater.

The most important implication is not the specific years but the epistemic posture they imply: plan for the median scenario, monitor actively for the leading indicators in Section 5, and resist both the complacency of long-timeline conservatism and the recklessness of fast-takeoff certainty. Forecasters have consistently underestimated near-term AI progress. That error should make us faster to update on new evidence, not slower.

The ship is heavy. The friction of the real world — power grids, supply chains, economic cycles, regulatory bodies — is a genuine governor of speed. But the ship is moving, and the current is strong. The horizon is visible. The only honest disagreement is about how long it takes to reach it.

References
1. Grace et al. "Thousands of AI Authors on the Future of AI." Expert Survey on Progress in AI (ESPAI), 2023. HLMI elicitation: n = 1,714; FAOL elicitation: n = 774.
2. Metaculus community forecasts (live, as of February 2026). "Weakly general AI": median 1 Feb 2028 (1.7k forecasters); "General AI": median Jul 2033 (1.8k forecasters).
3. Good Judgment Inc. "Superforecasting AGI." March 2023. 43 professional superforecasters; median probabilities: 12% by 2043, 40% by 2070, 60% by 2100.
4. Forecasting Research Institute. "Assessing Near-Term Forecasting Accuracy in the XPT." September 2025. Documents systematic underestimation of AI benchmark milestones by both domain experts and superforecasters, 2022–mid-2025.
5. Epoch AI. Long-run training compute trend: 4.1× per year (90% CI: 3.7×–4.6×) since 2010. Epoch Capabilities Index: composite of 37 benchmarks; acceleration consistent with a step change around April 2024.
6. Cotra, A. Inside-view transformative AI timeline (2022, updated): ~15% by 2030, median ~2040, ~60% by 2050.
7. Stanford HAI. AI Index Report 2025. Training compute for notable models doubling approximately every five months; largest LLM dataset sizes doubling approximately every eight months.
8. Morris et al. "Levels of AGI: Operationalizing Progress on the Path to AGI." Google DeepMind, 2023. Five-level taxonomy from Emerging through Superhuman.
9. METR task-complexity research. Autonomous task horizon doubling approximately every 7 months as of early 2026. SWE-bench Verified: top agentic frameworks reaching 30–40% as of February 2026.
Matthew R. Wesney © 2026