AI Models Locked: CSET's AI Workforce Classifier Hits 91.67% Accuracy
Top AI models are now 'too dangerous to release.' Armed with a classifier that tested at 91.67% accuracy, CSET analyzed a database of 448M job postings to expose how governments miscount real AI talent.
Two simultaneous shifts just redefined the global AI race. Leading AI companies are restricting access to their most powerful systems — including GPT-Rosalind and Claude Mythos — labeling them "too dangerous to release." At the same time, Georgetown's Center for Security and Emerging Technology (CSET) published a landmark analysis revealing that most governments cannot accurately count how many people are even capable of building these systems.
The combination is more than ironic. If frontier models are locked away, the real source of AI power becomes the human talent that can build new ones — yet today's workforce measurement tools routinely miscount that talent by millions of people.
AI Models Locked Behind a Safety Wall
CSET Research Fellow Steph Batalis contributed expert commentary to a TIME magazine investigation documenting how "too dangerous to release" is becoming standard practice at top AI labs. The models involved include GPT-Rosalind (OpenAI's restricted frontier system) and Claude Mythos (Anthropic's withheld capability tier) — both held back due to dual-use risk concerns.
Dual-use, the term for technologies that can serve both legitimate and harmful purposes, is the central concern. In the AI context, the specific risks are cybersecurity attacks, where AI can automate and scale offensive operations, and biological research, where capable AI could lower the barrier to designing dangerous pathogens.
Batalis described the evidence asymmetry directly: "We know that people want to, and do, commit cyber attacks. We just don't have that same sample size with the biological risks." The cybersecurity threat is observable and measurable. The biological threat is potentially catastrophic but historically undersampled — which makes it harder to calibrate policy, even though the potential consequences are larger.
The result is a stratified access model now emerging across the industry: publicly available models, gated research-access tiers, and a locked category that stays inside the lab. That third tier is expanding quietly — and it raises a pointed question: if access to the best AI systems is restricted, what determines which nation can build the next generation? Follow our AI automation news for ongoing coverage.
448 Million Job Postings — One Critical Measurement Flaw
The answer, in theory, is talent. The nation that can train and retain the most people capable of building frontier AI systems will eventually outpace the rest. But there is a compounding problem: current workforce measurement tools cannot reliably count those people.
CSET drew on the Lightcast database, a commercial labor market intelligence service that tracks skills mentioned in online job postings, containing 448 million postings from 2010 through 2024. Researchers found that prevailing AI workforce definitions systematically overcount or miscount the actual developer pool.
Two dominant counting approaches exist in the field:
- Occupation-based counting — used by Stanford HAI, the Brookings Institution, and the OECD (Organisation for Economic Co-operation and Development, a 38-nation group that coordinates global economic policy): counts workers by their job title category using the U.S. Standard Occupational Classification (SOC), the federal government's official taxonomy of job types. This aligns with government datasets but inherently lags behind real-world technological change — new job types take years to appear in the SOC.
- Skills-based counting — used by Lightcast and referenced by groups like Stanford HAI and Brookings in supplementary analysis: flags anyone whose job listing mentions AI-related skills. Responsive to change, but vulnerable to keyword inflation. When "ChatGPT" became mainstream in 2023, AI skill mentions spiked dramatically across postings in every industry — inflating AI headcounts without adding a single new AI developer to the actual workforce.
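To see why keyword counting inflates so easily, consider a minimal Python sketch. The keyword list and postings below are invented for illustration (Lightcast's real taxonomy is far larger), but the failure mode is the same: a single mention flips the flag, so a sales posting that name-drops ChatGPT counts exactly like a machine learning engineer role.

```python
# A minimal sketch of naive skills-based counting. AI_KEYWORDS and the
# sample postings are illustrative assumptions, not Lightcast's taxonomy.
AI_KEYWORDS = {"machine learning", "chatgpt", "pytorch", "llm", "neural network"}

def is_ai_posting(posting_text: str) -> bool:
    """Flag a posting as 'AI work' if any AI-related keyword appears."""
    text = posting_text.lower()
    return any(keyword in text for keyword in AI_KEYWORDS)

postings = [
    "ML Engineer: train and deploy neural network models in PyTorch",
    "Marketing Manager: use ChatGPT to draft campaign copy",  # tool adopter
    "Sales Associate: familiarity with ChatGPT a plus",       # keyword inflation
]

# All three postings get flagged, but only one describes an AI developer.
ai_count = sum(is_ai_posting(p) for p in postings)
print(f"Naive skills-based count: {ai_count} of {len(postings)}")  # -> 3 of 3
```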
Both approaches also suffer from ghost jobs — postings that HR departments publish with no real intent to fill, used instead to build candidate pipelines, benchmark compensation data, or project organizational confidence to investors. Ghost job inflation silently skews every metric built on raw posting counts.
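Deduplication heuristics can partially correct for ghost-job inflation. The sketch below collapses near-identical postings from the same employer into a single opening; the normalization and the one-copy cap are assumptions made for illustration, not CSET's or Lightcast's actual cleaning method.

```python
from collections import Counter

def dedupe_postings(postings: list[tuple[str, str]], max_copies: int = 1) -> int:
    """Count postings after collapsing repeated (employer, text) pairs.

    Normalizes case and whitespace so trivially re-posted listings match.
    """
    copies = Counter(
        (employer, " ".join(text.lower().split()))
        for employer, text in postings
    )
    return sum(min(n, max_copies) for n in copies.values())

postings = [
    ("Acme Corp", "Data Engineer - build ETL pipelines"),
    ("Acme Corp", "Data Engineer - build ETL pipelines"),      # re-posted monthly
    ("Acme Corp", "Data Engineer  -  build ETL   pipelines"),  # spacing tweaked
    ("Beta Inc",  "ML Engineer - fine-tune language models"),
]

print(dedupe_postings(postings))  # -> 2, not the raw count of 4
```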
Adding another layer of risk: the Lightcast Skills Taxonomy (the commercial classification framework underlying skills-based analysis) has what researchers call vendor opacity — the company does not fully disclose how it categorizes and updates its skill definitions, limiting independent reproducibility.
The AI Workforce Three-Tier Framework That Rewrites the Count
CSET's response was to build a new classifier — an AI model trained to evaluate whether a role substantively requires AI-specific knowledge, rather than simply scanning for keyword mentions. Tested on Lightcast data, the system achieved 91.67% accuracy.
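CSET has not published the classifier's internals in this summary, but the evaluation pattern it describes is standard supervised text classification. The sketch below, with invented postings and labels, shows the general shape of that setup: train a text model on labeled examples, then score accuracy on held-out ones. It is not CSET's actual model, features, or data.

```python
# A hypothetical evaluation setup for a posting classifier. All postings
# and labels here are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

train_texts = [
    "design and train large language models",             # builder
    "build ML training infrastructure on GPU clusters",   # builder
    "evaluate model safety before deployment",            # builder
    "research novel neural network architectures",        # builder
    "use ChatGPT to draft marketing copy",                # not a builder
    "administer cloud databases for an AI product team",  # not a builder
    "process insurance claims with document tools",       # not a builder
    "manage the sales pipeline in a CRM",                 # not a builder
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = AI system builder

test_texts = [
    "fine-tune and deploy large language models",
    "write social posts with AI writing assistants",
]
test_labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)
print(f"held-out accuracy: {accuracy_score(test_labels, clf.predict(test_texts)):.2%}")
```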
The underlying framework separates the AI labor market into three overlapping groups that previous analyses routinely collapsed into one number:
- AI system builders: Roles directly involved in designing, training, and deploying AI models — researchers, machine learning engineers, training infrastructure specialists, AI safety evaluators. This is the group CSET's definition is built to isolate and count precisely.
- AI tool adopters: Workers who use AI in daily workflows but are not building the underlying systems. A marketing strategist using Claude to draft copy. A radiologist using an AI screening tool to flag anomalies. Valuable contributors — but not AI developers by any meaningful technical definition.
- AI-exposed workers: People whose jobs are being changed or displaced by AI-enabled automation, whether or not they personally interact with any AI tool. Factory supervisors whose scheduling tasks are being restructured by AI systems. Claims processors whose workflows are being automated by document analysis tools.
CSET's concrete example cuts through the ambiguity: a project manager on a technical AI team counts as AI development work — the role requires understanding AI development cycles, model limitations, and technical tradeoffs in ways that are substantively specific to the field. A cloud database administrator does not count — even if that person works at an AI company, supports AI infrastructure, and lists "familiarity with ML pipelines" on their resume.
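In code, that distinction looks less like keyword matching and more like a judgment about what the role actually requires. The sketch below is a hypothetical rule of thumb, not CSET's published criteria, and it forces each role into a single tier for simplicity even though the framework treats the three groups as overlapping.

```python
from enum import Enum

class Tier(Enum):
    BUILDER = "AI system builder"
    ADOPTER = "AI tool adopter"
    EXPOSED = "AI-exposed worker"

# Hypothetical predicates; the real judgment is whether the role
# substantively requires AI-specific knowledge, not whether AI keywords
# appear or the employer happens to be an AI company.
def classify_role(requires_ai_expertise: bool,
                  uses_ai_tools: bool,
                  workflow_changed_by_ai: bool) -> Tier | None:
    if requires_ai_expertise:    # e.g., a PM on a technical AI team
        return Tier.BUILDER
    if uses_ai_tools:            # e.g., a marketer drafting copy with Claude
        return Tier.ADOPTER
    if workflow_changed_by_ai:   # e.g., a claims processor
        return Tier.EXPOSED
    return None                  # outside the AI labor market entirely

# The project manager on an AI team counts as a builder; the cloud database
# administrator at an AI company does not, even with ML-pipeline familiarity
# on the resume.
print(classify_role(True, False, False))   # Tier.BUILDER
print(classify_role(False, False, False))  # None
```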
That distinction could shift a national AI workforce count by hundreds of thousands of people — which explains why countries measuring the same underlying workforce arrive at wildly different totals depending on which methodology they apply.
Why Miscounting AI Talent Is a Strategy Problem
The stakes of getting this wrong are policy-concrete. U.S. immigration programs for high-skilled technology workers — H-1B and O-1A visa pathways that the AI industry depends on for international talent acquisition — are calibrated using workforce supply estimates that may systematically misrepresent the actual developer pool.
Federal workforce training grants, congressional testimony on AI competitiveness, and bilateral technology agreements all rely on headline AI employment figures. If those figures overcount, the U.S. overestimates its development capacity and under-invests in the pipeline that actually builds frontier systems. If they undercount, policymakers underestimate the talent shortage and restrict immigration precisely when more builders are most needed.
CSET's parallel work on China's military AI adoption and talent pipeline adds a geopolitical dimension: if the U.S. cannot reliably measure its own AI builder population, comparing relative capacity against a peer competitor becomes even less reliable. The measurement problem is, at a structural level, a national strategy problem.
The research community recognized the significance. CSET's related paper, Keeping Top AI Talent in the United States, earned 125 upvotes and 105 comments on Hacker News — substantial engagement for a policy research publication, indicating that practitioners and engineers — not just academics — found the methodology directly relevant to their work.
For anyone working in technology today: the next time your company, your government, or a think tank announces an AI workforce initiative, the first useful question is which definition of "AI work" they are using — and whether it separates the people who build AI from the people who use it. Most initiatives don't. CSET has now given the field the tools to understand these distinctions and demand more precise answers. Start asking the question.