Presidential Unsealing & Reporting System for UAP Encounters · Release 01 · Cleared 2026-05-08

PURSUE: Full-Spectrum Analysis

A computational forensic audit of every document released in PURSUE Release 01 — Department of War mission reports, NASA debriefings, State Department cables, and the FBI 62-HQ-83894 vault file. 116 PDFs · 4,147 pages · 16 sections of analysis below.
FILE DoW-PURSUE-01 · GENERATED · PUBLIC DOMAIN · 17 U.S.C. § 105

Corpus overview

Documents
Pages
Bytes
Unsearchable %
UAP / UFO hits
Chars extracted

By agency

By scan class

By text source

Anomaly — the unsearchable majority

87% of pages have no OCR text layer

Of the 4,147 pages in this release, 87% are stored as image-only PDFs with no embedded text. The Department of War cleared these for release but did not run optical character recognition over them, so the corpus is not text-searchable by the public without independent OCR. The single largest unsearchable block is the FBI 62-HQ-83894 case file (~1,800 pages, 1947–1968).
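The unsearchable share can be reproduced from per-page extracted character counts. The following is a minimal Python sketch, assuming a page counts as image-only when its extracted text falls below a threshold; the function name, the `min_chars` threshold, and the sample counts are illustrative assumptions, not the tooling used for this analysis.

```python
def unsearchable_share(page_chars, min_chars=1):
    """Return (count, percent) of pages whose extracted text is shorter than min_chars.

    page_chars: one integer per page, the number of characters a text
    extractor returned for that page (0 for an image-only scan).
    """
    if not page_chars:
        return 0, 0.0
    empty = sum(1 for n in page_chars if n < min_chars)
    return empty, 100.0 * empty / len(page_chars)


# Illustrative input only: 10 pages, 8 of them image-only.
pages = [0, 0, 0, 1520, 0, 0, 0, 0, 980, 0]
count, pct = unsearchable_share(pages)
print(f"{count} of {len(pages)} pages unsearchable ({pct:.0f}%)")
```

Raising `min_chars` would also flag pages whose text layer holds only a stamp or a page number, which may or may not be the desired behavior.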

Unsearchable docs by agency

Largest unsearchable documents

Agency Pages File

Temporal pattern

Incidents by year (corrected)

Incidents by decade

35 of 47 dated incidents (74%) fall in the 2020s — the corpus skews heavily modern despite spanning eight decades of file dates.
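The decade share quoted above is simple arithmetic over the dated incidents. A minimal Python sketch, assuming incident years are already available as integers; the sample year list is an illustrative placeholder, not the corpus data.

```python
from collections import Counter


def decade_counts(years):
    """Count incidents per decade, keyed by the decade's first year (e.g. 2020)."""
    return Counter((y // 10) * 10 for y in years)


def percent(part, total):
    """Whole-number percentage, rounded to the nearest integer."""
    return round(100 * part / total)


# Placeholder years for illustration only.
sample_years = [1947, 1994, 2020, 2023, 2023, 2024]
print(dict(decade_counts(sample_years)))
print(percent(35, 47))  # the 35-of-47 share reported in the text: 74
```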

Year × Agency stack (incident year)

Release lag · years from incident to release

Below 5 years: contemporaneous case. 50+ years: historical FBI/State case. Negative lag (incident dated after release): unexplained.
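The lag buckets above can be expressed in a few lines. This is a sketch only: the release year comes from the page header (cleared 2026-05-08), while the "intermediate" label for lags between 5 and 49 years and the wording of the negative bucket are assumptions added for completeness.

```python
RELEASE_YEAR = 2026  # from the release header: Cleared 2026-05-08


def release_lag(incident_year, release_year=RELEASE_YEAR):
    """Years elapsed between the incident and its public release."""
    return release_year - incident_year


def lag_bucket(lag):
    """Map a lag in years onto the buckets described in the text."""
    if lag < 0:
        return "negative (check the incident date)"  # assumed handling
    if lag < 5:
        return "contemporaneous"
    if lag >= 50:
        return "historical"
    return "intermediate"  # assumed label; the text names no middle bucket


for year in (2023, 1994, 1947):
    lag = release_lag(year)
    print(year, lag, lag_bucket(lag))
```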

Hour-of-day clustering · UAP encounter Zulu times

Histogram of all HHMMZ times appearing in OCR text. Peaks at 00Z–05Z (CENTCOM late-night sorties), 15Z (afternoon EST), and 19Z–22Z (early evening).
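A histogram like this can be built with a single regular expression. The sketch below assumes times appear as standalone four-digit tokens followed by `Z`; the pattern and the sample sentence are illustrative assumptions, and longer date-time groups (for example a six-digit day-plus-time stamp) are deliberately not matched.

```python
import re
from collections import Counter

# HH in 00-23, MM in 00-59, then a literal Z, bounded on both sides.
ZULU = re.compile(r"\b([01]\d|2[0-3])([0-5]\d)Z\b")


def zulu_hour_histogram(text):
    """Count HHMMZ tokens in text, grouped by hour (0-23)."""
    return Counter(int(hour) for hour, _minute in ZULU.findall(text))


# Hypothetical sample text, not a quote from the corpus.
sample = "Observed 0215Z, lost at 0247Z; second contact 1930Z. Ref 9999Z ignored."
print(sorted(zulu_hour_histogram(sample).items()))
```

Because the pattern validates hour and minute ranges, stray four-digit numbers followed by `Z` that cannot be clock times are dropped rather than counted.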

Original declassification calendar

Target declassification years stamped on the original SECRET markings (e.g. Declassification Date: 20470506). The 2045–2049 cluster represents documents that were originally not scheduled for declassification until 20+ years from now — released early by directive.

The "early release" finding

19 documents in the textual subset carry an original declassification target between 2045 and 2049. They have been released in 2026, between 19 and 23 years before the original classification authority intended. The same person, MG Richard A. Harrison, USCENTCOM Chief of Staff, signed 84 of the declassifications in this release.
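The early-release arithmetic follows directly from the stamped dates. A minimal Python sketch, assuming the stamp always takes the form shown in the example (`Declassification Date: YYYYMMDD`); the function names are hypothetical and real OCR output may need a more tolerant pattern.

```python
import re
from datetime import date

DECLASS = re.compile(r"Declassification Date:\s*(\d{4})(\d{2})(\d{2})")
RELEASED = date(2026, 5, 8)  # clearance date from the release header


def declass_targets(text):
    """Extract every valid declassification target date stamped in text."""
    targets = []
    for y, m, d in DECLASS.findall(text):
        try:
            targets.append(date(int(y), int(m), int(d)))
        except ValueError:
            continue  # skip impossible dates, typically OCR noise
    return targets


def years_early(target, released=RELEASED):
    """Whole calendar years between release and the original target."""
    return target.year - released.year


for target in declass_targets("Declassification Date: 20470506"):
    print(target.isoformat(), years_early(target))
```

With targets in 2045 and 2049 the same function returns 19 and 23, matching the range quoted in the text.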

Geographic pattern

Theaters · COCOM proxy

World map · incident locations

Marker radius scales with incident count. Color-coded by theater. Coordinates are theater centroids (the source CSV provides region names, not lat/lon).

Top locations

Sensor-coverage hypothesis

UAP encounters in this dataset are distributed exactly along the US active-sensor footprint:

  • CENTCOM littorals — Iraq, Syria, Persian Gulf, Strait of Hormuz, Gulf of Aden, Arabian Sea, UAE, Mediterranean, Djibouti
  • Domestic Western US — the FBI helicopter-IR cluster, Sept & Dec 2025
  • EUCOM forward edge — Greece (Aegean approaches)
  • INDOPACOM — East China Sea, Japan
  • Space — Apollo lunar surface, Skylab LEO, three NASA-era Moon entries

Conclusion: this dataset is not "where UAPs are." It is "where the United States has cameras." Independent observation infrastructure — civilian, foreign, distributed — is not represented.

NHI lexicon — concept density

By category

Top terms

Coverage by document

How many text-bearing documents touch each category at least once.

Term cloud (TF-IDF weighted)

Phenomenology vs Kinematics gap

Phenomenology terms (UAP, orb, metallic, transmedium) dominate the lexicon; "UAP" alone appears 503 times. Performance descriptors (hypersonic, instantaneous-acceleration, right-angle turn, hovering, no-visible-propulsion, antigravity) collectively appear fewer than a dozen times, and witness/investigation language is also thin. The textual subset records that something was observed but does not describe how it moved or who saw it. The descriptive content lives in narrow form-cells or in the un-OCR'd FBI scans.

Classification & FOIA exemption analysis

Classification markings

Caveats

FOIA exemptions invoked

E.O. 13526 §1.4 sub-paragraph distribution

§1.4(a) covers "military plans, weapons systems, or operations." §1.4(c) "intelligence sources/methods, cryptology." §1.4(g) "national security vulnerabilities."

Declassification authorities · OCR-name cluster

Variant Hits

Variants are OCR transcription errors of the same name — the cluster confirms a single person signed 100+ declassifications.
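One way to group OCR variants of a name is fuzzy string matching. The sketch below uses Python's standard `difflib.SequenceMatcher`; the 0.85 threshold is an assumption, and the misspelled variants are invented for illustration rather than taken from the corpus.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """True when two strings are near-identical, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster_variants(names, canonical, threshold=0.85):
    """Return the names that plausibly transcribe the canonical spelling."""
    return [n for n in names if similar(n, canonical, threshold)]


# Hypothetical OCR readings; only the canonical spelling comes from the text.
candidates = [
    "Richard A. Harrison",
    "Richard A. Harrlson",
    "Rlchard A. Harrison",
    "John Q. Public",
]
print(cluster_variants(candidates, "Richard A. Harrison"))
```

A single-character substitution in a 19-character name still scores about 0.95, so the threshold leaves room for two or three OCR errors before a variant is rejected.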

The (b)(6) signature · personnel-privacy redaction dominates

The most common FOIA exemption in this corpus is (b)(6), personnel privacy. It outnumbers (b)(1), classified national security information, by roughly 3:1. The bulk of redaction in the OCR'd subset blacks out operator names and unit-identifying details, not phenomenological content. Names of pilots and operations centers are excised; the descriptions of sightings that survive are largely intact.
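Exemption tallies of this kind reduce to counting `(b)(n)` markers in the extracted text. A minimal Python sketch, assuming the markers survive OCR in that form, possibly with a space between the two parenthesised parts; the sample string is hypothetical.

```python
import re
from collections import Counter

# Matches (b)(1), (b)(6), and spaced forms such as "(b) (6)".
EXEMPTION = re.compile(r"\(b\)\s*\((\d)\)")


def exemption_counts(text):
    """Count FOIA exemption markers by subsection number."""
    return Counter(f"(b)({digit})" for digit in EXEMPTION.findall(text))


# Hypothetical sample text, not a quote from the corpus.
sample = "Pilot (b)(6) of (b)(6) reported; sensor (b)(1) withheld. Unit (b) (6)."
counts = exemption_counts(sample)
print(counts.most_common())
```

Markers with a trailing sub-letter, such as `(b)(7)(C)`, would be counted under their numeric subsection only.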

Operational metadata — what the forms tell us

Named operations

Combatant Commands

MAJCOMs / Numbered AFs

Operations Centers

Mission types

Domain & asset breakdown

This is a CENTCOM ISR data set

USCENTCOM is mentioned an order of magnitude more often than any other COCOM. Operation INHERENT RESOLVE (the anti-ISIS coalition op in Iraq/Syria) appears in 9 docs. The 609th Combined Air Operations Center (CENTCOM's AOC) also appears repeatedly. Mission Type "ISR" (Intelligence, Surveillance, Reconnaissance) dominates, followed by Aerial Reconnaissance (AREC) and Defensive Counter-Air (DCA). Reading: the modern subset of this release is a curated slice of CENTCOM ISR sortie reports, filtered for UAP observations during regular ops.

Redaction signal

Top documents by redaction marker count

Title Agency Pages Markers Density /1k

Redaction codes by agency · who's hiding what

Stacked: which FOIA exemption code dominates within each agency's textual subset.

Document structure

Page count distribution

File size vs pages

Per-page char density (largest 25 docs)

Each row = a document. Each column = one page. Brightness = chars on that page. Black = unsearchable scan; cyan = native text.

Cross-document similarity network

TF-IDF cosine similarity ≥ 0.05 · 616 edges

Legend: Department of War · FBI · NASA · State · Other

Force-directed graph of document-to-document TF-IDF cosine similarity. The dense central cluster is the DOW mission-report family: they share the same Mission Data / Aircraft Callsign / Time On Station template.
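The edge list behind such a graph can be sketched with the standard library alone. The version below is an illustration under stated assumptions, not the pipeline used here: tokenization is a bare lowercase-letters regex, the IDF is smoothed as `log(N/df) + 1`, and the three sample documents are invented.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """One sparse {term: weight} dict per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # assumed smoothing
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_edges(docs, threshold=0.05):
    """Return (i, j, similarity) for every document pair at or above threshold."""
    vecs = tfidf_vectors(docs)
    edges = []
    for i, j in combinations(range(len(vecs)), 2):
        s = cosine(vecs[i], vecs[j])
        if s >= threshold:
            edges.append((i, j, s))
    return edges


# Invented sample documents: two share a form template, one does not.
docs = [
    "Mission Data Aircraft Callsign Time On Station",
    "Mission Data Aircraft Callsign Time On Station UAP observed",
    "airline captain cable embassy",
]
for i, j, s in similarity_edges(docs):
    print(i, j, round(s, 3))
```

A document with no extractable text yields an empty vector and a similarity of 0.0 to everything, so it never contributes an edge.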

Top similarity pairs

Sim A B

Outliers — documents that resemble nothing else

Documents with the lowest maximum cosine similarity to anything else in the textual corpus. Score 0.000 = no shared vocabulary above the noise floor — usually because the document has no extractable text (FBI scans). Among the textual outliers are the rare narrative reports and historic State-cable cases.
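The outlier ranking can be derived from any pairwise similarity matrix. A minimal Python sketch, assuming the matrix is already computed as a square list of lists; the function names, titles, and values are placeholders.

```python
def max_similarity(sim):
    """For each document, the highest similarity to any other document.

    sim is a square matrix (list of lists); the diagonal is ignored.
    """
    return [
        max((s for j, s in enumerate(row) if j != i), default=0.0)
        for i, row in enumerate(sim)
    ]


def outliers(titles, sim, k=3):
    """The k documents that resemble nothing else, lowest score first."""
    return sorted(zip(max_similarity(sim), titles))[:k]


# Placeholder titles and similarities for illustration only.
titles = ["Report A", "Report B", "Scan C"]
sim = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(outliers(titles, sim, k=1))
```

Scans with no text layer score exactly 0.0 here, so in practice it helps to separate them from genuine textual outliers before reading the ranking.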

Max sim Title Agency Chars

PDF toolchain forensics — who scanned what

Creator software

Producer software

Tooling fingerprint

The dominant Producer is Adobe Acrobat (32-bit) 26 Paper Capture Plug-in (30 docs) — Adobe's built-in OCR engine. The Creator field reveals the original capture device: HP 9100C Digital Sender (a turn-of-millennium HP enterprise scanner/copier — these are documents that were scanned by old hardware, then OCR'd by Adobe). A handful of docs were generated on macOS 26.4 (Build 25E246) Quartz PDFContext, indicating that some documents in this release were re-saved or assembled on contemporary Apple hardware before publication.

Distinctive vocabulary & recurring entities

High-IDF terms

Term Count Docs IDF

Recurring capitalized phrases

Entity Hits Docs

NHI assessment — physics, biologics, intelligent control

Physics-anomaly hits
NHI / biologics references
Docs assessed

Headline finding · this release does not contain disclosure-era language

Across 1.16 million characters of OCR'd text spanning 91 documents, the assessment scan finds:

  • Zero hits for "non-human intelligence" or "non-human biologics"
  • Zero hits for "crash retrieval", "reverse engineer*", "unknown alloy", or "recovered (craft|material|debris)"
  • Zero hits for "biological samples", "specimen", "cattle/animal mutilation", or "abduction"
  • Two total hits for kinematic-anomaly language ("multiple 90 degree turns" — at an estimated 80 mph)
  • One hit for "extraterrestrial" — a 1994 State Department cable quoting one airline captain's personal opinion

The recent disclosure-era vocabulary popularized by Congressional testimony (Grusch, Mellon, Elizondo) is completely absent from this PURSUE Release 01 textual subset. What is present is structured ISR sortie data — sensor logs of small, slow, brief contacts that operators routinely classified as "Benign" with "Intelligent Control: NO".

Physics-defying patterns scanned

Class Rationale Hits

The scanner looks for ten classes of motion that defy known mechanics. Only "right-angle turns" returned hits — and inspection of the underlying narrative (Greece, October 2023) shows the object was an 80-mph circular target observed for 3 minutes — well within consumer-drone capability.

NHI / biologics patterns scanned

Class Hits

Eleven classes of explicit NHI / biologics / crash-retrieval / reverse-engineering language, scanned across the textual corpus. Only "extraterrestrial" matched — a single instance, in a 1994 cable.
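A class-based scan of this kind is a dictionary of regular expressions applied to the corpus text. The sketch below shows six of the patterns quoted in the headline finding; the class names, the exact regex spellings, and the sample sentence are assumptions, and the full eleven-class list is not reproduced here.

```python
import re

# Patterns adapted from the search terms quoted in the headline finding.
NHI_PATTERNS = {
    "non-human intelligence": r"non-human intelligence",
    "non-human biologics": r"non-human biologics",
    "crash retrieval": r"crash retrieval",
    "reverse engineering": r"reverse engineer\w*",
    "recovered material": r"recovered (?:craft|material|debris)",
    "extraterrestrial": r"extraterrestrial",
}


def scan(text, patterns=NHI_PATTERNS):
    """Return {class name: hit count}, matching case-insensitively."""
    return {
        name: len(re.findall(pattern, text, flags=re.IGNORECASE))
        for name, pattern in patterns.items()
    }


# Hypothetical sentence, not a quote from the 1994 cable.
hits = scan("The captain said the object might be Extraterrestrial.")
print({name: n for name, n in hits.items() if n})
```

Exact-phrase patterns like these miss OCR-damaged spellings, so a zero count over scanned text is weaker evidence than a zero count over native text.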

The two physics-anomaly verbatim quotes

The single NHI verbatim quote

Documents ranked by anomaly score

Composite (0–1) of physics-anomaly density and explicit NHI presence. Almost every document scores zero — the data is overwhelmingly mundane in content.

Score Phys NHI Title Agency

Documents ranked by credibility score

Composite (0–1) of unique-sensor count, sensor-mention density, witness language, classification rigour, narrative density, and source-agency standing. High-credibility docs are the templated DOW mission reports — well-formatted but content-sparse.

Score Title Agency Sens Sent

Truthfulness probability · AI honest read

Treating each document as a claim and weighing source / sensor count / classification rigour / corroboration:

  • What is highly probably true: the dated DOW mission reports — small flying objects of unknown origin were observed by US ISR aircraft in CENTCOM airspace 2020–2024. These are signed by named officers, classified SECRET//REL TO USA, FVEY, sourced from multi-sensor platforms (FLIR + radar + visual), and processed through 609th AOC standard channels. The events themselves probably occurred as reported.
  • What is probably true: the FBI 1947–1968 vault file documents real public reports of unidentified phenomena — but those reports are themselves of mixed reliability (the file contains civilian tip letters alongside investigative records).
  • What the data does not establish: that any of these encounters involved a non-human craft, non-human intelligence, or a phenomenon outside known physics. The descriptions in the textual subset — small slow circular objects observed for minutes, often classified "Benign" by the operator — are consistent with drones, sensor artifacts, balloons, birds, atmospheric phenomena, and foreign ISR. The release does not adjudicate "what they were." It releases the fact that they were observed.
  • What might still be hidden: the un-OCR'd FBI scans (1,800 pages) cannot be assessed by this scan. (b)(6) personnel-privacy redactions outnumber (b)(1) classified-info redactions ~3:1, suggesting the bulk of redaction is operator names, not phenomenology. But content this scan cannot see remains an unresolved gap.

Document explorer — sort the entire archive


Title Agency Pages Bytes Chars Source Scan Incident Location

Synthesis — what an AI sees

Asymmetric epistemic surface

Two epistemic regimes coexist. The DOW tranche is OCR-clean, structured, modern, and template-uniform. The FBI tranche is a 1,800-page photocopy bundle without OCR. Same release, two completely different surfaces — one machine-readable, one not. A search engine sees one. The other has to be reconstructed by hand. The 87% page-coverage gap is the single largest finding in this analysis.

Form-shaped content

DOW mission reports are not narratives — they are intelligence forms. Same fields, same headings, often same exemption blocks. Cross-document cosine similarities above 0.85 are common. The textual subset's vocabulary is dominated by field labels, not observations. Actual descriptive content is short, embedded in narrow "Description" cells.

Sensor map, not phenomena map

Plotting incident locations against US theater command coverage, the dataset traces a sensor-presence map: CENTCOM littorals, Aegean approaches, INDOPACOM perimeter, plus a domestic Western-US infrared cluster from helicopter-borne FBI assets. UAP encounters appear precisely where the US has cameras, and not elsewhere.

Performance vacuum

UAP appears 503 times. Hypersonic / instantaneous-acceleration / right-angle / hovering / no-propulsion / antigravity descriptors collectively appear fewer than a dozen times. Either the kinematics live in the un-OCR'd FBI scans, or they have been redacted out, or this release has only declassified the fact of sighting and not its physical character.

Early-release directive trace

19 documents bear original declassification target dates between 2045 and 2049; they have been released in 2026, on average 21 years early. The same officer signed off on 100+ declassifications in this release, with OCR variants of his name forming a forensic cluster. The directive force behind this release was real, applied uniformly, and is visible in the metadata.

Personnel privacy > phenomenology

(b)(6) personnel-privacy redactions outnumber (b)(1) classified-info redactions. The bulk of black ink in this release covers the names of operators and units, not the descriptions of what they saw. To the extent that this release withholds, it withholds who saw the UAPs more than what they were.

What an OCR pass would unlock

Independent OCR over the FBI 62-HQ-83894 sections (~1,800 pages) would roughly triple the textual corpus and bring the 1947–1968 era — currently invisible to text analysis — into the same analytical frame as the 2020–2024 mission reports. This is the single highest-leverage downstream task this release leaves on the table.