M6E3: Timeline Analysis and Multi-Source Event Reconstruction

The Problem Is Structure, Not Volume

By now, you have probably accepted the premise that AI changes open-source intelligence collection and analysis in meaningful ways. The tools have improved. The pipelines are real. A modern analyst can ingest, translate, cluster, and visually map more source material in an afternoon than a team could process in a week five years ago. You have built some version of this mental model, and it is not wrong.

Here is where it breaks.

The specific problem of reconstructing a timeline from contradictory, fragmentary, multi-language OSINT is primarily a structure problem — and AI, whatever its virtues, does not reliably solve structural problems in event reconstruction. It creates new ones. The discipline required to work through conflicting timestamps, misattributed media, and deliberate disinformation cannot be reduced to processing speed or language coverage. It requires judgment about source intent, institutional behavior, and political context that no current model provides with consistent reliability. Understanding precisely where AI helps and where it fails in timeline reconstruction is the difference between producing an accurate chronology and producing a confident one that is wrong.

Timeline reconstruction is the analytic task of establishing the sequence, timing, and causal relationships of events from incomplete and often contradictory source material. Every major OSINT investigation eventually becomes a timeline problem. When did the convoy arrive? Which video was filmed first? Did the strike happen before or after the ceasefire announcement? These questions have answers in the physical world that the documentary record may obscure, and the obscuring is sometimes accidental and sometimes deliberate. The analytic discipline required to navigate that uncertainty — rigorously, reproducibly, and honestly — is what this episode is about. AI plays a specific and uneven role in that discipline, and the unevenness matters more than the capability.

Why the Source Record Is Structurally Broken Before You Even Start

Consider what the source record looks like in any fast-moving conflict event. Video clips posted to Telegram channels with no metadata and timestamps stripped by upload compression. Eyewitness accounts in three languages, each capturing a slightly different moment or vantage point. Government statements with times that may reflect when communications were drafted, not when events occurred. Social media posts whose platform timestamps are in UTC but whose subjects are operating in local time zones that the poster may themselves have confused. Wire service reports that sourced local correspondents who sourced Telegram channels that sourced the original videos. The primary record and its copies have become indistinguishable.

Chronolocation — the process of determining the specific date and time an image or video was recorded — depends on contextual clues including weather, shadows, construction, and vegetation. Each of those clues is vulnerable to manipulation and misinterpretation. A shadow angle that appears to constrain an image to late afternoon might have been cast in a courtyard where the direct sun is blocked, defeating the physics. Vegetation that suggests a spring date might belong to a climate that doesn't match the ostensible location. Construction visible in the background might place a video in a narrow window — or that construction might have been imported from a different shot in the editing suite.

The deeper problem is that even honest sources produce incompatible records. Two credible journalists, both physically present at the same event, may post video that places the same incident ninety minutes apart on the clock — one because their phone clock was misconfigured, the other because they shot footage during the event but only date-stamped it later from their notes. The ninety-minute gap is not disinformation. It is noise. But it looks exactly like deliberate timestamp manipulation until you can determine which source's internal clock to trust, and that determination requires knowledge about those sources that pure document analysis cannot provide.

Translation artifacts compound this. Temporal markers in Arabic, Russian, Farsi, and Chinese do not always resolve cleanly to Western timestamp conventions. Tenses and temporal relationships in Arabic reporting can be genuinely ambiguous about whether an event described has occurred, is in progress, or is being anticipated. A large language model translating at volume will render these ambiguities as determinate — it will choose a reading — and it will do so without flagging the choice. The analyst downstream receives what looks like a clear timestamp and has no indication that the underlying text was ambiguous. That confidence artifact is introduced silently into the analysis pipeline at the translation stage.

And then there is deliberate disinformation, which operates on the same surface as all of these innocent errors and is specifically designed to be indistinguishable from them. State-aligned media have learned that the highest-leverage disinformation is real material from a different time, place, or context, with the timestamp and location altered just enough to serve the narrative. In March 2026, as conflict with Iran intensified, a satellite image posted by an Iranian news outlet appeared to show a devastated US base in Qatar — but it was an AI-generated fake. Tehran Times, a state-aligned English daily, posted on X a "before vs. after" image claiming to show "completely destroyed" US radar equipment at a base in Qatar. It was an AI-manipulated version of a Google Earth image of a US base in Bahrain, with a subtle but detectable giveaway: a row of cars parked in identical positions in both the authentic and manipulated images.

That giveaway — the cars — was caught by human analysts doing careful comparative work. The question for practitioners is whether an AI pipeline would catch it, whether the pipeline would surface the inconsistency before a decision was made, and whether the volume of similar fabrications means that some portion of them will always pass through. The answer to all three questions is: it depends, not consistently, and yes. The Bahrain-for-Qatar substitution was discovered in part because the location in the "before" image was already geolocated in existing databases and could be compared. The fabricated satellite images followed the emergence of imposter OSINT accounts on social media designed to undermine credible digital investigators — as one researcher noted, OSINT developed as a solution using public satellite imagery to circumvent censorship, "but it's now being preyed upon by disinformation agents."

What AI Does to the Timestamp Problem

At the collection and extraction layer, AI-assisted pipelines can ingest source material across languages simultaneously, extract temporal markers from unstructured text, and convert implicit time references — "the morning after the summit," "three days before the announcement" — into relative sequence labels that can be placed in a preliminary ordering. Large language models can reconstruct a timeline of events by identifying temporal patterns and correlations within data, automatically generating a detailed chronicle of the sequence of actions. This is genuinely useful at scale. When you have four hundred Telegram posts, thirty wire reports, twenty videos, and a dozen government statements to process, having a model do the initial temporal tagging is a material improvement over manual triage. The model brings all of it to the surface. The analyst doesn't need to read Arabic and Farsi simultaneously to have a working draft of when things appeared to happen.
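
To make that first-pass ordering concrete, here is a minimal sketch, with hypothetical event names and constraints rather than a real corpus: before/after relations of the kind a model might extract from phrases like "the morning after the summit" are resolved into a draft sequence, and any contradictory cycle is surfaced rather than silently resolved.

```python
from collections import defaultdict, deque

# Hypothetical ordering constraints, of the kind extracted from source text:
# each pair (a, b) asserts that event a happened before event b.
constraints = [
    ("summit_closing", "convoy_departure"),        # "the morning after the summit"
    ("convoy_departure", "ceasefire_statement"),   # "three days before the announcement"
    ("convoy_departure", "checkpoint_video"),
    ("checkpoint_video", "strike_report"),
]

def preliminary_order(constraints):
    """Topologically sort events from before/after constraints.
    Returns (ordering, unresolved); unresolved is non-empty when the
    constraints form a cycle, i.e. the sources contradict each other."""
    successors = defaultdict(set)
    indegree = defaultdict(int)
    events = set()
    for before, after in constraints:
        events.update((before, after))
        if after not in successors[before]:
            successors[before].add(after)
            indegree[after] += 1
    queue = deque(sorted(e for e in events if indegree[e] == 0))
    ordering = []
    while queue:
        event = queue.popleft()
        ordering.append(event)
        for nxt in sorted(successors[event]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    unresolved = events - set(ordering)   # events trapped in a contradictory cycle
    return ordering, unresolved

draft, contradictions = preliminary_order(constraints)
print("draft sequence:", draft)
print("contradictory cluster:", contradictions or "none")
```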

Chronolocation assistance is another real contribution. The methodology of inferring image or video timing from sun angle and shadow length — cross-referencing shadow geometry against SunCalc (a free tool at suncalc.org that calculates solar position and shadow angles for any location and date) and Google Earth Pro measurements — has been practiced by skilled OSINT investigators for years. To determine the time of a picture, analysts need shadow length, the subject's height, the altitude angle, and the azimuth. Multimodal AI systems can now assist with the geometric calculations, measuring shadow ratios from images, querying solar position data, and narrowing the date-time window faster than a human working manually. What previously required careful step-by-step measurement work in Google Earth Pro can be scaffolded by AI into a first-pass constraint set that the analyst refines.
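
As a rough illustration of the geometry, the sketch below derives the solar altitude implied by a height-to-shadow ratio and scans a candidate day for times when the computed sun position matches it. It uses the open-source astral library as a stand-in for SunCalc's solar-position math, and the coordinates, measurements, and date are hypothetical; in practice the azimuth is what separates the morning match from the afternoon one.

```python
import math
from datetime import datetime, timedelta, timezone

from astral import Observer          # assumed dependency: pip install astral
from astral.sun import elevation     # solar altitude in degrees for a place and time

# Illustrative inputs: a 1.7 m subject casting a 2.9 m shadow in a geolocated video.
object_height_m = 1.7
shadow_length_m = 2.9
latitude, longitude = 36.19, 44.01                      # hypothetical geolocated scene
candidate_day = datetime(2026, 3, 14, tzinfo=timezone.utc)

# Shadow geometry: the solar altitude implied by the height-to-shadow ratio.
implied_altitude = math.degrees(math.atan(object_height_m / shadow_length_m))

observer = Observer(latitude=latitude, longitude=longitude)

# Scan the candidate day in 5-minute steps, keeping times whose computed solar
# altitude matches the shadow-implied altitude within a one-degree tolerance.
# (Azimuth, checked the same way, separates the morning match from the afternoon one.)
matches = []
t = candidate_day
while t < candidate_day + timedelta(days=1):
    if abs(elevation(observer, t) - implied_altitude) < 1.0:
        matches.append(t)
    t += timedelta(minutes=5)

print(f"shadow-implied solar altitude: {implied_altitude:.1f} degrees")
print("candidate capture times (UTC):", [m.strftime("%H:%M") for m in matches])
```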

The contradiction-detection function is where AI capability meets its most important limitation. A large language model ingesting a corpus of documents can be prompted to surface instances where two sources assign different timestamps to what appears to be the same event. At volume, this flagging function is valuable. The problem is that the model cannot determine whether a discrepancy is noise, an error, or disinformation. It can identify that two timestamps are ninety minutes apart. It cannot tell you whether the journalist in source A had a misconfigured phone or whether the government in source B issued a time-stamped statement retroactively. That determination requires knowing something about those specific sources — their past behavior, their institutional incentives, their track record in similar situations — that is not in the document corpus and that no general-purpose model carries with reliable precision.

Large language models can replicate portions of the verification process, but often without necessary real-world grounding. In one test, when asked when an image was taken, Google's Gemini used shadows and street activity to affirm alignment with a fabricated timestamp — failing to note that the building portrayed had been destroyed in a fire months earlier. Only when explicitly asked about the current state of the building did the model comment on the discrepancy. This is the failure mode that should concern practitioners most. The model confirmed the false timestamp not through confabulation but through competent visual analysis of a manipulated image. It did exactly what it was designed to do, and arrived at the wrong answer because it lacked contextual knowledge about the world that was not visible in the image itself. Synthetic content only needs to be plausible, not perfect — and models confidently fill in the rest, producing a false geographic and temporal certainty that appears evidentiary but is predictive.

Without localized archives or awareness of recent infrastructural changes, large language models rely on broad plausibility rather than evidence. This creates a core epistemic tension: OSINT depends on transparent, repeatable, evidence-backed verification, while AI produces probabilistic, variable, and non-auditable answers. The timeline you produce with AI assistance is faster. It is not automatically more accurate. There is a specific failure pattern where AI-assisted timelines are confidently wrong in ways that manual analysis would surface through productive doubt.

Case Study: MH17 and the Discipline of Chronological Reconstruction

The investigation into the downing of Malaysia Airlines Flight MH17 (the passenger jet shot down over eastern Ukraine on July 17, 2014) remains the most instructive OSINT timeline reconstruction in the public record — not because the methodology translates directly to contemporary AI-assisted work, but because it demonstrates precisely the kind of multi-layered temporal and evidentiary discipline that AI can scaffold but cannot replace.

On November 9, 2014, Bellingcat (the open-source investigative outlet founded by Eliot Higgins) published a report linking a Buk missile launcher filmed and photographed in eastern Ukraine on July 17 to the downing of MH17. Based primarily on social media evidence, the report detailed the launcher's movements in eastern Ukraine on that date, evidence that it originated from the 53rd Anti-Aircraft Rocket Brigade in Kursk, Russia, and convoy activity before and after July 17.

The reconstructed timeline was built from video and photographic fragments, each requiring independent temporal verification before it could be placed in the chronological sequence. The Luhansk video was critical. Buk 332 was filmed traveling east through the separatist-controlled city of Luhansk on the morning of July 18, 2014, missing one missile; intercepted communications indicated the launcher was taken into Russia shortly after. This single data point — the direction of travel, the missing missile, the morning timing — required chronolocation to establish its placement in the sequence. That placement had strategic implications: it meant the launcher had been in range of MH17's flight path the previous afternoon and was withdrawing afterward.

On July 21, 2014, the Russian Ministry of Defense presented fabricated and misleading information about MH17's flight path, radar data, and the location of the Luhansk video, and included misdated and heavily edited satellite imagery. The fabrications were structural — designed to exploit the same temporal ambiguities that make timeline reconstruction hard. The video's location was disputed. The radar data's timestamps were challenged. The satellite imagery's dates were altered. Each challenge created a new discrepancy in the evidentiary record that had to be resolved before the timeline could hold.

Russia also launched what an investigative committee of the Australian parliament called a "cold-war style disinformation campaign" that included attempts to edit MH17's Wikipedia entry and a massive astroturfing operation generating no fewer than 57,000 tweets per day flooding social media with false narratives. This was deliberate saturation — creating so many competing claims that establishing any single timestamp or location required investigators to fight through thousands of contradictory signals before reaching the evidentiary bedrock.

What Bellingcat's team did, painstakingly and over months, was apply cross-source reconciliation with explicit source credibility weighting. By comparing and crossing many different pictures, videos, and satellite images, and corroborating them with eyewitness accounts, they obtained a clear picture of what happened in the days prior to and after the incident. Each source was evaluated not just for its temporal marker but for its provenance: Who posted it? When were they in a position to have filmed it? Does the shadow angle in this video match the stated time? Does the vehicle visible in the background appear in dated photographs elsewhere in the timeline? The verification was redundant by design — every claim required at least one independent corroborating source before it entered the timeline with confidence.

Now consider what an AI-assisted version of this investigation looks like in 2026. Current-generation large language models can ingest hundreds of posts, extract temporal markers from Cyrillic and Arabic text simultaneously, flag discrepancies between claimed timestamps and solar angles, and draft a preliminary chronological ordering in hours rather than weeks. That acceleration is material. What these models cannot do is evaluate the Russian Ministry of Defense's history of using misdated satellite imagery as a disinformation technique, recognize that a particular Telegram channel was created two hours after the event and is likely astroturfing, or understand that a specific military unit's communications patterns indicate when it transitioned from radio silence to public messaging — which itself tells you something about timing. That contextual weight lives in trained human analysts who understand the political environment, the institutional actors, and the patterns of deliberate deception. The AI brings the material together. The analyst decides what it means.

Cross-Source Reconciliation: The 90-Minute Problem

There is a specific scenario that illustrates the analytical decision point better than any theoretical discussion. Two credible, independent sources place the same event ninety minutes apart on the clock. Both sources have established track records. Neither has an obvious motivation to falsify the timestamp. The gap could be explained by clock misconfiguration, by different definitions of when the "event" began versus ended, by platform upload delay, or — if you are working in a politically charged environment — by deliberate timestamp manipulation designed to create ambiguity that serves a particular narrative. How do you reconcile them?

The answer in current practice is methodical and hierarchical. AI's role is assistive rather than authoritative.

The first move is to establish whether both sources are describing the same event or adjacent ones. AI-assisted temporal clustering can help here — if you have twenty sources and two of them are ninety minutes off from the cluster of eighteen, that isolation is meaningful and can be surfaced automatically. But if the population of sources is small or the event is contested, the clustering gives you nothing.
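
A minimal sketch of that flagging step, with hypothetical sources and times: claims that sit far from the cluster median are surfaced for human review, and a corpus too small to cluster is reported as exactly that rather than forced into a resolution.

```python
from datetime import datetime, timedelta

# Hypothetical claimed times (UTC) for the same event; one source sits ~90 minutes off.
claims = {
    "wire_report_1": datetime(2026, 3, 14, 13, 5),
    "telegram_ch_a": datetime(2026, 3, 14, 13, 10),
    "telegram_ch_b": datetime(2026, 3, 14, 13, 18),
    "journalist_b":  datetime(2026, 3, 14, 14, 40),   # the 90-minute outlier
    # ... remaining sources omitted for brevity
}

def flag_timestamp_outliers(claims, threshold=timedelta(minutes=30), min_sources=5):
    """Flag claims far from the median of the cluster. Returns None when the
    source population is too small for the isolation to mean anything."""
    if len(claims) < min_sources:
        return None          # small or contested corpus: clustering gives you nothing
    ordered = sorted(claims.values())
    median_time = ordered[len(ordered) // 2]
    return {src: t for src, t in claims.items() if abs(t - median_time) > threshold}

# min_sources lowered only because the example corpus above is truncated
flagged = flag_timestamp_outliers(claims, min_sources=4)
print("flag for human review (noise, error, or disinformation?):", flagged)
```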

The second move is to interrogate the timestamps themselves against physical evidence. Using SunCalc with precise coordinates, analysts can find the exact solar position matching the shadow data in an image — if shadows in a video confirm 13:13 local time but the posted timestamp says 14:45, the physical evidence overrides the claimed timestamp. This is the hardest cross-check to defeat with deliberate disinformation, because it requires the fabricator to have correctly modeled the solar geometry at the specific location and date — a computation many disinformation operations don't bother to do. When they don't bother, the physics becomes the audit trail.
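
The decision rule can be made explicit in the pipeline. In the sketch below, with hypothetical values, a claimed timestamp that falls outside the solar-geometry window is overridden, and the override is logged rather than averaged away.

```python
from datetime import datetime, timezone

# Hypothetical outputs of a shadow-geometry cross-check (the window of times
# consistent with the imagery) and the timestamp claimed by the poster.
window_start = datetime(2026, 3, 14, 13, 5, tzinfo=timezone.utc)
window_end   = datetime(2026, 3, 14, 13, 25, tzinfo=timezone.utc)
claimed      = datetime(2026, 3, 14, 14, 45, tzinfo=timezone.utc)

if window_start <= claimed <= window_end:
    resolution = ("claimed timestamp retained", claimed)
else:
    # The physics overrides the claimed timestamp; the override is logged, not averaged away.
    resolution = ("solar window substituted for claimed timestamp", (window_start, window_end))

print(resolution)
```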

The third move is source credibility weighting, and here AI provides essentially no guidance. Credibility weighting in timeline analysis requires the analyst to make explicit judgments about institutional incentives, track records, and access. A government statement timestamped at 14:30 from a ministry that has previously issued retroactively backdated statements during military operations carries less temporal weight than a geolocated video posted in real time by an independent journalist with no stake in the outcome. These judgments are not in any document the AI has processed. They live in the analyst's knowledge of the actors. Building and maintaining that knowledge — developing what amounts to a source reputation ledger — is the institutional practice that determines whether cross-source reconciliation produces accurate timelines or confident garbage.
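
What such a ledger might look like as a working artifact is sketched below. The structure and names are hypothetical; the essential property is that every weight is an analyst's recorded judgment, with a rationale and an expiry date, rather than a score computed from the corpus.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CredibilityEntry:
    """One analyst judgment about a source's temporal reliability, with its rationale.
    The weight is assigned and reviewed by a human; it is never computed from the corpus."""
    source: str
    temporal_weight: float    # 0.0 (discard timestamps) to 1.0 (trust timestamps as posted)
    rationale: str
    assessed_by: str
    assessed_on: date
    review_by: date           # reputation knowledge is perishable

ledger = [
    CredibilityEntry("ministry_statement_feed", 0.3,
                     "documented history of retroactively backdated statements during "
                     "military operations", "analyst_7", date(2026, 3, 1), date(2026, 6, 1)),
    CredibilityEntry("independent_journalist_b", 0.8,
                     "geolocated, real-time posting record; one prior instance of clock drift",
                     "analyst_7", date(2026, 2, 20), date(2026, 5, 20)),
]

def temporal_weight(source, ledger, today):
    """Look up a source's weight; treat an expired assessment as unknown."""
    for entry in ledger:
        if entry.source == source and today <= entry.review_by:
            return entry.temporal_weight
    return None   # no current human judgment on file; do not guess
```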

The failure mode is when an AI-assisted pipeline resolves the ninety-minute discrepancy through statistical reasoning — averaging the timestamps, choosing the cluster mean, or simply deferring to the source with more corroborating posts in the corpus — without flagging that the resolution method is arbitrary. Large language models are constrained by their inability to interpret time-series causality natively. Causality and sequence in event reconstruction are related but distinct from temporal ordering, and conflating them — concluding that because event A's timestamp precedes event B's, A caused B — is the kind of reasoning error that speed and volume make more likely, not less, because the analyst is working with the AI's draft rather than building the timeline from scratch with deliberate attention to each inferential step.

Where Disinformation Targets the Workflow

The MH17 countermeasures were crude by 2026 standards. Misdated satellite imagery, edited metadata, coordinated inauthentic posting. Today's operations are more sophisticated because they understand the workflow they are trying to defeat.

Generative AI has accelerated state actors' ability to fabricate convincing satellite imagery during conflicts. During the US-Israeli military action against Iran in early 2026, Tehran Times posted imagery claiming to show destroyed US radar equipment at a base in Qatar — in fact an AI-manipulated version of a Google Earth image of a US base in Bahrain. The point is not just that the fabrication existed. The operational design is what matters. A "before vs. after" format is specifically optimized for OSINT analyst consumption — it mimics the exact format that legitimate OSINT investigators use to document battle damage assessment. The fabrication was crafted to enter the workflow at the point where analysts aggregate evidence, and it nearly succeeded because the visual giveaway — a row of cars parked in identical positions in both the authentic and manipulated images — required careful comparative analysis to detect.

Reports of fake satellite imagery created or edited with AI also followed the Russia-Ukraine conflict and the four-day war between India and Pakistan. Fabricating material that enters the OSINT pipeline at the aggregation stage, formatted to look like legitimate primary evidence, with timestamps and coordinates plausible enough to survive a first-pass automated check, has become a standard component of information operations during armed conflict. The adversary is not trying to fool the entire OSINT ecosystem. They are trying to inject one false data point into the chronological record that shifts the perceived sequence of events — who struck first, when the ceasefire was violated, which side was in the area when the casualties occurred.

The design attack on AI-assisted timelines is the plausibility attack. Even simple stylistic manipulations destabilize AI models: adding snow and fog to an image of the Sahara Desert caused several large language models to return inconsistent or incorrect geolocation guesses, including Mongolia. The adversary doesn't need to defeat the model completely. They need to introduce sufficient visual noise that the model's confidence drops to ambiguous and the analyst moves on to cleaner evidence — evidence that may itself have been curated to support the false narrative. The fabricated satellite imagery of the Qatar base was designed to enter a corpus where the real imagery was confused and contested; its goal was to shift the aggregate inference, not to be the sole evidence.

In early 2025, fact-checkers at Full Fact documented how Grok and Google Lens both misidentified AI-generated imagery purporting to show a "train attack" as authentic. The systems failed to detect the fabrication and provided contextual details like "train routes" and "station locations" that reinforced the illusion of reality. This last element is the most operationally dangerous: the model not only failed to detect the fabrication but generated corroborating detail that made the false image more convincing. The analyst who queries an AI assistant and receives what appears to be independent corroboration — "this appears to show the Kharkov-Kherson freight line near Pavlohrad" — has received AI-generated reinforcement of disinformation, sourced entirely from the model's statistical priors about what makes a credible image description.

Fabricated satellite images have been accompanied by imposter OSINT accounts on social media specifically designed to undermine credible digital investigators — establishing a meta-layer of doubt around legitimate analysis. The strategic logic is recursive: make OSINT itself appear untrustworthy, and you eliminate the verification mechanism at the same time you introduce the fabrication.

The Human Load-Bearing Members

There is a specific set of analytic functions in timeline reconstruction where human judgment is not just valuable but structurally required — where removing it doesn't slow the work down, it makes it wrong.

The first is source credibility weighting over time and context. A source that was reliable in one operational environment may be compromised, co-opted, or operating under different constraints in another. An outlet that provided accurate timeline data in 2022 may now be state-adjacent in ways that introduce systematic timestamp bias. This reputation knowledge is dynamic, context-dependent, and requires human maintenance of a source credibility ledger that no current AI system maintains with the necessary specificity. When GeoConfirmed (the volunteer-driven OSINT conflict verification project at geoconfirmed.org) validates a geolocation, that validation carries epistemic weight because it reflects accumulated human judgment about methodology and track record. No AI system can substitute for that community's earned credibility.

The second is intent inference — the most dangerous and important judgment in contested timeline analysis. When two timestamps conflict, the analyst must decide whether the gap is innocent or deliberate, and if deliberate, in whose interest. That determination is a political and psychological inference about institutional behavior. It requires asking: who benefits from this particular temporal ambiguity? Which actor needs the sequence to be uncertain? Does the pattern of claims in this corpus resemble previous deception operations by this actor, and if so, which phase of the operation are we in? These are intelligence analysis questions, not document processing questions. Large language models capable of generating reports can identify incident root causes and flag timeline anomalies — but the practical application of these models within forensic timeline tasks remains in early stages. That measured assessment, from applied research rather than marketing, is the appropriate level of confidence.

The third is recognizing the moment when the timeline itself is the target. Experienced analysts recognize that in certain operational contexts, the adversary's goal is not to persuade you that a specific event happened at a specific time — it is to make the timeline irresolvable. To create sufficient competing claims that no confident chronology can be established. To shift the analytical product from "here is what happened and when" to "this cannot be determined." That strategic disinformation goal is visible in patterns that an AI system has no particular reason to surface: the number of timestamp variations is unusually high, the variations cluster around a specific ninety-minute window that would, if accepted, invert the apparent sequence of strike and retaliation, and the accounts generating the contradictory evidence were all created after the event. A human analyst who has worked similar information operations recognizes that pattern. The AI sees documents with conflicting timestamps.

The fourth is what might be called translation politics — the judgment about how to handle genuine ambiguity that exists in non-English source material about temporal relationships. Arabic temporal constructions, Russian aspect-based tense systems, and Chinese evidential markers all encode uncertainty about timing in ways that machine translation flattens. When a Russian military statement uses a perfective verb aspect to describe an action that might be completed or ongoing, that grammatical ambiguity is analytically significant. A translator who understands both the language and the operational context can preserve and flag that ambiguity. An AI pipeline converts it to a determined timestamp and moves on.

These four functions — source credibility over time, intent inference, disinformation pattern recognition, and translation politics — are where human analysts are load-bearing. Remove them, and the timeline you produce is faster, more comprehensive in coverage, and potentially wrong in ways you cannot detect because the error is invisible in the document corpus.

The Practice Change

The mental model most analysts currently hold runs something like this: AI accelerates the first pass, human judgment refines it. That framing is not wrong, but it is insufficiently specific about which first-pass outputs carry hidden error rates that refinement won't catch.

The specific practice change that follows from this episode's argument is architectural. Before you route AI-assisted timeline output to an analyst for review, you need to explicitly mark each data point with its derivation: is this timestamp from a physical evidence inference (solar angle, shadow analysis, weather cross-check), from direct metadata, from a human-verified source, or from the AI's linguistic extraction from text that may itself be unreliable? That derivation tag is not overhead. It is the audit trail that allows the analyst to apply appropriate skepticism to each point in the timeline. A timestamp derived from SunCalc cross-check on a geolocated video is epistemically different from a timestamp derived from an LLM's extraction from a translated government statement. Treating them as equivalent inputs to a unified timeline is where the confident-wrong failure mode lives.
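
A minimal sketch of that derivation tag, with hypothetical field names, might look like the following; the four categories mirror the derivations named above.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class Derivation(Enum):
    PHYSICAL_EVIDENCE = "physical_evidence"   # solar angle, shadow analysis, weather cross-check
    DIRECT_METADATA   = "direct_metadata"     # EXIF, platform-native capture time
    HUMAN_VERIFIED    = "human_verified"      # corroborated by a trusted human source
    LLM_EXTRACTION    = "llm_extraction"      # model-extracted from possibly unreliable text

@dataclass
class TimelinePoint:
    event: str
    timestamp: datetime
    source: str
    derivation: Derivation        # the audit trail for the analyst's skepticism
    notes: str = ""

point = TimelinePoint(
    event="convoy filmed at checkpoint",
    timestamp=datetime(2026, 3, 14, 13, 13),
    source="telegram_ch_a",
    derivation=Derivation.LLM_EXTRACTION,
    notes="Arabic temporal marker was ambiguous; the model chose a reading",
)

# A reviewer can now sort or filter the draft timeline by how each point was derived.
needs_scrutiny = point.derivation is Derivation.LLM_EXTRACTION
```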

The harder version of this practice change is institutional: invest seriously in maintaining source reputation knowledge as a perishable and specific asset. Which Telegram channels have been shown to post fabricated material and in what contexts? Which wire services have institutional incentives to timestamp events in ways that serve particular political narratives? Which governments have a documented history of backdating statements during military operations? That knowledge is not in any AI training corpus at the granularity required for timeline reconstruction in live operational contexts. It lives in human analysts, and it degrades when those analysts leave or when the knowledge is never formally maintained. Chronolocation and geolocation share similar analytic structures; both techniques demand thorough investigation of every part of a photo or video, an understanding of the surrounding area, local knowledge, and awareness of weather — requirements that exceed what any automated system can supply.

The MH17 investigation took months and dozens of analysts to produce a timeline that held. AI-assisted tools in 2026 would compress that enormously. But the investigation succeeded because the analysts understood that the Russian Ministry of Defense was in the business of producing plausible falsehoods with specific goals, and every piece of counter-evidence the Ministry offered had to be evaluated against that prior. That prior was not derivable from the documents. It was an intelligence judgment about an institution's behavior over time, applied to the task of reconstructing a sequence of events.

The implication is concrete. The next time your AI-assisted pipeline produces a clean chronological ordering from a contested event, look specifically at the sources that resolved the contradictions. Ask whether the resolution was physical evidence, corroborated testimony, or statistical inference from the corpus. Ask who benefits from each resolution. Ask whether the timeline you now have is accurate or merely consistent — and understand that consistency is exactly what sophisticated disinformation is designed to produce.

Those are different things. The tools will not tell you which one you have.


Sources in this episode include Bellingcat's MH17 open-source investigation, Sector035's guide to chronolocation methodology, the Reuters Institute analysis of AI's effect on OSINT verification, and AFP's reporting on AI-generated fake satellite imagery during the US-Iran conflict. The SunCalc tool used in chronolocation analysis is available at suncalc.org. GeoConfirmed's volunteer conflict verification work is documented at geoconfirmed.org.