ResAIKit
Research Integrity Toolkit
I4 · Image forensics · Generic Forensics · Layer 1 (Deterministic)

Metadata Analysis

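Inspects the metadata embedded in an image file (EXIF, XMP, and other format fields) for explicit declarations of AI generation, AI generator software names, and gaps, such as a missing camera model or timestamp, that suggest the image may not have come from a real camera.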

Technical description

Reads only metadata, never pixels. It gathers a lowercase text blob from the EXIF Software, ImageDescription, and Model fields, every string or byte value in the image info dictionary, and the parsed XMP packet, then matches the blob against two keyword sets. The provenance set holds the IPTC Digital Source Type values that declare AI or algorithmic generation (trainedAlgorithmicMedia, compositeWithTrainedAlgorithmicMedia, algorithmicMedia), as carried in a C2PA manifest or XMP; the software set holds the names of AI generators. A provenance marker or an AI software string sets the score to 5.0. Otherwise, weak heuristics apply: missing EXIF adds 1.0; a missing camera Model or DateTime adds 0.5 each; and an AI-typical resolution (a square image or a common diffusion-model aspect ratio, excluding common screen sizes such as 1920x1080) adds 0.5. The score is capped at 5.0.
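The collection and matching steps can be sketched in Python as below. This is a minimal sketch, not the toolkit's code: it assumes the EXIF fields arrive as a plain dict keyed by tag name (for example, as decoded with Pillow's `Image.getexif()` and `ExifTags.TAGS`), that `info` is the image's info dictionary, and that the XMP packet is already a string. The helper names `collect_blob` and `find_markers` are hypothetical, and the generator names in `AI_SOFTWARE` are illustrative placeholders rather than the indicator's actual list.

```python
import re
from typing import Dict, List, Optional, Sequence

# IPTC Digital Source Type values that declare AI or algorithmic origin,
# lowercased because the blob is lowercased before matching.
PROVENANCE_MARKERS = (
    "trainedalgorithmicmedia",
    "compositewithtrainedalgorithmicmedia",
    "algorithmicmedia",
)
# Illustrative generator names only (hypothetical, not the indicator's list).
AI_SOFTWARE = ("midjourney", "dall-e", "stable diffusion", "firefly")


def collect_blob(exif: Dict[str, object], info: Dict[str, object],
                 xmp: Optional[str]) -> str:
    """Join the metadata text the indicator reads into one lowercase blob."""
    parts: List[str] = []
    for key in ("Software", "ImageDescription", "Model"):
        value = exif.get(key)
        if value:
            parts.append(str(value))
    for value in info.values():  # only string and byte values are used
        if isinstance(value, bytes):
            parts.append(value.decode("utf-8", errors="ignore"))
        elif isinstance(value, str):
            parts.append(value)
    if xmp:
        parts.append(xmp)
    return "\n".join(parts).lower()


def find_markers(blob: str, markers: Sequence[str]) -> List[str]:
    """Return every marker that appears in the blob as a whole word."""
    return [m for m in markers
            if re.search(r"\b" + re.escape(m) + r"\b", blob)]


blob = collect_blob(
    exif={"Software": "Adobe Photoshop", "Model": None},
    info={"dpi": (72, 72), "comment": b"exported"},
    xmp='<rdf:Description Iptc4xmpExt:DigitalSourceType="http://cv.iptc.org'
        '/newscodes/digitalsourcetype/trainedAlgorithmicMedia"/>',
)
print(find_markers(blob, PROVENANCE_MARKERS))  # ['trainedalgorithmicmedia']
print(find_markers(blob, AI_SOFTWARE))         # []
```

The word-boundary match is a design choice of this sketch: without it, a plain substring test would also report `algorithmicmedia` inside every `trainedalgorithmicmedia` declaration.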

How it works

Layer 1 (deterministic, metadata only). Collects the EXIF, info, and XMP text, matches it against the IPTC AI-provenance values and the AI generator software names, and applies a priority cascade: an explicit AI declaration scores 5.0; otherwise, missing EXIF (1.0), a missing camera model or timestamp (0.5 each), and an AI-typical resolution (0.5) add to the score. Reports whether EXIF was present, the Software and Model field values, whether the resolution matched, and which AI software names and provenance markers were found.
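The priority cascade and the report can be sketched as one function that takes the keyword matches and the raw fields. This is a hedged sketch under several assumptions: the names `score_metadata` and `ai_resolution` and their parameters are hypothetical; a file without EXIF is treated as also missing Model and DateTime, so those penalties stack; and the screen sizes and "diffusion" aspect ratios are placeholders, since only 1920x1080 is named in the description.

```python
from math import gcd
from typing import Dict, List, Optional

# Placeholder sets: only 1920x1080 is named in the indicator description.
SCREEN_SIZES = {(1920, 1080), (1080, 1920), (1280, 720), (3840, 2160)}
DIFFUSION_RATIOS = {(2, 3), (3, 2), (16, 9), (9, 16)}  # width:height, reduced


def ai_resolution(width: int, height: int) -> bool:
    """Weak hint: square or a diffusion-style ratio, minus screen sizes."""
    if width <= 0 or height <= 0 or (width, height) in SCREEN_SIZES:
        return False
    g = gcd(width, height)
    return width == height or (width // g, height // g) in DIFFUSION_RATIOS


def score_metadata(*, ai_software: List[str], provenance: List[str],
                   has_exif: bool, software: Optional[str],
                   model: Optional[str], date_time: Optional[str],
                   width: int, height: int) -> Dict[str, object]:
    """Apply the cascade and return the reported fields plus the score."""
    report: Dict[str, object] = {
        "has_exif": has_exif,
        "software": software,
        "model": model,
        "ai_resolution": ai_resolution(width, height),
        "ai_software": ai_software,
        "provenance": provenance,
    }
    if provenance or ai_software:
        score = 5.0  # an explicit declaration short-circuits the heuristics
    else:
        score = 0.0
        if not has_exif:
            score += 1.0
        if not model:
            score += 0.5
        if not date_time:
            score += 0.5
        if report["ai_resolution"]:
            score += 0.5
    report["score"] = min(score, 5.0)
    return report


# A stripped 1024x1024 file with no declaration: 1.0 + 0.5 + 0.5 + 0.5.
print(score_metadata(ai_software=[], provenance=[], has_exif=False,
                     software=None, model=None, date_time=None,
                     width=1024, height=1024)["score"])  # 2.5
```

Reducing the ratio with `gcd` lets one entry such as `(2, 3)` cover 832x1248, 1024x1536, and every other size with the same shape.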

Why this matters

The industry has converged on embedded provenance as the primary machine-readable way to label AI content. C2PA records in a signed manifest who made an asset, with what tools, and whether AI was involved, using the IPTC Digital Source Type vocabulary, whose value trainedAlgorithmicMedia directly declares AI origin. Microsoft, Adobe, Google, and OpenAI signal generative content through these same fields. Metadata is cheap to read and highly informative when present, which makes it a natural first stop in forensics; because it can be stripped or forged, it must be paired with pixel-level analysis.
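To show what an IPTC Digital Source Type declaration looks like inside an XMP packet, the sketch below reads the `Iptc4xmpExt:DigitalSourceType` property with Python's standard `xml.etree.ElementTree` and reduces the vocabulary URI to its final term. It is a minimal illustration under stated assumptions: the function name `digital_source_type` is hypothetical, the packet is assumed to be well-formed XML already extracted from the file, and only the declared value is read. It does not parse or verify a signed C2PA manifest.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace of the IPTC Extension schema that defines DigitalSourceType.
IPTC_EXT_NS = "http://iptc.org/std/Iptc4xmpExt/2008-02-29/"
DST_TAG = "{%s}DigitalSourceType" % IPTC_EXT_NS
AI_SOURCE_TYPES = {"trainedAlgorithmicMedia",
                   "compositeWithTrainedAlgorithmicMedia",
                   "algorithmicMedia"}


def digital_source_type(xmp: str) -> Optional[str]:
    """Return the Digital Source Type term declared in an XMP packet, if any."""
    for elem in ET.fromstring(xmp).iter():
        value = elem.get(DST_TAG)              # attribute form
        if value is None and elem.tag == DST_TAG:
            value = (elem.text or "").strip()  # element form
        if value:
            # ".../digitalsourcetype/trainedAlgorithmicMedia" -> last segment
            return value.rstrip("/").rsplit("/", 1)[-1]
    return None


xmp = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description xmlns:Iptc4xmpExt="http://iptc.org/std/Iptc4xmpExt/2008-02-29/"'
    ' Iptc4xmpExt:DigitalSourceType='
    '"http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"/>'
    '</rdf:RDF></x:xmpmeta>'
)
term = digital_source_type(xmp)
print(term, term in AI_SOURCE_TYPES)  # trainedAlgorithmicMedia True
```

A camera-originated file would instead carry a term such as `digitalCapture`, which this check returns but does not count as an AI declaration.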

Score thresholds

0-1
No declaration of AI origin; at most a weak hint such as missing camera metadata
2-3
Several weak hints together, such as no EXIF plus an AI-typical resolution
4-5
An explicit declaration: a C2PA or IPTC AI-provenance assertion, or an AI generator software string
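Summing the weak heuristics shows why the top band requires an explicit declaration: assuming a file without EXIF also counts as missing Model and DateTime, the heuristics top out at 1.0 + 0.5 + 0.5 + 0.5 = 2.5, which lands in the 2-3 band. The hypothetical helper `score_band` below maps a score to the bands above. The published bands leave gaps (for example, 1.5), so placing such scores in the next band up is an assumption of this sketch.

```python
def score_band(score: float) -> str:
    """Map a metadata score to a threshold band (hypothetical helper).

    Assumption: scores that fall between the published bands, such as
    1.5, are placed in the next band up.
    """
    if score <= 1.0:
        return "0-1"
    if score <= 3.0:
        return "2-3"
    return "4-5"


heuristic_max = 1.0 + 0.5 + 0.5 + 0.5  # no EXIF, Model, DateTime; AI resolution
print(heuristic_max, score_band(heuristic_max))  # 2.5 2-3
print(score_band(5.0))                           # 4-5
```

Rounding gaps upward keeps three stacked 0.5 hints (1.5) in the "several weak hints" band rather than the single-hint band.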

Limitations

Metadata is the easiest evidence to remove, so its absence proves nothing: real photos are routinely stripped of EXIF, and AI images can be saved without a C2PA assertion. The heuristics therefore carry low weight, and a record with no AI markers is not evidence of authenticity. The provenance and software signals are read as declared fields rather than verified against a C2PA signature, so a forged string or an unsigned assertion is taken at face value. Conversely, a stripped or re-encoded AI image loses its declaration, leaving only the weak heuristics. The resolution hint is weak because generators and cameras share many sizes. Pixel-level compression, noise, frequency, and editing analysis are handled by sibling indicators.

References

  1. Coalition for Content Provenance and Authenticity (C2PA). (2024). C2PA Technical Specification (Version 2.x). C2PA open standard
  2. International Press Telecommunications Council (IPTC). (2024). Digital Source Type NewsCodes (including trainedAlgorithmicMedia). IPTC controlled vocabulary
  3. International Press Telecommunications Council (IPTC). (2024). Microsoft announces signalling of generative AI content using IPTC and C2PA metadata. IPTC News
  4. Verdoliva L. (2020). Media Forensics and DeepFakes: An Overview. IEEE Journal of Selected Topics in Signal Processing 14(5):910-932