ResAIKit
Research Integrity Toolkit
I4 | Image forensics | Generic Forensics | Layer 1 (Deterministic)

Metadata Analysis

Reads the image's embedded metadata for declarations of origin: the C2PA Content Credentials and IPTC source-type assertions that generators now stamp into AI images, the software string of an AI tool, and weaker hints such as a complete absence of camera metadata or an AI-typical output size. It reads only the metadata, never the pixels, so it runs on any image.

Technical description

I4 is a deterministic screen that inspects what an image declares about itself. The strongest declarations are explicit: the Coalition for Content Provenance and Authenticity (C2PA) Content Credentials and the IPTC Digital Source Type vocabulary now carry a machine-readable assertion of how an asset was made, and generative tools embed the value trainedAlgorithmicMedia (or compositeWithTrainedAlgorithmicMedia) to mark AI output. Failing that, the name of an AI generator often survives in the software or description fields. Weaker hints (an image with no EXIF at all, missing camera or timestamp fields, or a resolution that matches a common generator output) add a little to the score but are not conclusive on their own. The signals combine into a score from 0 to 5. Because the indicator works purely from metadata, it has no minimum image size.
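To illustrate reading that declaration from the XMP packet, here is a minimal Python sketch. It assumes the packet is stored as UTF-8 XML between x:xmpmeta tags and that the property uses the conventional Iptc4xmpExt:DigitalSourceType name in attribute or element form. The function names digital_source_type and declares_ai are hypothetical. A production reader would use a proper XMP/RDF parser, which would also handle forms this regex misses, such as an rdf:resource attribute.

```python
import re
from typing import Optional

# Prefix of the IPTC Digital Source Type controlled vocabulary.
DST_PREFIX = "http://cv.iptc.org/newscodes/digitalsourcetype/"

# Codes treated here as declaring output from a trained generative model.
AI_SOURCE_TYPES = {"trainedAlgorithmicMedia", "compositeWithTrainedAlgorithmicMedia"}

# The XMP packet is XML embedded in the file between x:xmpmeta tags.
XMP_RE = re.compile(rb"<x:xmpmeta.*?</x:xmpmeta>", re.DOTALL)

# Assumed property name; matches attribute form (='...' or "...") or element form (>...<).
DST_RE = re.compile(r"""Iptc4xmpExt:DigitalSourceType(?:=["']([^"']*)["']|>([^<]*)<)""")


def digital_source_type(raw: bytes) -> Optional[str]:
    """Return the bare Digital Source Type code declared in the XMP packet, if any."""
    packet = XMP_RE.search(raw)
    if packet is None:
        return None
    xml = packet.group(0).decode("utf-8", errors="replace")
    match = DST_RE.search(xml)
    if match is None:
        return None
    uri = match.group(1) if match.group(1) is not None else match.group(2)
    # Drop the vocabulary prefix, leaving e.g. "trainedAlgorithmicMedia".
    return uri.strip().removeprefix(DST_PREFIX)


def declares_ai(raw: bytes) -> bool:
    """True if the XMP declares a generative-AI source type."""
    return digital_source_type(raw) in AI_SOURCE_TYPES
```

Scanning the raw bytes rather than decoding the container keeps the sketch format-agnostic. The trade-off is that it trusts whatever text happens to match, which is consistent with I4 reading declared fields rather than verifying them.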

How it works

The indicator runs deterministically at Layer 1 and reads no pixels. It first gathers a single lowercase text blob from the metadata: the EXIF Software (tag 305), ImageDescription (tag 270), and Model (tag 272) fields, every string or byte value in the image info dictionary, and the parsed XMP packet. Two keyword sets are matched against that blob. The provenance set holds the IPTC Digital Source Type values trainedAlgorithmicMedia, compositeWithTrainedAlgorithmicMedia, and algorithmicMedia, as carried by a C2PA manifest or an XMP assertion. The first two declare output from a trained generative model; algorithmicMedia declares purely algorithmic, synthetic output. The software set holds AI generator names such as stable diffusion, midjourney, dall-e, comfyui, adobe firefly, imagen, and flux.
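The blob-and-keyword step can be sketched in a few lines of Python. This is a minimal illustration, not the indicator's code. It assumes the EXIF fields arrive as a dict keyed by tag number and the image info dictionary as a plain dict, as an image library might expose them. The names build_blob, find_markers, and the keyword tuples are hypothetical.

```python
# Hypothetical keyword sets based on the lists above. Everything is lowercase
# because the blob is lowercased before matching.
PROVENANCE_KEYWORDS = (
    "trainedalgorithmicmedia",
    "compositewithtrainedalgorithmicmedia",
    "algorithmicmedia",  # a substring of the two above, so it also fires with them
)
SOFTWARE_KEYWORDS = (
    "stable diffusion", "midjourney", "dall-e", "comfyui",
    "adobe firefly", "imagen", "flux",
)

# EXIF text fields read by I4: Software, ImageDescription, Model.
EXIF_TEXT_TAGS = (305, 270, 272)


def build_blob(exif: dict, info: dict, xmp: str) -> str:
    """Join the textual metadata into one lowercase blob."""
    parts = [str(exif[tag]) for tag in EXIF_TEXT_TAGS if exif.get(tag) is not None]
    for value in info.values():
        # Keep only string or byte values; tuples such as dpi are skipped.
        if isinstance(value, bytes):
            parts.append(value.decode("utf-8", errors="ignore"))
        elif isinstance(value, str):
            parts.append(value)
    parts.append(xmp)
    return "\n".join(parts).lower()


def find_markers(blob: str, keywords: tuple) -> list:
    """Return every keyword that occurs as a substring of the blob."""
    return [kw for kw in keywords if kw in blob]


blob = build_blob({305: "ComfyUI"}, {"parameters": b"Steps: 30", "dpi": (72, 72)}, "")
print(find_markers(blob, SOFTWARE_KEYWORDS))    # ['comfyui']
print(find_markers(blob, PROVENANCE_KEYWORDS))  # []
```

Plain substring matching is simple but can misfire in this sketch: short names such as flux also occur inside unrelated words (for example "influx"), so a word-boundary match would be a reasonable refinement.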

The scoring is a priority cascade. If a provenance marker is present, the image declares itself AI-generated and the score is set to 5.0 at error severity. Otherwise, if an AI generator software string is present, the score is likewise 5.0. If neither explicit signal is found, the weak heuristics apply. An image with no EXIF metadata adds 1.0 (warning), since AI tools and stripped web images often carry none. If EXIF is present but the camera Model or DateTime field is missing, each missing field adds 0.5 (info). A resolution in the AI-specific set adds 0.5 (info). That set covers 512x512, 768x768, 1024x1024, 1024x768, and other square or diffusion-typical sizes, and deliberately excludes common screen and video sizes such as 1920x1080 and 1280x720. The total is capped at 5.0.
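The cascade can be sketched as follows, using the weights and severities stated above. The function name score_i4, its argument list, and every resolution in AI_RESOLUTIONS beyond the four sizes named in the text are assumptions for illustration. The severity for the software branch is also assumed, since the text only says that branch likewise scores 5.0.

```python
# Assumed AI-typical sizes as (width, height). The text names 512x512, 768x768,
# 1024x1024 and 1024x768; the remaining entries are illustrative guesses.
AI_RESOLUTIONS = {
    (512, 512), (768, 768), (1024, 1024), (1024, 768),
    (768, 1024), (512, 768), (768, 512), (1152, 896), (896, 1152),
}
# Common screen/video sizes such as (1920, 1080) and (1280, 720) are deliberately absent.


def score_i4(provenance, ai_software, exif_present, has_model, has_datetime, size):
    """Priority cascade: explicit declarations first, weak hints only as a fallback."""
    if provenance:
        return 5.0, [("error", "AI provenance: " + ", ".join(provenance))]
    if ai_software:
        # Severity assumed to match the provenance case.
        return 5.0, [("error", "AI software: " + ", ".join(ai_software))]

    score = 0.0
    findings = []
    if not exif_present:
        score += 1.0
        findings.append(("warning", "no EXIF metadata"))
    else:
        # Camera fields are only checked when EXIF exists at all.
        if not has_model:
            score += 0.5
            findings.append(("info", "camera Model missing"))
        if not has_datetime:
            score += 0.5
            findings.append(("info", "DateTime missing"))
    if size in AI_RESOLUTIONS:
        score += 0.5
        findings.append(("info", f"AI-typical resolution {size[0]}x{size[1]}"))
    return min(score, 5.0), findings


print(score_i4([], [], exif_present=False, has_model=False, has_datetime=False, size=(1024, 1024)))
# (1.5, [('warning', 'no EXIF metadata'), ('info', 'AI-typical resolution 1024x1024')])
```

The early returns make the priority explicit: once a declaration is found, the weak hints are never consulted, so they cannot dilute or inflate a decisive signal.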

The indicator's output metadata records whether EXIF was present, the Software and camera Model fields, whether the resolution matched, and which AI software names and provenance markers, if any, were detected.
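One possible shape for that output record is a small dataclass. The class name I4Metadata, its field names, and the JSON serialisation are hypothetical; they only mirror the items listed above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class I4Metadata:
    """Hypothetical shape of the record I4 attaches to its result."""
    exif_present: bool
    software: Optional[str]        # EXIF Software (tag 305), if present
    camera_model: Optional[str]    # EXIF Model (tag 272), if present
    resolution_match: bool         # True if the size is in the AI-typical set
    ai_software: list = field(default_factory=list)         # matched generator names
    provenance_markers: list = field(default_factory=list)  # matched source-type markers

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


record = I4Metadata(
    exif_present=True,
    software="Adobe Firefly",
    camera_model=None,
    resolution_match=False,
    ai_software=["adobe firefly"],
)
print(record.to_json())
# {"ai_software": ["adobe firefly"], "camera_model": null, "exif_present": true, "provenance_markers": [], "resolution_match": false, "software": "Adobe Firefly"}
```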

Score thresholds

Score     Meaning
0 to 1    No declaration of AI origin; at most one or two weak hints, such as missing camera metadata.
1.5       Several weak hints together (no EXIF plus an AI-typical resolution, for example); this is the highest score the weak heuristics can reach.
5         An explicit declaration: a C2PA or IPTC AI-provenance assertion, or an AI generator software string.

Why this matters

The industry has converged on embedded provenance as the primary machine-readable way to label AI content. The C2PA standard records, in a cryptographically signed manifest, who made an asset, with what tools, and whether AI was involved, and its actions use the IPTC Digital Source Type vocabulary to do so [1, 2]. The decisive value is trainedAlgorithmicMedia: a generative model that supports the standard writes a c2pa.created action with that digital source type, so reading it is a direct, high-confidence signal of AI origin rather than an inference from the pixels. Adoption is growing among the major providers: Microsoft, Adobe, Google, and OpenAI signal generative content through these IPTC and C2PA fields [3]. Metadata has long been a first stop in media forensics because it is cheap to read and, when present, highly informative, although the field recognises that it can be stripped or forged and so must be paired with pixel-level analysis [4]. I4 reads the strong declarations first and falls back to the classic weak hints, while deliberately dropping the most error-prone of them, such as matches on common screen and video resolutions.
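The check on a C2PA actions assertion can be sketched as below. This assumes the manifest has already been extracted and decoded (for example from its CBOR form) into a Python dict by a C2PA library or tool, shaped as in the illustrative sample; a full check would also validate the manifest signature, which this sketch does not do. The function name ai_actions and the sample values are hypothetical.

```python
import json

# IPTC Digital Source Type vocabulary prefix and the codes treated as AI here.
IPTC_DST = "http://cv.iptc.org/newscodes/digitalsourcetype/"
AI_CODES = {"trainedAlgorithmicMedia", "compositeWithTrainedAlgorithmicMedia"}


def ai_actions(actions_assertion: dict) -> list:
    """Return the names of actions whose digitalSourceType declares generative AI."""
    hits = []
    for action in actions_assertion.get("actions", []):
        dst = action.get("digitalSourceType", "")
        if dst.removeprefix(IPTC_DST) in AI_CODES:
            hits.append(action.get("action"))
    return hits


# Illustrative decoded actions assertion, as a supporting generator might write it.
sample = json.loads("""
{"actions": [{"action": "c2pa.created",
              "digitalSourceType": "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"}]}
""")
print(ai_actions(sample))  # ['c2pa.created']
```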

Limitations

Metadata is the easiest evidence to remove, so its absence proves nothing. A real photograph posted to social media is routinely stripped of EXIF, and an AI image can be saved without any C2PA assertion, so the weak heuristics carry low weight by design and a clean metadata record is not evidence of authenticity. The provenance and software signals are strong when present but are not tamper-evident here: the indicator reads the declared fields rather than verifying a C2PA signature, so a forged software string or an unsigned assertion is taken at face value. Conversely, a stripped or re-encoded AI image loses the declaration entirely. The resolution hint stays weak even after pruning the common screen sizes, because generators and cameras share many sizes. Pixel-level analysis of compression history, noise, frequency content, and editing lives in the sibling indicators; I4 stays on the declared metadata and supplies the provenance reading that pixel analysis cannot.

Theoretical background

I4 rests on the distinction between intrinsic and declared evidence. Pixel forensics recovers intrinsic traces of how an image was made; metadata analysis reads what the file declares about its own origin. The declarations have become structured and authoritative: C2PA wraps EXIF, IPTC, and XMP into a signed manifest of provenance and edits, and the IPTC Digital Source Type code list gives a controlled vocabulary in which trainedAlgorithmicMedia means the asset was produced by a trained generative model. Reading these is deterministic and, when the fields are present and honest, decisive, which is why the indicator prioritises them above every heuristic. The heuristics encode the older, weaker correlation that synthetic and stripped images tend to lack camera metadata, kept at low weight precisely because that correlation is noisy. The whole indicator is a property of the file's declared contents rather than a learned model, which keeps it interpretable and complementary to the pixel-level screens.

References

  1. Coalition for Content Provenance and Authenticity (C2PA). C2PA Technical Specification (Version 2.x). 2024. https://spec.c2pa.org/
  2. International Press Telecommunications Council (IPTC). Digital Source Type NewsCodes (including trainedAlgorithmicMedia). 2024.
  3. International Press Telecommunications Council (IPTC). Microsoft announces signalling of generative AI content using IPTC and C2PA metadata. 2024. https://iptc.org/news/microsoft-announces-signalling-of-generative-ai-content-using-iptc-and-c2pa-metadata/
  4. Verdoliva L. Media Forensics and DeepFakes: An Overview. IEEE Journal of Selected Topics in Signal Processing. 2020;14(5):910-932. arXiv:2001.06564. https://arxiv.org/abs/2001.06564