Edge Coherence
Measures how uniform the edge sharpness is across the image. A real photograph varies its sharpness through depth of field, focus falloff, and differing textures, so its edge density varies from region to region; a synthetic image can render the whole frame at one uniform sharpness. The cue is judged only when the image carries enough edge content, so flat images are not mistaken for synthetic. This is a weak supporting signal, used alongside the stronger forensic indicators.
Technical description
I8 is a deterministic, generator-agnostic screen for unnaturally uniform sharpness. The intuition is that a camera produces an uneven distribution of edge sharpness across a scene: objects at different depths fall in and out of focus, lighting and texture vary, and motion blurs parts of the frame. A generative model, by contrast, can synthesise a picture in which every region is rendered at the same crispness. I8 computes the edge magnitude with the Sobel operator, takes the mean edge density of each block on an 8 by 8 grid, and computes the coefficient of variation (CV) of those densities. A low CV, evaluated only when the image carries enough edge content to be meaningful, indicates suspiciously uniform sharpness. The shortfall of the CV below its threshold maps onto the 0 to 5 score scale, but the contribution is capped at 3.5 because the cue is weak on its own. The image must be at least 64 by 64 pixels.
How it works
The indicator runs deterministically at Layer 1 on the grayscale image. The Sobel derivatives along the two axes are combined into an edge-magnitude map G = sqrt(G_x^2 + G_y^2). The image is tiled into an 8 by 8 grid, and for each block the edge density is the mean of G over the block divided by 255, giving a value near 0 for a flat block and larger for a textured or edge-rich block.
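The edge-density step described above can be sketched with NumPy as follows. This is a minimal illustration, not the indicator's actual code: the function names, the replicated-border padding, the integer block boundaries, and the choice not to clip the Sobel magnitude to 255 are all assumptions that the description does not specify.

```python
import numpy as np

GRID = 8  # 8 by 8 grid, as in the description


def sobel_magnitude(gray: np.ndarray) -> np.ndarray:
    """Edge magnitude G = sqrt(G_x^2 + G_y^2) from 3x3 Sobel kernels."""
    g = gray.astype(np.float64)
    p = np.pad(g, 1, mode="edge")  # assumption: replicate border pixels
    # Horizontal derivative: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical derivative: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)


def block_edge_density(gray: np.ndarray, grid: int = GRID) -> np.ndarray:
    """Mean of G over each grid block, divided by 255 (hypothetical helper)."""
    if gray.ndim != 2 or min(gray.shape) < 64:
        raise ValueError("expected a grayscale image of at least 64 by 64 pixels")
    mag = sobel_magnitude(gray)
    h, w = mag.shape
    # Assumption: block edges are evenly spaced and truncated to integers.
    rows = np.linspace(0, h, grid + 1).astype(int)
    cols = np.linspace(0, w, grid + 1).astype(int)
    dens = np.empty((grid, grid))
    for i in range(grid):
        for j in range(grid):
            block = mag[rows[i]:rows[i + 1], cols[j]:cols[j + 1]]
            dens[i, j] = block.mean() / 255.0
    return dens


if __name__ == "__main__":
    # Synthetic test frame: flat left half, noisy right half.
    rng = np.random.default_rng(0)
    img = np.full((128, 128), 128, dtype=np.uint8)
    img[:, 64:] = rng.integers(0, 256, size=(128, 64), dtype=np.uint8)
    print(block_edge_density(img).round(3))
```

On the synthetic frame, the blocks that lie entirely in the flat half have a density of 0, while the noisy half produces clearly positive densities. Because the magnitude is left unclipped in this sketch, a density can exceed 1 on very strong edges.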
Let the per-block densities have mean d_bar and standard deviation s; their coefficient of variation is CV = s / d_bar. A natural scene, with focus and texture variation, has a high CV; a frame rendered at one uniform sharpness has a low CV. The cue is gated on content: it is evaluated only when the mean edge density d_bar exceeds 0.03, so that a flat or near-empty image, whose CV is meaningless, is never flagged. When there is content and CV is below 0.35, the uniformity contribution is 3.5 × (0.35 - CV) / 0.35, which is 0 at the threshold and rises to 3.5 when the edge density is perfectly uniform (CV = 0). For example, a CV of 0.2 contributes 1.5. The cap of 3.5, below the score maximum of 5, reflects that this is a supporting signal rather than a decisive one. The metadata records the mean edge density, the edge-density CV, whether the image had enough content to judge, and the block count.
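The gate and score mapping above can be sketched in a few lines of standard-library Python. The function name and the metadata key names are hypothetical, and the use of the population standard deviation is an assumption, since the description does not say which estimator is used.

```python
from statistics import fmean, pstdev

CONTENT_GATE = 0.03   # minimum mean edge density for the cue to be judged
CV_THRESHOLD = 0.35   # CV below this counts as suspiciously uniform
SCORE_CAP = 3.5       # maximum contribution, below the score maximum of 5


def edge_uniformity_score(densities):
    """Score a flat sequence of per-block edge densities (hypothetical helper)."""
    values = [float(d) for d in densities]
    mean_density = fmean(values)
    has_content = mean_density > CONTENT_GATE
    # CV is undefined for an image with no edges at all.
    # Assumption: population standard deviation.
    cv = pstdev(values) / mean_density if mean_density > 0 else None
    score = 0.0
    if has_content and cv < CV_THRESHOLD:
        score = (CV_THRESHOLD - cv) / CV_THRESHOLD * SCORE_CAP
    return {
        "score": score,
        "mean_edge_density": mean_density,
        "edge_density_cv": cv,
        "has_content": has_content,
        "block_count": len(values),
    }


if __name__ == "__main__":
    # Illustrative densities, not measured data.
    print(edge_uniformity_score([0.16, 0.24] * 32))  # moderately uniform
    print(edge_uniformity_score([0.10, 0.30] * 32))  # varied, not flagged
    print(edge_uniformity_score([0.01] * 64))        # too flat to judge
```

In the first illustrative case the densities have mean 0.2 and standard deviation 0.04, so the CV is 0.2 and the contribution is (0.35 - 0.2) / 0.35 × 3.5 = 1.5. The second case has a CV of 0.5 and scores 0, and the third fails the content gate and is left unjudged.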
Score thresholds
| Score | Meaning |
|---|---|
| 0 to 1 | Edge sharpness varies naturally across the image, or the image is too flat to judge. |
| 2 to 3 | The edge sharpness is noticeably uniform across a content-rich image. |
| 3.5 | The frame is rendered at a strikingly uniform sharpness, a soft cue toward synthetic origin. |
Why this matters
The way sharpness is distributed across an image is a recognised forensic cue. Blur and sharpness inconsistency is a classic basis for splicing localisation: a region inserted from another source carries its own focus and blur, which differs from the host, and exposing that inconsistency localises the tampering even against anti-forensic blurring [1]. The complementary observation, that synthetic images differ from real ones in their local structure as well as their colour and frequency content, is documented for both generative adversarial networks and diffusion models [2], and media-forensics overviews place local sharpness and texture among the cues that distinguish camera images from manipulated or generated ones [3]. I8 reads the simplest, most interpretable version of this: whether the whole frame shares one sharpness, which a real lens and scene rarely produce. Because the cue is weak and easily confounded, the indicator gates it on content and caps its contribution, leaving the decisive evidence to the stronger screens.
Limitations
This is a deliberately weak signal with clear confounders. A genuinely uniform-texture photograph, such as a close-up of grass, fabric, gravel, or foliage, fills the frame with even edge density and can read as uniform without being synthetic, and a uniformly defocused or motion-blurred photograph is uniform for an honest reason. Conversely, a generated image with a simulated depth-of-field blur will vary its sharpness and pass. The content gate prevents false positives on flat images but also leaves them unjudged. The 8 by 8 grid and the global coefficient of variation give a coarse, whole-image view that does not localise a region, and the thresholds are directional rather than exact. Splicing localisation by compression history, noise consistency, and copy-move detection, and the stronger frequency and colour cues for synthetic origin, live in sibling indicators, so I8 contributes only the global edge-uniformity observation.
Theoretical background
I8 rests on the difference between optical and synthetic image formation. A camera images a three-dimensional scene through a lens with a finite depth of field, so only part of the scene is in sharp focus, and the rest is progressively blurred, while motion, texture, and lighting add further variation; the result is an edge-sharpness field that changes across the frame. A generative model has no lens and no scene depth, and unless it explicitly simulates focus it tends to render every region at a similar crispness, flattening the sharpness field. The coefficient of variation of the per-block edge density is a one-number summary of that field's variability, and a low value on a content-rich image is the trace of the missing optical variation. The measure is a property of the pixels rather than a learned fingerprint, which keeps it interpretable, and its known confounders are the reason it is treated as a weak, supporting cue.
References
- [1] Bahrami K, Kot AC, Li L, Li H. Blurred Image Splicing Localization by Exposing Blur Type Inconsistency. IEEE Transactions on Information Forensics and Security. 2015;10(5):999-1009. DOI: 10.1109/TIFS.2015.2394231
- [2] Corvi R, Cozzolino D, Poggi G, Nagano K, Verdoliva L. Intriguing properties of synthetic images: from generative adversarial networks to diffusion models. In: IEEE/CVF CVPR Workshops. 2023. arXiv:2304.06408. https://arxiv.org/abs/2304.06408
- [3] Verdoliva L. Media Forensics and DeepFakes: An Overview. IEEE Journal of Selected Topics in Signal Processing. 2020;14(5):910-932. arXiv:2001.06564. https://arxiv.org/abs/2001.06564