Histogram Analysis
Reads the colour statistics of an image for two synthetic-image tells: a flat or near-monochrome palette, measured as low per-channel histogram entropy, and the saturation cue, in which a generative model produces a colourful image that nonetheless lacks the highly saturated colours and clipped highlights a real camera records. It works on the colour values alone, with no learned model.
Technical description
I5 is a deterministic, generator-agnostic screen on colour statistics. It carries two signals. The first is the Shannon entropy of each colour channel's histogram: a natural photograph spreads its tones across the range and typically scores 5 to 7 bits per channel (of a possible 8 for a 256-bin histogram), whereas a flat synthetic palette or a heavily posterised image scores far lower. The second is the saturation cue of McCloskey and Albright [1]: because a generative network normalises its activations, it underproduces both the highly saturated colours and the clipped highlights and shadows that a camera records through sensor saturation, so a generated image can look colourful yet never reach the extremes of saturation and exposure. The two signals sum to a 0 to 5 score: the entropy signal contributes up to 3 and the saturation cue up to 2. The histogram-comb gaps that contrast and levels adjustments leave are screened by the tone-curve indicator, so I5 stays on entropy and the saturation extremes. It requires the image to be at least 16 by 16 pixels.
How it works
The indicator runs deterministically at Layer 1 on the RGB array.
For the entropy signal, each channel's 256-bin histogram is taken and its Shannon entropy is computed as H = - sum over bins of p_i log2(p_i), where p_i is the fraction of pixels in bin i and empty bins contribute zero. Each channel contributes max(0, (3 - H) / 3) to the entropy score, which is zero for any channel with H of 3 bits or more (as a natural channel has) and rises to 1 as the channel collapses to a single value (H = 0); the three contributions are summed and capped at 3.0.
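The entropy signal described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the indicator's actual code: the function names `channel_entropy` and `entropy_score` are hypothetical, and the image is assumed to arrive as a flat sequence of 8-bit `(r, g, b)` tuples.

```python
import math
from collections import Counter


def channel_entropy(values):
    """Shannon entropy, in bits, of a sequence of 8-bit channel values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("channel is empty")
    # Only occupied bins appear in the Counter, so empty bins contribute zero.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_score(pixels):
    """Return (score, [H_r, H_g, H_b]) for an iterable of (r, g, b) tuples."""
    pixels = list(pixels)
    entropies = [channel_entropy([p[ch] for p in pixels]) for ch in range(3)]
    # Each channel contributes max(0, (3 - H) / 3); the sum is capped at 3.0.
    score = min(3.0, sum(max(0.0, (3.0 - h) / 3.0) for h in entropies))
    return score, entropies


# A single-colour image has zero entropy in every channel.
flat = [(10, 10, 10)] * 256
print(entropy_score(flat)[0])  # 3.0
```

A grey ramp that uses each of the 256 levels exactly once reaches the maximum of 8 bits per channel and scores 0, while a two-level image split evenly between black and white has 1 bit per channel and scores 2.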
For the saturation cue, the per-pixel saturation and value are computed in the HSV sense from the RGB maximum and minimum: with mx and mn the per-pixel channel maximum and minimum (scaled to [0, 1]), the saturation is S = (mx - mn) / mx and the value is V = mx. The cue is evaluated only when the image carries real colour, that is, when the median saturation exceeds 0.15, so that grayscale images are not judged. It then measures two deficits. The first compares the fraction of highly saturated pixels (S > 0.9) against an expected floor of 0.02, giving sat_deficit = max(0, (0.02 - sat_fraction) / 0.02). The second compares the fraction of clipped pixels (V > 0.98 or V < 0.02) against a floor of 0.005, giving clip_deficit = max(0, (0.005 - clip_fraction) / 0.005). The saturation score is min(2.0, 2.0 · sat_deficit · clip_deficit); the deficits are multiplied so that the cue fires only when both the saturated and the clipped extremes are absent, which is the washed-out signature rather than an ordinary soft image.
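The saturation cue can be sketched the same way. Again this is an illustration under stated assumptions rather than the indicator's implementation: `saturation_score` and the metadata key names are hypothetical, pixels are assumed to be 8-bit `(r, g, b)` tuples, and a black pixel (mx = 0) is assigned S = 0 because the formula above leaves that case undefined.

```python
from statistics import median


def saturation_score(pixels):
    """Return (score, metadata) for an iterable of 8-bit (r, g, b) tuples."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image is empty")
    sats, vals = [], []
    for r, g, b in pixels:
        mx = max(r, g, b) / 255.0
        mn = min(r, g, b) / 255.0
        # Assumption: S = 0 for black pixels, where (mx - mn) / mx is 0 / 0.
        sats.append((mx - mn) / mx if mx > 0 else 0.0)
        vals.append(mx)

    n = len(pixels)
    median_sat = median(sats)
    sat_fraction = sum(s > 0.9 for s in sats) / n
    clip_fraction = sum(v > 0.98 or v < 0.02 for v in vals) / n

    if median_sat <= 0.15:
        # Not enough real colour: the cue is not evaluated.
        score = 0.0
    else:
        sat_deficit = max(0.0, (0.02 - sat_fraction) / 0.02)
        clip_deficit = max(0.0, (0.005 - clip_fraction) / 0.005)
        # Multiplying means both extremes must be missing for the cue to fire.
        score = min(2.0, 2.0 * sat_deficit * clip_deficit)

    metadata = {
        "median_saturation": median_sat,
        "saturated_fraction": sat_fraction,
        "clipped_fraction": clip_fraction,
    }
    return score, metadata
```

For an image made entirely of a muted colour such as (200, 120, 100), the median saturation is about 0.5 and neither extreme is present, so both deficits are 1 and the score is 2.0. If even 2% of the pixels are highly saturated, or 0.5% are clipped, the product drops to zero.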
The two contributions are summed and the score is reported as min(5.0, total). The metadata records the three channel entropies, the median saturation, the highly-saturated-pixel fraction, and the clipped-pixel fraction.
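The final combination is small enough to show in full. The sketch below is hypothetical: `combine_i5` is an illustrative name, and raising an error for an undersized image is an assumption, since the text states the 16 by 16 minimum but not what the indicator does when it is not met.

```python
def combine_i5(entropy_part, saturation_part, width, height):
    """Sum the two signal scores into the reported 0-to-5 value."""
    # Assumption: undersized images are rejected rather than scored.
    if width < 16 or height < 16:
        raise ValueError("image must be at least 16 by 16 pixels")
    return min(5.0, entropy_part + saturation_part)


print(combine_i5(3.0, 2.0, 64, 64))  # 5.0
```

Because the entropy part is already capped at 3.0 and the saturation part at 2.0, the outer `min` acts only as a safeguard.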
Score thresholds
| Score | Meaning |
|---|---|
| 0 to 1 | Natural colour spread with saturated and clipped extremes present. |
| 2 to 3 | One signal: a flat or near-monochrome palette, or a colourful image lacking saturated and clipped extremes. |
| 4 to 5 | Both signals together. Consistent with a synthetic or heavily processed palette. |
Why this matters
Colour statistics are a long-standing and surprisingly strong cue for synthetic images. McCloskey and Albright showed that a generator handles colour differently from a camera, and in particular that its internal normalisation suppresses the highly saturated pixels and the clipped highlights that a real sensor produces, so the frequency of saturated and over-exposed pixels separates generated imagery from camera imagery [1]. The broader point, that real and synthetic images differ in their colour distributions, recurs across the literature: co-occurrence statistics of the colour channels distinguish generated images from real ones across generators [2], and a study of synthetic images from generative adversarial networks and diffusion models found systematic differences in colour and autocorrelation as well as in the frequency domain [3]. Entropy captures the complementary case of a flat or posterised palette. I5 implements the deterministic, model-free versions of these cues, reading the spread of the channels and the absence of the colour and exposure extremes, as one signal in a forensic suite [4].
Limitations
Both signals are directional rather than conclusive. Low entropy is the normal state of legitimately simple images such as logos, diagrams, screenshots, and flat graphics, so the entropy signal must be read alongside the other indicators. The saturation cue is conservative, requiring both the saturated and the clipped extremes to be absent before it fires, but a genuinely soft, low-contrast photograph, an overcast or hazy scene, or an image that has been toned down in editing can still trip it. Conversely, a generated image that was colour-graded to add saturation or clipping will pass. The thresholds are calibrated for typical content and are not exact. Histogram-comb and tone-curve gaps from contrast and levels edits live in the tone-curve indicator, local intensity-range anomalies in the local-histogram indicator, and frequency and noise fingerprints elsewhere, so I5 stays on global colour entropy and the saturation extremes.
Theoretical background
I5 rests on how cameras and generators populate colour space differently. A camera's response saturates: bright coloured objects drive a channel to its maximum, producing pixels at full saturation and clipped highlights, and the resulting histogram is broad. A generative network instead synthesises colour through normalised activations that pull values toward the interior of the range, so it both narrows the per-channel histogram and rarely reaches the saturated or clipped extremes, even while producing a convincingly colourful picture. Entropy measures the first effect as a loss of spread, and the saturation cue measures the second as a deficit of extremes, conditioned on the image actually carrying colour so that the test is meaningful. Both are properties of the colour distribution rather than learned features, which keeps the screen interpretable and complementary to the pixel-structure indicators.
References
- [1] McCloskey S, Albright M. Detecting GAN-Generated Imagery Using Saturation Cues. In: IEEE International Conference on Image Processing (ICIP). 2019. p. 4584-4588. arXiv:1812.08247. https://arxiv.org/abs/1812.08247
- [2] Nataraj L, Mohammed TM, Chandrasekaran S, Flenner A, Bappy JH, Roy-Chowdhury AK, Manjunath BS. Detecting GAN generated Fake Images using Co-occurrence Matrices. arXiv preprint arXiv:1903.06836. 2019. https://arxiv.org/abs/1903.06836
- [3] Corvi R, Cozzolino D, Poggi G, Nagano K, Verdoliva L. Intriguing properties of synthetic images: from generative adversarial networks to diffusion models. In: IEEE/CVF CVPR Workshops. 2023. arXiv:2304.06408. https://arxiv.org/abs/2304.06408
- [4] Verdoliva L. Media Forensics and DeepFakes: An Overview. IEEE Journal of Selected Topics in Signal Processing. 2020;14(5):910-932. arXiv:2001.06564. https://arxiv.org/abs/2001.06564