Noise Consistency
Estimates the local noise level across the image and checks whether it is uniform. A camera imprints a roughly constant noise floor over the whole frame, so a spliced or generated region, which carries a different noise level, stands out. The noise is estimated robustly in the wavelet domain so that texture and edges do not masquerade as noise. It works on the pixels alone, with no model.
Technical description
I3 is a deterministic, generator-agnostic screen for inconsistent noise. A digital sensor adds a noise floor whose level is roughly constant across the frame, so when a region is spliced in from another source, or synthesised, its noise statistics differ from the host image, and the difference survives even when the edit is seamless to the eye. The difficulty is separating noise from content: a textured region has high-frequency energy that mimics noise. I3 follows the wavelet-domain approach of Mahdian and Saic, estimating the per-block noise standard deviation from the finest diagonal wavelet coefficients with the median absolute deviation, which is robust to edges and texture, and then measuring how much the per-block noise levels vary across the image. The variability maps to a 0 to 5 score. It requires the image to be at least 128 by 128 pixels.
How it works
The indicator runs deterministically at Layer 1. The image is converted to grayscale and its local noise level is estimated in the wavelet domain. The finest-scale Haar diagonal detail (the HH sub-band) is computed over each non-overlapping 2 by 2 cell of pixels A, B, C, D, with A and D on one diagonal and B and C on the other, as HH = (A − B − C + D) / 2. This diagonal high-pass cancels constant and linear content, so a flat region or a smooth gradient gives HH = 0, and the surviving coefficients are dominated by noise. The division by 2 preserves scale: independent pixel noise of standard deviation sigma produces HH coefficients with the same standard deviation.
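The HH step can be sketched in a few lines of NumPy. This is a minimal illustration, not the indicator's actual code: the function name `haar_hh`, the row-major placement of A, B, C, D, and the choice to drop a trailing odd row or column are all assumptions made for the example.

```python
import numpy as np

def haar_hh(gray):
    """Finest-scale Haar diagonal detail over non-overlapping 2x2 cells.

    Hypothetical helper: assumes A, B sit on the top row of each cell and
    C, D on the bottom row, and crops a trailing odd row/column.
    """
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    g = g[: h - h % 2, : w - w % 2]
    a = g[0::2, 0::2]  # top-left pixel of each cell
    b = g[0::2, 1::2]  # top-right
    c = g[1::2, 0::2]  # bottom-left
    d = g[1::2, 1::2]  # bottom-right
    return (a - b - c + d) / 2.0

# A linear ramp (constant plus gradient) should be cancelled entirely.
yy, xx = np.mgrid[0:8, 0:8]
ramp = 3.0 + 2.0 * xx + 5.0 * yy
ramp_hh = haar_hh(ramp)

# Pure Gaussian noise should keep its standard deviation after the transform.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 4.0, size=(256, 256))
noise_hh = haar_hh(noise)
```

The output has half the input resolution in each direction, so a 128 by 128 image yields a 64 by 64 HH map.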
The HH map is tiled into an 8 by 8 grid of blocks, and the noise standard deviation of each block is estimated with the median absolute deviation (MAD) estimator, sigma = median(|HH|) / 0.6745, where 0.6745 is the factor relating the MAD to the standard deviation of a normal distribution. Using the median rather than the variance makes the estimate robust to the sparse strong coefficients that edges produce, so it tracks the noise floor rather than the image content.
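The per-block estimate described above might look like the following. It is a sketch under assumptions: `block_sigmas` is a hypothetical name, and the way block edges are rounded when the map size is not a multiple of 8 is a choice made here, not something the text specifies.

```python
import numpy as np

MAD_TO_SIGMA = 0.6745  # factor quoted in the text for a normal distribution

def block_sigmas(hh, grid=8):
    """Estimate sigma = median(|HH|) / 0.6745 in each cell of a grid x grid tiling.

    Assumes the HH map has at least `grid` rows and columns, so no block is empty.
    """
    hh = np.asarray(hh, dtype=np.float64)
    row_edges = np.linspace(0, hh.shape[0], grid + 1).astype(int)
    col_edges = np.linspace(0, hh.shape[1], grid + 1).astype(int)
    sigmas = np.empty((grid, grid))
    for i in range(grid):
        for j in range(grid):
            block = hh[row_edges[i]:row_edges[i + 1],
                       col_edges[j]:col_edges[j + 1]]
            sigmas[i, j] = np.median(np.abs(block)) / MAD_TO_SIGMA
    return sigmas

# Synthetic check: coefficients drawn from pure Gaussian noise with sigma = 3.
rng = np.random.default_rng(0)
hh_demo = rng.normal(0.0, 3.0, size=(256, 256))
sigmas_demo = block_sigmas(hh_demo)
```

On this synthetic input every block estimate should land close to 3, since the coefficients contain nothing but noise.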
Let sigma_1, ..., sigma_n be the per-block noise levels (64 of them for the 8 by 8 grid), with mean sigma_bar and standard deviation s. The consistency measure is the coefficient of variation CV = s / sigma_bar, and the score is min(5.0, 4.0 · CV): a uniform noise floor gives a small CV, while a splice or a generated region whose noise level differs raises it. Individual anomalous blocks are reported as robust outliers, those whose level departs from the median by more than 2.5 robust standard deviations, where the robust standard deviation is 1.4826 times the MAD of the per-block levels. The metadata records the mean noise level, the CV, the anomalous-block count, and the total block count.
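The scoring and outlier rules can be written out as a short function. This is an illustrative sketch: the function name, the dictionary keys, the use of the population standard deviation (`ddof=0`), and the guards against a zero mean or zero robust deviation are assumptions, since the text does not specify them.

```python
import numpy as np

def noise_consistency(sigmas):
    """Score per-block noise levels: CV, min(5, 4*CV), and robust outlier count."""
    s = np.asarray(sigmas, dtype=np.float64).ravel()
    mean_sigma = float(s.mean())
    # Coefficient of variation; population std is an assumption of this sketch.
    cv = float(s.std() / mean_sigma) if mean_sigma > 0 else 0.0
    score = min(5.0, 4.0 * cv)

    # Robust outliers: more than 2.5 robust standard deviations from the median.
    med = np.median(s)
    robust_sd = 1.4826 * np.median(np.abs(s - med))
    if robust_sd > 0:
        anomalous = np.abs(s - med) > 2.5 * robust_sd
    else:
        anomalous = np.zeros(s.shape, dtype=bool)

    return {
        "mean_sigma": mean_sigma,
        "cv": cv,
        "score": score,
        "anomalous_blocks": int(anomalous.sum()),
        "total_blocks": int(s.size),
    }

# 60 blocks near sigma = 2 and 4 blocks at sigma = 6, as a spliced patch might give.
levels = np.full(64, 2.0)
levels[::2] = 2.1
levels[:4] = 6.0
result = noise_consistency(levels)

# A perfectly uniform noise floor for comparison.
uniform = noise_consistency(np.full(64, 2.0))
```

In the first case the four high-noise blocks lift the CV well above zero and are flagged individually, while the uniform case scores 0 with no anomalous blocks.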
Score thresholds
| Score | Meaning |
|---|---|
| 0 to 1 | The noise level is uniform across the image, consistent with a single capture. |
| 2 to 3 | The noise level varies between regions, a possible splice or composite. |
| 4 to 5 | Strongly inconsistent noise levels across the image. Consistent with splicing or a generated region. |
Why this matters
Noise inconsistency is one of the most established blind-forensics cues. Mahdian and Saic showed that the local noise standard deviation, estimated robustly from the highest-resolution wavelet coefficients with the median estimator, exposes regions whose noise differs from the rest of the image, revealing splices without any reference [1]. The robust estimator at its core is the classical median-absolute-deviation rule for noise from the finest wavelet band, sigma = median(|coefficients|) / 0.6745, introduced by Donoho and Johnstone, which is the reason the estimate resists edges and texture that would defeat a plain variance [2]. The cue remains central in modern work: multi-scale noise-curve methods localise forgeries by comparing local noise to the global model across scales [4], and learned camera-noise fingerprints push the same idea further by recovering a residual that is specific to the capturing device, so that any region not matching it is suspect [3]. I3 implements the deterministic, model-free version of this principle, reading the spread of the per-block noise floor as a sign that not every pixel came from the same source.
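The constant 0.6745 can be recovered in one step, assuming the coefficients X are zero-mean Gaussian with standard deviation σ. The symbols m (the median of |X|) and Φ (the standard normal cumulative distribution function) are introduced here for illustration only:

```latex
\begin{aligned}
P\bigl(|X| \le m\bigr) &= \tfrac{1}{2}, \qquad X \sim \mathcal{N}(0, \sigma^2) \\
2\,\Phi\!\left(\tfrac{m}{\sigma}\right) - 1 &= \tfrac{1}{2}
  \;\Longrightarrow\; m = \sigma\,\Phi^{-1}\!\left(\tfrac{3}{4}\right) \approx 0.6745\,\sigma \\
\hat{\sigma} &= \frac{\operatorname{median}(|X|)}{0.6745}
\end{aligned}
```

The reciprocal, 1 / 0.6745 ≈ 1.4826, is the same factor that converts the MAD of the per-block levels into a robust standard deviation when flagging anomalous blocks.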
Limitations
Estimating noise is hard, and several effects bound the screen. The wavelet-MAD estimate is far more robust than a raw variance, but pervasively textured content still raises the apparent noise floor, so a busy region can read as higher-noise without manipulation. Strong JPEG compression suppresses and equalises noise, which can hide a real inconsistency, and it introduces block-grid artefacts that can mimic one. Aggressive denoising flattens the floor everywhere. The 8 by 8 grid bounds the spatial resolution: a small insert that falls within one block contributes only a minority of that block's coefficients, so the block median still reflects its surroundings. The score bands are indicative rather than exact decision thresholds. Compression-history analysis, copy-move detection, frequency fingerprints, and learned camera-model residuals live in sibling indicators, so I3 stays on the spread of the robustly-estimated noise level.
Theoretical background
I3 rests on the physics of image capture: photon shot noise and sensor read noise produce a stochastic high-frequency component whose magnitude is a property of the sensor and exposure, roughly uniform across a single frame and largely independent of scene content. A composite breaks that uniformity, because its inserted region was captured by a different sensor, at a different setting, or generated without a sensor at all. The estimation problem is that scene texture also lives in the high frequencies, so a naive variance conflates the two; the median absolute deviation of the finest diagonal wavelet band solves this by measuring the bulk of the high-frequency distribution while ignoring the sparse, large coefficients that edges contribute. Reading the coefficient of variation of the per-block estimates turns the noise floor into a consistency test that depends on the image's physics rather than on a learned fingerprint of any one generator.
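The contrast between a naive standard deviation and the median-based estimate can be seen on synthetic coefficients. The numbers below (noise level 2, 3% of coefficients hit by "edges" of magnitude 60) are invented for illustration and are not taken from the indicator.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_true = 2.0

# Gaussian noise coefficients standing in for a finest-scale diagonal band.
coeffs = rng.normal(0.0, sigma_true, size=10_000)

# Sparse, large coefficients standing in for edge responses (3% of samples).
edge_idx = rng.choice(coeffs.size, size=300, replace=False)
coeffs[edge_idx] += rng.choice([-60.0, 60.0], size=300)

# The plain standard deviation is inflated by the few large values.
naive_sigma = coeffs.std()

# The median of |coeffs| is set by the bulk of the distribution.
mad_sigma = np.median(np.abs(coeffs)) / 0.6745

print(f"true sigma:  {sigma_true:.2f}")
print(f"naive std:   {naive_sigma:.2f}")
print(f"MAD / 0.6745: {mad_sigma:.2f}")
```

The plain standard deviation comes out several times larger than the true noise level, while the median-based estimate stays close to it, which is the behaviour the consistency test relies on.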
References
- [1] Mahdian B, Saic S. Using noise inconsistencies for blind image forensics. Image and Vision Computing. 2009;27(10):1497-1503. DOI: 10.1016/j.imavis.2009.02.001
- [2] Donoho DL, Johnstone IM. Ideal spatial adaptation by wavelet shrinkage. Biometrika. 1994;81(3):425-455. DOI: 10.1093/biomet/81.3.425
- [3] Cozzolino D, Verdoliva L. Noiseprint: A CNN-Based Camera Model Fingerprint. IEEE Transactions on Information Forensics and Security. 2020;15:144-159. arXiv:1808.08396. https://arxiv.org/abs/1808.08396
- [4] Gardella M, Musé P, Morel JM, Colom M. Forgery Detection in Digital Images by Multi-Scale Noise Estimation. Journal of Imaging. 2021;7(7):119. DOI: 10.3390/jimaging7070119