ResAIKit
Research Integrity Toolkit
I2 · Image forensics · Generic Forensics · Layer 1 (Deterministic)

Frequency Analysis

Reads the image's frequency spectrum and screens for two synthetic-image tells: a power spectrum that does not follow the smooth natural 1/f decay (it looks bumpy or peaky instead) and a periodic grid of peaks left by the up-sampling layers of generative models. It works from the two-dimensional Fourier transform alone, with no learned model.

Technical description

I2 is a deterministic, generator-agnostic screen built on the frequency-domain fingerprints of synthetic images. Natural photographs have an azimuthally averaged power spectrum that follows a smooth power law (the so-called 1/f law, with a power exponent near -2), the legacy of scale-invariant natural scenes. The convolutional up-sampling used by generative adversarial networks and diffusion models fails to reproduce that distribution and leaves two traces: a distorted high-frequency decay and a regular lattice of peaks at the up-sampling frequency. The indicator computes the two-dimensional Fourier transform of the grayscale image, reduces it to a one-dimensional radial power spectrum, fits a power law to that spectrum, and detects periodic peaks in the two-dimensional magnitude. Departures from the power law and the presence of peaks are each scored, and the contributions sum to a score from 0 to 5. The image must be at least 64 by 64 pixels.

How it works

The indicator runs deterministically at Layer 1. The image is converted to grayscale and its two-dimensional discrete Fourier transform F is computed and centred with an fftshift; the power spectrum is P2(u, v) = |F(u, v)|², and a log-magnitude map log(1 + |F|) is kept for peak detection.
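
A minimal NumPy sketch of this step follows. The function name spectrum_maps, the Rec. 601 luma weights used for the grayscale conversion, and the point at which the 64-pixel size check is applied are assumptions made for illustration; the description above does not specify them.

```python
import numpy as np

def spectrum_maps(image):
    """Return (power, log_magnitude) maps for a grayscale or RGB image array."""
    img = np.asarray(image, dtype=np.float64)
    if img.ndim == 3:
        # Assumed grayscale conversion: Rec. 601 luma weights on the first 3 channels.
        img = img[..., :3] @ np.array([0.299, 0.587, 0.114])
    if min(img.shape) < 64:
        raise ValueError("image must be at least 64 by 64 pixels")

    # 2-D DFT, shifted so the zero-frequency (DC) term sits at the centre.
    F = np.fft.fftshift(np.fft.fft2(img))
    magnitude = np.abs(F)

    power = magnitude ** 2               # P2(u, v) = |F(u, v)|^2
    log_magnitude = np.log1p(magnitude)  # log(1 + |F|), kept for peak detection
    return power, log_magnitude
```

For an image of even height h and width w, the DC term lands at index (h // 2, w // 2) after the shift, which is the centre used by the later radial steps.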

The two-dimensional power spectrum is reduced to a one-dimensional profile by azimuthal averaging: the spectrum is divided into 32 concentric radial rings by distance from the centre, and the mean power in each ring gives P(f) as a function of radial frequency f. A power law P(f) proportional to f^a appears as a straight line in log-log axes, so log P(f) is fit against log f by least squares (skipping the direct-current bin at the centre), yielding the decay slope a (the exponent) and the coefficient of determination R-squared = 1 - SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot the total sum of squares of log P about its mean.
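
The azimuthal averaging and the log-log fit can be sketched as below. This is an illustrative reading rather than the toolkit's code: the function names, the choice of half the shorter image side as the outer radius, the use of ring midpoints as the frequency f, and skipping the whole innermost ring (the one that contains the direct-current bin) are all assumptions.

```python
import numpy as np

def radial_profile(power, n_rings=32):
    """Mean power in n_rings concentric rings around the spectrum centre."""
    h, w = power.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.hypot(y - cy, x - cx).ravel()

    r_max = min(cy, cx)  # assumed outer radius: half the shorter side
    edges = np.linspace(0.0, r_max, n_rings + 1)
    idx = np.digitize(r, edges) - 1        # ring index for every cell
    valid = (idx >= 0) & (idx < n_rings)   # cells at or beyond r_max are dropped

    sums = np.bincount(idx[valid], weights=power.ravel()[valid], minlength=n_rings)
    counts = np.bincount(idx[valid], minlength=n_rings)
    freqs = 0.5 * (edges[:-1] + edges[1:])  # ring midpoints stand in for f
    return freqs, sums / np.maximum(counts, 1)

def fit_power_law(freqs, profile):
    """Least-squares fit of log P against log f; returns (slope a, R-squared)."""
    keep = (freqs > 0) & (profile > 0)
    keep[0] = False  # skip the innermost ring, which holds the DC bin
    lx, ly = np.log(freqs[keep]), np.log(profile[keep])

    slope, intercept = np.polyfit(lx, ly, 1)
    ss_res = np.sum((ly - (slope * lx + intercept)) ** 2)
    ss_tot = np.sum((ly - ly.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, r_squared
```

With 32 rings, the ring width in pixels grows with image size, so on large images the skipped innermost ring covers more than the DC bin alone.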

Three signals follow. First, the decay slope of a natural power spectrum sits in roughly the band [-4.0, -1.0]; a slope outside that band contributes min(2.0, d × 1.5), where d is the distance from the slope to the nearer band edge. Second, a natural spectrum follows the power law closely, so a fit R-squared below 0.80 indicates a bumpy or peaky spectrum and contributes min(1.5, (0.80 - R-squared) × 6.0). Third, periodic peaks are counted on the log-magnitude map. The central low-frequency blob, which is bright in every natural image, is excluded by masking out radii below one quarter of the maximum radius. Over the remaining mid-to-high band, the mean and standard deviation of the log-magnitude are computed, cells exceeding the mean plus 4 standard deviations are counted, and a count c contributes min(1.5, c × 0.3).
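
The peak count is the most algorithmic of the three signals, and a hedged sketch of it is given here. The function name and parameters are hypothetical, and two details are assumptions because the text leaves them open: the maximum radius is taken as half the shorter side of the map, and every cell at or beyond one quarter of that radius (corners included) belongs to the band.

```python
import numpy as np

def count_spectral_peaks(log_magnitude, inner_fraction=0.25, n_sigma=4.0):
    """Count cells above mean + n_sigma * std outside the central blob."""
    h, w = log_magnitude.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.hypot(y - cy, x - cx)

    r_max = min(cy, cx)  # assumed definition of the maximum radius
    # Drop the bright low-frequency centre; keep the mid-to-high band.
    band = log_magnitude[r >= inner_fraction * r_max]

    threshold = band.mean() + n_sigma * band.std()
    return int(np.count_nonzero(band > threshold))
```

Because the spectrum of a real-valued image is point-symmetric about the centre, a single periodic component normally shows up as a mirrored pair of cells, so counts under this sketch tend to come in twos.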

The three contributions are summed and the score is reported as min(5.0, total). The metadata records the decay slope, the fit R-squared, and the peak count.
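
The scoring arithmetic can be written out in a few lines. The function name i2_score and the metadata key names are hypothetical; the band edges, thresholds, multipliers, and caps are the ones stated above.

```python
def i2_score(slope, r_squared, peak_count):
    """Combine the three signals into the 0 to 5 score plus metadata."""
    band_low, band_high = -4.0, -1.0

    # Distance from the slope to the nearer band edge (0 inside the band).
    d = max(band_low - slope, slope - band_high, 0.0)
    slope_part = min(2.0, d * 1.5)

    # Shortfall of the fit quality below 0.80 (0 when the fit is good).
    fit_part = min(1.5, max(0.0, 0.80 - r_squared) * 6.0)

    peak_part = min(1.5, peak_count * 0.3)

    return {
        "score": min(5.0, slope_part + fit_part + peak_part),
        "decay_slope": slope,        # hypothetical metadata key
        "fit_r_squared": r_squared,  # hypothetical metadata key
        "peak_count": peak_count,    # hypothetical metadata key
    }
```

For example, a slope of -0.5 with a good fit and no peaks gives d = 0.5 and a score of 0.75. The three caps add up to 2.0 + 1.5 + 1.5 = 5.0, so the final min(5.0, total) acts only as a safeguard.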

Score thresholds

Score | Meaning
0 to 1 | The power spectrum follows a smooth natural 1/f law with no periodic peaks.
2 to 3 | One signal: a decay outside the natural band, a spectrum that departs from a power law, or periodic peaks.
4 to 5 | Several signals together. Consistent with the up-sampling fingerprint of a generative model.

Why this matters

The frequency domain is one of the most studied fingerprints of synthetic images. Durall and colleagues showed that the convolutional up-sampling at the heart of generative models cannot reproduce the spectral distribution of natural data, and that the one-dimensional azimuthally-averaged spectrum exposes the failure, especially at high frequencies, well enough to detect generated images with very high accuracy [1]. Frank and colleagues independently found that the up-sampling leaves a consistent grid-like pattern of peaks in the spectrum, stable across architectures, datasets, and resolutions [2]. The effect is not limited to early generative adversarial networks: a study of synthetic images from generative adversarial networks, diffusion models, and vector-quantised models found that all of them differ from real images in the Fourier domain and in the mid-to-high frequency content of their radial and angular spectral energy [3]. I2 turns these findings into a deterministic screen by measuring how far the radial power spectrum departs from a power law and by counting the periodic peaks that up-sampling produces.

Limitations

These spectral fingerprints are a screen rather than proof, and their robustness has limits that bound the indicator. A follow-up study showed that the high-frequency spectral discrepancy can be reduced by minor architectural changes to the generator [4], so a model designed to be spectrally consistent can evade the test. Ordinary post-processing such as downscaling, blurring, or recompression also reshapes the spectrum. The natural-spectrum band and the conformity threshold are directional rather than exact, and unusual but authentic content (fine repetitive textures, screened or halftoned prints, or heavy noise) can move the slope or add peaks without any manipulation. The test reads a global spectrum, so a small synthetic insert in an otherwise real image is diluted. Localized editing, compression-history analysis, noise consistency, and learned generator fingerprints are handled by sibling indicators, so I2 stays on the global frequency profile.

Theoretical background

I2 rests on two properties. The first is that natural images are approximately scale-invariant, which makes their power spectrum fall off as a power of frequency, P(f) proportional to f^a with the exponent a near -2. The relationship holds closely enough that the log-log fit has a high R-squared for real photographs. The second is that the transposed-convolution and interpolation layers used to up-sample feature maps in generative models impose a fixed sampling grid, which both distorts the high-frequency tail of the spectrum and stamps periodic peaks at the grid frequency. Measuring the departure from the power law captures the first effect and counting the peaks captures the second. Both are properties of the global spectrum rather than learned features, which keeps the screen interpretable. The known counter-measures and post-processing sensitivities make it one signal within the forensic suite rather than a standalone verdict.
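
The second property can be illustrated with a toy model. A stride-2 transposed convolution is equivalent to inserting zeros between samples and then convolving, and the zero-insertion step alone tiles the spectrum, which places a copy of the bright low-frequency content at the grid frequency. The sketch below shows only that step on random data; it is an assumption-laden illustration, not the mechanism of any particular generator, and the function name is hypothetical.

```python
import numpy as np

def zero_insert_upsample(img, factor=2):
    """Place each sample on a coarse grid and fill the gaps with zeros."""
    h, w = img.shape
    up = np.zeros((h * factor, w * factor), dtype=np.float64)
    up[::factor, ::factor] = img
    return up

rng = np.random.default_rng(0)
small = rng.random((8, 8))   # stand-in for a low-resolution feature map
up = zero_insert_upsample(small, factor=2)

spec_small = np.fft.fft2(small)
spec_up = np.fft.fft2(up)

# The 16x16 spectrum is the 8x8 spectrum tiled 2x2 (unshifted layout).
replicated = np.allclose(spec_up, np.tile(spec_small, (2, 2)))

# A copy of the DC term therefore appears at index (8, 8), the highest
# frequency of the up-sampled grid.
dc_copy = abs(spec_up[8, 8])
```

The convolution that follows the zero insertion would have to act as an ideal low-pass filter to remove these copies completely; whatever survives is the kind of periodic residue that the peak count looks for.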

References

  1. Durall R, Keuper M, Keuper J. Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020. arXiv:2003.01826. https://arxiv.org/abs/2003.01826
  2. Frank J, Eisenhofer T, Schönherr L, Fischer A, Kolossa D, Holz T. Leveraging Frequency Analysis for Deep Fake Image Recognition. In: International Conference on Machine Learning (ICML). 2020. arXiv:2003.08685. https://arxiv.org/abs/2003.08685
  3. Corvi R, Cozzolino D, Poggi G, Nagano K, Verdoliva L. Intriguing properties of synthetic images: from generative adversarial networks to diffusion models. In: IEEE/CVF CVPR Workshops. 2023. arXiv:2304.06408. https://arxiv.org/abs/2304.06408
  4. Chandrasegaran K, Tran NT, Cheung NM. A Closer Look at Fourier Spectrum Discrepancies for CNN-generated Images Detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021. arXiv:2103.17195. https://arxiv.org/abs/2103.17195