Hashing algorithms map data of arbitrary size to a fixed-size value. Most hashing algorithms actively try to avoid collisions – i.e., they minimize the probability of two different keys having the same hash. Perceptual hashes do the opposite – they maximize collisions by creating some dimension of data locality: similar keys have similar hashes.
A simple implementation of perceptual hashing of an image might be the following:
- First, shrink the image to a small fixed size and convert it to grayscale.
- Calculate the average grayscale pixel value.
- For each pixel, encode `1` if the pixel is lighter than the average, otherwise `0` (see the sketch below).
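Here is a minimal sketch of that scheme in Python. The use of the Pillow library and the 8×8 (64-bit) hash size are assumptions for illustration, not part of any particular product's implementation:

```python
# A minimal average-hash sketch: shrink, grayscale, threshold against the mean.
# Pillow (PIL) and the 8x8 hash size are illustrative assumptions.
from PIL import Image

def average_hash(path: str, hash_size: int = 8) -> int:
    # Shrink the image and drop color information.
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = list(img.getdata())
    avg = sum(pixels) / len(pixels)
    # Encode 1 for pixels lighter than the average, 0 otherwise.
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits
```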
By removing some of the dimensions (downsampling the color, size, and resolution), you can produce a hash that stays stable under small changes to the underlying data. That makes perceptual hashing useful for a number of applications:
- Image search – Google Images uses perceptual hashing for its "Search by Image" feature. Apple uses it on-device to check photos against CSAM.
- Audio search – Shazam uses perceptual hashing to identify songs with the same fingerprint.
- Video search – While I'm not sure that products like YouTube use perceptual hashing, it can be used for digital rights management to identify similar video files. `ffmpeg`, the popular swiss-army-knife of audio/video tools, can generate perceptual hashes for video formats.
- Spam filtering – generate a hash of an email digest to determine whether it is spam.
Of course, perceptual hashing algorithms are vulnerable to adversarial attacks. For example, steganography methods make it trivial to encode data in an image in ways that are nearly unrecognizable to a human viewer.