
How Image Compression Works: JPEG, PNG, WebP, and AVIF Explained

Published: 2026-03-17

By Editorial Team


Introduction: Why Knowing the Algorithm Changes How You Compress

Every time you drag a quality slider in an image editor or choose between JPEG and PNG, you are making a decision that only makes full sense in terms of the underlying compression algorithm. Why can reducing JPEG quality from 90 to 80 barely change how the image looks yet cut the file size by roughly 40%? Why does a PNG of a photograph often weigh around five times more than a JPEG of the same photo, even though the two look the same on screen? Why does WebP usually outperform JPEG at the same perceptual quality?

These questions all have precise, algorithmic answers. This article explains how the four most important web image formats — JPEG, PNG, WebP, and AVIF — compress image data. Understanding these mechanisms lets you make better decisions about format selection, quality settings, and the trade-offs between file size and visual fidelity. If you want to apply what you learn, use our free image compressor to experiment with formats and quality settings on your own files.

Lossy vs. Lossless: The Fundamental Divide

Every image compression algorithm falls into one of two categories — or combines both.

Lossless Compression

Lossless compression reduces file size without discarding any image data. Decompressing a lossless file reproduces the original pixel data exactly, bit for bit. Think of it like compressing a ZIP archive: the original files are perfectly restored when you unzip. PNG and GIF use lossless compression. WebP and AVIF both support a lossless mode. Use lossless when pixel accuracy matters: screenshots, text-heavy UI images, icons with sharp edges, medical or forensic images, and any asset you plan to edit further.

Lossy Compression

Lossy compression achieves much higher compression ratios by permanently discarding information that human vision is unlikely to notice. The algorithm exploits known weaknesses in human perception — we are more sensitive to brightness differences than to color differences, more sensitive to changes in large uniform areas than to high-frequency detail. Use lossy for photographs and natural scenes where the discarded information genuinely is not visible at the target quality level. JPEG is the classic lossy format; WebP and AVIF both default to lossy mode.

A Useful Analogy

Imagine describing a sunset photo in a text message. Lossless is a complete pixel-by-pixel description: "pixel 1: RGB(255,120,40), pixel 2: RGB(254,119,39)..." — perfectly accurate, very verbose. Lossy is a prose description: "a deep orange and red gradient over the horizon, blue sky above, slight film grain in the shadows" — much shorter, and a reader can reconstruct a convincing version, but not the exact original.

JPEG Compression Deep Dive

JPEG (Joint Photographic Experts Group), standardized in 1992, remains the most common lossy image format on the web. Its compression pipeline has several distinct stages, each contributing to the final file size.

Stage 1: Color Space Conversion (RGB → YCbCr)

JPEG first converts the image from RGB (red, green, blue) to YCbCr, separating the image into three channels: Y (luma, or brightness), Cb (blue-difference chroma), and Cr (red-difference chroma). This split is crucial because human vision is far more sensitive to brightness variations than to color variations. Separating them allows JPEG to apply different levels of compression to each channel — preserving brightness detail while aggressively compressing color.
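As a minimal sketch of this conversion, the function below uses the full-range BT.601 weights that JFIF-style JPEG files conventionally use. The function name is ours, and real encoders use faster fixed-point arithmetic on whole images rather than one pixel at a time.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (JFIF / BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each value to the 0..255 range of an 8-bit sample.
    return tuple(max(0, min(255, round(v))) for v in (y, cb, cr))


# A neutral gray carries no color information: both chroma values sit at 128.
print(rgb_to_ycbcr(128, 128, 128))
# Pure red has low luma and a red-difference channel pushed to its maximum.
print(rgb_to_ycbcr(255, 0, 0))
```

Note how every gray pixel maps to Cb = Cr = 128: all of its information lives in the Y channel, which is the channel JPEG treats most carefully.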

Stage 2: Chroma Subsampling

After the color space conversion, JPEG typically downsamples the Cb and Cr chroma channels. This is called chroma subsampling. The notation uses a ratio like 4:4:4, 4:2:2, or 4:2:0:

4:4:4: No subsampling. Full color resolution for all three channels. Highest quality, largest file.
4:2:2: Chroma resolution halved horizontally, so each chroma channel keeps half of its samples. Used in professional video. Slight color softening at high magnification.
4:2:0: Chroma resolution halved both horizontally and vertically, so each chroma channel keeps a quarter of its samples. Used by most consumer JPEG encoders. This cuts the color data by 75% and the total sample count by 50%, with minimal visible quality loss in photographs.

Most JPEG encoders default to 4:2:0. At typical viewing distances, the color resolution reduction is imperceptible in natural photos. However, 4:2:0 can cause visible "color bleeding" around high-contrast edges in text-on-image charts or memes, which is why those are better compressed as PNG.
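To make the sample arithmetic concrete, here is a small sketch. It assumes the encoder downsamples by averaging each 2×2 group of chroma samples, which is one common choice but not the only one; both function names are illustrative.

```python
def subsample_420(plane):
    """Average each 2x2 group of chroma samples into one (4:2:0-style).

    `plane` is a list of rows with even width and height.
    """
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def sample_counts(width, height):
    """Total samples stored for Y + Cb + Cr under each subsampling scheme."""
    luma = width * height
    return {
        "4:4:4": luma + 2 * luma,
        "4:2:2": luma + 2 * (luma // 2),
        "4:2:0": luma + 2 * (luma // 4),
    }


cb = [[100, 102, 200, 202],
      [104, 106, 204, 206]]
print(subsample_420(cb))          # one averaged value per 2x2 group
print(sample_counts(1920, 1080))  # 4:2:0 stores half as many samples as 4:4:4
```

Under 4:2:0 each chroma plane keeps a quarter of its samples, so the three planes together hold half as many samples as the unsubsampled image, before any other compression stage has run.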

Stage 3: Block Division and DCT

JPEG divides each channel into non-overlapping 8×8 blocks and applies the Discrete Cosine Transform (DCT) to each block. The DCT converts each block from the spatial domain (pixel values) into the frequency domain (a weighted sum of cosine waves at different spatial frequencies). The result is a matrix of 64 coefficients: one DC coefficient (proportional to the block's average value in that channel) and 63 AC coefficients (progressively finer spatial detail).

The key insight is that natural images contain most of their energy in low-frequency components. The upper-left coefficient of the DCT matrix (the DC value) describes the overall brightness of the block; the coefficients toward the lower-right describe finer and finer detail. For a smooth sky or skin tone, these high-frequency coefficients are very small or zero.
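The energy-compaction claim is easy to check numerically. The sketch below implements the 8×8 DCT-II in its direct (slow) form, using the normalization from the JPEG standard; production encoders use fast factorizations instead. The two test blocks are made up for illustration and are assumed to be level-shifted already.

```python
import math

N = 8


def dct_2d(block):
    """2-D DCT-II of an 8x8 block (direct O(N^4) form, written for clarity)."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out


flat = [[50] * N for _ in range(N)]                     # uniform block
ramp = [[10 * y for y in range(N)] for _ in range(N)]   # rises left to right

flat_coeffs = dct_2d(flat)
ramp_coeffs = dct_2d(ramp)
print(round(flat_coeffs[0][0], 3))             # the only non-zero term: DC
print([round(v, 1) for v in ramp_coeffs[0]])   # ramp energy sits in the first row
```

The flat block collapses to a single DC value, and the smooth ramp needs only a handful of low-frequency terms in one row. Everything else is zero (up to floating-point noise), which is what the later stages exploit.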

Stage 4: Quantization — Where Quality Is Controlled

Quantization is where most of JPEG's loss happens (chroma subsampling, when enabled, is the other lossy step), and it is what your quality slider controls. Each of the 64 DCT coefficients is divided by the corresponding value in a quantization table and rounded to the nearest integer. Larger divisors round more aggressively, producing smaller integers that compress better but lose more detail.

The quantization table has larger values for high-frequency coefficients (fine detail, which the eye is less sensitive to) and smaller values for low-frequency coefficients (the broad shapes and tones the eye notices most). The quality setting scales the whole table. At quality 95, the divisors are small and very little data is lost. As quality drops toward 50 and below, the divisors grow and more and more high-frequency detail rounds to zero. At low settings each block is reduced to a few coarse coefficients, so the 8×8 grid itself becomes visible: these are the notorious blocky "JPEG artifacts".
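The sketch below shows one way a quality number can become a set of divisors. It uses the example luminance table from Annex K of the JPEG standard and the quality-scaling convention of the IJG libjpeg encoder; other encoders map the slider to tables differently, so treat the exact numbers as illustrative. The sample coefficients are invented.

```python
BASE_LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scaled_table(quality):
    """Scale the base table with the IJG libjpeg quality convention (1..100)."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row]
            for row in BASE_LUMA_TABLE]


def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round: the lossy step."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table):
    """What the decoder reconstructs: multiples of the divisor only."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]


coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 400.0, -37.0, 30.0
for quality in (95, 50, 10):
    levels = quantize(coeffs, scaled_table(quality))
    print(quality, levels[0][0], levels[0][1], levels[7][7])
```

At quality 95 the high-frequency coefficient at position (7, 7) survives as a small integer; at 50 and below it rounds to zero and can never be recovered by the decoder.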

Stage 5: Huffman Entropy Encoding

After quantization, many coefficients are zero (high-frequency detail that was rounded away). JPEG reads each block's coefficients in a zigzag order, from low to high frequency, so those zeros cluster into long runs at the end. It represents the runs of zeros compactly with run-length encoding, then applies Huffman coding, a lossless variable-length coding scheme, to the resulting symbols. This final stage is entirely lossless and adds no further quality loss.
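A simplified sketch of the step before Huffman coding: scan the quantized block in zigzag order (low to high frequency) and turn the AC coefficients into (zero run, value) pairs. This omits details of the real bitstream, such as the 16-zero run limit, size categories, and differential coding of the DC term, and it always appends an end-of-block marker for simplicity.

```python
def zigzag_order(n=8):
    """Index pairs of an n x n block in JPEG zigzag order."""
    def key(rc):
        r, c = rc
        s = r + c
        # Odd anti-diagonals are walked downward, even ones upward.
        return (s, r if s % 2 else c)

    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def run_length_ac(levels):
    """Encode the AC levels as (zero_run, value) pairs followed by 'EOB'."""
    scan = [levels[r][c] for r, c in zigzag_order(len(levels))]
    pairs, run = [], 0
    for value in scan[1:]:        # scan[0] is the DC term, coded separately
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")           # end-of-block: the rest of the scan is zero
    return pairs


levels = [[0] * 8 for _ in range(8)]
levels[0][0], levels[0][1], levels[2][0] = 12, -3, 1
print(zigzag_order()[:6])
print(run_length_ac(levels))
```

A block with 61 zero AC coefficients shrinks to two pairs and a marker, and it is these short symbol lists that the Huffman coder then assigns variable-length codes to.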

PNG Compression Deep Dive

PNG (Portable Network Graphics), introduced in 1996 as an open replacement for GIF, uses a two-stage lossless pipeline: filtering followed by DEFLATE compression.

Stage 1: Filtering

Before compression, PNG applies a filter to each row of pixels. The filter transforms raw pixel values into differences (deltas) that are typically smaller in magnitude and therefore compress better. PNG supports five filter types, applied per row:

None: Raw pixel values. Useful when the image is already incompressible (noise, film grain).
Sub: Each byte is stored as the difference from the corresponding byte of the pixel to its left. Works well for images with horizontal gradients.
Up: Difference from the pixel directly above. Works well for vertical gradients.
Average: Difference from the average of the pixel to the left and the pixel above.
Paeth: Computes an estimate as left + above − upper-left, then predicts with whichever of those three neighbors is closest to that estimate. Often the best-performing filter for natural images.

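Most PNG encoders use a heuristic, such as minimizing the sum of absolute delta values, to estimate which filter will compress best for each row; optimizing tools may try several filters and keep the smallest result. The chosen filter type is stored alongside the row data so the decoder can reverse it exactly.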
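The five filters and a per-row selection heuristic can be sketched in a few lines. The Paeth predictor follows the definition in the PNG specification; the selection rule shown (smallest sum of absolute deltas) is a common heuristic, not something every encoder uses, and the function names are ours.

```python
def paeth_predictor(a, b, c):
    """PNG Paeth predictor: a = left, b = above, c = upper-left."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def filter_row(kind, row, prev, bpp=1):
    """Apply one filter type to `row`; `prev` is the unfiltered row above."""
    out = bytearray()
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0      # left neighbor
        b = prev[i]                              # neighbor above
        c = prev[i - bpp] if i >= bpp else 0     # upper-left neighbor
        pred = {"none": 0, "sub": a, "up": b,
                "average": (a + b) // 2,
                "paeth": paeth_predictor(a, b, c)}[kind]
        out.append((x - pred) % 256)             # deltas wrap modulo 256
    return bytes(out)


def pick_filter(row, prev, bpp=1):
    """Heuristic: choose the filter whose deltas are smallest in magnitude."""
    def cost(data):
        return sum(min(v, 256 - v) for v in data)

    kinds = ("none", "sub", "up", "average", "paeth")
    return min(kinds, key=lambda k: cost(filter_row(k, row, prev, bpp)))


gradient = bytes(range(10, 90, 10))              # 10, 20, ..., 80
print(list(filter_row("sub", gradient, bytes(8))))
print(pick_filter(gradient, bytes(8)))
```

For the horizontal gradient, the Sub filter turns eight different values into the same small delta repeated, which is the kind of input the next stage compresses very well.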

Stage 2: DEFLATE Compression

The filtered row data is compressed with DEFLATE — the same algorithm used in ZIP files and gzip. DEFLATE combines LZ77 (back-references to repeated byte strings) with Huffman coding. Because filtering creates small delta values and sequences of zeros, the filtered data is highly compressible. DEFLATE is lossless by definition, so PNG decompression always reproduces the exact original pixel data.
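Python's standard zlib module implements DEFLATE, so the effect of filtering is easy to measure. The sketch below uses two synthetic 8-bit grayscale images of our own making: a smooth gradient, and pseudo-random noise standing in for unpredictable detail. It applies only the Sub filter and skips the per-row filter-type byte a real PNG would store.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 64


def sub_filter(row):
    """PNG 'Sub' filter for 1 byte per pixel: delta from the byte to the left."""
    return bytes((row[i] - (row[i - 1] if i else 0)) % 256
                 for i in range(len(row)))


# Smooth diagonal gradient: highly predictable, so the deltas are tiny.
gradient = [bytes((x + 3 * y) % 256 for x in range(WIDTH)) for y in range(HEIGHT)]

# Seeded pseudo-random noise: nothing for a filter to predict.
rng = random.Random(0)
noise = [bytes(rng.getrandbits(8) for _ in range(WIDTH)) for _ in range(HEIGHT)]


def deflate_size(rows, use_filter):
    """Size in bytes after optional Sub filtering and DEFLATE (zlib level 9)."""
    data = b"".join(sub_filter(r) if use_filter else r for r in rows)
    return len(zlib.compress(data, 9))


for name, rows in (("gradient", gradient), ("noise", noise)):
    print(name,
          "raw:", deflate_size(rows, False),
          "sub-filtered:", deflate_size(rows, True),
          "of", WIDTH * HEIGHT, "bytes")
```

Filtering shrinks the gradient further, while the noise image stays at roughly its original size with or without filtering: differences between random values are just as random as the values themselves.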

Why PNG Is Large for Photographs

Photographs contain enormous amounts of high-frequency detail (texture, grain, noise). After filtering, the delta values for each row are still highly varied: neither small nor repetitive. DEFLATE cannot compress unpredictable data well. As a result, a photographic PNG is often 3–10× larger than a JPEG of the same image at comparable visual quality. PNG shines when pixel data contains large uniform areas, hard edges, or repeating patterns, which is exactly the structure that filtering turns into runs of small delta values.

WebP: VP8 Meets Lossless Transforms

WebP was developed by Google, building on the VP8 video codec it obtained through its acquisition of On2 Technologies. It uses fundamentally different algorithms for its lossy and lossless modes, and each generally outperforms its JPEG or PNG counterpart.

WebP Lossy: VP8 Intra-Frame Coding

WebP lossy is based on the VP8 video codec's keyframe (intra-frame) coding. Like JPEG, it uses a block-based DCT, but with more sophisticated prediction and transform coding:

Macroblock prediction: Before transforming, VP8 predicts each block from neighboring already-encoded blocks using multiple intra-prediction modes. This prediction is subtracted from the actual block, and only the residual (the error) is transformed and quantized. Highly predictable blocks produce near-zero residuals that compress to almost nothing.
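Small transforms with flexible prediction: VP8 works on 16×16 macroblocks, predicts luma either for the whole macroblock or per 4×4 sub-block, and transforms the residual in 4×4 blocks rather than JPEG's fixed 8×8. Because prediction has already removed most of the broad structure, the small transforms only need to code what is left.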
More efficient entropy coding: WebP uses arithmetic coding (more efficient than JPEG's Huffman) and more sophisticated probability modeling.

The result: WebP lossy typically produces files 25–35% smaller than JPEG at equivalent perceptual quality (as measured by SSIM). This is why compressing to WebP with our image compressor can noticeably reduce your file sizes compared to JPEG.
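The macroblock prediction idea can be sketched with three simplified modes. This is an illustration of predict-then-code-the-residual, not VP8's actual mode set or its mode-decision logic; the cost measure (sum of absolute residuals) and all names are assumptions made for the example.

```python
def predict(mode, above, left, size):
    """Build a size x size prediction from already-decoded neighbor pixels."""
    if mode == "vertical":        # repeat the row above downward
        return [list(above) for _ in range(size)]
    if mode == "horizontal":      # repeat the left column across
        return [[left[r]] * size for r in range(size)]
    dc = (sum(above) + sum(left) + size) // (2 * size)   # "dc": flat average
    return [[dc] * size for _ in range(size)]


def best_mode(block, above, left):
    """Return (mode, cost, residual) for the cheapest of the three modes."""
    size = len(block)
    best = None
    for mode in ("dc", "vertical", "horizontal"):
        pred = predict(mode, above, left, size)
        residual = [[block[r][c] - pred[r][c] for c in range(size)]
                    for r in range(size)]
        cost = sum(abs(v) for row in residual for v in row)
        if best is None or cost < best[1]:
            best = (mode, cost, residual)
    return best


# A block of vertical stripes that continues the pattern of the row above it.
above = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]

mode, cost, residual = best_mode(block, above, left)
print(mode, cost)
```

When the block simply continues its neighbors, the residual is all zeros, so the transform and entropy coder have almost nothing left to encode.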

WebP Lossless: Transforms and Entropy Coding

WebP lossless applies a cascade of reversible transforms to the image before entropy coding, each designed to reduce statistical redundancy:

Predictor transform: Each pixel is predicted from its neighbors; only the prediction error is stored.
Color transform: Correlations between color channels are removed (similar to YCbCr separation but reversible).
Subtract green transform: Green channel subtracted from red and blue to exploit correlation.
Color indexing transform: Replaces repeated colors with palette indices when the image has few unique colors.

WebP lossless is typically 25% smaller than PNG for the same image.
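Of the transforms listed above, subtract green is the simplest to show. A minimal sketch, assuming 8-bit RGB tuples and modulo-256 arithmetic so that the transform is exactly reversible (function names are ours):

```python
def subtract_green(pixels):
    """Forward transform: store R-G and B-G (mod 256) alongside G."""
    return [((r - g) % 256, g, (b - g) % 256) for r, g, b in pixels]


def add_green(pixels):
    """Inverse transform: add G back to recover the original R and B."""
    return [((r + g) % 256, g, (b + g) % 256) for r, g, b in pixels]


pixels = [(120, 118, 119), (200, 201, 199), (10, 250, 30)]
transformed = subtract_green(pixels)
print(transformed)
print(add_green(transformed) == pixels)
```

For near-gray pixels, where the three channels are close to each other, the red and blue values become small numbers near 0 or 255 (small negatives wrap around), which gives the entropy coder a much more skewed distribution to work with.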

AVIF: AV1 Brings Video Codec Power to Images

AVIF (AV1 Image File Format) stores still images using intra-frame coding from the AV1 video codec, wrapped in the HEIF container. AV1 was designed by the Alliance for Open Media as a royalty-free codec intended to push compression efficiency beyond HEVC (H.265), the codec used by HEIC.

AV1 Intra-Frame Compression

AV1 uses a much richer toolkit than JPEG or even VP8:

Variable block sizes: Superblocks up to 128×128 pixels, subdivided recursively. Large smooth areas use large blocks; complex detail uses small ones.
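Rich intra-prediction: 56 directional prediction angles (8 base directions, each with fine angular offsets) plus non-directional predictors such as DC, "smooth", and Paeth. The encoder typically picks the mode that minimizes residual energy per block.
Multiple transforms: AV1 supports DCT, ADST (Asymmetric Discrete Sine Transform), and identity transforms, chosen per block and selectable separately for the horizontal and vertical directions.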
Perceptual optimization: Rate-distortion optimization can be tuned to perceptual quality metrics rather than raw MSE, reducing visible artifacts at high compression ratios.
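The variable-block-size idea can be illustrated with a toy quadtree. Real AV1 encoders choose partitions through rate-distortion search and also allow non-square splits; the variance threshold, the minimum size, and the 16×16 test image below are all assumptions made to keep the sketch short.

```python
def variance(img, x, y, size):
    """Variance of the pixel values in the size x size square at (x, y)."""
    vals = [img[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def partition(img, x, y, size, min_size=4, threshold=25.0):
    """Return the (x, y, size) leaf blocks covering the square region."""
    if size <= min_size or variance(img, x, y, size) <= threshold:
        return [(x, y, size)]             # smooth enough: keep one big block
    half = size // 2
    leaves = []
    for dy in (0, half):                  # otherwise split into four quadrants
        for dx in (0, half):
            leaves += partition(img, x + dx, y + dy, half, min_size, threshold)
    return leaves


SIZE = 16
img = [[100] * SIZE for _ in range(SIZE)]
for r in range(8):
    for c in range(8):
        img[r][c] = 255 if (r + c) % 2 else 0   # busy top-left quadrant

leaves = partition(img, 0, 0, SIZE)
print(len(leaves), leaves)
```

The three flat quadrants stay as single 8×8 blocks while the busy quadrant is split down to the minimum size, so the encoder spends its signalling and residual bits where the detail actually is.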

Wide Color Gamut and HDR

AVIF natively supports wide color gamut (P3, Rec. 2020) and HDR (10-bit and 12-bit depth). JPEG as commonly deployed is limited to 8 bits per channel, which can show banding in smooth HDR gradients. This makes AVIF one of the few widely supported web formats capable of delivering HDR photography without banding on capable displays.

Compression Efficiency

AVIF often delivers roughly 45–55% smaller files than JPEG at the same perceptual quality, though the gain varies with content and encoder settings. However, encoding AVIF is CPU-intensive, especially at higher quality settings, which is why browser-side AVIF encoding (as in our image compressor) is slower than JPEG or WebP encoding. Decoding AVIF is fast enough for all modern browsers.

Browser Support

AVIF is supported in Chrome 85+, Firefox 93+, Safari 16.1+, and Edge 121+. Internet Explorer has no AVIF support. For broad compatibility, serve AVIF with a WebP or JPEG fallback using the <picture> element, which lets the browser pick the first source it can decode.

Perceptual Quality Metrics: What "Quality" Actually Measures

When compression engineers compare algorithms, they need objective metrics that correlate with human visual perception.

PSNR (Peak Signal-to-Noise Ratio)

PSNR expresses, in decibels, the ratio between the square of the maximum possible pixel value and the mean squared error (MSE) between original and compressed pixels. It is simple to compute but correlates poorly with human perception: an image that is slightly blurred everywhere can score a higher PSNR than one with sharp but localized artifacts.
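For reference, PSNR is 10 · log10(MAX² / MSE). A minimal sketch for flat sequences of 8-bit samples (the function name and sample data are illustrative):

```python
import math


def psnr(original, compressed, max_value=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf           # identical inputs: no error at all
    return 10 * math.log10(max_value ** 2 / mse)


original = [100, 100, 100, 100]
print(psnr(original, original))                 # identical
print(psnr(original, [101, 101, 101, 101]))     # every sample off by one
print(psnr(original, [100, 100, 100, 102]))     # one sample off by two
```

The last two cases have the same mean squared error and therefore the same PSNR, even though one spreads the error everywhere and the other concentrates it in a single pixel. That blindness to where the error sits is the weakness described above.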

SSIM (Structural Similarity Index)

SSIM compares local luminance, contrast, and structure between original and compressed images, producing a score where 1.0 means identical. It correlates much better with human perception than PSNR. An SSIM of 0.95 or higher is often treated as a rough threshold for "visually close to the original", though the exact value depends on the implementation and the content. Many compression benchmarks use SSIM as the primary quality axis.
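The core SSIM formula combines means, variances, and covariance of the two signals. The sketch below evaluates it once over a whole list of samples; real implementations slide a small (often Gaussian-weighted) window across the image and average the per-window scores, so its numbers are not comparable with those from a full SSIM tool. The constants follow the commonly used values K1 = 0.01 and K2 = 0.03.

```python
def ssim_single_window(x, y, max_value=255):
    """SSIM computed once over two equal-length sample lists (no sliding window)."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that keep the ratio stable when means or variances are near zero.
    c1, c2 = (0.01 * max_value) ** 2, (0.03 * max_value) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


edge = [10, 10, 200, 200]
print(ssim_single_window(edge, edge))                  # same structure
print(ssim_single_window(edge, [12, 12, 198, 198]))    # slightly softened edge
print(ssim_single_window(edge, [200, 200, 10, 10]))    # structure reversed
```

Unlike PSNR, the score depends on whether the pattern of variation is preserved: a slightly softened edge still scores close to 1, while reversing the structure drives the score far down.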

Butteraugli and DSSIM

Butteraugli (used by Google's Guetzli and libjxl) is a psychovisual model of frequency-dependent sensitivity, color adaptation, and masking effects; DSSIM is a dissimilarity score derived from SSIM. Both are more expensive to compute than PSNR but track perceived quality more closely, which makes them better targets for rate-distortion optimization. Some encoders can use metrics like these in their quality optimization loops.

Practical Format and Quality Decisions

Armed with this understanding, here is a decision framework for the most common web use cases:

Content Type	Best Format	Quality Sweet Spot	Why
Hero photograph	AVIF or WebP	70–80%	High perceptual efficiency; large impact on LCP
Product photo	WebP (with JPEG fallback)	75–85%	Good quality at reduced size; broad support
Screenshot / UI	PNG or WebP lossless	Lossless	Sharp edges and text need pixel accuracy
Illustration / graphic	SVG or PNG	Lossless	Hard edges; color indexing compresses well
Thumbnail grid	WebP	60–70%	Small dimensions; compression artifacts less visible
Background texture	WebP or JPEG	50–65%	Low salience; aggressive compression acceptable

How Our Image Compressor Applies These Techniques

Our free image compressor applies browser-native codec implementations (via the Canvas API for JPEG and WebP, and dedicated libraries for AVIF) directly in your browser. When you adjust the quality slider, you are controlling the quantization table aggressiveness for JPEG, the rate-distortion target for WebP VP8, or the perceptual quality target for the AV1 encoder. No files are ever uploaded to a server — all encoding happens on your machine, which means your images stay private and the only latency is local CPU time.

Now that you understand what happens inside each algorithm, you can make informed decisions: choose WebP for the best browser-compatible compression on photos, AVIF for maximum compression on a progressive-enhancement basis, PNG for anything with text or hard edges, and experiment with quality settings knowing exactly what trade-offs you are making.
