Implemented Techniques¶
A comprehensive overview of all compression techniques implemented in Tinify.
Image Compression Models¶
Foundational Architectures¶
These models from Google Research established the core techniques for learned image compression.
FactorizedPrior (bmshj2018-factorized)¶
Paper: Variational Image Compression with a Scale Hyperprior (Ballé et al., ICLR 2018)
The simplest learned compression architecture:
- Analysis transform (g_a): 4 conv layers with GDN activations
- Synthesis transform (g_s): 4 deconv layers with inverse GDN
- Entropy model: Fully-factorized learned density (EntropyBottleneck)
Data flow: x → g_a → y → Q → y_hat → EB (entropy bottleneck) → g_s → x_hat
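In code, the forward pass reduces to three steps. This is a structural sketch in the library's own style (g_a, g_s, and the entropy bottleneck are the modules described above, not a complete definition):
def forward(self, x):
    y = self.g_a(x)                                    # analysis transform: image -> latent
    y_hat, y_likelihoods = self.entropy_bottleneck(y)  # noise/round proxy + likelihoods for the rate term
    x_hat = self.g_s(y_hat)                            # synthesis transform: latent -> reconstruction
    return {"x_hat": x_hat, "likelihoods": {"y": y_likelihoods}}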
ScaleHyperprior (bmshj2018-hyperprior)¶
Paper: Variational Image Compression with a Scale Hyperprior (Ballé et al., ICLR 2018)
Adds a hyperprior network to predict scale parameters:
- Hyperprior encoder (h_a): Encodes absolute values of y to side information z
- Hyperprior decoder (h_s): Decodes z to predict scales for Gaussian conditional
- Entropy model: Gaussian conditional with predicted scales
Data flow: x → g_a → y; |y| → h_a → z → Q → z_hat → EB; h_s(z_hat) → scales; y → Q → y_hat → GC (Gaussian conditional with the predicted scales) → g_s → x_hat
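The forward pass extends the factorized model with the z branch. A sketch following the common CompressAI-style implementation (torch is assumed imported; the transforms are the modules listed above):
def forward(self, x):
    y = self.g_a(x)
    z = self.h_a(torch.abs(y))                          # side information from |y|
    z_hat, z_likelihoods = self.entropy_bottleneck(z)   # factorized model for z
    scales_hat = self.h_s(z_hat)                        # predicted Gaussian scales for y
    y_hat, y_likelihoods = self.gaussian_conditional(y, scales_hat)
    x_hat = self.g_s(y_hat)
    return {"x_hat": x_hat, "likelihoods": {"y": y_likelihoods, "z": z_likelihoods}}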
MeanScaleHyperprior (mbt2018-mean)¶
Paper: Joint Autoregressive and Hierarchical Priors (Minnen et al., NeurIPS 2018)
Extends hyperprior to predict both mean and scale:
- Enables non-zero-mean Gaussian conditional
- Better entropy modeling for asymmetric distributions
- Uses LeakyReLU instead of ReLU in hyperprior
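The only change to the forward pass is that the hyper-decoder output is split into per-element scales and means. A sketch, same conventions as above:
gaussian_params = self.h_s(z_hat)                    # (B, 2*M, H, W)
scales_hat, means_hat = gaussian_params.chunk(2, 1)  # per-element scale and mean
y_hat, y_likelihoods = self.gaussian_conditional(y, scales_hat, means=means_hat)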
JointAutoregressiveHierarchicalPriors (mbt2018)¶
Paper: Joint Autoregressive and Hierarchical Priors (Minnen et al., NeurIPS 2018)
Adds autoregressive context prediction:
- Context prediction: 5x5 MaskedConv2d for causal spatial dependencies
- Entropy parameters: Combines hyperprior and context for final Gaussian params
- Best compression but slower due to sequential decoding
Data flow: as in the hyperprior, plus y_hat → CP (context prediction); the h_s and CP outputs feed EP (entropy parameters), which provides the Gaussian conditional (GC) parameters; y_hat → g_s → x_hat
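A sketch of how the two parameter sources are combined during training, following the common CompressAI-style implementation (at decode time the masked convolution must be evaluated sequentially over spatial positions):
params = self.h_s(z_hat)                                    # hyperprior branch
y_hat = self.gaussian_conditional.quantize(y, "noise" if self.training else "dequantize")
ctx_params = self.context_prediction(y_hat)                 # 5x5 masked conv: causal spatial context
gaussian_params = self.entropy_parameters(torch.cat((params, ctx_params), dim=1))
scales_hat, means_hat = gaussian_params.chunk(2, 1)
_, y_likelihoods = self.gaussian_conditional(y, scales_hat, means=means_hat)
x_hat = self.g_s(y_hat)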
Attention-based Models¶
Cheng2020AnchorCheckerboard¶
Papers:
- Learned Image Compression with GMM and Attention (Cheng et al., CVPR 2020)
- Checkerboard Context Model (He et al., CVPR 2021)
Key innovations:
- Residual blocks: Uses ResidualBlockWithStride/Upsample instead of plain convolutions
- Sub-pixel convolution: For efficient upsampling
- Checkerboard context: 2-pass spatial context allowing parallel decoding
- CheckerboardMaskedConv2d: Alternating anchor/non-anchor positions
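A minimal sketch of the two-pass idea: anchor positions are coded first from hyperprior parameters alone, then the checkerboard-masked convolution supplies spatial context for the non-anchor positions, which can all be decoded in parallel. The "anchors where (i + j) is even" convention is an assumption for illustration:
import torch
def checkerboard_masks(h, w):
    idx = torch.arange(h).view(-1, 1) + torch.arange(w).view(1, -1)
    anchor = (idx % 2 == 0)       # assumed convention: anchors on even (i + j)
    return anchor, ~anchor
anchor, non_anchor = checkerboard_masks(4, 4)
# Pass 1: decode y_hat at anchor positions from hyperprior parameters only.
# Pass 2: run CheckerboardMaskedConv2d over the anchor-only y_hat to obtain context,
#         then decode all non-anchor positions in parallel.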
ELIC (Elic2022Official / Elic2022Chandelier)¶
Paper: ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding (He et al., CVPR 2022)
State-of-the-art architecture with:
- Uneven channel groups: Progressive decoding with groups [16, 16, 32, 64, 192]
- Space-channel context (SCCTX): Combines spatial and channel context
- Attention blocks: Non-local attention in encoder/decoder
- ResidualBottleneckBlock: Efficient 1x1→3x3→1x1 bottleneck
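A toy sketch of the uneven-group idea: the latent channels are split into groups that are decoded one after another, so each group's entropy parameters can condition on every previously decoded group (torch.round stands in for the actual conditional entropy coding):
import torch
groups = [16, 16, 32, 64, 192]                 # 320 latent channels, unevenly grouped
y = torch.randn(1, sum(groups), 16, 16)
decoded = []
for y_i in torch.split(y, groups, dim=1):
    channel_ctx = torch.cat(decoded, dim=1) if decoded else None  # context from already-decoded groups
    y_i_hat = torch.round(y_i)                 # stand-in: the real model conditions on channel_ctx + spatial context
    decoded.append(y_i_hat)
y_hat = torch.cat(decoded, dim=1)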
Variable Bitrate Models¶
Paper: Variable-Rate Learned Image Compression with Multi-Objective Optimization (Kamisli et al., DCC 2024)
| Model | Base Architecture |
|---|---|
| ScaleHyperpriorVbr | bmshj2018-hyperprior |
| MeanScaleHyperpriorVbr | mbt2018-mean |
| JointAutoregressiveHierarchicalPriorsVbr | mbt2018 |
Key techniques:
- Learnable gain parameter: scales the latent before quantization so a single model can target different rates
- Quantization-reconstruction offsets: learned offsets applied at dequantization to reduce quantization error
- Variable-rate entropy bottleneck: optional EntropyBottleneckVbr with an adjustable quantization step
- Two-stage training:
  - Stage 1: Standard training (VBR modules frozen)
  - Stage 2: Multi-objective optimization with VBR modules active
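A sketch of the gain idea with illustrative names (not the tinify API): scaling the latent before rounding and un-scaling afterwards is equivalent to changing the quantization step, so one model can serve multiple rates:
import torch
def vbr_quantize(y, gain, qr_offset=0.0):
    y_q = torch.round(y * gain)        # larger gain -> finer effective quantization step -> higher rate
    return (y_q + qr_offset) / gain    # dequantize, optionally with a learned quantization-reconstruction offset
y = torch.randn(1, 192, 16, 16)
coarse = vbr_quantize(y, gain=0.5)     # low bitrate
fine = vbr_quantize(y, gain=4.0)       # high bitrate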
Video Compression Models¶
ScaleSpaceFlow (ssf2020)¶
Paper: Scale-Space Flow for End-to-End Optimized Video Compression (Agustsson et al., CVPR 2020)
Components:
- Scale-space representation: Multi-scale Gaussian blur pyramid (5 levels)
- Flow estimation: Predicts optical flow + scale field
- Motion compensation: Warps reference frame using flow
- Residual coding: Encodes prediction residual with hyperprior
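A sketch of the scale-space volume construction (a depthwise box blur stands in for the Gaussian pyramid; in the real model, the predicted flow and scale fields index into this volume to warp the reference frame):
import torch
import torch.nn.functional as F
def scale_space_volume(frame, num_levels=5):
    c = frame.size(1)
    kernel = torch.full((c, 1, 3, 3), 1.0 / 9.0)      # box blur as a stand-in for Gaussian blur
    levels = [frame]
    for _ in range(num_levels - 1):
        levels.append(F.conv2d(levels[-1], kernel, padding=1, groups=c))
    return torch.stack(levels, dim=2)                  # (B, C, S, H, W); flow + scale sample from this volume
volume = scale_space_volume(torch.rand(1, 3, 64, 64))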
Point Cloud Compression¶
| Model | Description |
|---|---|
| SFUPointNet | PointNet-based geometry compression |
| SFUPointNet2 | Hierarchical PointNet++ features |
| HRTZXF2022 | Hierarchical point cloud compression |
Entropy Models¶
EntropyBottleneck¶
Fully-factorized learned entropy model:
- Models each channel independently with learned density
- Uses quantized CDFs for entropy coding
- Adds uniform noise during training, rounds during inference
y_hat, y_likelihoods = self.entropy_bottleneck(y)
# y_likelihoods used to compute rate: -log2(likelihoods)
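For example, the rate term in bits per pixel can be computed from the likelihoods as follows (standard practice; x is the input image batch):
import math
num_pixels = x.size(0) * x.size(2) * x.size(3)
bpp = torch.log(y_likelihoods).sum() / (-math.log(2) * num_pixels)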
GaussianConditional¶
Conditional Gaussian entropy model:
- Requires predicted scale (and optionally mean) parameters
- Uses discretized Gaussian CDF for entropy coding
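Typical usage, sketched under the assumption that tinify mirrors the CompressAI-style API suggested by the other examples in this page:
# Forward pass (training): returns quantized latent and likelihoods
y_hat, y_likelihoods = self.gaussian_conditional(y, scales_hat, means=means_hat)
# Actual coding: predicted scales are mapped to a precomputed table of discretized CDFs
indexes = self.gaussian_conditional.build_indexes(scales_hat)
y_strings = self.gaussian_conditional.compress(y, indexes, means=means_hat)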
GaussianMixtureConditional¶
Gaussian Mixture Model for multi-modal distributions:
- Multiple Gaussian components with learned weights
- Better for complex latent distributions
EntropyBottleneckVbr¶
Variable bitrate entropy bottleneck:
- Adjustable quantization step size
- Supports continuous bitrate control
Latent Codecs¶
Modular building blocks for entropy coding architectures:
| Codec | Description |
|---|---|
| HyperpriorLatentCodec | Complete hyperprior (y + z branches) |
| HyperLatentCodec | Side information branch (z only) |
| CheckerboardLatentCodec | 2-pass checkerboard spatial context |
| ChannelGroupsLatentCodec | Progressive channel-wise decoding |
| RasterScanLatentCodec | Sequential autoregressive decoding |
| GainHyperpriorLatentCodec | Variable bitrate with gain control |
| GaussianConditionalLatentCodec | Gaussian conditional wrapper |
| EntropyBottleneckLatentCodec | Entropy bottleneck wrapper |
Example composing latent codecs:
self.latent_codec = HyperpriorLatentCodec(
latent_codec={
"y": CheckerboardLatentCodec(
latent_codec={"y": GaussianConditionalLatentCodec(quantizer="ste")},
context_prediction=CheckerboardMaskedConv2d(N, 2*N, 5, padding=2),
entropy_parameters=entropy_params_net,
),
"hyper": HyperLatentCodec(
entropy_bottleneck=EntropyBottleneck(N),
h_a=h_a, h_s=h_s,
),
},
)
Neural Network Layers¶
Generalized Divisive Normalization (GDN)¶
Paper: Density Modeling of Images Using a Generalized Normalization Transformation (Ballé et al., ICLR 2016)
Adaptive normalization that decorrelates features:
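In its widely used simplified form, the forward transform divides each channel by a learned combination of the squared activations of all channels:
\( y_i = \dfrac{x_i}{\sqrt{\beta_i + \sum_j \gamma_{ij}\, x_j^2}} \)
The inverse GDN (IGDN) used in the synthesis transform applies the corresponding multiplicative form.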
Masked Convolutions¶
For autoregressive context modeling:
# Type A: masks current pixel (first layer)
MaskedConv2d(in_ch, out_ch, kernel_size=5, mask_type='A')
# Type B: includes current pixel (subsequent layers)
MaskedConv2d(in_ch, out_ch, kernel_size=5, mask_type='B')
# Checkerboard: alternating anchor/non-anchor pattern
CheckerboardMaskedConv2d(in_ch, out_ch, kernel_size=5)
Spectral Convolutions¶
Paper: Efficient Nonlinear Transforms for Lossy Image Compression (Ballé, PCS 2018)
Weights stored in frequency domain for better optimization:
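A hypothetical sketch of the idea (not necessarily the library's actual implementation): the learnable parameter lives in the frequency domain and is mapped back to a spatial kernel with an inverse real FFT before each convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F
class FreqParamConv2d(nn.Module):
    # Hypothetical: kernel parameterized by its real FFT, transformed back on every forward pass
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.weight_freq = nn.Parameter(
            0.1 * torch.randn(out_ch, in_ch, kernel_size, kernel_size // 2 + 1, dtype=torch.complex64)
        )
        self.bias = nn.Parameter(torch.zeros(out_ch))
    def forward(self, x):
        weight = torch.fft.irfft2(self.weight_freq, s=(self.kernel_size, self.kernel_size))
        return F.conv2d(x, weight, self.bias, padding=self.kernel_size // 2)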
Residual Blocks¶
# Standard residual
ResidualBlock(N, N)
# With strided downsampling
ResidualBlockWithStride(N, N, stride=2)
# With sub-pixel upsampling
ResidualBlockUpsample(N, N, upsample=2)
# Bottleneck (1x1 → 3x3 → 1x1)
ResidualBottleneckBlock(N, N)
Attention Block¶
Non-local attention for capturing long-range dependencies:
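A simplified sketch of the Cheng2020-style block (the real layer uses residual units in both branches): a sigmoid-gated mask branch scales a trunk branch, and the result is added back to the input.
import torch
import torch.nn as nn
class SimplifiedAttention(nn.Module):
    def __init__(self, N):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(N, N // 2, 1), nn.ReLU(inplace=True),
                nn.Conv2d(N // 2, N // 2, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(N // 2, N, 1),
            )
        self.trunk, self.mask = branch(), branch()
    def forward(self, x):
        return x + self.trunk(x) * torch.sigmoid(self.mask(x))  # residual, gated output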
Other Layers¶
| Layer | Description |
|---|---|
| conv3x3 / conv1x1 | Convenience wrappers for 3x3 and 1x1 convolutions |
| subpel_conv3x3 | Sub-pixel convolution for 2x upsampling |
| QReLU | Quantization-friendly ReLU with configurable bit-depth |
Loss Functions¶
RateDistortionLoss¶
Standard rate-distortion loss:
\( \mathcal{L} = \lambda \cdot D + R \)
Where:
- \(D\) = distortion (MSE or 1 - MS-SSIM)
- \(R\) = rate (bits per pixel, computed from the likelihoods)
- \(\lambda\) = trade-off parameter controlling the rate-distortion balance
criterion = RateDistortionLoss(lmbda=0.01)
out_criterion = criterion(out_net, target)
# Returns: loss, mse_loss, bpp_loss
Lambda values for quality levels:
| Quality | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| MSE | 0.0018 | 0.0035 | 0.0067 | 0.0130 | 0.0250 | 0.0483 | 0.0932 | 0.1800 |
Entropy Coding¶
Asymmetric Numeral Systems (ANS)¶
Default entropy coder - fast and efficient:
from tinify.ans import BufferedRansEncoder, RansDecoder
# Encoding
encoder = BufferedRansEncoder()
encoder.encode_with_indexes(symbols, indexes, cdf, cdf_lengths, offsets)
bitstream = encoder.flush()
# Decoding
decoder = RansDecoder()
decoder.set_stream(bitstream)
symbols = decoder.decode_stream(indexes, cdf, cdf_lengths, offsets)
Entropy Coder Selection¶
import tinify
# List available coders
print(tinify.available_entropy_coders()) # ['ans', 'rangecoder']
# Set default coder
tinify.set_entropy_coder('rangecoder')
Key Techniques Summary¶
Transform Coding Pipeline¶
- Analysis transform (g_a): Image → Latent representation
- Quantization: Continuous → Discrete (with noise/STE proxy)
- Entropy coding: Discrete symbols → Bitstream
- Entropy decoding: Bitstream → Discrete symbols
- Synthesis transform (g_s): Latent → Reconstructed image
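End to end, the same pipeline is typically driven through compress/decompress methods on the model. A sketch assuming a CompressAI-style zoo API; the tinify import path and entry point are assumptions:
import torch
from tinify.zoo import bmshj2018_factorized   # assumed entry point
net = bmshj2018_factorized(quality=3, pretrained=True).eval()
x = torch.rand(1, 3, 256, 256)
with torch.no_grad():
    out_enc = net.compress(x)                                       # analysis + quantization + entropy coding
    out_dec = net.decompress(out_enc["strings"], out_enc["shape"])  # entropy decoding + synthesis
x_hat = out_dec["x_hat"].clamp(0, 1)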
Quantization Strategies¶
| Method | Training | Inference |
|---|---|---|
| Additive Uniform Noise | Add U(-0.5, 0.5) | Round |
| Straight-Through Estimator (STE) | Round (gradient bypass) | Round |
| Quantization Offsets (VBR) | Learned offset from NN | Learned offset |
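The first two rows of the table correspond to these standard training-time proxies:
import torch
y = torch.randn(1, 192, 16, 16, requires_grad=True)
# Additive uniform noise: differentiable stand-in for rounding during training
y_hat_noise = y + torch.empty_like(y).uniform_(-0.5, 0.5)
# Straight-through estimator: round in the forward pass, pass gradients through unchanged
y_hat_ste = y + (torch.round(y) - y).detach()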
Context Modeling Evolution¶
- No context: Factorized prior (independent channels)
- Hierarchical: Hyperprior predicts entropy parameters
- Autoregressive: MaskedConv2d for causal spatial context
- Checkerboard: 2-pass for parallel decoding
- Channel groups: Progressive channel-wise context
- Space-channel (SCCTX): Combined spatial + channel context