Tuning the memory budget

There are five memory-cost parameters that determine compression performance, independent of compression level. The encoder is designed to work within a fixed memory budget regardless of input size, so an insufficient budget can result in poor compression. Defaults are automatically lowered for small files.

Units

All flags are set in bytes. For example, a 512 MB source buffer is -B536870912.
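As a small illustration of the byte arithmetic above, the following sketch formats a size flag from a value in megabytes. The helper name size_flag is hypothetical and not part of the tool; it assumes 1 MB means 1024 * 1024 bytes, which matches the 512 MB example.

```python
MIB = 1024 * 1024  # assumption: "MB" in this document means 2**20 bytes


def size_flag(letter: str, mebibytes: int) -> str:
    """Return a flag such as -B followed by its size in bytes.

    Hypothetical helper for illustration only; it just does the
    multiplication and string formatting.
    """
    return f"-{letter}{mebibytes * MIB}"


if __name__ == "__main__":
    print(size_flag("B", 512))  # the 512 MB source buffer from the text
```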

Source buffer size (-B)

The encoder uses a buffer for the source input (size set by -B). To ensure the source is read sequentially with no backward seeks, the encoder keeps the source horizon half the source buffer size ahead of the input position. A source copy will not be found if it lies more than half the source buffer size away from its absolute position in the input stream.

For large files, -B may need to be raised. The default is 64 MB, which means data should not shift by more than 32 MB; that is, no more than 32 MB should be added or removed relative to the source. The minimum is 16 KB. The source file is read into the buffer; it is not memory-mapped (Xdelta 1.x used mmap()).
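The half-buffer rule can be sketched as simple arithmetic. The function names below are hypothetical; the code only restates the relationship described above (tolerated shift is half of -B, minimum -B is 16 KB) and does not query the tool itself.

```python
KIB = 1024
MIB = 1024 * KIB

MIN_SOURCE_BUFFER = 16 * KIB      # minimum -B stated in the text
DEFAULT_SOURCE_BUFFER = 64 * MIB  # default -B stated in the text


def max_shift(source_buffer_bytes: int) -> int:
    """Largest shift (in bytes) a given -B is expected to tolerate."""
    return source_buffer_bytes // 2


def source_buffer_for_shift(shift_bytes: int) -> int:
    """Smallest -B (in bytes) that tolerates the given shift.

    Assumes the half-buffer rule is the only constraint.
    """
    return max(MIN_SOURCE_BUFFER, 2 * shift_bytes)


if __name__ == "__main__":
    print(max_shift(DEFAULT_SOURCE_BUFFER))
    print(f"-B{source_buffer_for_shift(100 * MIB)}")
```

For example, if up to 100 MB may have been inserted or removed relative to the source, this estimate suggests a source buffer of at least 200 MB.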

Input window size (-W)

The input window size (-W) determines how much input is compressed in a single VCDIFF window. Smaller windows compress less effectively but take less memory to decode. Larger windows give better compression, but only up to a point, since large-window addresses take more bits to encode. The default is 8 MB, the minimum 16 KB, and the maximum 16 MB.
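A rough way to see the effect of -W is to count how many windows an input would be split into. This is a sketch under the assumption that every window except the last is filled completely; the function name window_count is hypothetical, and the limits are the ones quoted above.

```python
KIB = 1024
MIB = 1024 * KIB

MIN_WINDOW = 16 * KIB     # minimum -W stated in the text
MAX_WINDOW = 16 * MIB     # maximum -W stated in the text
DEFAULT_WINDOW = 8 * MIB  # default -W stated in the text


def window_count(input_bytes: int, window_bytes: int = DEFAULT_WINDOW) -> int:
    """Estimate the number of windows needed for an input.

    Assumes each window holds exactly window_bytes of input except
    possibly the last one.
    """
    if not MIN_WINDOW <= window_bytes <= MAX_WINDOW:
        raise ValueError("window size outside the 16 KB to 16 MB range")
    # Ceiling division without floating point.
    return -(-input_bytes // window_bytes)


if __name__ == "__main__":
    print(window_count(100 * MIB))
    print(window_count(100 * MIB, 16 * MIB))
```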

Instruction buffer size (-I)

The instruction buffer stores potential, possibly overlapping, copy instructions while the encoder looks ahead. Set its size with -I size (default 32K slots); -I 0 selects an unlimited buffer.

An unencoded instruction occupies 28 bytes, so a bounded buffer has advantages. On the other hand, the minimum and maximum source addresses must be decided before encoding the first instruction, so letting the buffer fill before a window finishes can hurt compression for the rest of the window.
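To put the 28-byte figure in perspective, the sketch below multiplies it by a slot count. It assumes "32K slots" means 32 * 1024 slots and ignores any allocator or bookkeeping overhead; the function name is hypothetical.

```python
BYTES_PER_INSTRUCTION = 28     # size of an unencoded instruction, from the text
DEFAULT_SLOTS = 32 * 1024      # assumption: "32K slots" means 32 * 1024


def instruction_buffer_bytes(slots: int = DEFAULT_SLOTS) -> int:
    """Approximate memory held by a bounded instruction buffer."""
    if slots <= 0:
        # -I 0 selects an unlimited buffer, so no fixed size applies.
        raise ValueError("an unlimited buffer has no fixed size")
    return slots * BYTES_PER_INSTRUCTION


if __name__ == "__main__":
    print(instruction_buffer_bytes())
```

Under these assumptions the default buffer holds a little under 1 MB of instructions.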

Compression duplicates size (-P)

The compressor uses an array of duplicate positions (-P) to find better matches in the target (not the source). This should be less than or equal to the input window size (-W). The default is 256K slots.

Compression level (-0 to -9)

The compression level sizes two internal data structures. -9 uses about four times as much memory as -1.

Decoder memory requirements

To decode, -B and -W are used much as for encoding. For the source buffer, the decoder uses the smaller of -B and the source file size. Setting -B smaller than the value used to encode causes seeking, which is served by an LRU cache of blocks.

The input buffer is sized according to -W. In addition, the decoder allocates three buffers for the data, address, and instruction sections of a VCDIFF window; their sizes depend on the compressed size of a window and therefore on the encoder's -W. If secondary compression is used, an extra set of buffers is allocated for each secondary-compressed section.

In summary, decoding uses -B bytes for the source buffer, plus -W bytes for its input buffer, plus three or six buffers totaling no more than one or two times the encoder's -W (depending on secondary compression).
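The summary can be turned into a rough upper-bound calculation. This is a sketch of the arithmetic only, with hypothetical names; it assumes the section buffers reach their stated ceiling of one or two times the encoder's -W and ignores any other allocations the decoder may make.

```python
MIB = 1024 * 1024


def decoder_memory_upper_bound(
    source_buffer: int,        # decoder's -B, in bytes
    source_file_size: int,     # size of the source file, in bytes
    decoder_window: int,       # decoder's -W, in bytes
    encoder_window: int,       # -W used when encoding, in bytes
    secondary: bool = False,   # whether secondary compression was used
) -> int:
    """Estimate the decoder's peak buffer memory in bytes."""
    # The decoder uses the smaller of -B and the source file size.
    source = min(source_buffer, source_file_size)
    # Three section buffers (six with secondary compression), bounded by
    # one (or two) times the encoder's window size.
    sections = (2 if secondary else 1) * encoder_window
    return source + decoder_window + sections


if __name__ == "__main__":
    total = decoder_memory_upper_bound(64 * MIB, 1024 * MIB, 8 * MIB, 8 * MIB)
    print(total // MIB, "MB")
```

With the default 64 MB source buffer and 8 MB windows on both sides, the estimate comes to 80 MB without secondary compression and 88 MB with it.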