The zlib-rs project demonstrates how the strategic use of SIMD instructions can transform the performance of critical libraries while enhancing security through Rust’s safe memory management.

The Trifecta Tech Foundation continues to modernize open-source infrastructure with its Rust-based data compression initiative. This time, the focus is on improving the performance of zlib-rs, a memory-safe and compatible alternative to the widely used zlib, which is written in C. In a series of technical posts, the developers show how SIMD (Single Instruction, Multiple Data) instructions can make a substantial difference in efficiency.

Why SIMD?

As Moore’s Law began to slow down, chip manufacturers opted to do more with less: instead of increasing clock frequency, they let a single instruction process multiple data elements at once. This is the idea behind SIMD, available on x86_64 through extensions such as SSE, AVX2, and AVX-512 (with registers of 128, 256, and 512 bits respectively) and on ARM through NEON (128 bits). These extensions apply operations like additions, subtractions, or comparisons to a whole vector of values in one step, accelerating critical functions used by data compressors.

zlib-rs and the case of `slide_hash_chain`

One of the key functions in zlib-rs, `slide_hash_chain`, adjusts the indices of a table during compression. In its initial version, implemented in pure Rust, it simply iterates through the table, subtracting a value from each element. The interesting part comes when the developers examine the assembly generated by the compiler: thanks to autovectorization, the Rust compiler already uses SIMD registers without any explicit SIMD intrinsics in the source.
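A minimal sketch of what such a scalar loop might look like. The signature and the use of saturating subtraction (so that entries smaller than the subtracted value clamp to zero rather than wrap) are assumptions made for illustration, not details taken from this article.

```rust
/// Scalar sketch: slide every table entry down by `wsize`.
/// Saturating subtraction is an assumption here: entries smaller than
/// `wsize` clamp to 0 instead of wrapping around.
pub fn slide_hash_chain(table: &mut [u16], wsize: u16) {
    for entry in table.iter_mut() {
        *entry = entry.saturating_sub(wsize);
    }
}
```

A loop of this shape has no early exit and no dependency between iterations, which is what makes it a good candidate for autovectorization.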

However, the story doesn’t end there. Trifecta further optimized the code by processing the table in blocks of 32 or 64 elements (depending on the SIMD instruction set available), reducing the number of instructions and improving efficiency without sacrificing portability. On modern x86_64 processors, these optimizations make full use of the 256-bit AVX2 registers, with the best implementation chosen at runtime through CPU feature detection.
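The combination of chunked processing and runtime selection could be sketched roughly as follows. The helper names `slide_generic` and `slide_avx2` and the chunk size of 32 are hypothetical choices for this example; `#[target_feature]` and `is_x86_feature_detected!` are standard Rust facilities, but this is not necessarily how zlib-rs structures its code.

```rust
/// Portable body, written over fixed-size chunks so the compiler can
/// unroll and vectorize it. The chunk size of 32 is an illustrative assumption.
#[inline(always)]
fn slide_generic(table: &mut [u16], wsize: u16) {
    let mut chunks = table.chunks_exact_mut(32);
    for chunk in &mut chunks {
        for entry in chunk.iter_mut() {
            *entry = entry.saturating_sub(wsize);
        }
    }
    // Handle leftover elements that do not fill a whole chunk.
    for entry in chunks.into_remainder() {
        *entry = entry.saturating_sub(wsize);
    }
}

/// Same body, compiled with AVX2 enabled so the compiler may use
/// 256-bit registers when it vectorizes the inlined loop.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn slide_avx2(table: &mut [u16], wsize: u16) {
    slide_generic(table, wsize);
}

/// Entry point: pick the AVX2 build if the CPU supports it, otherwise
/// fall back to the portable build.
pub fn slide_hash_chain(table: &mut [u16], wsize: u16) {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            unsafe { slide_avx2(table, wsize) };
            return;
        }
    }
    slide_generic(table, wsize);
}
```

Because both paths share one body, the result is identical on every machine; only the instructions the compiler is allowed to emit differ.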

The power of SIMD comparison: `compare256`

In the second technical installment, Trifecta addresses the challenge of comparing two 256-byte blocks to detect matches. The `compare256` function, in its simplest form, compares byte by byte and counts how many bytes match before the first difference. Although Rust generates decent assembly for it, the compiler does not autovectorize this pattern, which led the team to implement a SIMD version by hand using the 128-bit xmm registers and intrinsics such as `_mm_cmpeq_epi8` and `_mm_movemask_epi8`.
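The byte-by-byte form described here can be sketched in a single iterator chain. The exact signature is an assumption for illustration.

```rust
/// Scalar sketch: count how many leading bytes of two 256-byte blocks match.
pub fn compare256(a: &[u8; 256], b: &[u8; 256]) -> usize {
    a.iter()
        .zip(b.iter())
        // Stop at the first pair of bytes that differ.
        .take_while(|(x, y)| x == y)
        .count()
}
```

The data-dependent early exit in `take_while` is the kind of control flow that tends to prevent automatic vectorization.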

This SIMD version compares 16 bytes at a time, converts the result into a bitmask, and then locates the first mismatch with bitwise operations. The outcome: a performance boost of up to 10 times in some cases compared to the scalar version.
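The compare, mask, and count steps could look roughly like this with SSE2 intrinsics. The function names and the scalar fallback for other architectures are assumptions of this sketch; the intrinsics themselves come from `std::arch::x86_64`.

```rust
/// SSE2 sketch: compare 16 bytes per step and find the first mismatch.
#[cfg(target_arch = "x86_64")]
fn compare256_sse2(a: &[u8; 256], b: &[u8; 256]) -> usize {
    use std::arch::x86_64::{__m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8};

    for offset in (0..256).step_by(16) {
        // SAFETY: SSE2 is part of the x86_64 baseline, and offset + 16 <= 256,
        // so both unaligned 16-byte loads stay inside the arrays.
        let mask = unsafe {
            let va = _mm_loadu_si128(a.as_ptr().add(offset) as *const __m128i);
            let vb = _mm_loadu_si128(b.as_ptr().add(offset) as *const __m128i);
            // Equal bytes become 0xFF; movemask packs the top bit of each
            // byte into the low 16 bits of the result.
            _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) as u32
        };
        if mask != 0xFFFF {
            // The lowest zero bit of the mask marks the first differing byte.
            return offset + (!mask).trailing_zeros() as usize;
        }
    }
    256
}

/// Entry point: use the SSE2 path on x86_64, a scalar loop elsewhere.
pub fn compare256(a: &[u8; 256], b: &[u8; 256]) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        compare256_sse2(a, b)
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
    }
}
```

Each loop iteration replaces sixteen scalar comparisons and branches with two loads, one compare, one mask extraction, and a single branch.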

Compatibility, security, and efficiency

A key aspect of Trifecta’s approach is to offer generic and optimized versions of each function, selected at runtime based on hardware capabilities. This allows for distributing a single binary compatible with multiple architectures, maintaining security and performance without fragmenting the implementation.

Thanks to Rust and its compile-time safety model, zlib-rs eliminates entire classes of memory errors common in C, such as buffer overflows or invalid accesses. Combining this with SIMD yields a faster, more robust library, ready to meet the challenges of modern infrastructure.

Next steps

Trifecta does not stop with zlib. They are also currently working on bzip2-rs and seeking funding to tackle safe Rust implementations of zstd and xz, thereby completing a modern and reliable compression ecosystem.

Interested developers can contribute to or adopt zlib-rs, whether in Rust projects or in C applications, thanks to the libz-rs-sys crate, which serves as a compatible substitute for the original zlib API.

Conclusion: The work of the Trifecta Tech Foundation demonstrates that the path toward a safer infrastructure involves using modern languages like Rust and fully leveraging the potential of contemporary hardware. The optimization of zlib-rs through SIMD not only enhances performance but also marks a turning point in how critical libraries should be developed for the operating system of the future.

References: Administración de Sistemas, SIMD in zlib-rs (part 1): Autovectorization and target features, and SIMD in zlib-rs (part 2): compare256.