Summary

This guide has shown how we identified optimization opportunities within the Chromium open source codebase, and has described a number of specific optimizations made using Neon intrinsics.

Another notable optimization modified inflate_fast() to use Neon intrinsics to perform long loads and stores in the byte array, which increased performance by 20%.
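The idea behind the inflate_fast() change is to copy many bytes per instruction instead of one. The following is a minimal sketch of a chunked byte copy using the Neon intrinsics vld1q_u8() and vst1q_u8(), which load and store 16 bytes at a time. It is not the Chromium implementation: the helper name chunk_copy() is hypothetical, and the sketch assumes the source and destination ranges do not overlap. A real inflate back-reference copy must also handle match distances shorter than 16 bytes, where the ranges do overlap. On targets without Neon, only the scalar loop runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copy len bytes from src to dst.
 * Assumption: the two ranges do not overlap. */
static void chunk_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert((uintptr_t)dst + len <= (uintptr_t)src ||
           (uintptr_t)src + len <= (uintptr_t)dst);

    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Copy 16 bytes per iteration with one 128-bit load and one 128-bit store. */
    for (; i + 16 <= len; i += 16) {
        uint8x16_t chunk = vld1q_u8(src + i);
        vst1q_u8(dst + i, chunk);
    }
#endif
    /* Scalar tail: the remaining (< 16) bytes, or the whole copy without Neon. */
    for (; i < len; i++)
        dst[i] = src[i];
}
```

For a 37-byte copy on a Neon target, this performs two 16-byte vector copies followed by five single-byte copies.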

The end result of all these optimizations was a 2.9x boost to PNG decoding performance. The following figure shows the improvement in decoding time (in ms) for test images, comparing vanilla (unoptimized) zlib with Neon-optimized zlib:

Optimizations were validated using representative data sets. For PNG, we used three sets of test data:

For more information about Neon programming in general, see the Neon Programmer's Guide for Armv8-A on the Arm Developer website.

For more information about Neon intrinsics, see the Neon Intrinsics Reference.
