Next steps
This guide has shown how we identified optimization opportunities within the Chromium open-source codebase. It has also described in detail several specific optimizations made using Neon intrinsics.
Another notable optimization changed inflate_fast() to use Neon intrinsics to perform wide loads and stores on the byte array, which increased performance by 20%.
Together, these optimizations made PNG decoding 2.9x faster. The following figure shows the decoding time improvement, in milliseconds, for the test images, comparing unoptimized zlib with Neon-optimized zlib:
Optimizations were validated using representative data sets. For PNG, we used three sets of test data:
- An internal data set for Chromium developers, with 92 images
- The public Kodak data set, with 24 images
- The public Google doodles data set, with 154 images
For more information about Neon programming in general, see the Neon Programmer's Guide for Armv8-A on the Arm Developer website.
For more information about Neon intrinsics, see the Neon Intrinsics Reference.