Improve Memcached performance up to 41% with Alibaba Cloud Yitian 710 instances
In this blog we demonstrate the advantage of running Memcached on Arm-based Alibaba Yitian 710 instances over x86-based instances.
By Ker Liu

Memcached is an open source, high-performance, distributed memory object caching system. It is a popular choice for powering real-time applications in web, mobile apps, gaming, ad-tech, and e-Commerce. Memcached is an in-memory key-value store that offers higher application performance by removing the need to access disks or SSDs. By keeping its data in memory, it avoids delays and can access data much faster than traditional disk-based databases.
In this blog, we compare the throughput of Memcached on two types of Alibaba Cloud ECS instances, to show the performance advantage of Arm. G8y instances, powered by the Alibaba Yitian 710 processor based on Armv9, represent Arm. G7 instances, powered by 3rd Generation Intel Xeon Scalable processors, represent x86.
Benchmark setup and results
We used Memtier as the load generator and performance benchmarking tool. It is an open-source, high-throughput benchmarking tool for Memcached. Memtier was deployed on a separate ECS instance.
For the Memcached server, we deployed multiple Memcached processes on each core.

Figure 1. Memcached benchmarking topology
The servers under test were two ECS instances with the following configurations. The benchmark client ran on a single G8y.8xlarge instance.
| Processor | ECS type |
| Yitian 710 | G8y.2xlarge |
| 3rd Generation Intel Xeon Scalable | G7.2xlarge |
Table 1. Test server configurations
The benchmark tests were performed with the following software versions and test parameters.
| Component name | Version |
| Memcached | 1.5.22 |
| GCC version | 10.2.1 20200825 (Alibaba 10.2.1-3 2.32) |
| Memtier benchmarking tool | 1.4.0 |
| Operating system | Alibaba Cloud Linux 3.2104 LTS |

| Test config parameter | Value |
| Number of Memtier clients | 8 |
| Number of threads per Memtier client | 8 |
| Number of clients per thread | 10 |
| Number of consecutive test runs | 3 |
| Data size (bytes) | 128 |
| Memcached protocol | text |
| Key pattern | random |
| Pipeline | 1, 50, 100 |
We used 8 Memtier clients to generate requests for 8 Memcached processes simultaneously. Each Memtier client created 8 threads with 10 clients per thread, which gave 80 simultaneous connections (sessions) per Memtier client, or 640 across all 8 clients. Pipeline depths of 1, 50, and 100 were used in this test. A pipeline value greater than 1 lets a client send several requests before waiting for the responses, which suits bulk data transfers and increases application throughput.
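To make the load configuration concrete, here is a minimal sketch that assembles one `memtier_benchmark` command line per Memcached process from the parameters in the table above. The server address `192.0.2.10` and the port layout (one process per port, starting at 11211) are assumptions for illustration only; the exact commands used in our runs are not reproduced here. `--key-pattern=R:R` is assumed as the way to express the random key pattern.

```python
from typing import List


def memtier_command(host: str, port: int, pipeline: int,
                    threads: int = 8, clients_per_thread: int = 10,
                    data_size: int = 128, run_count: int = 3) -> List[str]:
    """Build a memtier_benchmark argument list for one Memcached process."""
    return [
        "memtier_benchmark",
        f"--server={host}",
        f"--port={port}",
        "--protocol=memcache_text",        # Memcached text protocol
        f"--threads={threads}",
        f"--clients={clients_per_thread}",  # connections per thread
        f"--pipeline={pipeline}",           # requests in flight per connection
        f"--data-size={data_size}",         # object size in bytes
        "--key-pattern=R:R",                # assumed: random keys for set and get
        f"--run-count={run_count}",
    ]


def total_connections(instances: int, threads: int, clients_per_thread: int) -> int:
    """Connections opened across all Memtier clients."""
    return instances * threads * clients_per_thread


if __name__ == "__main__":
    host = "192.0.2.10"   # hypothetical server address
    base_port = 11211     # hypothetical: one Memcached process per port
    for pipeline in (1, 50, 100):
        for i in range(8):
            print(" ".join(memtier_command(host, base_port + i, pipeline)))
    print("total connections:", total_connections(8, 8, 10))
```

Each Memtier client opens 8 x 10 = 80 connections, so the 8 clients together hold 640 connections against the server.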
After enabling XPS (transmit packet steering), RPS (receive packet steering), and RFS (receive flow steering), performance improves on both instances. With these settings, we observed up to 41% higher throughput running Memcached on Yitian 710-based instances than on equivalent x86-based instances. The results in the following table are aggregated over 30 consecutive test runs.
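For readers who want to try the packet steering settings, the sketch below lists the Linux sysfs and procfs entries that control RPS, RFS, and XPS, together with values one might write to them. It is a dry run that only computes the settings. The device name `eth0`, the queue counts, the flow table size of 32768, and the choice of masks are all assumptions, not the values used in our tests.

```python
from typing import List, Tuple


def cpu_mask(num_cpus: int) -> str:
    """Hex bitmask with one bit per CPU (valid as written for up to 32 CPUs)."""
    return format((1 << num_cpus) - 1, "x")


def steering_settings(dev: str, num_cpus: int, rx_queues: int, tx_queues: int,
                      flow_entries: int = 32768) -> List[Tuple[str, str]]:
    """Return (path, value) pairs for RPS, RFS, and XPS without writing them."""
    settings = [
        # RFS: size of the global socket flow table
        ("/proc/sys/net/core/rps_sock_flow_entries", str(flow_entries)),
    ]
    for q in range(rx_queues):
        base = f"/sys/class/net/{dev}/queues/rx-{q}"
        # RPS: let every CPU process packets received on this queue
        settings.append((f"{base}/rps_cpus", cpu_mask(num_cpus)))
        # RFS: per-queue share of the flow table
        settings.append((f"{base}/rps_flow_cnt", str(flow_entries // rx_queues)))
    for q in range(tx_queues):
        # XPS: map each transmit queue to one CPU (assumed round-robin layout)
        mask = format(1 << (q % num_cpus), "x")
        settings.append((f"/sys/class/net/{dev}/queues/tx-{q}/xps_cpus", mask))
    return settings


if __name__ == "__main__":
    # Hypothetical device and queue counts; check them with `ethtool -l <dev>`.
    for path, value in steering_settings("eth0", num_cpus=8, rx_queues=4, tx_queues=4):
        print(f"echo {value} > {path}")
```

Applying the printed settings requires root privileges, and the right masks depend on the instance's NIC queue layout, so treat the output as a starting point.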
Let us look at the performance numbers of Memcached on G8y and G7 instances. We compared the throughput (Operations/Sec) values after multiple test runs.
| Pipeline parameter | G7.2xlarge (operations/sec) | G8y.2xlarge (operations/sec) | Performance gain (%) |
| Pipeline=1 | 1256257.41 | 1482112.07 | 18% |
| Pipeline=50 | 4870840.43 | 6484505.32 | 33% |
| Pipeline=100 | 5241900.43 | 7379739.17 | 41% |
Table 2. Memcached throughput performance results on G8y vs. G7
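The gain column in Table 2 can be reproduced directly from the throughput figures. The short sketch below computes the relative gain of G8y over G7 for each pipeline setting; the function and variable names are ours, chosen for illustration.

```python
# Throughput from Table 2, in operations per second: (G7.2xlarge, G8y.2xlarge)
RESULTS = {
    1: (1256257.41, 1482112.07),
    50: (4870840.43, 6484505.32),
    100: (5241900.43, 7379739.17),
}


def gain_percent(baseline: float, candidate: float) -> float:
    """Relative throughput gain of candidate over baseline, in percent."""
    return (candidate / baseline - 1.0) * 100.0


if __name__ == "__main__":
    for pipeline, (g7, g8y) in RESULTS.items():
        print(f"Pipeline={pipeline}: {gain_percent(g7, g8y):.1f}%")
```

Rounded to whole percentages, the results match the 18%, 33%, and 41% shown in the table.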

Figure 2. Performance gains for G8y vs. G7 instances
Conclusion
To conclude, Memcached deployed on Yitian 710-based ECS instances provides up to 41% more throughput than on equivalent x86-based ECS instances. In addition, G8y instances are priced 20% lower than comparable G7 instances.
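The throughput and price figures can be combined into a rough price-performance estimate. The sketch below assumes that "priced 20% less" means a G8y instance costs 0.8 times a comparable G7 instance, and it uses the Pipeline=100 throughput from Table 2. Actual prices vary by region and billing model, so this is an illustration rather than a measured result.

```python
G7_OPS = 5241900.43    # G7.2xlarge throughput at Pipeline=100 (Table 2)
G8Y_OPS = 7379739.17   # G8y.2xlarge throughput at Pipeline=100 (Table 2)
RELATIVE_PRICE = 0.80  # assumed: G8y price as a fraction of the G7 price


def price_performance_ratio(baseline_ops: float, candidate_ops: float,
                            relative_price: float) -> float:
    """Throughput per unit cost of the candidate, relative to the baseline."""
    return (candidate_ops / baseline_ops) / relative_price


if __name__ == "__main__":
    ratio = price_performance_ratio(G7_OPS, G8Y_OPS, RELATIVE_PRICE)
    print(f"G8y delivers about {ratio:.2f}x the throughput per unit cost of G7")
```

Under these assumptions, the best-case ratio works out to roughly 1.76x.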
