Overview
Arm Instruction Emulator is an emulator that runs on AArch64 platforms and emulates Scalable Vector Extension (SVE) instructions. The emulator lets you develop and compile SVE code with Arm Compiler for HPC, then run the SVE binary without needing access to SVE-enabled hardware.
Prerequisites
This tutorial also uses the Arm C/C++ Compiler from Arm's suite of HPC tools.
See Installing Arm Compiler for HPC and Environment configuration for instructions on installing and configuring your Linux environment for Arm Compiler for HPC, respectively.
Installing Arm Instruction Emulator
Refer to Installing Arm Instruction Emulator for details on how to perform the installation on Linux.
Environment configuration
Your administrator should have already installed Arm Instruction Emulator and made the Environment Module available.
To see which Environment Modules are available:
module avail
Note: you may need to configure the MODULEPATH environment variable to include the installation directory:
export MODULEPATH=$MODULEPATH:/opt/arm/modulefiles/
To configure your Linux environment to make Arm Instruction Emulator available:
module load <architecture>/<linux_variant>/<linux_version>/suites/arm-instruction-emulator/<version>
For example:
module load Generic-AArch64/SUSE/12/suites/arm-instruction-emulator/1.2.1
You can check your environment by examining the PATH variable. It should contain the appropriate Arm Instruction Emulator bin directory from /opt/arm, as installed in the previous section:
echo $PATH
/opt/arm/arm-instruction-emulator-1.2.1_Generic-AArch64_SUSE-12_aarch64-linux/bin:...
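The PATH check above can also be scripted. The following is a minimal sketch rather than part of the release: armie_on_path is a hypothetical helper name, and the /opt/arm/arm-instruction-emulator-*/bin pattern assumes the default installation prefix shown above.

```shell
# Hypothetical helper: succeeds if any entry in a PATH-style list looks like
# an Arm Instruction Emulator bin directory under /opt/arm (assumed prefix).
# Checks "$1" if given, otherwise the current $PATH.
armie_on_path() {
    local rest entry
    rest="${1:-$PATH}:"
    while [ -n "$rest" ]; do
        entry="${rest%%:*}"   # first remaining entry
        rest="${rest#*:}"     # drop it from the list
        case "$entry" in
            /opt/arm/arm-instruction-emulator-*/bin) return 0 ;;
        esac
    done
    return 1
}
```

After loading the module, `armie_on_path && echo "emulator found"` reports whether the environment is set up. If the installation lives elsewhere, the pattern needs adjusting.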
Simple example: Compile and run Hello World program
In this example you will write a Hello World program, compile it using Arm C/C++ Compiler, and run it using Arm Instruction Emulator.
- Create a simple "Hello World" C program and save it in a file. In this example, the file is named hello.c.
/* Hello World */
#include <stdio.h>
int main()
{
    printf("Hello World");
    return 0;
}
- To generate an executable binary, compile your program with Arm C/C++ Compiler.
armclang -O3 -march=armv8-a+sve -o hello hello.c
The -O3 flag selects the highest optimization level, with auto-vectorization enabled. The -march=armv8-a+sve flag targets hardware that implements the Armv8-A architecture with the SVE extension.
Note: In this example, no SVE code is used. However, it is good practice to enable the highest level of auto-vectorization and target an SVE-enabled architecture when compiling any code to be run using Arm Instruction Emulator.
- Run the generated binary hello using Arm Instruction Emulator:
armie -msve-vector-bits=256 ./hello
Hello World
For this simple Hello World example, Arm Instruction Emulator runs the code on an emulated SVE-enabled architecture without utilizing SVE instructions.
To use Arm Instruction Emulator to its full potential, that is, to emulate SVE instructions, we need to look at a more complex program. An example of a program containing SVE code is available in the next section of this tutorial.
Advanced example: Compile and run a program with SVE code
In this example we demonstrate using Arm C/C++ Compiler to compile and vectorize an example with SVE code, targeting the SVE-enabled Armv8-A architecture. We then use Arm Instruction Emulator to emulate running the SVE code.
-
Create a new file called example.c. Open the file, insert the following C code, and save and close the file.
// example.c
#include <stdio.h>
#include <stdlib.h>

#define ARRAYSIZE 1024

int a[ARRAYSIZE];
int b[ARRAYSIZE];
int c[ARRAYSIZE];

void subtract_arrays(int *restrict a, int *restrict b, int *restrict c)
{
    for (int i = 0; i < ARRAYSIZE; i++)
    {
        a[i] = b[i] - c[i];
    }
}

int main()
{
    for (int i = 0; i < ARRAYSIZE; i++)
    {
        // Generate a random number between 200 and 299
        b[i] = (rand() % 100) + 200;
        // Generate a random number between 0 and 99
        c[i] = rand() % 100;
    }

    subtract_arrays(a, b, c);

    printf("i \ta[i] \tb[i] \tc[i] \n");
    printf("=============================\n");
    for (int i = 0; i < ARRAYSIZE; i++)
    {
        printf("%d \t%d \t%d \t%d\n", i, a[i], b[i], c[i]);
    }
}
This C program subtracts corresponding elements of two arrays and writes the result to a third array. The three pointer parameters of subtract_arrays are declared with the restrict keyword, which tells the compiler that the arrays they point to do not overlap in memory.
-
Compile the program as follows:
armclang -O3 -march=armv8-a+sve -o example example.c
-
Run the binary using Arm Instruction Emulator:
armie -msve-vector-bits=256 ./example
To return:
i       a[i]    b[i]    c[i]
=============================
0       197     283     86
1       262     277     15
2       258     293     35
...
1021    165     234     69
1022    232     295     63
1023    204     235     31
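To confirm that the compiler really vectorized the loop with SVE, one option is to disassemble the binary and look for SVE vector registers, which are written z0 to z31 with an element-size suffix such as .s. The following is a rough sketch, not a tool from the release: count_sve_lines is a hypothetical helper, and the pattern is a heuristic rather than a full SVE decoder.

```shell
# Hypothetical helper: reads disassembly text on stdin and prints how many
# lines mention an SVE vector register (z0-z31 followed by an element size:
# .b, .h, .s, .d or .q). Heuristic only.
count_sve_lines() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -Ec '(^|[^[:alnum:]_])z([0-9]|[12][0-9]|3[01])\.[bhsdq]([^[:alnum:]_]|$)' || true
}
```

Assuming a disassembler that understands SVE encodings is installed (for example a recent objdump), a possible invocation is `objdump -d example | count_sve_lines`. A result of 0 suggests that no SVE code was generated.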
The SVE architecture extension specifies an implementation-defined vector length. The -msve-vector-bits option lets you specify the vector length used by Arm Instruction Emulator. The vector length is a multiple of 128 bits, with a maximum of 2048 bits. Use the -mlist-vector-lengths option to list all valid vector lengths:
armie -mlist-vector-lengths
To return:
128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
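The rule behind this list (a multiple of 128 bits, up to 2048 bits) is simple enough to check in a script before launching a long emulation run. A minimal sketch follows; is_valid_sve_bits and list_sve_bits are hypothetical helper names, and the output of armie -mlist-vector-lengths remains the authoritative list.

```shell
# Hypothetical helper: succeeds if $1 is a multiple of 128 between 128 and
# 2048 inclusive.
is_valid_sve_bits() {
    case "$1" in
        ''|0*|*[!0-9]*) return 1 ;;   # reject empty, zero-prefixed or non-numeric input
    esac
    [ "$1" -ge 128 ] && [ "$1" -le 2048 ] && [ $(( $1 % 128 )) -eq 0 ]
}

# Hypothetical helper: prints every valid vector length, one per line.
list_sve_bits() {
    local bits=128
    while [ "$bits" -le 2048 ]; do
        echo "$bits"
        bits=$(( bits + 128 ))
    done
}
```

For example, `is_valid_sve_bits 256` succeeds, while `is_valid_sve_bits 200` fails because 200 is not a multiple of 128.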
Advanced example: Gathering profiling data with Arm Instruction Emulator
Arm Instruction Emulator helps you understand which parts of your code affect program performance. It samples which instruction is being executed at a user-specified frequency while the program is running.
-
This example uses the LULESH 2.0 simulation. First download and build LULESH 2.0 by following these steps:
-
Download the latest release version of LULESH 2.0 CPU Models from https://codesign.llnl.gov/lulesh.php. At the time of writing, the latest version is 2.0.3:
wget https://codesign.llnl.gov/lulesh/lulesh2.0.3.tgz
-
Uncompress and extract the downloaded package:
tar -xvf lulesh2.0.3.tgz
-
By default, the LULESH build configuration compiles using g++. We'll change this to use Arm C/C++ Compiler for HPC and generate insights by making the following changes in the Makefile:
Change from                              Change to
SERCXX = g++ -DUSE_MPI=0                 SERCXX = armclang++ -DUSE_MPI=0
CXX = $(MPICXX)                          CXX = $(SERCXX)
CXXFLAGS = -g -O3 -fopenmp -I. -Wall     CXXFLAGS = -g -O3 -fopenmp -I. -Wall -march=armv8-a+sve -insight
-
To build the LULESH application:
make
The build produces an executable binary, lulesh2.0, in the current directory.
-
Run the LULESH 2.0 binary with armie, using the --profile-period or -p option to specify the sample period in microseconds:
armie -msve-vector-bits=512 -p 100 -- ./lulesh2.0 -s 9
This runs LULESH 2.0, sampling the program counter every 100 microseconds. When the program terminates, a samples file is created in the current directory with the name format <binary name>_<PID>.samples, for example: lulesh2.0_3076.samples. This file contains a list of the samples taken. Each line gives an instruction address followed by the number of times that address was sampled, for example:
head lulesh2.0_3076.samples
To return:
0x402578 62
0x402580 51
0x406e60 22
0x406e5c 20
0x406e58 14
0x402570 14
0x406e64 12
0x406004 10
0x406214 10
0x406630 9
-
This format enables you to use GNU/Linux tools like addr2line, which maps instruction addresses to source locations, to understand program behavior.
Using the addr2func helper script, which comes as part of the release, you can use the samples file to identify which functions were the hottest in the LULESH 2.0 run:
addr2func lulesh2.0 lulesh2.0_3076.samples
To return:
CalcElemNodeNormals: 143
SumElemFaceNormal: 2
ApplyMaterialPropertiesForElems: 12
CalcEnergyForElems: 115
CalcPressureForElems: 311
CalcMonotonicQGradientsForElems: 3
CalcElemCharacteristicLength: 1
SQRT: 1
CalcHourglassControlForElems: 1
CalcForceForNodes: 1
IntegrateStressForElems: 1
CollectDomainNodesToElemNodes: 3
Domain::xd: 1
CalcElemFBHourglassForce: 1
UpdateVolumesForElems: 11
InitStressTermsForElems: 8
CalcFBHourglassForceForElems: 3
CalcMonotonicQRegionForElems: 2
ApplyAccelerationBoundaryConditionsForNodes: 2
FABS: 14
EvalEOSForElems: 103
The hottest function in the list is CalcPressureForElems, which was sampled 311 times.
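Because the addr2func output above is not ordered by count, picking out the hottest functions by eye gets harder as the list grows. A small sketch that sorts it, assuming the script prints one "name: count" pair per line on standard output; rank_functions is a hypothetical helper, not part of the release.

```shell
# Hypothetical helper: reads "name: count" lines (from files or stdin) and
# prints "count<TAB>name", highest count first. The count is taken from the
# last field so that names containing colons, such as Domain::xd, still work.
rank_functions() {
    awk 'NF {
        count = $NF
        sub(/:[ \t]*[0-9]+[ \t]*$/, "")   # strip the trailing ": count"
        printf "%d\t%s\n", count, $0
    }' "$@" | sort -k1,1nr
}
```

A possible invocation is `addr2func lulesh2.0 lulesh2.0_3076.samples | rank_functions | head -n 5`, which lists the five functions with the highest counts.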
Note: The accuracy of the sampling profiler, and therefore of any performance measurement based on it, depends on the program's run time. The longer the run, the more samples are collected and the more accurately the counts identify hot code.
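Raw counts in a samples file are easier to compare between runs when expressed as a share of all samples. The following sketch assumes the two-column "address count" layout shown earlier in this section; sample_shares is a hypothetical helper name.

```shell
# Hypothetical helper: reads "address count" lines (from files or stdin) and
# prints each one followed by its percentage of all samples.
sample_shares() {
    awk 'NF >= 2 { addr[++n] = $1; count[n] = $2; total += $2 }
         END {
             for (i = 1; total > 0 && i <= n; i++)
                 printf "%s %d %.1f%%\n", addr[i], count[i], 100 * count[i] / total
         }' "$@"
}
```

For example, `sample_shares lulesh2.0_3076.samples | head` shows the share of the first ten addresses. An address of interest can then be passed to addr2line, for instance `addr2line -e lulesh2.0 0x402578`, which can resolve it to a source line because the Makefile changes keep the -g flag.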
Troubleshooting
In the event of a program crash, the operating system kernel creates a core dump file. The location and name of this core dump file depends on your
system's core dump configuration. If your configuration specifies that core dump filenames include the name of the crashed binary, note that this is the name of the
executable being emulated rather than the Arm Instruction Emulator binary name armie.
Core dump files should be sent to Arm support along with the output of armie --version. However, if you have confidentiality concerns
regarding sensitive data in the core dump file, do not send the core dump to Arm. Note that this may mean Arm cannot investigate your issue.
If you encounter problems running a binary with Arm Instruction Emulator, use the --debug option to run internal checks (assert calls) during
execution. If Arm Instruction Emulator finds an internal inconsistency, it stops executing and outputs a message to stderr,
which you should send to Arm support. For example:
armie -msve-vector-bits=256 --debug ./example
To output:
example: ./src/sve_decode.h:93: aarch64_i_rsp_reg::aarch64_i_rsp_reg(unsigned int,
aarch64_i_rsp_reg::element_type): Assertion `reg_id < 32' failed.
Alternatively, to print output messages to an output file, include -o or --output in the command line input.
The --debug option also helps you identify the instructions that were executed by the emulator. The first column is the address of the instruction, the second is the instruction encoding and the third is the number of times the instruction was executed, for example:
0x400684: 0x043f57df 1
0x4006a0: 0x04bf5028 1
0x4006c8: 0x2538c000 1
0x4006cc: 0x25291fe0 1
0x4006d4: 0xe4084140 13
0x4006d8: 0x04285028 13
0x4006dc: 0x25291d00 13
0x4006ec: 0x25a91fe0 1
0x4006f4: 0xe58103a0 1
0x4006f8: 0x04a14500 13
0x4006fc: 0xe5484140 13
0x400700: 0x04b0e3e8 13
0x400704: 0x25a91d00 13
0x400740: 0x858103a1 1
0x40074c: 0x25b8c020 1
0x400758: 0x2598e3e0 1
0x40075c: 0xa5484521 13
0x400760: 0x04938001 13
0x400764: 0xe5484541 13
0x400768: 0x04b0e3e8 13
0x40076c: 0x25ab1d01 13
0x4007bc: 0x043f505f 1
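A quick way to summarize such a listing is to count the listed instructions and add up their execution counts. This is a sketch only: it assumes the listing has been saved with one "address: encoding count" entry per line, and summarize_debug_counts is a hypothetical helper name.

```shell
# Hypothetical helper: reads "address: encoding count" lines (from files or
# stdin) and prints how many instruction addresses are listed and the sum of
# their execution counts.
summarize_debug_counts() {
    awk 'NF == 3 { n++; total += $3 }
         END { printf "%d instructions, %d executions\n", n, total }' "$@"
}
```

Lines that do not have exactly three fields, such as assertion messages, are ignored.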
For more information about getting help, see Contacting Arm Support.
Resources
For further information on running SVE code with Arm Instruction Emulator, see the following useful links: