Get Started

Arm Instruction Emulator (ARMIE) is an emulator that runs on AArch64 platforms and emulates Scalable Vector Extension (SVE) instructions. The emulator lets you develop SVE code without needing access to SVE-enabled hardware. This tutorial uses a series of simple examples to demonstrate how to compile SVE code, run the resulting executable, and analyze its runtime behavior with Arm Instruction Emulator.

Prerequisites

Installing Arm Instruction Emulator

Refer to Installing Arm Instruction Emulator for details on how to perform the installation on Linux.

Environment configuration

  1. To check which Environment Modules are available, enter:

    module avail

    Note: You may need to configure the MODULEPATH environment variable to include the installation directory:

    export MODULEPATH=$MODULEPATH:/opt/arm/modulefiles/
  2. Load the Arm Instruction Emulator module to make it available for use:

    module load <architecture>/<linux_variant>/<linux_version>/suites/arm-instruction-emulator/<version>

    For example:

    module load Generic-AArch64/SUSE/12/suites/arm-instruction-emulator/19.1
  3. Check your environment by examining the PATH variable. It should contain the appropriate Arm Instruction Emulator bin directory from /opt/arm:

    echo $PATH
    /opt/arm/arm-instruction-emulator-19.1_Generic-AArch64_SUSE-12_aarch64-linux/bin64:...
    

Compile and run a 'Hello World' program

In this example you will write a simple Hello World program in C, compile it, and then run it using Arm Instruction Emulator.

  1. Create a simple "Hello World" C program and save it as a file named hello.c.

    /* Hello World */

    #include <stdio.h>

    int main()
    {
        printf("Hello World\n");
        return 0;
    }
  2. To generate an executable binary, compile your program with Arm C/C++ Compiler.

    armclang -O3 -march=armv8-a+sve -o hello hello.c

    The -O3 flag selects the highest optimization level, which enables auto-vectorization. The -march=armv8-a+sve flag targets the Armv8-A architecture with the SVE extension enabled.

    Note: In this example, no SVE code is used. However, it is good practice to enable the highest level of auto-vectorization and target an SVE-enabled architecture when compiling any code to be run using Arm Instruction Emulator.
  3. Run the generated binary hello using Arm Instruction Emulator:

    armie -msve-vector-bits=256 ./hello
    Hello World

    For this simple Hello World example, Arm Instruction Emulator runs the code on an emulated SVE-enabled architecture without using SVE instructions.

    To use Arm Instruction Emulator to its full potential, that is, to emulate SVE instructions, we need to look at a more complex program. An example of a program containing SVE code is available in the next section of this tutorial.

Compile, vectorize and run a program with SVE code

This example demonstrates how to compile and vectorize some C code targeting the SVE-enabled Armv8-A architecture, and how to emulate running the SVE code using Arm Instruction Emulator.

  1. Create a new file called example.c, containing the following code:

    // example.c
    #include <stdio.h>
    #include <stdlib.h>
    
    #define ARRAYSIZE 1024
    int a[ARRAYSIZE];
    int b[ARRAYSIZE];
    int c[ARRAYSIZE];
    void subtract_arrays(int *restrict a, int *restrict b, int *restrict c)
    {
        for (int i = 0; i < ARRAYSIZE; i++)
        {
            a[i] = b[i] - c[i];
        }
    }
    
    int main() {
        for (int i = 0; i < ARRAYSIZE; i++)
        {
            // Generate a random number from 200 to 299
            b[i] = (rand() % 100) + 200;
            // Generate a random number from 0 to 99
            c[i] = rand() % 100;
        }
    
        subtract_arrays(a, b, c);
    
        printf("i \ta[i] \tb[i] \tc[i] \n");
        printf("=============================\n");
    
        for (int i = 0; i < ARRAYSIZE; i++)
        {
            printf("%d \t%d \t%d \t%d\n", i, a[i], b[i], c[i]);
        }
    
    }

    This C program subtracts each element of array c from the corresponding element of array b, writing the result to array a. The three pointer parameters of subtract_arrays() are declared with the restrict keyword, which tells the compiler that the arrays they point to do not overlap in memory.

  2. Compile the program:

    armclang -O3 -march=armv8-a+sve -o example example.c
  3. Run the binary with Arm Instruction Emulator:

    armie -msve-vector-bits=256 ./example

    This returns:

    i       a[i]    b[i]    c[i]
    =============================
    0       197     283     86
    1       262     277     15
    2       258     293     35
    ...
    1021    165     234     69
    1022    232     295     63
    1023    204     235     31

    The SVE architecture extension specifies an implementation-defined vector length. The -msve-vector-bits option lets you specify the vector length used by Arm Instruction Emulator. The vector length is a multiple of 128 bits, with a maximum of 2048 bits. Use the -mlist-vector-lengths option to list all valid vector lengths:

    armie -mlist-vector-lengths

    This returns:

    128 256 384 512 640 768 896 1024 1152 1280 1408 1536 1664 1792 1920 2048
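
    As a rough illustration of what the vector length means for example.c, the following sketch models a vector-length-agnostic loop in plain scalar C. The names int_lanes() and subtract_in_chunks() are hypothetical and are not part of Arm Instruction Emulator or the compiler. Each outer iteration stands for one vector operation, and the last iteration processes only the remaining elements, much as an SVE predicate would. The code that armclang actually generates may differ, for example through unrolling.

```c
#include <stddef.h>

/* Number of 32-bit int lanes in one vector of `vector_bits` bits.
 * Returns 0 if the length is not a multiple of 128 in [128, 2048]. */
size_t int_lanes(unsigned vector_bits)
{
    if (vector_bits < 128 || vector_bits > 2048 || vector_bits % 128 != 0)
        return 0;
    return vector_bits / 32;
}

/* Scalar model of a vector-length-agnostic subtraction (hypothetical helper).
 * Each outer iteration covers up to `lanes` elements; the final iteration
 * covers only the elements that remain. Returns the number of vector-sized
 * steps taken, or 0 if `lanes` is 0. */
size_t subtract_in_chunks(int *restrict a, const int *restrict b,
                          const int *restrict c, size_t n, size_t lanes)
{
    size_t steps = 0;

    if (lanes == 0)
        return 0;

    for (size_t i = 0; i < n; i += lanes) {
        /* Lanes past the end of the array stay inactive. */
        size_t active = (n - i < lanes) ? n - i : lanes;

        for (size_t lane = 0; lane < active; lane++)
            a[i + lane] = b[i + lane] - c[i + lane];

        steps++;
    }
    return steps;
}
```

    Under this model, the 1024-element arrays in example.c take 128 steps at a vector length of 256 bits (8 lanes) and 64 steps at 512 bits (16 lanes).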

Analyze a program with SVE code

When developing high performance programs, some form of runtime analysis is required to gain insights into their execution behavior. This enables developers to identify heavily used loops and instruction sequences so that improvements can be made to execution speed and memory access.

Arm Instruction Emulator is based on DynamoRIO, a dynamic binary instrumentation (DBI) tool platform. Developers can use DynamoRIO’s API to write instrumentation clients that run alongside the SVE emulation client and analyze SVE binaries at runtime.

Before looking at an example of an instrumentation client for emulated binaries using Arm Instruction Emulator, you should understand the basic principles of instrumenting binaries with the DynamoRIO API. See DynamoRIO’s API Usage Tutorial.

The following example demonstrates how to count native AArch64 as well as emulated SVE instructions. event_bb_analysis() is the function which counts instructions in the sample client /path/to/your/arm-instruction-emulator/samples/inscount_emulated.cpp.

    /* Count instructions */
    bb_counts.native_instrs = bb_counts.emulated_instrs = 0;
    bool is_emulation = false;
    for (instr = instrlist_first(bb); instr != NULL; instr = next_instr) {
        next_instr = instr_get_next(instr);

        if (drmgr_is_emulation_start(instr)) {                 /* [1] */
            bb_counts.emulated_instrs++;
            is_emulation = true;
            /* Data about the emulated instruction can be extracted from the
             * start label using drmgr_get_emulated_instr_data().
             */
            emulated_instr_t emulated;
            drmgr_get_emulated_instr_data(instr, &emulated);    /* [2] */
            dr_printf("SVE: %p\t", emulated.pc);
            int *sveinstr;
            sveinstr = ((int *)instr_get_raw_bits(emulated.instr));
            dr_printf("0x%08x\n", *sveinstr);

            continue;
        }
        if (drmgr_is_emulation_end(instr)) {                    /* [3] */
            is_emulation = false;
            continue;
        }
        if (is_emulation)
            continue;
        if (!instr_is_app(instr))
            continue;
        bb_counts.native_instrs++;
    }

    /* Insert clean call */
    dr_insert_clean_call(drcontext, bb, instrlist_first_app(bb),
                         (void *)inscount, false /* save fpstate */, 2,
                         OPND_CREATE_INT64(bb_counts.native_instrs),
                         OPND_CREATE_INT64(bb_counts.emulated_instrs));

This function iterates over each instruction in a basic block, incrementing bb_counts.emulated_instrs or bb_counts.native_instrs depending on whether the instruction is emulated. It distinguishes emulated from native instructions using DynamoRIO’s drmgr_is_emulation_start() [1] and drmgr_is_emulation_end() [3] functions.

drmgr_is_emulation_start() returns true to indicate that this instruction is the start of a sequence of instructions that emulates an SVE instruction. This instruction also carries data about the instruction being emulated, which can be extracted using drmgr_get_emulated_instr_data() [2], as described below.

drmgr_is_emulation_end() returns true to indicate that this is the last instruction of a sequence of instructions which are emulating an SVE instruction.

Note

The reference documentation for these functions is not yet available at the DynamoRIO web site. See Emulation Functions Reference for a full description of these functions.

Use the drmgr_get_emulated_instr_data() function to extract useful information about the instruction being emulated: the PC address and the instruction encoding.
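
The counting logic can be tried out without DynamoRIO by modelling a basic block as an array of instruction kinds. The sketch below is a simplified stand-in, not the sample client: instr_kind_t, counts_t and count_block() are hypothetical names introduced only for illustration, and the two marker kinds play the role of the instructions for which drmgr_is_emulation_start() and drmgr_is_emulation_end() return true.

```c
#include <stddef.h>

/* Hypothetical instruction kinds for this sketch; not DynamoRIO types. */
typedef enum {
    KIND_APP,       /* native application instruction */
    KIND_META,      /* non-application instruction, never counted */
    KIND_EMU_START, /* marks the start of an emulation sequence */
    KIND_EMU_END    /* marks the end of an emulation sequence */
} instr_kind_t;

typedef struct {
    unsigned long native_instrs;
    unsigned long emulated_instrs;
} counts_t;

/* Count native and emulated instructions in one modelled basic block. */
counts_t count_block(const instr_kind_t *bb, size_t len)
{
    counts_t counts = {0, 0};
    int is_emulation = 0;

    for (size_t i = 0; i < len; i++) {
        if (bb[i] == KIND_EMU_START) {
            /* One emulated SVE instruction per start marker. */
            counts.emulated_instrs++;
            is_emulation = 1;
            continue;
        }
        if (bb[i] == KIND_EMU_END) {
            is_emulation = 0;
            continue;
        }
        if (is_emulation)
            continue; /* part of the emulation sequence, not counted */
        if (bb[i] != KIND_APP)
            continue; /* skip non-application instructions */
        counts.native_instrs++;
    }
    return counts;
}
```

In this model, everything between a start marker and its end marker counts as a single emulated instruction, however many instructions the emulation sequence contains.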

Run the libinscount_emulated.so instrumentation client with the -i option:

$ armie -msve-vector-bits=512 -i libinscount_emulated.so -- ./example_sve
Client inscount is running
SVE: 0x000000000040053c 0x04a0e3ef
SVE: 0x0000000000400554 0x04a14001
SVE: 0x000000000040055c 0x25aa1fe0
SVE: 0x0000000000400560 0x05a039e0
SVE: 0x0000000000400570 0xe5494101
SVE: 0x0000000000400574 0x04b0e3e9
SVE: 0x0000000000400578 0x04a00021
SVE: 0x000000000040057c 0x25aa1d20
SVE: 0x0000000000400570 0xe5494101
SVE: 0x0000000000400574 0x04b0e3e9
SVE: 0x0000000000400578 0x04a00021
SVE: 0x000000000040057c 0x25aa1d20
SVE: 0x00000000004005a8 0x25ac1fe0
SVE: 0x00000000004005b4 0xa5494100
SVE: 0x00000000004005b8 0xa54941a1
SVE: 0x00000000004005bc 0x85604140
SVE: 0x00000000004005c0 0x04a10000
SVE: 0x00000000004005c4 0xe5494160
SVE: 0x00000000004005c8 0x04b0e3e9
SVE: 0x00000000004005cc 0x25ac1d20
SVE: 0x00000000004005b4 0xa5494100
SVE: 0x00000000004005b8 0xa54941a1
SVE: 0x00000000004005bc 0x85604140
SVE: 0x00000000004005c0 0x04a10000
SVE: 0x00000000004005c4 0xe5494160
SVE: 0x00000000004005c8 0x04b0e3e9
SVE: 0x00000000004005cc 0x25ac1d20
120827 instructions executed of which 709 were emulated instructions
$

The example helper script /path/to/your/arm-instruction-emulator/bin64/enc2instr.py can be used to convert the encodings output by dr_printf("0x%08x\n", *sveinstr) to instruction mnemonics. This script shows use of the enc2instr() function and can be copied and modified for your own output transformations.
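
If you prefer to post-process the client output in C before converting the encodings, each "SVE:" line can be split into its PC address and encoding with sscanf(). The following is a minimal sketch that assumes the line format printed by the sample client above. parse_sve_line() is a hypothetical helper and is unrelated to enc2instr.py.

```c
#include <stdio.h>
#include <stdint.h>

/* Parse one "SVE: <pc> <encoding>" line of client output.
 * Returns 1 and fills *pc and *encoding on success, 0 otherwise. */
int parse_sve_line(const char *line, uint64_t *pc, uint32_t *encoding)
{
    unsigned long long pc_value;
    unsigned int enc_value;

    /* %llx and %x accept an optional 0x prefix; whitespace in the
     * format matches the tab or spaces between the two fields. */
    if (sscanf(line, "SVE: %llx %x", &pc_value, &enc_value) != 2)
        return 0;

    *pc = (uint64_t)pc_value;
    *encoding = (uint32_t)enc_value;
    return 1;
}
```

Lines that do not start with "SVE:", such as the client banner and the final summary, make the function return 0, so they can be skipped.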

Invoking the -s/--show-drrun-cmd option shows how Arm Instruction Emulator used DynamoRIO’s drrun command to emulate and instrument the SVE binary: libsve_512.so is the SVE emulation client and libinscount_emulated.so is the instrumentation client.

$ armie -s -msve-vector-bits=512 -i libinscount_emulated.so -- ./example_sve
/path/to/armie/bin64/drrun -client /path/to/armie/lib64/release/libsve_512.so 0 "" -client /path/to/armie/samples/bin64/libinscount_emulated.so 1 "" -max_bb_instrs 32 -max_trace_bbs 4 -- ./example_sve
Client inscount is running
. . .
