Building Custom Analysis Instrumentation

Using the DynamoRIO API, you can modify existing instrumentation clients or write your own from scratch. This tutorial describes how to modify an existing client for your own purposes, and how to build and run the modified client with Arm Instruction Emulator (ArmIE).

Procedure

  1. Run Arm Instruction Emulator with the pre-built instrumentation client, libopcodes_emulated.so. This client writes native AArch64 opcode counts to stdout and emulated instruction counts to a file:

    $ armie -msve-vector-bits=128 -i libopcodes_emulated.so -- ./example

    This returns:

    Client opcodes_emulated is running
    i       a[i]    b[i]    c[i]
    =============================
    0       197     283     86
    1       262     277     15
    . . .
    1022    232     295     63
    1023    204     235     31
    Opcode execution counts in AArch64 mode:
          34900 : bl
          39725 : and
          41232 : csel
          44149 : ret
          54344 : ldrb
          68104 : cbnz
          73037 : ldp
          77676 : cbz
          79184 : stp
         100349 : sub
         110960 : movz
         126343 : str
         144182 : bcond
         171068 : subs
         171899 : orr
         183813 : add
         234517 : ldr
    7 unique emulated instructions written to undecoded.txt

    The file undecoded.txt contains:
              256 : 0xe54842e0
              256 : 0xa54842c1
              256 : 0xa54842a0
              256 : 0x25a91d00
              256 : 0x04b0e3e8
              256 : 0x04a10400
                1 : 0x25a91fe0
        

    In this tutorial, you modify this instrumentation client so that it writes both native and emulated counts to stdout. A single output stream is easier for scripts to parse when you run a large number of applications and collate their output, typically in an automated test environment.

    Note: To correctly modify the libopcodes_emulated.so client, you need to understand its existing implementation, opcodes_emulated.cpp. Refer to Structure of an Instrumentation Client for a detailed description of instrumentation client structure.

  2. Copy the opcodes_emulated.cpp file to a new file, opcodes_emulated_tut1.cpp, and save it in the following location:

    /path/to/your/arm-instruction-emulator-<xx.y>_Generic-AArch64_<OS>_aarch64-linux/samples
  3. Edit opcodes_emulated_tut1.cpp to merge opcount() and record_emulated_inst() into one function:

                
              opcodes_emulated.cpp                    opcodes_emulated_tut1.cpp
    
      static void                            |  static void
      record_emulated_inst(uint code)        |  opcount(uint opcode, int is_emulated)
      {                                      |  {
          emulated[code]++;                  |      if (is_emulated == 0)
      }                                      |          count[opcode]++;
                                             |      else
      static void                            |          emulated[opcode]++;
      opcount(uint opcode)                   |
      {                                      |
          count[opcode]++;                   |
      }                                      |  }
        

  4. Update the dr_insert_clean_call() calls which insert opcount():

                    opcodes_emulated.cpp                                                                opcodes_emulated_tut1.cpp
    
      static dr_emit_flags_t                                                         |  static dr_emit_flags_t
      event_basic_block(void *drcontext, void *tag, instrlist_t *bb,                 |  event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                        bool for_trace, bool translating)                            |                    bool for_trace, bool translating)
      {                                                                              |  {
          instr_t *instr;                                                            |      instr_t *instr;
                                                                                     |
          for (instr = instrlist_first(bb);                                          |      for (instr = instrlist_first(bb);
               instr != NULL;                                                        |           instr != NULL;
               instr = instr_get_next(instr)) {                                      |           instr = instr_get_next(instr)) {
                                                                                     |
              if (drmgr_is_emulation_start(instr)) {                                 |          if (drmgr_is_emulation_start(instr)) {
                  is_emulation = true;                                               |              is_emulation = true;
                  emulated_instr_t emulated;                                         |              emulated_instr_t emulated;
                  drmgr_get_emulated_instr_data(instr, &emulated);                   |              drmgr_get_emulated_instr_data(instr, &emulated);
                  dr_insert_clean_call(drcontext, bb, instr,                         |              dr_insert_clean_call(drcontext, bb, instr,
                                       (void *)record_emulated_inst, false, 1,       |                  (void *)opcount, false, 2,
                                       OPND_CREATE_INT32(                            |                  OPND_CREATE_INT32(instr_get_raw_word(emulated.instr, 0)),
                                           instr_get_raw_word(emulated.instr, 0)));  |                  OPND_CREATE_INT(1));
              }                                                                      |          }
              if (drmgr_is_emulation_end(instr))                                     |          if (drmgr_is_emulation_end(instr))
                  is_emulation = false;                                              |              is_emulation = false;
              if (is_emulation)                                                      |          if (is_emulation)
                  continue;                                                          |              continue;
              if (!instr_is_app(instr))                                              |          if (!instr_is_app(instr))
                  continue;                                                          |              continue;
              dr_insert_clean_call(drcontext, bb, instr,                             |          dr_insert_clean_call(drcontext, bb, instr,
                                   (void *)opcount, false, 1,                        |                               (void *)opcount, false, 2,
                                   OPND_CREATE_INT32(instr_get_opcode(instr)));      |                               OPND_CREATE_INT32(instr_get_opcode(instr)),
                                                                                     |                               OPND_CREATE_INT(0));
          }                                                                          |      }
                                                                                     |
          return DR_EMIT_DEFAULT;                                                    |      return DR_EMIT_DEFAULT;
      }                                                                              |  } 
    

    Because opcount() and record_emulated_inst() are now merged into a single callback, opcount(), each dr_insert_clean_call() call that inserts opcount() must pass two arguments rather than one: the opcode (or raw encoding), followed by 1 for emulated instructions or 0 for native instructions.

  5. Update event_exit() to write the emulated instruction data to stdout rather than a file:

                                   opcodes_emulated.cpp                                                       opcodes_emulated_tut1.cpp
    
      static void                                                                      |  static void
      event_exit(void)                                                                 |  event_exit(void)
      {                                                                                |  {
      #ifdef SHOW_RESULTS                                                              |  #ifdef SHOW_RESULTS
          char msg[(NUM_COUNT_SHOW + 2) * 80];                                         |      char msg[(NUM_COUNT_SHOW + 2) * 80];
          int len, i;                                                                  |      int len, i;
          size_t sofar = 0;                                                            |      size_t sofar = 0;
          /* First, sort the counts */                                                 |      /* First, sort the counts */
          uint indices[NUM_COUNT];                                                     |      uint indices[NUM_COUNT];
          /* Initialise indices */                                                     |      /* Initialise indices */
          for (i = 0; i < NUM_COUNT; i++)                                              |      for (i = 0; i < NUM_COUNT; i++)
              indices[i] = i;                                                          |          indices[i] = i;
          qsort(indices, NUM_COUNT, sizeof(indices[0]), compare_counts);               |      qsort(indices, NUM_COUNT, sizeof(indices[0]), compare_counts);
                                                                                       |
          len = dr_snprintf(msg, sizeof(msg) / sizeof(msg[0]),                         |      len = dr_snprintf(msg, sizeof(msg) / sizeof(msg[0]),
                            "Opcode execution counts in AArch64 mode:\n");             |                        "Opcode execution counts for AArch64 instructions:\n");
          DR_ASSERT(len > 0);                                                          |      DR_ASSERT(len > 0);
          sofar += len;                                                                |      sofar += len;
          for (i = OP_LAST - 1 - NUM_COUNT_SHOW; i <= OP_LAST; i++) {                  |      for (i = OP_LAST - 1 - NUM_COUNT_SHOW; i <= OP_LAST; i++) {
              if(count[indices[i]] != 0) {                                             |          if(count[indices[i]] != 0) {
                  len = dr_snprintf(msg + sofar, sizeof(msg) / sizeof(msg[0]) - sofar, |              len = dr_snprintf(msg + sofar, sizeof(msg) / sizeof(msg[0]) - sofar,
                                    "  %9lu : %-15s\n", count[indices[i]],             |                                "  %9lu : %-15s\n", count[indices[i]],
                                    decode_opcode_name(indices[i]));                   |                                decode_opcode_name(indices[i]));
                  DR_ASSERT(len > 0);                                                  |              DR_ASSERT(len > 0);
                  sofar += len;                                                        |              sofar += len;
              }                                                                        |          }
          }                                                                            |      }
          len = dr_snprintf(msg + sofar, sizeof(msg) / sizeof(msg[0]) - sofar,         |      len = dr_snprintf(msg + sofar, sizeof(msg) / sizeof(msg[0]) - sofar,
                "%u unique emulated instructions written to undecoded.txt\n",          |            "Instruction execution counts for %u emulated instructions:",
                 emulated.size());                                                     |             emulated.size());
          DR_ASSERT(len > 0);                                                          |      DR_ASSERT(len > 0);
          sofar += len;                                                                |      sofar += len;
          NULL_TERMINATE(msg);                                                         |      NULL_TERMINATE(msg);
          DISPLAY_STRING(msg);                                                         |      DISPLAY_STRING(msg);
      #endif /* SHOW_RESULTS */                                                        |  #endif /* SHOW_RESULTS */
          map<uint,long>::iterator iter;                                               |      map<uint,long>::iterator iter;
          multimap<long,uint>::reverse_iterator iter2;                                 |      multimap<long,uint>::reverse_iterator iter2;
                                                                                       |
          for(iter=emulated.begin(); iter!=emulated.end();++iter) {                    |      for(iter=emulated.begin(); iter!=emulated.end();++iter) {
              ranks.insert(make_pair(iter->second,iter->first));                       |          ranks.insert(make_pair(iter->second,iter->first));
          }                                                                            |      }
                                                                                       |
          for(iter2=ranks.rbegin(); iter2!=ranks.rend(); ++iter2) {                    |      for(iter2=ranks.rbegin(); iter2!=ranks.rend(); ++iter2) {
              fprintf(file, "%9lu : 0x%08x\n", iter2->first, iter2->second);           |          dr_printf("  %9lu : 0x%08x\n", iter2->first, iter2->second);
          }                                                                            |      }
                                                                                       |
          fclose(file);                                                                |
          emulated.clear();                                                            |      emulated.clear();
                                                                                       |
          if (!drmgr_unregister_bb_app2app_event(event_basic_block))                   |      if (!drmgr_unregister_bb_app2app_event(event_basic_block))
            DR_ASSERT(false);                                                          |        DR_ASSERT(false);
          drmgr_exit();                                                                |      drmgr_exit();
      }                                                                                |  }
    

    Download the files for opcodes_emulated.cpp and opcodes_emulated_tut1.cpp and compare them with a diff viewer to view the modifications in full.

  6. To build the modified client, add opcodes_emulated_tut1.cpp to /path/to/your/arm-instruction-emulator-<xx.y>_Generic-AArch64_<OS>_aarch64-linux/samples/CMakeLists.txt:

    . . . 
    add_sample_client(opcodes "opcodes.c" "drmgr;drreg;drx")
    add_sample_client(opcodes_emulated "opcodes_emulated.cpp" "drmgr;drreg")
    add_sample_client(opcodes_emulated_tut1 "opcodes_emulated_tut1.cpp" "drmgr;drreg")
    add_sample_client(stl_test "stl_test.cpp" "")
    . . .
  7. Run cmake. Note that the current version of ArmIE (18.4) requires that clients are built with GCC 7.1.0:

    cmake .

    This returns:

    -- The C compiler identification is GNU 7.1.0 
    -- The CXX compiler identification is GNU 7.1.0
    -- Check for working C compiler: /opt/arm/gcc-7.1.0_Generic-AArch64_SUSE-12_aarch64-linux/bin/cc
    -- Check for working C compiler: /opt/arm/gcc-7.1.0_Generic-AArch64_SUSE-12_aarch64-linux/bin/cc -- works
    -- Detecting C compiler ABI info
    -- Detecting C compiler ABI info - done
    -- Detecting C compile features
    -- Detecting C compile features - done
    -- Check for working CXX compiler: /opt/arm/gcc-7.1.0_Generic-AArch64_SUSE-12_aarch64-linux/bin/c++
    -- Check for working CXX compiler: /opt/arm/gcc-7.1.0_Generic-AArch64_SUSE-12_aarch64-linux/bin/c++ -- works
    -- Detecting CXX compiler ABI info
    -- Detecting CXX compiler ABI info - done
    -- Detecting CXX compile features
    -- Detecting CXX compile features - done
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /path/to/your/arm-instruction-emulator-<xx.y>_Generic-AArch64_<OS>_aarch64-linux/samples
  8. Run make:

    make

    This returns:

    . . . 
    Scanning dependencies of target opcodes_emulated_tut1
    [ 7%] Building CXX object CMakeFiles/opcodes_emulated_tut1.dir/opcodes_emulated_tut1.cpp.o
    [ 9%] Linking CXX shared library bin/libopcodes_emulated_tut1.so
    Usage: pass to drconfig or drrun: -c /path/to/your/arm-instruction-emulator-<xx.y>_Generic-AArch64_<OS>_aarch64-linux/samples/bin/libopcodes_emulated_tut1.so
    [ 9%] Built target opcodes_emulated_tut1
    . . .
  9. Copy the built client from: 
    /path/to/your/arm-instruction-emulator-<xx.y>_Generic-AArch64_<OS>_aarch64-linux/samples/bin 
    to 
    /path/to/your/arm-instruction-emulator-<xx.y>_Generic-AArch64_<OS>_aarch64-linux/samples/bin64

    For example:

    cp bin/libopcodes_emulated_tut1.so ./bin64/ 
    file ./libopcodes_emulated_tut1.so
    ./libopcodes_emulated_tut1.so: ELF 64-bit LSB shared object, ARM aarch64, version 1 (SYSV), dynamically linked, not stripped
  10. Run the modified client. Now, the emulated instruction output is written to stdout and the undecoded.txt file is not created:

    armie -msve-vector-bits=128 -i libopcodes_emulated_tut1.so -- ./example 

    This returns:

    . . . 
    1022    232     295     63
    1023    204     235     31
    Opcode execution counts for AArch64 instructions:
          34900 : bl
          39725 : and
          41232 : csel
          44149 : ret
          54344 : ldrb
          68104 : cbnz
          73037 : ldp
          77676 : cbz
          79184 : stp
         100349 : sub
         110960 : movz
         126343 : str
         144182 : bcond
         171068 : subs
         171899 : orr
         183813 : add
         234517 : ldr
    Instruction execution counts for 7 emulated instructions:
            256 : 0xe54842e0
            256 : 0xa54842c1
            256 : 0xa54842a0
            256 : 0x25a91d00
            256 : 0x04b0e3e8
            256 : 0x04a10400
              1 : 0x25a91fe0
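
With both sections on stdout, each introduced by a fixed header line, a collating tool can split the captured output on those headers. The following is a minimal sketch of one way to do that, written in C++ to match the client code. It is not part of ArmIE or DynamoRIO: the names ClientCounts and parse_client_output are hypothetical, and the parser assumes the header strings and the "count : name" line format shown in the step 10 output.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <map>
#include <sstream>
#include <string>

// Hypothetical container for the two sections printed by the modified client.
struct ClientCounts {
    std::map<std::string, std::uint64_t> native;     // mnemonic -> count
    std::map<std::uint32_t, std::uint64_t> emulated; // raw encoding -> count
};

// Parses captured stdout. Assumes the two header strings and the
// "<count> : <name>" line format shown in step 10. Lines before the first
// header (the application's own output) and malformed lines are ignored.
ClientCounts parse_client_output(const std::string &text)
{
    enum class Section { None, Native, Emulated };
    Section section = Section::None;
    ClientCounts result;
    std::istringstream lines(text);
    std::string line;

    while (std::getline(lines, line)) {
        if (line.find("Opcode execution counts for AArch64 instructions:") !=
            std::string::npos) {
            section = Section::Native;
            continue;
        }
        if (line.find("Instruction execution counts for") != std::string::npos) {
            section = Section::Emulated;
            continue;
        }
        if (section == Section::None)
            continue;

        std::istringstream fields(line);
        std::uint64_t count = 0;
        std::string colon, name;
        if (!(fields >> count >> colon >> name) || colon != ":")
            continue; // not a "<count> : <name>" line

        if (section == Section::Native) {
            result.native[name] = count;
            continue;
        }
        // Invariant: only the emulated section can reach this point.
        assert(section == Section::Emulated);
        try {
            std::size_t used = 0;
            const unsigned long encoding = std::stoul(name, &used, 16);
            if (used == name.size())
                result.emulated[static_cast<std::uint32_t>(encoding)] = count;
        } catch (const std::exception &) {
            // Not a hexadecimal encoding: skip the line.
        }
    }
    return result;
}
```

Keying the emulated counts by the numeric encoding rather than by the printed string makes it straightforward to add up counts from several runs, because the same instruction always maps to the same key.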

Results

Notice that the emulated instructions appear as raw encodings rather than mnemonics. This reflects the current state of emulation support in the public DynamoRIO API. Arm are working to improve these emulated instrumentation features, and more comprehensive support will be available in the public API in future ArmIE releases.

Until then, as a workaround, ArmIE provides a helper script, enc2instr.py, which you can use to disassemble the encodings in your own post-processing scripts:

export LLVM_MC=/opt/arm/arm-hpc-compiler-18.4_Generic-AArch64_SUSE-12_aarch64-linux/llvm-bin/llvm-mc
echo 0xe54842e0 | /path/to/your/arm-instruction-emulator-<xx.y>_Generic-AArch64_<OS>_aarch64-linux/bin64/enc2instr.py
0xe54842e0 : st1w {z0.s}, p0, [x23, x8, lsl #2]
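
To report counts with mnemonics, a post-processing step needs to map each encoding back to its disassembly. The following is a minimal sketch of that mapping, assuming every line produced by enc2instr.py has the "encoding : disassembly" form shown above. The function names parse_enc2instr_line and build_disassembly_table are hypothetical and not part of ArmIE, and how the script output is captured (for example, by redirecting it to a file) is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: splits one line assumed to have the form
// "<hex encoding> : <disassembly>". Returns false if the line does not match.
bool parse_enc2instr_line(const std::string &line, std::uint32_t &encoding,
                          std::string &disassembly)
{
    const std::string separator = " : ";
    const std::size_t pos = line.find(separator);
    if (pos == std::string::npos)
        return false;

    const std::string hex = line.substr(0, pos);
    try {
        std::size_t used = 0;
        const unsigned long value = std::stoul(hex, &used, 16);
        if (used != hex.size() || value > 0xffffffffUL)
            return false; // trailing characters, or wider than 32 bits
        encoding = static_cast<std::uint32_t>(value);
    } catch (const std::exception &) {
        return false; // not a hexadecimal number
    }
    disassembly = line.substr(pos + separator.size());
    return true;
}

// Builds an encoding -> disassembly table from captured script output, so
// that counts keyed by encoding can be printed alongside mnemonics.
std::map<std::uint32_t, std::string> build_disassembly_table(const std::string &text)
{
    std::map<std::uint32_t, std::string> table;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::uint32_t encoding = 0;
        std::string disassembly;
        if (!parse_enc2instr_line(line, encoding, disassembly))
            continue; // ignore lines that are not in the expected form
        const auto inserted = table.emplace(encoding, disassembly);
        // A given encoding should always disassemble to the same text.
        assert(inserted.second || inserted.first->second == disassembly);
    }
    return table;
}
```

Looking up an encoding from the client output in this table then yields its mnemonic; an encoding that is missing from the table can still be reported in its raw hexadecimal form.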
