
Example showing the benefits of conditional instructions in A32 and T32 code

Using conditional instructions rather than conditional branches can save both code size and cycles.

This example shows the difference between using branches and using conditional instructions. It uses the Euclid algorithm for the Greatest Common Divisor (gcd) to show how conditional instructions improve code size and speed.

In C the gcd algorithm can be expressed as:

int gcd(int a, int b)
{
    while (a != b)
    {
        if (a > b)
            a = a - b;
        else
            b = b - a;
    }
    return a;
}

The following examples show implementations of the gcd algorithm with and without conditional instructions.

Note

The detailed analysis of execution speed only applies to an ARM7™ processor. The code density calculations apply to all ARM® processors.

Example of conditional execution using branches in A32 code

This example is an A32 code implementation of the gcd algorithm. It achieves conditional execution by using conditional branches, rather than individual conditional instructions:

gcd
        CMP      r0, r1
        BEQ      end
        BLT      less
        SUBS     r0, r0, r1  ; could be SUB r0, r0, r1 for A32
        B        gcd
less
        SUBS     r1, r1, r0  ; could be SUB r1, r1, r0 for A32
        B        gcd
end

The code is seven instructions long because of the number of branches. Every time a branch is taken, the processor must refill the pipeline and continue from the new location. The other instructions and non-executed branches use a single cycle each.

The following table shows the number of cycles this implementation uses on an ARM7 processor when R0 equals 1 and R1 equals 2.

Table 7-4 Conditional branches only

R0: a   R1: b   Instruction        Cycles (ARM7)
1       2       CMP r0, r1         1
1       2       BEQ end            1 (not executed)
1       2       BLT less           3
1       2       SUB r1, r1, r0     1
1       2       B gcd              3
1       1       CMP r0, r1         1
1       1       BEQ end            3
                                   Total = 13
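
The cycle count in Table 7-4 can be reproduced with a small C model. This is a sketch only: the names `step` and `trace_gcd_branches` are hypothetical, the cost model (3 cycles for a taken branch, 1 cycle for anything else) is the simplified ARM7 model described in the text, and both inputs are assumed to be positive so that the loop terminates.

```c
#include <stdio.h>

/* Assumed ARM7 cost model from the text above:
   a taken branch costs 3 cycles, every other instruction costs 1. */
enum { TAKEN_BRANCH = 3, OTHER = 1 };

/* Hypothetical helper: prints one trace row and returns its cost. */
static int step(const char *instruction, int cost)
{
    printf("%-16s %d\n", instruction, cost);
    return cost;
}

/* Walks the branch-based gcd sequence and returns the total cycle count.
   Assumes r0 > 0 and r1 > 0. */
static int trace_gcd_branches(int r0, int r1)
{
    int total = 0;
    for (;;) {
        total += step("CMP r0, r1", OTHER);
        if (r0 == r1) {
            total += step("BEQ end", TAKEN_BRANCH);   /* taken: leave the loop */
            break;
        }
        total += step("BEQ end", OTHER);              /* not executed */
        if (r0 < r1) {
            total += step("BLT less", TAKEN_BRANCH);  /* taken */
            r1 -= r0;
            total += step("SUBS r1, r1, r0", OTHER);
        } else {
            total += step("BLT less", OTHER);         /* not executed */
            r0 -= r1;
            total += step("SUBS r0, r0, r1", OTHER);
        }
        total += step("B gcd", TAKEN_BRANCH);         /* always taken */
    }
    printf("Total = %d\n", total);
    return total;
}
```

Under these assumptions, calling `trace_gcd_branches(1, 2)` lists the same seven instructions as the table and returns a total of 13.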

Example of conditional execution using conditional instructions in A32 code

This example is an A32 code implementation of the gcd algorithm using individual conditional instructions. The gcd algorithm only takes four instructions:

gcd
        CMP      r0, r1
        SUBGT    r0, r0, r1
        SUBLE    r1, r1, r0
        BNE      gcd

In addition to improving code size, in most cases this code executes faster than the version that uses only branches.
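
The behavior of the four-instruction loop can be modeled in C to see why it produces the right answer. The following is an illustrative sketch, not generated code: `gcd_conditional` is a hypothetical name, the condition flags are represented by two local variables, and positive inputs are assumed. It relies on the fact that `SUBGT` and `SUBLE` have no `S` suffix, so the flags set by `CMP` are still valid when `BNE` tests them.

```c
/* C model of the conditional-instruction gcd loop (hypothetical helper).
   Assumes r0 > 0 and r1 > 0. */
static int gcd_conditional(int r0, int r1)
{
    int gt;  /* models the GT condition after CMP (signed r0 > r1) */
    int ne;  /* models the NE condition after CMP (r0 != r1) */

    do {
        gt = (r0 > r1);          /* CMP   r0, r1 sets the flags once */
        ne = (r0 != r1);
        if (gt)                  /* SUBGT r0, r0, r1 */
            r0 -= r1;
        if (!gt)                 /* SUBLE r1, r1, r0 (LE is the inverse of GT) */
            r1 -= r0;
    } while (ne);                /* BNE   gcd uses the flags from CMP */

    /* When the registers are equal, SUBLE still executes and leaves r1 = 0,
       but the result in r0 is unaffected. */
    return r0;
}
```

For example, `gcd_conditional(12, 18)` steps through (12, 6) and (6, 6) and returns 6.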

The following table shows the number of cycles this implementation uses on an ARM7 processor when R0 equals 1 and R1 equals 2.

Table 7-5 All instructions conditional

R0: a   R1: b   Instruction          Cycles (ARM7)
1       2       CMP r0, r1           1
1       2       SUBGT r0, r0, r1     1 (not executed)
1       1       SUBLT r1, r1, r0     1
1       1       BNE gcd              3
1       1       CMP r0, r1           1
1       1       SUBGT r0, r0, r1     1 (not executed)
1       1       SUBLT r1, r1, r0     1 (not executed)
1       1       BNE gcd              1 (not executed)
                                     Total = 10

Comparing this with the example that uses only branches:

  • Replacing branches with conditional execution of all instructions saves three cycles.
  • Where R0 equals R1, both implementations execute in the same number of cycles. For all other cases, the implementation that uses conditional instructions executes in fewer cycles than the implementation that uses branches only.
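
The comparison above can be checked for other inputs with a compact C model of both sequences. This is a sketch under stated assumptions: the function names are hypothetical, the cost model is the simplified ARM7 one from the text (3 cycles for a taken branch, 1 cycle for every other instruction, whether or not its condition passes), and inputs are assumed to be positive.

```c
/* Assumed ARM7 cost model from the text: taken branch = 3 cycles,
   any other instruction (executed or not) = 1 cycle. */
enum { TAKEN_BRANCH = 3, OTHER = 1 };

/* Cycle count for the seven-instruction, branch-based sequence. */
static int cycles_with_branches(int r0, int r1)
{
    int cycles = 0;
    while (r0 != r1) {
        /* CMP, BEQ (not executed), BLT (taken only if r0 < r1), SUBS, B gcd */
        cycles += OTHER + OTHER
                + (r0 < r1 ? TAKEN_BRANCH : OTHER)
                + OTHER + TAKEN_BRANCH;
        if (r0 < r1)
            r1 -= r0;
        else
            r0 -= r1;
    }
    return cycles + OTHER + TAKEN_BRANCH;   /* final CMP, BEQ end (taken) */
}

/* Cycle count for the four-instruction, conditional sequence. */
static int cycles_with_conditionals(int r0, int r1)
{
    int cycles = 0;
    while (r0 != r1) {
        /* CMP, SUBGT, SUBLE, BNE gcd (taken) */
        cycles += OTHER + OTHER + OTHER + TAKEN_BRANCH;
        if (r0 > r1)
            r0 -= r1;
        else
            r1 -= r0;
    }
    return cycles + 4 * OTHER;   /* final CMP, SUBGT, SUBLE, BNE (not taken) */
}
```

With R0 = 1 and R1 = 2 the two functions return 13 and 10. With equal inputs both return 4. In this model each non-final pass costs 7 or 9 cycles with branches but 6 cycles with conditional instructions, which is why the conditional version is ahead whenever at least one subtraction is needed.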

Example of conditional execution using conditional instructions in T32 code

You can use the IT instruction to write conditional instructions in T32 code. The T32 code implementation of the gcd algorithm using conditional instructions is similar to the implementation in A32 code. The implementation in T32 code is:

gcd
        CMP     r0, r1
        ITE     GT
        SUBGT   r0, r0, r1
        SUBLE   r1, r1, r0
        BNE     gcd

These instructions assemble equally well to A32 or T32 code. The assembler checks the IT instructions, but omits them on assembly to A32 code.

The gcd implementation requires one more instruction in T32 code (the IT instruction) than in A32 code, but the overall code size is 10 bytes in T32 code, compared with 16 bytes in A32 code.

Example of conditional execution code using branches in T32 code

In architectures before ARMv6T2, there is no IT instruction and therefore T32 instructions cannot be executed conditionally except for the B branch instruction. The gcd algorithm must be written with conditional branches and is similar to the A32 code implementation using branches, without conditional instructions.

The T32 code implementation of the gcd algorithm without conditional instructions requires seven instructions. The overall code size is 14 bytes. This is smaller even than the A32 implementation that uses conditional instructions, which uses 16 bytes.
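
The code size figures quoted for the four implementations follow from simple arithmetic, sketched here in C. The helper name `code_size` is hypothetical, and the sketch assumes that every A32 instruction occupies 4 bytes and that every T32 instruction used in these sequences has a 2-byte encoding.

```c
/* Assumed instruction widths: A32 instructions are 4 bytes each, and the
   T32 instructions used in these gcd sequences are 2 bytes each. */
enum { A32_BYTES = 4, T32_BYTES = 2 };

/* Hypothetical helper: size in bytes of a sequence of same-width instructions. */
static int code_size(int instruction_count, int bytes_per_instruction)
{
    return instruction_count * bytes_per_instruction;
}
```

With these assumptions, `code_size(4, A32_BYTES)` gives 16 bytes for the A32 conditional version, `code_size(5, T32_BYTES)` gives 10 bytes for the T32 version with IT, and `code_size(7, T32_BYTES)` gives 14 bytes for the T32 version with branches. By the same arithmetic, the seven-instruction A32 version with branches would occupy 28 bytes.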

In addition, on a system using 16-bit memory this T32 implementation runs faster than both A32 implementations because only one memory access is required for each 16-bit T32 instruction, whereas each 32-bit A32 instruction requires two fetches.
