As Figure 13.2 shows, the register file is divided into four banks with eight registers in each bank for single-precision instructions and eight banks with four registers per bank for double-precision instructions. CDP instructions access the banks in a circular manner. Load and store multiple instructions do not access the registers in a circular manner but treat the register file as a linearly ordered structure.
The VFPv3 architecture adds 16 double-precision registers, making use of the additional register addressing bits currently used to specify single-precision registers. The first 16 registers in the NEON register file, D0 through D15, provide the same functionality as the register file defined in the VFPv2 architecture. The 16 new registers, D16 through D31, form a second set of double-precision registers. In vector mode these registers behave identically to the lower 16 registers, with bank 4 specified as registers D16-D19, bank 5 as registers D20-D23, bank 6 as registers D24-D27, and bank 7 as registers D28-D31. Bank 4 of the second set of registers has the same characteristics when used in short vector instructions as bank 0 of the first set of registers.
Short vector operations on double-precision data support vector lengths of two through four iterations. The additional registers make it possible to double-buffer double-precision operations in the same way as single-precision operations.
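The bank layout described above can be sketched in a few lines of Python. This is an illustrative model only, not ARM-supplied code: the helper names `bank_of` and `bank_registers` are hypothetical, and the sketch assumes the VFPv3 layout of 32 single-precision registers in four banks of eight and 32 double-precision registers in eight banks of four.

```python
# Hypothetical helpers modelling the register bank layout described above.
# Assumption: 32 registers in each view (S0-S31, or D0-D31 on VFPv3).

def bank_of(reg, double=False):
    """Return the bank index that holds register S<reg> or D<reg>."""
    if not 0 <= reg < 32:
        raise ValueError("register number must be in the range 0-31")
    bank_size = 4 if double else 8  # registers per bank
    return reg // bank_size

def bank_registers(bank, double=False):
    """Return the names of the registers that make up one bank."""
    bank_size = 4 if double else 8
    bank_count = 32 // bank_size
    if not 0 <= bank < bank_count:
        raise ValueError("no such bank")
    prefix = "D" if double else "S"
    return [f"{prefix}{bank * bank_size + i}" for i in range(bank_size)]

if __name__ == "__main__":
    print(bank_of(16, double=True))         # D16 is the first register of bank 4
    print(bank_registers(7, double=True))   # the last double-precision bank
    print(bank_registers(1))                # the second single-precision bank
```

Because the bank index is simply the register number divided by the bank size, D16 through D31 fall into banks 4 through 7 without any special casing.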
See the ARM Architecture Reference Manual for more information on VFP addressing modes.
A short vector CDP operation that has a source or destination vector crossing a bank boundary wraps around and accesses the first register in the same bank.
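As a minimal sketch of this wraparound rule, the following Python function lists the registers that one vector operand touches. The function name `vector_registers` and its parameters are hypothetical. The sketch models only the circular addressing within a bank and deliberately ignores every other rule, such as the special treatment of bank 0 and bank 4 in short vector instructions.

```python
# Hypothetical model of circular addressing within a bank.
# It does not model scalar banks or any instruction validity checks.

def vector_registers(first, length, stride=1, double=False):
    """List the registers accessed by a vector that starts at <first>."""
    bank_size = 4 if double else 8         # registers per bank
    prefix = "D" if double else "S"
    base = first - first % bank_size       # first register of the bank
    offset = first % bank_size             # starting position in the bank
    return [
        f"{prefix}{base + (offset + i * stride) % bank_size}"
        for i in range(length)
    ]

if __name__ == "__main__":
    # A double-precision vector of length four starting at D6 (bank 1, D4-D7)
    # runs off the end of the bank after D7 and continues at D4.
    print(vector_registers(6, 4, double=True))
```

The modulo operation keeps every access inside the bank that contains the first register, which is the behavior the paragraph above describes.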
Example 13.1 shows the iterations of the following short vector add instruction:
FADDS S11, S22, S31
In this instruction, the LEN field contains b101, selecting a vector length of six iterations, and the STRIDE field contains b00, selecting a vector stride of one.
See Floating-Point Status and Control Register, FPSCR for details of the LEN and STRIDE fields.
FADDS S11, S22, S31 ; 1st iteration
FADDS S12, S23, S24 ; 2nd iteration. The 2nd source vector wraps around
; and accesses the 1st register in the 4th bank
FADDS S13, S16, S25 ; 3rd iteration. The 1st source vector wraps around
; and accesses the 1st register in the 3rd bank
FADDS S14, S17, S26 ; 4th iteration
FADDS S15, S18, S27 ; 5th iteration
FADDS S8, S19, S28 ; 6th and last iteration. The destination vector
; wraps around and writes to the 1st register in the
; 2nd bank
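The iterations listed above can be reproduced with a short Python sketch that decodes LEN and STRIDE and then applies the bank wraparound to each operand. The function names are hypothetical. The FPSCR bit positions used here (LEN in bits [18:16], STRIDE in bits [21:20]) are taken from the VFP architecture as an assumption and are not stated in this section, so check them against the FPSCR description before relying on them. The sketch handles single-precision operands only and does not model the scalar behavior of bank 0.

```python
# Hypothetical expansion of a single-precision short vector FADDS.
# Assumed FPSCR layout: LEN = bits [18:16], STRIDE = bits [21:20].

SINGLE_BANK_SIZE = 8  # S registers per bank

def decode_len_stride(fpscr):
    """Return (vector length, vector stride) encoded in an FPSCR value."""
    length = ((fpscr >> 16) & 0b111) + 1   # LEN holds the length minus one
    stride_bits = (fpscr >> 20) & 0b11
    strides = {0b00: 1, 0b11: 2}           # other encodings treated as reserved
    if stride_bits not in strides:
        raise ValueError("reserved STRIDE encoding")
    return length, strides[stride_bits]

def wrap(first, step):
    """Advance <step> registers from S<first>, wrapping within its bank."""
    base = first - first % SINGLE_BANK_SIZE
    return base + (first + step) % SINGLE_BANK_SIZE

def expand_fadds(sd, sn, sm, fpscr):
    """Return the per-iteration form of FADDS S<sd>, S<sn>, S<sm>."""
    length, stride = decode_len_stride(fpscr)
    return [
        f"FADDS S{wrap(sd, i * stride)}, "
        f"S{wrap(sn, i * stride)}, "
        f"S{wrap(sm, i * stride)}"
        for i in range(length)
    ]

if __name__ == "__main__":
    fpscr = 0b101 << 16                    # LEN = b101, STRIDE = b00
    for line in expand_fadds(11, 22, 31, fpscr):
        print(line)
```

Run with LEN set to b101 and STRIDE set to b00, the sketch prints the same six register triples as the listing, ending with FADDS S8, S19, S28.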