Program conventions
Macros
To use the intrinsics, the Advanced SIMD architecture must be supported. Even then, some specific instructions may or may not be enabled. When the following macros are defined and equal to 1, the corresponding features are available:
__ARM_NEON
- Advanced SIMD is supported by the compiler.
- Always 1 for AArch64.
__ARM_NEON_FP
- Neon floating-point operations are supported.
- Always 1 for AArch64.
__ARM_FEATURE_CRYPTO
- Crypto instructions are available.
- Cryptographic Neon intrinsics are therefore available.
__ARM_FEATURE_FMA
- The fused multiply-accumulate instructions are available.
- Neon intrinsics which use these are therefore available.
This list is not exhaustive and further macros are detailed in the Arm C Language Extensions document.
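As a minimal sketch of how these macros might guard intrinsic code, the example below selects a fused multiply-accumulate path only when both __ARM_NEON and __ARM_FEATURE_FMA are defined, and otherwise falls back to plain C. The names HAVE_NEON, HAVE_NEON_FMA, and multiply_accumulate are hypothetical and chosen for illustration; they are not part of arm_neon.h.

```c
#include <assert.h>
#include <stddef.h>

/* HAVE_NEON and HAVE_NEON_FMA are hypothetical convenience macros,
   not something defined by the compiler or by arm_neon.h. */
#if defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

#if defined(__ARM_FEATURE_FMA)
#define HAVE_NEON_FMA 1
#else
#define HAVE_NEON_FMA 0
#endif

/* Hypothetical helper: computes acc[i] += a[i] * b[i] for n floats. */
void multiply_accumulate(float *acc, const float *a, const float *b, size_t n)
{
    assert(acc != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if HAVE_NEON && HAVE_NEON_FMA
    /* Vector path: four 32-bit floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        float32x4_t vacc = vld1q_f32(acc + i);
        vacc = vfmaq_f32(vacc, va, vb); /* vacc + va * vb, fused */
        vst1q_f32(acc + i, vacc);
    }
#endif
    /* Scalar path: handles the tail, or everything when the
       vector path is compiled out. */
    for (; i < n; ++i) {
        acc[i] += a[i] * b[i];
    }
}
```

Because the check happens in the preprocessor, the same source file compiles on targets without Advanced SIMD, where only the scalar loop remains.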
Types
There are three major categories of data type available in arm_neon.h, which follow these patterns:
baseW_t
- Scalar data types
baseWxL_t
- Vector data types
baseWxLxN_t
- Vector array data types
Where:
base
- refers to the fundamental data type.
W
- is the width of the fundamental type.
L
- is the number of scalar data type instances in a vector data type, for example an array of scalars.
N
- is the number of vector data type instances in a vector array type, for example a struct of arrays of scalars.
Generally, W and L are such that the vector data types are 64 or 128 bits long, and so fit completely into a Neon register. N corresponds with those instructions which operate on multiple registers at once.
In our earlier code we encountered an example of all three:
uint8_t
uint8x16_t
uint8x16x3_t
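To show the three categories side by side, here is a small sketch that copies the first channel out of interleaved 3-channel data. The function name extract_first_channel and the data layout are assumptions made for illustration. On targets without __ARM_NEON only the scalar loop is compiled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: copies the first channel of interleaved
   3-channel data (c0 c1 c2 c0 c1 c2 ...) into out. */
void extract_first_channel(uint8_t *out, const uint8_t *interleaved,
                           size_t pixels)
{
    assert(out != NULL && interleaved != NULL);
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 16 <= pixels; i += 16) {
        /* baseWxLxN_t: three vectors of sixteen 8-bit lanes each. */
        uint8x16x3_t channels = vld3q_u8(interleaved + 3 * i);
        /* baseWxL_t: one 128-bit vector, read from the val[] member. */
        uint8x16_t first = channels.val[0];
        vst1q_u8(out + i, first);
    }
#endif
    for (; i < pixels; ++i) {
        /* baseW_t: a plain scalar. */
        uint8_t value = interleaved[3 * i];
        out[i] = value;
    }
}
```

Here uint8x16x3_t has W = 8, L = 16, and N = 3, so a single load fills three 128-bit registers.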
Functions
As per the Arm C Language Extensions, the function prototypes from arm_neon.h follow a common pattern. At the most general level this is:
ret v[p][q][r]name[u][n][q][x][_high][_lane | laneq][_n][_result]_type(args)
Be wary that some of the letters and names are overloaded, but in the order above:
ret
- the return type of the function.
v
- short for vector, and is present on all the intrinsics.
p
- indicates a pairwise operation. ([value] means value may be present.)
q
- indicates a saturating operation (with the exception of vqtb[l][x] in AArch64 operations, where the q indicates 128-bit index and result operands).
r
- indicates a rounding operation.
name
- the descriptive name of the basic operation. Often this is an Advanced SIMD instruction, but it does not have to be.
u
- indicates signed-to-unsigned saturation.
n
- indicates a narrowing operation.
q
- postfixing the name indicates an operation on 128-bit vectors.
x
- indicates an Advanced SIMD scalar operation in AArch64. It can be one of b, h, s, or d (that is, 8, 16, 32, or 64 bits).
_high
- In AArch64, used for widening and narrowing operations involving 128-bit operands. For widening 128-bit operands, high refers to the top 64 bits of the source operand(s). For narrowing, it refers to the top 64 bits of the destination operand.
_n
- indicates a scalar operand supplied as an argument.
_lane
- indicates a scalar operand taken from the lane of a vector.
_laneq
- indicates a scalar operand taken from the lane of an input vector of 128-bit width. (left | right means only left or right would appear.)
type
- the primary operand type in short form.
args
- the function's arguments.
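As an illustration of how the pattern reads in practice, the sketch below contrasts two intrinsics whose names differ only by the leading q. The wrapper add_u8 and its saturate flag are hypothetical. On targets without __ARM_NEON, a scalar loop stands in for both intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Reading the two names against the pattern:
     vaddq_u8  = v + add + q (128-bit vectors) + _u8
     vqaddq_u8 = v + q (saturating) + add + q (128-bit vectors) + _u8
   add_u8 is a hypothetical wrapper that selects between them. */
void add_u8(uint8_t *out, const uint8_t *a, const uint8_t *b, size_t n,
            int saturate)
{
    assert(out != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        uint8x16_t sum = saturate ? vqaddq_u8(va, vb)  /* clamps at 255 */
                                  : vaddq_u8(va, vb);  /* wraps modulo 256 */
        vst1q_u8(out + i, sum);
    }
#endif
    /* Scalar equivalent of the two operations. */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        if (saturate && sum > 255u) {
            sum = 255u;
        }
        out[i] = (uint8_t)sum;
    }
}
```

The first q sits before the name and changes the arithmetic, while the second q sits after the name and only selects the 128-bit register width, which is the overloading the text warns about.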