How many JTAG TCK cycles are required to create a transaction on an ADIv5-based MEM-AP?
Article ID: 212212958
Published date: 06 Jun 2018
Last updated: -
Applies to: CoreSight Debug and Trace
How many JTAG TCK cycles are required to create a transaction on an ADIv5-based Memory Access Port (MEM-AP), for example AHB-AP or AXI-AP?
This article is of interest to anyone examining the timing of debug transactions that are created within a CoreSight Debug Access Port (DAP) based on the ADIv5 specification, for example JTAG activity that creates AMBA bus transfers through an APB-AP, AHB-AP, or AXI-AP.
Many factors influence the number of TCK cycles, but it is typically in the range 100-300.
The following is an analysis of the cycles required to create a bus transaction through the 32-bit AHB Access Port (AHB-AP):
(1) ~10 TCKs: Scan the DPACC instruction (4'b1010) into the JTAG-DP IR.
(2) ~40 TCKs: Scan in a 35-bit pattern to write the JTAG-DP AP Select register (SELECT), e.g. to select register bank 0 of the AHB-AP (AP 0 in the CSDK DAP).
(3) ~10 TCKs: Scan the APACC instruction (4'b1011) into the JTAG-DP IR.
(4) ~40 TCKs: Scan in a 35-bit pattern to write the AHB-AP Transfer Address Register (TAR) with the AHB address to be accessed.
(5) ~40 TCKs: For an AHB write, scan in a 35-bit pattern to write the AHB-AP Data Read/Write Register (DRW) with the data to be written; this initiates an AHB-AP write transfer. For an AHB read, scan in a 35-bit pattern to read the AHB-AP DRW, which initiates an AHB-AP read transfer.
[To return the read data to the debugger in the AHB read case, the following steps are also required:]
(6) ~10 TCKs: Scan the DPACC instruction (4'b1010) into the JTAG-DP IR.
(7) ~40 TCKs: Scan in a 35-bit pattern to perform a "dummy access" (for example, a read of the DP RDBUFF register) that scans out the value returned by the previous AHB-AP transfer.
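Each "35-bit pattern" in the steps above is one DPACC or APACC scan. The sketch below packs and unpacks that scan, assuming the ADIv5 JTAG-DP layout (RnW in bit 0, address bits A[3:2] in bits [2:1], 32-bit data in bits [34:3], with the 3-bit ACK captured in bits [2:0] on the way out). The function and constant names are illustrative, not part of any debugger API.

```python
# Sketch of the 35-bit JTAG-DP DPACC/APACC scan (ADIv5 layout assumed):
#   shifted in : bit 0 = RnW (1 = read), bits [2:1] = A[3:2], bits [34:3] = DATAIN
#   shifted out: bits [2:0] = ACK, bits [34:3] = ReadResult
ACK_OK_FAULT = 0b010
ACK_WAIT = 0b001

# Register offsets used in the sequence above (A[3:2] = offset >> 2).
DP_SELECT = 0x8   # DPACC, step 2
DP_RDBUFF = 0xC   # DPACC, a common choice for the step 7 dummy access
AP_TAR = 0x4      # APACC, step 4
AP_DRW = 0xC      # APACC, step 5


def pack_scan(offset: int, read: bool, data: int = 0) -> int:
    """Return the 35-bit value to shift into the DPACC/APACC scan chain."""
    if offset & 0x3 or not 0 <= offset <= 0xC:
        raise ValueError("offset must be 0x0, 0x4, 0x8 or 0xC")
    a32 = (offset >> 2) & 0x3
    return ((data & 0xFFFFFFFF) << 3) | (a32 << 1) | int(read)


def unpack_scan(captured: int) -> tuple[int, int]:
    """Split a captured 35-bit value into (ACK, 32-bit read result)."""
    return captured & 0x7, (captured >> 3) & 0xFFFFFFFF


# Step 4: write TAR with the target address; step 5 (write case): write DRW.
tar_write = pack_scan(AP_TAR, read=False, data=0x2000_0000)
drw_write = pack_scan(AP_DRW, read=False, data=0xDEADBEEF)
```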
These numbers were obtained from an RTL simulation running CoreSight tests on CoreSight RTL components. As an ideal baseline, this gives ~140 TCK cycles (steps 1-5) for an AHB write, or ~190 TCK cycles (steps 1-7) for an AHB read that returns the read data to the debugger.
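The baseline arithmetic can be written out as a small table of steps. This sketch simply sums the approximate per-step costs quoted above (10 TCKs per IR scan, 40 TCKs per 35-bit scan); the step labels are paraphrases, not register-accurate descriptions.

```python
# Ideal-case TCK budget, using the approximate per-step costs quoted above.
IR_SCAN = 10   # ~TCKs to load DPACC/APACC into a 4-bit JTAG-DP IR
DR_SCAN = 40   # ~TCKs for one 35-bit DPACC/APACC scan

WRITE_STEPS = [
    ("1: IR <- DPACC", IR_SCAN),
    ("2: write SELECT", DR_SCAN),
    ("3: IR <- APACC", IR_SCAN),
    ("4: write TAR", DR_SCAN),
    ("5: write or read DRW", DR_SCAN),
]
READ_EXTRA_STEPS = [
    ("6: IR <- DPACC", IR_SCAN),
    ("7: dummy access to return read data", DR_SCAN),
]


def transaction_tcks(read: bool) -> int:
    steps = WRITE_STEPS + (READ_EXTRA_STEPS if read else [])
    return sum(cost for _, cost in steps)


print(transaction_tcks(read=False))  # 140
print(transaction_tcks(read=True))   # 190
```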
Several factors can change these numbers:
The original AHB-AP uses a 32-bit address and 32-bit data, whereas an AXI-AP can be 64-bit. With a 64-bit AXI-AP, expect at least +40 cycles for the 64-bit address (TAR) and at least +40 cycles for the 64-bit data (DRW).
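The 64-bit penalty is a lower bound of one extra ~40-TCK scan per additional 32-bit word. This sketch applies that bound to the 140/190 baselines; it does not model which AXI-AP registers carry the upper halves.

```python
# Lower-bound extra cost of a wider AP: one extra ~40-TCK scan per extra
# 32-bit word of address and of data, as stated in the text.
DR_SCAN = 40


def wide_ap_extra_tcks(addr_bits: int, data_bits: int) -> int:
    if addr_bits not in (32, 64) or data_bits not in (32, 64):
        raise ValueError("widths must be 32 or 64")
    extra_scans = (addr_bits // 32 - 1) + (data_bits // 32 - 1)
    return extra_scans * DR_SCAN


print(140 + wide_ap_extra_tcks(64, 64))  # 64-bit write: at least 220
print(190 + wide_ap_extra_tcks(64, 64))  # 64-bit read: at least 270
```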
Each "Scan in a 35-bit pattern..." step in the sequence above includes an implicit polling step: the first 3 bits scanned out (while the 35 bits are scanned in) are the ACK response. The best case, shown above, is that ACK==OKAY every time. If the AP is still busy processing a previous transaction, the debugger can receive ACK==WAIT and must then spend cycles polling until ACK==OKAY is received. This is more likely when AHB/AXI transactions are 'long', that is, they take many cycles rather than one or a few. The impact cannot be quantified in general; it depends on the system.
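A WAIT response means the access was not performed and the same scan has to be repeated. Below is a minimal sketch of that retry loop that also counts the TCKs it spends. Assumptions: `scan35` is a hypothetical stand-in for the debugger's JTAG driver call, each retry costs a full ~40-TCK scan, and the fake AP exists only to exercise the loop.

```python
from typing import Callable

ACK_OK_FAULT = 0b010
ACK_WAIT = 0b001
DR_SCAN_TCKS = 40  # approximate cost of one full 35-bit scan


def scan_until_ok(scan35: Callable[[int], int], value: int,
                  max_retries: int = 100) -> tuple[int, int]:
    """Repeat one DPACC/APACC scan while ACK==WAIT.

    Returns (read_result, tcks_spent). scan35 is a hypothetical driver
    callback that shifts `value` in and returns the captured 35 bits.
    """
    tcks = 0
    for _ in range(max_retries + 1):
        captured = scan35(value)
        tcks += DR_SCAN_TCKS
        ack = captured & 0x7
        if ack == ACK_OK_FAULT:
            return (captured >> 3) & 0xFFFFFFFF, tcks
        if ack != ACK_WAIT:
            raise RuntimeError(f"unexpected ACK {ack:03b}")
    raise TimeoutError("AP still returning WAIT")


def make_fake_ap(waits: int, result: int) -> Callable[[int], int]:
    """Fake scan chain: answers WAIT `waits` times, then OK with `result`."""
    state = {"waits": waits}

    def scan35(_value: int) -> int:
        if state["waits"] > 0:
            state["waits"] -= 1
            return ACK_WAIT
        return (result << 3) | ACK_OK_FAULT

    return scan35


print(scan_until_ok(make_fake_ap(2, 0xCAFE), 0))  # (51966, 120)
```

Two WAIT responses turn one ~40-TCK access into ~120 TCKs, which is why slow AHB/AXI targets are hard to budget for.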
When polling for the ACK response, an unoptimized debugger routine scans out the entire 35-bit scan chain to test the 3-bit ACK value, although a 3-bit scan is sufficient. I've seen a few debug vendors do this. The impact cannot be quantified in general; it depends on the debugger and the system.
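To see how the two polling styles diverge, the sketch below compares the cost of N polls that shift all 35 bits against N polls that shift only the 3 ACK bits. `TAP_OVERHEAD` (about 5 TCKs of TAP state transitions per DR scan) is an assumption chosen so that a full scan costs ~40 TCKs, in line with the figures above; real overheads depend on the debugger's TAP state handling.

```python
# Cost of N polls: full 35-bit scans versus 3-bit ACK-only scans.
# TAP_OVERHEAD is an assumed per-scan cost of TAP state transitions.
TAP_OVERHEAD = 5


def polling_tcks(polls: int, bits_per_poll: int) -> int:
    return polls * (bits_per_poll + TAP_OVERHEAD)


for polls in (1, 10, 100):
    full = polling_tcks(polls, 35)
    short = polling_tcks(polls, 3)
    print(polls, full, short, full - short)
```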
If you are using the JTAG-DP 8-bit IR length option (4-bit is the default), each JTAG-DP IR scan takes 4 more cycles, and the sequence above contains at least two IR scans.
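The IR-length penalty is straightforward arithmetic: 4 extra bits per IR scan, times the number of IR scans in the access. A short sketch for the write (steps 1 and 3) and read (steps 1, 3 and 6) sequences:

```python
# Extra TCKs from a longer JTAG-DP IR: (ir_bits - default_bits) per IR scan.
def ir_length_penalty(ir_scans: int, ir_bits: int = 8, default_bits: int = 4) -> int:
    return ir_scans * (ir_bits - default_bits)


print(ir_length_penalty(2))  # write, IR scans in steps 1 and 3: 8 extra TCKs
print(ir_length_penalty(3))  # read, IR scans in steps 1, 3 and 6: 12 extra TCKs
```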
The debugger checks for transaction errors by reading the sticky error bits in the DP CTRL/STAT register (~40 TCKs). This check might not happen after every access, but rather after a series of accesses.
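Because the CTRL/STAT check can be batched, its cost is amortized over the accesses between checks. The sketch below decodes the sticky flags from a CTRL/STAT value, using the ADIv5 bit positions (STICKYORUN bit 1, STICKYCMP bit 4, STICKYERR bit 5), and computes the per-access share of a ~40-TCK check. `check_every` is an illustrative debugger policy, not a defined parameter.

```python
# Decode DP CTRL/STAT sticky flags (ADIv5 bit positions) and estimate the
# amortized cost of checking them once every `check_every` accesses.
STICKYORUN = 1 << 1
STICKYCMP = 1 << 4
STICKYERR = 1 << 5

CTRL_STAT_READ_TCKS = 40  # ~one 35-bit scan, as quoted above


def sticky_flags(ctrl_stat: int) -> list[str]:
    names = {STICKYORUN: "STICKYORUN", STICKYCMP: "STICKYCMP", STICKYERR: "STICKYERR"}
    return [name for bit, name in names.items() if ctrl_stat & bit]


def amortized_check_tcks(check_every: int) -> float:
    return CTRL_STAT_READ_TCKS / check_every


print(sticky_flags(0xF000_0020))  # ['STICKYERR']
print(amortized_check_tcks(1), amortized_check_tcks(16))  # 40.0 2.5
```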
Optimization: If the current access uses the same AP as the previous access, there is no need to reprogram the SELECT register. Skip steps 1-2 and save ~50 cycles.
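One way a debugger can apply this optimization is to cache the last value written to SELECT and skip steps 1-2 when the wanted value is already there. The sketch assumes the ADIv5 SELECT layout (APSEL in bits [31:24], APBANKSEL in bits [7:4]); the `SelectCache` class is hypothetical and only counts the TCKs that the real scans would cost.

```python
from typing import Optional

SELECT_WRITE_TCKS = 10 + 40  # step 1 (IR <- DPACC) + step 2 (35-bit scan)


def select_value(apsel: int, apbanksel: int) -> int:
    """SELECT value for an AP and register bank (ADIv5 layout)."""
    return ((apsel & 0xFF) << 24) | ((apbanksel & 0xF) << 4)


class SelectCache:
    """Hypothetical debugger-side cache of the DP SELECT register."""

    def __init__(self) -> None:
        self.current: Optional[int] = None  # unknown after connect/reset
        self.tcks = 0

    def ensure(self, apsel: int, apbanksel: int) -> bool:
        """Return True if SELECT had to be (re)written."""
        wanted = select_value(apsel, apbanksel)
        if wanted == self.current:
            return False  # skip steps 1-2
        self.tcks += SELECT_WRITE_TCKS  # a real debugger would scan here
        self.current = wanted
        return True


cache = SelectCache()
for ap in (0, 0, 0, 1, 1, 0):
    cache.ensure(ap, 0)
print(cache.tcks)  # 3 SELECT writes -> 150
```

Invalidating the cache (setting `current` back to `None`) after a reconnect or any unexpected error keeps the skip safe when the debugger loses track of the DP state.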
Optimization: For sequential accesses, an auto-increment mode is available, which removes the need to reprogram the Transfer Address Register (TAR). Skip step 4 and save ~40 cycles.
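The following sketch counts TAR writes for a sequential block of 32-bit writes with and without auto-increment. It assumes CSW.AddrInc = 0b01 ("increment single") and CSW.Size = 0b010 (32-bit), and it reprograms TAR at every 1 KB boundary, because ADIv5 only guarantees that auto-increment operates on the bottom 10 bits of TAR.

```python
# Count TAR writes (step 4) for a sequential block of 32-bit accesses.
CSW_SIZE_32 = 0b010            # CSW.Size, bits [2:0]
CSW_ADDRINC_SINGLE = 0b01 << 4  # CSW.AddrInc, bits [5:4]
CSW_VALUE = CSW_ADDRINC_SINGLE | CSW_SIZE_32  # low bits only; other CSW fields omitted
TAR_WRAP = 1 << 10  # auto-increment only guaranteed within 1 KB


def tar_writes_for_block(start: int, n_words: int, auto_inc: bool) -> int:
    writes = 0
    for i in range(n_words):
        addr = start + 4 * i
        # First access, no auto-increment, or a 1 KB boundary: reprogram TAR.
        if i == 0 or not auto_inc or addr % TAR_WRAP == 0:
            writes += 1
    return writes


n = 256  # 1 KB of 32-bit words
print(tar_writes_for_block(0x2000_0000, n, auto_inc=False))  # 256
print(tar_writes_for_block(0x2000_0000, n, auto_inc=True))   # 1
print(tar_writes_for_block(0x2000_0200, n, auto_inc=True))   # 2, crosses 1 KB
```

At ~40 TCKs per TAR write, auto-increment saves roughly 255 x 40 TCKs on that aligned 1 KB block.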
Optimization: For accesses within the same 16-byte-aligned block of addresses, the banked data registers can be used, which also removes the need to reprogram TAR. Skip step 4 and save ~40 cycles.
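With the banked data registers BD0-BD3 (AP offsets 0x10-0x1C), the accessed address is TAR[31:4] combined with bits [3:0] of the register offset. TAR is therefore written once with the 16-byte-aligned base, and each word in that block is reached by choosing the matching BDn register. A minimal sketch of that mapping follows; note that BD0-BD3 sit in AP register bank 1 while TAR is in bank 0, so SELECT.APBANKSEL has to point at the right bank.

```python
# Map word-aligned addresses onto BD0-BD3 (AP offsets 0x10, 0x14, 0x18, 0x1C).
def banked_access(addr: int) -> tuple[int, int]:
    """Return (TAR value, BDn register offset) for a word-aligned address."""
    if addr & 0x3:
        raise ValueError("address must be word aligned")
    return addr & ~0xF, 0x10 + (addr & 0xC)


def same_bank(addrs: list[int]) -> bool:
    """True if all addresses can be reached without rewriting TAR."""
    return len({a & ~0xF for a in addrs}) == 1


print([hex(x) for x in banked_access(0x2000_0008)])  # ['0x20000000', '0x18'] -> BD2
print(same_bank([0x2000_0000, 0x2000_000C]))          # True
print(same_bank([0x2000_000C, 0x2000_0010]))          # False
```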
Taking these factors together:
In the best case, where the debugger uses auto-increment mode to write a sequential block of memory and the AMBA interface is zero-wait-state, each 32-bit write could take roughly 40-50 TCK cycles. This is essentially repeated writes to the 32-bit DRW register for as long as ACK==OKAY.
In the worst case, the debugger switches to a new AP or does not keep track of the previously accessed AP (so SELECT must be reprogrammed), writes to a non-sequential, non-adjacent location or does not keep track of the previous location (so TAR must be reprogrammed), and writes 64-bit data (so more data must be scanned in). Such an access could easily approach 300 TCK cycles.
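The best and worst cases can be recombined from the per-step figures above. This sketch is a rough model under stated assumptions: IR scans cost 10 TCKs plus any extra IR bits, each 35-bit scan costs 40 TCKs, a 64-bit ("wide") AP adds one scan each for address and data, and the IR is assumed to already hold APACC whenever SELECT is not rewritten. WAIT polling and CTRL/STAT checks are not included.

```python
# Recombine the approximate per-step costs into best/worst-case estimates.
IR, DR = 10, 40


def access_tcks(*, write_select: bool, write_tar: bool, read: bool,
                wide: bool = False, ir_bits: int = 4) -> int:
    ir_scans = 0
    tcks = 0
    words = 2 if wide else 1
    if write_select:
        ir_scans += 2             # steps 1 and 3 (DPACC, then APACC)
        tcks += DR                # step 2
    if write_tar:
        tcks += DR * words        # step 4 (+ upper address word)
    tcks += DR * words            # step 5 (+ upper data word)
    if read:
        ir_scans += 1             # step 6
        tcks += DR                # step 7
    return tcks + ir_scans * (IR + ir_bits - 4)


best = access_tcks(write_select=False, write_tar=False, read=False)
baseline_write = access_tcks(write_select=True, write_tar=True, read=False)
worst_write = access_tcks(write_select=True, write_tar=True, read=False,
                          wide=True, ir_bits=8)
worst_read = access_tcks(write_select=True, write_tar=True, read=True,
                         wide=True, ir_bits=8)
print(best, baseline_write, worst_write, worst_read)  # 40 140 228 282
```

Any WAIT polling or periodic CTRL/STAT reads come on top of these figures, which is how the worst case approaches 300 TCK cycles.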