Semihosting with Arm Cortex-A53 and Cortex-A57
Arm Compiler 6
The information presented here is relevant for the Arm Compiler 6, which is used to compile programs for the AArch64 execution state and is provided in DS-5 Ultimate Edition. Programs compiled for the AArch32 execution state for the Arm Cortex-A53 and Cortex-A57 processors behave differently because the C library is different.
Semihosting Overview
There are two primary benefits of semihosting:
- Using semihosting saves simulation cycles compared to traditional output methods such as UART, because with semihosting, I/O functions are intercepted at the CPU and transactions to hardware peripherals are bypassed.
- Semihosting supports accessing host computer resources, such as writing to and reading from the host file system without the need to create special simulation models that provide this feature.
Semihosting Challenges
Using the C library provided by the Arm C compiler for Armv8 can result in behavior that differs from the C library’s behavior with Armv7. The challenges arise when multi-core software calls semihosted functions like printf() or fprintf(). Under certain conditions, using these functions can result in corrupted memory and scrambled output messages. Nothing is actually wrong with the C library, but obtaining the desired results takes some understanding of multi-core programming.
In the past, bare-metal software programs provided with CPAKs were constructed to run the same instructions (.axf file) from the same memory location on multiple cores. During initialization, the software included some checks to determine which core it was running on and made provisions for certain things, like making sure each CPU had a separate stack. This enabled multiple cores to run the same code at the same time without interfering with each other. Previously this worked fine, but when the Arm Compiler was used for 64-bit programs, scrambled output messages and memory corruption occurred.
The impact of the C library on software running in different environments can be summarized as follows:
Software impacted by the behavior of the C library:
- Multi-core systems running the same .axf file from the same memory space
Software NOT impacted by the behavior of the C library:
- Single core systems
- Multi-core systems running software programs out of different memory spaces
Understanding the Arm C Library
C functions like fprintf() cannot be executed by multiple cores at the same time. The first call to a function like printf() causes an underlying malloc() call which provides memory to be used by the library. Subsequent calls don’t allocate any new memory, which implies the memory is reused from call to call. In addition, the C library has no way to determine which core the code is running on, so it should be assumed to take no special precautions for the multi-core case.
Calls to printf() that occur before another core has exited the printf() call chain commonly result in memory corruption and scrambled output data.
Therefore, for a multi-core program running from the same memory space, entering a function such as fprintf() while another core is running this function carries a high risk of memory corruption.
Care must also be taken with the C library initialization. During the execution sequence between the startup assembly code and the C main() function there are a number of library initialization steps that occur. Examples are initializing file descriptors for stdout, stdin, and stderr and initializing global variables in the bss section of the program. Corruption can also occur when multiple cores run the library initialization at the same time.
Alternative Solutions
There are a number of ways to avoid memory corruption when using semihosted functions. This section provides some alternatives and describes the pros and cons so users can select the best solution for each application.
- Designate a Single CPU for Semihosted Function Calls
One way to avoid entering the C library from multiple cores is to limit the I/O functions to a single core. Use the Arm cluster id and core id to limit the cores that will call semihosted I/O functions to just a single core. This removes the possibility of memory corruption, but requires extra code to identify the running core, and may require other software changes to get the desired output. Performance impact is minimal.
- Use Different File Descriptors
Another alternative is to use different file descriptors for each CPU. For example, use stdout for the first CPU and stderr for the next CPU in a 2-core system. This can also be done using fopen() to write to a log file using semihosting. For example, open a file named cpu0.log for the first CPU, and a different file named cpu1.log for the next CPU. This strategy improves the odds of avoiding memory corruption, but is not guaranteed to eliminate the problem. Corruption will occur with this strategy if the I/O is frequent. This approach does require some software changes, and performance impact is minimal, but this solution is not recommended because it relies on timing to avoid corrupted output.
- Skip Semihosting and Use UARTs
Another alternative is to skip semihosting altogether and use UARTs for I/O. This means losing the functionality of accessing the host file system, and the performance benefit of semihosting; however, the gain is that no corruption has been seen when each core uses a different UART for I/O.
- Mutexes to Protect Reentrancy
The general solution to allow the use of printf() from any core is to use a mutex lock and mutex unlock around the printf() calls so that other cores are blocked from entering while a call is in progress. The mutex lock and unlock functions can be implemented as small assembly language functions. Using mutexes will guarantee that no corruption occurs, but this approach has a performance impact because one or more cores will be blocked from any progress while waiting to perform I/O functions. This blocking may have an impact on system performance analysis.
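The mutex approach can be sketched in portable C as shown here. This is only an illustration, not the assembly implementation mentioned above: it assumes a C11 toolchain that provides <stdatomic.h> and a target memory configuration in which atomic (exclusive) accesses work across cores. The names print_lock, print_lock_acquire(), print_lock_release(), and locked_printf() are hypothetical and are not part of the Arm C library or the CPAKs.

```c
#include <stdarg.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical names for illustration only. */
static atomic_flag print_lock = ATOMIC_FLAG_INIT;

static void print_lock_acquire(void)
{
    /* Spin until the flag was previously clear. Acquire ordering keeps
     * the protected I/O from moving ahead of the point where the lock
     * is taken. */
    while (atomic_flag_test_and_set_explicit(&print_lock,
                                             memory_order_acquire)) {
        /* Busy-wait: this core makes no progress while another prints. */
    }
}

static void print_lock_release(void)
{
    /* Release ordering makes the finished I/O visible before the lock
     * appears free to other cores. */
    atomic_flag_clear_explicit(&print_lock, memory_order_release);
}

/* printf() wrapper that lets only one core at a time enter the C library. */
int locked_printf(const char *fmt, ...)
{
    va_list args;
    int written;

    print_lock_acquire();
    va_start(args, fmt);
    written = vprintf(fmt, args);
    va_end(args);
    print_lock_release();

    return written;
}
```

Every core would call locked_printf() in place of printf(). The time a core spends spinning in print_lock_acquire() is the blocking cost described above, so it will show up in any performance analysis.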
Change the Startup Software Architecture
Although mutexes do a good job avoiding reentrancy of functions, they don’t help with the problem of multiple cores running the C library initialization, __main. SoC Designer provides a solution which involves restructuring the bare-metal startup software to initialize the C library only once. This is done by a single CPU, known as the "primary" CPU, and all other CPUs, known as "secondary" CPUs, skip this step to avoid repeating the tasks done in C library initialization. Because all cores run from the same memory space, each core can use the library state that was initialized by a different core, so a semihosted function like printf() may run on a CPU that has not done the C library initialization itself.
All Cortex-A53 and Cortex-A57 bare metal CPAKs have been updated to use the new software architecture. The same structure is used by all Armv8 bare metal examples provided by DS-5.
The structure of the code is shown in the figure below from one of the DS-5 examples.
Only the primary CPU goes through the __main() function and reaches main(). The secondary cores wait to be released and then jump directly to main_app() once the primary core is finished with initialization. All cores run the same code starting from main_app().
- The holding pen may be implemented as a memory location which is read by the secondary CPUs and written by the primary CPU. This is used in simple CPU plus memory systems. The holding pen may also be implemented as an interrupt in systems where an interrupt controller is available and interrupts from the primary core can be sent to all of the secondary cores. This is represented by the SendSGI() call in the diagram above.
- To implement the diagram using a memory location, the startup code branches to different functions for primary and secondary cores as shown below.
- The primary CPU will continue on and call __main() and then reach main(). The secondary cores will wait until the holding_pen variable has been changed by the primary core as shown below.
- When the primary core reaches main it will release the secondary cores to run using the code below.
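The memory-location variant of the holding pen can be sketched in C as follows. This is a hedged illustration rather than the code shipped with the CPAKs or the DS-5 examples. It assumes that core 0 of cluster 0 is the primary CPU, that the MPIDR_EL1 value has already been read by the startup code (on the target this takes an MRS instruction, which is omitted here so the sketch stays portable), and that the holding_pen location contains 0 before any secondary core starts polling it. The helper names core_id(), cluster_id(), is_primary(), release_secondaries(), pen_is_open(), and secondary_wait() are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* MPIDR_EL1 affinity fields: Aff0 is bits [7:0] (core number within the
 * cluster), Aff1 is bits [15:8] (cluster number). */
static inline unsigned core_id(uint64_t mpidr)
{
    return (unsigned)(mpidr & 0xFFu);
}

static inline unsigned cluster_id(uint64_t mpidr)
{
    return (unsigned)((mpidr >> 8) & 0xFFu);
}

/* Assumption: core 0 of cluster 0 acts as the primary CPU. */
static inline int is_primary(uint64_t mpidr)
{
    return core_id(mpidr) == 0 && cluster_id(mpidr) == 0;
}

/* 0 = secondaries must wait, 1 = secondaries may run.
 * Assumption: this location holds 0 before the secondaries poll it. */
static atomic_int holding_pen;

/* Called by the primary core from main(), after C library init is done. */
void release_secondaries(void)
{
    atomic_store_explicit(&holding_pen, 1, memory_order_release);
}

/* Non-blocking check used by the wait loop. */
int pen_is_open(void)
{
    return atomic_load_explicit(&holding_pen, memory_order_acquire) != 0;
}

/* Called by each secondary core; returns once the primary opens the pen. */
void secondary_wait(void)
{
    while (!pen_is_open()) {
        /* Spin until the primary core writes holding_pen. */
    }
}
```

In this sketch the startup code would evaluate is_primary() on each core. The primary continues into __main() and calls release_secondaries() from main(), while each secondary calls secondary_wait() and then jumps to main_app().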
With this software architecture and SoC Designer, semihosting works from any core. Using this architecture, all of the semihosted printf() calls will appear in the same window in SoC Designer, not in the separate windows that were associated with each core as in the past. It is still important to use mutexes around printf() calls, or one of the other techniques described above, to avoid corrupted output.
Semihosting with Arm Compiler 6
Arm Compiler 6 differs from armcc in how semihosting is disabled when a UART is used instead. For armcc, there is a global symbol __use_no_semihosting which can be added to a single assembly file using the statement IMPORT __use_no_semihosting or to a single C file using #pragma import(__use_no_semihosting). This prevents linking with any library functions that use semihosting.
For armclang, the symbol name is the same, but the syntax for specifying symbols has changed. To define the __use_no_semihosting symbol in a C file compiled with armclang, use the code below.
asm(" .global __use_no_semihosting\n");
Summary
The details of using semihosting in the AArch64 state with the Arm C library have been described and multiple alternatives presented. The two main issues are C library initialization and re-entrancy of C library functions. The Cortex-A53 and Cortex-A57 CPAKs contain useful examples of semihosting. The most general-purpose solution is to update to the latest SoC Designer and use primary/secondary startup code for one-time initialization of the C library, with mutexes around C library functions such as printf(). If you are not sure whether corruption is caused by the issues described, use the SoC Designer Software Profiling features to look at the function trace and identify situations where multiple cores have entered I/O functions.
This article was originally written as a blog by Jason Andrews.