Arm CPU Architecture

Chris Shore, Arm

An introduction to multicore programming for Arm Cortex CPUs, big.LITTLE technology and NEON, which will show you how to extract the maximum performance from the Arm CPU architecture.

Arm Mali GPU Architecture

Sam Martin, Senior Principal Graphics Architect, Arm

An overview of the unique features of Arm Mali GPUs, covering the tile-based GPU architecture, which is designed to maximize performance at low power consumption.

Armv8-A overview

Chris Shore, Arm

An overview of Armv8-A, the 64-bit Arm architecture increasingly being adopted in mobile platforms. It covers the major features, the relationship to earlier architectures, and a brief overview of the programmer’s model, instruction set, memory model, memory management, privilege model and exception architecture.

The A64 instruction set

Matteo Franchin, Staff Software Engineer, Arm

A comprehensive overview of the new A64 instruction set as supported by Armv8-A. It covers the scalar and SIMD register banks, data processing capability and memory access instructions, and touches on the procedure call standards in use.

Migrating to 64-bit on Arm

Chris Shore, Arm

A look at some of the major activities involved in porting code to the AArch64 environment on an Armv8-A platform. It also touches on data widths, pointers, data handling and procedure call standards, and provides pointers to optimization strategies.
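As an illustration of the data-width and pointer issues mentioned above, here is a minimal sketch in C. It is not taken from the presentation; the helper names `roundtrip_ptr`, `low32_of_address` and `fits_in_32_bits` are hypothetical. It shows why code that stores a pointer in a 32-bit integer can lose information when pointers are 64 bits wide, and why `uintptr_t` is the safer choice.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: round-trips a pointer through uintptr_t,
 * an integer type defined to be wide enough to hold a pointer. */
static inline void *roundtrip_ptr(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}

/* Hypothetical helper: models what a legacy cast to a 32-bit
 * integer keeps of a 64-bit address (only the low 32 bits). */
static inline uint32_t low32_of_address(uint64_t address)
{
    return (uint32_t)address;
}

/* Returns 1 if the address survives truncation to 32 bits, else 0. */
static inline int fits_in_32_bits(uint64_t address)
{
    return address == (uint64_t)low32_of_address(address);
}
```

For example, the illustrative address `0x0000007F12345678` does not survive the 32-bit truncation, whereas a round trip through `uintptr_t` always returns the original pointer.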

Multi-Core and big.LITTLE programming

Ed Plowman, Arm

An introduction to multi-core programming for Arm Cortex CPUs and big.LITTLE technology, showing you how to extract maximum performance from the latest Arm systems. After covering how to get the best out of Arm NEON™ technology with the Ne10 library, there is a discussion of the tools and programming models available for the Armv8-A architecture, which will help you prepare for the move to 64-bit.

Porting to Arm 64-bit

Chris Shore, Arm

This white paper gives an introduction to porting existing code to the A64 instruction set supported by Armv8-A processors such as the Arm Cortex-A53 and Cortex-A57. It will also be useful for those writing new code for these platforms.

Arm NEON optimization

Project Ne10

A library of the most commonly used functions that have been heavily optimized for Arm-based CPUs with NEON. These functions provide consistent, well-tested behavior that can be easily incorporated into applications, enabling developers to get the most out of Armv7/NEON without arduous assembly coding. Ne10 is usable as a “drop and go” pre-built library or as a set of modular functions that can be incorporated in a “pick and mix” form where binary size might be an issue.
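To give a sense of the kind of routine such a library spares developers from writing by hand, here is a minimal sketch of an element-wise float addition. It does not use the Ne10 API; the function name `add_f32` is hypothetical. On targets where the compiler defines `__ARM_NEON`, it uses the standard `arm_neon.h` intrinsics `vld1q_f32`, `vaddq_f32` and `vst1q_f32` to process four floats per iteration; elsewhere it falls back to a plain scalar loop.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n). */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* NEON path: add four single-precision lanes at a time. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar path: handles the tail, or everything without NEON. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

The scalar tail loop keeps the function correct for lengths that are not a multiple of four, which is one of the details a pre-built library would normally handle for you.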

Mali Optimization

Porting Unreal Engine 4 to Armv8

Ramin Zaghi, Arm

This tutorial illustrates the work and results of porting Unreal Engine 4 to the Armv8 architecture, allowing mobile game developers to move to 64-bit and its improved instruction set for their games. The example also covers important battery-saving techniques and other Arm features and tools integrated into Unreal Engine 4.

Bandwidth efficient graphics with Arm Mali GPUs

Marius Bjørge, Arm

Modern mobile games use post-processing effects in various ways. While the GPU itself is capable of handling this work, the memory bandwidth available to the GPU is typically not sufficient.

A major strength of Mali is that a lot of operations can be performed on-chip without having to access external memory. For an application to run efficiently, it is beneficial to keep the processing on-chip for as long as possible.

Arm has implemented extensions for OpenGL ES 2.0 and 3.0 to help reduce the need to access external memory. This presentation and white paper introduce these extensions as well as use cases (deferred shading, order-independent transparency, volume rendering, etc.).
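To make the bandwidth concern concrete, here is a rough back-of-the-envelope sketch, not taken from the presentation. The function name `gbuffer_traffic_bytes_per_second` and all the figures are illustrative assumptions. It estimates the external memory traffic of a deferred-shading pass when every pixel's G-buffer data is written to external memory once and read back once per frame, ignoring overdraw, compression and caching.

```c
#include <stdint.h>

/* Hypothetical estimate: bytes moved per second if each pixel's
 * G-buffer data is written once and read once every frame. */
uint64_t gbuffer_traffic_bytes_per_second(uint32_t width,
                                          uint32_t height,
                                          uint32_t bytes_per_pixel,
                                          uint32_t frames_per_second)
{
    uint64_t bytes_per_frame = (uint64_t)width * height * bytes_per_pixel;
    /* Factor of 2: one write pass plus one read pass. */
    return bytes_per_frame * 2u * frames_per_second;
}
```

Under these assumptions, a 1920x1080 target with 16 bytes of G-buffer data per pixel at 60 frames per second gives 3,981,312,000 bytes per second, roughly 4 GB/s of traffic that could be avoided if the data stayed on-chip.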

Efficient rendering with Tile Local Storage

Marius Bjørge, Arm

With advances in bandwidth expected to be incremental for many years, mobile graphics must be tailored to work efficiently in a bandwidth-scarce environment. This is true at all levels of the hardware-software stack. We showed previously that deferred rendering could be made bandwidth-efficient by exploiting the on-chip memory used to store tile framebuffer contents in many tile-based GPUs. We refer to this memory as Tile Local Storage (TLS).

In this presentation, we demonstrate the versatility and effectiveness of TLS with real-world content. We show how key rendering challenges can be met efficiently by using TLS, and present an updated extension that has cross-vendor support.

Performance analysis and optimization

Lorenzo Dal Col, Arm

This presentation introduces DS-5 Streamline, one of Arm’s key developer tools, and explains how you can use it to determine whether your application is CPU-bound, vertex-bound or fragment-bound.

Android on Arm

64-bit development on Android

Ramin Zaghi, Arm

A look at how 64-bit support has changed Android, in particular ART and the way ART interacts with native code, and how these changes affect application performance, size and execution. It also attempts to answer the critical question: should I develop for 64-bit and, if so, when?

Accelerate Apps and games for Android

Matt Du Puy, Principal Software Engineer, Arm

Mobile apps require special design considerations that aren’t always clear, and few tools are available for analyzing increasingly complex systems. Fortunately, Google, Arm and many others are developing analysis tools and solutions to these problems. Learn how to tell whether your app is CPU or GPU bound, or I/O or memory constrained, and find common efficiency issues. Discover simple ways to optimize your apps, from basic design parameters to open source projects that best utilize the underlying mobile technology.