Putting our code together

We can now put all of the code together. To do this, we modify the native implementation of the MainActivity.stringFromJNI method, as shown in the following code.

#include <jni.h>
#include <string>

// generateRamp, dotProduct, dotProductNeon, now and msElapsedTime
// come from the previous steps of this tutorial.

// JNI resolves native methods by the symbol
// Java_<package with '.' replaced by '_'>_<class>_<method>.
// The package com.example.neonintrinsics is an assumption here;
// use your own app's package name instead.
extern "C" JNIEXPORT jstring JNICALL
Java_com_example_neonintrinsics_MainActivity_stringFromJNI(
	JNIEnv* env,
	jobject /* this */) {

	// Ramp length and number of trials
	const int rampLength = 1024;
	const int trials = 10000;

	// Generate two input vectors
	// (0, 1, ..., rampLength - 1)
	// (100, 101, ..., 100 + rampLength - 1)
	auto ramp1 = generateRamp(0, rampLength);
	auto ramp2 = generateRamp(100, rampLength);

	// Without Neon intrinsics
	// Invoke dotProduct and measure performance
	int lastResult = 0;

	auto start = now();
	for (int i = 0; i < trials; i++) {
		lastResult = dotProduct(ramp1, ramp2, rampLength);
	}
	auto elapsedTime = msElapsedTime(start);

	// With Neon intrinsics
	// Invoke dotProductNeon and measure performance
	int lastResultNeon = 0;

	start = now();
	for (int i = 0; i < trials; i++) {
		lastResultNeon = dotProductNeon(ramp1, ramp2, rampLength);
	}
	auto elapsedTimeNeon = msElapsedTime(start);

	// Clean up: free each ramp separately
	// (assuming generateRamp allocates them with new[])
	delete[] ramp1;
	delete[] ramp2;

	// Display results
	std::string resultsString =
		"----==== NO NEON ====----\nResult: " + std::to_string(lastResult)
		+ "\nElapsed time: " + std::to_string((int)elapsedTime) + " ms"
		+ "\n\n----==== NEON ====----\n"
		+ "Result: " + std::to_string(lastResultNeon)
		+ "\nElapsed time: " + std::to_string((int)elapsedTimeNeon) + " ms";

	return env->NewStringUTF(resultsString.c_str());
}
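The function above calls helpers defined in earlier steps of this tutorial. To make it easier to follow on its own, here is a minimal sketch of what those helpers might look like; it is not the tutorial's own code. It assumes that generateRamp returns a short* allocated with new[] (so the caller frees it with delete[]) and that msElapsedTime returns milliseconds as a double. The Neon path is guarded with the __ARM_NEON macro, so the same file also compiles on non-Arm hosts, where dotProductNeon simply runs the scalar loop.

```cpp
#include <chrono>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

using Clock = std::chrono::steady_clock;

// Sketch: returns a heap array (start, start + 1, ..., start + len - 1).
// The caller releases it with delete[].
short* generateRamp(short startValue, int len) {
    short* ramp = new short[len];
    for (int i = 0; i < len; ++i) {
        ramp[i] = static_cast<short>(startValue + i);
    }
    return ramp;
}

// Scalar dot product: one multiply-add per element.
int dotProduct(const short* a, const short* b, int len) {
    int result = 0;
    for (int i = 0; i < len; ++i) {
        result += a[i] * b[i];
    }
    return result;
}

// Neon dot product sketch: eight 16-bit lanes per iteration. vmlal_s16
// widens each product to 32 bits before accumulating, so the 16-bit
// lanes cannot overflow. Without Neon, only the scalar loop below runs.
int dotProductNeon(const short* a, const short* b, int len) {
    int i = 0;
    int result = 0;
#if defined(__ARM_NEON)
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 8 <= len; i += 8) {
        int16x8_t va = vld1q_s16(a + i);
        int16x8_t vb = vld1q_s16(b + i);
        acc = vmlal_s16(acc, vget_low_s16(va), vget_low_s16(vb));
        acc = vmlal_s16(acc, vget_high_s16(va), vget_high_s16(vb));
    }
    // Horizontal sum of the four 32-bit lanes (valid on Armv7-A and Armv8-A).
    result = vgetq_lane_s32(acc, 0) + vgetq_lane_s32(acc, 1)
           + vgetq_lane_s32(acc, 2) + vgetq_lane_s32(acc, 3);
#endif
    // Leftover elements (or the whole array when Neon is unavailable).
    for (; i < len; ++i) {
        result += a[i] * b[i];
    }
    return result;
}

// Timestamp used as the start of a measurement.
Clock::time_point now() { return Clock::now(); }

// Milliseconds elapsed since start.
double msElapsedTime(Clock::time_point start) {
    return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
}
```

With rampLength = 1024 and ramps starting at 0 and 100, both dot product functions should return 409767424, so the two Result lines in resultsString should match; only the elapsed times should differ.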

The MainActivity.stringFromJNI method proceeds as follows:

1. Create two equal-length vectors using the generateRamp method.

2. Calculate the dot product of those vectors using the non-Neon method dotProduct. Repeat this calculation 10,000 times (the trials constant) and measure the total computation time using msElapsedTime.

3. Repeat the measurement from Step 2 on the same two vectors, but now using the Neon-enabled method dotProductNeon.

4. Combine the results of those two methods, along with their computation times, into resultsString. This string is returned to the Java side and displayed in the TextView.

To build and run the preceding code successfully, you need an Armv7-A or Armv8-A device. The following image shows the improvements that Neon intrinsics can bring to an application.

(Image: Neon off)

In this example, using Neon intrinsics reduced the elapsed time by seven percent. A theoretical improvement of 25 percent could be achieved on Arm64 (AArch64) devices.
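The text does not state exactly how the percentage is computed. Assuming it is the relative reduction in elapsed time, (elapsedTime - elapsedTimeNeon) / elapsedTime × 100, a small hypothetical helper that turns the two measured times into that figure could look like this:

```cpp
// Hypothetical helper: relative reduction in elapsed time, in percent.
// Assumes elapsedMs > 0, that is, the scalar run took measurable time.
double improvementPercent(double elapsedMs, double elapsedNeonMs) {
    return 100.0 * (elapsedMs - elapsedNeonMs) / elapsedMs;
}
```

For example, timings of 100 ms and 93 ms give 7 percent. By this measure, a 25 percent improvement means the Neon loop takes three quarters of the scalar time, which is roughly a 1.33x speedup.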
