Dynamic Routing of Machine Learning Workloads with CNCF wasmCloud

Deploying machine learning (ML) to production is notoriously difficult because models, engines, platforms, and networks all vary. How can we deploy distributed ML in production across dissimilar devices from edge to cloud, make optimal use of available resources, and support practical considerations like blue/green testing, privacy preservation, and live updates?

In this talk, learn how to meet these challenges with WebAssembly and Cloud Native Computing Foundation (CNCF) wasmCloud, the distributed WebAssembly platform for portable business logic. Discover how you can use the open source machine learning capability provider with the open WASI-NN API to deploy a common code base, for use with inference engines like TensorFlow or ONNX, on embedded devices, LAN workstations, and the cloud. We will discuss how inference models can be dynamically and securely updated in the field, and examine design decisions that have a direct impact on privacy, latency, throughput, and model accuracy.

The examples presented in the talk will be open source and available for attendees to use immediately. In these examples, the machine learning models are moved seamlessly and used across a distributed edge: from a variety of lightweight ARM-based single-board computers, to a developer's ARM-based laptop, and finally to a large ARM-based system hosted by a public cloud provider. We will discuss several architectural strategies for using distributed ML models in collaboration, along with some guiding principles for architecting such models.