Watch the talk
Check the slides
Following the famous Netflix Culture, which encourages "Freedom and Responsibility," I use this talk to demonstrate how data scientists can use PipelineAI to safely deploy their ML/AI pipelines into production using live data.
Using live demos, I will show how to train, optimize, profile, deploy, and monitor high-performance, distributed TensorFlow AI models in production with Docker, Kubernetes, and GPUs.
I then optimize our TensorFlow models using various training-time optimization techniques, including TensorFlow's Accelerated Linear Algebra (XLA) framework and its just-in-time (JIT) compiler, which fuse operators such as those in dropout and batch normalization into fewer, faster kernels.
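As a minimal sketch of the XLA JIT step (the model and input shapes below are my own illustration, not the talk's model), TensorFlow 2.x can enable XLA compilation per function:

```python
import tensorflow as tf

# Illustrative model with the layer types mentioned in the talk
# (batch normalization and dropout); sizes are placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10),
])

# jit_compile=True asks XLA to compile the traced graph, fusing
# elementwise ops (e.g. batch-norm scale/shift) into fewer kernels.
@tf.function(jit_compile=True)
def predict(x):
    return model(x, training=False)

print(predict(tf.ones([8, 32])).shape)
```

The same flag can also be set globally with `tf.config.optimizer.set_jit(True)`, which applies XLA opportunistically across the whole graph rather than per function.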
Next, I discuss some post-training model optimization techniques including TensorFlow's Graph Transform Tool for weight quantization, batch normalization folding, and layer fusing.
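A hedged sketch of that post-training step: the Graph Transform Tool is driven from the command line against a frozen graph. The file names and input/output node names below are placeholders; the transform names (`fold_batch_norms`, `fold_old_batch_norms`, `quantize_weights`) are real transforms from the tool's catalog.

```shell
# Run from a TensorFlow 1.x source checkout; paths are placeholders.
bazel run tensorflow/tools/graph_transforms:transform_graph -- \
  --in_graph=frozen_graph.pb \
  --out_graph=optimized_graph.pb \
  --inputs='input' \
  --outputs='softmax' \
  --transforms='
    fold_batch_norms
    fold_old_batch_norms
    quantize_weights'
```

Folding batch normalization multiplies the learned scale and shift into the preceding convolution's weights, and weight quantization stores weights as 8-bit integers, shrinking the model file before it ever reaches the serving runtime.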
Last, I will demonstrate and compare various GPU-based TensorFlow model-serving runtimes, including TensorFlow Serving, TensorFlow Lite, and NVIDIA's GPU-optimized TensorRT runtime.
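One way to compare serving runtimes side by side is to send the same prediction request to each. As a minimal sketch (the model name, port, and input vector are placeholders, not from the talk), TensorFlow Serving's REST predict endpoint accepts a JSON body in row format:

```python
import json

# Placeholder model name and input batch; adjust to your deployment.
MODEL_NAME = "mnist"
payload = {
    # "instances" is TF Serving's row-format predict payload:
    # a list of input rows, one per example in the batch.
    "instances": [[0.0] * 784],
}
url = f"http://localhost:8501/v1/models/{MODEL_NAME}:predict"
body = json.dumps(payload)

print(url)
# POST `body` to `url` with any HTTP client, e.g.:
#   curl -d '{"instances": [[...]]}' \
#     http://localhost:8501/v1/models/mnist:predict
```

Timing identical requests against TensorFlow Serving, TensorFlow Lite, and TensorRT endpoints gives an apples-to-apples latency and throughput comparison.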