Topics:
gpu
pipeline.ai
machine learning
artificial intelligence
intelligent infrastructure
science ops
ai ops
tensorflow
streaming
model training
model serving
inference
predictive analytics
Following the famous Netflix culture of "Freedom and Responsibility," I use this talk to demonstrate how data scientists can use PipelineAI to safely deploy their ML/AI pipelines into production using live data.
Using live demos, I will show how to train, optimize, profile, deploy, and monitor high-performance, distributed TensorFlow AI models in production with Docker, Kubernetes, and GPUs.
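To make the training step concrete, here is a minimal sketch of distributed TensorFlow training, assuming the TF 2.x Keras API and an MNIST-style model (both are placeholders of mine, not from the talk); on a Kubernetes cluster you would swap MirroredStrategy for MultiWorkerMirroredStrategy:

```python
import tensorflow as tf

# MirroredStrategy replicates the model across all visible local GPUs
# (it falls back to a single replica on CPU-only machines).
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# Build and compile inside the strategy scope so the model's variables
# are mirrored across replicas.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# Placeholder data: flattened MNIST digits.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# Each global batch is split evenly across the replicas.
model.fit(x_train, y_train, batch_size=256, epochs=1)
```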
I then optimize the TensorFlow models using various training-time optimization techniques, such as TensorFlow's Accelerated Linear Algebra (XLA) framework and JIT compiler for operator fusing, dropout, and batch normalization.
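As a hedged illustration of the XLA/JIT idea, the sketch below uses the TF 2.x `jit_compile` flag (the talk's TF 1.x era enabled XLA through `ConfigProto` instead); the function name and tensor shapes are placeholders of mine:

```python
import tensorflow as tf

# jit_compile=True asks XLA to compile the function, fusing the matmul,
# bias add, and ReLU into fewer GPU kernels than eager execution would launch.
@tf.function(jit_compile=True)
def fused_dense(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([64, 256])
w = tf.random.normal([256, 128])
b = tf.zeros([128])
print(fused_dense(x, w, b).shape)  # (64, 128)
```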
Next, I discuss post-training model-optimization techniques, including TensorFlow's Graph Transform Tool for weight quantization, batch-normalization folding, and layer fusing.
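For reference, a minimal sketch of the Graph Transform Tool's Python wrapper (TF 1.x only; the file paths and the "input"/"output" node names are placeholders for your own frozen graph):

```python
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

# Load a frozen GraphDef (path is a placeholder).
with tf.gfile.GFile("frozen_model.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Apply the post-training transforms mentioned above.
transforms = [
    "strip_unused_nodes",
    "fold_constants(ignore_errors=true)",
    "fold_batch_norms",   # batch-normalization folding
    "quantize_weights",   # 8-bit weight quantization
]
optimized_def = TransformGraph(graph_def, ["input"], ["output"], transforms)

with tf.gfile.GFile("optimized_model.pb", "wb") as f:
    f.write(optimized_def.SerializeToString())
```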
Finally, I will demonstrate and compare several GPU-based TensorFlow model-serving runtimes, including TensorFlow Serving, TensorFlow Lite, and NVIDIA's GPU-optimized TensorRT runtime.
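As a small taste of the serving demos, here is a sketch of calling a TensorFlow Serving container over its REST API; the model name `my_model`, the port, and the input shape are placeholders, and the server is assumed to have been started with something like `docker run -p 8501:8501 -e MODEL_NAME=my_model tensorflow/serving`:

```python
import json
import requests

# TensorFlow Serving's REST predict endpoint: /v1/models/<name>:predict
payload = {"instances": [[1.0, 2.0, 5.0]]}  # placeholder input row
response = requests.post(
    "http://localhost:8501/v1/models/my_model:predict",
    data=json.dumps(payload),
)
print(response.json())  # e.g. {"predictions": [...]}
```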
Watch the talk
Check the slides