How to statically quantize a PyTorch model (Eager mode)

Oscar Savolainen · 1 year ago

If you need help with anything quantization- or ML-related (e.g. debugging code), feel free to book a 30-minute consultation session: https://calendly.com/oscar-savolainen I'm also available for long-term freelance work, e.g. training/productionizing models, teaching AI concepts, etc.

*Video Summary:*
In this video, we go over the theory of how to statically quantize a PyTorch model in Eager mode. A minimal end-to-end code sketch of the workflow follows the links below.

*Timestamps:*
00:00 Intro
03:05 Required Architecture Changes (QuantStubs / DeQuantStubs / FloatFunctionals)
08:54 Fusing modules
12:18 Assignment of QConfigs (the recipe for how each module is quantized)
15:26 Preparing the model for quantization (i.e. making the model fake-quantizable)
20:25 Converting the model to a "true" quantized int8 model
23:06 Conclusion

For more background on what it means to quantize a tensor, see: https://www.youtube.com/watch?v=rzMs-wKQU_U&feature=youtu.be

*Links (PyTorch documentation):*
- Quant/DeQuant stub definition: https://pytorch.org/docs/stable/_modules/torch/ao/quantization/stubs.html
- FloatFunctionals definition: https://pytorch.org/docs/stable/generated/torch.ao.nn.quantized.FloatFunctional.html
- QConfig: https://pytorch.org/docs/stable/generated/torch.ao.quantization.qconfig.QConfig.html
- `prepare_qat`: https://pytorch.org/docs/stable/generated/torch.ao.quantization.prepare_qat.html
- Converting the model: https://pytorch.org/docs/stable/generated/torch.ao.quantization.convert.html
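
Below is a minimal sketch of the steps covered in the video, using the `torch.ao.quantization` Eager-mode APIs linked above. The toy model, layer names, and input shapes are illustrative assumptions, not taken from the video:

```python
import torch
import torch.nn as nn
from torch.ao.nn.quantized import FloatFunctional
from torch.ao.quantization import (
    DeQuantStub,
    QuantStub,
    convert,
    fuse_modules,
    get_default_qat_qconfig,
    prepare_qat,
)


class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Architecture changes: QuantStub/DeQuantStub mark where tensors
        # enter and leave the quantized domain.
        self.quant = QuantStub()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(8)
        self.relu1 = nn.ReLU()
        self.conv2 = nn.Conv2d(8, 8, 3, padding=1)
        # FloatFunctional replaces a bare "+" (e.g. in a skip connection),
        # so the add gets its own observer and quantization parameters.
        self.skip_add = FloatFunctional()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu1(self.bn1(self.conv1(x)))
        y = self.conv2(x)
        x = self.skip_add.add(x, y)  # instead of x + y
        return self.dequant(x)


model = ToyModel()
torch.backends.quantized.engine = "fbgemm"  # x86 backend

# Step 1: fuse conv + bn + relu so they become a single quantized op.
# fuse_modules expects the model in eval mode.
model.eval()
model = fuse_modules(model, [["conv1", "bn1", "relu1"]])

# Step 2: assign a QConfig, the recipe for how each module's weights
# and activations are (fake-)quantized.
model.qconfig = get_default_qat_qconfig("fbgemm")

# Step 3: prepare, i.e. insert observers/fake-quant modules, making the
# model fake-quantizable. prepare_qat expects train mode.
model.train()
model = prepare_qat(model)

# QAT fine-tuning would happen here; at minimum, a few forward passes
# are needed so the observers can collect quantization statistics.
with torch.no_grad():
    for _ in range(8):
        model(torch.randn(1, 3, 32, 32))

# Step 4: convert, swapping fake-quantized modules for "true" int8 kernels.
model.eval()
quantized_model = convert(model)
print(quantized_model(torch.randn(1, 3, 32, 32)).shape)
```

Note the ordering: fusing must happen before `prepare_qat`, so that the fused `conv+bn+relu` block is observed (and later converted) as a single op rather than three separately quantized ones.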
