Overview
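Quantization stores weights (and optionally activations) in a lower-precision format, such as int8 instead of float32. This reduces model size and usually speeds up inference, but the rounding it introduces can lower accuracy.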
Key Considerations
- Types of quantization: post-training quantization (PTQ) and quantization-aware training (QAT)
- Trade-offs between speed and accuracy
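The size and accuracy sides of the trade-off can be made concrete with a small sketch. It assumes symmetric, per-tensor int8 post-training quantization applied to random stand-in weights; the helper names quantize_int8 and dequantize are hypothetical and not taken from any library. It does not measure speed, which depends on hardware and kernel support.

```python
import random

def quantize_int8(values):
    """Symmetric per-tensor int8 quantization (hypothetical helper).

    Returns integer codes in [-127, 127] and the scale used to map back.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]

random.seed(0)
# Stand-in weights; a real model's weights would be loaded here instead.
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Storage cost: 4 bytes per float32 value versus 1 byte per int8 code.
float32_bytes = 4 * len(weights)
int8_bytes = 1 * len(codes)

# Rounding to the nearest code bounds the per-value error by scale / 2.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"size: {float32_bytes} B -> {int8_bytes} B")
print(f"max abs error: {max_error:.5f} (bound: {scale / 2:.5f})")
```

The 4x size reduction is fixed by the data types, while the error depends on the value range: a single large outlier inflates the scale and coarsens every other value, which is one reason per-channel scales are often preferred in practice.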
Implementation Steps
- Choose quantization method
- Evaluate retrieval quality before and after quantization
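One way to run the evaluation step is to compare the top-k results from quantized vectors against those from the original float vectors. The sketch below makes several assumptions: the embeddings are random stand-ins, only the document vectors are quantized (with one shared int8 scale), scoring is a plain dot product, and "recall" means overlap with the float baseline rather than with labelled relevance judgments. All function names are hypothetical.

```python
import random

def quantize_int8(vec, scale):
    """Quantize one vector to int8 codes using a shared scale."""
    return [max(-127, min(127, round(x / scale))) for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k(query, docs, k):
    """Indices of the k documents with the highest dot-product score."""
    scores = [(dot(query, d), i) for i, d in enumerate(docs)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

def recall_at_k(reference, candidate):
    """Fraction of the reference top-k that the candidate top-k recovers."""
    return len(set(reference) & set(candidate)) / len(reference)

random.seed(0)
dim, n_docs, n_queries, k = 32, 200, 20, 10
# Stand-in embeddings; real ones would come from the model under test.
docs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_docs)]
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_queries)]

# One scale for the whole corpus, so code-space scores keep the same ranking
# as scores computed on the dequantized vectors.
scale = max(abs(x) for d in docs for x in d) / 127.0
docs_q = [quantize_int8(d, scale) for d in docs]

recalls = []
for q in queries:
    baseline = top_k(q, docs, k)     # before quantization
    quantized = top_k(q, docs_q, k)  # after quantization
    recalls.append(recall_at_k(baseline, quantized))

mean_recall = sum(recalls) / len(recalls)
print(f"mean recall@{k} vs float baseline: {mean_recall:.3f}")
```

If labelled relevance data is available, computing the same metric against those labels for both the float and the quantized index gives a more direct picture of the quality change.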