Overview
Batching and streaming are two ways of processing tokens in NLP applications, and they trade cost against latency in opposite directions.
Batching
- Processes many tokens in one pass, amortizing fixed overhead (model invocation, kernel launch, request setup) across the batch and lowering per-token cost.
- Adds latency, since tokens wait in a queue until the batch fills or a timeout fires.
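The queue-then-flush behavior above can be sketched in Python. This is a minimal illustration, not a specific library's API: `process_batch` is a hypothetical model call standing in for whatever does the real per-batch work, and the batch-size and wait-time parameters are made-up names.

```python
import time

def process_batch(tokens):
    # Hypothetical model call: one invocation handles the whole batch,
    # so fixed overhead is paid once rather than once per token.
    return [tok.upper() for tok in tokens]

def batched_consumer(stream, max_batch=4, max_wait_s=0.05):
    """Queue incoming tokens; flush when the batch is full or the wait expires."""
    batch, deadline, results = [], 0.0, []
    for tok in stream:
        if not batch:
            # Start the latency clock when the first token is queued.
            deadline = time.monotonic() + max_wait_s
        batch.append(tok)
        if len(batch) >= max_batch or time.monotonic() >= deadline:
            results.extend(process_batch(batch))
            batch = []
    if batch:
        # Flush any remainder at end of stream.
        results.extend(process_batch(batch))
    return results
```

The timeout bounds the queuing latency: a token never waits longer than `max_wait_s`, even if the batch is not full.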
Streaming
- Processes each token as it arrives, minimizing latency (in particular, time to first token).
- Tends to cost more per token, since fixed overhead is paid on every small unit of work instead of being amortized across a batch.
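For contrast, a streaming consumer can be sketched as a generator that emits each result immediately. As above, `process_token` is a hypothetical per-token model step, not a real API.

```python
def process_token(tok):
    # Hypothetical per-token model step: the fixed overhead of a call
    # is incurred here on every single token.
    return tok.upper()

def streaming_consumer(stream):
    """Yield each processed token as soon as it arrives: no queuing delay."""
    for tok in stream:
        yield process_token(tok)
```

Because results are yielded one at a time, a downstream consumer sees the first output after a single token's worth of work, at the price of paying the per-call overhead on every token.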