Tutorials

Long-form guides optimized for engineers shipping GenAI features responsibly.

15 min read

Optimizing Golden-Set Design for RAG in Healthcare Applications

This tutorial covers the design of golden sets for ensuring RAG (Retrieval-Augmented Generation) faithfulness in healthcare applications. It requires an understanding of RAG principles and access to domain-specific datasets.

Updated today

RAG, Healthcare, Golden Set, Data Annotation

10 min read

Multimodal Prompts for Document QA in Legal Settings

Using multimodal prompts can improve document question answering (QA) in legal contexts. Prerequisites include access to relevant legal documents and a model capable of processing multimodal inputs.

Updated today

multimodal prompts, document QA, legal settings, AI

11 min read

Reducing Hallucinations with Citation Constraints in Research Models

Implementing citation constraints can significantly reduce hallucinations in research-oriented models. Prerequisites include a robust database of citations and a model capable of handling constraints.

Updated today

hallucinations, citation constraints, research models, AI reliability

9 min read

Latency Budgets for Streaming Chat UX in Customer Support

Establishing latency budgets can enhance the user experience in customer support chat applications. Prerequisites include understanding user expectations and system capabilities.

Updated today

latency budgets, customer support, chat applications, user experience

12 min read

Embedding Drift Monitoring in Financial Services

Monitoring embedding drift is crucial for financial services to ensure model accuracy over time. Prerequisites include a data pipeline that captures embeddings and a monitoring framework.

Updated today

embedding drift, financial services, monitoring, machine learning

10 min read

Shadow Traffic for Safe Model Rollouts in E-commerce

Implementing shadow traffic allows e-commerce platforms to test new models against live traffic without affecting user experience. Prerequisites include a robust logging mechanism and a dual model setup.

Updated today

shadow traffic, model rollout, e-commerce, A/B testing

10 min read

Offline vs Online Evaluation Frequency

This tutorial explores the differences between offline and online evaluation methods for machine learning models, focusing on their respective benefits and drawbacks. Prerequisites include a basic understanding of machine learning evaluation metrics and experience with model deployment.

Updated today

evaluation, machine learning, offline, online

20 min read

Pgvector Index Tuning (HNSW vs IVF)

Learn how to tune pgvector indexes using HNSW and IVF algorithms for optimal performance. Prerequisites include familiarity with PostgreSQL and vector databases.

Updated today

pgvector, index tuning, HNSW, IVF

11 min read

Multimodal Prompts for Document QA

Explore how to create effective multimodal prompts for document question answering (QA) systems. Prerequisites include an understanding of multimodal models and QA frameworks.

Updated today

multimodal, document QA, prompt engineering

18 min read

SLI/SLO for Generative Endpoints

Establishing Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for generative endpoints is crucial for maintaining quality and reliability. This tutorial outlines how to define and implement SLIs/SLOs effectively.

Updated today

SLI, SLO, generative models

12 min read

Offline vs Online Eval Frequency

This tutorial discusses the trade-offs between offline and online evaluation frequencies for machine learning models, focusing on their impact on model performance and user experience.

Updated today

evaluation, offline, online

18 min read

Planner–Executor Loops and Failure Recovery

This tutorial explains the planner-executor loop in AI systems and how to implement effective failure recovery strategies. Prerequisites include knowledge of AI planning algorithms and system design.

Updated today

AI, system design, failure recovery

20 min read

Agent Memory: Scratchpad vs Vector Store

This tutorial compares scratchpad memory and vector store memory in AI agents, focusing on their use cases and performance characteristics. Prerequisites include a basic understanding of AI memory architectures.

Updated today

AI, memory architecture, performance comparison

15 min read

Runbooks When Quality Regresses Overnight

This tutorial outlines how to create effective runbooks to address overnight quality regressions in software systems. Prerequisites include familiarity with incident management and basic scripting skills.

Updated today

runbooks, incident management, quality assurance

16 min read

Canary Prompts for Regression Detection

This tutorial shows how to use canary prompts to detect regressions in language models. Prerequisites include familiarity with regression testing and LLM evaluation metrics.

Updated today

regression testing, canary prompts, LLM

22 min read

Prompt Injection Defenses in Multi-Tenant Apps

This tutorial develops strategies for protecting multi-tenant applications from prompt injection attacks. Prerequisites include an understanding of security vulnerabilities and multi-tenant architecture.

Updated today

security, multi-tenant, prompt injection

18 min read

Observability: Traces for LLM + Tool Spans

This tutorial covers observability practices for tracing interactions between large language models (LLMs) and external tools. Prerequisites include knowledge of observability tools and LLM architectures.

Updated today

observability, LLM, tracing

20 min read

Sandboxing Tools with Least Privilege

This tutorial covers sandboxing techniques that restrict each tool to least-privilege access. Prerequisites include familiarity with security protocols and system architecture.

Updated today

sandboxing, security, least privilege

15 min read

Human-in-the-Loop for High-Stakes Actions

This tutorial covers integrating human oversight into automated systems to ensure accuracy and accountability in critical scenarios. Prerequisites include an understanding of automation frameworks and risk management principles.

Updated today

human-in-the-loop, automation, risk management

20 min read

Graph RAG for Entity-Heavy Domains

Explore the use of graph-based Retrieval-Augmented Generation (Graph RAG) for domains with complex entities. Prerequisites include knowledge of graph databases and RAG techniques.

Updated today

RAG, graph databases, entity recognition

10 min read

Hybrid Search: BM25 + Dense Re-Ranking

This tutorial explores the integration of BM25 and dense re-ranking techniques to enhance search accuracy. Prerequisites include familiarity with information retrieval concepts and basic machine learning.

Updated today

search, BM25, dense re-ranking

6 min read

PII Handling in Retrieval Pipelines

Effective handling of Personally Identifiable Information (PII) is essential in retrieval systems to ensure compliance and user trust. Prerequisites include knowledge of data privacy regulations and retrieval system architecture.

Updated today

data privacy, retrieval systems, PII compliance

6 min read

Cost Controls: Batching vs Streaming Tokens

Understanding the trade-offs between batching and streaming token processing can optimize costs in NLP applications. Prerequisites include familiarity with tokenization and processing pipelines.

Updated today

cost optimization, NLP, token processing

18 min read

Golden-Set Design for RAG Faithfulness

Understand how to design a golden set for evaluating the faithfulness of Retrieval-Augmented Generation (RAG) models. Prerequisites include familiarity with RAG systems and evaluation metrics.

Updated today

RAG, evaluation, benchmarking
