
AI Basics - LLM Output Controls: Temperature, Top-p & Top-k

[Last Updated: Jan 6, 2026]

Learn how to control AI responses using three key settings that control creativity and predictability.

1. Temperature: The Creativity Dial

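A setting that controls how creative vs. predictable the AI's responses are. It reshapes the probabilities the model uses to pick each next word: lower values concentrate probability on the most likely words, while higher values spread it toward less likely ones.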

Low Temperature (0.1 - 0.3)

  • More focused and consistent responses
  • Less creative, more predictable outputs
  • Gives similar answers to similar questions
  • Best for: Code generation, data analysis, factual Q&A, technical documentation

High Temperature (0.7 - 0.9)

  • More creative and varied responses
  • Sometimes unpredictable but interesting
  • Generates different responses each time
  • Best for: Creative writing, brainstorming, marketing copy, storytelling

Simple Analogy

Think of temperature like a chef following a recipe:

  • Low temperature (0.2): Chef follows the recipe exactly every time
  • Medium temperature (0.7): Chef follows the recipe but adds personal touches
  • High temperature (0.9): Chef experiments freely with ingredients
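Under the hood, the model gives every candidate next word a raw score (a "logit"), and temperature divides those scores before they are turned into probabilities. The sketch below is a toy illustration, not any provider's actual code; the four words and their scores are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        # This toy version rejects 0; real APIs often treat 0 as near-greedy decoding
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for four candidate next words
words = ["the", "a", "one", "banana"]
logits = [2.0, 1.0, 0.5, -1.0]

for t in (0.2, 0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, {w: round(p, 3) for w, p in zip(words, probs)})
```

At 0.2, almost all of the probability lands on "the"; at 1.5, the less likely words, even "banana", get a real share. The ranking of the words never changes, only how strongly the top choice dominates.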

2. Top-p: The Quality Filter

Also called "nucleus sampling", this setting samples only from the smallest set of most likely words (strictly speaking, tokens) whose probabilities add up to at least p, filtering out low-probability options.

How Top-p Works

  • Top-p = 0.9: "Consider only the most probable words that together cover 90% of the probability"
  • Top-p = 0.5: "Consider only the most probable words that together cover 50% of the probability"
  • Lower values make responses more focused and consistent
  • Higher values allow more variety while maintaining quality
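Note that top_p=0.9 does not mean "90% of the words": it means keep adding the most probable words until their probabilities sum to at least 90%. A minimal sketch of that filter, using a made-up four-word distribution:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose probabilities sum to >= p,
    then renormalize so the kept probabilities add up to 1."""
    # Token indices sorted from most to least probable
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

words = ["the", "a", "one", "banana"]
probs = [0.6, 0.25, 0.1, 0.05]  # hypothetical next-word probabilities

for p in (0.5, 0.9):
    kept = top_p_filter(probs, p)
    print(p, {words[i]: round(q, 3) for i, q in kept.items()})
```

With top_p=0.5, "the" alone already covers 60% of the probability, so it is the only candidate. With top_p=0.9, three of the four words survive and only "banana" is cut.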

When to Use Top-p

  • Use low top-p (0.1-0.3) for technical content where accuracy is critical
  • Use medium top-p (0.7-0.9) for general chat and creative tasks
  • Often used together with temperature for fine-tuned control

3. Top-k: The Choice Limiter

Limits the selection pool to only the top 'k' most likely words, providing strict control over response variety.

How Top-k Works

  • Top-k = 40: "Only consider the 40 most probable words"
  • Top-k = 10: "Only consider the 10 most probable words"
  • Lower values create more predictable, repetitive responses
  • Higher values allow more variety within a controlled range
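A toy version of the top-k filter, again with hypothetical probabilities. Unlike top-p, it keeps exactly the k highest-ranked words, no matter how much probability they cover:

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens and renormalize their probabilities."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Token indices sorted from most to least probable, truncated to k
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = order[:k]
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

words = ["the", "a", "one", "banana"]
probs = [0.6, 0.25, 0.1, 0.05]  # hypothetical next-word probabilities

for k in (1, 2, 4):
    kept = top_k_filter(probs, k)
    print(k, {words[i]: round(q, 3) for i, q in kept.items()})
```

If k is larger than the number of candidates, nothing is filtered out.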

Top-p vs. Top-k: Key Differences

  • Top-p selects by cumulative probability (dynamic range: fewer candidates when the model is confident, more when it is unsure)
  • Top-k selects by fixed number of options (static range)
  • Use Top-p when you want consistent quality across different inputs
  • Use Top-k when you need strict, predictable limits on choices
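The dynamic vs. static difference is easiest to see by counting how many candidates each method keeps. Below are two made-up 10-word distributions, one where the model is confident and one where it is unsure; the helper names are hypothetical.

```python
def count_top_p(probs, p):
    """Number of tokens nucleus (top-p) sampling keeps for this distribution."""
    cumulative = 0.0
    for n, q in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += q
        if cumulative >= p:
            return n
    return len(probs)

def count_top_k(probs, k):
    """Number of tokens top-k sampling keeps: always k, capped at the vocabulary size."""
    return min(k, len(probs))

# Two hypothetical next-word distributions over 10 words
confident = [0.92, 0.03, 0.02, 0.01, 0.01, 0.005, 0.003, 0.001, 0.0005, 0.0005]
uncertain = [0.15, 0.14, 0.13, 0.12, 0.11, 0.10, 0.09, 0.08, 0.05, 0.03]

for name, probs in (("confident", confident), ("uncertain", uncertain)):
    print(name, "| top_p=0.9 keeps", count_top_p(probs, 0.9),
          "| top_k=3 keeps", count_top_k(probs, 3))
```

Here top_p=0.9 keeps 1 word in the confident case and 8 in the uncertain one, while top_k=3 keeps 3 in both. That adaptivity is why top-p behaves more consistently across different inputs, and why top-k gives a fixed, predictable cap.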

Practical Implementation

In Developer APIs

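When using LLMs programmatically, you set these values in your code. Which ones are available depends on the provider: OpenAI's Chat Completions API, for example, accepts temperature and top_p but has no top_k parameter.

from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a Python function"}],
    temperature=0.2,    # Controls overall creativity
    top_p=0.1,          # Filters low-probability words
    max_tokens=1000,
    # top_k is not a Chat Completions parameter. OpenAI-compatible servers that
    # accept it (e.g. vLLM) can receive it via: extra_body={"top_k": 40}
)
print(response.choices[0].message.content)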

In Web Chat Interfaces

  • Most web chat interfaces, such as ChatGPT and DeepSeek Chat, don't expose these controls to end users
  • Platforms use optimized default settings for general use
  • To access these controls, you need to use the platform's API programmatically

In Local Model Tools

  • Tools like Ollama and LM Studio give you direct control over these sampling settings
  • You can adjust temperature, top-p, and top-k through their APIs or interfaces
  • Ideal for experimenting and tuning settings for specific use cases
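For example, Ollama's local REST API accepts these settings inside an "options" object. A minimal sketch using only Python's standard library, assuming Ollama is running on its default port (11434) and that a model named llama3.2 has already been pulled (swap in whichever model you use); the helper names build_ollama_request and generate are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_ollama_request(prompt, model="llama3.2", temperature=0.2, top_p=0.1, top_k=20):
    """Build the JSON body for Ollama's /api/generate endpoint.
    Sampling settings go inside the "options" object."""
    return {
        "model": model,     # assumption: this model is already pulled locally
        "prompt": prompt,
        "stream": False,    # ask for a single JSON response instead of a stream
        "options": {"temperature": temperature, "top_p": top_p, "top_k": top_k},
    }

def generate(prompt, **settings):
    """Send the request to a locally running Ollama server and return the reply text."""
    body = json.dumps(build_ollama_request(prompt, **settings)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, generate("Write a Python function", temperature=0.8, top_p=0.9, top_k=60) returns the model's reply using the creative-writing values from the guide below.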

Quick Developer Guide

Recommended Starting Points

  • For code generation: temperature=0.2, top_p=0.1, top_k=20
  • For creative writing: temperature=0.8, top_p=0.9, top_k=60
  • For balanced chat: temperature=0.7, top_p=0.8, top_k=40
  • For factual Q&A: temperature=0.1, top_p=0.1, top_k=10
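If you switch between these tasks in code, it can help to keep the presets in one place. A small sketch; PRESETS and sampling_settings are hypothetical names, and the supports_top_k flag covers providers, such as OpenAI's Chat Completions API, that don't accept top_k.

```python
# The starting points listed above, as a lookup table
PRESETS = {
    "code":     {"temperature": 0.2, "top_p": 0.1, "top_k": 20},
    "creative": {"temperature": 0.8, "top_p": 0.9, "top_k": 60},
    "chat":     {"temperature": 0.7, "top_p": 0.8, "top_k": 40},
    "factual":  {"temperature": 0.1, "top_p": 0.1, "top_k": 10},
}

def sampling_settings(task, supports_top_k=True, **overrides):
    """Return a fresh copy of a preset, applying any per-call overrides.
    Drops top_k for providers whose API does not accept it."""
    settings = {**PRESETS[task], **overrides}
    if not supports_top_k:
        settings.pop("top_k", None)
    return settings
```

For example, passing **sampling_settings("code", supports_top_k=False) to client.chat.completions.create sends only temperature=0.2 and top_p=0.1.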

Best Practices

  • Start with temperature only - it's the most impactful setting
  • Add top-p if you need more consistent quality control
  • Use top-k sparingly - it can make responses too repetitive
  • Experiment gradually - change one setting at a time to see its effect
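  • Default values differ by provider (OpenAI's API, for example, uses temperature=1 and top_p=1, i.e. no top-p filtering, while Ollama applies top_k=40 by default); for most general purposes, the defaults work well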
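To try the "change one setting at a time" advice without a real model, here is a toy sampler that applies all three settings (temperature, then top-k, then top-p; this is one common ordering and libraries differ) and counts how many different words come out over repeated draws. The logits and function names are made up for illustration.

```python
import math
import random

LOGITS = [2.0, 1.5, 1.0, 0.5, 0.0, -0.5]  # hypothetical scores for six candidate words

def sample_next_word(logits, temperature=1.0, top_k=None, top_p=1.0, rng=random):
    """Toy sampler: temperature -> top-k -> top-p -> weighted random draw."""
    # 1. Temperature: scale the logits, then softmax
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # 2. Top-k: keep the k most likely indices (all of them if top_k is None)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    # 3. Top-p: keep the smallest prefix whose renormalized mass reaches top_p
    mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    # 4. Draw one index; random.choices normalizes the weights itself
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

def distinct_words(n_trials=200, seed=0, **settings):
    """Sample repeatedly with a fixed seed and count how many different words appear."""
    rng = random.Random(seed)
    return len({sample_next_word(LOGITS, rng=rng, **settings) for _ in range(n_trials)})

# Change one setting at a time, keeping the others fixed
for t in (0.1, 0.7, 1.5):
    print("temperature", t, "->", distinct_words(temperature=t), "distinct words")
for p in (0.1, 0.5, 1.0):
    print("temperature 0.7, top_p", p, "->", distinct_words(temperature=0.7, top_p=p))
```

Raising temperature tends to increase the number of distinct words, while a very low top_p or top_k=1 collapses the output to the single most likely word.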

Summary

  • Temperature is your main creativity control - adjust this first
  • Top-p acts as a quality filter for word selection
  • Top-k provides strict limits on response variety
  • You can only control these settings when using APIs or local tools, not in simple web chats
  • Start with conservative values and adjust based on your specific needs

See Also