AI Basics - LLM Output Controls: Temperature, Top-p & Top-k

[Last Updated: Jan 6, 2026]


Learn how to shape AI responses with three key settings that balance creativity against predictability.

1. Temperature: The Creativity Dial

A setting that controls how much randomness the model uses when picking each next word, which determines how creative vs. predictable its responses are.

Low Temperature (0.1 - 0.3)

More focused and consistent responses
Less creative, more predictable outputs
Gives similar answers to similar questions
Best for: Code generation, data analysis, factual Q&A, technical documentation

High Temperature (0.7 - 0.9)

More creative and varied responses
Sometimes unpredictable but interesting
Generates different responses each time
Best for: Creative writing, brainstorming, marketing copy, storytelling

Simple Analogy

Think of temperature like a chef following a recipe:

Low temperature (0.2): Chef follows the recipe exactly every time
Medium temperature (0.5): Chef follows the recipe but adds personal touches
High temperature (0.9): Chef experiments freely with ingredients
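
Behind the analogy, temperature is commonly applied by dividing the model's raw scores (logits) by the temperature before converting them to probabilities. The following is a minimal sketch of that idea, not any particular model's implementation; the function name and the three example scores are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate words
logits = [2.0, 1.0, 0.1]
for temperature in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At low temperature almost all of the probability lands on the top-scoring word, and at higher temperatures it spreads out across the alternatives.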

2. Top-p: The Quality Filter

Also called "nucleus sampling" - this setting restricts the model to the most likely words (technically, tokens) whose combined probability reaches the threshold p, and discards the unlikely remainder.

How Top-p Works

Top-p = 0.9: "Consider only the most probable words that together account for 90% of the probability"
Top-p = 0.5: "Consider only the most probable words that together account for 50% of the probability"
Lower values make responses more focused and consistent
Higher values allow more variety while maintaining quality
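
The filtering step can be sketched in a few lines of Python. This is an illustrative version under simple assumptions: the function name `top_p_filter` and the four-word probability table are hypothetical, and real libraries operate on full vocabularies of tokens rather than a small dictionary.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely words whose probabilities sum to at least p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    # Renormalize so the surviving probabilities sum to 1 again
    return {word: prob / total for word, prob in kept}

# Hypothetical next-word probabilities
probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
print(top_p_filter(probs, 0.9))  # keeps "the", "a", "this"
print(top_p_filter(probs, 0.5))  # keeps only "the"
```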

When to Use Top-p

Use low top-p (0.1-0.3) for technical content where accuracy is critical
Use medium top-p (0.7-0.9) for general chat and creative tasks
Often used together with temperature for fine-tuned control

3. Top-k: The Choice Limiter

Limits the selection pool to only the top 'k' most likely words, providing strict control over response variety.

How Top-k Works

Top-k = 40: "Only consider the 40 most probable words"
Top-k = 10: "Only consider the 10 most probable words"
Lower values create more predictable, repetitive responses
Higher values allow more variety within a controlled range
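
As a rough sketch, top-k filtering amounts to sorting the candidates and cutting the list off after k entries. The function name `top_k_filter` and the example probabilities are assumptions made for illustration.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely words and renormalize their probabilities."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {word: prob / total for word, prob in ranked}

# Hypothetical next-word probabilities
probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
print(top_k_filter(probs, 2))  # keeps "the" and "a"
```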

Top-p vs. Top-k: Key Differences

Top-p selects by cumulative probability, so the number of candidate words changes from one prediction to the next (dynamic range)
Top-k selects by fixed number of options (static range)
Use Top-p when you want consistent quality across different inputs
Use Top-k when you need strict, predictable limits on choices
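
The dynamic vs. static distinction is easiest to see by counting how many candidates each method keeps for two differently shaped distributions. The sketch below uses two invented probability lists and a hypothetical helper, `nucleus_size`.

```python
def nucleus_size(probs, p):
    """Count how many of the most likely words top-p would keep."""
    cumulative = 0.0
    for count, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return count
    return len(probs)

confident = [0.92, 0.04, 0.02, 0.01, 0.01]  # model is nearly sure of the next word
uncertain = [0.22, 0.21, 0.20, 0.19, 0.18]  # many words are about equally plausible

for name, dist in (("confident", confident), ("uncertain", uncertain)):
    print(name,
          "| top-p=0.9 keeps", nucleus_size(dist, 0.9),
          "| top-k=3 keeps", min(3, len(dist)))
```

Top-p keeps one word when the model is confident and all five when it is not, while top-k keeps three in both cases.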

Practical Implementation

In Developer APIs

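When using LLMs programmatically, you set these values in your request. Not every API exposes all three: OpenAI's Chat Completions API, used below, accepts temperature and top_p but has no top_k parameter.

from openai import OpenAI

client = OpenAI()  # Reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a Python function"}],
    temperature=0.2,    # Controls overall creativity
    top_p=0.1,          # Filters low-probability words
    max_tokens=1000,    # Caps the length of the reply
    # top_k is not a parameter of this API, so it is omitted here
)
print(response.choices[0].message.content)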

In Web Chat Interfaces

Most web interfaces like ChatGPT and DeepSeek chat don't expose these controls to end users
Platforms use optimized default settings for general use
To access these controls, you need to use the platform's API programmatically

In Local Model Tools

Tools like Ollama and LM Studio allow full control over all settings
You can adjust temperature, top-p, and top-k through their APIs or interfaces
Ideal for experimentation and for tuning these settings to specific use cases
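
As one example, Ollama's local REST API accepts all three settings in the "options" field of a request. The sketch below assumes an Ollama server on its default address (http://localhost:11434) and a model named "llama3" that has already been pulled; both are assumptions, so substitute your own. The helper names are hypothetical.

```python
import json
import urllib.request

def build_ollama_request(prompt, model="llama3", temperature=0.2, top_p=0.9, top_k=40):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,  # assumed model name; use one you have installed
        "prompt": prompt,
        "stream": False,  # ask for a single JSON reply instead of a stream
        "options": {"temperature": temperature, "top_p": top_p, "top_k": top_k},
    }

def generate(payload, url="http://localhost:11434/api/generate"):
    """Send the request to a locally running Ollama server and return the text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

payload = build_ollama_request("Write a Python function that reverses a string.")
print(json.dumps(payload, indent=2))
# print(generate(payload))  # uncomment when a local Ollama server is running
```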

Quick Developer Guide

Recommended Starting Points

For code generation: temperature=0.2, top_p=0.1, top_k=20
For creative writing: temperature=0.8, top_p=0.9, top_k=60
For balanced chat: temperature=0.7, top_p=0.8, top_k=40
For factual Q&A: temperature=0.1, top_p=0.1, top_k=10
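
One way to keep these starting points in code is a small preset table. This is only a sketch: the names `SAMPLING_PRESETS` and `sampling_params` are hypothetical, and the `supports_top_k` flag is there because some APIs do not accept top_k.

```python
# Starting points from the list above; tune them for your own model and task
SAMPLING_PRESETS = {
    "code": {"temperature": 0.2, "top_p": 0.1, "top_k": 20},
    "creative": {"temperature": 0.8, "top_p": 0.9, "top_k": 60},
    "chat": {"temperature": 0.7, "top_p": 0.8, "top_k": 40},
    "factual": {"temperature": 0.1, "top_p": 0.1, "top_k": 10},
}

def sampling_params(task, supports_top_k=True):
    """Return a copy of the preset for a task, dropping top_k if the API lacks it."""
    params = dict(SAMPLING_PRESETS[task])
    if not supports_top_k:
        params.pop("top_k")
    return params

print(sampling_params("code"))
print(sampling_params("creative", supports_top_k=False))
```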

Best Practices

Start with temperature only - it's the most impactful setting
Add top-p if you need more consistent quality control
Use top-k sparingly - it can make responses too repetitive
Experiment gradually - change one setting at a time to see its effect
Default values vary by provider (often a temperature between 0.7 and 1.0, top_p=1.0, and top-k disabled) and work well for most general purposes
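
To experiment with one setting at a time, it can help to have a toy sampler that applies all three in sequence. The sketch below applies temperature first, then top-k, then top-p; that ordering is an assumption for illustration, and real inference libraries may order or implement these steps differently. The function name and example scores are hypothetical.

```python
import math
import random

def sample_word(logits, temperature=1.0, top_k=None, top_p=1.0, rng=random):
    """Pick one word using temperature scaling, then top-k, then top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # 1. Temperature: rescale the scores and convert them to probabilities
    scaled = {word: score / temperature for word, score in logits.items()}
    peak = max(scaled.values())
    weights = {word: math.exp(score - peak) for word, score in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((word, w / total) for word, w in weights.items()),
                    key=lambda item: item[1], reverse=True)
    # 2. Top-k: keep at most k candidates (None disables the limit)
    if top_k is not None:
        ranked = ranked[:top_k]
    # 3. Top-p: keep the smallest prefix whose share of the remaining mass reaches top_p
    mass = sum(prob for _, prob in ranked)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob / mass
        if cumulative >= top_p:
            break
    words = [word for word, _ in kept]
    probs = [prob for _, prob in kept]
    return rng.choices(words, weights=probs, k=1)[0]

# Hypothetical scores for four candidate words
logits = {"the": 2.0, "a": 1.0, "this": 0.5, "zebra": -1.0}
print(sample_word(logits, temperature=0.7))            # temperature only
print(sample_word(logits, temperature=0.7, top_p=0.8)) # add top-p
print(sample_word(logits, temperature=0.7, top_k=1))   # top_k=1 always picks "the"
```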

Summary

Temperature is your main creativity control - adjust this first
Top-p acts as a quality filter for word selection
Top-k provides strict limits on response variety
You can usually control these settings only through APIs or local tools, not in most web chat interfaces
Start with conservative values and adjust based on your specific needs