This article explains the various parameters available in AgentRunner for tuning Large Language Models (LLMs). Adjusting these parameters allows you to fine-tune the behavior of the AI, influencing the length, creativity, and consistency of the responses generated by your agents. This guide covers settings applicable to different LLMs, including OpenAI, Gemini, and Anthropic models, empowering you to optimize agent performance for specific tasks.
Configuring LLM Parameters for Optimal Performance
Understanding and correctly configuring LLM parameters is crucial for achieving the desired behavior from your AgentRunner agents. These parameters control various aspects of the AI's output, such as its length, creativity, and tendency to repeat information. By carefully tuning these settings, you can optimize your agents for specific tasks, ensuring they provide accurate, relevant, and engaging responses. AgentRunner supports a range of parameters across different LLMs, such as OpenAI, Gemini, and Anthropic, allowing for precise customization.
Max Token: Controlling the Length of AI Responses
The Max Token parameter determines the maximum length of the AI-generated response. It is available for OpenAI, Gemini, and Anthropic models.
Function: Sets an upper limit on the number of tokens in the AI's output.
Impact: Lowering the max token value results in shorter responses, while increasing it allows for more verbose and detailed answers. The actual number of words per token varies between models due to differing tokenization methods.
Tokenization: Tokenization processes differ between models, so the same text might be represented by a different number of tokens depending on the model you use.
Model Limits: Each model has its own maximum token limit. If the max token setting exceeds this limit, the quality of the response can suffer, potentially resulting in cut-off endings.
Cost: The more tokens generated, the higher the cost.
To configure the maximum token length:
Locate the "Max Token" setting in the AgentRunner node editor under the Controls tab.
Enter the desired token limit.
Ensure that the chosen value is within the model's capabilities.
Online tools can help you estimate how many tokens a particular piece of text will use for each model. Note: Setting the max token value too high can lead to unexpected costs, while setting it too low may result in incomplete or less informative responses.
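For a quick sanity check before relying on an online tool, a rough rule of thumb is that English text averages about four characters per token in many tokenizers. The sketch below uses that heuristic; the exact count always depends on the model's tokenizer, and the 4096-token budget is a hypothetical example, not a limit of any specific model.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the common ~4-characters-per-token
    heuristic for English text. Exact counts vary by model tokenizer."""
    return max(1, round(len(text) / chars_per_token))

# Sanity-check a prompt before sending it, leaving headroom for the reply.
prompt = "Summarize the quarterly report in three bullet points."
budget = 4096                      # hypothetical model context limit
reply_room = budget - estimate_tokens(prompt)
```

This is only a ballpark figure for budgeting; use a model-specific tokenizer tool when you need exact counts.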
Temperature: Adjusting the "Creativity" of AI Responses
The Temperature parameter, available for OpenAI, Gemini, and Anthropic models, controls the randomness and creativity of the AI's output.
Function: Influences the AI's sampling process, allowing it to output less likely tokens, thus increasing randomness.
Impact:
Higher Temperature (closer to 2): Leads to more creative, unexpected, and potentially less coherent responses. The AI is more likely to take risks and generate novel text.
Lower Temperature (closer to 0): Results in more grounded, predictable, and focused responses. The AI sticks to the most probable tokens, producing text that is consistent and closely tied to the prompt.
Use Cases:
Low Temperature: Ideal for tasks requiring precision and consistency, such as editing structured texts or generating code.
High Temperature: Suitable for creative writing, brainstorming, or scenarios where originality is valued over strict accuracy.
Range: The temperature value typically ranges from 0 to 2, with 1 being the default.
To adjust the temperature:
Locate the "Temperature" setting in the AgentRunner node editor, under the Controls tab.
Move the slider to the desired value between 0 and 2.
Tip: Experiment with different temperature values to find the optimal balance between creativity and accuracy for your specific use case.
TopP and TopK: Refining Token Sampling for Coherence
TopP (Nucleus Sampling): Available for OpenAI, Gemini, and Anthropic models, TopP limits the AI's token choices based on probability percentage.
Function: Chooses from a subset of the most probable tokens whose cumulative probability exceeds a certain threshold.
Impact:
Lower TopP (closer to 0): Makes the output more consistent and predictable by restricting sampling to only the most probable tokens.
Higher TopP (closer to 1): Increases the diversity of the output by allowing the AI to consider a wider range of tokens.
Range: The TopP value ranges from 0 to 1 for each model included in AgentRunner.
TopK: Available for Gemini and Anthropic models, TopK limits the number of sampled tokens.
Function: Limits the AI's choices to the K most likely tokens.
Impact:
Lower TopK: Produces more similar and predictable answers, as the AI only considers the most probable tokens.
Higher TopK: Increases the diversity of the output by allowing the AI to sample from a larger pool of tokens.
Tip: Both TopP and TopK restrict which tokens the AI considers, but they size the sampling pool differently: TopP keeps adding tokens, from most to least probable, until their cumulative probability reaches the threshold, while TopK simply keeps the exact number of tokens you set, starting from the most probable one.
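The two filtering strategies in the tip above can be sketched over a toy probability table. This is an illustrative simplification of what happens inside the model at each sampling step, not any provider's actual implementation:

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens, renormalized to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}

def top_p_filter(probs, p_threshold):
    """Keep the smallest set of tokens whose cumulative probability
    reaches the threshold (nucleus sampling), renormalized."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= p_threshold:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

# Toy next-token distribution.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
```

With these numbers, TopK = 2 and TopP = 0.8 happen to keep the same two tokens ("the" and "a"); raising TopP toward 1 would let "cat" and eventually "zebra" back into the pool, which is the extra diversity described above.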
Penalizing Token Repetition: Presence and Frequency Penalties
LLMs provide two parameters, Presence Penalty and Frequency Penalty, to control token repetition in AI-generated text. These settings help ensure that the output remains diverse and engaging.
Presence Penalty: Discouraging Token Repetition
The Presence Penalty, available for OpenAI and Gemini models, penalizes token repetition based on whether a token has already appeared in the text.
Function: Discourages the AI from using tokens that have already been used, regardless of how frequently they have appeared.
Impact: A higher presence penalty leads to less repetition of tokens.
Range: The presence penalty value typically ranges from 0 to 2.
To configure the presence penalty:
Locate the "Presence Penalty" setting in the AgentRunner node editor, under the Controls tab.
Configure the slider to a value between 0 and 2.
Example: If a presence penalty is set to a high value, the AI will avoid using tokens that have already been used, even if they are highly relevant to the context.
Frequency Penalty: Reducing Token Repetition Based on Frequency
The Frequency Penalty, available for OpenAI and Gemini models, penalizes token repetition based on the number of times a token has appeared in the text.
Function: Reduces the likelihood of using tokens that have already appeared frequently in the output.
Impact: A higher frequency penalty leads to less repetition of frequently used tokens. The more often a token has already appeared, the stronger the penalty applied to it, making further repetition less likely.
Range: The frequency penalty value typically ranges from 0 to 2.
To configure the frequency penalty:
Locate the "Frequency Penalty" setting in the AgentRunner node editor.
Move the slider to the desired value between 0 and 2.
Example: If a frequency penalty is set to a high value, the AI will reduce the use of tokens that have already appeared multiple times in the output, promoting more diverse language.
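The difference between the two penalties is easiest to see in the logit adjustment OpenAI describes in its documentation: the frequency penalty scales with how many times a token has appeared, while the presence penalty is a one-time deduction applied as soon as the token has appeared at all. A minimal sketch of that formula:

```python
def penalized_logit(logit, count, presence_penalty, frequency_penalty):
    """Adjust a token's raw score following the formula in OpenAI's docs:
    subtract frequency_penalty once per prior occurrence, and subtract
    presence_penalty a single time if the token has appeared at all."""
    return (logit
            - frequency_penalty * count
            - presence_penalty * (1 if count > 0 else 0))

# A token seen 3 times loses more under a frequency penalty,
# while a presence penalty treats 1 and 3 occurrences the same.
```

In short: use the presence penalty to push the AI toward new topics, and the frequency penalty to stop it from leaning on the same words over and over.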
Advanced Settings for Enhanced Reasoning and Contextual Awareness
AgentRunner offers advanced settings, such as Reasoning Effort and Search Context Size, to enhance the reasoning capabilities and contextual awareness of specific OpenAI models. These settings can significantly improve the quality and relevance of AI-generated responses.
Reasoning Effort: Enhancing Complex Reasoning in OpenAI Models
The Reasoning Effort setting is available for specific OpenAI models (o1, o1-mini, o1-pro, o3, o3-mini).
Function: Controls the amount of internal "Chain of Thought" (CoT) reasoning the model employs. CoT reasoning involves breaking down complex problems into intermediate steps, leading to more accurate and insightful solutions.
Possible Settings: Low, Medium, or High; the default is Medium.
Impact: Higher reasoning effort generally produces more thorough and accurate answers. The internal reasoning tokens are not displayed, but they are counted as output tokens.
Use Cases: Higher reasoning effort is beneficial for more complex tasks.
Cost: Lower reasoning effort costs less, because the tokens used for reasoning are billed as output tokens.
To adjust the reasoning effort:
Locate the "Reasoning Effort" setting in the AgentRunner node editor.
Select the desired level of reasoning effort (Low, Medium or High).
Tip: Experiment with different reasoning effort levels to find the optimal balance between cost and accuracy for your specific task.
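If your AgentRunner setup ultimately calls the OpenAI API, the setting maps onto the request payload roughly as below. The `reasoning_effort` field name follows OpenAI's chat completions API for o-series models; treat it as an assumption if your backend or API version differs.

```python
# Sketch of a request payload a node like AgentRunner might send
# to the OpenAI API for an o-series reasoning model.
payload = {
    "model": "o3-mini",
    "reasoning_effort": "medium",   # "low" | "medium" | "high"
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
}
```

Note that the reasoning tokens this setting controls never appear in the response body, yet they are billed as output tokens, so the effort level directly affects cost.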
Search Context Size: Integrating Real-Time Web Information
The Search Context Size setting is available for specific OpenAI models (GPT-4o, GPT-4o-mini, GPT-4.1, GPT-4.1-mini, GPT-4.1-nano).
Function: Enables the AI to search the web for up-to-date information and visit links provided by the user.
Possible Settings: Low, Medium, High, or None.
Impact:
Enabling this setting allows the AI to search the web for up-to-date information and to visit links you include in your prompt.
Higher search context allows you to receive richer information, however, it raises latency time and token cost.
Citations: OpenAI's documentation recommends displaying citations as clickable links in the end results shown on your website.
Cost: Search tokens are used only during generation; they do not count toward the model's max token limit, but they are billed as output tokens.
Latency: Higher search context increases the time it takes to generate a response.
To configure the search context size:
Locate the "Search Context Size" setting in the AgentRunner node editor.
Select the desired search context size: Low, Medium, or High. To disable web search, select None by unchecking the checkbox.
Warning: Ensure that any citations displayed on your website are clickable links, as recommended by OpenAI.
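For reference, the setting corresponds roughly to the request payload below. The `web_search_options.search_context_size` field follows OpenAI's web search documentation at the time of writing, and the model name is just a placeholder; both are assumptions to verify against your backend.

```python
# Sketch of a search-enabled request payload; field names follow
# OpenAI's web search API and may differ in other backends.
payload = {
    "model": "gpt-4o-mini",          # placeholder search-capable model
    "web_search_options": {
        "search_context_size": "medium"   # "low" | "medium" | "high"
    },
    "messages": [
        {"role": "user", "content": "What changed in the latest release?"}
    ],
}
```

Remember that a larger context size buys richer results at the price of higher latency and more billed output tokens.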
Summary of LLM Parameter Support
The following table summarizes the availability of LLM parameters across different models:
LLM/Property | Max Token | Temperature | TopP | TopK | Presence Penalty | Frequency Penalty | Reasoning Effort | Web Search |
---|---|---|---|---|---|---|---|---|
OpenAI | ✔ | ✔ | ✔ | ✖ | ✔ | ✔ | ✖ | ✔ |
OpenAI Reasoning | ✖ | ✖ | ✖ | ✖ | ✖ | ✖ | ✔ | ✖ |
Gemini | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✖ | ✖ |
Anthropic | ✔ | ✔ | ✔ | ✔ | ✖ | ✖ | ✖ | ✖ |
By understanding and adjusting these parameters, you can optimize your AgentRunner agents for a wide range of tasks, ensuring they deliver the best possible results.