Developers
Models and Pricing
An overview of our models' capabilities and their associated pricing.
Tools Pricing
Requests that use QJS-provided server-side tools are priced on two components: token usage and server-side tool invocations. Since the agent autonomously decides how many tools to call, costs scale with query complexity.
Token Costs
All standard token types are billed at the rate for the model used in the request:
Input tokens: Your query and conversation history
Reasoning tokens: Agent's internal thinking and planning
Completion tokens: The final response
Image tokens: Visual content analysis (when applicable)
Cached prompt tokens: Prompt tokens that were served from cache rather than recomputed
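As a sketch of how these token types combine into a per-request cost, the snippet below sums each billed type at a per-token rate. The rates and usage numbers are purely illustrative assumptions, not actual QJS pricing:

```python
# Hypothetical per-token rates in USD per 1M tokens -- illustrative only,
# NOT actual QJS pricing.
RATES = {"input": 3.00, "reasoning": 15.00, "completion": 15.00, "cached": 0.30}

def request_cost(usage: dict) -> float:
    """Sum the cost contribution of each billed token type for one request."""
    return sum(usage.get(kind, 0) / 1_000_000 * rate for kind, rate in RATES.items())

usage = {"input": 12_000, "reasoning": 4_000, "completion": 1_500, "cached": 8_000}
print(round(request_cost(usage), 6))  # 0.1209
```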
Batch API Pricing
The Batch API lets you process large volumes of requests asynchronously at 50% of standard pricing — effectively cutting your token costs in half. Batch requests are queued and processed in the background, with most completing within 24 hours.
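The 50% discount applies uniformly to token costs, so estimating a batch job's cost is a single multiplication:

```python
def batch_cost(standard_cost: float) -> float:
    """Batch API requests are billed at 50% of standard pricing."""
    return standard_cost * 0.5

# A workload that would cost $10.00 at standard rates costs $5.00 via Batch.
print(batch_cost(10.0))  # 5.0
```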
Voice Agent API Pricing
The Voice Agent API is a real-time voice conversation offering, billed at a straightforward flat rate of $0.05 per minute of connection time.
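Because billing is by connection time rather than tokens, estimating a session's cost only requires the connection duration:

```python
def voice_cost(connection_minutes: float) -> float:
    """Voice Agent API: flat rate of $0.05 per minute of connection time."""
    return connection_minutes * 0.05

# A 30-minute voice session costs $1.50.
print(voice_cost(30))
```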
Usage Guidelines Violation Fee
If our system deems your request to be in violation of our usage guidelines, we still charge for the tokens generated for that request.
For violations caught before generation in the Responses API, we charge a $0.05 usage guidelines violation fee per request.
Additional Information Regarding Models
No access to real-time events without search tools enabled
QJS has no knowledge of current events or data beyond what was present in its training data.
To incorporate real-time data into your request, enable server-side search tools.
Chat models
No role order limitation: You can mix system, user, or assistant roles in any sequence for your conversation context.
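For example, a conversation payload may interleave the roles freely. The request shape below is a sketch following common chat-completion conventions; the exact field names are not specified by this document:

```python
# Roles may appear in any order, e.g. a system message between user turns.
messages = [
    {"role": "user", "content": "What is a context window?"},
    {"role": "system", "content": "Answer concisely."},
    {"role": "assistant", "content": "The maximum number of prompt tokens a model accepts."},
    {"role": "user", "content": "Thanks!"},
]
```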
Image input models
Maximum image size: 20 MiB
Maximum number of images: No limit
Supported image file types: jpg/jpeg or png
Any image/text input order is accepted (e.g. a text prompt can precede an image prompt).
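A minimal client-side check of the limits above (20 MiB per image, jpg/jpeg/png only) can avoid rejected requests; the function name is illustrative:

```python
MAX_IMAGE_BYTES = 20 * 1024 * 1024  # 20 MiB
ALLOWED_TYPES = {"jpg", "jpeg", "png"}

def validate_image(filename: str, size_bytes: int) -> bool:
    """Return True if the image satisfies the documented type and size limits."""
    ext = filename.rsplit(".", 1)[-1].lower()
    return ext in ALLOWED_TYPES and size_bytes <= MAX_IMAGE_BYTES

print(validate_image("diagram.png", 5_000_000))  # True
print(validate_image("scan.tiff", 1_000_000))    # False (unsupported type)
```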
Model Aliases
Some models have aliases to help users automatically migrate to the next version of the same model. In general:
<modelname> is aliased to the latest stable version.
<modelname>-latest is aliased to the latest version. This is suitable for users who want to access the latest features.
<modelname>-<date> refers directly to a specific model release. This will not be updated and is for workflows that demand consistency.
For most users, the aliased <modelname> or <modelname>-latest is recommended, as you will receive the latest features automatically.
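The alias scheme can be pictured as a small lookup table. The model names and dates below are hypothetical, chosen only to illustrate the three naming forms:

```python
# Hypothetical alias table: aliases resolve to a dated release,
# while dated names refer directly to themselves.
ALIASES = {
    "qjs-model": "qjs-model-2024-06-01",         # latest *stable* version
    "qjs-model-latest": "qjs-model-2024-08-15",  # newest release, latest features
}

def resolve(name: str) -> str:
    """Resolve an alias to its pinned release; pass dated names through unchanged."""
    return ALIASES.get(name, name)

print(resolve("qjs-model"))             # qjs-model-2024-06-01
print(resolve("qjs-model-2024-06-01"))  # qjs-model-2024-06-01 (pinned, never updated)
```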
Billing and Availability
Your model access may vary depending on factors such as geographical location and account limitations.
Model Input and Output
Each model can have one or multiple input and output capabilities. The input capabilities refer to which type(s) of prompt the model can accept in the request message body. The output capabilities refer to which type(s) of completion the model will generate in the response message body.
This is a prompt example for models with text input capability:
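A sketch of such a request, assuming a chat-completions-style body (the model name and field names are illustrative, not confirmed by this document):

```python
# Text-only input: the prompt is a plain string in the message content.
text_request = {
    "model": "qjs-model",  # hypothetical model name
    "messages": [
        {"role": "user", "content": "Summarize the water cycle in two sentences."}
    ],
}
```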
This is a prompt example for models with text and image input capabilities:
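A sketch of a mixed text-and-image request, again assuming a chat-completions-style body with typed content parts (field names and the URL are illustrative):

```python
# Text + image input: content becomes a list of typed parts, in any order.
multimodal_request = {
    "model": "qjs-model",  # hypothetical model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
            ],
        }
    ],
}
```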
This is a prompt example for models with text input and image output capabilities:
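A sketch of a request asking the model to return an image. The `modalities` field is an assumption about how output capabilities might be selected; it is not confirmed by this document:

```python
# Text input, image output: the request declares which output
# modalities the completion should include (field name assumed).
image_out_request = {
    "model": "qjs-model",  # hypothetical model name
    "messages": [
        {"role": "user", "content": "Draw a lighthouse at sunset."}
    ],
    "modalities": ["text", "image"],
}
```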
Context Window
The context window determines the maximum number of tokens the model accepts in the prompt.
For more information on how tokens are counted, visit Consumption and Rate Limits.
If you send the entire conversation history in the prompt (for use cases such as a chat assistant), the combined token count of all prompts in the history must not exceed the context window.
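The history check above amounts to comparing a running token sum against the window. A minimal sketch, assuming you already have a per-message token count:

```python
def fits_context(history_token_counts: list[int], context_window: int) -> bool:
    """True if the whole conversation history fits within the context window."""
    return sum(history_token_counts) <= context_window

# Three prior turns totalling 2,400 tokens fit a 4,096-token window.
print(fits_context([1200, 800, 400], 4096))  # True
print(fits_context([3000, 2000], 4096))      # False
```

A common strategy when this check fails is to drop or summarize the oldest turns until the history fits again.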
Cached prompt tokens
Running the same prompt multiple times? You can use cached prompt tokens to reduce the cost of repeated prompts. By reusing stored prompt data, you save on processing expenses for identical requests. Enable caching in your settings to start saving.
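To see the effect on a bill, the sketch below prices cached prompt tokens at a fraction of the standard input rate. The discount factor and rate are assumptions for illustration, not actual QJS pricing:

```python
def input_cost(tokens: int, cached_tokens: int, rate_per_m: float,
               cached_discount: float = 0.1) -> float:
    """Cost of a prompt where cached tokens bill at a discounted rate.

    cached_discount is a HYPOTHETICAL multiplier, not actual QJS pricing.
    """
    fresh = tokens - cached_tokens
    return (fresh * rate_per_m + cached_tokens * rate_per_m * cached_discount) / 1_000_000

# 10k-token prompt at $3.00/1M input tokens: uncached vs. 80% cache hit.
print(round(input_cost(10_000, 0, 3.0), 6))      # 0.03
print(round(input_cost(10_000, 8_000, 3.0), 6))  # 0.0084
```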
Join our Community Forum
Any other questions? Get in touch