You can paste the command below into your terminal or IDE to run your first API request. Make sure to replace $SHUTTLEAI_API_KEY with your secret API key from our Dashboard.

Each model has a set cost that will count as the number of “requests” used for that model. Limits can be found by visiting your Dashboard.
Model costs can be found here.
curl \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SHUTTLEAI_API_KEY" \
  -d '{
     "model": "gpt-4",
     "messages": [{"role": "user", "content": "Say this is a test!"}],
     "temperature": 0.7,
     "max_tokens": 5

This will call ShuttleAI’s gpt-4 model with certain kwargs set like max_tokens and temperature. Since we asked as a user to “Say this is a test!”, we will receive a response similar to:

    "choices": [
            "finish_reason": "length",
            "index": 0,
            "message": {
                "content": "This is a test!",
                "role": "assistant"
    "created": 1707784511,
    "id": "chatcmpl-c18f90b794e8574ef85b8566b070ce55",
    "model": "gpt-4",
    "object": "chat.completion",
    "system_fingerprint": "fp_1wzhx7ttnt",
    "usage": {
        "completion_tokens": 5,
        "prompt_tokens": 13,
        "total_tokens": 18

You just made your first ChatCompletion request with ShuttleAI!

But, what does that mean? Let’s break down the response object; Since we inputted a max_tokens parameter set to 5, we can see in our usage that the completion_tokens does not and will not surpass our set value of 5. Additionally, we have a finish_reason field, since the completion_tokens reached the max_tokens set, we get a “length” value, however, any of the strings listed below are applicable:

  • “stop” (When the full output of the request has been delivered)
  • “tool_calls” (When the response delivered uses a tool call)
  • “length” (When completion_tokens meets or exceeds max_tokens)

What if you want to receive each chunk of data as soon as it’s ready for you?

Check out Streaming!