Eval bug: Excessive stack usage during tool calling #12234

edmcman · 2025-03-06T21:08:46Z

Name and Version

./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: yes
version: 4840 (3ffbbd5)
built with Ubuntu clang version 18.1.8 (++20240731024944+3b5b5c1ec4a3-1~~exp1~~20240731145000.144) for x86_64-pc-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

i9-13900HX + NVIDIA GeForce RTX 4070

Models

bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M

Problem description & steps to reproduce

cc/@ochafik

I am attempting to run BFCL on llama-server, and so far I have triggered a crash twice. It does not appear to be deterministic, unfortunately. In one instance, I was able to catch the crash with gdb. Here is the end of the backtrace:

#87097 0x00005669dac2b7f9 in bool std::__detail::__regex_algo_impl<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, char, std::__cxx11::regex_traits<char> >(__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, __gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__cxx11::match_results<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > >&, std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> > const&, std::regex_constants::match_flag_type, std::__detail::_RegexExecutorPolicy, bool) ()
#87098 0x00007116a7f3ac54 in llama_grammar_accept_impl(llama_grammar&, int) () from /home/ed/Projects/llama.cpp/build/bin/libllama.so
#87099 0x00005669dadb179a in common_sampler_accept(common_sampler*, int, bool) ()
#87100 0x00005669dac5c626 in server_context::update_slots() ()
#87101 0x00005669dabe4886 in server_queue::start_loop() ()
#87102 0x00005669dabb0bc8 in main ()

The remaining 87096 stack frames were identical. So while I have not been able to find the exact input that triggered the crash yet, I hoped that this might be enough of a clue as to what is going on.

Here is some more information about what I am doing:

/home/ed/Projects/llama.cpp/build/bin/llama-server --ctx-size 0 --jinja -fa -hf bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M --host 0.0.0.0 -ngl 100
python /home/ed/Projects/gorilla/berkeley-function-call-leaderboard/venv/bin/bfcl generate --model gpt-4-turbo-2024-04-09-FC --test-category all --include-input-log
I added this patch:

diff --git a/berkeley-function-call-leaderboard/bfcl/model_handler/api_inference/openai.py b/berkeley-function-call-leaderboard/bfcl/model_handler/api_inference/openai.py
index fbf7c0f..fc0da1f 100644
--- a/berkeley-function-call-leaderboard/bfcl/model_handler/api_inference/openai.py
+++ b/berkeley-function-call-leaderboard/bfcl/model_handler/api_inference/openai.py
@@ -22,7 +22,7 @@ class OpenAIHandler(BaseHandler):
     def __init__(self, model_name, temperature) -> None:
         super().__init__(model_name, temperature)
         self.model_style = ModelStyle.OpenAI
-        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
+        self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"), base_url="http://localhost:8080")
 
     def decode_ast(self, result, language="Python"):
         if "FC" in self.model_name or self.is_fc_model:

First Bad Commit

No response

Relevant log output

srv  update_slots: all slots are idle
srv  log_server_r: request: POST /chat/completions 127.0.0.1 200
srv  params_from_: Chat format: Hermes 2 Pro
slot launch_slot_: id  0 | task 48450 | processing task
slot update_slots: id  0 | task 48450 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 326
slot update_slots: id  0 | task 48450 | kv cache rm [67, end)
slot update_slots: id  0 | task 48450 | prompt processing progress, n_past = 326, n_tokens = 259, progress = 0.794479
slot update_slots: id  0 | task 48450 | prompt done, n_past = 326, n_tokens = 259
slot      release: id  0 | task 48450 | stop processing: n_past = 504, truncated = 0
slot print_timing: id  0 | task 48450 | 
prompt eval time =     104.08 ms /   259 tokens (    0.40 ms per token,  2488.52 tokens per second)
       eval time =    3465.17 ms /   179 tokens (   19.36 ms per token,    51.66 tokens per second)
      total time =    3569.24 ms /   438 tokens
srv  update_slots: all slots are idle
srv  log_server_r: request: POST /chat/completions 127.0.0.1 200
srv  params_from_: Chat format: Hermes 2 Pro
slot launch_slot_: id  0 | task 48630 | processing task
slot update_slots: id  0 | task 48630 | new prompt, n_ctx_slot = 32768, n_keep = 0, n_prompt_tokens = 326
slot update_slots: id  0 | task 48630 | kv cache rm [67, end)
slot update_slots: id  0 | task 48630 | prompt processing progress, n_past = 326, n_tokens = 259, progress = 0.794479
slot update_slots: id  0 | task 48630 | prompt done, n_past = 326, n_tokens = 259
/home/ed/.local/share/dorothy/user/commands/llama-cpp-server: line 8: 709629 Segmentation fault      (core dumped) ~/Projects/llama.cpp/build/bin/llama-server --ctx-size $CTX_SIZE --jinja -fa -hf "$MODEL" --host 0.0.0.0 -ngl $OFFLOAD_NUM $OTHERARGS


edmcman · 2025-03-06T21:14:05Z

This time around it was the java_47 test that failed. I think the other crashes were also related to java.

I don't think there is a way to run a specific test though in BFCL, but we can do --test-category java at least. I'm going to try something like llama-server --verbose 2>&1 | tail -n1000 to see if I can pick up anything helpful before it crashes.

edmcman · 2025-03-06T21:24:49Z

java_47 seems to be a consistent problem. In a new run, it hasn't crashed yet, but it has been performing inference on it for about five minutes now...

Here is the "question":

{"id": "java_47", "question": [[{"role": "user", "content": "Help me output a formatted Java constant declaration for a large Base64 encoded string representing a certificate, with the constant name 'CERTIFICATE' and the value being a 1024-character long Base64 string with 'MIIFdTCCBF2gAwIBAgISESG'?"}]], "function": [{"name": "LargeHandshakeTest.format", "description": "Outputs a formatted Java constant declaration for a given name and value, splitting the value into multiple lines if it exceeds 60 characters.", "parameters": {"type": "dict", "properties": {"name": {"type": "String", "description": "The name of the Java constant."}, "value": {"type": "String", "description": "The value of the Java constant, which will be split into multiple lines if it's too long."}}, "required": ["name", "value"]}}]}

and an answer:

{"id": "java_47", "ground_truth": [{"LargeHandshakeTest.format": {"name": ["CERTIFICATE"], "value": ["MIIFdTCCBF2gAwIBAgISESG"]}}]}

I'm not sure why that would be causing an issue.

edmcman · 2025-03-06T21:43:31Z

Adding -n -2 to the llama-server args avoids the problem but all of the results -- not just java_47 -- become <tool_call> 😓 Yup, that's it. No content or closing tag. Not sure what's going on there either. Maybe a separate issue?

On the bright side, I did get the query since the -n -2 query "succeeded":

{
  "id": "java_47",
  "result": "<tool_call>",
  "inference_log": [
    {
      "role": "inference_input",
      "content": {
        "message": "[{'role': 'user', 'content': \"Help me output a formatted Java constant declaration for a large Base64 encoded string representing a certificate, with the constant name 'CERTIFICATE' and the value being a 1024-character long Base64 string with 'MIIFdTCCBF2gAwIBAgISESG'?\"}]",
        "tools": [
          {
            "type": "function",
            "function": {
              "name": "LargeHandshakeTest_format",
              "description": "Outputs a formatted Java constant declaration for a given name and value, splitting the value into multiple lines if it exceeds 60 characters. Note that the provided function is in Java 8 SDK syntax.",
              "parameters": {
                "type": "object",
                "properties": {
                  "name": {
                    "type": "string",
                    "description": "The name of the Java constant. This is Java String type parameter in string representation."
                  },
                  "value": {
                    "type": "string",
                    "description": "The value of the Java constant, which will be split into multiple lines if it's too long. This is Java String type parameter in string representation."
                  }
                },
                "required": [
                  "name",
                  "value"
                ]
              }
            }
          }
        ]
      }
    }
  ],
  "input_token_count": 326,
  "output_token_count": 1,
  "latency": 0.11321735382080078
}

I'll try to convert this into a curl-based test.

edmcman · 2025-03-06T21:50:25Z

curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user", 
      "content": "Help me output a formatted Java constant declaration for a large Base64 encoded string representing a certificate, with the constant name '\''CERTIFICATE'\'' and the value being a 1024-character long Base64 string with '\''MIIFdTCCBF2gAwIBAgISESG'\''"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "LargeHandshakeTest_format",
        "description": "Outputs a formatted Java constant declaration for a given name and value, splitting the value into multiple lines if it exceeds 60 characters. Note that the provided function is in Java 8 SDK syntax.",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string",
              "description": "The name of the Java constant. This is Java String type parameter in string representation."
            },
            "value": {
              "type": "string",
              "description": "The value of the Java constant, which will be split into multiple lines if it'\''s too long. This is Java String type parameter in string representation."
            }
          },
          "required": [
            "name",
            "value"
          ]
        }
      }
    }
  ]
}'

This reliably seems to trigger the issue. I also got the tail of the --verbose log: verbose.log
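For anyone who prefers to script the reproduction, the curl request above can be approximated with the Python standard library. This is only a sketch: the URL, prompt, and tool definition are copied from the curl command, while `build_request`, `send`, and the optional `max_tokens` cap are names and choices made up for this example. `max_tokens` is the standard OpenAI field; whether a given llama-server build honours it the same way is an assumption.

```python
import json
import urllib.request

# Assumption: llama-server is listening on localhost:8080, as in the curl command.
URL = "http://localhost:8080/v1/chat/completions"

PROMPT = (
    "Help me output a formatted Java constant declaration for a large Base64 "
    "encoded string representing a certificate, with the constant name "
    "'CERTIFICATE' and the value being a 1024-character long Base64 string "
    "with 'MIIFdTCCBF2gAwIBAgISESG'"
)

STRING_NOTE = " This is Java String type parameter in string representation."

TOOL = {
    "type": "function",
    "function": {
        "name": "LargeHandshakeTest_format",
        "description": (
            "Outputs a formatted Java constant declaration for a given name and "
            "value, splitting the value into multiple lines if it exceeds 60 "
            "characters. Note that the provided function is in Java 8 SDK syntax."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "The name of the Java constant." + STRING_NOTE,
                },
                "value": {
                    "type": "string",
                    "description": (
                        "The value of the Java constant, which will be split "
                        "into multiple lines if it's too long." + STRING_NOTE
                    ),
                },
            },
            "required": ["name", "value"],
        },
    },
}


def build_request(max_tokens=None, url=URL):
    """Build (but do not send) the POST request from the curl reproduction."""
    payload = {
        "model": "gpt-4",
        "messages": [{"role": "user", "content": PROMPT}],
        "tools": [TOOL],
    }
    if max_tokens is not None:
        # Hypothetical safety cap so a runaway generation eventually stops.
        payload["max_tokens"] = max_tokens
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def send(request, timeout=600):
    """Send the request and return the decoded JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With a server running, `send(build_request(max_tokens=256))` should return the chat completion as a dict; without the cap the request is the same as the curl one.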

edmcman · 2025-03-06T21:57:58Z

Here's the full log: log.zip

bzcat /tmp/log.bz2 | fgrep 'next token' | awk '{print $18}' | uniq -c
      1 '<tool_call>'
      1 '
      1 '{"'
      1 'name'
      1 '":'
      1 '
      1 'Large'
      1 'Hand'
      1 'shake'
      1 'Test'
      1 '_format'
      1 '",'
      1 '
      1 'arguments'
      1 '":'
      1 '
      1 'name'
      1 '":'
      1 '
      1 'CERT'
      1 'IFICATE'
      1 '",'
      1 '
      1 'value'
      1 '":'
      1 '
      1 'MI'
      1 'IF'
      1 'dT'
      1 'CC'
      1 'BF'
      1 '2'
      1 'g'
      1 'Aw'
      1 'IB'
      1 'Ag'
      1 'ISE'
      1 'SG'
   5427 'XXXXXXXX'

So the model just outputs a bunch of gibberish.
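The shell pipeline above (filter on 'next token', take field 18, collapse consecutive duplicates with uniq -c) can be sketched in Python too, which makes it easier to flag long runs automatically. The function names here are made up for the example, and field 18 is simply the position used in the awk command; other log formats would need a different index.

```python
import bz2
from itertools import groupby


def next_tokens(lines, field=18):
    """Yield the 1-based whitespace-separated field from 'next token' lines."""
    for line in lines:
        if "next token" in line:
            parts = line.split()
            # Unlike awk, lines with too few fields are skipped, not printed empty.
            if len(parts) >= field:
                yield parts[field - 1]


def run_lengths(tokens):
    """Collapse consecutive duplicates into (count, token) pairs, like `uniq -c`."""
    return [(sum(1 for _ in group), tok) for tok, group in groupby(tokens)]


def longest_run(tokens):
    """Return the (count, token) pair of the longest consecutive run."""
    return max(run_lengths(tokens), key=lambda pair: pair[0], default=(0, None))


def read_log(path):
    """Read a bzip2-compressed log as text lines (undecodable bytes replaced)."""
    with bz2.open(path, "rt", errors="replace") as handle:
        return handle.readlines()
```

Something like `longest_run(next_tokens(read_log("/tmp/log.bz2")))` would then report the dominant repeated token and how many times it appeared in a row.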

ochafik · 2025-03-06T22:09:18Z

So the model just outputs a bunch of gibberish.

@edmcman adding a --repeat-penalty 2.0 prevents the model from entering that infinite loop (no clue what's a good penalty tbh but maybe that model needs it to be more reasonable).

ochafik · 2025-03-06T22:15:33Z

@edmcman Alternatively, its cousin bartowski/Qwen2.5-Coder-7B-Instruct-GGUF:Q4_K_M behaves well on that example w/o a penalty.

Btw, I've also been trying to run the benchmark, I may have written more code than needed haha.

edmcman · 2025-03-06T23:38:29Z

@edmcman adding a --repeat-penalty 2.0 prevents the model from entering that infinite loop (no clue what's a good penalty tbh but maybe that model needs it to be more reasonable).

Nice, I was just starting to play with that before ending my work day, but I went in the wrong direction (0.9).

Btw, I've also been trying to run the benchmark, I may have written more code than needed haha.

Wow, you went all out! Good for you! I felt a little guilty with my one-line hack :) I was a little surprised they didn't already have an option to use an existing openai server but pass the tools as tools.

edmcman · 2025-03-07T12:35:41Z

Btw, I found that this paper recommends a repetition penalty of 1.2.
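To make the direction of the penalty concrete, here is a toy sketch of the commonly described repetition-penalty rule: for every token that already appeared, a positive logit is divided by the penalty and a negative one is multiplied by it. Whether --repeat-penalty in llama-server follows exactly this rule is an assumption on my part, and the logit values below are invented purely for illustration.

```python
def apply_repeat_penalty(logits, recent_tokens, penalty):
    """Return a copy of `logits` with already-seen tokens penalised.

    Sketch of the commonly described rule: divide positive logits by
    `penalty`, multiply negative ones by it. penalty > 1 discourages
    repeats, penalty == 1 is a no-op, penalty < 1 encourages repeats.
    """
    adjusted = dict(logits)
    for tok in set(recent_tokens):
        if tok in adjusted:
            value = adjusted[tok]
            adjusted[tok] = value / penalty if value > 0 else value * penalty
    return adjusted


def greedy(logits):
    """Pick the token with the highest logit."""
    return max(logits, key=logits.get)


# Invented numbers: the repeated token narrowly beats the closing quote.
toy_logits = {"XXXXXXXX": 5.0, '"': 4.0}
recent = ["XXXXXXXX"] * 3

for penalty in (0.9, 1.0, 1.2, 2.0):
    adjusted = apply_repeat_penalty(toy_logits, recent, penalty)
    print(penalty, greedy(adjusted), adjusted)
```

Under this rule a value below 1.0 (such as the 0.9 tried earlier) boosts the repeated token instead of suppressing it, which would explain why that direction did not help.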

ggerganov · 2025-03-11T08:11:30Z

@ochafik I noticed the discussion about the repetition penalty. Without knowing many details about the use case, I just tested the curl command from #12234 (comment) with greedy sampling (i.e. "samplers": ["top_k"], "top_k": 1), and I get the following output:

    "content": "<tool_call>\n{\"name\": \"LargeHandshakeTest_format\", \"arguments\": {\"name\": \"CERTIFICATE\", \"value\": \"MIIFdTCCBF2gAwIBAgISESGXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",

This made me think that for some reason, the model does not want to sample the closing quotes of the "value". I then realized that the request wants to generate a "1024-character long" value. I don't really understand what this means, but I suspect that this makes the model try to generate a 1024-character long string in the "value" and that's why it keeps repeating XXXX... forever. So I tried to simply reword the request like this (i.e. remove the text about "1024-character long" string):

#!/bin/bash

curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
  "model": "gpt-4",
  "temperature": 0.0,
  "n_predict": 48,
  "messages": [
    {
      "role": "user",
      "content": "Help me output a formatted Java constant declaration for a large Base64 encoded string representing a certificate, with the constant name '\''CERTIFICATE'\'' and the value being a Base64 string with '\''MIIFdTCCBF2gAwIBAgISESG'\''"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "LargeHandshakeTest_format",
        "description": "Outputs a formatted Java constant declaration for a given name and value, splitting the value into multiple lines if it exceeds 60 characters. Note that the provided function is in Java 8 SDK syntax.",
        "parameters": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string",
              "description": "The name of the Java constant. This is Java String type parameter in string representation."
            },
            "value": {
              "type": "string",
              "description": "The value of the Java constant, which will be split into multiple lines if it'\''s too long. This is Java String type parameter in string representation."
            }
          },
          "required": [
            "name",
            "value"
          ]
        }
      }
    }
  ]
}'

This seems to work correctly, producing:

    "content": "<tool_call>\n{\"name\": \"LargeHandshakeTest_format\", \"arguments\": {\"name\": \"CERTIFICATE\", \"value\": \"MIIFdTCCBF2gAwIBAgISESG\"}}\n</tool_call>",

Note that this does not require a repetition penalty.

So in summary, I strongly believe that the best sampling setting for any model is simple greedy sampling. This is especially true for constrained generation like in this case. Repetition penalties should always be avoided; needing one always turns out to be due to some underlying problem that should be solved instead of adding a repetition penalty.

Whenever you encounter a use case where it looks like greedy sampling is not optimal, please let me know and I will try to show that this is not the case. Hope this helps!
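As a small illustration of why "samplers": ["top_k"] with "top_k": 1 amounts to greedy decoding: once only the single highest-scoring candidate survives the top-k filter, sampling has nothing left to choose from. The sketch below is a toy top-k sampler, not llama.cpp's implementation, and the logits are made up.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Keep the k highest-logit tokens, then sample by softmax weight."""
    top = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:k]
    peak = top[0][1]
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(value - peak) for _, value in top]
    tokens = [token for token, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


def greedy(logits):
    """Plain argmax over the logits."""
    return max(logits, key=logits.get)


toy_logits = {"a": 1.0, "b": 3.0, "c": 2.0}

# With k=1 the result does not depend on the random seed.
picks = {top_k_sample(toy_logits, 1, random.Random(seed)) for seed in range(20)}
print(picks, greedy(toy_logits))
```

With a larger k the seed starts to matter again, which is where temperature and the other samplers come into play.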

edmcman · 2025-03-12T14:20:55Z

@ggerganov I have a (perhaps silly) question: Why isn't simple greedy sampling the default for llama-server?

edmcman · 2025-03-17T19:32:43Z

Just to follow up, I have found an example where greedy decoding resulted in a repeating pattern, while non-greedy sampling did not. 🤷

ggerganov · 2025-03-18T08:56:14Z

What is the example?

ochafik · 2025-03-18T09:35:59Z

Note that the unsloth folks have also been advising a pinch of repetition penalty, combined with a reshuffling of samplers, in their QwQ guide (haven’t tested their examples yet but I did face anecdotal repetition issues with both Qwen2.5 & QwQ myself).

Might be a common trait of Qwen’s models / quirk of their finetunes?

(Edit: actually given unsloth seem to imply these hacks are only needed for llama.cpp, probably indicative of an underlying problem as @ggerganov said above)

ggerganov · 2025-03-18T10:02:32Z

I am skeptical. The greedy sampling works perfectly fine for this case:

llama-cli \
    -hf unsloth/QwQ-32B-GGUF:Q4_K_M \
    --threads 32 \
    --ctx-size 16384 \
    --n-gpu-layers 99 \
    --seed 3407 \
    --prio 2 \
    --top-k 1 \
    --samplers top-k \
    -no-cnv \
    --prompt "<|im_start|>user\nCreate a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.<|im_end|>\n<|im_start|>assistant\n<think>\n"

Edit: Note that it's actually better because the generated example with greedy sampling has "brown land" at the bottom (per the instructions in the prompt), while Unsloth's generation does not have land.

ochafik · 2025-03-18T10:09:03Z

Re: greedy sampling in general (discounting potential specific problems here), it is probably a good idea for most business tool-call use cases, but I’ve had use cases where I needed structured outputs with very random / creative content.

edmcman · 2025-03-18T14:14:07Z

I'll try to extract out my example, but it might take a while.

edmcman · 2025-03-18T15:05:16Z

https://gist.github.com/edmcman/dfb7c906b15f47819653f467d70ee0f8

This takes about two minutes for me, and ends with:

\nGNU coreutils online help: https://www.gnu.org/software/coreutils/\nFull documentation <https://www.gnu.org/software/coreutils/ls invocation>\nor available locally via: info '(coreutils) ls invocation'\nor available locally via: info '(coreutils) ls invocation'\nor online help: https://www.gnu.org/software/coreutils/\nGNU coreutils online help: https://www.gnu.org/software/coreutils/\nFull documentation <https://www.gnu.org/software/coreutils/ls invocation>\nor available locally via: info '(coreutils) ls invocation'\nor available locally via: info '(coreutils) ls invocation'\nor online help: https://www.gnu.org/software/coreutils/\nGNU coreutils online help: https://www.gnu.org/software/coreutils/\nFull documentation <https://www.gnu.org/software/coreutils/ls invocation>\nor available locally via: info '(coreutils) ls invocation'\nor available locally via: info '(coreutils) ls invocation'\nor online help: https://www.gnu.org/software/coreutils/\nGNU coreutils online help: https://www.gnu.org/software/coreutils/\nFull documentation <https://www.gnu.org/software/coreutils/ls invocation>\nor available locally via: info '(coreutils) ls invocation'\nor available locally via: info '(coreutils) ls invocation'\nor online help: https://www.gnu.org/software/coreutils/\nGNU coreutils

The context window is obviously being overflowed here just by the prompt, so maybe this example doesn't count. But I have examples where the context window is not being overflowed too.
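For collecting more examples like this, a client-side check for a repeating tail can save waiting for the full generation. The following is a rough sketch, not something BFCL or llama-server provides: `trailing_cycle` and its thresholds are hypothetical, and it only looks for an exact repetition at the very end of the text.

```python
def trailing_cycle(text, min_repeats=3, max_period=512):
    """Return the shortest unit repeated at least `min_repeats` times at the
    end of `text`, or None if the tail does not repeat exactly.

    The unit is taken from the end of the text, so the check still works
    when generation was cut off in the middle of a cycle.
    """
    length = len(text)
    longest_period = min(max_period, length // min_repeats)
    for period in range(1, longest_period + 1):
        unit = text[length - period:]
        if text.endswith(unit * min_repeats):
            return unit
    return None


sample = "value: MIIFdTCC" + "XXXXXXXX" * 10
print(repr(trailing_cycle(sample)))          # shortest repeating unit in the tail
print(trailing_cycle("no repetition here"))  # None
```

A streaming client could call this on the accumulated output every few hundred characters and abort the request once a cycle is reported.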

ochafik · 2025-03-18T17:11:49Z

The context window is obviously being overflowed here just by the prompt, so maybe this example doesn't count. But I have examples where the context window is not being overflowed too.

Btw, different finetune but since your task seems coding-oriented you may wanna try this yarn-extended GGUF that might accept your whole prompt: unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF

I don't have enough context but don't understand why we allow overflows why we allow overflows why we allow overflows

ggerganov · 2025-03-18T17:15:57Z

With the same command but --ctx-size 32768, the output is as follows:

void FUN_001091d0(int param_1)

{
  FILE *pFVar1;
  undefined *puVar2;
  int iVar3;
  undefined8 uVar4;
  char *pcVar5;
  undefined8 uVar6;
  char *pcVar7;
  undefined *puVar8;
  undefined8 *puVar9;
  long in_FS_OFFSET;
  undefined8 local_b8;
  char *local_b0;
  char *local_a8 [5];
  char *local_80;
  char *local_78;
  char *local_70;
  undefined *local_68;
  char *local_60;
  undefined8 local_58;
  undefined8 local_50;
  undefined8 local_40;
  
  uVar6 = DAT_00122458;
  local_40 = *(undefined8 *)(in_FS_OFFSET + 0x28);
  if (param_1 != 0) {
    uVar4 = dcgettext(0,"Try '%s --help' for more information.
",5);
    __fprintf_chk(*(undefined8 *)PTR_stderr_00121ff0,1,uVar4,uVar6);
    goto LAB_0010922d;
  }
  uVar4 = dcgettext(0,"Usage: %s [OPTION]... [FILE]...
",5);
  __printf_chk(1,uVar4,uVar6);
  puVar2 = PTR_stdout_00121fa8;
  pFVar1 = *(FILE **)PTR_stdout_00121fa8;
  pcVar5 = (char *)dcgettext(0,"List information about the FILEs (the current directory by default). Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"Mandatory arguments to long options are mandatory for short options too.
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -a, --all                  do not ignore entries starting with .
  -A, --almost-all           do not list implied . and ..
      --author               with -l, print the author of each file
  -b, --escape               print C-style escapes for nongraphic characters
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"      --block-size=SIZE      with -l, scale sizes by SIZE when printing them; e.g., '--block-size=M'; see SIZE format below
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -B, --ignore-backups       do not list implied entries ending with ~
  -c                         with -lt: sort by, and show, ctime (time of last
                               modification of file status information); with -l: show ctime and sort by name; otherwise: sort by ctime, newest first
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -C                         list entries by columns
      --color[=WHEN]         colorize the output; WHEN can be 'always' (default if omitted), 'auto', or 'never'; more info below
  -d, --directory            list directories themselves, not their contents
  -D, --dired                generate output designed for Emacs' dired mode
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -f                         do not sort, enable -aU, disable -ls --color
  -F, --classify             append indicator (one of */=>@|) to entries
      --file-type            likewise, except do not append ' '*
      --format=WORD          across -x, commas -m, horizontal -x, long -l, single-column -1, verbose -l, vertical -C
      --full-time            like -l --time-style=full-iso
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -g                         like -l, but do not list owner
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"      --group-directories-first
                             group directories before files;
                               can be augmented with a --sort option, but any
                               use of --sort=none (-U) disables grouping
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -G, --no-group             in a long listing, don't print group names
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -h, --human-readable       with -l and -s, print sizes like 1K 234M 2G etc.
      --si                   likewise, but use powers of 1000 not 1024
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -H, --dereference-command-line
                             follow symbolic links listed on the command line
      --dereference-command-line-symlink-to-dir
                             follow each command line symbolic link
                               that points to a directory
      --hide=PATTERN         do not list implied entries matching shell PATTERN
                               (overridden by -a or -A)
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"      --hyperlink[=WHEN]     hyperlink file names; WHEN can be 'always'
                               (default if omitted), 'auto', or 'never'
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"      --indicator-style=WORD  append indicator with style WORD to entry names:
                               none (default), slash (-p),
                               file-type (--file-type), classify (-F)
  -i, --inode                print the index number of each file
  -I, --ignore=PATTERN       do not list implied entries matching shell PATTERN
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -k, --kibibytes            default to 1024-byte blocks for disk usage;
                               used only with -s and per directory totals
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -l                         use a long listing format
  -L, --dereference          when showing file information for a symbolic
                               link, show information for the file the link
                               references rather than for the link itself
  -m                         fill width with a comma separated list of entries
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -n, --numeric-uid-gid      like -l, but list numeric user and group IDs
  -N, --literal              print entry names without quoting
  -o                         like -l, but do not list group information
  -p, --indicator-style=slash
                             append / indicator to directories
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -q, --hide-control-chars   print ? instead of nongraphic characters
      --show-control-chars   show nongraphic characters as-is (the default,
                               unless program is 'ls' and output is a terminal)
  -Q, --quote-name           enclose entry names in double quotes
      --quoting-style=WORD   use quoting style WORD for entry names:
                               literal, locale, shell, shell-always, shell-escape, shell-escape-always, c, escape
                               (overrides QUOTING_STYLE environment variable)
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -r, --reverse              reverse order while sorting
  -R, --recursive            list subdirectories recursively
  -s, --size                 print the allocated size of each file, in blocks
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -S                         sort by file size, largest first
      --sort=WORD            sort by WORD instead of name: none (-U), size (-S),
                               time (-t), version (-v), extension (-X)
      --time=WORD            change the default of using modification times;
                               access time (-u): atime, access, use;
                               change time (-c): ctime, status;
                               birth time: birth, creation;
                             with -l, WORD determines which time to show;
                             with --sort=time, sort by WORD (newest first)
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"      --time-style=TIME_STYLE  time/date format with -l; see TIME_STYLE below
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -t                         sort by time, newest first; see --time
  -T, --tabsize=COLS         assume tab stops at each COLS instead of 8
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -u                         with -lt: sort by, and show, access time;
                               with -l: show access time and sort by name;
                               otherwise: sort by access time, newest first
  -U                         do not sort; list entries in directory order
  -v                         natural sort of (version) numbers within text
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"  -w, --width=COLS           set output width to COLS.  0 means no limit
  -x                         list entries by lines instead of by columns
  -X                         sort alphabetically by entry extension
  -Z, --context              print any security context of each file
  -1                         list one file per line.  Avoid '\n' with -q or -b
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"      --help     display this help and exit
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"      --version  output version information and exit
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"\nThe SIZE argument is an integer and optional unit (example: 10K is 10*1024). Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000). Binary prefixes can be used, too: KiB=K, MiB=M, and so on.
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"\nThe TIME_STYLE argument can be full-iso, long-iso, iso, locale, or +FORMAT. FORMAT is interpreted like in date(1). If FORMAT is FORMAT1<newline>FORMAT2, then FORMAT1 applies to non-recent files and FORMAT2 to recent files. TIME_STYLE prefixed with 'posix-' takes effect only outside the POSIX locale. Also the TIME_STYLE environment variable sets the default style to use.
",5);
  fputs_unlocked(pcVar5,pFVar1);
  pFVar1 = *(FILE **)puVar2;
  pcVar5 = (char *)dcgettext(0,"\nUsing color to distinguish file types is disabled both by default and with --color=never. With --color=auto, ls emits color codes only when standard output is connected to a terminal. The LS_COLORS environment variable can change the settings. Use the dircolors command to set it.
",5);
  fputs_unlocked(pcVar5,pFVar1);
  local_b8 = 0x11a0a5;
  local_b0 = "test invocation";
  local_a8[0] = "coreutils";
  local_a8[1] = "Multi-call invocation";
  local_a8[4] = "sha256sum";
  local_a8[2] = "sha224sum";
  local_78 = "sha384sum";
  local_a8[3] = "sha2 utilities";
  local_80 = "sha2 utilities";
  local_70 = "sha2 utilities";
  local_68 = &DAT_00119981;
  local_60 = "sha2 utilities";
  local_58 = 0;
  local_50 = 0;
  puVar9 = &local_b8;
  do {
    puVar8 = (undefined *)puVar9;
    pcVar5 = *(char **)(puVar8 + 0x10);
    puVar9 = (undefined8 *)(puVar8 + 0x10);
    if (pcVar5 == (char *)0x0) break;
  } while (((*pcVar5 != 'l') || (pcVar5[1] != 's')) || (pcVar5[2] != '\0'));
  pcVar5 = *(char **)(puVar8 + 0x18);
  if (pcVar5 == (char *)0x0) {
    uVar6 = dcgettext(0,"\n%s online help: <%s>
",5);
    __printf_chk(1,uVar6,"GNU coreutils","https://www.gnu.org/software/coreutils/");
    pcVar5 = setlocale(5,(char *)0x0);
    if (pcVar5 != (char *)0x0) {
      iVar3 = strncmp(pcVar5,"en_",3);
      if (iVar3 != 0) {
        pcVar5 = "ls";
        goto LAB_0010986b;
      }
    }
    uVar6 = dcgettext(0,"Full documentation <%s%s>
",5);
    pcVar5 = "ls";
    pcVar7 = " invocation";
    __printf_chk(1,uVar6,"https://www.gnu.org/software/coreutils/","ls");
  }
  else {
    uVar6 = dcgettext(0,"\n%s online help: <%s>
",5);
    __printf_chk(1,uVar6,"GNU coreutils","https://www.gnu.org/software/coreutils/");
    pcVar7 = setlocale(5,(char *)0x0);
    if (pcVar7 != (char *)0x0) {
      iVar3 = strncmp(pcVar7,"en_",3);
      if (iVar3 != 0) {
LAB_0010986b:
        pFVar1 = *(FILE **)puVar2;
        pcVar7 = (char *)dcgettext(0,"Report any translation bugs to <https://translationproject.org/team/>
",5);
        fputs_unlocked(pcVar7,pFVar1);
      }
    }
    uVar6 = dcgettext(0,"Full documentation <%s%s>
",5);
    pcVar7 = " invocation";
    __printf_chk(1,uVar6,"https://www.gnu.org/software/coreutils/","ls");
    if (pcVar5 != "ls") {
      pcVar7 = "";
    }
  }
  uVar6 = dcgettext(0,"or available locally via: info '(coreutils) %s%s'
",5);
  __printf_chk(1,uVar6,pcVar5,pcVar7);
LAB_0010922d:
  /* WARNING: Subroutine does not return */
  exit(param_1);
}

Does it look ok?

edmcman · 2025-03-18T17:26:06Z

It does. I have been testing a variety of context window sizes, and while there are fewer problems with the larger context windows, there are problems there too. Would you like me to find one of those for you?

ggerganov · 2025-03-18T18:59:15Z

Yes, if you have something handy, feel free to share it so we can analyze it. In general, I am sure there will be failure cases for greedy sampling. But my point is that for most tasks, such as knowledge extraction, coding and anything else that requires accuracy, it's just better to do greedy sampling. Doing so inherently gives you the most accurate tokens. Using things like a repetition penalty could help you get out of some loops in certain edge cases, but IMO it's not really worth sacrificing the accuracy that you would otherwise get without the penalty.
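To make the trade-off concrete, here is a minimal Python sketch on toy numbers. It is not llama.cpp's sampler: the function names, the 3-token vocabulary, the logit values and the penalty of 1.1 are all made up for illustration, and the penalty rule shown (divide positive logits, multiply non-positive ones) is one common formulation assumed here.

```python
def apply_repetition_penalty(logits, history, penalty):
    """Return a copy of logits with already-emitted tokens made less likely.

    Assumed formulation: positive logits are divided by the penalty,
    non-positive logits are multiplied by it.
    """
    out = list(logits)
    for tok in set(history):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def greedy_pick(logits):
    """Greedy sampling: index of the largest logit (first index on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [2.0, 1.9, -1.0]  # toy scores for a 3-token vocabulary
history = [0, 0, 0]        # token 0 was already emitted three times

plain = greedy_pick(logits)
penalized = greedy_pick(apply_repetition_penalty(logits, history, 1.1))
```

In this toy case plain greedy keeps choosing token 0, the model's top-scored token, while the penalized variant switches to token 1. That is the sense in which a penalty can break a loop at the cost of moving away from the token the model ranked highest.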

Btw, what is even better than greedy sampling is beam search (the same strategy as the one used in Whisper). But it is more difficult to implement for streaming cases and comes with a bigger performance penalty, which is why I think it hasn't been adopted yet for text-generation use cases. For local usage, I think it makes sense, and I'm planning to implement it as an option for code completion use cases.
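As a rough illustration of why beam search can beat greedy sampling, the Python sketch below runs both on a hypothetical two-step toy model. The `TOY_MODEL` table, its probabilities and the `beam_search` helper are invented for this example and are not taken from llama.cpp or Whisper.

```python
import math

# Hypothetical toy model: next-token probabilities keyed by the prefix so far.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.5, "y": 0.5},
    ("B",): {"x": 0.9, "y": 0.1},
}


def beam_search(model, steps, beam_width):
    """Keep the beam_width best prefixes by total log-probability at each step."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, p in model[prefix].items():
                candidates.append((prefix + (token,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


# A beam width of 1 is exactly greedy decoding.
greedy_seq, greedy_lp = beam_search(TOY_MODEL, steps=2, beam_width=1)
beam_seq, beam_lp = beam_search(TOY_MODEL, steps=2, beam_width=2)
```

Greedy commits to "A" because it is the likelier first token, and ends with a sequence of probability 0.3. With two beams the search keeps "B" alive and finds "B", "x" at probability 0.36. The extra cost is visible as well: every step expands each live beam, and no token is final until the search ends, which is what makes streaming harder.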

github-actions · 2025-05-02T01:07:57Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

edmcman added the bug-unconfirmed label Mar 6, 2025

This was referenced Mar 9, 2025

tool-call: Phi-4 support #12288

Open

Misc. bug: tool call issues with hf unsloth/Qwen2.5-Coder-7B-Instruct-128K-GGUF #12279

Closed

github-actions bot added the stale label Apr 18, 2025

github-actions bot closed this as completed May 2, 2025
