Eval bug: Excessive stack usage during tool calling #12234
Comments
This time around it was the I don't think there is a way to run a specific test though in BFCL, but we can do |
Here is the "question": {"id": "java_47", "question": [[{"role": "user", "content": "Help me output a formatted Java constant declaration for a large Base64 encoded string representing a certificate, with the constant name 'CERTIFICATE' and the value being a 1024-character long Base64 string with 'MIIFdTCCBF2gAwIBAgISESG'?"}]], "function": [{"name": "LargeHandshakeTest.format", "description": "Outputs a formatted Java constant declaration for a given name and value, splitting the value into multiple lines if it exceeds 60 characters.", "parameters": {"type": "dict", "properties": {"name": {"type": "String", "description": "The name of the Java constant."}, "value": {"type": "String", "description": "The value of the Java constant, which will be split into multiple lines if it's too long."}}, "required": ["name", "value"]}}]} and an answer: {"id": "java_47", "ground_truth": [{"LargeHandshakeTest.format": {"name": ["CERTIFICATE"], "value": ["MIIFdTCCBF2gAwIBAgISESG"]}}]} I'm not sure why that would be causing an issue. |
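For reference, the BFCL entry above differs from the tool definition in the inference log in a few mechanical ways: the dotted name becomes `LargeHandshakeTest_format`, `dict` becomes `object`, and `String` becomes `string`. A rough sketch of that mapping is below. It is only an illustration based on the logged request, not BFCL's actual conversion code; `bfcl_to_openai_tool` and `TYPE_MAP` are hypothetical names, and the extra sentences that appear in the logged descriptions are not reproduced.

```python
import copy

# Assumed mapping from the BFCL type names seen above to JSON Schema types.
TYPE_MAP = {"dict": "object", "String": "string"}


def bfcl_to_openai_tool(func: dict) -> dict:
    """Convert one BFCL function entry into an OpenAI-style tool definition."""
    params = copy.deepcopy(func["parameters"])  # do not mutate the caller's data
    params["type"] = TYPE_MAP.get(params["type"], params["type"])
    for prop in params.get("properties", {}).values():
        prop["type"] = TYPE_MAP.get(prop["type"], prop["type"])
    return {
        "type": "function",
        "function": {
            # The log shows "LargeHandshakeTest.format" sent with an underscore.
            "name": func["name"].replace(".", "_"),
            "description": func["description"],
            "parameters": params,
        },
    }


bfcl_func = {
    "name": "LargeHandshakeTest.format",
    "description": "Outputs a formatted Java constant declaration.",
    "parameters": {
        "type": "dict",
        "properties": {
            "name": {"type": "String", "description": "The name of the Java constant."},
            "value": {"type": "String", "description": "The value of the Java constant."},
        },
        "required": ["name", "value"],
    },
}
tool = bfcl_to_openai_tool(bfcl_func)
print(tool["function"]["name"])
```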
Adding On the bright side, I did get the query since the
{
"id": "java_47",
"result": "<tool_call>",
"inference_log": [
{
"role": "inference_input",
"content": {
"message": "[{'role': 'user', 'content': \"Help me output a formatted Java constant declaration for a large Base64 encoded string representing a certificate, with the constant name 'CERTIFICATE' and the value being a 1024-character long Base64 string with 'MIIFdTCCBF2gAwIBAgISESG'?\"}]",
"tools": [
{
"type": "function",
"function": {
"name": "LargeHandshakeTest_format",
"description": "Outputs a formatted Java constant declaration for a given name and value, splitting the value into multiple lines if it exceeds 60 characters. Note that the provided function is in Java 8 SDK syntax.",
"parameters": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The name of the Java constant. This is Java String type parameter in string representation."
},
"value": {
"type": "string",
"description": "The value of the Java constant, which will be split into multiple lines if it's too long. This is Java String type parameter in string representation."
}
},
"required": [
"name",
"value"
]
}
}
}
]
}
}
],
"input_token_count": 326,
"output_token_count": 1,
"latency": 0.11321735382080078
}
I'll try to convert this into a curl-based test. |
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Help me output a formatted Java constant declaration for a large Base64 encoded string representing a certificate, with the constant name '\''CERTIFICATE'\'' and the value being a 1024-character long Base64 string with '\''MIIFdTCCBF2gAwIBAgISESG'\''"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "LargeHandshakeTest_format",
"description": "Outputs a formatted Java constant declaration for a given name and value, splitting the value into multiple lines if it exceeds 60 characters. Note that the provided function is in Java 8 SDK syntax.",
"parameters": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The name of the Java constant. This is Java String type parameter in string representation."
},
"value": {
"type": "string",
"description": "The value of the Java constant, which will be split into multiple lines if it'\''s too long. This is Java String type parameter in string representation."
}
},
"required": [
"name",
"value"
]
}
}
}
]
}'
This seems to reliably trigger the issue. I also got the tail of the |
Here's the full log: log.zip
So the model just outputs a bunch of gibberish. |
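Since the crash does not appear to be deterministic, it can help to replay the same request in a loop. Below is a rough Python equivalent of the curl command above, using only the standard library. The helper names `build_payload`, `send` and `replay` are made up for this sketch, and the response layout (`choices[0].message`) assumes the usual OpenAI-compatible shape.

```python
import json
import urllib.request

# Server address taken from the curl command; adjust if yours differs.
URL = "http://localhost:8080/v1/chat/completions"

STRING_NOTE = " This is Java String type parameter in string representation."
TOOL = {
    "type": "function",
    "function": {
        "name": "LargeHandshakeTest_format",
        "description": "Outputs a formatted Java constant declaration for a given name and value.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "The name of the Java constant." + STRING_NOTE},
                "value": {"type": "string", "description": "The value of the Java constant." + STRING_NOTE},
            },
            "required": ["name", "value"],
        },
    },
}


def build_payload(prompt, tools, **extra):
    """Build the JSON body; extra keys (e.g. temperature) are passed through."""
    payload = {
        "model": "gpt-4",
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
    }
    payload.update(extra)
    return payload


def send(payload, url=URL, timeout=120.0):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def replay(payload, runs=10):
    """Send the same payload several times and print the start of each reply."""
    for i in range(runs):
        message = send(payload)["choices"][0]["message"]
        print(i, json.dumps(message)[:80])
```

With the server running, `replay(build_payload("...", [TOOL]))` would send the request ten times; nothing is sent until `send` or `replay` is called.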
@edmcman adding a |
@edmcman Alternatively, its cousin Btw, I've also been trying to run the benchmark, I may have written more code than needed haha. |
Nice, I was just starting to play with that before ending my work day, but I went in the wrong direction (0.9).
Wow, you went all out! Good for you! I felt a little guilty with my one-line hack :) I was a little surprised they didn't already have an option to use an existing openai server but pass the tools as tools. |
Btw, I found that this paper recommends a repetition penalty of 1.2. |
@ochafik I noticed the discussion about the repetition penalty. Without knowing many details about the use case, I just tested the
"content": "<tool_call>\n{\"name\": \"LargeHandshakeTest_format\", \"arguments\": {\"name\": \"CERTIFICATE\", \"value\": \"MIIFdTCCBF2gAwIBAgISESGXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
This made me think that for some reason, the model does not want to sample the closing quotes of the
#!/bin/bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "gpt-4",
"temperature": 0.0,
"n_predict": 48,
"messages": [
{
"role": "user",
"content": "Help me output a formatted Java constant declaration for a large Base64 encoded string representing a certificate, with the constant name '\''CERTIFICATE'\'' and the value being a Base64 string with '\''MIIFdTCCBF2gAwIBAgISESG'\''"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "LargeHandshakeTest_format",
"description": "Outputs a formatted Java constant declaration for a given name and value, splitting the value into multiple lines if it exceeds 60 characters. Note that the provided function is in Java 8 SDK syntax.",
"parameters": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The name of the Java constant. This is Java String type parameter in string representation."
},
"value": {
"type": "string",
"description": "The value of the Java constant, which will be split into multiple lines if it'\''s too long. This is Java String type parameter in string representation."
}
},
"required": [
"name",
"value"
]
}
}
}
]
}'
This seems to work correctly, producing:
"content": "<tool_call>\n{\"name\": \"LargeHandshakeTest_format\", \"arguments\": {\"name\": \"CERTIFICATE\", \"value\": \"MIIFdTCCBF2gAwIBAgISESG\"}}\n</tool_call>",
Note that this does not require a repetition penalty. So in summary, I strongly believe that the best sampling setting for any model is simple greedy sampling. This is especially true for constrained generations like in this case. Repetition penalties should always be avoided, and needing them always proves to be due to some underlying problem that should be solved instead of adding a repetition penalty. Whenever you encounter some use case where it looks like greedy sampling is not optimal, please let me know and I will try to show it's not the case. Hope this helps! |
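To make the trade-off concrete, here is a toy sketch of greedy decoding with and without a repetition penalty. It assumes the common form of the penalty (divide positive logits of already-seen tokens, multiply negative ones by the penalty). The three-token vocabulary and the logits are invented for illustration and do not come from the model in this issue.

```python
def greedy(logits):
    """Return the index of the highest logit (first one wins on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Return a copy of logits where already-generated tokens are demoted.

    Assumed penalty form: positive logits are divided by `penalty`,
    non-positive logits are multiplied by it.
    """
    out = list(logits)
    for tok in set(previous_tokens):
        if out[tok] > 0:
            out[tok] = out[tok] / penalty
        else:
            out[tok] = out[tok] * penalty
    return out


# Toy vocabulary: the model thinks the string should be closed now.
vocab = ['"', "X", "}"]
logits = [3.0, 2.8, 0.5]

# The quote token (id 0) already appeared earlier in the JSON output.
history = [0, 0, 0]

plain_choice = vocab[greedy(logits)]
penalized_choice = vocab[greedy(apply_repetition_penalty(logits, history, 1.2))]
print(plain_choice, penalized_choice)
```

In this toy case the closing quote is demoted from 3.0 to 2.5 because it was used before, so greedy picks `X` instead even though the model ranked the quote first. Structural tokens that must recur in JSON are exactly the ones such a penalty pushes down.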
@ggerganov I have a (perhaps silly) question: Why isn't simple greedy sampling the default for |
Just to follow up, I have found an example where using greedy decoding resulted in a repeating pattern, and otherwise did not. 🤷 |
What is the example? |
Note that the unsloth folks have also been advising a pinch of repetition penalty, combined with a reshuffling of samplers, in their QwQ guide (haven’t tested their examples yet but I did face anecdotal repetition issues with both Qwen2.5 & QwQ myself). Might be a common trait of Qwen’s models / quirk of their finetunes? (Edit: actually given unsloth seem to imply these hacks are only needed for llama.cpp, probably indicative of an underlying problem as @ggerganov said above) |
I am skeptical. Greedy sampling works perfectly fine for this case:
llama-cli \
-hf unsloth/QwQ-32B-GGUF:Q4_K_M \
--threads 32 \
--ctx-size 16384 \
--n-gpu-layers 99 \
--seed 3407 \
--prio 2 \
--top-k 1 \
--samplers top-k \
-no-cnv \
--prompt "<|im_start|>user\nCreate a Flappy Bird game in Python. You must include these things:\n1. You must use pygame.\n2. The background color should be randomly chosen and is a light shade. Start with a light blue color.\n3. Pressing SPACE multiple times will accelerate the bird.\n4. The bird's shape should be randomly chosen as a square, circle or triangle. The color should be randomly chosen as a dark color.\n5. Place on the bottom some land colored as dark brown or yellow chosen randomly.\n6. Make a score shown on the top right side. Increment if you pass pipes and don't hit them.\n7. Make randomly spaced pipes with enough space. Color them randomly as dark green or light brown or a dark gray shade.\n8. When you lose, show the best score. Make the text inside the screen. Pressing q or Esc will quit the game. Restarting is pressing SPACE again.\nThe final game should be inside a markdown section in Python. Check your code for errors and fix them before the final markdown section.<|im_end|>\n<|im_start|>assistant\n<think>\n" Edit: Note that it's actually better because the generated example with greedy sampling has "brown land" at the bottom (per the instructions in the prompt), while Unsloth's generation does not have land. |
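For readers unfamiliar with the flags: with `--top-k 1` and `--samplers top-k`, only the single most likely token survives filtering, so the result is the same as an argmax and the seed no longer affects the output. A small Python sketch of that idea follows. The logits are toy values, the helper names are hypothetical, and this is not llama.cpp's sampler code.

```python
import math
import random


def top_k_filter(logits, k):
    """Keep the k highest-logit tokens as (token_id, logit) pairs."""
    ranked = sorted(enumerate(logits), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def sample(candidates, rng):
    """Sample a token id from the softmax over the remaining candidates."""
    top = max(logit for _, logit in candidates)
    weights = [math.exp(logit - top) for _, logit in candidates]
    ids = [token_id for token_id, _ in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]


toy_logits = [0.1, 2.3, 1.9, -0.4]
argmax = max(range(len(toy_logits)), key=lambda i: toy_logits[i])

# With k=1 there is a single candidate, so every seed yields the argmax.
picks_k1 = {sample(top_k_filter(toy_logits, 1), random.Random(seed)) for seed in range(50)}
print(argmax, picks_k1)
```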
Re/ greedy sampling in general (discounting potential specific problems here), probably a good idea for most business tool call use cases, but I’ve had use cases where I needed structured outputs with very random / creative outputs |
I'll try to extract out my example, but it might take a while. |
https://gist.github.com/edmcman/dfb7c906b15f47819653f467d70ee0f8 This takes about two minutes for me, and ends with:
The context window is obviously being overflowed here just by the prompt, so maybe this example doesn't count. But I have examples where the context window is not being overflowed too. |
Btw, different finetune but since your task seems coding-oriented you may wanna try this yarn-extended GGUF that might accept your whole prompt: unsloth/Qwen2.5-Coder-32B-Instruct-128K-GGUF I don't have enough context but don't understand why we allow overflows why we allow overflows why we allow overflows |
The same command, but just with
void FUN_001091d0(int param_1)
{
FILE *pFVar1;
undefined *puVar2;
int iVar3;
undefined8 uVar4;
char *pcVar5;
undefined8 uVar6;
char *pcVar7;
undefined *puVar8;
undefined8 *puVar9;
long in_FS_OFFSET;
undefined8 local_b8;
char *local_b0;
char *local_a8 [5];
char *local_80;
char *local_78;
char *local_70;
undefined *local_68;
char *local_60;
undefined8 local_58;
undefined8 local_50;
undefined8 local_40;
uVar6 = DAT_00122458;
local_40 = *(undefined8 *)(in_FS_OFFSET + 0x28);
if (param_1 != 0) {
uVar4 = dcgettext(0,"Try '%s --help' for more information.
",5);
__fprintf_chk(*(undefined8 *)PTR_stderr_00121ff0,1,uVar4,uVar6);
goto LAB_0010922d;
}
uVar4 = dcgettext(0,"Usage: %s [OPTION]... [FILE]...
",5);
__printf_chk(1,uVar4,uVar6);
puVar2 = PTR_stdout_00121fa8;
pFVar1 = *(FILE **)PTR_stdout_00121fa8;
pcVar5 = (char *)dcgettext(0,"List information about the FILEs (the current directory by default). Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0,"Mandatory arguments to long options are mandatory for short options too.
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -a, --all do not ignore entries starting with .
-A, --almost-all do not list implied . and ..
--author with -l, print the author of each file
-b, --escape print C-style escapes for nongraphic characters
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," --block-size=SIZE with -l, scale sizes by SIZE when printing them; e.g., '--block-size=M'; see SIZE format below
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -B, --ignore-backups do not list implied entries ending with ~
-c with -lt: sort by, and show, ctime (time of last
modification of file status information); with -l: show ctime and sort by name; otherwise: sort by ctime, newest first
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -C list entries by columns
--color[=WHEN] colorize the output; WHEN can be 'always' (default if omitted), 'auto', or 'never'; more info below
-d, --directory list directories themselves, not their contents
-D, --dired generate output designed for Emacs' dired mode
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -f do not sort, enable -aU, disable -ls --color
-F, --classify append indicator (one of */=>@|) to entries
--file-type likewise, except do not append ' '*
--format=WORD across -x, commas -m, horizontal -x, long -l, single-column -1, verbose -l, vertical -C
--full-time like -l --time-style=full-iso
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -g like -l, but do not list owner
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," --group-directories-first
group directories before files;
can be augmented with a --sort option, but any
use of --sort=none (-U) disables grouping
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -G, --no-group in a long listing, don't print group names
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -h, --human-readable with -l and -s, print sizes like 1K 234M 2G etc.
--si likewise, but use powers of 1000 not 1024
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -H, --dereference-command-line
follow symbolic links listed on the command line
--dereference-command-line-symlink-to-dir
follow each command line symbolic link
that points to a directory
--hide=PATTERN do not list implied entries matching shell PATTERN
(overridden by -a or -A)
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," --hyperlink[=WHEN] hyperlink file names; WHEN can be 'always'
(default if omitted), 'auto', or 'never'
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," --indicator-style=WORD append indicator with style WORD to entry names:
none (default), slash (-p),
file-type (--file-type), classify (-F)
-i, --inode print the index number of each file
-I, --ignore=PATTERN do not list implied entries matching shell PATTERN
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -k, --kibibytes default to 1024-byte blocks for disk usage;
used only with -s and per directory totals
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -l use a long listing format
-L, --dereference when showing file information for a symbolic
link, show information for the file the link
references rather than for the link itself
-m fill width with a comma separated list of entries
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -n, --numeric-uid-gid like -l, but list numeric user and group IDs
-N, --literal print entry names without quoting
-o like -l, but do not list group information
-p, --indicator-style=slash
append / indicator to directories
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -q, --hide-control-chars print ? instead of nongraphic characters
--show-control-chars show nongraphic characters as-is (the default,
unless program is 'ls' and output is a terminal)
-Q, --quote-name enclose entry names in double quotes
--quoting-style=WORD use quoting style WORD for entry names:
literal, locale, shell, shell-always, shell-escape, shell-escape-always, c, escape
(overrides QUOTING_STYLE environment variable)
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -r, --reverse reverse order while sorting
-R, --recursive list subdirectories recursively
-s, --size print the allocated size of each file, in blocks
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -S sort by file size, largest first
--sort=WORD sort by WORD instead of name: none (-U), size (-S),
time (-t), version (-v), extension (-X)
--time=WORD change the default of using modification times;
access time (-u): atime, access, use;
change time (-c): ctime, status;
birth time: birth, creation;
with -l, WORD determines which time to show;
with --sort=time, sort by WORD (newest first)
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," --time-style=TIME_STYLE time/date format with -l; see TIME_STYLE below
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -t sort by time, newest first; see --time
-T, --tabsize=COLS assume tab stops at each COLS instead of 8
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -u with -lt: sort by, and show, access time;
with -l: show access time and sort by name;
otherwise: sort by access time, newest first
-U do not sort; list entries in directory order
-v natural sort of (version) numbers within text
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," -w, --width=COLS set output width to COLS. 0 means no limit
-x list entries by lines instead of by columns
-X sort alphabetically by entry extension
-Z, --context print any security context of each file
-1 list one file per line. Avoid '\n' with -q or -b
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," --help display this help and exit
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0," --version output version information and exit
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0,"\nThe SIZE argument is an integer and optional unit (example: 10K is 10*1024). Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000). Binary prefixes can be used, too: KiB=K, MiB=M, and so on.
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0,"\nThe TIME_STYLE argument can be full-iso, long-iso, iso, locale, or +FORMAT. FORMAT is interpreted like in date(1). If FORMAT is FORMAT1<newline>FORMAT2, then FORMAT1 applies to non-recent files and FORMAT2 to recent files. TIME_STYLE prefixed with 'posix-' takes effect only outside the POSIX locale. Also the TIME_STYLE environment variable sets the default style to use.
",5);
fputs_unlocked(pcVar5,pFVar1);
pFVar1 = *(FILE **)puVar2;
pcVar5 = (char *)dcgettext(0,"\nUsing color to distinguish file types is disabled both by default and with --color=never. With --color=auto, ls emits color codes only when standard output is connected to a terminal. The LS_COLORS environment variable can change the settings. Use the dircolors command to set it.
",5);
fputs_unlocked(pcVar5,pFVar1);
local_b8 = 0x11a0a5;
local_b0 = "test invocation";
local_a8[0] = "coreutils";
local_a8[1] = "Multi-call invocation";
local_a8[4] = "sha256sum";
local_a8[2] = "sha224sum";
local_78 = "sha384sum";
local_a8[3] = "sha2 utilities";
local_80 = "sha2 utilities";
local_70 = "sha2 utilities";
local_68 = &DAT_00119981;
local_60 = "sha2 utilities";
local_58 = 0;
local_50 = 0;
puVar9 = &local_b8;
do {
puVar8 = (undefined *)puVar9;
pcVar5 = *(char **)(puVar8 + 0x10);
puVar9 = (undefined8 *)(puVar8 + 0x10);
if (pcVar5 == (char *)0x0) break;
} while (((*pcVar5 != 'l') || (pcVar5[1] != 's')) || (pcVar5[2] != '\0'));
pcVar5 = *(char **)(puVar8 + 0x18);
if (pcVar5 == (char *)0x0) {
uVar6 = dcgettext(0,"\n%s online help: <%s>
",5);
__printf_chk(1,uVar6,"GNU coreutils","https://www.gnu.org/software/coreutils/");
pcVar5 = setlocale(5,(char *)0x0);
if (pcVar5 != (char *)0x0) {
iVar3 = strncmp(pcVar5,"en_",3);
if (iVar3 != 0) {
pcVar5 = "ls";
goto LAB_0010986b;
}
}
uVar6 = dcgettext(0,"Full documentation <%s%s>
",5);
pcVar5 = "ls";
pcVar7 = " invocation";
__printf_chk(1,uVar6,"https://www.gnu.org/software/coreutils/","ls");
}
else {
uVar6 = dcgettext(0,"\n%s online help: <%s>
",5);
__printf_chk(1,uVar6,"GNU coreutils","https://www.gnu.org/software/coreutils/");
pcVar7 = setlocale(5,(char *)0x0);
if (pcVar7 != (char *)0x0) {
iVar3 = strncmp(pcVar7,"en_",3);
if (iVar3 != 0) {
LAB_0010986b:
pFVar1 = *(FILE **)puVar2;
pcVar7 = (char *)dcgettext(0,"Report any translation bugs to <https://translationproject.org/team/>
",5);
fputs_unlocked(pcVar7,pFVar1);
}
}
uVar6 = dcgettext(0,"Full documentation <%s%s>
",5);
pcVar7 = " invocation";
__printf_chk(1,uVar6,"https://www.gnu.org/software/coreutils/","ls");
if (pcVar5 != "ls") {
pcVar7 = "";
}
}
uVar6 = dcgettext(0,"or available locally via: info '(coreutils) %s%s'
",5);
__printf_chk(1,uVar6,pcVar5,pcVar7);
LAB_0010922d:
/* WARNING: Subroutine does not return */
exit(param_1);
}
Does it look ok? |
It does. I have been testing a variety of context window sizes, and while there are fewer problems with the larger context windows, there are problems there too. Would you like me to find one of those for you? |
Yes, if you have something handy, feel free to share it so we can analyze. In general, I am sure there will be failure cases for greedy sampling. But my point is that for most tasks such as knowledge extraction, coding and overall anything that requires accuracy, it's just better to do greedy sampling. Doing so inherently gives you the most accurate tokens. Using things like repetition penalty could help you get out of some loops in certain edge cases, but IMO it's not really worth it to sacrifice the accuracy that you would otherwise get without using the penalty. Btw, what is even better than greedy sampling is beam search (same strategy as the one used in Whisper). But this is more difficult to implement for streaming cases and comes with a bigger performance penalty, so that's why I think it hasn't been adopted yet by text-generation use cases. For local usage, I think it makes sense and I'm planning to implement this as an option for code completion use cases. |
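As a small illustration of the beam search point, the sketch below runs greedy decoding (beam width 1) and beam search (beam width 2) over a hand-written probability table. The table and the names `TABLE`, `next_probs` and `beam_search` are invented for this example; it is not how Whisper or llama.cpp implement it, only the basic idea of keeping several partial sequences alive.

```python
import math

# Toy "model": next-token probabilities for each prefix (made-up numbers).
TABLE = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"C": 0.55, "D": 0.45},
    ("B",): {"C": 0.9, "D": 0.1},
}


def next_probs(prefix):
    """Look up the next-token distribution for a prefix."""
    return TABLE[tuple(prefix)]


def beam_search(next_probs, steps, beam_width):
    """Return (tokens, log_prob) of the best sequence found after `steps` tokens."""
    beams = [((), 0.0)]  # each beam is (token tuple, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for token, prob in next_probs(tokens).items():
                candidates.append((tokens + (token,), score + math.log(prob)))
        # Keep only the best `beam_width` partial sequences.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


greedy_tokens, greedy_score = beam_search(next_probs, steps=2, beam_width=1)
beam_tokens, beam_score = beam_search(next_probs, steps=2, beam_width=2)
print(greedy_tokens, round(math.exp(greedy_score), 2))
print(beam_tokens, round(math.exp(beam_score), 2))
```

Greedy commits to `A` because it is the best first token and ends with a sequence of probability 0.33, while the width-2 beam keeps `B` around and finds `B C` at 0.36. The cost is that every step evaluates `beam_width` prefixes, which is the performance penalty mentioned above.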
This issue was closed because it has been inactive for 14 days since being marked as stale. |
Name and Version
./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: yes
version: 4840 (3ffbbd5)
built with Ubuntu clang version 18.1.8 (++20240731024944+3b5b5c1ec4a3-1
exp120240731145000.144) for x86_64-pc-linux-gnu
Operating systems
Linux
GGML backends
CUDA
Hardware
i9-13900HX + NVIDIA GeForce RTX 4070
Models
bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
Problem description & steps to reproduce
cc/@ochafik
I am attempting to run BFCL on llama-server, and so far I have triggered a crash twice. It does not appear to be deterministic, unfortunately. In one instance, I was able to catch the crash with gdb. Here is the end of the backtrace:
The remaining 87096 stack frames were identical. So while I have not been able to find the exact input that triggered the crash yet, I hoped that this might be enough of a clue as to what is going on.
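The backtrace itself is not reproduced here, so the function involved is not known from this text. As a general illustration of how tens of thousands of identical frames can arise, the sketch below compares a routine that recurses once per input character with an equivalent loop. Both functions are hypothetical and unrelated to llama.cpp's code. Python guards against this with `RecursionError`, whereas a native program would simply run out of stack.

```python
def count_run_recursive(text, ch, i=0):
    """Length of the run of `ch` starting at index i, one stack frame per character."""
    if i >= len(text) or text[i] != ch:
        return 0
    return 1 + count_run_recursive(text, ch, i + 1)


def count_run_iterative(text, ch, i=0):
    """Same result with a loop, so stack depth stays constant."""
    n = 0
    while i + n < len(text) and text[i + n] == ch:
        n += 1
    return n


long_input = "X" * 87096  # same order of magnitude as the frame count above

print(count_run_iterative(long_input, "X"))

try:
    count_run_recursive(long_input, "X")
    recursion_failed = False
except RecursionError:
    # Depth grows with input length, so a long enough input exhausts the stack.
    recursion_failed = True
print(recursion_failed)
```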
Here is some more information about what I am doing:
/home/ed/Projects/llama.cpp/build/bin/llama-server --ctx-size 0 --jinja -fa -hf bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M --host 0.0.0.0 -ngl 100
python /home/ed/Projects/gorilla/berkeley-function-call-leaderboard/venv/bin/bfcl generate --model gpt-4-turbo-2024-04-09-FC --test-category all --include-input-log
First Bad Commit
No response
Relevant log output