Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
When building with Zig, running the example command for Alpaca results in an interactive prompt that I can type in.
Current Behavior
When building with Zig, running the example command for Alpaca results in an interactive prompt that exits immediately, without producing an error message. Non-interactive mode works fine.
When using Command Prompt specifically, the process exits and leaves the console text green; that doesn't happen in PowerShell or Git Bash, which reset the console color themselves. I think that means the program never reaches the line that resets the console color on exit.
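To illustrate what I mean, here is a minimal sketch using ANSI escape sequences (my own example, not code from the repo):

```cpp
#include <cstdio>

int main() {
    // --color sets user input to green via an ANSI escape sequence
    // (assumes virtual terminal processing is enabled on the console).
    std::printf("\x1b[32m> green prompt text\n");

    // If the process exits before a reset like this one runs, Command Prompt
    // keeps the green attribute -- which matches what I'm seeing.
    std::printf("\x1b[0m"); // reset all console attributes
    return 0;
}
```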
The commands, for reference:
zig build -Drelease-fast
.\zig-out\bin\main.exe -m .\models\ggml-alpaca-7b-q4.bin --color -f .\prompts\alpaca.txt --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
I also tried using LLaMA in interactive mode, which resulted in the same behavior.
.\zig-out\bin\main.exe -m D:\llama\LLaMA\7B\ggml-model-q4_0.bin -ins
Building with MSVC via CMake produces a binary that works perfectly fine (besides also leaving the Command Prompt console text green when exiting with Ctrl+C; see the sketch after the commands below).
mkdir build
cd build
cmake ..
cmake --build . --config Release
cd ..
.\build\bin\Release\main.exe -m .\models\ggml-alpaca-7b-q4.bin -f .\prompts\alpaca.txt --color --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
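For the Ctrl+C case, I would guess the color could be restored from a signal handler before exiting. A minimal sketch of that idea (hypothetical; not the repo's actual handler):

```cpp
#include <csignal>
#include <cstdio>
#include <cstdlib>

// Hypothetical Ctrl+C handler: reset console attributes before exiting so
// the terminal isn't left green. (printf is not strictly async-signal-safe,
// but it is fine for a quick experiment.)
static void sigint_handler(int /*signo*/) {
    std::printf("\x1b[0m");
    std::fflush(stdout);
    std::_Exit(130); // conventional exit status for SIGINT
}

int main() {
    std::signal(SIGINT, sigint_handler);
    // ... interactive loop would run here ...
    return 0;
}
```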
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
- Physical (or virtual) hardware you are using:
Device name: DESKTOP-HP640DM
Processor: 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
Installed RAM: 24.0 GB (23.8 GB usable)
System type: 64-bit operating system, x64-based processor
Pen and touch: Pen support
- Operating System:
Edition: Windows 10 Home
Version: 22H2
Installed on: 3/17/2021
OS build: 19045.2728
Experience: Windows Feature Experience Pack 120.2212.4190.0
- SDK version:
> pyenv exec python3 --version
Python 3.10.9
> cmake --version
cmake version 3.20.0-rc5
> cmake ..
-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.22000.0 to target Windows 10.0.19045.
-- The C compiler identification is MSVC 19.29.30148.0
-- The CXX compiler identification is MSVC 19.29.30148.0
<snip>
> zig version
0.10.1
> git log | head -1
commit 74f5899df4a6083fc467b620baa1cf821e37799d
- Model checksums:
> md5sum .\models\ggml-alpaca-7b-q4.bin
7a81638857b7e03f7e3482f3e68d78bc *.\models\ggml-alpaca-7b-q4.bin
> md5sum D:\llama\LLaMA\7B\ggml-model-q4_0.bin
b96f7e3c1cd6dcc6ffd9aaf975b776e5 *D:\llama\LLaMA\7B\ggml-model-q4_0.bin
Failure Information (for bugs)
Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- Clone the repo (74f5899)
- Run
zig build -Drelease-fast
- Run the example command (adjusted slightly for the env):
.\zig-out\bin\main.exe -m .\models\ggml-alpaca-7b-q4.bin --color -f .\prompts\alpaca.txt --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
- Observe that the process exits immediately after reading the prompt (a minimal stdin probe to help isolate this is sketched below)
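Since the interactive loop presumably blocks on standard input (I'm assuming it reads lines with std::getline), a tiny probe built with both toolchains might show whether stdin itself behaves differently under the Zig build. This is my own diagnostic sketch, not code from the repo:

```cpp
#include <iostream>
#include <string>

int main() {
    std::string line;
    // Block on stdin the way an interactive loop would.
    bool ok = static_cast<bool>(std::getline(std::cin, line));
    std::cerr << "getline ok=" << ok
              << " eof=" << std::cin.eof()
              << " fail=" << std::cin.fail() << "\n";
    return 0;
}
```

If the Zig-built probe (e.g. compiled with zig c++) reports eof=1 immediately in Command Prompt while the MSVC build waits for input, that would point at the toolchain's console/stdin handling rather than at llama.cpp itself.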
Failure Logs
Running the Zig build:
> .\zig-out\bin\main.exe -m .\models\ggml-alpaca-7b-q4.bin --color -f .\prompts\alpaca.txt --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
main: seed = 1681616581
llama.cpp: loading model from .\models\ggml-alpaca-7b-q4.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size = 1024.00 MB
system_info: n_threads = 7 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:
'
sampling: temp = 0.200000, top_k = 10000, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.000000
generate: n_ctx = 2048, n_batch = 256, n_predict = -1, n_keep = 23
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- If you want to submit another line, end your input in '\'.
Below is an instruction that describes a task. Write a response that appropriately completes the request.
>
> # (process exited automatically)
Running the MSVC/CMake build:
> .\build\bin\Release\main.exe -m .\models\ggml-alpaca-7b-q4.bin -f .\prompts\alpaca.txt --color --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
main: seed = 1681616683
llama.cpp: loading model from .\models\ggml-alpaca-7b-q4.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size = 1024.00 MB
system_info: n_threads = 7 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:
'
sampling: temp = 0.200000, top_k = 10000, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.000000
generate: n_ctx = 2048, n_batch = 256, n_predict = -1, n_keep = 23
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- If you want to submit another line, end your input in '\'.
Below is an instruction that describes a task. Write a response that appropriately completes the request.
> Tell me something I don't know.
The longest river in the world is the Nile River, which stretches 6,650 km (4,130 miles) across the continent of Africa.
>
> # (Ctrl+C)