Skip to content

xxh3 hash ignores seed if string? #10305

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lolli42 opened this issue Jan 12, 2023 · 12 comments · Fixed by #15236
Closed

xxh3 hash ignores seed if string? #10305

lolli42 opened this issue Jan 12, 2023 · 12 comments · Fixed by #15236

Comments

@lolli42
Copy link

lolli42 commented Jan 12, 2023

Description

The following code:

<?php
var_dump(hash('xxh3', 'foo'));
# same hash?
var_dump(hash('xxh3', 'foo', options: ['seed' => 'mySeed']));
# different hash!
var_dump(hash('xxh3', 'foo', options: ['seed' => 42]));

Resulted in this output:

string(16) "ab6e5f64077e7d8a"
string(16) "ab6e5f64077e7d8a"
string(16) "d33da54e20ebf99e"

I'd expect the second hash to be different from the first one, just like the third hash. It seems a string as 'seed' is ignored. In case this is intended, it might be good to document this, or maybe except on string non-int input?

PHP Version

Tested with PHP 8.1.13 & 8.2.1

@damianwadley
Copy link
Member

There's very little documentation for many of the hash algorithms, but XXH3 accepts a numeric seed or a string secret of sufficient length (apparently >=136 bytes).
https://github.com/php/php-src/blob/PHP-8.2.1/ext/hash/hash_xxhash.c#L157

It makes sense to me that PHP should complain if either is provided incorrectly, yet right now it silently ignores a non-integer seed.

@cmb69
Copy link
Member

cmb69 commented Jan 13, 2023

It seems to me we should use zval_get_long() instead of actually requiring an IS_LONG value.

And while not directly related to this ticket, we also should review the convert_to_string() call for secret. It apparently is not properly prepared to handle exceptions from ::__toString().

@nielsdos
Copy link
Member

nielsdos commented Jan 13, 2023

If no one is working on fixing this yet, I'd like to give this a shot later today :) (both this issue and the toString exception)

@nielsdos
Copy link
Member

I worked on this, and used zval_get_long and checked for exceptions (see below).
Now numeric strings are handled properly. The remaining question is how to handle non-numeric strings for seed like in OP's example. Currently non-numeric strings are zero, but is this appropriate? XXH3 really wants numbers as a seed according to https://hexdocs.pm/xxh3/0.2.0/xxh3.html.

Additional test here for testing strings:

var_dump(hash('xxh3', 'foo', options: ['seed' => '1'])); // 54557a2c8b633298
var_dump(hash('xxh3', 'foo', options: ['seed' => 1])); // 54557a2c8b633298

class StringableThrowingClass {
    public function __toString(): string {
        throw new Exception('exception in __toString');
        return '';
    }
}

$testString = str_repeat('a', 136);

class StringableClass {
    public function __toString(): string {
        global $testString;
        return $testString;
    }
}

try {
    var_dump(hash('xxh3', 'foo', options: ['secret' => new StringableThrowingClass]));
} catch (Exception $e) {
    echo $e->getMessage() . "\n";
}
var_dump(hash('xxh3', 'foo', options: ['secret' => new StringableClass]));
var_dump(hash('xxh3', 'foo', options: ['secret' => $testString]));

Current changes/patch:

diff --git a/ext/hash/hash_xxhash.c b/ext/hash/hash_xxhash.c
index 7ecedd8128..36540e91b9 100644
--- a/ext/hash/hash_xxhash.c
+++ b/ext/hash/hash_xxhash.c
@@ -168,13 +168,16 @@ zend_always_inline static void _PHP_XXH3_Init(PHP_XXH3_64_CTX *ctx, HashTable *a
                        return;
                }
 
-               if (_seed && IS_LONG == Z_TYPE_P(_seed)) {
+               if (_seed) {
+                       zend_long seed_as_long = zval_get_long(_seed);
                        /* This might be a bit too restrictive, but thinking that a seed might be set
                                once and for all, it should be done a clean way. */
-                       func_init_seed(&ctx->s, (XXH64_hash_t)Z_LVAL_P(_seed));
+                       func_init_seed(&ctx->s, (XXH64_hash_t)seed_as_long);
                        return;
                } else if (_secret) {
-                       convert_to_string(_secret);
+                       if (!try_convert_to_string(_secret)) {
+                               return;
+                       }
                        size_t len = Z_STRLEN_P(_secret);
                        if (len < PHP_XXH3_SECRET_SIZE_MIN) {
                                zend_throw_error(NULL, "%s: Secret length must be >= %u bytes, %zu bytes passed", algo_name, XXH3_SECRET_SIZE_MIN, len);

@weltling
Copy link
Contributor

weltling commented Jan 14, 2023

While involving zval_get_long sounds good, it still should be a sane value. The hash is not cryptographic, but if a seed value is garbage it can possibly cause harm. Say like in this case, mySeed will evaluate to 0 which needs to be warned. Same also, say 1abc will have same effect as '1'. Same if it evaluates to negative, or is a negative - passing that is perhaps not wrong as internally it will wrap, but it's a non obvious behavior. IMO in general any non numeric string should be considered invalid. While on it, checking more on sanity would be not a waste.

Thansk

@nielsdos
Copy link
Member

I'm going to PR the string exception fix later today, because that can stand on its own as a commit and shouldn't impact BC.

As for the string seed: I think it might be a bad idea after all to use zval_get_long because of the implicit behaviour with strings like "1abc" and non-numeric strings (like the concern weltling raised in the previoud comment). Perhaps it is best to be strict about this and only allow LONGs (or floats that are actually longs), and if the input is any other type we could throw.
Also interesting is that I found that murmurhash also implicitly ignores its option if it is not a long, so that needs changing as well... I didn't check the other hash functions though. In short, I would like to propose to let those functions throw if a non-long arg is passed. That solution however technically breaks BC and thus should go into 8.3... what do y'all think? :)

@Girgias
Copy link
Member

Girgias commented Jan 17, 2023

I'm going to PR the string exception fix later today, because that can stand on its own as a commit and shouldn't impact BC.

As for the string seed: I think it might be a bad idea after all to use zval_get_long because of the implicit behaviour with strings like "1abc" and non-numeric strings (like the concern weltling raised in the previoud comment). Perhaps it is best to be strict about this and only allow LONGs (or floats that are actually longs), and if the input is any other type we could throw. Also interesting is that I found that murmurhash also implicitly ignores its option if it is not a long, so that needs changing as well... I didn't check the other hash functions though. In short, I would like to propose to let those functions throw if a non-long arg is passed. That solution however technically breaks BC and thus should go into 8.3... what do y'all think? :)

This may or may not need a deprecation and could be added to the mass PHP 8.3 deprecation RFC, but at least having an implementation to look at the impact is a good idea.

Girgias pushed a commit that referenced this issue Jan 17, 2023
The initialization routine for XXH3 was not prepared for exceptions from seed.
Fix this by using try_convert_to_string.

For discussion, please see: GH-10305

Closes GH-10352

Signed-off-by: George Peter Banyard <[email protected]>
@nielsdos
Copy link
Member

nielsdos commented Jan 20, 2023

This may or may not need a deprecation and could be added to the mass PHP 8.3 deprecation RFC, but at least having an implementation to look at the impact is a good idea.

I agree with this, adding the current behaviour to the deprecation RFC would be a good idea.
I'm however unsure how to proceed from here. I was thinking of creating two PRs:

  1. A PR targeting master that adds a deprecation warning for the current behaviour.
  2. A (draft) PR targeting master (but for PHP>8.3) where the new behaviour is introduced.

In that way, we can see the impact on current and on future versions of PHP.
Does that sound good to you?

@lolli42
Copy link
Author

lolli42 commented Feb 26, 2024

I hope its ok to ping here. Same result with script from initial post and PHP 8.3.3.

@nielsdos
Copy link
Member

I hope its ok to ping here. Same result with script from initial post and PHP 8.3.3.

Of course it's okay to ping.

The consensus was that we can't make it throw without having it deprecated first. So I added it to the (in-draft) 8.4 deprecation rfc as a todo: https://wiki.php.net/rfc/deprecations_php_8_4#non-numeric_seed_strings_in_xxh3

@lolli42
Copy link
Author

lolli42 commented Apr 27, 2024

Note the same issue exists with xxh128 (the 128 bit variant of 64 bit xxh3), and maybe other hash algos of that family?

@nielsdos
Copy link
Member

Note the same issue exists with xxh128 (the 128 bit variant of 64 bit xxh3), and maybe other hash algos of that family?

You're correct, I changed the stub in the RFC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants