-
Notifications
You must be signed in to change notification settings - Fork 273
Enhance split_string to support splitting on white space [TG-2922] #2071
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance split_string to support splitting on white space [TG-2922] #2071
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I like the thorough tests. Consider squashing the commits to have only one function adjustment commit and one unit test commit.
Nit: The |
@svorenova No - it was (I assume) written by @danpoe before he started at Diffblue so I assume can be left unchanged. |
@svorenova Also respectfully disagree with squashing suggestion - I think current structure allows for checking a) correctly migrate existing unit tests b) correctly refactor existing unit tests c) correctly add unit tests for current behaviour d) correctly modify behaviour Instead I propose squashing into:
LMK if that's OK |
Yes, that sounds great, thank you. |
c00350b
to
6609357
Compare
Rebased as suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
expected_results.strip_no_remove = {""}; | ||
|
||
// TODO(tkiley): This is probably wrong, since I'd expect removing empty | ||
// TODO(tkiley): elements to return an empty vector here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, we could change this to return an empty vector. I've looked at the places where split_string()
is currently used, and they would also work when returning an empty vector (except an assertion assert(!result.empty())
would need to be removed in util/unwind.cpp
). Also once changed, the current usages of split_string()
can be slightly simplified as some of them have a check for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent. Since this PR is already a pre-req for another I will put together a separate PR to address this.
Specifically requesting a review from @chrisr-diffblue (somewhat arbitrarily). |
src/util/string_utils.cpp
Outdated
assert(result.empty()); | ||
assert(!std::isspace(delim)); | ||
if(!result.empty()) | ||
throw std::invalid_argument("result: vector must be empty"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have used PRECONDITION
here as we don't generally throw exceptions unless it's triggerable by user input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah reasonable - of course the method should just return the vector - if not hard will make as part of subsequent PR for empty elements.
…trip Also improved comments (and moved to doxygen) as I assumed strip would remove whitespace from throughout the string but actually only does the final string Replace asserts in string utils Adding tests for behaviour with whitespace delimiter
6609357
to
a385d9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor comments, but nothing that would block this from my point of view.
src/util/string_utils.cpp
Outdated
{ | ||
throw std::invalid_argument( | ||
"delim can't be a space character if using strip"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason that this condition/throw block can't just be an INVARIANT ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the unit test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - ok, though that smells a little bit of the mechanics of unit testing driving the architecture of the code, but I don't have a hugely strong opinion. I just felt that an INVARIANT, or perhaps actually PRECONDITION now I think more, was more inline with other functions in this file that do things like PRECONDITION(result.empty());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am with @chrisr-diffblue: I think this should be a PRECONDITION
. I think invariant.h
ought to have the facilities to turn that into something that can be caught, and hence could be used for testing error paths in unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree - I think of all the code we write, util
code is the most "library like" and not throwing limits the utility of this function. Imagine Using split_string
on a user input: I now have to duplicate the error handling code if I want to allow the user to try again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @thk123 does make a good point about the 'library like' use-case - perhaps the answer would be to have 'CATCHABLE_PRECONDITION' along with @tautschnig suggestion to enable error path testing? - i.e. in the library use-case Thomas is talking about, it could be caught by the caller to avoid duplicating the error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two different "users" to be distinguished: the developer using the library and the end-user feeding input to what the developer has built. The end-user should not be subjected to internal limitations, but for a developer it's reasonable to assume that they have read the documentation.
It is of course perfectly reasonable to use exceptions for either class of "users," but that's not what the code base does right now. If you think this should change, I'd go via a new issue and/or have some further discussion. Because your proposed code is of course very nice and clean I'd keep it so that a future patch to introduce the exceptions is very low effort.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In all this I do think that PRECONDITION_STRUCTURED
should be able to support this. There are a few cases of INVARIANT_STRUCTURED
in the code base. (But also invariant.h
says something about the future...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I'm not clear on what you propose I do with this code for this PR? Do you mean keep the code in this PR, or change this PR to a precondition (and remove the associated unit tests) and stash this version away for a later day?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is what I'd do if it were my PR: if things can be made to work via PRECONDITION_STRUCTURED
, use it. Else stash it away for a later day and drop a note in an issue so that it gets tracked.
@@ -87,7 +106,11 @@ void split_string( | |||
std::string &right, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cheeky of me, but as you are touching this function, do you fancy adding some Doxygen? ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will pick this up as part of the follow on tidy up PR as this is blocking another PR #2046
Gah got the code ownership wrong, still need someone to vouch for unit changes. @tautschnig, as you've already looked at this a bit would you mind? |
@tautschnig Unfortunately |
@tautschnig If you're happy (as implied by a 👍) could you approve? Apparently don't yet have sufficient code owner approval. |
ad62682 Merge pull request diffblue#2071 from thk123/refactor/split-string-unit-tests fc8ba88 Revert to aborting precondition for function inputs 3e2ab6f Merge pull request diffblue#2080 from diffblue/java-bytecode-dependency 6ff1eec cbmc: remove dependency on java_bytecode 0bff19b Merge pull request diffblue#2049 from karkhaz/kk-factor-goto-model-processing 79e3b25 Merge pull request diffblue#2084 from tautschnig/has_subtype-test cd45b0b Test has_subtype on recursive data types 85ac315 Merge pull request diffblue#2082 from thomasspriggs/default_dstring_hash 28c2e8b Merge pull request diffblue#2065 from tautschnig/be-constructor afa6023 Merge pull request diffblue#2061 from tautschnig/simplify-extractbits 014d151 Factor out getting & processing goto-model 06b3adc Merge pull request diffblue#2077 from peterschrammel/stable-tmp-var-names 0b3170d Stabilize clinit wrapper function type parameters 3cd8bf4 Temporary vars tests for goto-diff 9f0626c Prefix temporary var names with function id ca678aa More permissive regression tests regarding tmp var suffixes 47951ca Merge pull request diffblue#2079 from romainbrenguier/bugfix/has-subtype-recursion dd73b1a Specify default hash function of `dstringt` to STL. fe8e589 Avoid infinite recursion in has_subtype 00b9bf6 Merge pull request diffblue#2051 from svorenova/generics_tg2717 cd4bfc3 Merge pull request diffblue#2078 from romainbrenguier/bool-literal-in-while-loop 67ea889 Use bool literal in while loop d229ad9 Merge pull request diffblue#2056 from diffblue/fix-regression-cbmc-memcpy1 506faf0 Refactor a function for base existence 617d388 Utility functions for generic types c07e6ca Update generic specialization map when replacing pointers ed26d0a Merge pull request diffblue#2058 from peterschrammel/stable-disjuncts b663734 Simplify extractbits(concatenation(...)) b091560 Typing and refactoring of simplify_extractbits 49ad1bd Merge pull request diffblue#974 from tautschnig/fix-assert-encoding 16e9599 Merge pull request diffblue#2063 from tautschnig/has-subtype 950f58b Merge pull request diffblue#2060 from tautschnig/opt-local-map 4222a94 Regression tests for unstable instanceof and virtual method disjuncts b44589e Make disjuncts in instanceof removal independent of class loading 3afff86 Make disjuncts in virtual method removal independent of class loading a385d9b Allowed split_string to be used on whitespace if not also trying to strip fe4a642 Merge pull request diffblue#2062 from tautschnig/no-has-deref 145f474 Adding tests for empty string edge cases 07009d4 Refactored test to run all combinations 252c24c Migrate old string utils unit tests e87edbf Removing wrong elements from the make file b165c52 make test work on 32-bit Linux b804570 Merge pull request diffblue#2048 from JohnDumbell/improvement/adding_null_object_id 61f14d8 Merge pull request diffblue#1962 from owen-jones-diffblue/owen-jones-diffblue/simplify-replace-java-nondet fdee7e8 Merge pull request diffblue#2059 from tautschnig/generalise-test 4625cc5 Extend global has_subexpr to take a predicate as has_subtype does e9ebd59 has_subtype(type, pred, ns) to search for contained type matching pred 1f1f67f Merge pull request diffblue#1889 from hannes-steffenhagen-diffblue/develop-feature_generate_function_bodies 048c188 Add unit test for java_replace_nondet 0fe48c9 Make remove_java_nondet work before remove_returns bcc4dc4 Use byte_extract_exprt constructor a1814a3 Get rid of thin (and duplicate) has_dereference wrapper 4122a28 Test to demonstrate assert bug on alpine d44bfd3 Also simplify assert structure found on alpine linux c5cde18 Do not generate redundant if statements in assert expansions 4fb0603 Make is_skip publicly available and use constant argument 5832ffd Negative numbers should also pass the test 3c23b28 Consistently disable simplify_exprt::local_replace_map da63652 Merge pull request diffblue#2054 from romainbrenguier/bugfix/clear-equations d77f6a2 Merge pull request diffblue#1831 from NathanJPhillips/feature/class-annotations 60c8296 Clear string_refinement equations (not dependencies) 314ed53 Correcting the value of ID_null_object 751a882 Factor out default CBMC options to static method 6f24009 Can now test for an option being set in optionst 9a8d937 Add to_annotated_type and enable type_checked_cast for annotated_typet ca77b4e Add test for added annotations b06a27d Introduce abstract qualifierst base class e6fb3bf Pretty printing of java_class_typet e22b95b Fix spurious virtual function related keywords 3ac6d17 Add type_dynamic_cast and friends for java_class_typet ce1f4d2 Add annotations to java_class_typet, its methods and fields f84753d Merge pull request diffblue#2042 from hannes-steffenhagen-diffblue/add_deprecate_macro 7a38669 Merge pull request diffblue#2017 from NathanJPhillips/feature/overlay-classes 75a4aec Revert "the deprecation will need to wait until codebase is clean" 67735b5 Disable deprecation warnings by default 0764f77 Merge pull request diffblue#2036 from romainbrenguier/id_array_list 690b606 Merge pull request diffblue#2039 from peterschrammel/fix-duplicate-msg-json-ui bba17d9 the deprecation will need to wait until codebase is clean 822c757 Regression test for redundant JSON message output de0644d Only force end of previous message if there actually is one. 5a637bf Merge pull request diffblue#2037 from hannes-steffenhagen-diffblue/add_deprecate_macro bff456a Merge pull request diffblue#2040 from tautschnig/remove-swp 87ebe90 Remove vim temp file 228c019 Fix duplicate message output in json-ui 0a2c43e Add DEPRECATED to functions documented as \deprecated 47f4b35 interval_sparse_arrayt constructor from array-list 026c4ca Declare an array_list_exprt class 50a2696 Define ID_array_list 513b67a Merge pull request diffblue#2038 from romainbrenguier/bugfix/assign-at-malloc-site c207106 Test with array of strings 60183a3 Assign string at malloc site 116fffd Add DEPRECATED macro to mark deprecated functions and variables 7952f2c Add option to generate function body to goto-instrument 3d4183a Add ability to overlay classes with new definitions of existing methods dbc2f71 Improve code and error message in infer_opaque_type_fields 7c0ea4d Tidied up java_class_loader_limitt git-subtree-dir: cbmc git-subtree-split: ad62682
Split_string now won't error if you give it a whitespace delimiter and don't require it strip whitespace. Migrated the unit tests into Catch and added a few more examples. Made the utility throw exceptions rather than immediately terminate to allow the error cases to be unit tested as well.
The unit tests document the current behaviour. However, in some edge cases I believe the current behaviour is confusing. If someone knows for certain we don't depend on that behaviour I can prepare a PR to "fix" the behaviour.
Review commit by commit.