Skip to content

Enhance split_string to support splitting on white space [TG-2922] #2071

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 19, 2018

Conversation

thk123
Copy link
Contributor

@thk123 thk123 commented Apr 16, 2018

Split_string now won't error if you give it a whitespace delimiter and don't require it strip whitespace. Migrated the unit tests into Catch and added a few more examples. Made the utility throw exceptions rather than immediately terminate to allow the error cases to be unit tested as well.

The unit tests document the current behaviour. However, in some edge cases I believe the current behaviour is confusing. If someone knows for certain we don't depend on that behaviour I can prepare a PR to "fix" the behaviour.

Review commit by commit.

@thk123 thk123 requested review from danpoe and majakusber April 16, 2018 11:06
@thk123 thk123 changed the title Enhance split_string to support splitting on white space Enhance split_string to support splitting on white space [TG-2922] Apr 16, 2018
Copy link

@majakusber majakusber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I like the thorough tests. Consider squashing the commits to have only one function adjustment commit and one unit test commit.

@majakusber
Copy link

Nit: The string_utils.cpp file has Daniel Poetzl as author, should that be changed to Diffblue Ltd.?

@thk123
Copy link
Contributor Author

thk123 commented Apr 16, 2018

@svorenova No - it was (I assume) written by @danpoe before he started at Diffblue so I assume can be left unchanged.

@thk123
Copy link
Contributor Author

thk123 commented Apr 16, 2018

@svorenova Also respectfully disagree with squashing suggestion - I think current structure allows for checking a) correctly migrate existing unit tests b) correctly refactor existing unit tests c) correctly add unit tests for current behaviour d) correctly modify behaviour

Instead I propose squashing into:

  • Removing wrong elements from the make file (incidental fix so will keep separate).
  • migrate unit tests
  • refactor unit tests
  • extend unit tests
  • modify function directly to throwing exceptions and adding comments and unit tests for new behaviour

LMK if that's OK

@majakusber
Copy link

Yes, that sounds great, thank you.

@thk123 thk123 force-pushed the refactor/split-string-unit-tests branch from c00350b to 6609357 Compare April 16, 2018 13:20
@thk123
Copy link
Contributor Author

thk123 commented Apr 16, 2018

Rebased as suggested.

Copy link
Contributor

@danpoe danpoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

expected_results.strip_no_remove = {""};

// TODO(tkiley): This is probably wrong, since I'd expect removing empty
// TODO(tkiley): elements to return an empty vector here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, we could change this to return an empty vector. I've looked at the places where split_string() is currently used, and they would also work when returning an empty vector (except an assertion assert(!result.empty()) would need to be removed in util/unwind.cpp). Also once changed, the current usages of split_string() can be slightly simplified as some of them have a check for this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent. Since this PR is already a pre-req for another I will put together a separate PR to address this.

@thk123
Copy link
Contributor Author

thk123 commented Apr 16, 2018

Specifically requesting a review from @chrisr-diffblue (somewhat arbitrarily).

assert(result.empty());
assert(!std::isspace(delim));
if(!result.empty())
throw std::invalid_argument("result: vector must be empty");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have used PRECONDITION here as we don't generally throw exceptions unless it's triggerable by user input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah reasonable - of course the method should just return the vector - if not hard will make as part of subsequent PR for empty elements.

…trip

Also improved comments (and moved to doxygen) as I assumed strip would
remove
whitespace from throughout the string but actually only does the final
string

Replace asserts in string utils

Adding tests for behaviour with whitespace delimiter
@thk123 thk123 force-pushed the refactor/split-string-unit-tests branch from 6609357 to a385d9b Compare April 16, 2018 14:40
Copy link
Contributor

@chrisr-diffblue chrisr-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor comments, but nothing that would block this from my point of view.

{
throw std::invalid_argument(
"delim can't be a space character if using strip");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason that this condition/throw block can't just be an INVARIANT ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the unit test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - ok, though that smells a little bit of the mechanics of unit testing driving the architecture of the code, but I don't have a hugely strong opinion. I just felt that an INVARIANT, or perhaps actually PRECONDITION now I think more, was more inline with other functions in this file that do things like PRECONDITION(result.empty());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am with @chrisr-diffblue: I think this should be a PRECONDITION. I think invariant.h ought to have the facilities to turn that into something that can be caught, and hence could be used for testing error paths in unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree - I think of all the code we write, util code is the most "library like" and not throwing limits the utility of this function. Imagine Using split_string on a user input: I now have to duplicate the error handling code if I want to allow the user to try again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @thk123 does make a good point about the 'library like' use-case - perhaps the answer would be to have 'CATCHABLE_PRECONDITION' along with @tautschnig suggestion to enable error path testing? - i.e. in the library use-case Thomas is talking about, it could be caught by the caller to avoid duplicating the error handling.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two different "users" to be distinguished: the developer using the library and the end-user feeding input to what the developer has built. The end-user should not be subjected to internal limitations, but for a developer it's reasonable to assume that they have read the documentation.

It is of course perfectly reasonable to use exceptions for either class of "users," but that's not what the code base does right now. If you think this should change, I'd go via a new issue and/or have some further discussion. Because your proposed code is of course very nice and clean I'd keep it so that a future patch to introduce the exceptions is very low effort.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all this I do think that PRECONDITION_STRUCTURED should be able to support this. There are a few cases of INVARIANT_STRUCTURED in the code base. (But also invariant.h says something about the future...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I'm not clear on what you propose I do with this code for this PR? Do you mean keep the code in this PR, or change this PR to a precondition (and remove the associated unit tests) and stash this version away for a later day?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is what I'd do if it were my PR: if things can be made to work via PRECONDITION_STRUCTURED, use it. Else stash it away for a later day and drop a note in an issue so that it gets tracked.

@@ -87,7 +106,11 @@ void split_string(
std::string &right,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheeky of me, but as you are touching this function, do you fancy adding some Doxygen? ;-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will pick this up as part of the follow on tidy up PR as this is blocking another PR #2046

@thk123
Copy link
Contributor Author

thk123 commented Apr 17, 2018

Gah got the code ownership wrong, still need someone to vouch for unit changes. @tautschnig, as you've already looked at this a bit would you mind?

@thk123
Copy link
Contributor Author

thk123 commented Apr 19, 2018

@tautschnig Unfortunately PRECONDITION_STRUCTURED also abort()s and doesn't seem to be an existing version of the invariant logic that actually throws the exception. (Also, I'm required to make the exception derive from logic_error which is clearly incorrect for invalid arguments. In the interest of getting this PR moving I've changed it to a PRECONDITION. I will create a follow on PR where we can discuss a better way of doing this.

@thk123
Copy link
Contributor Author

thk123 commented Apr 19, 2018

@tautschnig If you're happy (as implied by a 👍) could you approve? Apparently don't yet have sufficient code owner approval.

@thk123 thk123 merged commit ad62682 into diffblue:develop Apr 19, 2018
@thk123 thk123 deleted the refactor/split-string-unit-tests branch April 19, 2018 12:24
smowton added a commit to smowton/cbmc that referenced this pull request May 9, 2018
ad62682 Merge pull request diffblue#2071 from thk123/refactor/split-string-unit-tests
fc8ba88 Revert to aborting precondition for function inputs
3e2ab6f Merge pull request diffblue#2080 from diffblue/java-bytecode-dependency
6ff1eec cbmc: remove dependency on java_bytecode
0bff19b Merge pull request diffblue#2049 from karkhaz/kk-factor-goto-model-processing
79e3b25 Merge pull request diffblue#2084 from tautschnig/has_subtype-test
cd45b0b Test has_subtype on recursive data types
85ac315 Merge pull request diffblue#2082 from thomasspriggs/default_dstring_hash
28c2e8b Merge pull request diffblue#2065 from tautschnig/be-constructor
afa6023 Merge pull request diffblue#2061 from tautschnig/simplify-extractbits
014d151 Factor out getting & processing goto-model
06b3adc Merge pull request diffblue#2077 from peterschrammel/stable-tmp-var-names
0b3170d Stabilize clinit wrapper function type parameters
3cd8bf4 Temporary vars tests for goto-diff
9f0626c  Prefix temporary var names with function id
ca678aa More permissive regression tests regarding tmp var suffixes
47951ca Merge pull request diffblue#2079 from romainbrenguier/bugfix/has-subtype-recursion
dd73b1a Specify default hash function of `dstringt` to STL.
fe8e589 Avoid infinite recursion in has_subtype
00b9bf6 Merge pull request diffblue#2051 from svorenova/generics_tg2717
cd4bfc3 Merge pull request diffblue#2078 from romainbrenguier/bool-literal-in-while-loop
67ea889 Use bool literal in while loop
d229ad9 Merge pull request diffblue#2056 from diffblue/fix-regression-cbmc-memcpy1
506faf0 Refactor a function for base existence
617d388 Utility functions for generic types
c07e6ca Update generic specialization map when replacing pointers
ed26d0a Merge pull request diffblue#2058 from peterschrammel/stable-disjuncts
b663734 Simplify extractbits(concatenation(...))
b091560 Typing and refactoring of simplify_extractbits
49ad1bd Merge pull request diffblue#974 from tautschnig/fix-assert-encoding
16e9599 Merge pull request diffblue#2063 from tautschnig/has-subtype
950f58b Merge pull request diffblue#2060 from tautschnig/opt-local-map
4222a94 Regression tests for unstable instanceof and virtual method disjuncts
b44589e Make disjuncts in instanceof removal independent of class loading
3afff86 Make disjuncts in virtual method removal independent of class loading
a385d9b Allowed split_string to be used on whitespace if not also trying to strip
fe4a642 Merge pull request diffblue#2062 from tautschnig/no-has-deref
145f474 Adding tests for empty string edge cases
07009d4 Refactored test to run all combinations
252c24c Migrate old string utils unit tests
e87edbf Removing wrong elements from the make file
b165c52 make test work on 32-bit Linux
b804570 Merge pull request diffblue#2048 from JohnDumbell/improvement/adding_null_object_id
61f14d8 Merge pull request diffblue#1962 from owen-jones-diffblue/owen-jones-diffblue/simplify-replace-java-nondet
fdee7e8 Merge pull request diffblue#2059 from tautschnig/generalise-test
4625cc5 Extend global has_subexpr to take a predicate as has_subtype does
e9ebd59 has_subtype(type, pred, ns) to search for contained type matching pred
1f1f67f Merge pull request diffblue#1889 from hannes-steffenhagen-diffblue/develop-feature_generate_function_bodies
048c188 Add unit test for java_replace_nondet
0fe48c9 Make remove_java_nondet work before remove_returns
bcc4dc4 Use byte_extract_exprt constructor
a1814a3 Get rid of thin (and duplicate) has_dereference wrapper
4122a28 Test to demonstrate assert bug on alpine
d44bfd3 Also simplify assert structure found on alpine linux
c5cde18 Do not generate redundant if statements in assert expansions
4fb0603 Make is_skip publicly available and use constant argument
5832ffd Negative numbers should also pass the test
3c23b28 Consistently disable simplify_exprt::local_replace_map
da63652 Merge pull request diffblue#2054 from romainbrenguier/bugfix/clear-equations
d77f6a2 Merge pull request diffblue#1831 from NathanJPhillips/feature/class-annotations
60c8296 Clear string_refinement equations (not dependencies)
314ed53 Correcting the value of ID_null_object
751a882 Factor out default CBMC options to static method
6f24009 Can now test for an option being set in optionst
9a8d937 Add to_annotated_type and enable type_checked_cast for annotated_typet
ca77b4e Add test for added annotations
b06a27d Introduce abstract qualifierst base class
e6fb3bf Pretty printing of java_class_typet
e22b95b Fix spurious virtual function related keywords
3ac6d17 Add type_dynamic_cast and friends for java_class_typet
ce1f4d2 Add annotations to java_class_typet, its methods and fields
f84753d Merge pull request diffblue#2042 from hannes-steffenhagen-diffblue/add_deprecate_macro
7a38669 Merge pull request diffblue#2017 from NathanJPhillips/feature/overlay-classes
75a4aec Revert "the deprecation will need to wait until codebase is clean"
67735b5 Disable deprecation warnings by default
0764f77 Merge pull request diffblue#2036 from romainbrenguier/id_array_list
690b606 Merge pull request diffblue#2039 from peterschrammel/fix-duplicate-msg-json-ui
bba17d9 the deprecation will need to wait until codebase is clean
822c757 Regression test for redundant JSON message output
de0644d Only force end of previous message if there actually is one.
5a637bf Merge pull request diffblue#2037 from hannes-steffenhagen-diffblue/add_deprecate_macro
bff456a Merge pull request diffblue#2040 from tautschnig/remove-swp
87ebe90 Remove vim temp file
228c019 Fix duplicate message output in json-ui
0a2c43e Add DEPRECATED to functions documented as \deprecated
47f4b35 interval_sparse_arrayt constructor from array-list
026c4ca Declare an array_list_exprt class
50a2696 Define ID_array_list
513b67a Merge pull request diffblue#2038 from romainbrenguier/bugfix/assign-at-malloc-site
c207106 Test with array of strings
60183a3 Assign string at malloc site
116fffd Add DEPRECATED macro to mark deprecated functions and variables
7952f2c Add option to generate function body to goto-instrument
3d4183a Add ability to overlay classes with new definitions of existing methods
dbc2f71 Improve code and error message in infer_opaque_type_fields
7c0ea4d Tidied up java_class_loader_limitt

git-subtree-dir: cbmc
git-subtree-split: ad62682
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants