Skip to content

Added split operations to IsSequence, EqSequence #80

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 23, 2015

Conversation

richsmith92
Copy link
Contributor

No description provided.

@@ -411,6 +410,12 @@ class (Monoid seq, MonoTraversable seq, SemiSequence seq, MonoPointed seq) => Is
unsafeIndex :: seq -> Index seq -> Element seq
unsafeIndex = indexEx


-- | 'splitWhen' splits a sequence into components delimited by separators, where the
-- predicate returns True for a separator element.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding a comment about whether the separators are retained or discarded? A concrete example would be useful. At least I personally get confused about this whenever I use split-style functions.

@snoyberg
Copy link
Owner

In addition to the documentation comments, would you be able to add a test case for this, to codify the semantics?

@richsmith92
Copy link
Contributor Author

I'll add following (from ByteString docs):

The resulting components do not contain the separators. Two adjacent separators result in an empty component in the output.

And will work on test case as well

@gregwebs
Copy link
Collaborator

I find the naming of these unintuitive (even if it may be consistent with some of the underlying libraries).
Our current naming scheme uses on to signify a function for groupOn (in that case it is returning an Ord). So splitOn and splitWhen would mean the same thing by that convention

I think split from the patch can be renamed to splitBy or splitAt or splitElem.
Perhaps splitOn from the patch can be renamed to splitSeq ?

@MaxGabriel
Copy link
Collaborator

IIRC the different splitting operators have slightly different edge case behavior (IIRC it was how they handle being given an empty string as the separator, or how splitting on a string consisting of just the separator works). I don't remember for sure but it might be worth double checking.

@richsmith92
Copy link
Contributor Author

Considering the naming, Data.List.Split, ByteString and Text provide three different naming schemes. I followed the first, except I added lacking split :: Element seq -> seq -> [seq]. I find myself using split most of time, especially for CSV/TSV parsing, so I judged it deserves the shortest name.

I think the most intuitive short name for seq -> seq -> [seq] must be splitSeq. Following the same logic, I could rename split to splitElem, but then nice short split name is unused.

To summarize:

splitWhen is good name already
splitOn should be renamed to splitSeq (better ideas?)
split should either stay same or be renamed to splitElem

@gregwebs
Copy link
Collaborator

@w3rs I agree with your explanation. Do you use classy-prelude? One option is to re-export splitElem as split in classy-prelude. I am still wondering if there is something more intuitive: splitAt captures it for me, except that at often refers to integer indexed.

@richsmith92
Copy link
Contributor Author

Do you use classy-prelude? One option is to re-export splitElem as split in classy-prelude.

@gregwebs yes, it makes sense.

splitAt captures it for me, except that at often refers to integer indexed.

This name is actually already taken with this purpose:

splitAt :: Index seq -> seq -> (seq, seq)

@gregwebs
Copy link
Collaborator

I am favoring splitElem, but I will go with consensus (or better ideas that come up).

Rename `split` to `splitElem`, `splitOn` to `splitSeq`.

Split functions now follow `Data.List.Split` rules for edge
cases. These rules are also encoded in test cases.
@richsmith92
Copy link
Contributor Author

I pushed the new commit with improvements. It also has new intercalate function for IsSequence (I needed it as inverse for splitSeq)

@gregwebs
Copy link
Collaborator

Looking good! If you want a shorter name, you could use splitEq instead of splitElem

@richsmith92
Copy link
Contributor Author

@gregwebs I think I can bear a couple extra chars after all.

@snoyberg
Copy link
Owner

Looks good to me, I'm happy to merge thanks for all the improvements.

Can you both confirm that this is ready to go?

@richsmith92
Copy link
Contributor Author

@snoyberg since we couldn't find much better names than splitElem and splitSeq, I'm fine with what we have.

--
-- 'splitElem' can be considered a special case of 'splitSeq'
--
-- > 'splitSeq' (singleton sep) === 'splitElem sep'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot use '' with >. It produces invalid documentation. You must wrap the statement like:

@
'splitSeq' (singleton sep) === 'splitElem' sep
@

@richsmith92
Copy link
Contributor Author

@kbillings good catch! I had no success with stack haddock and forgot to ask others to review the docs. Will remove single quotes from lines with >

@gregwebs
Copy link
Collaborator

thanks a lot!

gregwebs added a commit that referenced this pull request Aug 23, 2015
Added split operations to IsSequence, EqSequence
@gregwebs gregwebs merged commit de9149f into snoyberg:master Aug 23, 2015
snoyberg added a commit that referenced this pull request Aug 24, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants