Copyright	(c) 2009 2010 2011 2012 Bryan O'Sullivan (c) 2009 Duncan Coutts (c) 2008 2009 Tom Harper (c) 2021 Andrew Lelechenko
License	BSD-style
Maintainer	[email protected]
Portability	GHC
Safe Haskell	Trustworthy
Language	Haskell2010

Data.Text

Contents

Strict vs lazy types
Acceptable data
Definition of character
Fusion
Types
Creation and elimination
Basic interface
Transformations
- Case conversion
- Justification
Folds
- Special folds
Construction
Substrings
Predicates
- View patterns
Searching
Indexing
Zipping
Low level operations
Orphan instances

Description

A time and space-efficient implementation of Unicode text. Suitable for performance critical use, both in terms of large data quantities and high speed.

Note: Read below the synopsis for important notes on the use of this module.

This module is intended to be imported qualified, to avoid name clashes with Prelude functions, e.g.

import qualified Data.Text as T

To use an extended and very rich family of functions for working with Unicode text (including normalization, regular expressions, non-standard encodings, text breaking, and locales), see the text-icu package.

Synopsis

data Text
pack :: String -> Text
unpack :: Text -> String
singleton :: Char -> Text
empty :: Text
cons :: Char -> Text -> Text
snoc :: Text -> Char -> Text
append :: Text -> Text -> Text
uncons :: Text -> Maybe (Char, Text)
unsnoc :: Text -> Maybe (Text, Char)
head :: HasCallStack => Text -> Char
last :: HasCallStack => Text -> Char
tail :: HasCallStack => Text -> Text
init :: HasCallStack => Text -> Text
null :: Text -> Bool
length :: Text -> Int
compareLength :: Text -> Int -> Ordering
map :: (Char -> Char) -> Text -> Text
intercalate :: Text -> [Text] -> Text
intersperse :: Char -> Text -> Text
transpose :: [Text] -> [Text]
reverse :: Text -> Text
replace :: HasCallStack => Text -> Text -> Text -> Text
toCaseFold :: Text -> Text
toLower :: Text -> Text
toUpper :: Text -> Text
toTitle :: Text -> Text
justifyLeft :: Int -> Char -> Text -> Text
justifyRight :: Int -> Char -> Text -> Text
center :: Int -> Char -> Text -> Text
foldl :: (a -> Char -> a) -> a -> Text -> a
foldl' :: (a -> Char -> a) -> a -> Text -> a
foldl1 :: HasCallStack => (Char -> Char -> Char) -> Text -> Char
foldl1' :: HasCallStack => (Char -> Char -> Char) -> Text -> Char
foldr :: (Char -> a -> a) -> a -> Text -> a
foldr' :: (Char -> a -> a) -> a -> Text -> a
foldr1 :: HasCallStack => (Char -> Char -> Char) -> Text -> Char
concat :: [Text] -> Text
concatMap :: (Char -> Text) -> Text -> Text
any :: (Char -> Bool) -> Text -> Bool
all :: (Char -> Bool) -> Text -> Bool
maximum :: HasCallStack => Text -> Char
minimum :: HasCallStack => Text -> Char
isAscii :: Text -> Bool
scanl :: (Char -> Char -> Char) -> Char -> Text -> Text
scanl1 :: (Char -> Char -> Char) -> Text -> Text
scanr :: (Char -> Char -> Char) -> Char -> Text -> Text
scanr1 :: (Char -> Char -> Char) -> Text -> Text
mapAccumL :: forall a. (a -> Char -> (a, Char)) -> a -> Text -> (a, Text)
mapAccumR :: forall a. (a -> Char -> (a, Char)) -> a -> Text -> (a, Text)
replicate :: Int -> Text -> Text
unfoldr :: (a -> Maybe (Char, a)) -> a -> Text
unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> Text
take :: Int -> Text -> Text
takeEnd :: Int -> Text -> Text
drop :: Int -> Text -> Text
dropEnd :: Int -> Text -> Text
takeWhile :: (Char -> Bool) -> Text -> Text
takeWhileEnd :: (Char -> Bool) -> Text -> Text
dropWhile :: (Char -> Bool) -> Text -> Text
dropWhileEnd :: (Char -> Bool) -> Text -> Text
dropAround :: (Char -> Bool) -> Text -> Text
strip :: Text -> Text
stripStart :: Text -> Text
stripEnd :: Text -> Text
splitAt :: Int -> Text -> (Text, Text)
breakOn :: HasCallStack => Text -> Text -> (Text, Text)
breakOnEnd :: HasCallStack => Text -> Text -> (Text, Text)
break :: (Char -> Bool) -> Text -> (Text, Text)
span :: (Char -> Bool) -> Text -> (Text, Text)
spanM :: Monad m => (Char -> m Bool) -> Text -> m (Text, Text)
spanEndM :: Monad m => (Char -> m Bool) -> Text -> m (Text, Text)
group :: Text -> [Text]
groupBy :: (Char -> Char -> Bool) -> Text -> [Text]
inits :: Text -> [Text]
tails :: Text -> [Text]
splitOn :: HasCallStack => Text -> Text -> [Text]
split :: (Char -> Bool) -> Text -> [Text]
chunksOf :: Int -> Text -> [Text]
lines :: Text -> [Text]
words :: Text -> [Text]
unlines :: [Text] -> Text
unwords :: [Text] -> Text
isPrefixOf :: Text -> Text -> Bool
isSuffixOf :: Text -> Text -> Bool
isInfixOf :: Text -> Text -> Bool
stripPrefix :: Text -> Text -> Maybe Text
stripSuffix :: Text -> Text -> Maybe Text
commonPrefixes :: Text -> Text -> Maybe (Text, Text, Text)
filter :: (Char -> Bool) -> Text -> Text
breakOnAll :: HasCallStack => Text -> Text -> [(Text, Text)]
find :: (Char -> Bool) -> Text -> Maybe Char
elem :: Char -> Text -> Bool
partition :: (Char -> Bool) -> Text -> (Text, Text)
index :: HasCallStack => Text -> Int -> Char
findIndex :: (Char -> Bool) -> Text -> Maybe Int
count :: HasCallStack => Text -> Text -> Int
zip :: Text -> Text -> [(Char, Char)]
zipWith :: (Char -> Char -> Char) -> Text -> Text -> Text
copy :: Text -> Text
unpackCString# :: Addr# -> Text
unpackCStringAscii# :: Addr# -> Text
measureOff :: Int -> Text -> Int

Strict vs lazy types

This package provides both strict and lazy Text types. The strict type is provided by the Data.Text module, while the lazy type is provided by the Data.Text.Lazy module. Internally, the lazy Text type consists of a list of strict chunks.

The strict Text type requires that an entire string fit into memory at once. The lazy Text type is capable of streaming strings that are larger than memory using a small memory footprint. In many cases, the overhead of chunked streaming makes the lazy Text type slower than its strict counterpart, but this is not always the case. Sometimes, the time complexity of a function in one module may be different from the other, due to their differing internal structures.

Each module provides an almost identical API, with the main difference being that the strict module uses Int values for lengths and counts, while the lazy module uses Int64 lengths.

Acceptable data

A Text value is a sequence of Unicode scalar values, as defined in §3.9, definition D76 of the Unicode 5.2 standard. As such, a Text cannot contain values in the range U+D800 to U+DFFF inclusive. Haskell implementations admit all Unicode code points (§3.4, definition D10) as Char values, including code points from this invalid range. This means that there are some Char values (corresponding to Surrogate category) that are not valid Unicode scalar values, and the functions in this module must handle those cases.

Within this module, many functions construct a Text from one or more Char values. Those functions will substitute Char values that are not valid Unicode scalar values with the replacement character "�" (U+FFFD). Functions that perform this inspection and replacement are documented with the phrase "Performs replacement on invalid scalar values". The functions replace invalid scalar values, instead of dropping them, as a security measure. For details, see Unicode Technical Report 36, §3.5.)

Definition of character

This package uses the term character to denote Unicode code points.

Note that this is not the same thing as a grapheme (e.g. a composition of code points that form one visual symbol). For instance, consider the grapheme "ä". This symbol has two Unicode representations: a single code-point representation U+00E4 (the LATIN SMALL LETTER A WITH DIAERESIS code point), and a two code point representation U+0061 (the "A" code point) and U+0308 (the COMBINING DIAERESIS code point).

Fusion

Starting from text-1.3 fusion is no longer implicit, and pipelines of transformations usually allocate intermediate Text values. Users, who observe significant changes to performances, are encouraged to use fusion framework explicitly, employing Data.Text.Internal.Fusion and Data.Text.Internal.Fusion.Common.

Types

data Text Source #

A space efficient, packed, unboxed Unicode text type.

Instances

Instances details

Data Text Source #	This instance preserves data abstraction at the cost of inefficiency. We omit reflection services for the sake of data abstraction. This instance was created by copying the updated behavior of `Data.Set.Set` and `Data.Map.Map`. If you feel a mistake has been made, please feel free to submit improvements. The original discussion is archived here: could we get a Data instance for Data.Text.Text? The followup discussion that changed the behavior of `Set` and `Map` is archived here: Proposal: Allow gunfold for Data.Map, ...
Instance details Defined in Data.Text Methods gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b) -> (forall g. g -> c g) -> Text -> c Text # gunfold :: (forall b r. Data b => c (b -> r) -> c r) -> (forall r. r -> c r) -> Constr -> c Text # toConstr :: Text -> Constr # dataTypeOf :: Text -> DataType # dataCast1 :: Typeable t => (forall d. Data d => c (t d)) -> Maybe (c Text) # dataCast2 :: Typeable t => (forall d e. (Data d, Data e) => c (t d e)) -> Maybe (c Text) # gmapT :: (forall b. Data b => b -> b) -> Text -> Text # gmapQl :: (r -> r' -> r) -> r -> (forall d. Data d => d -> r') -> Text -> r # gmapQr :: forall r r'. (r' -> r -> r) -> r -> (forall d. Data d => d -> r') -> Text -> r # gmapQ :: (forall d. Data d => d -> u) -> Text -> [u] # gmapQi :: Int -> (forall d. Data d => d -> u) -> Text -> u # gmapM :: Monad m => (forall d. Data d => d -> m d) -> Text -> m Text # gmapMp :: MonadPlus m => (forall d. Data d => d -> m d) -> Text -> m Text # gmapMo :: MonadPlus m => (forall d. Data d => d -> m d) -> Text -> m Text #
IsString Text Source #	Performs replacement on invalid scalar values: `>>>` `:set -XOverloadedStrings``>>>` `"\55555" :: Text`"\65533"
Instance details Defined in Data.Text Methods fromString :: String -> Text #
Monoid Text Source #
Instance details Defined in Data.Text Methods mempty :: Text # mappend :: Text -> Text -> Text # mconcat :: [Text] -> Text #
Semigroup Text Source #	Since: 1.2.2.0
Instance details Defined in Data.Text Methods (<>) :: Text -> Text -> Text # sconcat :: NonEmpty Text -> Text # stimes :: Integral b => b -> Text -> Text #
IsList Text Source #	Performs replacement on invalid scalar values: `>>>` `:set -XOverloadedLists``>>>` `['\55555'] :: Text`"\65533" Since: 1.2.0.0
Instance details Defined in Data.Text Associated Types type Item Text # Methods fromList :: [Item Text] -> Text # fromListN :: Int -> [Item Text] -> Text # toList :: Text -> [Item Text] #
Read