Copyright | (c) 2009 2010 2011 2012 Bryan O'Sullivan (c) 2009 Duncan Coutts (c) 2008 2009 Tom Harper (c) 2021 Andrew Lelechenko |
---|---|
License | BSD-style |
Maintainer | [email protected] |
Portability | GHC |
Safe Haskell | Trustworthy |
Language | Haskell2010 |
Data.Text
Description
A time and space-efficient implementation of Unicode text. Suitable for performance critical use, both in terms of large data quantities and high speed.
Note: Read below the synopsis for important notes on the use of this module.
This module is intended to be imported qualified
, to avoid name
clashes with Prelude functions, e.g.
import qualified Data.Text as T
To use an extended and very rich family of functions for working with Unicode text (including normalization, regular expressions, non-standard encodings, text breaking, and locales), see the text-icu package.
Synopsis
- data Text
- pack :: String -> Text
- unpack :: Text -> String
- singleton :: Char -> Text
- empty :: Text
- cons :: Char -> Text -> Text
- snoc :: Text -> Char -> Text
- append :: Text -> Text -> Text
- uncons :: Text -> Maybe (Char, Text)
- unsnoc :: Text -> Maybe (Text, Char)
- head :: HasCallStack => Text -> Char
- last :: HasCallStack => Text -> Char
- tail :: HasCallStack => Text -> Text
- init :: HasCallStack => Text -> Text
- null :: Text -> Bool
- length :: Text -> Int
- compareLength :: Text -> Int -> Ordering
- map :: (Char -> Char) -> Text -> Text
- intercalate :: Text -> [Text] -> Text
- intersperse :: Char -> Text -> Text
- transpose :: [Text] -> [Text]
- reverse :: Text -> Text
- replace :: HasCallStack => Text -> Text -> Text -> Text
- toCaseFold :: Text -> Text
- toLower :: Text -> Text
- toUpper :: Text -> Text
- toTitle :: Text -> Text
- justifyLeft :: Int -> Char -> Text -> Text
- justifyRight :: Int -> Char -> Text -> Text
- center :: Int -> Char -> Text -> Text
- foldl :: (a -> Char -> a) -> a -> Text -> a
- foldl' :: (a -> Char -> a) -> a -> Text -> a
- foldl1 :: HasCallStack => (Char -> Char -> Char) -> Text -> Char
- foldl1' :: HasCallStack => (Char -> Char -> Char) -> Text -> Char
- foldr :: (Char -> a -> a) -> a -> Text -> a
- foldr' :: (Char -> a -> a) -> a -> Text -> a
- foldr1 :: HasCallStack => (Char -> Char -> Char) -> Text -> Char
- concat :: [Text] -> Text
- concatMap :: (Char -> Text) -> Text -> Text
- any :: (Char -> Bool) -> Text -> Bool
- all :: (Char -> Bool) -> Text -> Bool
- maximum :: HasCallStack => Text -> Char
- minimum :: HasCallStack => Text -> Char
- isAscii :: Text -> Bool
- scanl :: (Char -> Char -> Char) -> Char -> Text -> Text
- scanl1 :: (Char -> Char -> Char) -> Text -> Text
- scanr :: (Char -> Char -> Char) -> Char -> Text -> Text
- scanr1 :: (Char -> Char -> Char) -> Text -> Text
- mapAccumL :: forall a. (a -> Char -> (a, Char)) -> a -> Text -> (a, Text)
- mapAccumR :: forall a. (a -> Char -> (a, Char)) -> a -> Text -> (a, Text)
- replicate :: Int -> Text -> Text
- unfoldr :: (a -> Maybe (Char, a)) -> a -> Text
- unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> Text
- take :: Int -> Text -> Text
- takeEnd :: Int -> Text -> Text
- drop :: Int -> Text -> Text
- dropEnd :: Int -> Text -> Text
- takeWhile :: (Char -> Bool) -> Text -> Text
- takeWhileEnd :: (Char -> Bool) -> Text -> Text
- dropWhile :: (Char -> Bool) -> Text -> Text
- dropWhileEnd :: (Char -> Bool) -> Text -> Text
- dropAround :: (Char -> Bool) -> Text -> Text
- strip :: Text -> Text
- stripStart :: Text -> Text
- stripEnd :: Text -> Text
- splitAt :: Int -> Text -> (Text, Text)
- breakOn :: HasCallStack => Text -> Text -> (Text, Text)
- breakOnEnd :: HasCallStack => Text -> Text -> (Text, Text)
- break :: (Char -> Bool) -> Text -> (Text, Text)
- span :: (Char -> Bool) -> Text -> (Text, Text)
- spanM :: Monad m => (Char -> m Bool) -> Text -> m (Text, Text)
- spanEndM :: Monad m => (Char -> m Bool) -> Text -> m (Text, Text)
- group :: Text -> [Text]
- groupBy :: (Char -> Char -> Bool) -> Text -> [Text]
- inits :: Text -> [Text]
- tails :: Text -> [Text]
- splitOn :: HasCallStack => Text -> Text -> [Text]
- split :: (Char -> Bool) -> Text -> [Text]
- chunksOf :: Int -> Text -> [Text]
- lines :: Text -> [Text]
- words :: Text -> [Text]
- unlines :: [Text] -> Text
- unwords :: [Text] -> Text
- isPrefixOf :: Text -> Text -> Bool
- isSuffixOf :: Text -> Text -> Bool
- isInfixOf :: Text -> Text -> Bool
- stripPrefix :: Text -> Text -> Maybe Text
- stripSuffix :: Text -> Text -> Maybe Text
- commonPrefixes :: Text -> Text -> Maybe (Text, Text, Text)
- filter :: (Char -> Bool) -> Text -> Text
- breakOnAll :: HasCallStack => Text -> Text -> [(Text, Text)]
- find :: (Char -> Bool) -> Text -> Maybe Char
- elem :: Char -> Text -> Bool
- partition :: (Char -> Bool) -> Text -> (Text, Text)
- index :: HasCallStack => Text -> Int -> Char
- findIndex :: (Char -> Bool) -> Text -> Maybe Int
- count :: HasCallStack => Text -> Text -> Int
- zip :: Text -> Text -> [(Char, Char)]
- zipWith :: (Char -> Char -> Char) -> Text -> Text -> Text
- copy :: Text -> Text
- unpackCString# :: Addr# -> Text
- unpackCStringAscii# :: Addr# -> Text
- measureOff :: Int -> Text -> Int
Strict vs lazy types
This package provides both strict and lazy Text
types. The
strict type is provided by the Data.Text module, while the lazy
type is provided by the Data.Text.Lazy module. Internally, the
lazy Text
type consists of a list of strict chunks.
The strict Text
type requires that an entire string fit into
memory at once. The lazy Text
type is capable of
streaming strings that are larger than memory using a small memory
footprint. In many cases, the overhead of chunked streaming makes
the lazy Text
type slower than its strict
counterpart, but this is not always the case. Sometimes, the time
complexity of a function in one module may be different from the
other, due to their differing internal structures.
Each module provides an almost identical API, with the main
difference being that the strict module uses Int
values for
lengths and counts, while the lazy module uses Int64
lengths.
Acceptable data
A Text
value is a sequence of Unicode scalar values, as defined
in
§3.9, definition D76 of the Unicode 5.2 standard.
As such, a Text
cannot contain values in the range U+D800 to
U+DFFF inclusive. Haskell implementations admit all Unicode code
points
(§3.4, definition D10)
as Char
values, including code points from this invalid range.
This means that there are some Char
values
(corresponding to Surrogate
category) that are not valid
Unicode scalar values, and the functions in this module must handle
those cases.
Within this module, many functions construct a Text
from one or
more Char
values. Those functions will substitute Char
values
that are not valid Unicode scalar values with the replacement
character "�" (U+FFFD). Functions that perform this
inspection and replacement are documented with the phrase
"Performs replacement on invalid scalar values". The functions replace
invalid scalar values, instead of dropping them, as a security
measure. For details, see
Unicode Technical Report 36, §3.5.)
Definition of character
This package uses the term character to denote Unicode code points.
Note that this is not the same thing as a grapheme (e.g. a
composition of code points that form one visual symbol). For
instance, consider the grapheme "ä". This symbol has two
Unicode representations: a single code-point representation
U+00E4
(the LATIN SMALL LETTER A WITH DIAERESIS
code point),
and a two code point representation U+0061
(the "A
" code
point) and U+0308
(the COMBINING DIAERESIS
code point).
Fusion
Starting from text-1.3
fusion is no longer implicit,
and pipelines of transformations usually allocate intermediate Text
values.
Users, who observe significant changes to performances,
are encouraged to use fusion framework explicitly, employing
Data.Text.Internal.Fusion and Data.Text.Internal.Fusion.Common.
Types
A space efficient, packed, unboxed Unicode text type.
Instances
Data Text Source # | This instance preserves data abstraction at the cost of inefficiency. We omit reflection services for the sake of data abstraction. This instance was created by copying the updated behavior of
The original discussion is archived here: could we get a Data instance for Data.Text.Text? The followup discussion that changed the behavior of |
Defined in Data.Text Methods gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b) -> (forall g. g -> c g) -> Text -> c Text # gunfold :: (forall b r. Data b => c (b -> r) -> c r) -> (forall r. r -> c r) -> Constr -> c Text # dataTypeOf :: Text -> DataType # dataCast1 :: Typeable t => (forall d. Data d => c (t d)) -> Maybe (c Text) # dataCast2 :: Typeable t => (forall d e. (Data d, Data e) => c (t d e)) -> Maybe (c Text) # gmapT :: (forall b. Data b => b -> b) -> Text -> Text # gmapQl :: (r -> r' -> r) -> r -> (forall d. Data d => d -> r') -> Text -> r # gmapQr :: forall r r'. (r' -> r -> r) -> r -> (forall d. Data d => d -> r') -> Text -> r # gmapQ :: (forall d. Data d => d -> u) -> Text -> [u] # gmapQi :: Int -> (forall d. Data d => d -> u) -> Text -> u # gmapM :: Monad m => (forall d. Data d => d -> m d) -> Text -> m Text # gmapMp :: MonadPlus m => (forall d. Data d => d -> m d) -> Text -> m Text # gmapMo :: MonadPlus m => (forall d. Data d => d -> m d) -> Text -> m Text # | |
IsString Text Source # | Performs replacement on invalid scalar values:
|
Defined in Data.Text Methods fromString :: String -> Text # | |
Monoid Text Source # | |
Semigroup Text Source # | Since: 1.2.2.0 |
IsList Text Source # | Performs replacement on invalid scalar values:
Since: 1.2.0.0 |
Read |