text-2.0.2: An efficient packed Unicode text type.
Copyright: (c) 2009 2010 2011 2012 Bryan O'Sullivan
(c) 2009 Duncan Coutts
(c) 2008 2009 Tom Harper
(c) 2021 Andrew Lelechenko
License: BSD-style
Maintainer: [email protected]
Portability: GHC
Safe Haskell: Trustworthy
Language: Haskell2010

Data.Text

Description

A time- and space-efficient implementation of Unicode text. Suitable for performance-critical use, both in terms of large data quantities and high speed.

Note: See the sections below the synopsis for important notes on the use of this module.

This module is intended to be imported qualified, to avoid name clashes with Prelude functions, e.g.

import qualified Data.Text as T
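A minimal sketch of qualified use: with the `T` prefix, `T.length` and `T.words` sit alongside their `Prelude` namesakes, which keep working on lists and `String`. The helper `wordCount` is hypothetical and exists only for illustration.

```haskell
import qualified Data.Text as T

-- Hypothetical helper: count whitespace-separated words in a Text.
-- T.words is the Data.Text function; length here is Prelude's, over [Text].
wordCount :: T.Text -> Int
wordCount = length . T.words

main :: IO ()
main = do
  let t = T.pack "hello text world"
  print (wordCount t)            -- Prelude length over the result of T.words
  print (T.length t)             -- Data.Text length, counted in characters
  print (length "plain String")  -- Prelude length on a String, no name clash
```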

To use an extended and very rich family of functions for working with Unicode text (including normalization, regular expressions, non-standard encodings, text breaking, and locales), see the text-icu package.

Synopsis

Strict vs lazy types

This package provides both strict and lazy Text types. The strict type is provided by the Data.Text module, while the lazy type is provided by the Data.Text.Lazy module. Internally, the lazy Text type consists of a list of strict chunks.
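The chunked structure of the lazy type can be observed through the conversion functions in Data.Text.Lazy. This is a small sketch; the name `lazyFromChunks` is illustrative only.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- Build a lazy Text from two strict chunks.
lazyFromChunks :: TL.Text
lazyFromChunks = TL.fromChunks [T.pack "hello, ", T.pack "world"]

main :: IO ()
main = do
  -- toChunks exposes the underlying list of strict chunks.
  print (TL.toChunks lazyFromChunks)
  -- toStrict copies every chunk into one contiguous strict Text.
  print (TL.toStrict lazyFromChunks)
  -- fromStrict wraps a strict Text as a lazy Text with at most one chunk.
  print (TL.fromStrict (T.pack "abc"))
```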

The strict Text type requires that an entire string fit into memory at once. The lazy Text type is capable of streaming strings that are larger than memory using a small memory footprint. In many cases, the overhead of chunked streaming makes the lazy Text type slower than its strict counterpart, but this is not always the case. Sometimes the time complexity of a function in one module differs from that of its counterpart in the other, due to their differing internal structures.

Each module provides an almost identical API, with the main difference being that the strict module uses Int values for lengths and counts, while the lazy module uses Int64 lengths.
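The difference in length and count types shows up directly in the signatures. A short sketch comparing `length` and `take` across the two modules:

```haskell
import Data.Int (Int64)
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

strictLen :: Int
strictLen = T.length (T.pack "text")    -- strict API returns Int

lazyLen :: Int64
lazyLen = TL.length (TL.pack "text")    -- lazy API returns Int64

-- take follows the same pattern: Int in Data.Text, Int64 in Data.Text.Lazy.
prefixes :: (T.Text, TL.Text)
prefixes = (T.take 2 (T.pack "text"), TL.take 2 (TL.pack "text"))

main :: IO ()
main = do
  print strictLen
  print lazyLen
  print prefixes
```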

Acceptable data

A Text value is a sequence of Unicode scalar values, as defined in §3.9, definition D76 of the Unicode 5.2 standard. As such, a Text cannot contain values in the range U+D800 to U+DFFF inclusive. Haskell implementations admit all Unicode code points (§3.4, definition D10) as Char values, including code points from this invalid range. This means that some Char values (those in the Surrogate general category) are not valid Unicode scalar values, and the functions in this module must handle those cases.
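Whether a Char is a scalar value can be checked with the general category from Data.Char, since exactly the code points U+D800 to U+DFFF have category Surrogate. The helper `isScalarValue` below is hypothetical, a sketch rather than a function exported by this module.

```haskell
import Data.Char (GeneralCategory (Surrogate), generalCategory)

-- Hypothetical helper: a Char is a Unicode scalar value unless it is a
-- surrogate code point (U+D800..U+DFFF), i.e. in the Surrogate category.
isScalarValue :: Char -> Bool
isScalarValue c = generalCategory c /= Surrogate

main :: IO ()
main = do
  print (isScalarValue 'a')         -- ordinary letter
  print (isScalarValue '\xD800')    -- lowest surrogate
  print (isScalarValue '\xDFFF')    -- highest surrogate
  print (isScalarValue '\x10FFFF')  -- highest code point, not a surrogate
```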

Within this module, many functions construct a Text from one or more Char values. Those functions will substitute Char values that are not valid Unicode scalar values with the replacement character "�" (U+FFFD). Functions that perform this inspection and replacement are documented with the phrase "Performs replacement on invalid scalar values". The functions replace invalid scalar values, instead of dropping them, as a security measure. For details, see Unicode Technical Report 36, §3.5.
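A short sketch of replacement in `pack` and `singleton`. The surrogate is written in list syntax on purpose: in a string literal such as "\xD800b" the hex escape would also consume the `b`, producing a different character.

```haskell
import qualified Data.Text as T

-- A lone surrogate cannot be stored in a Text; pack substitutes U+FFFD
-- instead of dropping it, so the length is unchanged.
replacedPack :: T.Text
replacedPack = T.pack ['a', '\xD800', 'z']

-- singleton performs the same replacement.
replacedSingleton :: T.Text
replacedSingleton = T.singleton '\xDFFF'

main :: IO ()
main = do
  print (T.unpack replacedPack == ['a', '\xFFFD', 'z'])
  print (T.length replacedPack)
  print (replacedSingleton == T.singleton '\xFFFD')
```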

Definition of character

This package uses the term character to denote Unicode code points.

Note that this is not the same thing as a grapheme (i.e. a composition of code points that form one visual symbol). For instance, consider the grapheme "ä". This symbol has two Unicode representations: a single-code-point representation, U+00E4 (the LATIN SMALL LETTER A WITH DIAERESIS code point), and a two-code-point representation, U+0061 (the "a" code point) followed by U+0308 (the COMBINING DIAERESIS code point).
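Because lengths and equality work on characters (code points), the two representations of "ä" behave differently. A minimal sketch; this module performs no normalization (text-icu, mentioned above, offers it).

```haskell
import qualified Data.Text as T

precomposed :: T.Text
precomposed = T.pack "\x00E4"   -- U+00E4: one code point

decomposed :: T.Text
decomposed = T.pack "a\x0308"   -- U+0061 then U+0308: two code points

main :: IO ()
main = do
  print (T.length precomposed)       -- counts characters, not graphemes
  print (T.length decomposed)
  print (precomposed == decomposed)  -- Eq compares code points
```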

Fusion

Starting from text-1.3, fusion is no longer implicit, and pipelines of transformations usually allocate intermediate Text values. Users who observe significant changes in performance are encouraged to use the fusion framework explicitly, via Data.Text.Internal.Fusion and Data.Text.Internal.Fusion.Common.
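A sketch of an explicitly fused pipeline, assuming the internal modules of text-2.0.x export `stream`/`unstream` (Data.Text.Internal.Fusion) and stream-level `map`/`filter` (Data.Text.Internal.Fusion.Common). Internal modules carry no stability guarantee, so treat these names as an assumption to check against the installed version. `lettersUpper` is a hypothetical example function.

```haskell
import Data.Char (isAlpha, toUpper)
import qualified Data.Text as T
import qualified Data.Text.Internal.Fusion as F
import qualified Data.Text.Internal.Fusion.Common as S

-- Hypothetical pipeline: keep letters, then upper-case them.
-- The intermediate result stays a Stream Char; only the final
-- unstream builds a Text.
lettersUpper :: T.Text -> T.Text
lettersUpper = F.unstream . S.map toUpper . S.filter isAlpha . F.stream

-- The same pipeline written with the public API, which may allocate
-- an intermediate Text between T.filter and T.map.
lettersUpper' :: T.Text -> T.Text
lettersUpper' = T.map toUpper . T.filter isAlpha

main :: IO ()
main = do
  let input = T.pack "a1b2c3"
  print (lettersUpper input)
  print (lettersUpper input == lettersUpper' input)
```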

Types

data Text

A space-efficient, packed, unboxed Unicode text type.

Instances

Data Text

This instance preserves data abstraction at the cost of inefficiency. We omit reflection services for the sake of data abstraction.

This instance was created by copying the updated behavior of Data.Set.Set and Data.Map.Map. If you feel a mistake has been made, please feel free to submit improvements.

The original discussion is archived here: could we get a Data instance for Data.Text.Text?

The followup discussion that changed the behavior of Set and Map is archived here: Proposal: Allow gunfold for Data.Map, ...

Defined in Data.Text

Methods

gfoldl :: (forall d b. Data d => c (d -> b) -> d -> c b) -> (forall g. g -> c g) -> Text -> c Text

gunfold :: (forall b r. Data b => c (b -> r) -> c r) -> (forall r. r -> c r) -> Constr -> c Text

toConstr :: Text -> Constr

dataTypeOf :: Text -> DataType

dataCast1 :: Typeable t => (forall d. Data d => c (t d)) -> Maybe (c Text)

dataCast2 :: Typeable t => (forall d e. (Data d, Data e) => c (t d e)) -> Maybe (c Text)

gmapT :: (forall b. Data b => b -> b) -> Text -> Text

gmapQl :: (r -> r' -> r) -> r -> (forall d. Data d => d -> r') -> Text -> r

gmapQr :: forall r r'. (r' -> r -> r) -> r -> (forall d. Data d => d -> r') -> Text -> r

gmapQ :: (forall d. Data d => d -> u) -> Text -> [u]

gmapQi :: Int -> (forall d. Data d => d -> u) -> Text -> u

gmapM :: Monad m => (forall d. Data d => d -> m d) -> Text -> m Text

gmapMp :: MonadPlus m => (forall d. Data d => d -> m d) -> Text -> m Text

gmapMo :: MonadPlus m => (forall d. Data d => d -> m d) -> Text -> m Text
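A sketch of what "preserves data abstraction" means in practice: generic code sees a single pseudo-constructor whose only field is the unpacked String, not the internal array. In the text-2.0.2 source the constructor is named `pack` and the data type `Data.Text.Text`; treat those exact names as an implementation detail rather than a documented guarantee.

```haskell
import Data.Data (dataTypeName, dataTypeOf, gmapQ, showConstr, toConstr)
import Data.Typeable (cast)
import qualified Data.Text as T

sample :: T.Text
sample = T.pack "hi"

main :: IO ()
main = do
  -- Name of the single pseudo-constructor and of the data type.
  putStrLn (showConstr (toConstr sample))
  putStrLn (dataTypeName (dataTypeOf sample))
  -- The one field a generic query can see is the contents as a String.
  print (gmapQ (\d -> cast d :: Maybe String) sample)
```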

IsString Text

Performs replacement on invalid scalar values:

>>> :set -XOverloadedStrings
>>> "\55555" :: Text
"\65533"

Defined in Data.Text

Methods

fromString :: String -> Text

Monoid Text

Defined in Data.Text

Methods

mempty :: Text

mappend :: Text -> Text -> Text

mconcat :: [Text] -> Text
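A small sketch of the Monoid instance: `mempty` is the empty Text and `mconcat` concatenates, so they agree with `T.empty` and `T.concat`.

```haskell
import qualified Data.Text as T

-- mempty contributes nothing to the concatenation.
joined :: T.Text
joined = mconcat [T.pack "foo", mempty, T.pack "bar"]

main :: IO ()
main = do
  print joined
  print (mempty == T.empty)
  print (joined == T.concat [T.pack "foo", T.pack "bar"])
```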

Semigroup Text

Since: 1.2.2.0

Defined in Data.Text

Methods

(<>) :: Text -> Text -> Text

sconcat :: NonEmpty Text -> Text

stimes :: Integral b => b -> Text -> Text
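A sketch of the three Semigroup methods on Text. The names `greeting`, `abc`, and `repeated` are illustrative.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import Data.Semigroup (sconcat, stimes)
import qualified Data.Text as T

-- (<>) appends two Texts.
greeting :: T.Text
greeting = T.pack "hello, " <> T.pack "world"

-- sconcat folds a non-empty list with (<>).
abc :: T.Text
abc = sconcat (T.pack "a" :| [T.pack "b", T.pack "c"])

-- stimes n repeats its argument n times, like T.replicate for n >= 1.
repeated :: T.Text
repeated = stimes (3 :: Int) (T.pack "ab")

main :: IO ()
main = mapM_ print [greeting, abc, repeated]
```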

IsList Text

Performs replacement on invalid scalar values:

>>> :set -XOverloadedLists
>>> ['\55555'] :: Text
"\65533"

Since: 1.2.0.0

Defined in Data.Text

Associated Types

type Item Text

Methods

fromList :: [Item Text] -> Text

fromListN :: Int -> [Item Text] -> Text

toList :: Text -> [Item Text]
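A sketch of the IsList instance, where `Item Text` is `Char`: list literals of characters can be used at type Text under OverloadedLists, and `toList` converts back to a String. The first argument of `fromListN` is a length hint.

```haskell
{-# LANGUAGE OverloadedLists #-}

import GHC.Exts (IsList (..))
import qualified Data.Text as T

-- With OverloadedLists, a list literal of Chars is desugared via fromListN.
fromLiteral :: T.Text
fromLiteral = ['a', 'b', 'c']

main :: IO ()
main = do
  print fromLiteral
  -- Item Text is Char, so toList returns a String.
  print (toList (T.pack "xyz"))
  -- fromListN takes the expected length before the list.
  print (fromListN 3 "xyz" :: T.Text)
```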

Read