Word 97-2007 Binary File Format (Doc) Specification
Unless otherwise noted, the example companies, organizations, products, domain names,
e-mail addresses, logos, people, places and events depicted herein are fictitious, and no
association with any real company, organization, product, domain name, email address, logo,
person, place or event is intended or should be inferred.
Microsoft, Windows, Windows NT, Windows Server, and Windows Vista are either registered
trademarks or trademarks of Microsoft Corporation in the United States and/or other
countries.
Microsoft Office Word 97-2007 Binary File Format (.doc) Specification Page 3 of 210
Table of Contents
Note
Additions to Word 2007
Word and .Doc Files
Definitions
Naming Conventions
Format of the Summary Info Stream in a Word File
Format of the Main Stream in a Word Non-Complex File
Format of the Main Stream in a Complex File
Format of the Table Stream
Format of the Data Stream
Format of the Custom XML Storage (Added in Word 2007)
FIB
Text
Character and Paragraph Formatting Properties
Bin Tables
Style Sheet
    STSHI
    Introduced in Word 2003:
    STD
List Tables
    LST Records and the rglst
    List Names and the sttbListNames
    LFO Records and the pllfo
    Paragraph List Formatting
SPRM Definitions
    Paragraph SPRMs
    Character SPRMs
    Picture SPRMs
    Section SPRMs
    Table sprms
    Complex SPRMs
        Complex Paragraph SPRMs
        Complex Character SPRMs
        Complex Picture SPRMs
        Complex Section SPRMs
Note
Many of the structures written into Microsoft® Office Word 2007 .doc files differ slightly from
the corresponding structures Word uses internally. The file-specific version of a structure is
typically named by adding a preceding or (more often) trailing "F". For example, Word uses a
PLC (PLex of Cps (Character positions)) internally, but writes to files a PLCF (PLex of Cps in
File). Many discussions in this document use the name of the internal structure when the
file-specific structure is what is really being referred to. The reader should remember that the
name of a seemingly undefined structure type may simply be missing a leading or trailing "F".
The majority of this document describes the contents of the main stream and the table stream.
Definitions
API (Application Programming Interface):
A set of libraries, functions, definitions, etc. providing an interface to a programming
environment or model.
bin table
Each FKP can be viewed as a bucket or bin that contains the properties of a certain range
of FCs in the Word file. In Word files, a PLC, the plcfbte (PLex of FCs containing Bin
Table Entries) is maintained. It records the association between a particular range of FCs
and the PN (Page Number) of the FKP that contains the properties for that FC range in the
file. In a complex (fast-saved) Word document, FKP pages are intermingled with pages of
text in a random pattern which reflects the history of past fast saves. In a complex
document, a plcfbteChpx which records the location of every CHPX FKP must be stored
and a plcfbtePapx which records the location of every PAPX FKP must be stored. In a
non-complex, full-saved document, all of the CHPX FKPs are recorded in consecutive
512-byte pages with the FKPs recorded in ascending FC order, as are all of the PAPX FKPs.
A plcfbteLvcx serves the same purpose for LVCX FKPs.
In a full-saved document, the plcfbtes may not have been able to expand during the save
process due to a lack of RAM. In this situation, the plcfbtes are interspersed with the
property pages in a linked list of FBD pages.
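As a rough sketch of how a reader might use a bin table, assume the plcfbte has already been parsed into a list of n+1 ascending FCs and a parallel list of n PNs. The function and parameter names below are hypothetical and are not part of the file format.

```python
from bisect import bisect_right

PAGE_SIZE = 512  # FKPs occupy 512-byte pages


def fkp_offset_for_fc(rgfc, rgpn, fc):
    """Return the byte offset of the FKP page that covers file position fc.

    rgfc: ascending FC boundaries (len(rgpn) + 1 entries), assumed already parsed
    rgpn: page numbers (PNs), one per FC range
    """
    if not rgfc[0] <= fc < rgfc[-1]:
        raise ValueError("fc is outside the ranges described by the bin table")
    # Range i covers rgfc[i] <= fc < rgfc[i + 1].
    i = bisect_right(rgfc, fc) - 1
    return rgpn[i] * PAGE_SIZE
```

The binary search relies only on the FCs being stored in ascending order; multiplying the PN by 512 follows from the definition of a page.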
bookmark
A bookmark associates a user-defined name with a range of text within a document. A
bookmark is frequently used as an operand in field code instructions within a field. A
bookmark is represented by three parallel data structures, the sttbBkmk, the plcbkf and
the plcbkl. The sttbBkmk is a string table which contains the name of each defined
bookmark. The plcbkf records the beginning CP position of each bookmark. The plcbkl
records the limit CP position that delimits the end of a bookmark. Since bookmarks may be
nested within one another to any level, the BKF structure stored in the plcbkf consists of
a single index that identifies which plcbkl entry marks the end of the bookmark. The BKL
structure is not written to the file and the plcbkl contains only CPs.
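The relationship between the three parallel structures can be sketched as follows. This is only an illustration: it assumes the sttbBkmk, plcbkf and plcbkl have already been decoded into plain Python lists, and the function and parameter names are hypothetical.

```python
def bookmark_ranges(names, start_cps, ibkl, limit_cps):
    """Pair each bookmark's starting CP with its limit CP.

    names:      bookmark names taken from the sttbBkmk
    start_cps:  starting CPs taken from the plcbkf, one per bookmark
    ibkl:       the index stored in each BKF, selecting an entry of the plcbkl
    limit_cps:  limit CPs taken from the plcbkl
    """
    return {
        name: (start, limit_cps[index])
        for name, start, index in zip(names, start_cps, ibkl)
    }
```

Because each BKF carries its own index into the plcbkl, nested bookmarks resolve correctly even when the limit CPs are not stored in the same order as the starting CPs.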
character style
A named character property exception that can be associated with any number of runs of
text in a Word document's text stream. When a run of text is tagged with a particular
character style, a chpx defined for the character style is applied to the character
properties defined for the paragraph style of the paragraph that contains the text. This
means that the character style can change one or more of the character property field
settings specified by the paragraph style of a paragraph to a particular setting without
changing the value of any other field.
CHP (CHaracter Properties)
The data structure describing the character properties of a run of text.
CHPX (Character Property EXception)
A data structure describing how a particular CHP differs from a reference CHP. In Word 6.0,
the CHPX simply consists of a grpprl applied to the reference CHP to produce the
originally encoded CHP. By applying a CHPX to the character properties (CHP) inherited by
a particular paragraph from its style, it is possible to reconstitute the CHP for the portion
of the character run that intersects that paragraph.
COLORREF
Used to specify an explicit RGB color, the COLORREF value has the following hexadecimal
form:
0x00bbggrr
The low-order byte contains a value for the relative intensity of red, the second byte
contains a value for green, and the third byte contains a value for blue. The high-order byte
must be zero. The maximum value for a single byte is 0xFF.
The intensity for each argument is in the range 0 through 255. If all three intensities are
zero, the result is black. If all three intensities are 255, the result is white.
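A minimal sketch of unpacking a COLORREF into its three intensities, following the 0x00bbggrr layout described above. The function name is hypothetical.

```python
def split_colorref(value):
    """Split a COLORREF (0x00bbggrr) into (red, green, blue) intensities."""
    if value & 0xFF000000:
        raise ValueError("the high-order byte of a COLORREF must be zero")
    red = value & 0xFF            # low-order byte
    green = (value >> 8) & 0xFF   # second byte
    blue = (value >> 16) & 0xFF   # third byte
    return red, green, blue
```

For example, `split_colorref(0x00FF8000)` yields no red, a green intensity of 128 and a blue intensity of 255.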
CP (Character Position):
A four-byte integer specifying the position coordinate of a character of text within the
logical text stream of a document.
Custom XML Datastore (Added in Word 2007):
The custom XML data store specifies custom defined XML files contained in the binary
Microsoft Word 97 format or the Office Open XML Formats.
data stream:
The stream within a Word .doc file containing various data that anchors to characters in the
main stream. For example, binary data describing in-line pictures and/or form fields.
docfile:
An OLE 2.0 compatible multi-stream file. Word files are .doc files.
document:
A named, multi-linked list of data structures, representing an ordered stream of text with
properties produced by a user of Microsoft Word.
DOP (DOcument Properties)
The data structure describing properties that apply to the document as a whole.
Dxas
embedded object
The native data for embedded objects (OBJs) is stored similarly to pictures (PICs). To
locate the native data for embedded objects, scan the plcs of field codes for the mother,
header, footnote, annotation, textbox and header textbox documents
(fib.PlcffldMom/Hdr/Ftn/Atn/Txbx/HdrTxbx). For each separator field, get the chp.
If chp.fSpec=1 and chp.fObj=1, then this separator field has an associated embedded
object. The file location of the object data is stored in chp.fcObj. At the specified location
an object header is stored followed by the native data for the object. See the _OBJHEADER
structure.
If chp.fOle2=1, then this separator field has an associated OLE2 object. The fcPic will
be a unique integer that specifies the name of the object's sub-storage instead of an offset
into the data stream.
fast-saved (or complex) file:
A Word file in which the physical order of characters stored in the file does not match the
logical order of characters in the document that the file represents. A piece table must be
stored in the file to describe the text stream of the document. Due to Unicode compression
to code page 1252, all files (simple and complex) now contain a piece table.
FC (File Character position):
A four-byte integer which is the byte offset of a character (or other object) from the
beginning of a stream of the .doc file. Before a file has been edited (i.e. in a full-saved Word
document), CPs can be transformed into FCs by adding the FC coordinate of the beginning
of a document's text stream to the CP. After a file has been edited (i.e. in a fast-saved
Word document), the mapping from CP to FC is recorded in the piece table (see below).
FIB (File Information Block):
The header of a Word file. Begins at offset 0 in the file. This gives the beginning offset and
lengths of the document's text stream and subsidiary data structures within the file. Also
stores other file status information.
field
A field is a two-part structure that may be recorded in the CP stream of a document. The
first part of the structure contains field codes which instruct Word to insert text into the
second part of the structure, the field result. Fields in Word are used to insert text from an
external file, to quote another part of a document, to mark index and table of contents
entries, to produce indexes and tables of contents, to maintain DDE links to other programs,
and to produce dates, times, page numbers, sequence numbers, etc. There are 91 different
field types.
A field begin mark delimits the beginning of a field and precedes any of the field codes
stored in the field. The end of the field codes and the beginning of the field result are marked
by the field separator. The field result, and the field itself, are terminated by a field
end mark.
The CP locations of the field begin mark, field separator, and field end mark are recorded
in plcfld data structures that are maintained for the main document and all of the
subdocuments of the main document whenever a field is inserted or edited. A field can be
dead, in which case it has no field separator, no field result, and no entry in the plcfld.
(See the definition of the FLD structure for a list of possible dead field code strings.) An
array of two-byte FLD structures is stored in the plcfld in a 1-to-1 correspondence with
the recorded CP entries. An FLD associated with a field begin mark records the type of
the field. An FLD associated with the field end mark records the current status of the field
(e.g. whether the result is dirty or has been edited, or whether the result has been locked).
Fields may be nested. Twenty (20) levels of nesting are permitted.
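Matching field marks while respecting nesting can be sketched with a stack. The example assumes the plcfld entries have already been decoded into (cp, kind) pairs in ascending CP order. That representation and all names below are hypothetical; only the 20-level nesting limit comes from the text above.

```python
MAX_FIELD_NESTING = 20  # nesting limit stated above


def match_fields(marks):
    """Group field marks into (begin_cp, separator_cp, end_cp) tuples.

    marks: iterable of (cp, kind) pairs in ascending CP order, where kind is
           "begin", "separator" or "end" (a hypothetical pre-decoded form).
    separator_cp is None when no separator mark was recorded for the field.
    Fields are returned in the order in which they are closed, so an inner
    field precedes the field that contains it.
    """
    stack, fields = [], []
    for cp, kind in marks:
        if kind == "begin":
            if len(stack) == MAX_FIELD_NESTING:
                raise ValueError("fields are nested more than 20 levels deep")
            stack.append([cp, None])
        elif not stack:
            raise ValueError(f"{kind} mark at CP {cp} has no open field")
        elif kind == "separator":
            stack[-1][1] = cp
        elif kind == "end":
            begin_cp, separator_cp = stack.pop()
            fields.append((begin_cp, separator_cp, cp))
        else:
            raise ValueError(f"unknown mark kind: {kind!r}")
    if stack:
        raise ValueError("a field begin mark has no matching end mark")
    return fields
```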
FKP (Formatted disK Page):
A data structure that fits in one 512-byte page that encodes either the character properties
or the paragraph properties of a certain portion of a Word .doc file. An FKP consists of four
components:
1) A count of the number of runs or paragraphs described by the page.
2) An array of FCs recorded in ascending order, demarcating the boundaries between runs
or paragraphs that are recorded adjacent to one another in the Word file.
3) In character FKPs, an array of offsets within the FKP in 1-to-1 correspondence with
the array of FCs that locate the properties of the run that begins at a particular FC.
In LVC FKPs, an array of offsets within the FKP in 1-to-1 correspondence with the array of
FCs that locate the LVCXs that describe the run that begins at a particular FC.
In paragraph FKPs, an array of BX structures follows the array of FCs in 1-to-1
correspondence with the array of FCs. Each BX begins with an offset that locates the
properties of the paragraph that begins at a particular FC. The remainder of the BX
contains a PHE structure that encodes information about the height of the paragraph that
begins at that FC.
4) A group of CHPXs if the FKP stores character properties, a group of PAPXs if the FKP
stores paragraph and table properties, or a group of LVCXs if the FKP stores paragraph
level and numbering cache information.
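The four components can be illustrated for a character FKP. The byte-level details in this sketch are assumptions that this section does not state and that should be checked against the FKP and CHPX structure definitions: the run count is taken from the last byte of the page, each offset is a single byte counted in 2-byte words, an offset of zero means the run has no CHPX, and each CHPX is a one-byte length followed by its grpprl. The function name is hypothetical.

```python
import struct

FKP_SIZE = 512


def parse_chpx_fkp(page):
    """Return a list of (fc_start, fc_limit, grpprl) for one character FKP.

    Layout assumptions (not stated in this section): count in byte 511,
    one-byte offsets in units of 2 bytes, and length-prefixed CHPXs.
    """
    if len(page) != FKP_SIZE:
        raise ValueError("an FKP is exactly 512 bytes long")
    crun = page[FKP_SIZE - 1]                              # 1) count of runs
    rgfc = struct.unpack_from(f"<{crun + 1}I", page, 0)    # 2) FC boundaries
    rgb_start = 4 * (crun + 1)
    rgb = page[rgb_start:rgb_start + crun]                 # 3) offsets
    runs = []
    for i in range(crun):
        offset = rgb[i] * 2
        if offset == 0:
            grpprl = b""                                   # no exception recorded
        else:
            cb = page[offset]                              # 4) the CHPX itself
            grpprl = bytes(page[offset + 1:offset + 1 + cb])
        runs.append((rgfc[i], rgfc[i + 1], grpprl))
    return runs
```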
LVLF
List Level on File
main stream:
The stream within a Word .doc file containing the bulk of Word's binary data.
object storage:
A storage containing binary data for an embedded OLE 2.0 object. Multiple instances are
referred to as "storages".
OLE 2.0:
Object Linking and Embedding 2.0.
Office Drawing object
An Office Drawing object is represented in the document stream as a special character, an
ASCII 8, which has chp.fSpec=1 for the run of text containing the character. Only main
documents and header documents contain Office Drawing objects. The native data for an
Office Drawing object may be obtained by taking the CP for the special character and using
this to find the corresponding entry in the plcspa. An entry in this plc consists of an FSPA
structure, which is described elsewhere in this document.
Office Drawing objects can have text attached to them. Text for the textboxes is stored
separately in the textbox subdocument of the main or header document. The textbox
subdocument contains a plctxbxs where the text from CP n to CP n+1 in the
subdocument is the text which is contained in a textbox as specified in the TXBXS structure
for this nth entry in the plctxbxs. Textboxes can be linked in chains of up to 32 textboxes.
Ordering of textboxes in the subdocument is completely unrelated to the document
structure due to the nature of textbox linking. To find the text for a given Office Drawing
object, the TXID property (a long: high word is itxbxs+1, low word is the sequence
number) must be fetched from the Office art data for the shape. This contains an index
(itxbxs) into plctxbxs and a sequence number in the chain of linked textboxes. The text
for the entire chain of linked textboxes is stored from the CP itxbxs to CP itxbxs+1 of
plctxbxs. The plctxbxBkd describes the "page table" within textbox stories (where the
textboxes in each linked textbox chain are thought of as "pages"). So, for each entry in the
plctxbxs there is a corresponding entry in the plctxbxBkd at the same CP, and there
may be additional entries in the plctxbxBkd to describe the breaks from one textbox to
the next in linked textbox chains.
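Decoding the TXID and fetching the text of a textbox chain might look like the following sketch. It assumes the textbox subdocument text and the CP array of the plctxbxs are already available as a Python string and list. The function names are hypothetical.

```python
def decode_txid(txid):
    """Split a TXID into (itxbxs, sequence number).

    The high word stores itxbxs + 1 and the low word stores the sequence
    number of the textbox within its chain.
    """
    itxbxs = ((txid >> 16) & 0xFFFF) - 1
    sequence = txid & 0xFFFF
    return itxbxs, sequence


def chain_text(subdocument_text, plctxbxs_cps, itxbxs):
    """Return the text of the whole linked chain selected by itxbxs."""
    if not 0 <= itxbxs < len(plctxbxs_cps) - 1:
        raise IndexError("itxbxs does not select an entry of the plctxbxs")
    return subdocument_text[plctxbxs_cps[itxbxs]:plctxbxs_cps[itxbxs + 1]]
```

Splitting the chain text into individual textboxes would additionally require the break positions recorded in the plctxbxBkd, which this sketch does not attempt.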
page (or sector):
A 512-byte segment of the main stream within a Word .doc file that begins on a 512-byte
boundary (bytes 0-511 are in page 0, bytes 512-1023 are in page 1, etc.). In Word data
structures, an unsigned two-byte integer page number is given the acronym PN (for Page
Number).
PAP (PAragraph Properties)
The data structure which describes the properties of a particular paragraph.
PAPX (PAragraph Property EXception)
A data structure describing how a particular paragraph's properties differ from the
paragraph properties of the style assigned to the paragraph. By applying a PAPX to the
paragraph properties (PAP) inherited by a particular paragraph from its style, it is possible
to reconstitute the PAP for that paragraph. The PAPX contains an ISTD (a style code that
identifies the style in control of the paragraph) and a grpprl which specifies how the style's
paragraph properties must be changed to produce the paragraph properties of the
paragraph.
paragraph
A contiguous sequence of characters within the text stream of a document that is delimited
by a paragraph mark, cell mark, row mark, or a section mark (these are special characters
described later in this document).
paragraph style
A named set of character and paragraph properties that can be associated with any
number of paragraphs in a Word document's text stream. A paragraph style provides a
set of character and paragraph property defaults for the text of any paragraph tagged with
that style. When a new paragraph is created and given a particular style, newly typed text
is set to the character and paragraph properties of that style unless the user makes an
exception to the paragraph style definition by performing other editing operations.
picture
A picture is represented in the document text stream as a special character, an ASCII 1
whose CHP has the fSpec bit set to 1. The file location of the picture in the Word binary file
is stored in the character's CHP in chp.fcPic. The fcPic is a byte offset into the data
stream. Beginning at the position recorded in chp.fcPic, a header data structure, the
PIC, will be stored. If the picture is a reference to a TIFF file, a Picture file or an Office
shape file, the name of the file is recorded immediately following the PIC in a Pascal style
string. If the picture is an Office shape, a Windows metafile or a bitmap, the shape,
metafile or bitmap will immediately follow the PIC. Pictures that are a reference to an
Office shape file will include both the filename and the shape in that order. Pictures inserted
with Word 97 and later versions are in the new Office shape format (documented
elsewhere). However, pictures can be copied from older files into newer ones and their old
format will persist until the picture is edited or displayed.
Some files (including all files created by Word for the Macintosh) may store Macintosh PICT
pictures as well. In this case, the PIC structure is immediately followed by a standard
Windows metafile depicting a large "x", so that older readers expecting only a metafile
after the PIC will just display this "x". If a reader detects this standard "x" metafile, it can
extract the sizes of the standard "x" metafile and the Macintosh PICT picture that follows it
from an early portion of this "x" metafile. See Appendix B for a discussion of this technique.
piece table:
The piece table is a data structure that describes the logical sequence of characters in a
Word document and records recent changes to the formatting of a Word document. It is
stored in a Word file as a PLCF named the plcfpcd (PLex of Cps containing Piece
Descriptors). The piece table relates a logical character number, called a CP (Character
Position), to a physical location within a Word file (an FC). The array of CPs in the plcfpcd
defines a partitioning of the Word document into disjoint pieces. The second array is an
array of PCDs (Piece Descriptors) which is in 1-to-1 correspondence to the array of CPs
that records the physical location in the Word file where the corresponding piece begins. To
find the physical location of a particular logical character in a Word document, take the CP
coordinate of that character within the document and find the piece that contains that
character. This is done by finding the index of the largest CP in the array of CPs that is less
than or equal to the character CP. Then reference the PCD with that index in the array of
PCDs. The FC stored in the PCD gives the position of the beginning of the piece in the file.
Finally, add the offset of the desired character from the beginning of its piece to the FC of
the beginning of the piece. This gives a "virtual" file offset of the character. If the second
most significant bit is clear, then this indicates the actual file offset of the Unicode
character (two bytes). If the second most significant bit is set, then the codepage-1252
compressed version of the Unicode character (one byte) is stored at the offset obtained
by clearing this bit and dividing by two.
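The lookup can be sketched as follows, assuming the plcfpcd has already been decoded into a list of n+1 boundary CPs and a list of the n FCs taken from the PCDs. The sketch treats a character that starts a piece as belonging to that piece, and it assumes the offset within a piece is counted in characters, so it is scaled by the character width after the compression bit has been interpreted. The function and parameter names are hypothetical.

```python
from bisect import bisect_right

COMPRESSED_FLAG = 0x40000000  # second most significant bit of a 32-bit FC


def locate_character(rgcp, rgfc, cp):
    """Return (file_offset, bytes_per_character) for the character at cp.

    rgcp: ascending piece boundaries (len(rgfc) + 1 CPs)
    rgfc: the FC stored in each PCD, one per piece
    """
    if not rgcp[0] <= cp < rgcp[-1]:
        raise ValueError("cp is outside the text described by the piece table")
    # Piece i covers rgcp[i] <= cp < rgcp[i + 1].
    i = bisect_right(rgcp, cp) - 1
    delta = cp - rgcp[i]          # characters from the start of the piece
    fc = rgfc[i]
    if fc & COMPRESSED_FLAG:
        # Code page 1252 text: clear the flag, halve, one byte per character.
        return (fc & ~COMPRESSED_FLAG) // 2 + delta, 1
    # Unicode text: two bytes per character.
    return fc + 2 * delta, 2
```

Returning the character width alongside the offset lets the caller decide whether to decode one byte as code page 1252 or two bytes as little-endian Unicode.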
PL
The SEP for a particular section may be constructed if a CP of a character in that section is
known. First search the array of CPs in the plcfsed for the index of the largest CP that is
less than or equal to the CP of the character. Use this index to locate the SED in the
plcfsed which describes the section. The FC stored in the SED is the offset from the
beginning of the Word file at which the SEPX is stored. If the stored FC is equal to
0xFFFFFFFF, then the SEP for the section is exactly equal to the standard SEP (see SEP
structure definition). Otherwise, read the SEPX into memory and create a copy of the
standard SEP. Finally, apply the sprms stored in the SEPX to the standard SEP to produce
the SEP for a section.
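The first half of this procedure, locating the SEPX, can be sketched as below. It assumes the plcfsed has been decoded into a list of boundary CPs and a list of the FCs stored in the SEDs. Reading the SEPX and applying its sprms to the standard SEP is left out. The names are hypothetical.

```python
from bisect import bisect_right

NO_SEPX = 0xFFFFFFFF  # the section uses the standard SEP unchanged


def sepx_offset_for_cp(rgcp, sed_fcs, cp):
    """Return the file offset of the SEPX for the section containing cp.

    Returns None when the stored FC is 0xFFFFFFFF, meaning the SEP of the
    section is exactly the standard SEP.
    """
    # Largest boundary CP that is less than or equal to cp.
    i = bisect_right(rgcp, cp) - 1
    if i < 0 or i >= len(sed_fcs):
        raise ValueError("cp does not fall inside any recorded section")
    fc = sed_fcs[i]
    return None if fc == NO_SEPX else fc
```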
SPLS
structures contain CP coordinates whose 0 point is the beginning of the subdocument text
stream instead of the beginning of the main document text stream.
In full-saved documents, a simple calculation with values stored in the FIB produces the
file offset of the beginning of the subdocument text streams (if they exist). The length of
these streams is also stored.
In fast-saved documents, the piece tables of subdocuments are concatenated to the
end of the main document piece table. In this case, to identify the beginning of
subdocument text, you must sum the length of the main document text stream with the
lengths of any subdocument text streams stored ahead of the subdocument (information
stored in the FIB) and treat this sum as a CP coordinate. To retrieve the text of the
subdocument, you must do lookups in the piece table, starting with the piece that contains
the beginning CP coordinate, to find the physical location of each piece of the subdocument
text stream.
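The summation described for fast-saved documents can be sketched as follows. The stream names and the idea of passing the lengths as an ordered list are illustrative assumptions. In a real reader the lengths would come from the FIB, in the order in which the text streams are stored.

```python
def subdocument_start_cp(stream_lengths, name):
    """Return the CP at which the named text stream begins.

    stream_lengths: list of (name, length_in_characters) pairs in storage
                    order, with the main document first (names hypothetical).
    """
    start = 0
    for current, length in stream_lengths:
        if current == name:
            return start
        start += length   # skip every stream stored ahead of the target
    raise KeyError(name)
```

The returned value is a CP coordinate in the concatenated piece table, so the text itself is then retrieved piece by piece through piece table lookups starting at that CP.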
summary information stream:
The stream within a Word .doc file containing the document summary information.
table row:
A contiguous sequence of paragraphs within the text stream of a document that is
partitioned into subsequences of paragraphs called cells. The last paragraph of each cell is
terminated by a special paragraph mark called a cell mark. Following the cell mark that
ends the last cell of a table row, the table row is terminated by a special paragraph mark
called a row mark. When Word displays a table row, it assigns a rectangular
display area to each cell in the row. The tops of all the cell display areas are aligned at the
same vertical position on a page. The leftmost display area in a table row is assigned to the
0th cell of the row; the next display area to the right is assigned to the 1st cell of the row,
etc. The text of the cell is wrapped to fit its display area. As more text is added to the cell,
the cell display area extends downward. A set of table properties that determine how many
cells are in a row, where the horizontal boundaries of cell display areas are, and what
borders are drawn around each cell in the table is stored for the row mark that marks the
end of the table row.
table stream:
The stream within a Word .doc file containing the various plcfs and tables that describe
a document's structures.
TAP (TAble Properties):
The data structure which describes the properties of a single table row. The information in
the TAP for a table row is stored in a Word file as a list of sprms that modify a TAP which
has been cleared to zeros. This list of table sprms is appended to the grpprl of paragraph
sprms that is recorded in the PAPX for the row mark that delimits the end of a table row.
UPE (Universal Property Expansion)
Describes the "end-result" of property formatting, i.e. what the style looks like. The UPE
structure is a non-zero prefix of a UPD structure.
UPX (Universal Property eXception)
Describes the difference in formatting of a style as compared to its based-on style.
XCHAR (eXtended CHARacter set):
A data type which defines a "character". Each XCHAR corresponds to a character in the
document, where "character" is defined as a glyph, regardless of whether it is a single-byte
or double-byte character. With Word6 (East Asian), Word95 (East Asian), Word97/all and
future versions of Word, this is defined as a 16-bit integer corresponding to the Unicode
character code of the glyph.
XST
Note: In this document, bit 0 is the low-order bit. Structures are described as they would be
declared in C for the Intel architecture. When numbering bytes in a word from low offset
towards high offset, two-byte integers have their least significant eight bits stored in byte 0
and most significant eight bits in byte 1. If bit 31 is the most significant bit in a four-byte
integer, bits 31 through 24 are stored in byte 3, bits 23 through 16 are stored in byte 2,
bits 15 through 8 are stored in byte 1, and bits 7 through 0 are stored in byte 0.
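The byte ordering described in this note is little-endian. A minimal sketch of reading such integers with Python's standard struct module follows. The helper names are hypothetical.

```python
import struct


def read_u16(buf, offset=0):
    """Read a two-byte unsigned integer stored least significant byte first."""
    return struct.unpack_from("<H", buf, offset)[0]


def read_u32(buf, offset=0):
    """Read a four-byte unsigned integer stored least significant byte first."""
    return struct.unpack_from("<I", buf, offset)[0]


def bit(value, n):
    """Return bit n of value, where bit 0 is the low-order bit."""
    return (value >> n) & 1
```

For instance, the bytes 0x78 0x56 0x34 0x12 at increasing offsets decode to the four-byte integer 0x12345678.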
Naming Conventions
The field names in Word data structures usually consist of a prefix of lower case characters
followed by an optional upper case modifier. The following tags are used in the lower case
prefix of field names to document the data type of the field:
b     Used to name a 1-byte integer value.
c     Prefix used to signify that an integer value is a count of some number of objects
      (e.g. a cb is a count of bytes, a cl is a count of lines, ccol is a count of columns,
      a cpe is a count of picture elements).
cp    Used to name a variable that contains a character position within the document.
      Always a 4-byte quantity.
dxa   Used to name a variable that contains the horizontal distance of an object
      measured from some reference point expressed in twips (e.g. pap.dxaLeft is
      the distance of the left boundary of a paragraph measured from the left margin
      of the page). See "xa" for definition of twip.
dxp   Used to name a variable that contains the horizontal distance of an object
      measured from some reference point expressed in Macintosh pixel units (1/72")
      (e.g. dxpSpace).
dya   Used to name a variable that contains the vertical distance of an object
      measured from some reference point expressed in twips (e.g. pap.dyaAbs is
      the vertical distance of the top of a paragraph from a reference frame declared
      in the pap). See "xa" for definition of twip.
dyp   Used to name a variable that contains the vertical distance of an object
      measured from some reference point expressed in Macintosh pixel units (1/72").
f     Used to name a flag (a variable containing a Boolean value). Usually the object
      referred to will contain either 1 (fTrue, TRUE) or 0 (fFalse, FALSE) (e.g.
      fWidowControl, fShadow).
fc    Used to name a variable that contains an offset from the beginning of a file.
      Always a 4-byte quantity.
grp   Prefix used to name an array of bytes that contains one or more copies of a
      variable length data structure with the instances of the data structure stored one
      after the other in the array (e.g. a grpprl is an array of bytes that stores a
      group of prls).
grpf  Prefix used to name an integer or byte value whose bits are used as flags (e.g.
      grpfIhdt is a group of flags that records the types of headers that are stored
      for a particular section of a document).
i     Prefix used to signify that an integer value is used as an index into an array
      (e.g. itbd is an index into rgtbd, itc is an index into rgtc).
l     Used to name a 4-byte integer value (a long) (e.g. lcb).
rg    Prefix used to signify that the data structure being defined is an array (e.g. rgb
      (an array of bytes), rgcp (an array of CPs), rgfc (an array of FCs), rgfoo (an
      array of foos)).
w     Used to name a 2-byte integer value (a short).
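As an illustration of the two distance units named by these prefixes, the sketch below converts a dxa or dya value to inches and to Macintosh pixel units. The definition of the twip is not reproduced in this section. The constant below uses the conventional value of 1/1440 inch, which should be treated as an assumption here. The function names are hypothetical.

```python
TWIPS_PER_INCH = 1440       # assumption: conventional twip, not defined in this section
MAC_PIXELS_PER_INCH = 72    # Macintosh pixel units are 1/72 inch


def twips_to_inches(value):
    """Convert a dxa/dya distance in twips to inches."""
    return value / TWIPS_PER_INCH


def twips_to_mac_pixels(value):
    """Convert a dxa/dya distance in twips to dxp/dyp Macintosh pixel units."""
    return value * MAC_PIXELS_PER_INCH / TWIPS_PER_INCH
```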
Group of SEPXs
Section Property Exceptions (SEPXs) immediately follow the FKPs and are concatenated
one after the other. A SEPX that would span a page boundary when placed immediately
after the preceding SEPX is no longer guaranteed to start on a page boundary.
plcupcRgbuse
Undocumented undo / versioning data
plcupcUsp
Undocumented undo / versioning data
plcfwkb (work book document partition table)
Written immediately after the previously recorded table, if the document is a master
document.
plflfo (more list formats)
Written immediately after the end of the plcflst and its accompanying data, if there are
any lists defined in the document. This consists first of a PL of LFO records, followed by the
allocated data (if any) hanging off the LFOs. The allocated data consists of the array of
LFOLVLFs for each LFO (and each LFOLVLF is immediately followed by some LVLs).
pms (print merge state)
Written immediately after the previously recorded table, if information about the print /
mail merge state is recorded for the document.
prDrvr (printer driver information)
Written immediately after the previously recorded table, if a print environment is recorded
for the document.
prEnvLand (print environment in landscape mode)
Written immediately after the previously recorded table, if a landscape mode print
environment is recorded for this document.
prEnvPort (print environment in portrait mode)
Written immediately after the previously recorded table, if a portrait mode print
environment is recorded for this document.
routeSlip (mailer routing slip)
Written immediately after the previously recorded table, if this document has a mailer
routing slip.
sttbAutoCaption (auto caption string table)
Written immediately after the previously recorded table, if the document contains auto
captions.
sttbCaption (caption title string table)
Written immediately after the previously recorded table, if the document contains
captions.
sttbfAssoc (table of associated strings)
Table of associated strings.
sttbfAtnBkmk (table of annotation bookmark string names)
Written immediately after the previously recorded table, if the document contains
annotations with bookmarks.
sttbfBkmk (table of bookmark name strings)
Written immediately after the previously recorded table, if the document contains
bookmarks.
sttbFnm (filename reference string table)
Written immediately after the previously recorded table, if the document references other
documents.
FIB
The FIB contains a "magic word" and pointers to the various other parts of the file, as well as
information about the length of the file. The FIB starts at the beginning of the file. The FIB is
defined in the structure definition section of this document.
Text
The text of the file starts at fib.fcMin, which is usually set to the next 128-byte boundary after
the end of the FIB. The text in a Word document is ASCII text with the following restrictions
(ASCII codes given in decimal):
Paragraph ends are stored as ASCII 13 (a single <Carriage Return> character). No other
occurrences of this character sequence are allowed.
Hard line breaks which are not paragraph ends are stored as ASCII 11. Other line break
or word wrap information is not stored.
Hyphens
Breaking hyphens are stored as ASCII 45 (normal hyphen code).
Non-required hyphens are ASCII 31.
Non-breaking hyphens are stored as ASCII 30.
Non-breaking spaces are stored as 160.
Normal spaces are ASCII 32.
Page breaks and Section marks are ASCII 12 (normal form feed); if there's an entry in
the section table, it's a section mark, otherwise it's a page break.
Column breaks are stored as ASCII 14.
Tab characters are ASCII 9 (normal).
Fields
Field begin mark which delimits the beginning of a field is ASCII 19.
Field end mark which delimits the end of a field is ASCII 21.
Field separator, which marks the boundary between the preceding field code text and
following field expansion text within a field, is ASCII 20.
Field escape character is the '\' character which also serves as the formula mark.
The cell mark which delimits the end of a cell in a table row is stored as ASCII 7 and has
the fInTable paragraph property set to fTrue (pap.fInTable==1).
The row mark which delimits the end of a table row is stored as ASCII 7 and has the
fInTable paragraph property and fTtp paragraph property set to fTrue
(pap.fInTable==1 && pap.fTtp==1).
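The character codes listed above can be collected into a small lookup routine. The following is a minimal sketch only: the function name doc_char_kind and the label strings are illustrative and are not part of the file format. Code 7 is reported with a single label because telling a cell mark from a row mark requires the paragraph properties, which this sketch does not read.

```c
#include <stddef.h>

/* Maps a character code from the text stream to a descriptive label.
   Returns NULL for codes that have no special meaning in the list above.
   doc_char_kind is a hypothetical helper name. */
const char *doc_char_kind(unsigned char ch)
{
    switch (ch) {
    case 7:   return "cell or row mark";          /* pap.fTtp tells them apart */
    case 9:   return "tab";
    case 11:  return "hard line break";
    case 12:  return "page break or section mark"; /* section table decides */
    case 13:  return "paragraph end";
    case 14:  return "column break";
    case 19:  return "field begin";
    case 20:  return "field separator";
    case 21:  return "field end";
    case 30:  return "non-breaking hyphen";
    case 31:  return "non-required hyphen";
    case 32:  return "space";
    case 45:  return "breaking hyphen";
    case 160: return "non-breaking space";
    default:  return NULL;
    }
}
```

A reader could call doc_char_kind on each byte of the text stream and treat a NULL result as ordinary text.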
The following ASCII codes are treated as "special" characters when they have the character
property special on (chp.fSpec==1):
ASCII code Special character
0 Current page number
1 Picture
2 Auto numbered footnote reference.
3 Footnote separator character
4 Footnote continuation character
5 Annotation reference
6 Line number
7 Hand Annotation picture (Generated in Pen Windows)
Note The end of a section is also the end of a paragraph. The last character of a section is a
section mark which stands in place of the paragraph mark normally required to end a
paragraph. An exception is made for the last character of a document which is always a
paragraph mark although the end of a document is always an implicit end of section.
If !fib.fComplex, the document text stream is represented by the text beginning at
fib.fcMin up to (but not including) fib.fcMac. Otherwise, the document is represented by
the piece table stored in the file in the data beginning at fib.fcClx.
The document text stream includes text that is part of the main document, plus any text that
exists for the footnote, header, macro, or annotation subdocuments. The sizes of the main
document and the header, footnote, macro and annotation subdocuments are stored in the
fib, in variables:
fib.ccpText, fib.ccpFtn, fib.ccpHdr,
fib.ccpMcr, fib.ccpEdn, fib.ccpTxbx,
fib.ccpHdrTxbox, fib.ccpAtn
In Word documents, the fundamental unit of text for which character exception information is
kept is the run of exception text, a contiguous sequence of characters stored on disk that all
have the same exception properties with respect to their underlying style character properties.
Each run would have an entry recorded in a CHPX FKP. If a user never changed the character
properties inherited from the styles used in the document and did a complete save of the
document, although each of those styles may have different properties, the entire document
stream would be one large run of exception text and one CHPX would suffice to describe the
character properties of the entire document.
The fundamental unit of text for which paragraph properties are recorded is the paragraph.
Every paragraph has an entry recorded in a PAPX FKP.
The CHPX FKP and the PAPX FKP have similar physical structures. An FKP is a 512-byte data
structure that is stored in one page of a Word file. At offset 511 is a 1-byte count named crun,
which is a count of runs of exception text for CHPX FKPs and which is a count of paragraphs in
PAPX FKPs. Beginning at offset 0 of the FKP is an array of crun+1 FCs, named rgfc, which
records the beginning and limit FCs of crun runs of exception text or paragraphs.
For CHPX FKPs, immediately following fkp.rgfc is a byte array of crun word offsets to CHPXs
from the beginning of the FKP. This byte array, named rgb, is in 1-to-1 correspondence with
the rgfc.
For PAPX FKPs, immediately following the fkp.rgfc is an array of 13 byte entries called
BXs. This array, called the rgbx, is in 1-to-1 correspondence with the rgfc. The first byte of the
ith BX entry contains a single byte field which gives the word offset of the PAPX that belongs
to the paragraph whose beginning in FC space is rgfc[i] and whose limit is rgfc[i+1] in FC
space. The last 12 bytes of the ith BX entry contain a PHE structure that stores the current
paragraph height of the paragraph whose beginning in FC space is rgfc[i] and whose limit is
rgfc[i+1] in FC space.
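The FKP layout described above can be read with a few accessor functions. This is a sketch under stated assumptions: the function names are hypothetical, integers are assumed to be stored little-endian, and the caller is assumed to pass a full 512-byte page and an index i below crun. No validation of crun against the page size is attempted.

```c
#include <stdint.h>

enum { FKP_SIZE = 512, BX_SIZE = 13 };

/* Reads a 4-byte FC, assuming little-endian storage. */
uint32_t fkp_read_fc(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* crun is the last byte of the 512-byte page (offset 511). */
unsigned fkp_crun(const uint8_t *fkp)
{
    return fkp[FKP_SIZE - 1];
}

/* rgfc[i], valid for 0 <= i <= crun; rgfc starts at offset 0. */
uint32_t fkp_fc(const uint8_t *fkp, unsigned i)
{
    return fkp_read_fc(fkp + 4u * i);
}

/* CHPX FKP: rgb follows the crun+1 FCs; rgb[i] is a word offset,
   so the byte offset within the FKP is twice its value. */
unsigned fkp_chpx_offset(const uint8_t *fkp, unsigned i)
{
    const uint8_t *rgb = fkp + 4u * (fkp_crun(fkp) + 1u);
    return 2u * rgb[i];
}

/* PAPX FKP: rgbx follows the crun+1 FCs; the first byte of the
   13-byte BX entry i is the word offset of the PAPX. */
unsigned fkp_papx_offset(const uint8_t *fkp, unsigned i)
{
    const uint8_t *rgbx = fkp + 4u * (fkp_crun(fkp) + 1u);
    return 2u * rgbx[BX_SIZE * i];
}
```

For run or paragraph i, the text it covers spans fkp_fc(fkp, i) up to (but not including) fkp_fc(fkp, i + 1).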
The fact that the offset to properties stored in the rgb or rgbx is a word offset implies that
CHPXs and PAPXs are stored in FKPs beginning on word boundaries. Since the values stored in
the rgb/rgbx allow random access throughout the FKP, space within an FKP can be conserved
by storing the offset of the same physical CHPX/PAPX in rgb/rgbx entries when several runs
or paragraphs in the FKP have the same properties. Word uses this optimization.
An rgb or rgbx[].b value of 0 is used in another optimization. When an rgb or rgbx[].b
value of 0 is stored in an FKP, it means that instead of referring to a particular CHPX/PAPX in
the FKP the 0 value signals the reader to construct a commonly encountered predefined set of
properties.
For CHPX FKPs a 0 rgb value means the properties of the run of text were exactly equal to the
character properties inherited from the style of the paragraph it was in. For PAPX FKPs, a 0
rgbx[].b value means the paragraph's properties were exactly equal to the paragraph
properties of the Normal style (stc==0) and the paragraph contained 1 line of 240 pixels, with
a column width of 7980 dxas.
When new entries are added to an FKP, there must be unallocated space in the middle of the
FKP equal to 5 bytes for CHPXs (size of an FC plus size of one-byte word offset) or 17 bytes for
PAPXs (size of an FC plus the size of a 13-byte BX entry), plus the size of the new CHPX or
PAPX if the property being added is not already recorded in the FKP and is not the property
coded with a 0 rgb/rgbx[].b value. To add a new property in a CHPX FKP, existing rgb
entries are moved four bytes to the right in the FKP. To add a new property in a PAPX FKP,
existing rgbx entries are moved four bytes to the right in the FKP. The new FC is added at the
end of the rgfc. The new CHPX or PAPX is recorded on a 2-byte boundary, immediately before the
previously recorded properties stored at the end of the block. The word offset of the
beginning of the CHPX or PAPX is stored as the last entry of the relocated rgb/rgbx[].b, and
finally, the crun stored at offset 511 is incremented. In Word 97, PAPXs can be generated
which are too large to fit in an FKP. In such a case, the grpprl of the PAPX is written to the
data stream and a PAPX is stored in an FKP with that grpprl replaced by a sprmPHugePapx.
Bin Tables
A bin table (plcfbte) partitions the total extent of the Word file that contains text characters
into a set of contiguous intervals marked by an fcFirst and an fcLim. The fcFirst for the
nth interval would be plcfbte.rgfc[n] and the fcLim for the nth interval would be
plcfbte.rgfc[n+1]. Associated with each interval is a BTE. A BTE holds a four-byte PN
(page number) which identifies the FKP page in the file which contains the formatting
information for that interval. A CHPX FKP further partitions an interval into runs of exception
text. A PAPX FKP in a non-complex, full-saved file, partitions the text within intervals into
paragraphs. If a file is in complex format (was fast-saved), the PAPX FKP only records the FCs
within the text preceded by a paragraph mark. Even though a sequence of text may be
between two paragraph end marks, it may reside in a paragraph different from the one defined
by the next paragraph end mark, because the text may have been moved by the user into a
different paragraph. In the logical text stream represented by the document's piece table, the
paragraph mark that follows the moved text is stored in a non-adjacent physical location in the
file.
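Locating the FKP for a given file position amounts to a search over plcfbte.rgfc. The sketch below assumes the rgfc array and BTEs have already been read into memory; the function names are hypothetical, and the conversion from PN to a byte offset assumes that pages are 512 bytes, matching the FKP size given earlier.

```c
#include <stdint.h>

/* Returns the index n of the interval with rgfc[n] <= fc < rgfc[n+1],
   or -1 if fc lies outside the table. rgfc holds cbte + 1 entries,
   where cbte is the number of BTEs (intervals). */
int plcfbte_find(const uint32_t *rgfc, int cbte, uint32_t fc)
{
    int lo = 0, hi = cbte;

    if (cbte <= 0 || fc < rgfc[0] || fc >= rgfc[cbte])
        return -1;
    /* Invariant: rgfc[lo] <= fc < rgfc[hi]. */
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (rgfc[mid] <= fc)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}

/* Byte offset of the FKP named by a PN, assuming 512-byte pages. */
uint32_t fkp_file_offset(uint32_t pn)
{
    return pn * 512u;
}
```

The BTE at the returned index supplies the PN, and fkp_file_offset turns that PN into the position of the 512-byte FKP to read.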
Style Sheet
A style sheet is a collection of styles. In Word, each document has its own style sheet.
A style is a collection of formatting information with a name. Word 6.0 and later versions
support paragraph and character styles. Versions of Word prior to 6.0 support only paragraph
styles. Character styles have just character formatting. Paragraph styles have both character
and paragraph formatting. The style sheet establishes a correspondence between a style code
and a style definition.
Note: the storage and behavior of styles has changed considerably since WinWord 2.x,
beginning with nFib 63. Some of the differences are:
Character styles are supported.
The style code is called an istd, rather than an stc.
The istd is a short integer, where the stc was a byte.
The range of the istd is 0-4095, where 4095 is the null style. The range of the stc was
0-255, with 222 as the null style.
PAPX's have a short istd at the beginning, rather than a byte stc.
CHPX's are a grpprl, not a CHP.
Many other changes...
The styles for a document (both paragraph and character styles) are stored in an array in each
document. [The DOD.hplhqstd is a handle to a plex (array) of hq's (handles) to std's (style
descriptions).] When new styles are created, they are added to the end of the array. The array
can have unused slots. Some slots at the beginning of the array are reserved for specific
styles, whether or not they have been created yet. [Istd (slot) 0 is Normal. Istd 1-9 are Heading 1-9.
Istd 10 is Default Paragraph Font. Istd 11-14 are reserved. So the first non-fixed index is 15
(see stshi.istdMaxFixedWhenSaved).] Paragraph and character styles are stored in the same
array. Each document has a separate array, so the same style will usually have a different
istd in two different documents. [Those styles in fixed locations in the style sheet will have
the same istd's in all documents.] Thus style matching between documents must be done by
name (or by sti if the styles are built-in).
Styles are usually referred to using an istd. The istd is an index into an array of STDs (STyle
Descriptions). A (doc, istd) pair uniquely identifies a style because it tells which style is in
which array.
Parts of a style (for more information, see the STD structure below):
sti: A style identifier. Built-in styles have a unique sti to indicate which built-in style they
reference. User-defined styles use stiUser.
stk: The type of style, either paragraph or character.
istdBase: The style that this style is based on.
istdNext: The style that should be applied after this one.
stzName: The name of a style, unique within its style sheet.
UPX: The difference between this style and the one it is based on.
UPE: The properties of this style (a PAP, CHP, and/or grpprl).
Every paragraph has a paragraph style. Every character has a character style. The default
paragraph style is Normal (stiNormal, istdNormal). The default character style is Default
Paragraph Font (stiNormalChar, istdNormalChar).
The formatting of a paragraph (the PAP) and a character (the CHP) depend on the paragraph
and character styles applied to them, as well as any additional formatting stored in the FKPs.
The PAP and CHP are constructed in a layered fashion:
For a PAP:
1. An initial PAP is determined by getting the PAP from the paragraph's style.
2. Any paragraph formatting stored in the file (the FKP PAPX's) is then applied to that PAP.
For a CHP:
1. An initial CHP is determined by getting the CHP from the paragraph's style.
2. Properties from the character's style (the UPX.chpx.grpprl) are then applied to that
CHP.
3. Any character formatting stored in the file (the FKP CHPX's) is then applied to that CHP.
Note: the resulting PAP and CHP have fields that indicate what style was applied: PAP.istd,
CHP.istd.
Stylesheet File Format
The style sheet (STSH) is stored in the file in two parts, a STSHI and then an array of STDs. The
STSHI contains general information about the following style sheet, including how many styles
are in it. After the STSHI, each style is written as an STD. Both the STSHI and each STD are
preceded by a ushort that indicates their length.
Field      Size        Comment
cbStshi    2 bytes     Size of the following STSHI structure (see note 1)
STSHI      (cbStshi)   Stylesheet Information
Note 1: For early versions of Word 6.0 files (versions prior to nFib 67), the cbStshi field was
not written. The cbStshi to use for those file versions is 4 bytes.
Then for each style in the style sheet (stshi.cstd), the following is stored:
cbStd      2 bytes     Size of the following STD structure
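The length-prefixed layout above lends itself to a simple walk over the style sheet. The following sketch counts the non-empty styles. It assumes the whole STSH is already in memory, that integers are little-endian, that the cbStshi field is present, and that each STD follows the previous one with no extra padding; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a 2-byte unsigned value, assuming little-endian storage. */
uint16_t stsh_read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Walks cbStshi, the STSHI, and then stshi.cstd (cbStd, STD) pairs.
   Returns the number of styles with cbStd != 0, or -1 if the buffer
   ends before the structure is complete. */
int stsh_count_nonempty(const uint8_t *stsh, size_t len)
{
    size_t pos;
    unsigned cbStshi, cstd, i;
    int used = 0;

    if (len < 4)
        return -1;
    cbStshi = stsh_read_u16(stsh);
    if (cbStshi < 2 || cbStshi > len - 2)
        return -1;
    cstd = stsh_read_u16(stsh + 2);   /* cstd is the first STSHI field */
    pos = 2 + (size_t)cbStshi;        /* skip the STSHI as stored */

    for (i = 0; i < cstd; i++) {
        unsigned cbStd;
        if (len - pos < 2)
            return -1;
        cbStd = stsh_read_u16(stsh + pos);
        pos += 2;
        if (cbStd > len - pos)
            return -1;
        if (cbStd != 0)               /* cbStd == 0 marks an empty slot */
            used++;
        pos += cbStd;
    }
    return used;
}
```

Skipping by the stored lengths rather than by sizeof keeps the walk independent of how large the reader's own STSHI and STD definitions are.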
STSHI
The STSHI structure, which stores style sheet information has the following format:
typedef struct _STSHI
{
ushort cstd; // Count of styles in stylesheet
ushort cbSTDBaseInFile; // Length of STD Base as stored in a file
BF fStdStylenamesWritten : 1; // Are built-in stylenames stored?
BF : 15; // Spare flags
ushort stiMaxWhenSaved; // Max sti known when this file was written
ushort istdMaxFixedWhenSaved; // How many fixed-index istds are there?
ushort nVerBuiltInNamesWhenSaved; // Current version of built-in stylenames
FTC rgftcStandardChpStsh[iftcCompositeMax]; /* rgftc used by
StandardChpStsh for this document */
ushort cbLSD; /* size of each lsd in mpstilsd. The count of lsd's
is stiMaxWhenSaved */
LSD mpstilsd[stiMax]; /* latent style data
(stiMax == stiMaxWhenSaved upon save!) */
} STSHI;
The cb preceding the STSHI in the file is the length of the STSHI as stored in the file. The
current definition of the STSHI structure might be longer or shorter than that stored in the file;
the style sheet reader routine needs to take this into account.
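One way a reader might tolerate a stored structure that is longer or shorter than its own definition is to copy only the overlapping prefix. This is a sketch, not the method Word uses: the function name is hypothetical, and filling the missing tail with zero bytes is an assumption about what a suitable default is.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies a length-prefixed structure from the file into a fixed-size
   in-memory buffer. Bytes in the file beyond cbDst are ignored; bytes
   of dst not covered by the file are left as zero.
   Returns the number of bytes actually copied. */
size_t read_sized_struct(void *dst, size_t cbDst,
                         const uint8_t *src, size_t cbInFile)
{
    size_t n = cbInFile < cbDst ? cbInFile : cbDst;

    memset(dst, 0, cbDst);
    memcpy(dst, src, n);
    return n;
}
```

The caller still advances through the file by the stored length (cbInFile), not by the number of bytes copied.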
stshi.cstd: The number of styles in this style sheet. There will be stshi.cstd (cbSTD, STD)
pairs in the file following the STSHI. Note: styles can be empty, i.e. cbSTD==0.
stshi.cbSTDBaseInFile: The STD structure (see below) is divided into a fixed-length "base",
and a variable length part. The stshi.cbSTDBaseInFile indicates the size in bytes of the
fixed-length base of the STD as it was written in this file. If the STD base is grown in a future
version, the file format doesn't change, because the style sheet reader can discard parts it
doesn't know about, or use defaults if the file's STD is not as large as it was expecting.
(Currently, stshi.cbSTDBaseInFile is 8.)
stshi.fStdStylenamesWritten: Previous versions of Word did not store the style name if the
style was a built-in style; Word 6.0 stores the style name for compatibility with future versions.
Note: the built-in style names may need to be "regenerated" if the file is opened in a different
language or if stshi.nVerBuiltInNamesWhenSaved doesn't match the expected value.
stshi.stiMaxWhenSaved: This indicates the last built-in style known to the version of Word
that saved this file.
stshi.istdMaxFixedWhenSaved: Each array of styles has some fixed-index styles at the
beginning. This indicates the number of fixed-index positions reserved in the style sheet when
it was saved.
stshi.nVerBuiltInNamesWhenSaved: Since built-in style names are saved with the
document, this provides a way to see if the saved names are the same "version" as the names
in the version of Word that is loading the file. If not, the built-in style names need to be
"regenerated", i.e. the old names need to be replaced with the new.
stshi.rgftcStandardChpStsh: This is a list of the default fonts for this style sheet. The first is
for ASCII characters (0-127), the second is for East Asian characters, and the third is the
default font for non-East Asian, non-ASCII text. See notes on sprmCRgftcX for details.
fLocked indicates the style is currently locked, meaning it cannot be used in the document as
a result of the Document Protection feature. The index into mpstilsd corresponds to the
index of the style that the LSD structure affects (see std.sti below).
STD
Each individual style description is stored in an STD structure as follows:
typedef struct _STD
{ // Base part of STD:
ushort sti : 12; /* invariant style identifier */
ushort fScratch : 1; /* spare field for any temporary use, always
reset back to zero! */
ushort fInvalHeight : 1; /* PHEs of all text with this style are wrong */
ushort fHasUpe : 1; /* UPEs have been generated */
ushort fMassCopy : 1; /* std has been mass-copied; if unused at save time,
style should be deleted */
ushort stk : 4; /* style kind */
ushort istdBase : 12; /* base style */
ushort cupx : 4; /* number of UPXs (and UPEs) */
ushort istdNext : 12; /* next style */
ushort bchUpe; /* offset to end of upx's, start of upe's */
ushort fAutoRedef : 1; /* auto redefine style when appropriate */
ushort fHidden: 1; /* hidden from UI? */
UPE plus the exceptions in the UPX.) A cb of zero indicates an empty slot in the style array, i.e.
no style has that istd. Note: the STD structure may be longer or shorter than the one stored
in the file; stshi.cbSTDBaseInFile indicates the length of the base of the STD (up to
stzName) as stored in the file. The style sheet reader routine must take this into account.
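Because C bit-field layout is compiler-dependent, a portable reader may prefer to decode the first three 16-bit words of the STD base with explicit masks. The sketch below assumes little-endian storage and that bit-fields are allocated starting from the least significant bit of each word; both are assumptions about the writer, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Selected fields from the first three words of the STD base. */
typedef struct {
    unsigned sti;       /* 12 bits, word 0 */
    unsigned stk;       /*  4 bits, word 1 */
    unsigned istdBase;  /* 12 bits, word 1 */
    unsigned cupx;      /*  4 bits, word 2 */
    unsigned istdNext;  /* 12 bits, word 2 */
} StdBaseFields;

StdBaseFields std_decode_base(const uint8_t *p)
{
    uint16_t w0 = (uint16_t)(p[0] | (p[1] << 8));
    uint16_t w1 = (uint16_t)(p[2] | (p[3] << 8));
    uint16_t w2 = (uint16_t)(p[4] | (p[5] << 8));
    StdBaseFields f;

    f.sti      = w0 & 0x0FFFu;  /* upper 4 bits of w0 are the flag bits */
    f.stk      = w1 & 0x000Fu;
    f.istdBase = (unsigned)(w1 >> 4);
    f.cupx     = w2 & 0x000Fu;
    f.istdNext = (unsigned)(w2 >> 4);
    return f;
}
```

The caller is expected to have checked that at least six bytes of the STD base are present before decoding.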
The variable-length part of the STD has three variable-length subparts, the xstzName, the
grupx, and the grupe. Since this doesn't fit well into a C structure declaration, some
processing is needed to figure out where one part stops and the next part begins. An important
note is that all variable-length parts and subparts of the STD begin on EVEN-BYTE OFFSETS
within the STD, even if the length of the preceding variable-length part was odd.
std.sti: The sti is an identifier of which built-in style this is, or stiUser for a user-defined
style. An sti is intended to be permanent throughout versions of Word, although new sti's
may be added in new versions. The sti definitions are:
#define stiNormalPara 0 // 0x0000
#define stiHeading1 1 // 0x0001
#define stiHeading2 2 // 0x0002
#define stiHeading3 3 // 0x0003
#define stiHeading4 4 // 0x0004
#define stiHeading5 5 // 0x0005
#define stiHeading6 6 // 0x0006
#define stiHeading7 7 // 0x0007
#define stiHeading8 8 // 0x0008
#define stiHeading9 9 // 0x0009
#define stiHeadingFirst stiHeading1
#define stiHeadingLast stiHeading9
The following Table and List styles were added in Word 2002:
#define stiNormalList 107 // 0x006B list style
#define stiOutlineList1 108 // 0x006C list style (1 / a / i)
#define stiOutlineList2 109 // 0x006D list style (1 / 1.1 / 1.1.1)
#define stiOutlineList3 110 // 0x006E list style (Article / Section)
#define stiListStyleFirst stiNormalList // First default list style
#define stiListStyleLast stiOutlineList3 // Last default list style
std.xstzName: The name of the style, including aliases. The name is stored as an xstz
(preceded by a length byte, followed by a null-terminator.) A style name can contain multiple
"aliases", separated by commas. Aliases are alternate names for the same style (e.g. a style
named "a,b,c" has three aliases, and can be referred to by "a", "b", or "c", or any
combination.) WinWord 2.x did not have aliases, but Word 5.x for the Macintosh did. If a style
is a built-in style, the built-in style name is always stored first.
All names (and aliases) must be unique within a style sheet (e.g. styles "a,b" and "b,c" should
not exist in the same style sheet, as "b" matches multiple style names.)
A style name (including all its aliases and comma separators) can be up to 253 characters long.
So the xstz format of that name can be up to 255 characters. Style names are case sensitive.
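Alias handling can be illustrated with a small matching routine. This sketch assumes the style name has already been converted to a NUL-terminated char string (the on-disk xstz form is not parsed here), and the function name is hypothetical. The comparison is case sensitive, as stated above.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if alias is exactly one of the comma-separated names in
   name, and 0 otherwise. The comparison is case sensitive. */
int style_name_has_alias(const char *name, const char *alias)
{
    size_t want = strlen(alias);
    const char *p = name;

    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);

        if (len == want && memcmp(p, alias, len) == 0)
            return 1;
        if (comma == NULL)
            return 0;
        p = comma + 1;   /* continue with the next alias */
    }
}
```

Running this check for every alias of a new style against every existing style name is one way to enforce the uniqueness rule within a style sheet.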
The built-in style names (corresponding to each sti listed previously) are defined for each
language version of Word. For English USA documents, the names are:
1 / 1.1 / 1.1.1 1 / a / i Article / Section
Balloon Text Block Text Body Text
Body Text 2 Body Text 3 Body Text First Indent
Body Text First Indent 2 Body Text Indent Body Text Indent 2
Body Text Indent 3 Caption Closing
Comment Reference Comment Subject Comment Text
Date Default Paragraph Font Document Map
E-mail Signature Emphasis Endnote Reference
Endnote Text Envelope Address Envelope Return
FollowedHyperlink Footer Footnote Reference
Footnote Text Header Heading 1
Heading 2 Heading 3 Heading 4
Heading 5 Heading 6 Heading 7
Heading 8 Heading 9 HTML Acronym
HTML Address HTML Cite HTML Code
HTML Definition HTML Keyboard HTML Preformatted
HTML Sample HTML Typewriter HTML Variable
Hyperlink Index 1 Index 2
Index 3 Index 4 Index 5
Index 6 Index 7 Index 8
Index 9 Index Heading Line Number
List List 2 List 3
List 4 List 5 List Bullet
List Bullet 2 List Bullet 3 List Bullet 4
List Bullet 5 List Continue List Continue 2
List Continue 3 List Continue 4 List Continue 5
List Number List Number 2 List Number 3
List Number 4 List Number 5 Macro Text
Message Header No List Normal
Normal (Web) Normal Indent Note Heading
Page Number Plain Text Salutation
Signature Strong Subtitle
Table 3D effects 1 Table 3D effects 2 Table 3D effects 3
Table Classic 1 Table Classic 2 Table Classic 3
Table Classic 4 Table Colorful 1 Table Colorful 2
Table Colorful 3 Table Columns 1 Table Columns 2
Table Columns 3 Table Columns 4 Table Columns 5
Table Contemporary Table Elegant Table Grid
Table Grid 1 Table Grid 2 Table Grid 3
Table Grid 4 Table Grid 5 Table Grid 6
std.cupx: This is the number of UPXs in the std.grupx array. See below.
std.grupx: This is an array [More accurately a "group", because each of the elements (UPXs)
in the array is variable-length] of variable-length UPXs, with std.cupx UPXs in the array. This
array begins after the variable-length xstzName field, at the next even-byte offset within the
STD. A UPX (Universal Property eXception) describes the difference in formatting of this style
as compared to its based-on style. The UPX structure looks like this:
typedef union _UPX
{
struct
{
uchar grpprl[cbMaxGrpprlStyleChpx];
} chpx;
struct
{
ushort istd;
uchar grpprl[cbMaxGrpprlStylePapx];
} papx;
struct
{
uchar grpprl[cbMaxGrpprlForTaps * 8]; // enough for 8 full cnf's
} tapx;
#ifdef STYLERM
UPDRM rm;
#endif //STYLERM
uchar rgb[1];
} UPX;
Each UPX stored in a file is not a complete UPX, rather it is a UPX with all trailing zero bytes
lopped off, and preceded by a ushort length field. So it is stored like:
Field Size Comment
cbUPX 2 bytes Size of the following UPX structure
UPX (cbUPX) Nonzero prefix of a UPX structure
Each UPX begins on an even-byte offset within the STD, even if the length of the previous UPX
(cbUPX) was odd.
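The even-offset rule matters when stepping from one UPX to the next. The sketch below walks the length-prefixed UPXs inside one STD, assuming little-endian storage and that offsets are measured from the start of the STD; the function names and the output arrays are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Rounds an offset up to the next even value. */
size_t align_even(size_t off)
{
    return (off + 1) & ~(size_t)1;
}

/* Walks cupx UPXs starting at offset off within an STD of cbStd bytes.
   For each UPX, stores the offset and length of its body (the bytes
   after cbUPX). Returns the number of UPXs found, or -1 if a UPX
   would run past the end of the STD. */
int upx_walk(const uint8_t *stdBytes, size_t cbStd, size_t off, int cupx,
             size_t *bodyOff, size_t *bodyLen)
{
    int i;

    for (i = 0; i < cupx; i++) {
        size_t cbUpx;

        off = align_even(off);   /* each UPX starts on an even offset */
        if (off > cbStd || cbStd - off < 2)
            return -1;
        cbUpx = (size_t)stdBytes[off] | ((size_t)stdBytes[off + 1] << 8);
        off += 2;
        if (cbUpx > cbStd - off)
            return -1;
        bodyOff[i] = off;
        bodyLen[i] = cbUpx;
        off += cbUpx;            /* may be odd; realigned next time round */
    }
    return i;
}
```

The starting offset passed in would be the first even offset after the xstzName field.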
The meaning of each UPX depends on the style type (std.stk). For a paragraph style,
std.cupx=2. The first UPX is a paragraph UPX (UPX.papx) and the second UPX is a character
UPX (UPX.chpx). For a character style, std.cupx=1, and that UPX is a character UPX
(UPX.chpx). Note that new UPXs may be added in the future, so std.cupx might be larger
than expected. Any UPXs past those expected should be discarded. For a list style,
std.cupx=1. The UPX is a paragraph UPX (UPX.papx). For a table style, std.cupx=3. The
first UPX is a table UPX (UPX.tapx), the second UPX is a paragraph UPX (UPX.papx), and the
third UPX is a character UPX (UPX.chpx). In addition, each style type can contain an additional
UPX containing revision mark information, which is not documented.
The grpprl within each UPX contains the differences of this property type for this style from
the UPE of that property type for the based on style. For example, if two paragraph styles, A
and B, were identical except that B was bold where A was not, and B was based on A, B would
have two UPXs, where the paragraph UPX would have an empty grpprl [Note that the
UPX.papx contains both a grpprl and an istd. Even if the grpprl is empty, the istd is still
needed.], and the character UPX would have a bold sprm in the grpprl. Thus B looks just like
A (since B is based on A), with the exception that B is bold.
std.grupe: This is an array (group) of variable-length UPEs. These are not stored in the
file! Rather, they are constructed using the std.istdBase and std.grupx fields. A UPE
(Universal Property Expansion) describes the "end-result" of the property formatting, i.e. what
the style looks like. The UPE structure is the non-zero prefix of a UPD structure. The UPD
structure looks like this:
typedef union _UPD
{
PAP pap;
CHP chp;
TAPS taps;
struct
{
ushort istd;
uchar cbGrpprl;
uchar grpprl[cbMaxGrpprlStyleChpx];
} chpx;
struct
{
ushort istd;
uchar cbGrpprl;
uchar grpprl[cbMaxGrpprlStylePapx];
} papx;
#ifdef STYLERM
UPDRM rm;
#endif //STYLERM
} UPD;
The std.grupe and std.grupx arrays are similar: there is one UPE for each UPX, and
internally they are stored similarly (a length ushort followed by a non-zero prefix). Note: UPEs
are not stored in the file. The meaning of each UPE depends on the style type (std.stk). For
a paragraph style, the first UPE is a PAP (UPE.pap) and the second UPE is a CHP (UPE.chp).
For a character style, the first UPE is a CHPX (UPE.chpx). List styles have one UPE, which is
a PAPX (UPE.papx). For a table style the first UPE is a table UPE (UPE.taps), the second UPE
is a paragraph UPE (UPE.pap), and the third UPE is a character UPE (UPE.chp). In addition,
each style type can contain an additional UPE containing revision mark information, which is
not documented.
The UPEs for a style are constructed by taking the UPEs from the based-on style, and applying
the UPXs to them. If the UPEs for the based-on style haven't yet been constructed, that style's
UPE needs to be constructed first. Eventually by following the based-on chain, a style will be
based on the null style (istdNil). The UPEs for the null style are predefined:
The UPE.pap for the null style is all zeros, except fWidowControl which is 1, dyaLine
which is 240, and fMultLinespace which is 1.
The UPE.chp for the null style is all zeros, except istd which is 10 (istdNormalChar),
hps which is 20, lid which is 0x0400, and ftc which is set to the
STSHI.ftcStandardChpStsh.
The UPE.chpx for the null style has an istd of zero, a cbGrpprl of zero (and an empty
grpprl).
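Following the based-on chain down to the null style can be sketched as a bounded loop. The code assumes the istdBase of every style has been collected into an array indexed by istd, and takes 4095 as the null style value, as given for the istd range earlier. The function name is hypothetical, and the circular-chain guard is a defensive addition rather than something the format describes.

```c
#include <stdint.h>

enum { ISTD_NIL = 4095 };   /* null style, per the istd range 0-4095 */

/* Returns the number of styles on the based-on chain that starts at
   istd and ends at the null style. Returns -1 if an istd on the chain
   is out of range or the chain is circular. */
int style_chain_length(const uint16_t *istdBase, int cstd, int istd)
{
    int steps = 0;

    while (istd != ISTD_NIL) {
        /* A valid chain can visit at most cstd distinct styles. */
        if (istd < 0 || istd >= cstd || steps >= cstd)
            return -1;
        istd = istdBase[istd];
        steps++;
    }
    return steps;
}
```

Styles with shorter chains can have their UPEs constructed first, so that a style's based-on UPEs always exist before the style's own UPXs are applied.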
So, for a paragraph style, the first UPE is a UPE.pap. It can be constructed by starting with the
first UPE from the based-on style (std.istdBase), and then applying the first UPX
(UPX.papx) in std.grupx to that UPE. To apply a UPX.papx to a UPE.pap, set
UPE.pap.istd equal to UPX.papx.istd, and then apply the UPX.papx.grpprl to
UPE.pap. Similarly, the second UPE is a UPE.chp. It can be constructed by starting with the
second UPE from the based-on style, and then applying the second UPX (UPX.chpx) in
std.grupx to that UPE. To apply a UPX.chpx to a UPE.chp, apply the UPX.chpx.grpprl to
UPE.chp. Note: a UPE.chp for a paragraph style should always have
UPE.chp.istd==istdNormalChar.
For a character style, the first (and only) UPE (a UPE.chpx) can be constructed by starting
with the first UPE from the based-on style (std.istdBase), and then applying the first UPX
(UPX.chpx) in std.grupx to that UPE. To apply a UPX.chpx to a UPE.chpx, take the
grpprl in UPE.chpx.grpprl (which has a length of UPE.chpx.cbGrpprl) and merge the
grpprl in UPX.chpx.grpprl into it. Merging grpprls can be difficult, but for character
styles it is easy because no prls in character style grpprls should interact with each other.
Each prl from the source (the UPX.chpx.grpprl) should be inserted into the destination
(the UPE.chpx.grpprl) so the sprm of each prl is in increasing order, and any prls with
the same sprm are replaced by the prl in the source. UPE.chpx.cbGrpprl is then set to the
length of resulting grpprl, and UPE.chpx.istd is set to the style's istd.
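The merge rule for character style grpprls (keep sprms in increasing order, and let the source replace any prl with the same sprm) can be sketched as a standard sorted merge. This is a simplification: real prls are variable-length byte sequences, whereas the hypothetical Prl type below uses a fixed-size operand, and both inputs are assumed to be sorted by sprm already with no duplicates.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical prl: an opcode plus a fixed-size operand. */
typedef struct {
    uint16_t sprm;
    uint32_t operand;
} Prl;

/* Merges src (the UPX grpprl) into dst (the UPE grpprl), writing the
   result to out, which must have room for nDst + nSrc entries.
   Returns the number of prls written. */
size_t prl_merge(const Prl *dst, size_t nDst,
                 const Prl *src, size_t nSrc, Prl *out)
{
    size_t i = 0, j = 0, n = 0;

    while (i < nDst && j < nSrc) {
        if (dst[i].sprm < src[j].sprm) {
            out[n++] = dst[i++];
        } else if (dst[i].sprm > src[j].sprm) {
            out[n++] = src[j++];
        } else {
            out[n++] = src[j++];   /* same sprm: the source prl wins */
            i++;
        }
    }
    while (i < nDst)
        out[n++] = dst[i++];
    while (j < nSrc)
        out[n++] = src[j++];
    return n;
}
```

With variable-length prls the same control flow applies, but each step would advance by the byte length of the prl instead of by one array element.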
For a list style, the first (and only) UPE (a UPE.papx) can be constructed by starting with the
first UPE from the based-on style (std.istdBase), and then applying the first UPX
(UPX.papx) in std.grupx to that UPE. To apply a UPX.papx to a UPE.papx, take the
grpprl in UPE.papx.grpprl (which has a length of UPE.papx.cbGrpprl) and merge the
grpprl in UPX.papx.grpprl into it. Merging grpprls can be difficult. Each prl from the
source (the UPX.papx.grpprl) should be inserted into the destination (the
UPE.papx.grpprl) so the sprm of each prl is in increasing order, and any prls with the
same sprm are replaced by the prl in the source. UPE.papx.cbGrpprl is then set to the
length of resulting grpprl, and UPE.papx.istd is set to the style's istd.
For a table style, the first UPE is a UPE.taps. It can be constructed by starting with the first
UPE from the based-on style (std.istdBase), and then applying the first UPX (UPX.tapx) in
std.grupx to that UPE. To apply a UPX.tapx to a UPE.taps, set UPE.taps.istd equal to
UPX.tapx.istd, and then apply the UPX.tapx.grpprl to UPE.taps. The second UPE is a
UPE.pap. It can be constructed by starting with the second UPE from the based-on style
(std.istdBase), and then applying the second UPX (UPX.papx) in std.grupx to that UPE. To
apply a UPX.papx to a UPE.pap, set UPE.pap.istd equal to UPX.papx.istd, and then
apply the UPX.papx.grpprl to UPE.pap. Similarly, the third UPE is a UPE.chp. It can be
constructed by starting with the third UPE from the based-on style, and then applying the
third UPX (UPX.chpx) in std.grupx to that UPE. To apply a UPX.chpx to a UPE.chp,
apply the UPX.chpx.grpprl to UPE.chp. Note: a UPE.chp for a table style should always
have UPE.chp.istd==istdNormalChar.
List Tables
Word 97 and later versions store paragraph numbering information very differently from Word
6.0. In Word 6.0, all information for a paragraph was stored in that paragraph's pap.anld. In
Word 97 and later versions, the pap only contains two values: a short ilfo and a byte ilvl,
which indicate which list the paragraph belongs to and which level of that list it is part of,
respectively. The ilfo is actually an index into one of the document's list tables: the pllfo,
and the paragraph gets most of its information about appearance from the list tables.
There are three list tables in a Word document: the rglst, the hpllfo, and the
hsttbListNames. They are described below in greater detail, and the precise formats of
several of these structures (the LSTF, LVLF, LFO, and LFOLVL) are listed in the appendix.
SPRM Definitions
A sprm is an instruction to modify one or more properties within one of the property defining
data structures (CHP, PAP, TAP, SEP, or PIC). A sprm begins with a two-byte opcode at offset 0 which
identifies the operation to be performed. If the information necessary for the operation can always
be expressed with a fixed length parameter, the fixed length parameter is recorded
immediately after the opcode beginning at offset 2. The length of a fixed length sprm is always
2 plus the size of the sprm's parameter. If the parameter for the sprm is variable length, the
count of bytes of the following parameter is stored in the byte at offset 2, followed by the
parameter at offset 3.
Three sprms -- sprmPChgTabs, sprmTDefTable, and sprmTDefTable10 -- can be longer
than 256 bytes. The method for calculating the length of sprmPChgTabs is recorded below
with the description of the sprm. For sprmTDefTable and sprmTDefTable10, the length of
the parameter plus 1 is recorded in the two bytes beginning at offset 2.
For all other variable length sprms, the total length of the sprm is the count recorded at offset
2 plus three (2 for the sprm + 1 for the count byte). The parameter immediately follows the
count.
The sprm value encodes information on the size of the operand, the type of sprm (PAP, CHP,
etc.), and whether the sprm requires special handling (in cases where a property value isn't
simply replaced).
Sprm bits (0 = low)   Value   Details
0-8                   ispmd   Unique identifier within sgc group
9                     fSpec   sprm requires special handling
10-12                 sgc     sprm group; type of sprm (PAP, CHP, etc.)
13-15                 spra    Size of sprm argument (see following table for values)
When parsing a grpprl, you can use the sprm's spra value to determine how many bytes are
used by that sprm; it also enables you to skip over sprms you don't handle.
Unless otherwise noted, when a sprm is applied to a property the sprm's parameter changes
the old value of the property in question to the value stored in the sprm parameter.
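The bit layout and the length rules above can be illustrated with a short sketch that decodes an opcode and walks a grpprl. Several things here are assumptions rather than part of the text above: the function names are hypothetical, the opcode is read as a little-endian 16-bit value, and the operand sizes per spra value are taken from the spra table of the wider specification, which is not reproduced in this section. The three sprms that can exceed the one-byte count (sprmPChgTabs, sprmTDefTable, sprmTDefTable10) are deliberately not handled.

```python
import struct

# Operand size in bytes for each spra value; None marks a variable-length
# operand. Assumption: values come from the specification's spra table.
SPRA_OPERAND_SIZE = {0: 1, 1: 1, 2: 2, 3: 4, 4: 2, 5: 2, 6: None, 7: 3}


def decode_sprm(opcode):
    """Split a 16-bit sprm opcode into its four bit fields."""
    return {
        "ispmd": opcode & 0x01FF,       # bits 0-8
        "fSpec": (opcode >> 9) & 0x1,   # bit 9
        "sgc": (opcode >> 10) & 0x7,    # bits 10-12
        "spra": (opcode >> 13) & 0x7,   # bits 13-15
    }


def iter_grpprl(grpprl):
    """Yield (opcode, operand_bytes) for each sprm in a grpprl.

    For a variable-length sprm the operand bytes start with the count byte.
    The special long sprms named in the text are not handled here.
    """
    pos = 0
    while pos + 2 <= len(grpprl):
        (opcode,) = struct.unpack_from("<H", grpprl, pos)
        size = SPRA_OPERAND_SIZE[(opcode >> 13) & 0x7]
        if size is None:
            size = 1 + grpprl[pos + 2]  # count byte plus the counted bytes
        yield opcode, grpprl[pos + 2 : pos + 2 + size]
        pos += 2 + size
```

Because the operand size is derived from the opcode alone, a reader can skip any sprm it does not interpret and still stay aligned on the next one.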
Paragraph SPRMs
Name             sprm     Property modified                                          Parameter   Parameter size
In Word 2000, justification is relative to text direction (left is left for left-to-right text and right for right-to-left text).
sprmPBrcLeft70   0x4425   change pap left border for Word 95 and earlier versions    BRC70       word (2 bytes)
sprmPBrcLeft80   0x6425   change pap left border for Word 97 and later versions      BRC80       long (4 bytes)
sprmPBrcTop70    0x4424   change pap top border for Word 95 or earlier versions      BRC70       word (2 bytes)
sprmPBrcTop80    0x6424   change pap top border for Word 97 and later versions       BRC80       long (4 bytes)
Character SPRMs
Name                Sprm     Property modified   Parameter   Parameter size
Applies to xchSdtBegin ("<") and xchSdtEnd (">") characters to signify that they are "vanished" (hidden).
sprmCFFtcAsciSymb   0x2A10
sprmCKcd            0x2A34
Picture SPRMs
Name                Sprm     Property modified   Parameter   Parameter size
sprmPicSpare4       0xce06
sprmCFOle2WasHere   0xce07
Section SPRMs
Name   Sprm   Property modified   Parameter   Parameter size
Table SPRMs
Name              Sprm     Property modified                                               Parameter             Parameter size
sprmTDiagLine80   0xd62a   set BRC80 values for diagonal line in table cell (East Asian)   complex (see below)   variable length
sprmTHTMLProps    0x740C
sprmTCellWidth    0xd635   change width tc.wWidth and tc.ftsWidth                          complex (see below)   variable length
Complex SPRMs
Complex Paragraph SPRMs
sprmPIstdPermute (opcode 0xC601) is a complex sprm which is applied to a piece when the
style codes of paragraphs within a piece must be mapped to other style codes. It has the
following format:
Field       Size             Comment
istdFirst   unsigned short   Index of first style in range to which permutation stored in rgistd applies
istdLast    unsigned short   Index of last style in range to which permutation stored in rgistd applies
rgistd[]    unsigned short   Array of istd entries that records the mapping of istds for text copied from a source document to istds that exist in the destination document after the text was pasted
sprmPIncLvl is three bytes long and consists of the sprm code and a one-byte two's complement
value.
If pap.stc is < 1 or > 9, sprmPIncLvl has no effect. Otherwise, if the value stored in the
byte has its highest order bit off, the value is a positive difference which should be added to
pap.istd and pap.lvl and then pap.stc should be set to min(pap.istd, 9). If the byte
value has its highest order bit on, the value is a negative difference which should be sign
extended to a word and then subtracted from pap.istd and pap.lvl. Then pap.stc should
be set to max(1, pap.istd). sprmPIncLvl is only stored in grpprls linked to a piece table.
sprmPIlfo (opcode 0x460B) sets the pap.ilfo. Its argument, an ilfo, is an index into the
document's hpllfo, which contains the list data for that paragraph, describing the
appearance of the automatic number at the beginning of the paragraph. A value of zero means
the paragraph is not numbered, and a value of 2047 indicates the paragraph came from a
pre-Word 97 file so the formatting information is still stored in the pap.anld and the
paragraph should be converted to Word 97 format.
sprmPIlvl (opcode 0x260A) sets the pap.ilvl. It takes an index (0 through 8) to indicate
which level of a multilevel list this paragraph belongs to. For simple (one-level) lists or
unnumbered paragraphs, this value should always be zero.
sprmPAnld80 (opcode 0xC63E) is currently only used for compatibility with pre-Word 97
documents. It sets the pap.anld, which before Word 97 described the automatic number at the
beginning of any numbered paragraph. It is used only long enough to put the data into the
document's list table (rglst) and set the pap.ilfo to point to the proper entry in the list
table. The pap.anld is only relevant if pap.ilfo==2047 (see sprmPIlfo above).
The sprmPChgTabsPapx (opcode 0xC60D) is a complex sprm that describes changes in tab
settings from the underlying style. It is only stored as part of PAPXs stored in FKPs and in the
STSH. It has the following format:
Field Size Comment
rgdxaDel int[itbdDelMax] Array of tab positions for which tabs should be deleted
rgdxaAdd int[itbdAddMax] Array of tab positions for which tabs should be added
When sprmPChgTabsPapx is interpreted, the rgdxaDel of the sprm is applied first to the
pap that is being transformed. This is done by deleting from the pap the rgdxaTab entry and
rgtbd entry of any tab whose rgdxaTab value is equal to one of the rgdxaDel values in the
sprm. It is guaranteed that the entries in pap.rgdxaTab and the sprm's rgdxaDel and
rgdxaAdd are recorded in ascending dxa order. Then the rgdxaAdd and rgtbdAdd entries
are merged into the pap's rgdxaTab and rgtbd arrays so the resulting pap rgdxaTab is
sorted in ascending order with no duplicates.
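The delete-then-merge procedure for sprmPChgTabsPapx might look like the sketch below. It is only an illustration under stated assumptions: the function name is hypothetical, each tab is modelled as a (dxa, tbd) pair that combines the parallel rgdxaTab and rgtbd arrays, and an added tab is assumed to replace an existing tab at the same position so that no duplicates remain.

```python
def apply_chg_tabs_papx(tabs, rgdxa_del, added):
    """Apply tab deletions, then merge in the added tabs.

    tabs and added are lists of (dxa, tbd) pairs; rgdxa_del is a list of
    dxa positions. Returns the new tab list in ascending dxa order.
    """
    deleted = set(rgdxa_del)
    # Step 1: drop every tab whose position matches a deletion entry exactly.
    kept = {dxa: tbd for dxa, tbd in tabs if dxa not in deleted}
    # Step 2: merge the additions (assumption: an addition overwrites a tab
    # that already sits at the same position).
    for dxa, tbd in added:
        kept[dxa] = tbd
    return sorted(kept.items())
```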
sprmPNest80 (opcode 0x4610) causes its operand, a two-byte dxa value, to be added to
pap.dxaLeft for Word 97. If the result of the addition is less than 0, 0 is stored into
pap.dxaLeft. It is used to shift the left indent of a paragraph to the right or left. sprmPNest
is only stored in grpprls linked to a piece table.
sprmPNest (opcode 0x465f) is the Word 2000 version. The difference is that the dxaLeft in Word
2000 is logical (it is the left indent for left-to-right text but the right indent for right-to-left text).
sprmPDyaLine (opcode 0x6412) moves a 4 byte LSPD structure into pap.lspd. Two short
fields are stored in this data structure. The first short in the structure is named lspd.dyaLine
and the second is named lspd.fMultLinespace. When lspd.fMultLinespace is 0, the
magnitude of lspd.dyaLine specifies the amount of space provided for lines in the
paragraph in twips. When lspd.dyaLine is positive, Word ensures that AT LEAST the
magnitude of lspd.dyaLine is reserved on the page for each line displayed in the paragraph.
If the height of a line becomes greater than lspd.dyaLine, the size calculated for that line is
reserved on the page. When lspd.dyaLine is negative, Word ensures that EXACTLY the
magnitude of lspd.dyaLine (-lspd.dyaLine) is reserved on the page for each line
displayed in the paragraph. When lspd.fMultLinespace is 1, Word reserves for each line
the (maximal height of the line*lspd.dyaLine)/240.
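The three line spacing cases described for the LSPD can be summarized in a small sketch. The function name is hypothetical, and the two shorts are assumed to be stored little-endian; natural_height stands for the maximal height of the line in twips as computed by layout.

```python
import struct


def reserved_line_height(lspd_bytes, natural_height):
    """Return the height reserved for a line, given the 4-byte LSPD."""
    dya_line, f_mult_linespace = struct.unpack("<hh", lspd_bytes)
    if f_mult_linespace == 1:
        # Multiple line spacing: dyaLine is expressed in 240ths of a line.
        return natural_height * dya_line / 240
    if dya_line < 0:
        # EXACTLY the magnitude of dyaLine.
        return -dya_line
    # AT LEAST dyaLine; a taller line keeps its own height.
    return max(dya_line, natural_height)
```

For example, an LSPD of (480, 1) reserves twice the natural height of each line, which corresponds to double spacing.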
The sprmPChgTabs (opcode 0xC615) is a complex sprm which describes changes to tab
settings for any paragraph within a piece. It is only stored as part of a grpprl linked to a piece
table. It has the following format:
Field Size Comment
rgdxaDel int[itbdDelMax] Array of tab positions for which tabs should be deleted
rgdxaAdd int[itbdAddMax] Array of tab positions for which tabs should be added
itbdDelMax and itbdAddMax are defined to be equal to 50. This means that the largest
possible instance of sprmPChgTabs is 354. When the length of the sprm is >= 255, the cch
field will be set equal to 255. When cch==255, the actual length of the sprm can be calculated
as follows: length=2+itbdDelMax*4+itbdAddMax*3.
When sprmPChgTabs is interpreted, the rgdxaDel of the sprm is applied first to the pap that
is being transformed. This is done by deleting from the pap the rgdxaTab entry and rgtbd
entry of any tab whose rgdxaTab value is within the interval
[rgdxaDel[i]-rgdxaClose[i], rgdxaDel[i]+rgdxaClose[i]]. It is guaranteed that
the entries in pap.rgdxaTab and the sprm's rgdxaDel and rgdxaAdd are recorded in
ascending dxa order. Then the rgdxaAdd and rgtbdAdd entries are merged into the pap's
rgdxaTab and rgtbd arrays so the resulting pap rgdxaTab is sorted in ascending order with
no duplicates.
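The deletion step of sprmPChgTabs differs from an exact match in that each deletion entry carries a tolerance. A minimal sketch of that interval test follows; the function name is hypothetical and tabs are again modelled as (dxa, tbd) pairs.

```python
def delete_close_tabs(tabs, rgdxa_del, rgdxa_close):
    """Remove every tab lying within rgdxaClose[i] of rgdxaDel[i]."""

    def is_deleted(dxa):
        return any(
            d - c <= dxa <= d + c for d, c in zip(rgdxa_del, rgdxa_close)
        )

    return [(dxa, tbd) for dxa, tbd in tabs if not is_deleted(dxa)]
```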
sprmPPc (opcode 0x261B) is a complex sprm 3 bytes long which describes changes in the
pap.pcHorz and pap.pcVert. It is able to change both fields' contents in parallel. It has the
following format:
b10 b16 Field Type Size Bitfield Comments
2 2 short :4 F0 Reserved
istdFirst unsigned short Index of first style in range to which permutation stored in rgistd applies
istdLast unsigned short Index of last style in range to which permutation stored in rgistd applies
rgistd[] unsigned short Array of istd entries that records the mapping of istds for text copied from a
source document to istds that exist in the destination document after the text
was pasted
When Word interprets this sprm, if hpsSize != 0 then chp.hps is set to hpsSize. If cInc
!= 0, the cInc is interpreted as a 7 bit two's complement number and the procedure described
below for interpreting sprmCHpsInc is followed to increase or decrease the chp.hps by the
specified number of levels. If hpsPos!=128, then chp.hpsPos is set equal to hpsPos. If
fAdjust is on, hpsPos!=128 and hpsPos!=0 and the previous value of chp.hpsPos==0,
then chp.hps is reduced by one level following the method described for sprmCHpsInc. If
fAdjust is on, hpsPos==0 and the previous value of chp.hpsPos!=0, then the chp.hps
value is increased by one level using the method described below for sprmCHpsInc.
sprmCHpsInc (opcode 0x2A44) is a three-byte sprm consisting of the sprm opcode and a
one-byte parameter. Word keeps an ordered array of the font sizes that are defined for the
fonts recorded in the system file with each font size transformed into an hps. The parameter
is a one-byte two's complement number. Word uses this number to calculate an index in the
font size array to determine the new hps for a run. When Word interprets this sprm and the
parameter is positive, it searches the array of font sizes to find the index of the smallest entry
in the font size table that is greater than the current chp.hps. It then adds the parameter
minus 1 to the index and takes the minimum of this and the index of the last array entry. It uses the
result as an index into the font size array and assigns that entry of the array to chp.hps.
When the parameter is negative, Word searches the array of font sizes to find the index of the
entry that is less than or equal to the current chp.hps. It then adds the negative parameter
to the index and takes the maximum of the result and 0. The result is used as an
index into the font size array and that entry of the array is assigned to chp.hps.
sprmCHpsInc is stored only in grpprls linked to piece table entries.
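The index arithmetic for sprmCHpsInc can be sketched as below. This is an interpretation, not Word's code: the function name is hypothetical, the font size array is passed in as an ascending list of hps values, and the two clamping steps are read as keeping the index inside the array (no higher than the last entry, no lower than 0).

```python
import bisect


def apply_hps_inc(sizes, hps, param):
    """Return the new hps after stepping param levels through sizes.

    sizes is the ascending array of available font sizes (as hps values);
    param is the signed one-byte parameter of the sprm.
    """
    if param == 0:
        return hps
    # bisect_right gives the index of the smallest entry greater than hps.
    above = bisect.bisect_right(sizes, hps)
    if param > 0:
        index = min(above + param - 1, len(sizes) - 1)
    else:
        # above - 1 is the index of the largest entry <= hps.
        index = max(above - 1 + param, 0)
    return sizes[index]
```

With sizes [16, 18, 20, 22, 24, 28, 32] and a current hps of 20, a parameter of 1 selects 22 and a parameter of -1 selects 18.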
sprmCHpsPosAdj (opcode 0x2A46) causes the hps of a run to be reduced the first time text
is superscripted or subscripted and causes the hps of a run to be increased when
superscripting/subscripting is removed from a run. The one byte parameter of this sprm is the
new hpsPos value to be stored in chp.hpsPos. If hpsPos!=0 (meaning that the text is to be
super/subscripted), Word first examines the current value of chp.hpsPos to see if it is equal
to 0. If so, Word uses the algorithm described for sprmCHpsInc to decrease chp.hps by one
level. If the new hpsPos==0 (meaning the text is not super/subscripted), Word examines the
current chp.hpsPos to see if it is not equal to 0. If it is not (which means text is being restored
to normal position), Word uses the sprmCHpsInc algorithm to increase chp.hps by one level.
After chp.hps is adjusted, the parameter value is stored in chp.hpsPos. sprmCHpsPosAdj
is stored only in grpprls linked to piece table entries.
The parameter of sprmCMajority (opcode 0xCA47) is itself a list of character sprms which
encodes a criterion under which certain fields of the chp are to be set equal to the values
stored in a style's CHP. Bytes 0 and 1 of sprmCMajority contain the opcode; byte 2 contains
the length of the following list of character sprms. Word begins interpretation of this sprm by
applying the stored character sprm list to a standard chp. That chp has
chp.istd=istdNormalChar, chp.hps=20, chp.lid=0x0400, and chp.ftc=4. Word then
compares fBold, fItalic, fStrike, fOutline, fShadow, fSmallCaps, fCaps, ftc, hps,
hpsPos, kul, cv, and ico in the original CHP with the values recorded for these fields in the
generated CHP. If a field in the original CHP has the same value as the field stored in the
generated CHP, then that field is reset to the value stored in the style's CHP. If the two copies
differ, then the original CHP value is left unchanged. sprmCMajority is stored only in
grpprls linked to piece table entries.
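The field-by-field comparison performed for sprmCMajority can be sketched as follows. The function name is hypothetical and each CHP is modelled as a plain dictionary keyed by field name; building the generated CHP (applying the stored sprm list to the standard chp) is assumed to have happened already.

```python
MAJORITY_FIELDS = (
    "fBold", "fItalic", "fStrike", "fOutline", "fShadow", "fSmallCaps",
    "fCaps", "ftc", "hps", "hpsPos", "kul", "cv", "ico",
)


def apply_majority(original, generated, style):
    """Reset fields of original that match generated to the style's value."""
    result = dict(original)
    for field in MAJORITY_FIELDS:
        if original[field] == generated[field]:
            result[field] = style[field]
        # Otherwise the original value is left unchanged.
    return result
```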
sprmCHpsInc1 (opcode 0xCA4A) is used to increase or decrease chp.hps by increments of
1. This sprm is interpreted by adding the two byte increment stored as the operand of the sprm
to chp.hps. If this result is less than 8, the chp.hps is set to 8. If the result is greater than
32766, the chp.hps is set to 32766.
sprmCMajority50 (opcode 0xCA4C) has the same format as sprmCMajority and is
interpreted in the same way.
4 Horizontal, @-font
3 3 itcFirst byte The index of the first cell that is to have its borders changed.
4 4 itcLim byte Index of the cell that follows the last cell to have its borders changed
5 5 short :4 F0 Reserved
This sprm changes the brc fields selected by the fChange* flags in the sprm to the brc value
stored in the sprm, for every tap.rgtc entry whose index is greater than or equal to
itcFirst and less than itcLim. sprmTSetBrc is stored only in grpprls linked to piece
table entries.
sprmTSetBrc (opcode 0xD62F) works in the same manner as sprmTSetBrc80 but uses the
new BRC structure introduced in Word 2000.
sprmTInsert (opcode 0x7621) inserts new cell definitions in an existing table's cell structure.
Bytes 0 and 1 of the sprm contain the opcode. Byte 2 is the index within tap.rgdxaCenter
and tap.rgtc at which the new dxaCenter and tc values are inserted. This index is named
itcInsert. Byte 3 contains a count of the cell definitions to add to the tap, named ctc. Bytes
4 and 5 contain the width of the cells to add, named dxaCol. If there are already cells defined
at the index where cells are to be inserted, tap.rgdxaCenter entries at or above this index
must move to the entry ctc higher and must be adjusted by adding ctc*dxaCol to the value
stored. The contents of tap.rgtc at or above the index must be moved 10*ctc bytes higher
in tap.rgtc. If itcInsert is greater than the original tap.itcMac, itcInsert - tap.ctc
columns beginning with index tap.itcMac must be added of width dxaCol (loop from itcMac
to itcMac+itcInsert-tap.ctc adding dxaCol to the rgdxaCenter value of the previous
entry and storing the sum as dxaCenter of the new entry), whose TC entries are cleared to
zeros. Beginning with index itcInsert, ctc columns of width dxaCol must be added by
constructing new tap.rgdxaCenter and tap.rgtc entries with the newly defined rgtc
entries cleared to zeros. Finally, the sum of the number of cells added to the tap is added to
tap.itcMac. sprmTInsert is stored only in grpprls linked to piece table entries.
sprmTDelete (opcode 0x5622) deletes cell definitions from an existing table's cell structure.
Bytes 0 and 1 of the sprm contain the opcode. Byte 2 contains the index of the first cell to
delete, named itcFirst. Byte 3 contains the index of the cell that follows the last cell to be
deleted, named itcLim. sprmTDelete causes any rgdxaCenter and rgtc entries whose
index is greater than or equal to itcLim to move to the entry that is itcLim-itcFirst
lower, and causes tap.itcMac to decrease by the number of cells deleted. sprmTDelete is
stored only in grpprls linked to piece table entries.
sprmTDxaCol (opcode 0x7623) changes the width of cells whose index is within a certain
range to be a certain value. Bytes 0 and 1 of the sprm contain the opcode. Byte 2 contains the
index of the first cell whose width is to change, named itcFirst. Byte 3 contains the index of
the cell that follows the last cell whose width is to change, named itcLim. Bytes 4 and 5
contain the new width of the cell, named dxaCol. This sprm causes the itcLim-itcFirst
entries of tap.rgdxaCenter to be adjusted so tap.rgdxaCenter[i+1] =
tap.rgdxaCenter[i]+dxaCol. Any tap.rgdxaCenter entries that exist beyond itcLim
are adjusted to take into account the amount added to or removed from the previous columns.
sprmTDxaCol is stored only in grpprls linked to piece table entries.
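The adjustment made by sprmTDxaCol can be sketched as follows. The function name is hypothetical, and rgdxaCenter is assumed to hold one boundary position per cell edge, so a table of n cells has n+1 entries. The sketch returns a new list rather than modifying the tap in place.

```python
def apply_dxa_col(rgdxa_center, itc_first, itc_lim, dxa_col):
    """Set the width of cells itc_first..itc_lim-1 to dxa_col."""
    out = list(rgdxa_center)
    old_edge = out[itc_lim]
    # Rebuild each boundary in the range from the one before it.
    for i in range(itc_first, itc_lim):
        out[i + 1] = out[i] + dxa_col
    # Shift every later boundary by the amount the range grew or shrank.
    delta = out[itc_lim] - old_edge
    for j in range(itc_lim + 1, len(out)):
        out[j] += delta
    return out
```

For a four-cell table with boundaries [0, 1000, 2000, 3000, 4000], setting cells 1 and 2 to a width of 500 yields [0, 1000, 1500, 2000, 3000].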
sprmTMerge (opcode 0x5624) merges the display areas of cells within a specified range.
Bytes 0 and 1 of the sprm contain the opcode. Byte 2 contains the index of the first cell to
merge, named itcFirst. Byte 3 contains the index of the cell that follows the last cell to
merge, named itcLim. This sprm causes tap.rgtc[itcFirst].fFirstMerged to be set
to 1. Cells in the range whose index is greater than itcFirst and less than itcLim have
tap.rgtc[].fMerged set to 1. sprmTMerge is stored only in grpprls linked to piece table
entries.
sprmTSplit (opcode 0x5625) splits the display areas of merged cells into their originally
assigned display areas. Bytes 0 and 1 of the sprm contain the opcode. Byte 2 contains the
index of the first cell to split, named itcFirst. Byte 3 contains the index of the cell that
follows the last cell to split, named itcLim. This sprm clears tap.rgtc[].fFirstMerged
and tap.rgtc[].fMerged for all rgtc entries >= itcFirst and < itcLim. sprmTSplit is
stored only in grpprls linked to piece table entries.
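The flag changes made by sprmTMerge and sprmTSplit are simple enough to show side by side. In this sketch the function names are hypothetical and each rgtc entry is modelled as a dictionary holding only the two merge flags.

```python
def apply_merge(rgtc, itc_first, itc_lim):
    """Mark itc_first as the first merged cell and the rest as merged."""
    rgtc[itc_first]["fFirstMerged"] = 1
    for i in range(itc_first + 1, itc_lim):
        rgtc[i]["fMerged"] = 1


def apply_split(rgtc, itc_first, itc_lim):
    """Clear both merge flags for every cell in [itc_first, itc_lim)."""
    for i in range(itc_first, itc_lim):
        rgtc[i]["fFirstMerged"] = 0
        rgtc[i]["fMerged"] = 0
```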
sprmTSetBrc10 (opcode 0xD626) has the same format as sprmTSetBrc but uses the old
BRC10 structure.
sprmTSetShd80 (opcode 0x7627) allows the Word 97 style shading definitions (SHD80s)
within a tap to be set to new values. Bytes 0 and 1 of the sprm contain the opcode. Byte 2
contains the index of the first cell whose shading is to change, named itcFirst. Byte 3
contains the index of the cell that follows the last cell whose shading is to change, named
itcLim. Bytes 4 and 5 contain the SHD80 structure, named shd80. This sprm causes the
itcLim-itcFirst entries of tap.rgshd to be set to shd80. sprmTSetShd80 is stored only in
grpprls linked to piece table entries.
sprmTSetShdOdd80 (opcode 0x7628) is identical to sprmTSetShd80, but it only changes
the rgshd for odd indices between itcFirst and itcLim. sprmTSetShdOdd80 is stored
only in grpprls linked to piece table entries.
sprmTSetShd (opcode 0xd62d) is identical to sprmTSetShd80 but uses the Word 2000 style
shading structure (SHD) and changes shading structures in tap.rgtc[].shd (so the indices
are indices into the rgtc).
sprmTSetShdOdd (opcode 0xd62e) is identical to sprmTSetShd, but it only changes the
rgshd for odd indices between itcFirst and itcLim.
sprmTVertMerge (opcode 0xD62B) changes the vertical cell merge properties for a cell in the
tap.rgtc[]. Bytes 0 and 1 of the sprm contain the opcode. Byte 2 contains the index of the
cell whose vertical cell merge properties are to change. Byte 3 sets the new vertical cell merge
properties for the cell: 0 clears both fVertMerge and fVertRestart, 1 sets fVertMerge
and clears fVertRestart, and 3 sets both flags. sprmTVertMerge is stored only in
grpprls linked to piece table entries.
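The three values listed for byte 3 of sprmTVertMerge are consistent with a two-bit field in which bit 0 is fVertMerge and bit 1 is fVertRestart. The sketch below relies on that reading, which is an assumption; the function name and the dictionary model of a tc are likewise hypothetical.

```python
def apply_vert_merge(rgtc, itc, value):
    """Set the vertical merge flags of cell itc from the byte 3 value."""
    rgtc[itc]["fVertMerge"] = value & 1
    rgtc[itc]["fVertRestart"] = (value >> 1) & 1
```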
sprmTVertAlign (opcode 0xD62C) changes the vertical alignment property in the
tap.rgtc[]. Bytes 0 and 1 of the sprm contain the opcode. Byte 2 contains the index of the
first cell whose vertical alignment is to change, named itcFirst. Byte 3 contains the index of the cell
that follows the last cell whose vertical alignment is to change, named itcLim. This sprm causes the
vertAlign properties of the itcLim-itcFirst entries of tap.rgtc[] to be set to the new
vertical alignment property contained in Byte 4. sprmTVertAlign is stored only in grpprls
linked to piece table entries.
sprmTCellPadding (0xd632), sprmTCellPaddingDefault (0xd634),
sprmTCellPaddingOuter (0xd638), sprmTCellSpacing (0xd631),
sprmTCellSpacingDefault (0xd633), sprmTCellSpacingOuter (0xd637) all have the same
parameter, a CSSA, which always has a length of 6 (size stored in the first byte).
A CSSA looks like this:
struct {
uchar itcFirst;
uchar itcLim;
uchar grfbrc;
uchar ftsWidth;
short wWidth;
} CSSA;
The itcFirst and itcLim specify the indexes, respectively, of the first cell affected by the
SPRM and the first cell NOT affected by the SPRM, counting from 0. So if they are 1 and 3, that
means the 2nd and 3rd cell are affected, but not the 4th (index 3) or 1st (index 0). For the
"Default" sprms, the itcs are always 0 and 1, and affect the entire table.
The grfbrc is a bit field that specifies which borders of the affected cells are affected: 0x01
means Top, 0x02 means Left, 0x04 means Bottom, and 0x08 means Right.
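Reading a CSSA operand could look like the following sketch. The names parse_cssa and affected_borders are hypothetical, and the fields are assumed to be packed little-endian with no padding, immediately after the size byte.

```python
import struct
from collections import namedtuple

CSSA = namedtuple("CSSA", "itcFirst itcLim grfbrc ftsWidth wWidth")

BORDER_BITS = {0x01: "top", 0x02: "left", 0x04: "bottom", 0x08: "right"}


def parse_cssa(operand):
    """Parse a sprm operand made of a size byte followed by a CSSA."""
    size = operand[0]
    if size != 6:
        raise ValueError("a CSSA operand always has a length of 6")
    return CSSA._make(struct.unpack_from("<BBBBh", operand, 1))


def affected_borders(grfbrc):
    """List the border names selected by the grfbrc bit field."""
    return [name for bit, name in BORDER_BITS.items() if grfbrc & bit]
```

The cells touched by the sprm are then range(cssa.itcFirst, cssa.itcLim), which excludes itcLim itself.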
sprmTCellWidth (0xd635) has the following parameter (size stored in the first byte):
uchar itcFirst;
uchar itcLim;
uchar ftsWidth;
short wWidth;
The ftsWidth and wWidth specify the desired width and width units for all cells with index >=
itcFirst and < itcLim.
sprmTTableWidth (0xf614), sprmTWidthAfter (0xf618), sprmTWidthBefore (0xf617),
sprmTWidthIndent (0xf661) all change an ftsWidth and a wWidth value. The first byte is
the ftsWidth and the remaining two bytes are the wWidth value.
sprmTPc (0x360d) behaves just like sprmPPc, but it applies to absolutely positioned tables.
sprmTDiagLine80 (0xd62a) contains 2 BRC80s for the two diagonal lines for each column.
sprmTDiagLine (0xd630) is the same as sprmTDiagLine80, but with 2 BRCs instead.
sprmTBrcBottomCv (0xd61c) contains the size of the parameter and an array of itcMax
(64) COLORREFS, one for each TC.rgbrc[ibrcBottom].cv in
tap.rgtc[itcMax].rgbrc[ibrcBottom].cv.
sprmTBrcLeftCv (0xd61b). Same as sprmTBrcBottomCv, but changing cv for
rgtc[itcMax].rgbrc[ibrcLeft].
sprmTBrcRightCv (0xd61d) Same as sprmTBrcBottomCv, but changing cv for
rgtc[itcMax].rgbrc[ibrcRight].
or
clxt = 2 clxtPlcfpcd
lcb Count of bytes in piece table
plcfpcd Piece table
The entire CLX would look like this, depending on the number of grpprls:
clxtGrpprl
cb
grpprl (0th grpprl)
clxtGrpprl
cb
grpprl (1st grpprl)
...
clxtPlcfpcd
lcb
plcfpcd
When the prm of a pcd stored in the plcfpcd contains an igrpprl (index to a grpprl), the
index stored is the position of that grpprl in the order in which the grpprls were stored in the CLX.
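A reader for the CLX layout shown above might be sketched as follows. Several details are assumptions not stated in this section: the clxt value for a grpprl entry is taken to be 1, the clxt is taken to be one byte, cb a two-byte count, and lcb a four-byte count, all little-endian. The function name is hypothetical.

```python
import struct

CLXT_GRPPRL = 1   # assumed value; only clxt = 2 is shown in the text
CLXT_PLCFPCD = 2


def parse_clx(clx):
    """Return (grpprls, plcfpcd) from the raw bytes of a CLX.

    grpprls is a list in storage order, so an igrpprl found in a prm can
    be used directly as an index into it.
    """
    grpprls = []
    pos = 0
    while pos < len(clx):
        clxt = clx[pos]
        if clxt == CLXT_GRPPRL:
            (cb,) = struct.unpack_from("<H", clx, pos + 1)
            grpprls.append(clx[pos + 3 : pos + 3 + cb])
            pos += 3 + cb
        elif clxt == CLXT_PLCFPCD:
            (lcb,) = struct.unpack_from("<I", clx, pos + 1)
            return grpprls, clx[pos + 5 : pos + 5 + lcb]
        else:
            raise ValueError(f"unexpected clxt {clxt} at offset {pos}")
    raise ValueError("CLX does not contain a piece table")
```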
with fkp.rgbx.phe are moved into the local PAP. The process thus far has created a PAP that
describes what the paragraph properties of the paragraph were at the last full save. Now apply
any paragraph sprms that were linked to the piece that contains the paragraph's paragraph
mark. If pcd.prm.fComplex==0, pcd.prm contains 1 sprm which should only be applied to
the local PAP if it is a paragraph sprm. If pcd.prm.fComplex==1, pcd.prm.igrpprl is the
index of a grpprl in the CLX. If that grpprl contains any paragraph sprms, they should be
applied to the local PAP. After applying all of the sprms for the piece, the local PAP contains the
correct paragraph property values.
When there are n footnotes, the plcffndRef structure consists of n+1 CP entries followed by
n integer flags, named fAuto. The ith CP in the plcffndRef corresponds to the ith fAuto
flag. The CP entries give the locations of footnote references within the main text address
space. The n+1th CP entry contains the value fib.ccpText+fib.ccpFtn+fib.ccpHdr+1.
The fAuto flag contains 1 whenever the footnote reference name is auto-generated by Word.
When a footnote reference name is automatically generated by Word, Word generates the
name by adding 1 to the index number of the reference in the plcffndRef and translating
that number to ASCII text. When the footnote reference is auto generated, the character at
the main text CP position for the footnote reference should be a footnote reference character
(ASCII 5) which has a chp recorded with chp.fSpec=1.
The number of footnotes stored in a Word binary file can be found by
(fib.cbPlcffndTxt/4)-1.
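The footnote arithmetic above can be illustrated with a short sketch. The function names are hypothetical, and the byte widths are assumptions: each CP is read as a four-byte little-endian integer and each fAuto flag as a two-byte integer.

```python
import struct


def footnote_count(cb_plcffnd_txt):
    """Number of footnotes, from the byte size of the plcffndTxt."""
    return cb_plcffnd_txt // 4 - 1


def parse_plcffnd_ref(data):
    """Split a plcffndRef into its n+1 CPs and n fAuto flags."""
    n = (len(data) - 4) // 6  # 4 bytes per CP, 2 per flag, one extra CP
    cps = struct.unpack_from(f"<{n + 1}i", data, 0)
    flags = struct.unpack_from(f"<{n}h", data, 4 * (n + 1))
    return list(cps), list(flags)


def auto_reference_names(flags):
    """Generated name (index + 1 as text) for each auto-numbered footnote."""
    return [str(i + 1) if f_auto else None for i, f_auto in enumerate(flags)]
```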
The plcfhdd contains an entry for each kind of header or footer. (The grpfIhdt is no longer
used to find entries in the plcfhdd.) Indices in the plcfhdd are as follows:
0 Header for even pages
1 Header for odd pages
2 Footer for even pages
3 Footer for odd pages
4 Header for first page of section
5 Footer for first page of section
6 Footnote separator
7 Footnote continuation separator
8 Footnote continuation notice
9 Endnote separator
10 Endnote continuation separator
11 Endnote continuation notice
Page Table
Page table information is optional data which is not always stored in a Word binary file. It may
be stored for the main text, footnote text and endnote text. The fib contains three FCPGD
structures (fcpgdMother, fcpgdFtn, fcpgdEdn) which point to where the data is stored.
Each fcpgd points to a PLF of PGD structures and a PLCF of BKD structures. The PLF of PGD
descriptors contains n entries where n is the number of pages in the associated text stream.
The PLC of BKDs contains >= n entries where each entry describes a single break (page break
or otherwise) within the text stream. Each BKD is associated with a PGD and contains an ipgd
which is an index into the PLF of PGDs. To find the CP range of a given page, traverse the BKDs
searching for the first and last BKD which refer to the given page. The CP range of these BKDs
is the CP range of the page.
If a Word document is edited in any way, the fcpgds in the fib should be filled with 0s.
Glossary Files
A Word glossary file is a normal Word binary file with three supplemental structures, the sttbfglsy,
the sttbglsystyle and the plcfglsy, also stored in the file. The sttbfglsy contains a list
of the names of glossary entries, the sttbglsystyle contains a list of the style names for
every auto text entry, and the plcfglsy contains a table of beginning positions within the
text address space of the file of the text of glossary entries.
The sttbfglsy begins with an integer count of bytes of the size of the sttbfglsy (includes
the size of the integer count of bytes). If there are n glossary entries defined, n Pascal-type
strings (string preceded by length byte) will follow, concatenated one after the other, each
string storing one glossary entry name. The collection of glossary entry names must be sorted
in case-insensitive ascending order (i.e. a and A are treated as equal). Also the names date
and time must be included in the list of names. The name of the ith glossary entry is the ith
name defined in the sttbfglsy. The extra field in each entry contains an index on the
sttbglsystyle that indicates the style name of the first paragraph in plcfglsy.
The sttbglsystyle is not sorted and has no duplicates. Each entry has an extra field
indicating how many auto text entries have that style.
If there are n glossary entries, the plcfglsy will consist of n+2 CP entries. The ith CP entry
will contain the location of the beginning of the text for the ith glossary entry. The i+1st CP
entry will contain the limit CP of the ith glossary entry. The character at a CP position of limit
Routing Slip
A routing slip is stored in the main document stream as an RS (Routing Slip) structure followed
by a set of variable length data. After the RS structure are 4 null terminated strings. Each
string is preceded by a short integer containing the string length (including the null
terminator). The strings are: the subject, the message text, status and title. Following these
strings are a variable number (rs.cRecip) of Routing Recipient (RR) records. Each RR record
is immediately followed by a variable number (rr.cb) of bytes containing private data, which
is in turn followed by a null terminated string containing the recipient name.
Auto Summary
For a document for which AutoSummary View is active (specified in the ASUMYI), the
plcfasumy records the result of the last AutoSummary analysis. Each ASUMY in the PLCF
gives the AutoSummary level for the text starting at the corresponding CP. The level must be
non-negative and no greater than the upper bound specified in the ASUMYI. The ASUMYI
specifies the current summary view level. In emphasize view mode, all text at and below the
current summary view level is highlighted. In reduce view mode, all text above the current
summary view level is hidden.
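The two view modes can be expressed as a small decision function. The sketch below is illustrative only; the function names and the mode labels "emphasize" and "reduce" are chosen for the example and are not identifiers from the file format.

```python
def check_level(level, upper_bound):
    """Return True if an ASUMY level is within the range allowed by the ASUMYI."""
    return 0 <= level <= upper_bound

def summary_display(level, view_level, mode):
    """Return how text with AutoSummary level `level` is shown.

    view_level: the current summary view level from the ASUMYI
    mode: "emphasize" or "reduce" (labels chosen for this sketch)
    """
    if mode == "emphasize":
        # Text at and below the current view level is highlighted.
        return "highlighted" if level <= view_level else "normal"
    if mode == "reduce":
        # Text above the current view level is hidden.
        return "hidden" if level > view_level else "normal"
    raise ValueError("unknown view mode: %r" % (mode,))
```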
ibstAssocFileNext 0 Unused
Both stMergeField and stCompInfo are variable length character arrays preceded by a
length byte.
Structure Definitions
AnnoTation Reference Descriptor for Word 2000
(ATRDPre10)
Word 2000 Annotation Reference Descriptor, now part of a modified Descriptor structure along
with newly introduced properties.
b10 b16 Field Type Size Bitfield Comments
1 center
2 right justify
3 left and right justify
fTableBreak uns short :1 0100 When 1, this indicates that this is a table break.
fColumnBreak uns short :1 0200 When 1, this indicates that this is a column break.
2 2 itcFirst uns short :7 007F When bkf.fCol==1, this is the index to the first column of a table column bookmark
fPub uns short :1 0080 When 1, this indicates that this bookmark is marking the range of a Macintosh Publisher section
itcLim uns short :7 7F00 When bkf.fCol==1, this is the index to the limit column of a table column bookmark
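As an illustration of how the bit masks in the itcFirst, fPub, and itcLim rows are applied, the following Python sketch decodes the 16-bit word at offset 2 of the structure. The function name is hypothetical, and the little-endian byte order is an assumption of the example.

```python
import struct

def decode_bkf_columns(bkf_bytes):
    """Decode the 16-bit word at offset 2 using the masks from the table."""
    (word,) = struct.unpack_from("<H", bkf_bytes, 2)
    return {
        "itcFirst": word & 0x007F,        # :7, mask 007F
        "fPub": (word & 0x0080) >> 7,     # :1, mask 0080
        "itcLim": (word & 0x7F00) >> 8,   # :7, mask 7F00
    }
```

Per the comments in the table, itcFirst and itcLim are only meaningful when bkf.fCol is 1.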
4 4 grpfBrc ulong 4
0 none
1 single
2 thick
3 double
5 hairline
6 dot
7 dash large gap
8 dot dash
9 dot dot dash
10 triple
11 thin-thick small gap
12 thick-thin small gap
13 thin-thick-thin small gap
14 thin-thick medium gap
15 thick-thin medium gap
16 thin-thick-thin medium gap
17 thin-thick large gap
18 thick-thin large gap
19 thin-thick-thin large gap
20 wave
21 double wave
22 dash small gap
23 dash dot stroked
24 emboss 3D
25 engrave 3D
codes 64 – 230 represent border art types and are used
only for page borders
dptSpace long :5 1F0000 Width of space to maintain between border and text within
border
Stored in points.
fShadow long :1 200000 When 1, border is drawn with shadow. Must be 0 when BRC
is a substructure of the TC
0 0 dptLineWidth short :8 00FF Width of a single line in 1/8 pt, max of 32 pt.
0 none
1 single
2 thick
3 double
5 hairline
6 dot
7 dash large gap
8 dot dash
9 dot dot dash
10 triple
11 thin-thick small gap
12 thick-thin small gap
13 thin-thick-thin small gap
14 thin-thick medium gap
15 thick-thin medium gap
16 thin-thick-thin medium gap
17 thin-thick large gap
18 thick-thin large gap
19 thin-thick-thin large gap
20 wave
21 double wave
22 dash small gap
23 dash dot stroked
24 emboss 3D
25 engrave 3D
codes 64 – 230 represent border art types and
are used only for page borders
0 Auto
1 Black
2 Blue
3 Cyan
4 Green
5 Magenta
6 Red
7 Yellow
8 White
9 DkBlue
10 DkCyan
11 DkGreen
12 DkMagenta
13 DkRed
14 DkYellow
15 DkGray
16 LtGray
dptSpace short :5 1F00 Width of space to maintain between border and
text within border
Stored in points.
0 none
1 single
2 thick
3 double
fShadow short :1 2000 When 1, border is drawn with shadow
0 Auto
1 Black
2 Blue
3 Cyan
4 Green
5 Magenta
6 Red
7 Yellow
8 White
9 DkBlue
10 DkCyan
11 DkGreen
12 DkMagenta
13 DkRed
14 DkYellow
15 DkGray
16 LtGray
dxpSpace short :5 1F00 Width of space to maintain between border and
text within border. Stored in points.
The seven types of border lines that WinWord 1.0 supports are coded with different sets of
values for dxpLine1Width, dxpSpaceBetween, and dxpLine2Width.
The border lines and their brc10 settings follow:
Line type dxpLine1Width dxpSpaceBetween dxpLine2Width
no border 0 0 0
When the no border settings are stored in the BRC, brc.fShadow and brc.dxpSpace should
be set to 0.
cbBRC10 (count of bytes of BRC10) is 2.
Character Properties (CHP)
The CHP is never stored in Word files. It is the result of decompression operations applied to
CHPXs. For this reason no offsets are shown into the structure. It can be reconstructed at will.
The CHPX is stored in CHPX FKPS and within the STSH.
Note When a CHPX is stored in an FKP it is prefixed by a one-byte count of bytes that records
the size of the non-zero prefix of the CHPX. Since the count of bytes must begin on an even
boundary within the FKP followed by the non-zero prefix, it's guaranteed that the int and FC
fields of the CHPX are aligned on an odd-byte boundary. The best technique for reconstituting
the CHPX is to move the non-zero prefix to the beginning of a local instance of a CHPX that has
been cleared to zeros.
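The reconstitution technique described in the note can be sketched as follows. The function name is hypothetical, and the full CHPX size is passed in by the caller because it is not given in this section.

```python
def reconstitute_chpx(fkp, offset, chpx_size):
    """Rebuild a full-size CHPX from its stored non-zero prefix.

    fkp: bytes of one FKP
    offset: position of the one-byte count that precedes the prefix
    chpx_size: size in bytes of a full CHPX (caller-supplied assumption)
    """
    cb = fkp[offset]                      # count of bytes in the non-zero prefix
    if cb > chpx_size:
        raise ValueError("prefix is longer than the CHPX")
    prefix = fkp[offset + 1:offset + 1 + cb]
    if len(prefix) != cb:
        raise ValueError("prefix runs past the end of the FKP")
    chpx = bytearray(chpx_size)           # local instance cleared to zeros
    chpx[:cb] = prefix                    # move the prefix to the beginning
    return bytes(chpx)
```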
fBold uns long :1 Text is bold when 1 , and not bold when 0.
fRMarkDel uns long :1 When 1, text has been deleted and will be displayed with strikethrough
when revision marked text is to be displayed
fSmallCaps uns long :1 Displayed with small caps when 1, no small caps when 0
fVanish uns long :1 When 1, text has "hidden" format, and is not displayed unless fPagHidden
is set in the DOP
fRMark uns long :1 When 1, text is newly typed since the last time revision marks have been
accepted and will be displayed with an underline when revision marked
text is to be displayed
fSpec uns long :1 Character is a Word special character when 1, not a special character when
0
fObj uns long :1 Embedded object when 1, not an embedded object when 0
fShadow uns long :1 Character is drawn with a shadow when 1; drawn without shadow when 0
fLowerCase uns long :1 Character is displayed in lower case when 1. No case transformation is
performed when 0. This field may be set to 1 only when chp.fSmallCaps
is 1.
fData uns long :1 When 1, chp.fcPic points to an FFDATA, the binary data structure
used by Word to describe a form field. The bit chp.fData may only be 1
when chp.fSpec is also 1 and the special character in the document
stream that has this property is a chPicture (0x01).
fOle2 uns long :1 When 1, chp.lTagObj identifies the OLE object in the object stream
that should be displayed when the chPicture fSpec character tagged with
fOle2 is encountered. The bit chp.fOle2 may only be 1 when chp.fSpec
is also 1 and the special character in the document stream that has this
property is a chPicture (0x01).
fEmboss uns long :1 Text is embossed when 1 and not embossed when 0
fImprint uns long :1 Text is engraved when 1 and not engraved when 0
fDStrike uns long :1 Displayed with double strikethrough when 1, no double strikethrough
when 0
fComplexScripts uns long :1 Complex Scripts text that requires special processing to display and
process.
fBiDi uns long :1 Complex Scripts right-to-left text that requires special processing to
display and process (character reordering; contextual shaping; display of
combining characters and diacritics; specialized justification rules; cursor
positioning).
fWebHidden uns long :1 Text should be hidden in Web View when set to 1
0 Auto
1 Black
2 Blue
3 Cyan
4 Green
5 Magenta
6 Red
7 Yellow
8 White
9 DkBlue
10 DkCyan
11 DkGreen
12 DkMagenta
13 DkRed
14 DkYellow
15 DkGray
16 LtGray
pctCharWidth ushort 2 Character scale
Lid lid 2 Language ID (calculated) (see LID table below for values)
rglid[clidChpMax] lid 6 Array of language IDs: convenient index into the next three properties.
LidDefault lid 2 Default language ID (see LID table below for values)
LidFE lid 2 East Asian Language ID (see LID table below for values)
lidBi lid 2 Complex Scripts language ID (see LID table below for values)
0 None
1 Dot
2 Comma
3 Circle
4 Under Dot
fUndetermine uns char :1 Character is undetermined when set to 1
0 means no super/subscripting
1 means text in run is superscripted
2 means text in run is subscripted
fSpecSymbol uns char :1 Used by Word internally
0 none
1 single
2 by word
3 double
4 dotted
5 hidden
6 thick
7 dash
8 dot (not used)
9 dot dash
10 dot dot dash
11 wave
20 kulDottedHeavy
23 kulDashedHeavy
25 kulDotDashHeavy
26 kul2DotDashHeavy
27 kulWaveHeavy
39 kulDashLong
43 kulWaveDouble
55 kulDashLongHeavy
hres uchar 1 Hyphenation rule
0 No hyphenation
1 Normal hyphenation
2 Add letter before hyphen
3 Change letter before hyphen
4 Delete letter before hyphen
5 Change letter after hyphen
6 Delete letter before the hyphen and change the letter preceding the
deleted character
chHres uchar 1 The character that will be used to add or change a letter when chp.ysr is
2,3, 5 or 6
hpsKern ushort 2 Kerning distance for characters in run recorded in half points
ibstRMark ibst 2 Index to author IDs stored in hsttbfRMark. Used when text in run was
newly typed when revision marking was enabled.
0 no animation
1 Las Vegas lights
2 background blink
3 sparkle text
4 marching ants
5 marching red ants
6 shimmer
fDblBdr uns char :1 Used internally by Word
ufel ushort 2 Collection properties represented by itypFELayout and copt (East Asian
layout properties)
0x00 – none
0x01 – Tatenakayoko
0x02 – Warichu
0x04 – Kumimoji
0xFF – All
fTNY uns char :1 Tatenakayoko: Horizontal–in-vertical (range of text in a direction
perpendicular to the text flow) is used when set to 1
fWarichu uns char :1 Two lines in one (text in the group is displayed as two half-height lines
within a line) when set to 1
ftcSym ftc 2 When chp.fSpec is 1 and the character recorded for the run in the
document stream is chSymbol (0x28), chp.ftcSym identifies the font code
of the symbol font that will be used to display the symbol character
recorded in chp.xchSym
xchSym xchar 2 When chp.fSpec==1 and the character recorded for the run in the
document stream is chSymbol (0x28), the character stored chp.xchSym
will be displayed using the font specified in chp.ftcSym
lTagObj ulong 4 An object ID for an OLE object, only set if chp.fSpec, chp.fOle2, and
chp.fObj are all true.
0 No hyphenation
1 Normal hyphenation
2 Add letter before hyphen
3 Change letter before hyphen
4 Delete letter before hyphen
5 Change letter after hyphen
6 Delete letter before the hyphen and change the letter preceding the
deleted character
chHresOld ulong :8 The character that will be used to add or change a letter when chp.hresi
is 2,3, 5 or 6
ibstRMarkDel ibst 2 Index to author IDs stored in hsttbfRMark. Used when text in run was
deleted when revision marking was enabled.
dttmRMark dttm 4 Date/time at which this run of text was entered/modified by the author
(Only recorded when revision marking is on.)
dttmRMarkDel dttm 4 Date/time at which this run of text was deleted by the author (Only
recorded when revision marking is on.)
istd ushort 2 Index to character style descriptor in the stylesheet that tags this run of
text. When istd is istdNormalChar (10 decimal), characters in run are
not affected by a character style. If chp.istd contains any other value,
chpx of the specified character style are applied to CHP for this run before
any other exceptional properties are applied.
idslRMReason ushort 2 An index to strings displayed as reasons for actions taken by Word's
AutoFormat code
idslRMReasonDel ushort 2 An index to strings displayed as reasons for actions taken by Word's
AutoFormat code
fHighlight uns short :1 When 1, characters are highlighted with color specified by
chp.icoHighlight.
fChsDiff uns short :1 Pre-Unicode files, char's char set different from FIB char set
fPropRMark uns short 2 When 1, properties have been changed with revision marking on
ibstPropRMark ibst 2 Index to author IDs stored in hsttbfRMark. Used when properties have
been changed when revision marking was enabled.
dttmPropRMark dttm 4 Date/time at which properties of this were changed for this run of text by
the author. (Only recorded when revision marking is on.)
fConflictOrig uchar 1 When chp.wConflict!=0, this is fTrue when text is part of the original
version of text. When fFalse, text is alternative introduced by
reconciliation operation.
wConflict ushort 2 When != 0, index number that identifies all text participating in a particular
conflict incident.
IbstConflict ibst 2 Who made this change for this side of the conflict.
fDispFldRMark byte 1 When 1, the number for a ListNum field is being tracked in
xstDispFldRMark. If that number is different from the current value, the
number has changed.
ibstDispFldRMark ibst 2 Index to author IDs stored in hsttbfRMark. Used when ListNum field
numbering has been changed when revision marking was enabled.
dttmDispFldRMark dttm 4 The date for the ListNum field number change.
xstDispFldRMark xstdispfld 32 The string value of the ListNum field when revision mark tracking began.
fcObjp fc 4 Offset in the data stream indicating the location of OLE object data.
0 lbrNone
1 lbrLeft
2 lbrRight
3 lbrBoth
iuhi long 4 Unknown HTML element
bTransNoProof0 uchar 1 Used internally to handle translating Word 97 lid sprms into no proofing
and Word 2000 sprms.
bTransNoProof1 uchar 1 Used internally to handle translating Word 97 lid sprms into no proofing
and Word 2000 sprms.
rsidProp RSID 4 Save ID for last time this CHP was revised: a random number associated
with character formatting which improves the accuracy of Word's
document merge feature.
rsidText RSID 4 Save ID for last time this text was revised: a random number associated
with the insertion of text which improves the accuracy of Word's document
merging.
rsidRMDel RSID 4 Save ID for last time this revision-mark-deleted text was revised: a
random number associated with the tracked deletion of text which
improves the accuracy of Word's document merging.
fHasOldProps uchar 1 Used for character property revision marking. When fHasOldProps is
set to 1, the chp at that time is the old chp.
hplcnf HPL 4 Conditional character formatting for table styles. No language properties
are stored here.
Table of LIDs
1 1 short :8 Reserved
fPMHMainDoc short :1 0004 0 1 when doc is a main doc for Print Merge
Helper, 0 when not; default=0
Default
b10 b16 Field Type Size Bitfield value comment
fLinkStyles short :1 4000 When 1, Word will merge styles from its
template
short :1 Reserved
short :1 Reserved
Default
b10 b16 Field Type Size Bitfield value comment
fConvMailMergeEsc :1 0040
fSupressTopSpacing :1 0080 Compatibility option: when 1, suppress
extra line spacing at top of page
F000 Reserved
Default
b10 b16 Field Type Size Bitfield value comment
:2 6000 Reserved
Default
b10 b16 Field Type Size Bitfield value comment
iGutterPos short :1 8000 Gutter position for this doc: 0 => side;
1 => top
In a file with nFib < 103—for example, documents created with Word 6.0 for Windows—the
DOP would end here. This DOP would have a cbDOP of 84, and a cwDOP of 42.
In files with nFib >= 103, the compatibility options (copts) section at offset 8 was copied here
and expanded. Options marked "(see above)" hold the same value that the same-named field
in the old copts section above had in files with nFib < 103.
:4 0000F000 (reserved)
:1 00100000 (reserved)
410 19A reserved short :1 0001 Always set to zero when writing
files
504 1F8 fMaybeTentativeListInDoc uns short :1 00000004 When set to 1, doc may have a tentative list in it
504 1F8 fMaybeFitText uns short :1 00000006 When set to 1, doc may have fit text
504 1F8 fRelyOnCSS_WebOpt uns short :1 00000200 When set to 1, rely on CSS for formatting
504 1F8 fRelyOnVML_WebOpt uns short :1 00000400 When set to 1, rely on VML for displaying graphics in browsers
506 1FA fUseLongFileNames_WebOpt uns short :1 00000002 Use long file names for supporting files
506 1FA fWebOptionsInit uns short :1 00001000 When set to 1, the web options have been filled in
540 21C verCompatPreW10 uns long :16 0000FFFF HTML I/O compatibility level
550 226 istdTableDflt uns long :16 Default table style for the document
596 254 fReadingModeInkLockDown uns short :1 0001 Reading mode: ink lock down
8 8 dyGridDisplay short :7 007F The number of grid squares (in the y direction) between
each gridline drawn on the screen. 0 means don't display
any gridlines in the y direction.
dxGridDisplay short :7 7F00 The number of grid squares (in the x direction) between
each gridline drawn on the screen. 0 means don't display
any gridlines in the x direction.
fFollowMargins short :1 8000 If true, the grid will start at the left and top margins and
ignore xaGrid and yaGrid
char :3 E0 Reserved
fZombieEmbed char :1 02 ==1 when result still believes this field is an EMBED or
LINK field
fResultDirty char :1 04 ==1 when user has edited or formatted the result. == 0
otherwise.
fResultEdited char :1 08 ==1 when user has inserted text into or deleted text
from the result
Since dead fields have no entry in the plcffld, the string in the field code must be used to
determine the field type. All versions of Word 97 use English field code strings, except French,
German, and Spanish versions of Word. The strings for all languages for all possible dead fields
are listed below.
flt value English string French string German string Spanish string Field type
4 XE EX XE E Index entry
11 RD RD RD RD Document reference
76 Macro
20 14 fHdr uns short :1 0001 1 in the undo doc when shape is from the header doc, 0 otherwise (undefined when not in the undo doc)
wrk uns short :4 1E00 Text wrapping mode type (valid only for wrapping modes 2 and 4)
0 wrap both sides
1 wrap only on left
2 wrap only on right
3 wrap only on largest side
fRcaSimple uns short :1 2000 When set, temporarily overrides bx, by, forcing the xaLeft, xaRight, yaTop, and yaBottom fields to all be page relative.
6 6 panose PANOSE
16 10 fs FONTSIGNATURE
The FCPGD structure, referenced in the FIB, is used internally by Word. This modified version of
the above structure was introduced in Word 2003:
Bitfield Bitfield
Decimal Hex Name Type Size Mask Comments Introduced
0 0x0000 fcPgd long File position where data begins. Word 2003
4 0x0004 lcbPgd ulong Size of data. Ignore fc if lcb is zero. Word 2003
8 0x0008 fcBkd long File position where data begins. Word 2003
12 0x000C lcbBkd ulong Size of data. Ignore fc if lcb is zero. Word 2003
Bitfield Bitfield
Decimal Hex Name Type Size Mask Comments Introduced
20 0x0014 lcbAfd ulong Size of data. Ignore fc if lcb is zero. Word 2003
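The fc/lcb pairs listed above can be read with a sketch like the following. The function name and the little-endian byte order are assumptions of the example. The fcAfd field at offset 16 is inferred here from the lcbAfd row at offset 20 and from the 24-byte spacing of the FCPGD entries in the FIB; its row does not appear in the table above.

```python
import struct

FCPGD_SIZE = 24  # three (fc, lcb) pairs of 4 bytes each

def parse_fcpgd(buf, offset=0):
    """Return the (fc, lcb) pairs of an FCPGD, or None where lcb is zero."""
    fields = struct.unpack_from("<iIiIiI", buf, offset)
    out = {}
    for i, name in enumerate(("Pgd", "Bkd", "Afd")):
        fc, lcb = fields[2 * i], fields[2 * i + 1]
        # "Ignore fc if lcb is zero."
        out[name] = (fc, lcb) if lcb != 0 else None
    return out
```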
1118 045E lcbSttbfBkmkFcc Ulong Count of bytes for the Word 2002
above data
1122 0462 fcPlcfBkfFcc FC Offset in table stream of Word 2002
fcc bookmark plc of
cpFirsts
This is internal bookmark
information used by
Word's styles and
formatting feature to
keep track of formatting
use.
1126 0466 lcbPlcfBkfFcc Ulong Count of bytes for the Word 2002
above data
1130 046A fcPlcfBklFcc FC Offset in table stream of Word 2002
fcc bookmark plc of
cpLims
This is internal bookmark
information used by
Word's styles and
formatting feature to
keep track of formatting
use.
1134 046E lcbPlcfBklFcc Ulong Count of bytes for the Word 2002
above data
1138 0472 fcSttbfbkmkBPRep FC Offset in table stream of Word 2002
airs file repair bookmark sttb
This is internal bookmark
information used by
Word's styles and
formatting feature to
keep track of formatting
use.
1142 0476 lcbSttbfbkmkBPRe Ulong Count of bytes for the Word 2002
pairs above data
1146 047A fcPlcfbkfBPRepai FC Offset in table stream of Word 2002
rs file repair bookmark plc
of cpFirsts
This is internal bookmark
information used by
Word's file repair feature
to track repaired
document portions.
1150 047E lcbPlcfbkfBPRepa Ulong Count of bytes for the Word 2002
irs above data
1154 0482 fcPlcfbklBPRepai FC Offset in table stream of Word 2002
rs file repair bookmark plc
of cpLims
This is internal bookmark
information used by
Word's file repair feature
to track repaired
document portions.
1158 0486 lcbPlcfbklBPRepa Ulong Count of bytes for the Word 2002
irs above data
1366 556 lcbPlcflvcOldInl ulong Count of bytes for the Word 2003
ine above data.
1370 55A fcPlcflvcNew FC LVC PLC (New View) Word 2003
This is an internal
information cache used
by Word.
1374 55E lcbPlcflvcNew ulong Count of bytes for the Word 2003
above data.
1378 562 fcPlcflvcNewInli FC LVC PLC (New Inline View) Word 2003
ne This is an internal
information cache used
by Word.
1382 566 lcbPlcflvcNewInl ulong Count of bytes for the Word 2003
ine above data.
1386 56A rgpgdbkd[3] FCPGD This is an internal Word 2003
information cache used
by Word.
1386 56A fcpgdMother FCPGD This is an internal Word 2003
information cache used
by Word.
1410 582 fcpgdFtn FCPGD This is an internal Word 2003
information cache used
by Word.
1434 59A fcpgdEdn FCPGD This is an internal Word 2003
information cache used
by Word.
1458 5B2 fcAfd FC This is internal revision Word 2003
mark view information
used by Word.
1462 5B6 lcbAfd ulong Count of bytes for the Word 2003
above data.
1466 5BA cswNew Ushort The number of entries in Word 2003
rgswNew[]
1468 5Bc rgswNew[] Ushort Index to the following Word 2003
properties
1468 5Bc nFib Ushort The actual nFib, moved Word 2003
here because some
readers assumed they
couldn't read any format
with nFib > some
constant
1470 5BE cQuickSavesNew Ushort Because of the above, we Word 2003
need to use cQuickSaves
to prevent Word 97 from
quick saving to Word
2000 files
1
Note: when ccpFtn==0 and ccpHdd==0 and ccpMcr==0 and ccpAtn==0 and ccpEdn==0 and ccpTxbx==0 and
ccpHdrTxbx==0, then fib.fcMac=fib.fcMin+fib.ccpText. If any of ccpFtn, ccpHdd, ccpMcr,
ccpAtn, ccpEdn, ccpTxbx, or ccpHdrTxbx is non-zero, then
fib.fcMac=fib.fcMin+fib.ccpText+fib.ccpFtn+fib.ccpHdd+fib.ccpMcr+fib.ccpAtn+fib.ccpEdn+
fib.ccpTxbx+fib.ccpHdrTxbx+1. The single character stored beginning at file position fib.fcMac-1 must always be
a CR character (ASCII 13).
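The relationship in this note can be checked with a short sketch. The function and parameter names are hypothetical. The example treats all seven subdocument counts alike, so that any non-zero count selects the second formula, and it applies the formula exactly as written, with one file position per character.

```python
def expected_fc_mac(fc_min, ccp_text, ccp_ftn=0, ccp_hdd=0, ccp_mcr=0,
                    ccp_atn=0, ccp_edn=0, ccp_txbx=0, ccp_hdr_txbx=0):
    """Compute fib.fcMac from fib.fcMin and the ccp* counts."""
    others = (ccp_ftn, ccp_hdd, ccp_mcr, ccp_atn, ccp_edn, ccp_txbx, ccp_hdr_txbx)
    if not any(others):
        # Only main document text is present.
        return fc_min + ccp_text
    # At least one subdocument is present: add all counts plus one.
    return fc_min + ccp_text + sum(others) + 1

def ends_with_cr(stream, fc_mac):
    """Check that the character at file position fcMac-1 is a CR (ASCII 13)."""
    return stream[fc_mac - 1:fc_mac] == b"\r"
```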
The CHP is never stored in a Word file. It is derived by expanding stored CHPXs.
The PAP is never stored in a Word file. It is derived by expanding stored PAPXs.
Hyphenation (HRESI)
Substructure of the CHP. Referenced elsewhere in this document.
b10 b16 Field Type Size Bitfield Comment
0x01 Checked
0x02 The numbering sequence or
format is unsupported (includes tab &
size)
0x04 The list text is not "#."
0x08 Something other than a period is
used
0x10 First line indent mismatch
0x20 The list tab and the dxaLeft don't
match (need table?)
0x40 The hanging indent falls beneath
the number (need plain text)
0x80 A built-in HTML bullet
8 0x08 rgistd[9] array 18 Array of shorts containing the istds linked to each
level of the list, or istdNil (4095) if no style is
linked.
26 0x1A fSimpleList uns char :1 0x01 True if this is a simple (one-level) list; false if this is a multilevel (nine-level) list.
fRestartHdn uns char :1 0x02 Word 6.0 compatibility option: true if the list should start numbering over at the beginning of each section
fAutoNum uns char :1 0x04 To emulate Word 6.0 numbering: true if Auto numbering
fPreRTF uns char :1 0x08 When 1, this list was there before we started
reading RTF
fHybrid uns char :1 0x10 When 1, list is a hybrid multilevel/simple
(UI=simple, internal=multilevel)
reserved uns char :3 0xE0 Reserved
0x01 Checked
0x02 The numbering sequence or format is
unsupported (includes tab & size)
0x04 The list text is not "#."
0x08 Something other than a period is used
0x10 First line indent mismatch
0x20 The list tab and the dxaLeft don't match
(need table?)
0x40 The hanging indent falls beneath the
number (need plain text)
0x80 A built-in HTML bullet
12 0xC clfolvl uns char 1 Count of levels whose format is overridden (see
LFOLVL)
0x01 Checked
0x02 The numbering sequence or format is
unsupported (includes tab & size)
0x04 The list text is not "#."
0x08 Something other than a period is used
0x10 First line indent mismatch
0x20 The list tab and the dxaLeft don't match
(need table?)
0x40 The hanging indent falls beneath the
number (need plain text)
0x80 A built-in HTML bullet
15 reserved uns char 1 Reserved
fFormatting uns long :1 0x20 True if the formatting is overridden (in which case
the LFOLVL should contain a pointer to a LVL)
0x01 Checked
0x02 The numbering sequence or format is
unsupported (includes tab & size)
0x04 The list text is not "#."
0x08 Something other than a period is used
0x10 First line indent mismatch
0x20 The list tab and the dxaLeft don't match
(need table?)
0x40 The hanging indent falls beneath the
number (need plain text)
0x80 A built-in HTML bullet
reserved uns long :18 Reserved
26 1A Spare short 2
28 1C PNBR int [9] 36 Numeric value for each level place holder in
NUMRM.xst.
0 0 * short :4 000F
fTableBreaks short :1 0080 Table breaks have been calculated for this page
fColumnBreaks short :1 0200 Column breaks have been calculated for this
page
fNewPage short :1 0800 Page has never been valid since created, must
recalculate the bounds of this page. If this is the
last page, this PGD may really represent many
pages
2 2 short Reserved
If the PHE is stored in a PAP whose fTtp field is set (non-zero), the following structure is used:
b10 b16 Field Type Size Bitfield Comments
4 4 dxaCol long
If there is no paragraph height information stored for a paragraph, all of the fields in the PHE
are set to 0. If a paragraph contains more than 127 lines, the clMac, dylLine variant cannot
be used, so fDiffLines must be set to 1 and the total size of the paragraph stored in
dylHeight. If a paragraph height is greater than 32767 twips, the height cannot be
represented by a PHE so all fields of the PHE must be set to 0.
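The selection rules in this paragraph can be summarized in a sketch such as the one below. It is a simplification: the function name is hypothetical, the result is a dictionary of logical field values rather than the packed PHE layout, and the idea that the clMac, dylLine variant applies when all lines share one height is an assumption drawn from the name fDiffLines. Heights are passed through unchanged.

```python
def choose_phe(line_count, total_height, uniform_line_height=None):
    """Pick the PHE contents for a paragraph.

    total_height: paragraph height in twips, or None if no height is known
    uniform_line_height: height shared by every line, or None if lines differ
    """
    zero = {"fDiffLines": 0, "clMac": 0, "dylLine": 0, "dylHeight": 0}
    # No height information, or a height that a PHE cannot represent.
    if total_height is None or total_height > 32767:
        return zero
    # The clMac, dylLine variant is limited to 127 lines.
    if uniform_line_height is not None and line_count <= 127:
        return {"fDiffLines": 0, "clMac": line_count,
                "dylLine": uniform_line_height, "dylHeight": 0}
    # Otherwise store the total size of the paragraph.
    return {"fDiffLines": 1, "clMac": 0, "dylLine": 0, "dylHeight": total_height}
```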
If a new Word file is created, the PHE of every papx fkp entry created to describe the
paragraphs of the file should be set to 0. If a Word file is altered in place (a character of the file
changed to a new character or a property changed), the paragraph containing the change
must have its papx.phe field set to 0. If this paragraph is in a table row, the PHE in the papx
at the end of the row (indicated by fInTable) must also be set to 0.
0 single
1 thick
2 double
3 shadow
brcp uchar 1 Rectangle border codes
0 none
1 border above
2 border below
15 box around
16 bar to left of paragraph
ilvl uchar 1 When non-zero, list level for this paragraph
pcVert uns char :2 Vertical position code. Specifies coordinate frame to use
when paragraphs are absolutely positioned.
0 vertical position coordinates are relative to margin
1 coordinates are relative to page
2 coordinates are relative to text. This means: relative
to where the next non-APO text would have been placed if
this APO did not exist.
pchorz uns char :2 Horizontal position code. Specifies coordinate frame to
use when paragraphs are absolutely positioned.
0 horizontal position coordinates are relative to
column.
1 coordinates are relative to margin
2 coordinates are relative to page
unused uns char :8 Unused
0 = Exact
1 = At Least
fKinsoku uns char 1 When 1, apply Kinsoku rules when performing line
wrapping
fAutoSpaceDE uns char 1 When 1, auto space East Asian and alphabetic characters
fAutoSpaceDN uns char 1 When 1, auto space East Asian and numeric characters
0 Hanging
1 Centered
2 Roman
3 Variable
4 Auto
fVertical short :1 Used internally by Word
fInnerTableCell uchar 1 When 1, the end of paragraph mark is really an end of cell
mark for a nested table cell
fOpenTch uchar 1 Ensure the Table Cell char doesn't show up as zero height
0 left justify
1 center
2 right justify
3 left and right justify
4 distributed
5 Medium
6 List tab
7 High
8 Low
9 Thai distributed
Justification in Word 2000 and above is relative to text
direction (for example, left is left for left-to-right text and
right for right-to-left text).
fNoAllowOverlap uchar 1 When 1, absolutely positioned paragraph cannot overlap
with another paragraph
fPropRMark Uns short 2 When 1, properties have been changed with revision
marking on
itbdMac short 2 Number of tabs stops defined for paragraph. Must be >=
0 and <= 64.
numrm NUMRM 128 Paragraph numbering revision mark data (see NUMRM)
rsid RSID (long) 4 Save ID for last time this PAP was revised
fHasOldProps uchar 1 Used for paragraph property revision marking. When fHasOldProps
is set to 1, the pap at that time is the old pap.
yfti YFTI 13 information about the last table autofit conditional results
0 0 cw byte Count of words for this byte and the following data in
PAPX. The first byte of a PAPX is a count of words when
PAPX is stored in an FKP. If this value is 0, it is a 'pad' byte
and the count is stored in the following byte. Count of
words is used because PAPX in an FKP can contain
paragraph and table sprms.
1/2 1/2 istd uns short Index to style descriptor of the style from which the
paragraph inherits its paragraph and character properties
3/4 3/4 grpprl character array A list of the sprms that encode the differences between the
PAP for a paragraph and the PAP for the style used. When
a paragraph bound is also the end of a table row, the PAPX
also contains a list of table sprms which express the
difference of the table row's TAP from an empty TAP that has
been cleared to zeros. The table sprms are recorded in the
list after all of the paragraph sprms. See sprms definitions
for list of sprms that are used in PAPXs.
For calculating papx.cw when storing in an FKP: For even-sized grpprls, the grpprl plus
the istd and cw bytes will be an even number of bytes, so we store the count of words for all
three elements in papx.cw. For odd-sized grpprls, the three elements will be an odd number
of bytes, which can't be represented with a count of words; so, we store a 'pad' byte of 0 at the
beginning (in the normal cw location), followed by a count that is the size of the grpprl and
istd byte only (since that's an even number of bytes). In either case, papx.cw is immediately
followed by the istd and grpprl.
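Reading a PAPX back out of an FKP under these rules might look like the following sketch. The function name is hypothetical. It takes a non-zero cw as a word count that includes the cw byte itself, and a zero cw as a pad byte followed by a word count covering only the istd and grpprl. Splitting the data into a two-byte little-endian istd followed by the grpprl follows the "uns short" type in the field table and is an assumption of the example.

```python
import struct

def read_papx_in_fkp(fkp, offset):
    """Return (istd, grpprl, next_offset) for the PAPX stored at `offset`."""
    cw = fkp[offset]
    if cw != 0:
        # cw counts words for the cw byte itself plus the data that follows.
        start = offset + 1
        size = 2 * cw - 1
    else:
        # Pad byte: the real count is in the next byte and covers only the data.
        start = offset + 2
        size = 2 * fkp[offset + 1]
    data = fkp[start:start + size]
    if size < 2 or len(data) != size:
        raise ValueError("PAPX is truncated or too small to hold an istd")
    (istd,) = struct.unpack_from("<H", data, 0)
    return istd, data[2:], start + size
```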
6 6 mfp.mm short
8 8 mfp.xExt short
10 A mfp.yExt short
12 C mfp.hMF short
If a Windows metafile is stored immediately following the PIC structure, the mfp is a Windows
METAFILEPICT structure. See
http://msdn2.microsoft.com/en-us/library/ms649017(VS.85).aspx for more information
about the METAFILEPICT structure and
http://download.microsoft.com/download/0/B/E/0BE8BDD7-E5E8-422A-ABFD-4342ED7AD886/WindowsMetafileFormat(wmf)Specification.pdf
for the Windows Metafile Format specification.
When the data immediately following the PIC is a TIFF filename, mfp.mm==98. If a bitmap is
stored after the PIC, mfp.mm==99.
When the PIC describes a bitmap, mfp.xExt is the width of the bitmap in pixels and
mfp.yExt is the height of the bitmap in pixels.
When scaling bitmaps, dxaGoal and dyaGoal may be ignored if the operation would cause the
bitmap to shrink or grow by a non-power-of-two factor.
b10 b16 Field Type Size Bitfield Comments
For all of the Crop values, a positive measurement means the specified border was moved
inward from its original setting and a negative measurement means the border was moved
outward from its original setting.
b10 b16 Field Type Size Bitfield Comments
36 24 dxaCropLeft short The amount the picture has been cropped on the left in twips
38 26 dyaCropTop short The amount the picture has been cropped on the top in twips
40 28 dxaCropRight short The amount the picture has been cropped on the right in twips
42 2A dyaCropBottom short The amount the picture has been cropped on the bottom in twips
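As an illustration of the crop fields and their sign convention, the following sketch reads the four shorts at offsets 36 through 42 and converts them to points (a twip is 1/20 of a point). The names read_crop_points and crop_direction are hypothetical, and little-endian byte order is assumed.

```python
import struct

TWIPS_PER_POINT = 20


def read_crop_points(pic: bytes) -> dict:
    """Return the four crop values of a PIC, converted from twips to points."""
    left, top, right, bottom = struct.unpack_from("<4h", pic, 36)
    return {
        "left": left / TWIPS_PER_POINT,
        "top": top / TWIPS_PER_POINT,
        "right": right / TWIPS_PER_POINT,
        "bottom": bottom / TWIPS_PER_POINT,
    }


def crop_direction(twips: int) -> str:
    """Positive values moved the border inward, negative values outward."""
    if twips > 0:
        return "inward"
    if twips < 0:
        return "outward"
    return "unchanged"
```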
0 0 fNoParaLast Uns short :1 0001 When 1, means that piece contains no end of
paragraph marks
* Uns short :1
0 rgfc FC[] If the size of the PLCF is cb and the size of the structure stored in the PLC is
cbStruct, then the number of structure instances stored in the PLCF, iMac, is given by
(cb - 4)/(4 + cbStruct). The number of FCs stored in the PLCF will be iMac + 1.
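The count formula can be checked with a short sketch. The function names plcf_count and plcf_fcs are hypothetical; the sketch assumes each FC is a 4-byte little-endian value (read here as unsigned) and that rgfc begins at offset 0 of the PLCF, as the field table states.

```python
import struct
from typing import List


def plcf_count(cb: int, cb_struct: int) -> int:
    """Return iMac = (cb - 4) / (4 + cbStruct), rejecting sizes that do not fit."""
    payload = cb - 4
    stride = 4 + cb_struct
    if payload < 0 or payload % stride != 0:
        raise ValueError("PLCF size does not match cbStruct")
    return payload // stride


def plcf_fcs(plcf: bytes, cb_struct: int) -> List[int]:
    """Return the iMac + 1 FC values stored at the start of the PLCF."""
    i_mac = plcf_count(len(plcf), cb_struct)
    return list(struct.unpack_from("<%dI" % (i_mac + 1), plcf, 0))
```

A 22-byte PLCF holding 2-byte structures therefore has (22 - 4)/(4 + 2) = 3 structure instances and 4 FCs.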
igrpprl short :15 FFFE Index to a grpprl stored in CLX portion of file
0 0 fRouted short When 1, document has been routed to at least one recipient
4 4 fTrackStatus short When 1, a status message is sent to the originator each time
the document is forwarded to a recipient on the routing list
fAutoPgn char Only for Macintosh compatibility, used only during open, when 1,
sep.dxaPgn and sep.dyaPgn are valid page number locations
fPgnRestart uns char Set to 1 when page numbering should be restarted at the beginning
of this section
fEndNote uns char When 1, footnotes placed at end of section. When 0, footnotes are
placed at bottom of page.
nLnnMod uns short If 0, no line numbering, otherwise this is the line number modulus
(e.g. if nLnnMod is 5, line numbers appear on line 5, 10, etc.)
dxaPgn short When fAutoPgn ==1, gives the x position of auto page number on
page in twips (for Macintosh compatibility only)
dyaPgn short When fAutoPgn ==1, gives the y position of auto page number on
page in twips (for Macintosh compatibility only)
dmBinFirst uns short Bin number supplied from windows printer driver indicating which
bin the first page of section will be printed
dmBinOther uns short Bin number supplied from windows printer driver indicating which
bin the pages other than the first page of section will be printed
fPropRMark short When 1, properties have been changed with revision marking on
ibstPropRMark short Index to author IDs stored in hsttbfRMark. Used when properties
have been changed while revision marking was enabled
dttmPropRMark DTTM Date/time at which properties of this were changed for this run of
text by the author. (Only recorded when revision marking is on.)
dyaHdrTop uns long Y position of top header measured from top edge of page
dyaHdrBottom uns long Y position of bottom header measured from top edge of page
olstAnm OLST Multilevel auto numbering list data (see OLST definition)
fHasOldProps uchar 1 Used for section property revision marking. The sep as it stands at the
time fHasOldProps is set to 1 is the old sep.
2 2 grpprl char[] List of sprms that encodes the differences between the
properties of a section and Word's default section properties
ipat Pattern
0 Automatic
1 Solid
2 5 Percent
3 10 Percent
4 20 Percent
5 25 Percent
6 30 Percent
7 40 Percent
8 50 Percent
9 60 Percent
10 70 Percent
11 75 Percent
12 80 Percent
13 90 Percent
14 Dark Horizontal
15 Dark Vertical
16 Dark Forward Diagonal
17 Dark Backward Diagonal
18 Dark Cross
19 Dark Diagonal Cross
20 Horizontal
21 Vertical
22 Forward Diagonal
23 Backward Diagonal
24 Cross
25 Diagonal Cross
35 2.5 Percent
36 7.5 Percent
37 12.5 Percent
38 15 Percent
39 17.5 Percent
40 22.5 Percent
41 27.5 Percent
42 32.5 Percent
43 35 Percent
44 37.5 Percent
45 42.5 Percent
46 45 Percent
47 47.5 Percent
48 52.5 Percent
49 55 Percent
50 57.5 Percent
51 62.5 Percent
52 65 Percent
53 67.5 Percent
54 72.5 Percent
55 77.5 Percent
56 82.5 Percent
57 85 Percent
58 87.5 Percent
59 92.5 Percent
60 95 Percent
61 97.5 Percent
62 97 Percent
fExtend short :1
fFirstMerged BF :1 0001 When 1, cell is first cell of a range of cells that have
been merged. When a cell is merged, the display
areas of the merged cells are consolidated and the
text within the cells is interpreted as belonging to one
text stream for purposes of calculating line breaks.
fMerged BF :1 0002 When 1, cell has been merged with preceding cell
fBackward BF :1 0008 For a vertical table cell, text flow is bottom to top
when 1 and is top to bottom when 0
fRotateFont BF :1 0010 When 1, cell has rotated characters (i.e. uses @font)
fNoWrap BF :1 2000 When 1, do not allow text to wrap in the table cell
0 0 brcTL2BR BRC Diagonal border from the top left to the bottom right of the cell.
8 8 brcTR2BL BRC Diagonal border from the top right to the bottom left of the cell.
0 0 itl short Index to Word's table of table looks (see itl table below)
2 2 fBorders short :1 0001 When ==1, use the border properties from the selected table look
fShading short :1 0002 When ==1, use the shading properties from the selected table look
fFont short :1 0004 When ==1, use the font from the selected table look
fColor short :1 0008 When ==1, use the color from the selected table look
fBestFit short :1 0010 When ==1, do best fit from the selected table look
fHdrRows short :1 0020 When ==1, apply properties from the selected table look to the
header rows in the table
fLastRow short :1 0040 When ==1, apply properties from the selected table look to the last
row in the table
fHdrCols short :1 0080 When ==1, apply properties from the selected table look to the
header columns of the table
fLastCol short :1 0100 When ==1, apply properties from the selected table look to the last
column of the table
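The table look flags listed above can be unpacked with a simple mask table. This is a sketch only: the name parse_tlp is hypothetical, and it assumes the itl short at offset 0 and the flag short at offset 2 are little-endian.

```python
import struct
from typing import Dict, Tuple

# Bit masks taken from the Bitfield column of the table look fields above.
TLP_FLAGS = {
    "fBorders": 0x0001,
    "fShading": 0x0002,
    "fFont": 0x0004,
    "fColor": 0x0008,
    "fBestFit": 0x0010,
    "fHdrRows": 0x0020,
    "fLastRow": 0x0040,
    "fHdrCols": 0x0080,
    "fLastCol": 0x0100,
}


def parse_tlp(data: bytes, offset: int = 0) -> Tuple[int, Dict[str, bool]]:
    """Return (itl, flags) for the 4-byte table look structure at `offset`."""
    itl, grf = struct.unpack_from("<hH", data, offset)
    flags = {name: bool(grf & mask) for name, mask in TLP_FLAGS.items()}
    return itl, flags
```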
0 left justify
1 center
2 right justify
dxaGapHalf long 4 Measures half of the white space that will be maintained between text in
adjacent columns of a table row. A dxaGapHalf width of white space will be maintained on both
sides of a column boundary.
dyaRowHeight long 4 When greater than 0, guarantees that the height of the table row will be
at least dyaRowHeight high. When less than 0, guarantees that the height of the table row will
be exactly the absolute value of dyaRowHeight high. When 0, the table row will be given a
height large enough to represent all of the text in all of the cells of the table row. Cells
with vertical text flow make no contribution to the computation of the height of rows with
auto or at least height. Neither do vertically merged cells, except in the last row of the
vertical merge. If an auto height row consists entirely of cells which have vertical text
direction or are vertically merged, and the row does not contain the last cell in any vertical
cell merge, then the row is given height equal to that of the end of cell mark in the first
cell.
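The three cases of dyaRowHeight can be expressed as a small classification. The function name row_height_rule and the rule labels are hypothetical; the special handling of vertical text and vertically merged cells described above is not modelled here.

```python
from typing import Optional, Tuple


def row_height_rule(dya_row_height: int) -> Tuple[str, Optional[int]]:
    """Map a dyaRowHeight value to (rule, height)."""
    if dya_row_height > 0:
        # Height is a minimum; the contents may make it taller.
        return "at-least", dya_row_height
    if dya_row_height < 0:
        # Height is exactly the absolute value.
        return "exact", -dya_row_height
    # Zero means the height is derived from the cell contents.
    return "auto", None
```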
fCantSplit uchar 1 When 1, table row may not be split across
page bounds
fKeepFollow unsigned short :1 When set to 1, keep this row with the
following row
rsid RSID 4 Save ID for last time this TAP was revised
10 A long Reserved
18 12 txidUndo long
0 0 fn short
4 4 lvl short
DataSpaces
Every rights managed file contains a new storage named "\006DataSpaces" which contains
meta information used to help manage the process of protecting the content within the
document. More information can be found at
http://msdn2.microsoft.com/en-us/library/aa767782(VS.85).aspx. The most important
content in this storage is the information under the "TransformInfo" storage. This storage
contains the issuance licenses and end-user licenses required to protect and open a rights
managed file.
DRMContent
The new stream named "\011DRMContent" contains the encrypted binary content of the Word
document. This stream consists of a series of encrypted bytes. When you decrypt
the whole stream and open the resultant byte stream as a compound storage, that
storage will contain all streams and substorages that are found in a normal Word document,
using the exact same binary file format as a non-IRM-protected Word file. Only the encrypted
streams are located inside of this storage. The unencrypted streams (for example Document
Summary Information and the macro stream in Word and Microsoft Office Excel®) are not
stored inside this storage. They are found unencrypted off of the root of the document's
storage.
DRMViewerContent
The final new stream that may exist within an IRM-protected file is the optional
"\011DRMViewerContent" stream which contains a compressed, encrypted MHTML stream for
users of the Rights Management Add-on for Internet Explorer. This is the option for users who
need to see IRM protected content but do not have access to an IRM enabled Office client.
/* “x” wmf and PICT data sizes immediately follow as 2 four-byte longs */
};
After reading the PIC structure from the picture data block, the reader should skip cbMETAHDR
bytes (the size of a standard Windows metafile header). It should then compare the next
cbWmfXBegin bytes in the picture data block against the bytes in the rgbWmfXBegin array
above. If they do not match, the picture is a normal picture: Windows metafile, bitmap or
TIFF.
If they do match, then the reader should read the next 8 bytes in the picture data block as two
4-byte "long"s (Intel 80x86 byte order). These numbers are the sizes (in bytes) of the "x"
metafile and the Macintosh PICT data, respectively. The size of the "x" metafile is measured
from its start immediately after the PIC structure. It is possible for the PICT's size to be zero.
In this case, there is no PICT data, and the reader may use the "x" Windows metafile as the
picture's representation.
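The detection steps above can be sketched as follows. This is an illustration under assumptions: the function name read_x_metafile_sizes is hypothetical, the signature bytes (rgbWmfXBegin) must be supplied by the caller because they are defined elsewhere in this specification, and cbMETAHDR is taken to be 18, the size of a standard Windows METAHEADER.

```python
import struct
from typing import Optional, Tuple

CB_METAHDR = 18  # assumption: size in bytes of a standard Windows metafile header


def read_x_metafile_sizes(
    block: bytes, cb_pic: int, signature: bytes, cb_metahdr: int = CB_METAHDR
) -> Optional[Tuple[int, int]]:
    """Return (x metafile size, PICT size), or None for a normal picture.

    `block` is the picture data block, `cb_pic` the size of the PIC structure,
    and `signature` the rgbWmfXBegin bytes to compare against.
    """
    start = cb_pic + cb_metahdr          # skip the PIC, then the metafile header
    end = start + len(signature)
    if block[start:end] != signature:
        return None                      # normal metafile, bitmap or TIFF
    if len(block) < end + 8:
        raise ValueError("picture data block too short for the two sizes")
    # Two 4-byte longs in Intel byte order follow the signature.
    cb_wmf, cb_pict = struct.unpack_from("<2l", block, end)
    return cb_wmf, cb_pict
```

A returned PICT size of zero means there is no PICT data, so the "x" Windows metafile is the representation to use.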
The table below defines the classification of various ranges of Unicode characters:
Unicode subrange Character range Classification
0x4a00->0x4dff
usrReserved1 0xd800
usrReserved2
The table below describes the behavior of the Unicode subrange usrLatin1. Shared
characters are marked in this table with a 1, while characters marked with a 0 are considered
"non-East Asian". All other characters in this Unicode subrange are considered "non-East
Asian".
// 0 1 2 3 4 5 6 7 8 9 a b c d e f
0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, // 0x00a0-0x00af
1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, // 0x00b0-0x00bf
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00c0-0x00cf
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00d0-0x00df
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00e0-0x00ef
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, // 0x00f0-0x00ff
};
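A lookup against the usrLatin1 table above might look like the following sketch. The rows are copied from the table as printed; the names USR_LATIN1_SHARED and is_shared_latin1 are hypothetical, and code points outside 0x00a0-0x00ff are treated as not shared, following the "all other characters" rule.

```python
# One row per 16 code points, starting at 0x00a0, copied from the table above.
USR_LATIN1_SHARED = (
    (0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1),  # 0x00a0-0x00af
    (1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1),  # 0x00b0-0x00bf
    (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # 0x00c0-0x00cf
    (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),  # 0x00d0-0x00df
    (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),  # 0x00e0-0x00ef
    (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0),  # 0x00f0-0x00ff
)


def is_shared_latin1(code_point: int) -> bool:
    """Return True when the code point is marked as shared in usrLatin1."""
    if not 0x00A0 <= code_point <= 0x00FF:
        return False  # outside the table: considered non-East Asian
    row, col = divmod(code_point - 0x00A0, 16)
    return USR_LATIN1_SHARED[row][col] == 1
```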
The table below describes the behavior of the Unicode range usrLatinXA. Shared characters
are marked in this table with a 1, while characters marked with a 0 are considered "non-East
Asian". All other characters in this Unicode subrange are considered "non-East Asian".
// 0 1 2 3 4 5 6 7 8 9 a b c d e f
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0100-0x010f
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0110-0x011f
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0120-0x012f
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0130-0x013f
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, // 0x0140-0x014f
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x0150-0x015f
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, // 0x0160-0x016f
In usrLatinXB shared characters are 0x192, 0x1FA, 0x1FB, 0x1FC, 0x1FD, 0x1FE and
0x1FF. All other characters in this Unicode subrange are considered "non-East Asian".
In usrIPAExtensions shared characters are 0x251 and 0x261.
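Because these two subranges list only a handful of shared characters, a set membership test is enough. The names below are hypothetical and the sketch covers only the code points listed above.

```python
# Shared characters listed for usrLatinXB and usrIPAExtensions.
SHARED_LATIN_XB = frozenset({0x192, 0x1FA, 0x1FB, 0x1FC, 0x1FD, 0x1FE, 0x1FF})
SHARED_IPA_EXTENSIONS = frozenset({0x251, 0x261})


def is_shared_xb_or_ipa(code_point: int) -> bool:
    """Return True for the shared characters of usrLatinXB or usrIPAExtensions."""
    return code_point in SHARED_LATIN_XB or code_point in SHARED_IPA_EXTENSIONS
```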
An optimization is available. If the East Asian font chp.ftcFE=0 and chp.idctHint=0 and
chp.ftcAscii=chp.ftcOther, the font is chp.ftcAscii and the language is
chp.lidDefault.
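The optimization can be written as an early-exit check. This is a sketch: the Chp tuple below is a hypothetical stand-in holding only the five fields the rule mentions, and a result of None means the shortcut does not apply and the full per-character classification is still needed.

```python
from typing import NamedTuple, Optional, Tuple


class Chp(NamedTuple):
    """Hypothetical subset of the CHP fields used by the shortcut."""
    ftcAscii: int
    ftcFE: int
    ftcOther: int
    idctHint: int
    lidDefault: int


def fast_font_and_language(chp: Chp) -> Optional[Tuple[int, int]]:
    """Return (font, language) when the shortcut applies, otherwise None."""
    if chp.ftcFE == 0 and chp.idctHint == 0 and chp.ftcAscii == chp.ftcOther:
        return chp.ftcAscii, chp.lidDefault
    return None
```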