Library (Computing)
Libraries contain code and data that provide services to independent programs, allowing code and data to be shared and changed in a modular fashion. Some executables are both standalone
programs and libraries, but most libraries are not executables. Executables and libraries make
references, known as links, to each other through a process called linking, which is typically
performed by a linker.
As of 2009, most modern software systems provide libraries that implement the majority of system
services. Such libraries have commoditized the services which a modern application requires. As
such, most code used by modern applications is provided in these system libraries.
Contents
1 History
2 Types
o 2.1 Static libraries
o 2.2 Dynamic linking
2.2.1 Relocation
2.2.2 Locating libraries at runtime
2.2.2.1 Unix-like systems
2.2.2.2 Microsoft Windows
2.2.2.3 OpenStep
2.2.2.4 AmigaOS
2.2.3 Shared libraries
2.2.4 Dynamic loading
o 2.3 Remote libraries
3 Naming
4 See also
5 References
6 External links
History
COBOL included "primitive capabilities for a library system" in 1959,[2] but Jean
Sammet described them in retrospect as "inadequate library facilities".[3]
Another major contributor to the modern library concept was
the subprogram innovation of FORTRAN. FORTRAN subprograms can be compiled independently of
each other, but the compiler lacked a linker, so type checking between subprograms was impossible.[4]
Finally, historians of the concept should remember the influential Simula 67. Simula was the
first object-oriented programming language, and its classes are nearly identical to the modern concept
as used in Java, C++, and C#. The class concept of Simula was also a progenitor of
the package in Ada and the module of Modula-2.[5] Even when originally developed in 1965, Simula
classes could be included in library files and added at compile time.[6]
Types
Static libraries
Main article: Static Library
Originally, only static libraries existed. A static library, also known as an archive, consists of a set
of routines which are copied into a target application by the compiler, linker, or binder,
producing object files and a stand-alone executable file. This process, and the stand-alone executable
file, are known as a static build of the target application. Actual addresses for jumps and other routine
calls are stored in a relative or symbolic form which cannot be resolved until all code and libraries are
assigned final static addresses.
The linker resolves all of the unresolved addresses into fixed or relocatable addresses (from a
common base) by loading all code and libraries into actual runtime memory locations. This linking
process can take as much time as, or more than, the compilation process, and must be performed
whenever any of the modules is recompiled. Most compiled languages have a standard library, but
programmers can also create their own custom libraries.
A linker may work on specific types of object files, and thus require specific (compatible) types of
libraries. Collecting object files into a static library may ease their distribution and encourage their use.
A client, either a program or a library subroutine, accesses a library object by referencing just
its name. The linking process resolves references by searching the libraries in the order given.
Usually, it is not considered an error if a name can be found multiple times in a given set of libraries.
Some programming languages may use a feature called "smart linking" where the linker is aware of or
integrated with the compiler, such that the linker "knows" how external references are used, and code
in a library that is never actually used, even though internally referenced, can be discarded from the
compiled application. For example, a program that only uses integers for arithmetic, or does no
arithmetic operations at all, can exclude the floating-point library routines. This smart-linking feature
can lead to smaller application file sizes and reduced memory usage.
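The resolution order and smart-linking behavior described above can be sketched with a toy resolver: each "library" is a mapping from symbol names to routines, the linker takes the first definition it finds, and routines that are never referenced are simply not copied into the output. All names here are invented for illustration; real linkers operate on object files, not Python dictionaries.

```python
# Toy model of static-link symbol resolution: libraries are searched in
# order, and only routines actually referenced end up in the output image.
# Names and "libraries" here are hypothetical stand-ins for object files.

def resolve(references, libraries):
    """Return {name: routine} for each reference, searching libraries in order."""
    resolved = {}
    for name in references:
        for lib in libraries:
            if name in lib:
                resolved[name] = lib[name]  # first match wins
                break
        else:
            raise NameError(f"undefined reference to '{name}'")
    return resolved

libm = {"fadd": lambda a, b: a + b, "fdiv": lambda a, b: a / b}
libint = {"iadd": lambda a, b: a + b}

# The program only references integer addition, so the floating-point
# routines are never copied in -- the effect of "smart linking".
image = resolve(["iadd"], [libint, libm])
print(sorted(image))  # → ['iadd']
```

An unresolved name raises an error, mirroring a linker's "undefined reference" diagnostic.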
Dynamic linking
Dynamic linking involves loading the subroutines of a library (which may be referred to as a DLL,
especially under Windows, or as a DSO (dynamic shared object) under Unix-like systems) into an
application program at load time or runtime, rather than linking them in at compile time. Only a
minimum amount of work is done at compile time by the linker; it only records what library routines the
program needs and the index names or numbers of the routines in the library. The majority of the
work of linking is done at the time the application is loaded (load time) or during execution (runtime).
The necessary linking code, called a loader, is actually part of the underlying operating system. At the
appropriate time the loader finds the relevant DLLs or DSOs and adds the relevant data to
the process's memory space.
On many computer and OS architectures, such linking can improve upon memory usage because
there only needs to be one copy of the DLL or DSO in memory for multiple processes to have access
to it.
Some operating systems can only link in a library at load time, before the process starts
executing; others can wait until after the process has started and link in the library just
when it is actually referenced (i.e., at runtime). The latter is often called "delay loading" or "deferred
loading"; with it, a missing or incompatible DLL surfaces as a runtime error rather than a load-time
one. When software is built such that DLLs can be replaced by other DLLs with a similar interface (but
different functionality), that software may be said to have a plugin architecture, and the libraries may
be referred to as plugins.
The increasing use of dynamic linkage has implications for software licensing. For example, the GPL
linking exception allows programs which do not license themselves under GPL to link to libraries
licensed under GPL without thereby becoming subject to GPL requirements.
Relocation
The loader must solve one problem: the actual location in memory of the library data cannot be known
until the executable and all dynamically linked libraries have been loaded into memory. This is
because the memory locations used depend on which specific dynamic libraries have been loaded. It
is not possible to depend on the absolute location of the data in the executable, nor even in the
library, since conflicts between different libraries would result: if two of them specified the same or
overlapping addresses, it would be impossible to use both in the same program.
However, in practice, the shared libraries on most systems do not change often. Therefore, systems
can compute a likely load address for each shared library on the system before it is needed, and store
that information in the libraries and executables. If every shared library that is loaded has undergone
this process, then each will load at its predetermined address, which speeds up the process of
dynamic linking. This optimization is known as prebinding in Mac OS X and prelinking in Linux.
Disadvantages of this technique include the time required to precompute these addresses every time
the shared libraries change, the inability to use address space layout randomization, and the
requirement of sufficient virtual address space (a problem alleviated by the adoption of 64-bit
architectures, at least for the time being).
The library itself contains a jump table of all the methods within it, known as entry points. Calls into the
library "jump through" this table, looking up the location of the code in memory, then calling it. This
introduces overhead in calling into the library, but the delay is so small as to be negligible.
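The jump-table indirection can be modeled as a lookup step performed before each call; this is a sketch of the idea, not how any particular loader lays out memory, and the symbol names are invented.

```python
import math

# A library's entry points modeled as a jump table: callers look up the
# routine's "address" (here, a Python function object) and then call it.
def _cos_impl(x):
    return math.cos(x)

entry_points = {"cos": _cos_impl}  # the "jump table"

def call_library(symbol, *args):
    target = entry_points[symbol]   # the indirection: look up, then jump
    return target(*args)

print(call_library("cos", 0.0))    # → 1.0
```

The extra lookup per call is the overhead the text mentions; in real dynamic linkers it is typically a single indirect jump and is negligible for most workloads.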
Dynamic linkers/loaders vary widely in functionality. Some depend on the executable storing explicit
paths to the libraries. Any change to the library naming or layout of the file system will cause these
systems to fail. More commonly, only the name of the library (and not the path) is stored in the
executable, with the operating system supplying a method to find the library on-disk based on some
algorithm.
One of the biggest disadvantages of dynamic linking is that the executables depend on separately
stored libraries in order to function properly. If the library is deleted, moved, or renamed, or
if an incompatible version of the DLL is copied to a place that is earlier in the search path, the
executable may fail to load or to run. On Windows this is commonly known as DLL hell.
Unix-like systems
Most Unix-like systems have a "search path" specifying file system directories in which to look for
dynamic libraries. Some systems specify the default path in a configuration file; others hard-code it
into the dynamic loader. Some executable file formats can specify additional directories in which to
search for libraries for a particular program. This can usually be overridden with an environment
variable, although it is disabled for setuid and setgid programs, so that a user can't force such a
program to run arbitrary code with root permissions. Developers of libraries are encouraged to place
their dynamic libraries in places in the default search path. On the downside, this can make
installation of new libraries problematic, and these "known" locations quickly become home to an
increasing number of library files, making management more complex.
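This name-to-path search can be observed from Python, whose `ctypes.util.find_library` applies the platform's own rules (on Linux, consulting the ldconfig cache and compiler search paths) to turn a bare library name into a filename. The exact result varies by system, so none is asserted here:

```python
from ctypes.util import find_library

# Ask the platform's search machinery to locate the C math library by its
# bare name "m". On glibc Linux this typically prints "libm.so.6"; it
# prints None if no matching library is found on the search path.
print(find_library("m"))
```

This is the same lookup a dynamic loader performs when an executable records only a library name rather than a full path.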
Microsoft Windows
Microsoft Windows will check the registry to determine the proper place to find an ActiveX DLL, but for
other DLLs it will check the directory where it loaded the program from; the current working directory;
any directories set by calling the SetDllDirectory() function; the System32, System, and
Windows directories; and finally the directories specified by the PATH environment variable.[8]
Applications written for the .NET Framework (since 2002) also check the Global
Assembly Cache as the primary store of shared DLL files, to address the issue of DLL hell.
OpenStep
OpenStep used a more flexible system, collecting a list of libraries from a number of known locations
(similar to the PATH concept) when the system first starts. Moving libraries around causes no
problems at all, although users incur a time cost when first starting the system.
AmigaOS
Shared libraries
In addition to being linked statically or dynamically, libraries are also often classified
according to how they are shared among programs. Dynamic libraries almost always offer some form
of sharing, allowing the same library to be used by multiple programs at the same time. Static
libraries, by definition, cannot be shared. The term "linker" comes from the process of copying
procedures or subroutines, which may come from "relocatable" libraries, and adjusting or "linking" the
machine addresses to the final locations of each module.
The term shared library is somewhat ambiguous, because it covers at least two different concepts.
The first is the sharing of code located on disk by unrelated programs. The second is the
sharing of code in memory, when programs execute the same physical page of RAM, mapped into
different address spaces. The latter would seem preferable, and indeed it has a
number of advantages. For instance, on the OpenStep system, applications were often only a few
hundred kilobytes in size and loaded almost instantly; the vast majority of their code was located in
libraries that had already been loaded for other purposes by the operating system. There is
a cost, however: shared code must be specifically written to run in a multitasking environment. In
some older environments, such as 16-bit Windows or MPE for the HP 3000, only stack-based (local)
data was allowed, or other significant restrictions were placed on writing a DLL.
Programs can accomplish RAM sharing by using position-independent code, as in Unix, which leads to
a complex but flexible architecture, or by using position-dependent code, as in Windows and OS/2.
These systems make sure, by various tricks like pre-mapping the address space and reserving slots
for each DLL, that code has a high probability of being shared. Windows DLLs are not shared
libraries in the Unix sense. (A third alternative is single-level store, as used by the IBM System/38 and
its successors. This allows position-dependent code but places no significant restrictions on where
code can be placed or how it can be shared.) The rest of this section concentrates on aspects
common to both variants.
As of 2009, most modern operating systems can have shared libraries of the same format as the
"regular" executables. This offers two main advantages: first, it requires making only one loader for
both of them, rather than two (having the single loader is considered well worth its added complexity).
Secondly, it allows executables also to be used as DLLs, if they have a symbol table. Typical
executable/DLL formats are ELF and Mach-O (both used on Unix-like systems) and PE (Windows). In Windows, the
concept was taken one step further, with even system resources such as fonts being bundled in the
DLL file format. The same is true under OpenStep, where the universal "bundle" format is used for
almost all system resources.
In some cases, different versions of DLLs can cause problems, especially when DLLs of different
versions have the same file name and different applications installed on a system each require a
specific version. Such a scenario is known as DLL hell. Most operating systems released after 2001
have clean-up methods to eliminate such situations.
Dynamic loading
Dynamic loading, a subset of dynamic linking, involves a dynamically linked library loading and
unloading at run-time on request. Such a request may be made implicitly at compile-time or explicitly
at run-time. Implicit requests are made at compile-time when a linker adds library references that
include file paths or simply file names. Explicit requests are made when applications make direct calls
to an operating system's API at runtime.
Most operating systems that support dynamically linked libraries also support dynamically loading
such libraries via a run-time linker API. For instance, Microsoft Windows uses the API
functions LoadLibrary, LoadLibraryEx, FreeLibrary, and GetProcAddress with Microsoft
Dynamic Link Libraries; POSIX-based systems, including most UNIX and UNIX-like systems,
use dlopen, dlclose, and dlsym. Some development systems automate this process.
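One such automation is Python's ctypes module, which wraps dlopen/dlsym (or LoadLibrary/GetProcAddress on Windows), so the explicit-request style can be sketched portably. The "libm.so.6" fallback soname below is an assumption about a glibc-based Linux host:

```python
import ctypes
from ctypes.util import find_library

# Explicit runtime request: locate and load the C math library, then look
# up the "cos" symbol -- the ctypes equivalent of dlopen() plus dlsym().
path = find_library("m") or "libm.so.6"   # soname fallback assumes glibc Linux
libm = ctypes.CDLL(path)

cos = libm.cos
cos.argtypes = [ctypes.c_double]          # declare the C signature so ctypes
cos.restype = ctypes.c_double             # marshals arguments correctly

print(cos(0.0))   # → 1.0
```

Declaring argtypes/restype matters: without them, ctypes defaults to int conversions and silently returns garbage for a double-returning routine.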
Remote libraries
Another solution to the library issue comes from using completely separate executables (often in
some lightweight form) and calling them using a remote procedure call (RPC) over a network to
another computer. This approach maximizes operating system re-use: the code needed to support the
library is the same code being used to provide application support and security for every other
program. Additionally, such systems do not require the library to exist on the same machine, but can
forward the requests over the network.
However, such an approach means that every library call requires a considerable amount of
overhead. RPC calls are much more expensive than calling a shared library which has already been
loaded on the same machine. This approach is commonly used in a distributed architecture which
makes heavy use of such remote calls, notably client-server systems and application servers such
as Enterprise JavaBeans.
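The remote-library pattern can be sketched with Python's standard xmlrpc machinery: the "library" is a standalone service (run in a thread here for brevity, though in practice it would be a separate process or machine), and every call crosses the wire rather than jumping into shared code. The `add` routine and addresses are invented for illustration.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# The "remote library": a standalone service exposing one routine.
# Port 0 asks the OS for any free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client invokes the library via RPC; each call pays serialization
# and network overhead instead of an in-process jump through an entry point.
remote = ServerProxy(f"http://127.0.0.1:{port}")
print(remote.add(2, 3))   # → 5
```

The per-call cost here (HTTP round trip plus XML marshalling) illustrates why RPC is reserved for architectures that are distributed anyway, as the text notes.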
At the same time many developers worked on the idea of multi-tier programs, in which a "display"
running on a desktop computer would use the services of a mainframe or minicomputer for data
storage or processing. For instance, a program on a GUI-based computer would send messages to a
minicomputer to return small samples of a huge dataset for display. Remote procedure calls already
handled these tasks, but there was no standard RPC system.
Soon the majority of the minicomputer and mainframe vendors instigated projects to combine the two,
producing an OOP library format that could be used anywhere. Such systems were known as object
libraries, or distributed objects if they supported remote access (not all did). Microsoft's COM is an
example of such a system for local use, DCOM a modified version that supports remote access.
For some time object libraries held the status of the "next big thing" in the programming world. There
were a number of efforts to create systems that would run across platforms, and companies competed
to try to get developers locked into their own systems. Examples include IBM's System Object
Model (SOM/DSOM), Sun Microsystems' Distributed Objects Everywhere (DOE), NeXT's Portable
Distributed Objects (PDO), Digital's ObjectBroker, Microsoft's Component Object
Model (COM/DCOM), and any number of CORBA-based systems.
After the inevitable cooling of marketing hype, object libraries continue to be used in both object-
oriented programming and distributed information systems. Class libraries are the rough OOP
equivalent of older types of code libraries. They contain classes, which describe characteristics and
define actions (methods) that involve objects. Class libraries are used to create instances, or objects
with their characteristics set to specific values. In some OOP languages, like Java, the distinction is
clear, with the classes often contained in library files (like Java's JAR file format) and the instantiated
objects residing only in memory (although potentially able to be made persistent in separate files). In
others, like Smalltalk, the class libraries are merely the starting point for a system image that includes
the entire state of the environment, classes and all instantiated objects.