Network Programming (NP) Unit Wise Materials
NETWORK
PROGRAMMING
Notes prepared by D. Teja Santosh, Assistant Professor, KPES, Shabad, R.R. District.
UNIT-I
When writing programs that communicate across a computer network, one must first invent a
protocol, an agreement on how those programs will communicate. Before delving into the
design details of a protocol, high-level decisions must be made about which program is
expected to initiate communication and when responses are expected. For example, a Web
server is typically thought of as a long-running program (or daemon) that sends network
messages only in response to requests coming in from the network. The other side of the
protocol is a Web client, such as a browser, which always initiates communication with the
server. This organization into client and server is used by most network-aware applications.
OSI Model
A common way to describe the layers in a network is to use the International Organization for
Standardization (ISO) open systems interconnection (OSI) model for computer
communications. This is a seven-layer model that maps approximately onto the Internet
protocol suite.
The sockets programming interfaces described in these notes are interfaces from the upper
three layers (the "application") into the transport layer. Why do sockets provide the interface
from the upper three layers of the OSI model into the transport layer? There are two reasons
for this design:
First, the upper three layers handle all the details of the application (FTP, Telnet, or HTTP,
for example) and know little about the communication details. The lower four layers know
little about the application, but handle all the communication details: sending data, waiting
for acknowledgments, sequencing data that arrives out of order, calculating and verifying
checksums, and so on. The second reason is that the upper three layers often form what is
called a user process while the lower four layers are normally provided as part of the
operating system (OS) kernel. Unix provides this separation between the user process and the
kernel, as do many other contemporary operating systems. Therefore, the interface between
layers 4 and 5 is the natural place to build the API.
represents SOCKET
IPv4: Internet Protocol version 4. IPv4, which we often denote as just IP, has been the
workhorse protocol of the IP suite since the early 1980s. It uses 32-bit addresses. IPv4
provides packet delivery service for TCP, UDP, SCTP, ICMP, and IGMP.
IPv6: Internet Protocol version 6. IPv6 was designed in the mid-1990s as a replacement for
IPv4. The major change is a larger address comprising 128 bits, to deal with the explosive
growth of the Internet in the 1990s. IPv6 provides packet delivery service for TCP, UDP,
SCTP, and ICMPv6. We often use the word "IP" as an adjective, as in IP layer and IP
address, when the distinction between IPv4 and IPv6 is not needed.
UDP: User Datagram Protocol. UDP is a connectionless protocol, and UDP sockets are an
example of datagram sockets. There is no guarantee that UDP datagrams ever reach their
intended destination. As with TCP, UDP can use either IPv4 or IPv6.
ICMP: Internet Control Message Protocol. ICMP handles error and control information
between routers and hosts. These messages are normally generated by and processed by the
TCP/IP networking software itself, not user processes, although we show the ping and
traceroute programs, which use ICMP. We sometimes refer to this protocol as ICMPv4 to
distinguish it from ICMPv6.
IGMP: Internet Group Management Protocol. IGMP is used with multicasting, which is
optional with IPv4.
ARP: Address Resolution Protocol. ARP maps an IPv4 address into a hardware address
(such as an Ethernet address). ARP is normally used on broadcast networks such as Ethernet,
token ring, and FDDI, and is not needed on point-to-point networks.
RARP: Reverse Address Resolution Protocol. RARP maps a hardware address into an IPv4
address. It is sometimes used when a diskless node is booting.
ICMPv6: Internet Control Message Protocol version 6. ICMPv6 combines the functionality
of ICMPv4, IGMP, and ARP.
BPF: BSD packet filter. This interface provides access to the datalink layer. It is normally
found on Berkeley-derived kernels.
DLPI: Datalink provider interface. This interface also provides access to the datalink layer. It
is normally provided with SVR4.
We use the terms "IPv4/IPv6 host" and "dual-stack host" to denote hosts that
support both IPv4 and IPv6.
The User Datagram Protocol (UDP) provides a connectionless, unreliable transport service.
Connectionless means that a communication session between hosts is not established before
exchanging data. UDP is often used for communications that use broadcast or multicast
Internet Protocol (IP) packets. The UDP connectionless packet delivery service is unreliable
because it does not guarantee data packet delivery or send a notification if a packet is not
delivered.
Because delivery of UDP packets is not guaranteed, applications that use this protocol must
supply their own mechanisms for reliability if necessary. Despite these limitations, UDP is
useful where low overhead matters more than guaranteed delivery, for example in short
request-reply exchanges and in broadcast or multicast traffic.
Each UDP datagram has a length. The length of a datagram is passed to the receiving
application along with the data.
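The point about datagram length can be seen in a small loopback experiment. The following is a minimal sketch, not taken from the notes: the helper name udp_roundtrip_len is hypothetical, the 1500-byte buffer is an arbitrary choice, and the code assumes a POSIX system where loopback UDP is available.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send one datagram over loopback and return the
 * length that recvfrom reports for it, or -1 on error. */
ssize_t udp_roundtrip_len(const void *msg, size_t len)
{
    char buf[1500];                 /* arbitrary size, large enough here */
    struct sockaddr_in addr;
    socklen_t addrlen = sizeof(addr);
    ssize_t n = -1;

    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;
    if (getsockname(rx, (struct sockaddr *)&addr, &addrlen) < 0)
        goto done;

    /* No connection is set up: the destination travels with the datagram. */
    if (sendto(tx, msg, len, 0, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    /* One recvfrom returns one datagram; n is that datagram's length. */
    n = recvfrom(rx, buf, sizeof(buf), 0, NULL, NULL);

done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

Calling udp_roundtrip_len("hello", 5) should report a length of 5: the receiver gets the record boundary from UDP itself, whereas a TCP byte stream would give it no such boundary.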
Three-Way Handshake
1. The server must be prepared to accept an incoming connection. This is normally done
by calling socket, bind, and listen and is called a passive open.
2. The client issues an active open by calling connect. This causes the client TCP to send
a "synchronize" (SYN) segment, which tells the server the client's initial sequence
number for the data that the client will send on the connection. Normally, there is no
data sent with the SYN; it just contains an IP header, a TCP header, and possible TCP
options (which we will talk about shortly).
3. The server must acknowledge (ACK) the client's SYN and the server must also send
its own SYN containing the initial sequence number for the data that the server will
send on the connection. The server sends its SYN and the ACK of the client's SYN in
a single segment.
4. The client must acknowledge the server's SYN.
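The passive and active opens in the steps above can be sketched in C as follows. This is an illustrative sketch only: the helper names passive_open and active_open are hypothetical, the code uses the loopback address with a kernel-chosen port so that it can run on one machine, and the backlog of 5 is an arbitrary choice.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Passive open: socket, bind, listen. Returns the listening descriptor and
 * stores the kernel-chosen port (host byte order) in *port, or -1 on error. */
int passive_open(unsigned short *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;              /* 0 asks the kernel for an ephemeral port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Active open: connect makes the client TCP send its SYN and returns
 * after the server's SYN and ACK have arrived. */
int active_open(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would invoke passive_open first, pass the reported port to active_open, and then call accept on the listening descriptor to obtain the connected socket. The kernel completes the handshake on behalf of the listening socket, so connect can return before the server process calls accept.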
Connection Termination
While it takes three segments to establish a connection, it normally takes four to terminate
one:
1. One application calls close first, and we say that this end performs the active close.
This end's TCP sends a FIN segment, which means it is finished sending data.
2. The other end that receives the FIN performs the passive close. The received FIN is
acknowledged by TCP. The receipt of the FIN is also passed to the application as an
end-of-file (after any data that may have already been queued for the application to
receive), since the receipt of the FIN means the application will not receive any
additional data on the connection.
3. Sometime later, the application that received the end-of-file will close its socket. This
causes its TCP to send a FIN.
4. The TCP on the system that receives this final FIN (the end that did the active close)
acknowledges the FIN.
Since a FIN and an ACK are required in each direction, four segments are normally required.
We use the qualifier "normally" because in some scenarios, the FIN in Step 1 is sent with
data. Also, the segments in Steps 2 and 3 are both from the end performing the passive close
and could be combined into one segment.
Undoubtedly, one of the most misunderstood aspects of TCP with regard to network
programming is its TIME_WAIT state. The end that performs the active close goes through
this state. The duration that this endpoint remains in this state is twice the maximum segment
lifetime (MSL), sometimes called 2MSL.
Every implementation of TCP must choose a value for the MSL. The recommended value in
RFC 1122 [Braden 1989] is 2 minutes, although Berkeley-derived implementations have
traditionally used a value of 30 seconds instead. This means the duration of the TIME_WAIT
state is between 1 and 4 minutes. The MSL is the maximum amount of time that any given IP
datagram can live in a network. We know this time is bounded because every datagram
contains an 8-bit hop limit with a maximum value of 255. Although this is a hop limit and not
a true time limit, the assumption is made that a packet with the maximum hop limit of 255
cannot exist in a network for more than MSL seconds.
The way in which a packet gets "lost" in a network is usually the result of routing anomalies.
A router crashes or a link between two routers goes down and it takes the routing protocols
seconds or minutes to stabilize and find an alternate path. During that time period, routing
loops can occur (router A sends packets to router B, and B sends them back to A) and packets
can get caught in these loops. In the meantime, assuming the lost packet is a TCP segment,
the sending TCP times out and retransmits the packet, and the retransmitted packet gets to the
final destination by some alternate path. But sometime later (up to MSL seconds after the lost
packet started on its journey), the routing loop is corrected and the packet that was lost in the
loop is sent to the final destination. This original packet is called a lost duplicate or a
wandering duplicate. TCP must handle these duplicates.
It should be noted that the exchange is really two independent exchanges and it is possible to
close the connection in one direction but not the other. This is known as a half close. The
following example (due to Stevens) demonstrates the use of the half-close.
The problem here is that the sort program on the remote host will not start sorting the data
until it has read all of it. The end of the data is indicated when the local host closes its
sending direction of the connection and the sort program sees the corresponding EOF.
However, the "back" direction of the connection must remain open for the return of the
sorted data.
Stevens suggests that the library call shutdown() be used with sockets programming to
achieve a half close.
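A half close along these lines might look as follows. This is a sketch under stated assumptions: the function name send_then_half_close is hypothetical, fd is assumed to be an already connected stream socket, and the peer is assumed to send its reply and then close its own sending direction.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send all of our data, half-close, then collect the peer's reply.
 * Returns the number of reply bytes stored in reply, or -1 on error. */
ssize_t send_then_half_close(int fd, const char *data, size_t len,
                             char *reply, size_t cap)
{
    size_t sent = 0, got = 0;
    ssize_t n = 0;

    while (sent < len) {
        n = write(fd, data + sent, len - sent);
        if (n < 0)
            return -1;
        sent += (size_t)n;
    }

    /* Half close: our side sends a FIN, so the peer reads EOF,
     * but the descriptor stays open for reading. */
    if (shutdown(fd, SHUT_WR) < 0)
        return -1;

    /* Read the reply until the peer closes its sending direction too. */
    while (got < cap && (n = read(fd, reply + got, cap - got)) > 0)
        got += (size_t)n;

    return n < 0 ? -1 : (ssize_t)got;
}
```

Calling close instead of shutdown at the marked point would also send the FIN, but it would release the descriptor and leave no way to read the data coming back.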
Once the final ACK has been sent on an active close, the port/connection cannot be released
and reused for a period of 2MSL. This is twice the maximum segment lifetime, and the
constraint is imposed in case the final ACK is lost. If the final ACK is lost, the
passive closing host will time out awaiting an ACK in response to its closing FIN and will
resend the FIN. If the resent FIN arrives before the 2MSL time has expired, there is no
problem. After this time, the FIN would not appear to belong to whatever connection might
then exist between the two hosts.
RFC 793 defines MSL (Maximum Segment Lifetime) as 120 seconds, but some
implementations use 30 or 60 seconds. It is, basically, the maximum time for which it is
reasonable to wait for a segment: if a segment does not reach its destination within MSL, it
probably will not get there at all, and it can be assumed that it has been lost.
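There are two reasons for the TIME_WAIT state: to implement TCP's full-duplex connection
termination reliably, and to allow old duplicate segments to expire in the network.
The first reason can be explained by assuming that the final ACK is lost. The server will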
resend its final FIN, so the client must maintain state information, allowing it to resend the
final ACK. If it did not maintain this information, it would respond with an RST (a different
type of TCP segment), which would be interpreted by the server as an error. If TCP is
performing all the work necessary to terminate both directions of data flow cleanly for a
connection (its full-duplex close), then it must correctly handle the loss of any of these four
segments. This example also shows why the end that performs the active close is the end that
remains in the TIME_WAIT state: because that end is the one that might have to retransmit
the final ACK.
To understand the second reason for the TIME_WAIT state, assume we have a TCP
connection between 12.106.32.254 port 1500 and 206.168.112.219 port 21. This connection
is closed and then sometime later, we establish another connection between the same IP
addresses and ports: 12.106.32.254 port 1500 and 206.168.112.219 port 21. This latter
connection is called an incarnation of the previous connection since the IP addresses and
ports are the same. TCP must prevent old duplicates from a connection from reappearing at
some later time and being misinterpreted as belonging to a new incarnation of the same
connection. To do this, TCP will not initiate a new incarnation of a connection that is
currently in the TIME_WAIT state. Since the duration of the TIME_WAIT state is twice the
MSL, this allows MSL seconds for a packet in one direction to be lost, and another MSL
seconds for the reply to be lost. By enforcing this rule, we are guaranteed that when we
successfully establish a TCP connection, all old duplicates from previous incarnations of the
connection have expired in the network.
Port Numbers
SOCKETPAIR:
The socket pair for a TCP connection is the four-tuple that defines the two endpoints of the
connection: the local IP address, local port, foreign IP address, and foreign port. A socket pair
uniquely identifies every TCP connection on a network.
UNIT-II
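An IPv4 socket address structure, commonly called an "Internet socket address structure," is
named sockaddr_in and is defined by including the <netinet/in.h> header. The definition of
the IPv4 socket address structure is shown below. POSIX requires only the sin_family,
sin_port, and sin_addr members; sin_len is found on Berkeley-derived systems.
struct in_addr {
    in_addr_t s_addr;            /* 32-bit IPv4 address, network byte ordered */
};
struct sockaddr_in {
    uint8_t        sin_len;      /* length of structure (16) */
    sa_family_t    sin_family;   /* AF_INET */
    in_port_t      sin_port;     /* 16-bit TCP or UDP port, network byte ordered */
    struct in_addr sin_addr;     /* 32-bit IPv4 address, network byte ordered */
    char           sin_zero[8];  /* unused */
};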
IMP NOTE: The 32-bit IPv4 address can be accessed in two different ways. For example, if
serv is defined as an Internet socket address structure, then serv.sin_addr references the
32-bit IPv4 address as an in_addr structure, while serv.sin_addr.s_addr references the same
32-bit IPv4 address as an in_addr_t (typically an unsigned 32-bit integer). We must be certain
that we are referencing the IPv4 address correctly, especially when it is used as an argument
to a function, because compilers often pass structures differently from integers.
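Filling in an IPv4 socket address structure might look like the sketch below. The helper name make_ipv4_addr is hypothetical and not part of any library; the code sets only the three members that POSIX requires and leaves sin_len alone, since not every system has it.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *sa from a dotted-decimal string and a port in
 * host byte order. Returns 0 on success, -1 if ip is not a valid address. */
int make_ipv4_addr(struct sockaddr_in *sa, const char *ip, unsigned short port)
{
    memset(sa, 0, sizeof(*sa));     /* also clears sin_zero */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);     /* stored in network byte order */

    /* inet_pton wants a pointer to the in_addr structure (sin_addr),
     * not the integer inside it (sin_addr.s_addr). */
    if (inet_pton(AF_INET, ip, &sa->sin_addr) != 1)
        return -1;
    return 0;
}
```

After a successful call, sa->sin_addr is the address as a structure and sa->sin_addr.s_addr is the same 32 bits as an integer in network byte order, which is the distinction the note above draws.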
Socket address structures are used only on a given host: The structure itself is not
communicated between different hosts, although certain fields (e.g., the IP address and port)
are used for communication.
Value-Result Arguments
Three functions, bind, connect, and sendto, pass a socket address structure from the process
to the kernel. One argument to these three functions is the pointer to the socket address
structure and another argument is the integer size of the structure. Since the kernel is passed
both the pointer and the size of what the pointer points to, it knows exactly how much data to
copy from the process into the kernel.
Four functions, accept, recvfrom, getsockname, and getpeername, pass a socket address
structure from the kernel to the process, the reverse direction from the previous scenario. Two
of the arguments to these four functions are the pointer to the socket address structure along
with a pointer to an integer containing the size of the structure.
The size changes from an integer to a pointer to an integer because the size is both a value
when the function is called (it tells the kernel the size of the structure so that the kernel
does not write past the end of the structure when filling it in) and a result when the function
returns (it tells the process how much information the kernel actually stored in the
structure). This type of argument is called a value-result argument.
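Both directions can be seen in one short function. This is a sketch with a hypothetical name, bound_addr_len; it passes a deliberately oversized sockaddr_storage buffer to getsockname so that the value going in and the result coming out differ.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the address length the kernel reports for a freshly bound IPv4
 * socket, or 0 on error. */
socklen_t bound_addr_len(void)
{
    struct sockaddr_in addr;
    struct sockaddr_storage ss;     /* large enough for any address family */
    socklen_t len = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return 0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;

    /* Process to kernel: the size is passed by value. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        /* Kernel to process: len is a value on the way in (room available)
         * and a result on the way out (bytes actually stored). */
        len = sizeof(ss);
        if (getsockname(fd, (struct sockaddr *)&ss, &len) < 0)
            len = 0;
    }
    close(fd);
    return len;
}
```

On return, len is expected to have shrunk from sizeof(struct sockaddr_storage) to sizeof(struct sockaddr_in), because the kernel stored an IPv4 address.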
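A multibyte integer can be stored in memory in two ways: with the low-order byte at the
lowest address (little-endian) or with the high-order byte at the lowest address (big-endian).
We must deal with these byte ordering differences as network programmers because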
networking protocols must specify a network byte order. For example, in a TCP segment,
there is a 16-bit port number and a 32-bit IPv4 address. The sending protocol stack and the
receiving protocol stack must agree on the order in which the bytes of these multibyte fields
will be transmitted. The Internet protocols use big-endian byte ordering for these multibyte
integers.
In theory, an implementation could store the fields in a socket address structure in host byte
order and then convert to and from the network byte order when moving the fields to and
from the protocol headers, saving us from having to worry about this detail. But, both history
and the POSIX specification say that certain fields in the socket address structures must be
maintained in network byte order. Our concern is therefore converting between host byte
order and network byte order. We use the following four functions to convert between these
two byte orders: htons, htonl, ntohs, and ntohl.
In the names of these functions, h stands for host, n stands for network, s stands for short, and
l stands for long. The terms "short" and "long" are historical artifacts from the Digital VAX
implementation of 4.2BSD. We should instead think of s as a 16-bit value (such as a TCP or
UDP port number) and l as a 32-bit value (such as an IPv4 address). Indeed, on the 64-bit
Digital Alpha, a long integer occupies 64 bits, yet the htonl and ntohl functions operate on
32-bit values.
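A short sketch of the conversion functions in use. The helper names host_is_little_endian and port_to_wire are hypothetical; only htons, htonl, ntohs, and ntohl come from the system.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host stores the low-order byte first (little-endian),
 * 0 if it stores the high-order byte first (big-endian). */
int host_is_little_endian(void)
{
    uint16_t value = 0x0102;
    unsigned char bytes[2];
    memcpy(bytes, &value, sizeof(value));
    return bytes[0] == 0x02;
}

/* Copies the network-byte-order form of a 16-bit port into out[0..1]. */
void port_to_wire(uint16_t port, unsigned char out[2])
{
    uint16_t net = htons(port);     /* host to network, 16-bit */
    memcpy(out, &net, sizeof(net));
}
```

Whatever the host's byte order, port_to_wire(0x1234, out) leaves 0x12 in out[0] and 0x34 in out[1], because network byte order is big-endian. On a big-endian host the four conversion functions simply return their argument unchanged.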
NOTE: These functions are used exclusively for data functionality between sockets
(storage).
Two groups of functions operate on multibyte fields without interpreting the data.
The first group of functions (bzero, bcopy, and bcmp), whose names begin with b (for byte),
are from 4.2BSD and are still provided by almost any system that supports the socket
functions. The second group of functions (memset, memcpy, and memcmp), whose names begin
with mem (for memory), are from the ANSI C standard and are provided with any system that
supports an ANSI C library.
In the copy functions, src might represent application space and dest might represent socket
send buffer space (or, when receiving, src might represent socket receive buffer space).
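The correspondence between the two groups can be sketched as follows. The function name copy_and_compare is hypothetical; the code uses only the ANSI C functions and names the 4.2BSD equivalent of each call in a comment.

```c
#include <string.h>

/* Zero a buffer, copy nbytes into it, and compare the result with the
 * source. Returns 0 when the two buffers are equal. */
int copy_and_compare(void *dest, const void *src, size_t nbytes)
{
    memset(dest, 0, nbytes);            /* like bzero(dest, nbytes)      */
    memcpy(dest, src, nbytes);          /* like bcopy(src, dest, nbytes) */
    return memcmp(dest, src, nbytes);   /* like bcmp: 0 means equal      */
}
```

The argument order is the usual trap: memcpy takes the destination first, while bcopy takes the source first. Unlike bcopy, memcpy is also undefined when the two regions overlap; memmove is the ANSI C function to use in that case.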
sock_ntop Function
A basic problem with inet_ntop is that it requires the caller to pass a pointer to a binary
address. This address is normally contained in a socket address structure, requiring the caller
to know the format of the structure and the address family.
To solve this problem, Stevens provides his own function, sock_ntop(), which takes a pointer
to a socket address structure as an argument, looks inside the structure, calls the appropriate
function, and returns the presentation form of the address.
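A function in this spirit could be written as below. This is a sketch and not the book's source: the name my_sock_ntop, the caller-supplied buffer, and the "ip:port" output format are all assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Sketch of a sock_ntop-style helper: formats the address in sa as
 * "ip:port" into buf. Returns buf, or NULL on failure. */
char *my_sock_ntop(const struct sockaddr *sa, char *buf, size_t buflen)
{
    char ip[INET6_ADDRSTRLEN];

    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
        if (inet_ntop(AF_INET, &sin->sin_addr, ip, sizeof(ip)) == NULL)
            return NULL;
        snprintf(buf, buflen, "%s:%d", ip, ntohs(sin->sin_port));
        return buf;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
        if (inet_ntop(AF_INET6, &sin6->sin6_addr, ip, sizeof(ip)) == NULL)
            return NULL;
        snprintf(buf, buflen, "[%s]:%d", ip, ntohs(sin6->sin6_port));
        return buf;
    }
    return NULL;                    /* unsupported address family */
}
```

The caller passes whatever accept or getpeername filled in and never has to test the address family itself, which is the convenience the text describes.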
SOCK_STREAM - connection-oriented
SOCK_DGRAM - connection-less
SOCK_RAW - access to low-level protocols or network interfaces.
Protocol - Accommodates multiple protocols within a family.
Bind:
bind (socket, localaddr, addrlen);
Socket is created without any association to local or destination addresses, so a program uses
bind to establish a local address for it.
Socket - integer descriptor of the socket.
Localaddr - structure that specifies the local address to be bound.
Addrlen - integer length of the address (in bytes).
Listen:
listen (socket, qlength);
Server creates a socket, binds it to a well-known port, and waits for requests. To avoid
rejecting service requests that cannot be handled, a server queue is created using Listen. It
provides a mechanism to create the queue and then listen for incoming connections (passive
mode). Listen only works with sockets using a reliable stream service.
Socket - Integer descriptor.
Qlength - length of the request queue for that socket (historically limited to 5; current
systems accept larger values, with SOMAXCONN defined as the maximum).
Connect:
connect (socket, destaddr, addrlen);
Binds a permanent destination to a socket placing it in a connected state. Sockets using
connection-less service do not have to use connect (specify the address in every datagram),
but may.
Socket - socket descriptor.
Destaddr - sockaddr structure (which also includes the protocol port number) specifying the
destination address.
Addrlen - length of destination address (in bytes).
Accept:
accept (socket, addr, addrlen);
Bind associates a socket with a port, but that socket is not connected to a foreign destination.
When a connection request comes in, Accept returns a new socket descriptor for the completed
connection, while the original socket keeps listening. Accept blocks until a connection
request arrives.
Addr - pointer to the sockaddr structure.
Addrlen - pointer to integer size of address.
Server Side
Server Side (depends on connection type):
Socket
Bind
Listen
Accept
Read (may be repeated)
Write (may be repeated)
Close (go back to Accept)
Shutdown:
shutdown (socket, direction);
The shutdown function applies to full-duplex sockets (such as a connected TCP socket) and is
used to partially close the connection.
Socket - socket descriptor of a connected socket.
Direction - direction in which shutdown is desired
0 (SHUT_RD) = terminate further input.
1 (SHUT_WR) = terminate further output.
2 (SHUT_RDWR) = terminate both input and output (the descriptor must still be released with close).
IMPORTANT NOTES:
File and Socket Descriptors:
A socket is a generalized UNIX file access mechanism that provides an endpoint for
communication. Descriptors (maintained in the descriptor tables) are kept per process by the
operating system to point to internal data structures for files and sockets. Descriptors are
small integer values.
File Descriptor:
Bound to a file when open is called.
Socket Descriptor:
Created using socket, which does not bind it to a destination.
Unconnected - UDP specifies the destination every time it sends a datagram.
Connected - TCP specifies the destination once, when connect is called.
After a socket has been created (using socket), additional system calls are required to specify
the details of its use.
Passive Socket - used by a server to wait for calls.
Active Socket - used by a client to initiate a connection.
I/O Functions:
Open - prepare for input / output.
Close - terminate the use of a device.
Write - transfer data from memory to an output device.
Read - transfer data from an input device to memory.
Lseek - position the head of a disk drive to a specific place on the disk.
Concurrent Servers
These two functions return either the local protocol address associated with a socket
(getsockname) or the foreign protocol address associated with a socket (getpeername).
#include <sys/socket.h>
int getsockname(int sockfd, struct sockaddr *localaddr, socklen_t *addrlen);
int getpeername(int sockfd, struct sockaddr *peeraddr, socklen_t *addrlen);
Both return: 0 if OK, -1 on error
Notice that the final argument for both functions is a value-result argument. That is, both
functions fill in the socket address structure pointed to by localaddr or peeraddr. We
mentioned in our discussion of bind that the term "name" is misleading. These two functions
return the protocol address associated with one of the two ends of a network connection,
which for IPv4 and IPv6 is the combination of an IP address and port number. These
functions have nothing to do with domain names.
After connect successfully returns in a TCP client that does not call bind,
getsockname returns the local IP address and local port number assigned to the
connection by the kernel.
After calling bind with a port number of 0 (telling the kernel to choose the local port
number), getsockname returns the local port number that was assigned.
getsockname can be called to obtain the address family of a socket.
In a TCP server that binds the wildcard IP address, once a connection is established
with a client (accept returns successfully), the server can call getsockname to obtain
the local IP address assigned to the connection. The socket descriptor argument in this
call must be that of the connected socket, and not the listening socket.
When a server is execed by the process that calls accept, the only way the server can
obtain the identity of the client is to call getpeername.
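Used together, the two functions recover the socket pair of a connection. The sketch below assumes a connected IPv4 socket; the function name describe_socket_pair and the output format are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Formats the socket pair of a connected IPv4 socket as
 * "local_ip:port -> peer_ip:port". Returns 0 on success, -1 on error. */
int describe_socket_pair(int fd, char *out, size_t outlen)
{
    struct sockaddr_in local, peer;
    socklen_t llen = sizeof(local), plen = sizeof(peer);
    char lip[INET_ADDRSTRLEN], pip[INET_ADDRSTRLEN];

    /* Both length arguments are value-result arguments. */
    if (getsockname(fd, (struct sockaddr *)&local, &llen) < 0 ||
        getpeername(fd, (struct sockaddr *)&peer, &plen) < 0)
        return -1;              /* getpeername fails if fd is not connected */
    if (local.sin_family != AF_INET || peer.sin_family != AF_INET)
        return -1;              /* this sketch handles IPv4 only */
    if (inet_ntop(AF_INET, &local.sin_addr, lip, sizeof(lip)) == NULL ||
        inet_ntop(AF_INET, &peer.sin_addr, pip, sizeof(pip)) == NULL)
        return -1;

    snprintf(out, outlen, "%s:%d -> %s:%d",
             lip, ntohs(local.sin_port), pip, ntohs(peer.sin_port));
    return 0;
}
```

Called on the client's socket and on the server's connected socket, the function should report the same two endpoints in opposite order. On the server side it must be given the descriptor returned by accept, not the listening socket.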
UNIT-III
Introduction
Our simple example is an echo server that performs the following steps:
1. The client reads a line of text from its standard input and writes the line to the server.
2. The server reads the line from its network input and echoes the line back to the client.
3. The client reads the echoed line and prints it on its standard output.
So, at the server the status is "Passive Open" and the format is:
Server
socket() - SP = (IPs:Ps , IPc:Pc)
bind() - SP = (localhost:33600 , IPc:Pc)
Now the client requests a connection with the server. The function call is
socket(), and the socket pair is:
SP = (IPc:Pc , IPs:Ps)
So, at the client side, the status is "Active Open". The step labeled "SIMULTANEOUS OPEN"
now occurs as the two ends complete the connection. (Strictly, TCP reserves the term
simultaneous open for two active opens that cross; here the client performs the active open
and the server the passive open.)
At Client:
Call is connect() - SP = (localhost:33597, x.y.z.w:33600)
At Server:
Call is accept() - SP = (localhost:33600 , a.b.c.d:33597)
The format is:
Client
socket() - SP = (IPc:Pc , IPs:Ps)
SIMULTANEOUS OPEN
connect() - SP = (localhost:33597, x.y.z.w:33600)
accept() - SP = (localhost:33600 , a.b.c.d:33597)
At this point, the normal startup of the client and server is complete.
Normal Termination
We can follow through the steps involved in the normal termination of our client and server:
1. When we type our EOF character, fgets returns a null pointer and the function str_cli
returns.
2. When str_cli returns to the client main function, the latter terminates by calling exit.
3. Part of process termination is the closing of all open descriptors, so the client socket is
closed by the kernel. This sends a FIN to the server, to which the server TCP responds
with an ACK. This is the first half of the TCP connection termination sequence. At
this point, the server socket is in the CLOSE_WAIT state and the client socket is in
the FIN_WAIT_2 state.
4. When the server TCP receives the FIN, the server child is blocked in a call to
readline, and readline then returns 0. This causes the str_echo function to return to the
server child main.
5. The server child terminates by calling exit.
6. All open descriptors in the server child are closed. The closing of the connected
socket by the child causes the final two segments of the TCP connection termination
to take place: a FIN from the server to the client, and an ACK from the client. At this
point, the connection is completely terminated. The client socket enters the
TIME_WAIT state.
7. Finally, the SIGCHLD signal is sent to the parent when the server child terminates.
This occurs in this example, but we do not catch the signal in our code, and the
default action of the signal is to be ignored. Thus, the child enters the zombie state.
We can verify this with the ps command.
#include <sys/wait.h>
pid_t wait (int *statloc);
pid_t waitpid (pid_t pid, int *statloc, int options);
Both return: process ID if OK, 0 or -1 on error
wait and waitpid both return two values: the return value of the function is the process ID of
the terminated child, and the termination status of the child (an integer) is returned through
the statloc pointer. There are three macros that we can call that examine the termination
status and tell us if the child terminated normally, was killed by a signal, or was just stopped
by job control. Additional macros let us then fetch the exit status of the child, or the value of
the signal that killed the child, or the value of the job-control signal that stopped the child.
We will use the WIFEXITED and WEXITSTATUS macros for this purpose. If there are no
terminated children for the process calling wait, but the process has one or more children that
are still executing, then wait blocks until the first of the existing children terminates.
waitpid gives us more control over which process to wait for and whether or not to block.
First, the pid argument lets us specify the process ID that we want to wait for. A value of -1
says to wait for the first of our children to terminate. (There are other options, dealing with
process group IDs, but we do not need them in this text.) The options argument lets us
specify additional options. The most common option is WNOHANG. This option tells the
kernel not to block if there are no terminated children.
6. We can still type a line of input to the client. Here is what happens at the client
starting from Step 1:
linux % tcpcli01 127.0.0.1      start client
hello                           the first line that we type
hello                           is echoed correctly
                                here we kill the server child on the server host
another line                    we then type a second line to the client
str_cli: server terminated prematurely
When we type "another line," str_cli calls writen and the client TCP sends the data to
the server. This is allowed by TCP because the receipt of the FIN by the client TCP
only indicates that the server process has closed its end of the connection and will not
be sending any more data. The receipt of the FIN does not tell the client TCP that the
server process has terminated (which in this case, it has).
When the server TCP receives the data from the client, it responds with an RST since
the process that had that socket open has terminated. We can verify that the RST was
sent by watching the packets with tcpdump.
7. The client process will not see the RST because it calls readline immediately after the
call to writen and readline returns 0 (EOF) immediately because of the FIN that was
received in Step 2. Our client is not expecting to receive an EOF at this point so it
quits with the error message "server terminated prematurely."
8. When the client terminates, all its open descriptors are closed.
1. When the server host crashes, nothing is sent out on the existing network connections.
That is, we are assuming the host crashes and is not shut down by an operator.
2. We type a line of input to the client, it is written by writen, and is sent by the client
TCP as a data segment. The client then blocks in the call to readline, waiting for the
echoed reply.
3. If we watch the network with tcpdump, we will see the client TCP continually
retransmitting the data segment, trying to receive an ACK from the server. Section
25.11 of TCPv2 shows a typical pattern for TCP retransmissions: Berkeley-derived
implementations retransmit the data segment 12 times, waiting for around 9 minutes
before giving up. When the client TCP finally gives up (assuming the server host has
not been rebooted during this time, or if the server host has not crashed but was
unreachable on the network, assuming the host was still unreachable), an error is
returned to the client process. Since the client is blocked in the call to readline, it
returns an error. Assuming the server host crashed and there were no responses at all
to the client's data segments, the error is ETIMEDOUT. But if some intermediate
router determined that the server host was unreachable and responded with an ICMP
"destination unreachable" message, the error is either EHOSTUNREACH or
ENETUNREACH.
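From the client's point of view, the outcomes above differ only in the errno value left behind when the blocked read finally fails. The sketch below maps those values back to the causes named in the text; the function name explain_read_error and the message strings are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Map the errno from a failed read on a TCP socket to the cause described above.
 * Any other value falls back to the system's own message. */
const char *explain_read_error(int err)
{
    assert(err > 0);                 /* errno values are positive */

    switch (err) {
    case ETIMEDOUT:
        /* the client TCP retransmitted and never received an ACK */
        return "server host did not respond";
    case EHOSTUNREACH:
    case ENETUNREACH:
        /* an intermediate router answered with ICMP destination unreachable */
        return "router reported destination unreachable";
    default:
        return strerror(err);
    }
}
```

A client could call this right after readline returns an error, passing the saved errno, to tell the user whether the server host is silent or simply unreachable.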
UNIT-IV
Introduction
We saw our TCP client handling two inputs at the same time: standard input and a TCP
socket. We encountered a problem when the client was blocked in a call to fgets (on standard
input) and the server process was killed. The server TCP correctly sent a FIN to the client
TCP, but since the client process was blocked reading from standard input, it never saw the
EOF until it read from the socket (possibly much later). What we need is the capability to tell
the kernel that we want to be notified if one or more I/O conditions are ready (i.e., input is
ready to be read, or the descriptor is capable of taking more output). This capability is called
I/O multiplexing and is provided by the select and poll functions. We will also cover a newer
POSIX variation of the former, called pselect.
For an input operation on a socket, the first step normally involves waiting for data to arrive
on the network. When the packet arrives, it is copied into a buffer within the kernel. The
second step is copying this data from the kernel's buffer into our application buffer.
I/O Models
The five I/O models that are available to us under UNIX are:
blocking I/O
nonblocking I/O
I/O multiplexing (select and poll)
signal-driven I/O (SIGIO)
asynchronous I/O (the POSIX aio_ functions)
[Figures omitted: I/O multiplexing model; signal-driven I/O model]
SELECT FUNCTION
Take the following situation: you are a server and you want to listen for incoming
connections as well as keep reading from the connections you already have.
No problem, you say, just an accept() and a couple of recv()s. Not so fast, buster! What
if you're blocking on an accept() call? How are you going to recv() data at the same
time? "Use non-blocking sockets!" No way! You don't want to be a CPU hog. What, then?
select() gives you the power to monitor several sockets at the same time. It'll tell you
which ones are ready for reading, which are ready for writing, and which sockets have raised
exceptions, if you really want to know that.
This being said, in modern times select(), though very portable, is one of the slowest
methods for monitoring sockets. One possible alternative is libevent, or something similar,
that encapsulates all the system-dependent stuff involved with getting socket notifications.
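#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
int select(int numfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
The function monitors "sets" of file descriptors: readfds, writefds, and exceptfds. The
parameter numfds should be set to the value of the highest file descriptor plus one.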
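When select() returns, readfds will be modified to reflect which of the file descriptors
you selected are ready for reading. You can test them with the macro FD_ISSET(),
below.
Before progressing much further, I'll talk about how to manipulate these sets. Each set is of
the type fd_set. The following macros operate on this type:
FD_SET(int fd, fd_set *set);     Add fd to the set.
FD_CLR(int fd, fd_set *set);     Remove fd from the set.
FD_ISSET(int fd, fd_set *set);   Return true if fd is in the set.
FD_ZERO(fd_set *set);            Clear all entries from the set.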
Finally, what is this weirded out struct timeval? Well, sometimes you don't want to wait
forever for someone to send you some data. Maybe every 96 seconds you want to print "Still
Going..." to the terminal even though nothing has happened. This time structure allows you to
specify a timeout period. If the time is exceeded and select() still hasn't found any ready
file descriptors, it'll return so you can continue processing.
struct timeval {
    time_t      tv_sec;   // seconds
    suseconds_t tv_usec;  // microseconds
};
Just set tv_sec to the number of seconds to wait, and set tv_usec to the number of
microseconds to wait. Yes, that's microseconds, not milliseconds. There are 1,000
microseconds in a millisecond, and 1,000 milliseconds in a second. Thus, there are 1,000,000
microseconds in a second. Why is it "usec"? The "u" is supposed to look like the Greek letter
μ (Mu) that we use for "micro". Also, when the function returns, timeout might be updated
to show the time still remaining. This depends on what flavor of Unix you're running.
Yay! We have a microsecond resolution timer! Well, don't count on it. You'll probably have
to wait some part of your standard Unix timeslice no matter how small you set your struct
timeval.
Other things of interest: If you set the fields in your struct timeval to 0, select() will
timeout immediately, effectively polling all the file descriptors in your sets. If you set the
parameter timeout to NULL, it will never timeout, and will wait until the first file descriptor
is ready. Finally, if you don't care about waiting for a certain set, you can just set it to NULL
in the call to select().
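Since poll() and most application code think in milliseconds while select() wants seconds plus microseconds, a small conversion helper is handy. The sketch below is an illustration only; ms_to_timeval, readable_within, and stdin_has_input are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Convert a timeout in milliseconds into the pair of fields select() expects. */
struct timeval ms_to_timeval(long ms)
{
    struct timeval tv;

    assert(ms >= 0);
    tv.tv_sec  = ms / 1000;            /* whole seconds */
    tv.tv_usec = (ms % 1000) * 1000;   /* leftover milliseconds, as microseconds */
    return tv;
}

/* Wait up to `ms` milliseconds for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int readable_within(int fd, long ms)
{
    fd_set readfds;
    struct timeval tv = ms_to_timeval(ms);
    int rv;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    /* the write and exception sets are of no interest here, so pass NULL */
    rv = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rv <= 0)
        return rv;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}

/* Poll standard input: a zero timeout makes select() return at once. */
int stdin_has_input(void)
{
    return readable_within(STDIN_FILENO, 0);
}
```

Because a fresh struct timeval is built on every call, the helper is unaffected by systems that overwrite the timeout with the time remaining.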
The following code snippet waits 2.5 seconds for something to appear on standard input:
/*
** select.c -- a select() demo
*/
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define STDIN 0 // file descriptor for standard input

int main(void)
{
    struct timeval tv;
    fd_set readfds;

    tv.tv_sec = 2;
    tv.tv_usec = 500000;

    FD_ZERO(&readfds);
    FD_SET(STDIN, &readfds);

    // don't care about writefds and exceptfds:
    select(STDIN+1, &readfds, NULL, NULL, &tv);

    if (FD_ISSET(STDIN, &readfds))
        printf("A key was pressed!\n");
    else
        printf("Timed out.\n");

    return 0;
}
If you're on a line buffered terminal, the key you hit should be RETURN or it will time out
anyway.
Now, some of you might think this is a great way to wait for data on a datagram socket—and
you are right: it might be. Some Unices can use select in this manner, and some can't. You
should see what your local man page says on the matter if you want to attempt it.
Some Unices update the time in your struct timeval to reflect the amount of time still
remaining before a timeout. But others do not. Don't rely on that occurring if you want to be
portable. (Use gettimeofday() if you need to track time elapsed. It's a bummer, I know,
but that's the way it is.)
What happens if a socket in the read set closes the connection? Well, in that
case, select() returns with that socket descriptor set as "ready to read". When you actually
do recv() from it, recv() will return 0. That's how you know the client has closed the
connection.
One more note of interest about select(): if you have a socket that is listen()ing, you
can check to see if there is a new connection by putting that socket's file descriptor in
the readfds set.
But, by popular demand, here is an in-depth example. Unfortunately, the difference between
the dirt-simple example, above, and this one here is significant. But have a look, then read the
description that follows it.
This program acts like a simple multi-user chat server. Start it running in one window,
then telnet to it ("telnet hostname 9034") from multiple other windows. When you type
something in one telnet session, it should appear in all the others.
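/*
** selectserver.c -- a cheezy multiperson chat server
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

#define PORT "9034" // port we're listening on

// get sockaddr, IPv4 or IPv6:
void *get_in_addr(struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        return &(((struct sockaddr_in*)sa)->sin_addr);
    }
    return &(((struct sockaddr_in6*)sa)->sin6_addr);
}

int main(void)
{
    fd_set master; // master file descriptor list
    fd_set read_fds; // temp file descriptor list for select()
    int fdmax; // maximum file descriptor number
    int listener; // listening socket descriptor
    int newfd; // newly accept()ed socket descriptor
    struct sockaddr_storage remoteaddr; // client address
    socklen_t addrlen;
    char buf[256]; // buffer for client data
    int nbytes;
    char remoteIP[INET6_ADDRSTRLEN];
    int yes = 1; // for setsockopt() SO_REUSEADDR, below
    int i, j, rv;
    struct addrinfo hints, *ai, *p;

    FD_ZERO(&master); // clear the master and temp sets
    FD_ZERO(&read_fds);

    // get us a socket and bind it
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;
    if ((rv = getaddrinfo(NULL, PORT, &hints, &ai)) != 0) {
        fprintf(stderr, "selectserver: %s\n", gai_strerror(rv));
        exit(1);
    }
    for(p = ai; p != NULL; p = p->ai_next) {
        listener = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (listener < 0) {
            continue;
        }
        // lose the pesky "address already in use" error message
        setsockopt(listener, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int));
        if (bind(listener, p->ai_addr, p->ai_addrlen) < 0) {
            close(listener);
            continue;
        }
        break;
    }
    // if p is NULL here, it means we didn't get bound
    if (p == NULL) {
        fprintf(stderr, "selectserver: failed to bind\n");
        exit(2);
    }
    freeaddrinfo(ai); // all done with this

    // listen
    if (listen(listener, 10) == -1) {
        perror("listen");
        exit(3);
    }

    // add the listener to the master set
    FD_SET(listener, &master);
    // keep track of the biggest file descriptor
    fdmax = listener; // so far, it's this one

    // main loop
    for(;;) {
        read_fds = master; // copy it
        if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
            perror("select");
            exit(4);
        }
        // run through the existing connections looking for data to read
        for(i = 0; i <= fdmax; i++) {
            if (FD_ISSET(i, &read_fds)) { // we got one!!
                if (i == listener) {
                    // handle new connections
                    addrlen = sizeof remoteaddr;
                    newfd = accept(listener,
                        (struct sockaddr *)&remoteaddr,
                        &addrlen);
                    if (newfd == -1) {
                        perror("accept");
                    } else {
                        FD_SET(newfd, &master); // add to master set
                        if (newfd > fdmax) { // keep track of the max
                            fdmax = newfd;
                        }
                        printf("selectserver: new connection from %s on "
                            "socket %d\n",
                            inet_ntop(remoteaddr.ss_family,
                                get_in_addr((struct sockaddr*)&remoteaddr),
                                remoteIP, INET6_ADDRSTRLEN),
                            newfd);
                    }
                } else {
                    // handle data from a client
                    if ((nbytes = recv(i, buf, sizeof buf, 0)) <= 0) {
                        // got error or connection closed by client
                        if (nbytes == 0) {
                            // connection closed
                            printf("selectserver: socket %d hung up\n", i);
                        } else {
                            perror("recv");
                        }
                        close(i); // bye!
                        FD_CLR(i, &master); // remove from master set
                    } else {
                        // we got some data from a client
                        for(j = 0; j <= fdmax; j++) {
                            // send to everyone!
                            if (FD_ISSET(j, &master)) {
                                // except the listener and ourselves
                                if (j != listener && j != i) {
                                    if (send(j, buf, nbytes, 0) == -1) {
                                        perror("send");
                                    }
                                }
                            }
                        }
                    }
                } // END handle data from client
            } // END got new incoming connection
        } // END looping through file descriptors
    } // END for(;;)--and you thought it would never end!
    return 0;
}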
Notice I have two file descriptor sets in the code: master and read_fds. The first, master,
holds all the socket descriptors that are currently connected, as well as the socket descriptor
that is listening for new connections.
The reason I have the master set is that select() actually changes the set you pass into it
to reflect which sockets are ready to read. Since I have to keep track of the connections from
one call of select() to the next, I must store these safely away somewhere. At the last
minute, I copy the master into the read_fds, and then call select().
But doesn't this mean that every time I get a new connection, I have to add it to
the master set? Yup! And every time a connection closes, I have to remove it from
the master set? Yes, it does.
Notice I check to see when the listener socket is ready to read. When it is, it means I have
a new connection pending, and I accept() it and add it to the master set. Similarly, when
a client connection is ready to read, and recv() returns 0, I know the client has closed the
connection, and I must remove it from the master set.
If the client recv() returns a positive value, though, I know some data has been received
(a return of -1 is an error, which the code reports with perror()). So I get
it, and then go through the master list and send that data to all the rest of the connected
clients.
In addition, here is a bonus afterthought: there is another function called poll() which
behaves much the same way select() does, but with a different system for managing the
file descriptor sets.
http://beej.us/guide/bgnet/output/html/singlepage/bgnet.html#select
POLL FUNCTION
poll()
Prototypes
#include <poll.h>
int poll(struct pollfd *ufds, nfds_t nfds, int timeout);
Description
This function is very similar to select() in that they both watch sets of file descriptors for
events, such as incoming data ready to recv(), socket ready to send() data to, out-of-band
data ready to recv(), errors, etc.
The basic idea is that you pass an array of nfds struct pollfds in ufds, along with a
timeout in milliseconds (1000 milliseconds in a second.) The timeout can be negative if you
want to wait forever. If no event happens on any of the socket descriptors by the
timeout, poll() will return.
Each element in the array of struct pollfds represents one socket descriptor, and
contains the following fields:
struct pollfd {
int fd; // the socket descriptor
short events; // bitmap of events we're interested in
short revents; // when poll() returns, bitmap of events that occurred
};
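Before calling poll(), load fd with the socket descriptor (if you set fd to a negative
number, this struct pollfd is ignored and its revents field is set to zero) and then
construct the events field by bitwise-ORing the following macros:
POLLIN      Alert me when data is ready to recv() on this socket.
POLLOUT     Alert me when I can send() data to this socket without blocking.
POLLPRI     Alert me when out-of-band data is ready to recv() on this socket.
Once the poll() call returns, the revents field will be constructed as a bitwise-OR of the
above fields, telling you which descriptors actually have had that event occur. Additionally,
these other fields might be present:
POLLERR     An error has occurred on this socket.
POLLHUP     The remote side of the connection hung up.
POLLNVAL    Something was wrong with the socket descriptor fd.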
Return Value
Returns the number of elements in the ufds array that have had an event occur on them; this
can be zero if the timeout occurred. Also returns -1 on error (and errno will be set accordingly.)
Example
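int s1, s2;
int rv;
char buf1[256], buf2[256];
struct pollfd ufds[2];

s1 = socket(PF_INET, SOCK_STREAM, 0);
s2 = socket(PF_INET, SOCK_STREAM, 0);

// pretend we've connected both to a server at this point
// connect(s1, ...) ...
// connect(s2, ...) ...

// set up the array of file descriptors
ufds[0].fd = s1;
ufds[0].events = POLLIN | POLLPRI; // check for normal or out-of-band data
ufds[1].fd = s2;
ufds[1].events = POLLIN; // check for just normal data

// wait for events on the sockets, 3.5 second timeout
rv = poll(ufds, 2, 3500);

if (rv == -1) {
    perror("poll"); // error occurred in poll()
} else if (rv == 0) {
    printf("Timeout occurred! No data after 3.5 seconds.\n");
} else {
    // check for events on s1:
    if (ufds[0].revents & POLLIN) {
        recv(s1, buf1, sizeof buf1, 0); // receive normal data
    }
    if (ufds[0].revents & POLLPRI) {
        recv(s1, buf1, sizeof buf1, MSG_OOB); // out-of-band data
    }
    // check for events on s2:
    if (ufds[1].revents & POLLIN) {
        recv(s2, buf2, sizeof buf2, 0); // receive normal data
    }
}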
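For the common case of a single descriptor, the poll() pattern shrinks to a few lines. The sketch below is illustrative only; poll_readable is a hypothetical name, and treating POLLERR and POLLNVAL as errors is a design choice made for this example.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms milliseconds (negative = forever) for normal data on fd.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int poll_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rv;

    assert(fd >= 0);          /* poll() silently ignores negative descriptors */
    pfd.fd = fd;
    pfd.events = POLLIN;      /* only interested in normal data */
    pfd.revents = 0;

    rv = poll(&pfd, 1, timeout_ms);
    if (rv <= 0)
        return rv;            /* 0 = timeout, -1 = error */
    if (pfd.revents & (POLLERR | POLLNVAL))
        return -1;            /* reported in revents even though not requested */
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

Unlike select(), nothing has to be rebuilt between calls: events is left untouched and only revents is rewritten by the kernel.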
Socket Options
There are various ways to get and set the options that affect a socket:
The getsockopt and setsockopt functions
The fcntl function
The ioctl function
This chapter starts by covering the setsockopt and getsockopt functions, followed by an
example that prints the default value of all the options, and then a detailed description of all
the socket options. We divide the detailed descriptions into the following categories: generic,
IPv4, IPv6, TCP, and SCTP. This detailed coverage can be skipped during a first reading of
this chapter, and the individual sections referred to when needed. A few options are discussed
in detail in a later chapter, such as the IPv4 and IPv6 multicasting options.
setsockopt(), getsockopt()
Prototypes
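#include <sys/types.h>
#include <sys/socket.h>
int getsockopt(int s, int level, int optname, void *optval, socklen_t *optlen);
int setsockopt(int s, int level, int optname, const void *optval, socklen_t optlen);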
Description
Sockets are fairly configurable beasts. In fact, they are so configurable, I'm not even going to
cover it all here. It's probably system-dependent anyway. But I will talk about the basics.
Obviously, these functions get and set certain options on a socket. On a Linux box, all the
socket information is in the man page for socket in section 7. (Type: "man 7 socket" to get
all these goodies.)
As for parameters, s is the socket you're talking about, level should be set to SOL_SOCKET.
Then you set the optname to the name you're interested in. Again, see your man page for all
the options, but here are some of the most fun ones:
SO_BINDTODEVICE Bind this socket to a symbolic device name like eth0 instead of
using bind() to bind it to an IP address. Type the command
ifconfig under Unix to see the device names.
SO_REUSEADDR Allows other sockets to bind() to this port, unless there is an
active listening socket bound to the port already. This enables
you to get around those "Address already in use" error messages
when you try to restart your server after a crash.
SO_BROADCAST Allows UDP datagram (SOCK_DGRAM) sockets to send and
receive packets sent to and from the broadcast address. Does
nothing—NOTHING!!—to TCP stream sockets! Hahaha!
As for the parameter optval, it's usually a pointer to an int indicating the value in question.
For booleans, zero is false, and non-zero is true. And that's an absolute fact, unless it's
different on your system. If there is no parameter to be passed, optval can be NULL.
The final parameter, optlen, is filled out for you by getsockopt() and you have to
specify it for setsockopt(), where it will probably be sizeof(int).
Warning: on some systems (notably Sun and Windows), the option can be a char instead of
an int, and is set to, for example, a character value of '1' instead of an int value of 1.
Again, check your own man pages for more info with "man setsockopt" and "man 7
socket"!
Return Value
Returns zero on success, or -1 on error (and errno will be set accordingly.)
Example
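int optval;
socklen_t optlen;
char *optval2;
// set SO_REUSEADDR on a socket to true (1); s1 is assumed to be an open socket:
optval = 1;
setsockopt(s1, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval);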
The following options can be set with setsockopt():
SO_DEBUG
Provides the ability to turn on recording of debugging information. This option takes an int value in
the optval argument. This is a BOOL option.
SO_BROADCAST
Permits sending of broadcast messages, if this is supported by the protocol. This option takes
an int value in the optval argument. This is a BOOL option.
SO_REUSEADDR
Specifies that the rules used in validating addresses supplied to bind() should allow reuse of local
addresses, if this is supported by the protocol. This option takes an int value in the optval argument.
This is a BOOL option.
SO_KEEPALIVE
Keeps connections active by enabling periodic transmission of messages, if this is supported by the
protocol.
If the connected socket fails to respond to these messages, the connection is broken and processes
writing to that socket are notified with an ENETRESET errno. This option takes an int value in
the optval argument. This is a BOOL option.
SO_LINGER
Specifies whether the socket lingers on close() if data is present. If SO_LINGER is set, the system
blocks the process during close() until it can transmit the data or until the end of the interval
indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified,
and close() is issued, the system handles the call in a way that allows the process to continue as
quickly as possible. This option takes a linger structure in the optval argument.
SO_OOBINLINE
Specifies whether the socket leaves received out-of-band data (data marked urgent) in line. This option
takes an int value in optval argument. This is a BOOL option.
SO_SNDBUF
Sets send buffer size information. This option takes an int value in the optval argument.
SO_RCVBUF
Sets receive buffer size information. This option takes an int value in the optval argument.
SO_DONTROUTE
Specifies whether outgoing messages bypass the standard routing facilities. The destination must be on
a directly-connected network, and messages are directed to the appropriate network interface according
to the destination address. The effect, if any, of this option depends on what protocol is in use. This
option takes an int value in the optval argument. This is a BOOL option.
TCP_NODELAY
Specifies whether the Nagle algorithm used by TCP for send coalescing is to be disabled. This option
takes an int value in the optval argument. This is a BOOL option.
For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the
option is enabled.
The following options can be read with getsockopt():
SO_DEBUG
Reports whether debugging information is being recorded. This option stores an int value in
the optval argument. This is a BOOL option.
SO_ACCEPTCONN
Reports whether socket listening is enabled. This option stores an int value in the optval argument.
This is a BOOL option.
SO_BROADCAST
Reports whether transmission of broadcast messages is supported, if this is supported by the protocol.
This option stores an int value in the optval argument. This is a BOOL option.
SO_REUSEADDR
Reports whether the rules used in validating addresses supplied to bind() should allow reuse of local
addresses, if this is supported by the protocol. This option stores an int value in the optval argument.
This is a BOOL option.
SO_KEEPALIVE
Reports whether connections are kept active with periodic transmission of messages, if this is supported
by the protocol.
If the connected socket fails to respond to these messages, the connection is broken and processes
writing to that socket are notified with an ENETRESET errno. This option stores an int value in
the optval argument. This is a BOOL option.
SO_LINGER
Reports whether the socket lingers on close() if data is present. If SO_LINGER is set, the system
blocks the process during close() until it can transmit the data or until the end of the interval
indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified,
and close() is issued, the system handles the call in a way that allows the process to continue as
quickly as possible. This option stores a linger structure in the optval argument.
SO_OOBINLINE
Reports whether the socket leaves received out-of-band data (data marked urgent) in line. This option
stores an int value in optval argument. This is a BOOL option.
SO_SNDBUF
Reports send buffer size information. This option stores an int value in the optval argument.
SO_RCVBUF
Reports receive buffer size information. This option stores an int value in the optval argument.
SO_ERROR
Reports information about error status and clears it. This option stores an int value in
the optval argument.
SO_TYPE
Reports the socket type. This option stores an int value in the optval argument.
SO_DONTROUTE
Reports whether outgoing messages bypass the standard routing facilities. The destination must be on
a directly-connected network, and messages are directed to the appropriate network interface according
to the destination address. The effect, if any, of this option depends on what protocol is in use. This
option stores an int value in the optval argument. This is a BOOL option.
SO_MAX_MSG_SIZE
Maximum size of a message for message-oriented socket types (for example, SOCK_DGRAM). Has no
meaning for stream-oriented sockets. This option stores an int value in the optval argument.
TCP_NODELAY
Specifies whether the Nagle algorithm used by TCP for send coalescing is disabled. This option stores
an int value in the optval argument. This is a BOOL option.
For boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that the
option is enabled.
http://www.mkssoftware.com/docs/man3/setsockopt.3.asp
http://www.mkssoftware.com/docs/man3/getsockopt.3.asp
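Most of the options above take a plain int, but SO_LINGER takes a struct linger, and SO_TYPE can only be read. The sketch below shows one of each; set_linger and socket_type are hypothetical helper names introduced for illustration.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Make close() on socket s wait up to `seconds` for unsent data to go out.
 * Returns 0 on success, -1 on error. */
int set_linger(int s, int seconds)
{
    struct linger lg;

    assert(seconds >= 0);
    lg.l_onoff  = 1;          /* non-zero: lingering is enabled */
    lg.l_linger = seconds;    /* the interval, in seconds */
    return setsockopt(s, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}

/* Return the socket type (for example SOCK_STREAM or SOCK_DGRAM), or -1 on error. */
int socket_type(int s)
{
    int type;
    socklen_t len = sizeof type;

    if (getsockopt(s, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```

Passing sizeof lg rather than sizeof(int) is the part that is easy to get wrong when switching from the boolean options to SO_LINGER.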
fcntl()
Control socket descriptors
Prototypes
#include <unistd.h>
#include <fcntl.h>
int fcntl(int s, int cmd, ... /* arg */);
Description
This function is typically used to do file locking and other file-oriented stuff, but it also has a
couple socket-related functions that you might see or use from time to time.
Parameter s is the socket descriptor you wish to operate on, cmd should be set to F_SETFL,
and arg can be one of the following commands. (Like I said, there's more to fcntl() than
I'm letting on here, but I'm trying to stay socket-oriented.)
O_NONBLOCK Set the socket to be non-blocking. See the section on blocking for more
details.
O_ASYNC Set the socket to do asynchronous I/O. When data is ready to
be recv()'d on the socket, the signal SIGIO will be raised. This is rare
to see, and beyond the scope of the guide. And I think it's only available
on certain systems.
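Return Value
For the F_SETFL usage described here, returns a value other than -1 on success, or -1 on
error (and errno will be set accordingly.)
Different uses of the fcntl() system call actually have different return values, but I haven't
covered them here because they're not socket-related. See your local fcntl() man page for
more information.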
Example
int s = socket(PF_INET, SOCK_STREAM, 0);
fcntl(s, F_SETFL, O_NONBLOCK); // set to non-blocking
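Calling F_SETFL with a single flag replaces the descriptor's whole set of status flags. A more careful variant, sketched below, reads the current flags with F_GETFL first and ORs the new one in; set_nonblocking is a hypothetical helper name used only for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Add O_NONBLOCK to the descriptor's status flags, keeping the others.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFL, 0);        /* fetch the current status flags */
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

After this call, a read or recv() on the descriptor that would otherwise block returns -1 instead, with errno set to EAGAIN or EWOULDBLOCK.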
UNIT-V
Introduction
There are some fundamental differences between applications written using TCP versus those
that use UDP. These are because of the differences in the two transport layers: UDP is a
connectionless, unreliable, datagram protocol, quite unlike the connection-oriented, reliable
byte stream provided by TCP. Nevertheless, there are instances when it makes sense to use
UDP instead of TCP. Some popular applications are built using UDP: DNS, NFS, and
SNMP, for example.
The below figure shows the function calls for a typical UDP client/server. The client does
not establish a connection with the server. Instead, the client just sends a datagram to the
server using the sendto function, which requires the address of the destination (the server) as
a parameter. Similarly, the server does not accept a connection from a client. Instead, the
server just calls the recvfrom function, which waits until data arrives from some client.
recvfrom returns the protocol address of the client, along with the datagram, so the server can
send a response to the correct client.
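The exchange just described can be sketched in one process over the loopback interface. This is a minimal illustration under assumptions, not the echo client/server of the text: udp_bind_loopback and udp_echo_once are hypothetical names, the server binds to a port chosen by the kernel, and both ends live in the same function purely to keep the example short.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a UDP socket bound to 127.0.0.1 on a kernel-chosen port.
 * The bound address is stored in *addr. Returns the descriptor or -1. */
int udp_bind_loopback(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s == -1)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(0);            /* 0 = let the kernel pick a port */
    if (bind(s, (struct sockaddr *)addr, sizeof *addr) == -1 ||
        getsockname(s, (struct sockaddr *)addr, &len) == -1) {
        close(s);
        return -1;
    }
    return s;
}

/* One echo exchange. Returns the number of bytes echoed back, or -1 on error. */
ssize_t udp_echo_once(const char *msg, char *reply, size_t replylen)
{
    struct sockaddr_in srvaddr, from;
    socklen_t fromlen = sizeof from;
    char buf[512];
    ssize_t n = -1;
    int srv, cli;

    assert(msg != NULL && reply != NULL);
    srv = udp_bind_loopback(&srvaddr);
    cli = socket(AF_INET, SOCK_DGRAM, 0);
    if (srv == -1 || cli == -1)
        goto done;

    /* client: no connect(); the destination is named on the send itself */
    if (sendto(cli, msg, strlen(msg), 0,
               (struct sockaddr *)&srvaddr, sizeof srvaddr) == -1)
        goto done;

    /* server: recvfrom() hands back the datagram and the sender's address */
    n = recvfrom(srv, buf, sizeof buf, 0, (struct sockaddr *)&from, &fromlen);
    if (n == -1)
        goto done;

    /* server: reply to exactly the address recvfrom() reported */
    if (sendto(srv, buf, (size_t)n, 0, (struct sockaddr *)&from, fromlen) == -1) {
        n = -1;
        goto done;
    }

    /* client: read the echoed datagram; the sender's address is not needed */
    n = recvfrom(cli, reply, replylen, 0, NULL, NULL);

done:
    if (srv != -1) close(srv);
    if (cli != -1) close(cli);
    return n;
}
```

Note that the client never binds: the kernel assigns it an ephemeral port on the first sendto(), and that is the address the server sees in from.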
The figure also shows a timeline of the typical scenario that takes place for a UDP
client/server exchange. We can compare this to the typical TCP exchange. We will also
describe the new functions that we use with UDP sockets, recvfrom and sendto, and redo our
echo client/server to use UDP. We will also describe the use of the connect function with a
UDP socket, and the concept of asynchronous errors.
send(), sendto()
Prototypes
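#include <sys/types.h>
#include <sys/socket.h>
ssize_t send(int s, const void *buf, size_t len, int flags);
ssize_t sendto(int s, const void *buf, size_t len, int flags,
               const struct sockaddr *to, socklen_t tolen);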
Description
These functions send data to a socket. Generally speaking, send() is used for
TCP SOCK_STREAM connected sockets, and sendto() is used for
UDP SOCK_DGRAM unconnected datagram sockets. With the unconnected sockets, you must
specify the destination of a packet each time you send one, and that's why the last parameters
of sendto() define where the packet is going.
With both send() and sendto(), the parameter s is the socket, buf is a pointer to the data
you want to send, len is the number of bytes you want to send, and flags allows you to
specify more information about how the data is to be sent. Set flags to zero if you want it to
be "normal" data. Here are some of the commonly used flags, but check your
local send() man pages for more details:
MSG_OOB Send as "out of band" data. TCP supports this, and it's a way to
tell the receiving system that this data has a higher priority than
the normal data. The receiver will receive the
signal SIGURG and it can then receive this data without first
receiving all the rest of the normal data in the queue.
MSG_DONTROUTE Don't send this data over a router, just keep it local.
MSG_DONTWAIT If send() would block because outbound traffic is clogged,
have it return EAGAIN. This is like an "enable non-blocking just
for this send." See the section on blocking for more details.
MSG_NOSIGNAL If you send() to a remote host which is no longer recv()ing,
you'll typically get the signal SIGPIPE. Adding this flag
prevents that signal from being raised.
Return Value
Returns the number of bytes actually sent, or -1 on error (and errno will be set
accordingly.) Note that the number of bytes actually sent might be less than the number you
asked it to send! See the section on handling partial send()s for a helper function to get
around this.
Also, if the socket has been closed by either side, the process calling send() will get the
signal SIGPIPE. (Unless send() was called with the MSG_NOSIGNAL flag.)
Example
int spatula_count = 3490;
char *secret_message = "The Cheese is in The Toaster";
recv(), recvfrom()
Prototypes
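#include <sys/types.h>
#include <sys/socket.h>
ssize_t recv(int s, void *buf, size_t len, int flags);
ssize_t recvfrom(int s, void *buf, size_t len, int flags,
                 struct sockaddr *from, socklen_t *fromlen);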
Description
Once you have a socket up and connected, you can read incoming data from the remote side
using recv() (for TCP SOCK_STREAM sockets) and recvfrom() (for
UDP SOCK_DGRAM sockets).
Both functions take the socket descriptor s, a pointer to the buffer buf, the size (in bytes) of
the buffer len, and a set of flags that control how the functions work.
Additionally, recvfrom() takes a struct sockaddr*, from, that will tell you where
the data came from, and will fill in fromlen with the size of the address it stored. (You
must also initialize fromlen to be the size of from or struct sockaddr.)
So what wondrous flags can you pass into this function? Here are some of them, but you
should check your local man pages for more information and what is actually supported on
your system. You bitwise-or these together, or just set flags to 0 if you want it to be a
regular vanilla recv().
MSG_OOB Receive Out of Band data. This is how to get data that has been
sent to you with the MSG_OOB flag in send(). As the receiving
side, you will have had signal SIGURG raised telling you there is
urgent data. In your handler for that signal, you could
call recv() with this MSG_OOB flag.
MSG_PEEK If you want to call recv() "just for pretend", you can call it
with this flag. This will tell you what's waiting in the buffer for
when you call recv() "for real"
(i.e. without the MSG_PEEK flag). It's like a sneak preview into
the next recv() call.
MSG_WAITALL Tell recv() to not return until all the data you specified in
the len parameter has arrived. It will ignore your wishes in extreme
circumstances, however, like if a signal interrupts the call or if
some error occurs or if the remote side closes the connection,
etc. Don't be mad with it.
When you call recv(), it will block until there is some data to read. If you want to not
block, set the socket to non-blocking or check with select() or poll() to see if there is
incoming data before calling recv() or recvfrom().
Return Value
Returns the number of bytes actually received (which might be less than you requested in
the len parameter), or -1 on error (and errno will be set accordingly.)
If the remote side has closed the connection, recv() will return 0. This is the normal method
for determining if the remote side has closed the connection. Normality is good, rebel!
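The three possible outcomes of recv() (some bytes, 0 for "peer closed", -1 for error) can be handled in one loop. This is a minimal sketch; recv_until_close and its peer_closed flag are names invented here for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read from a connected stream socket until the peer
 * closes (recv() returns 0) or the buffer is full. Returns the number of
 * bytes stored, or -1 on error. *peer_closed is set to 1 when recv()
 * reported end-of-stream. */
ssize_t recv_until_close(int sockfd, char *buf, size_t cap, int *peer_closed)
{
    size_t total = 0;

    assert(buf != NULL && peer_closed != NULL);
    *peer_closed = 0;
    while (total < cap) {
        ssize_t n = recv(sockfd, buf + total, cap - total, 0);
        if (n == 0) {              /* orderly shutdown by the remote side */
            *peer_closed = 1;
            break;
        }
        if (n == -1) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        total += (size_t)n;        /* may be less than was requested */
    }
    return (ssize_t)total;
}
```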
Example
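// stream sockets and recv()
// all right! now that we're connected, we can receive some data!
byte_count = recv(sockfd, buf, sizeof buf, 0);
printf("recv()'d %d bytes of data in buf\n", byte_count);
// datagram sockets and recvfrom()
// assumption: addr is a struct sockaddr_storage and fromlen a socklen_t
fromlen = sizeof addr;
byte_count = recvfrom(sockfd, buf, sizeof buf, 0, (struct sockaddr *)&addr, &fromlen);
printf("recv()'d %d bytes of data in buf\n", byte_count);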
Lost Datagrams
Our UDP client/server example is not reliable. If a client datagram is lost (say it is discarded
by some router between the client and server), the client will block forever in its call to
recvfrom in the function dg_cli, waiting for a server reply that will never arrive. Similarly, if
the client datagram arrives at the server but the server's reply is lost, the client will again
block forever in its call to recvfrom. A typical way to prevent this is to place a timeout on the
client's call to recvfrom.
Just placing a timeout on the recvfrom is not the entire solution. For example, if we do time
out, we cannot tell whether our datagram never made it to the server, or if the server's reply
never made it back. If the client's request was something like "transfer a certain amount of
money from account A to account B" (instead of our simple echo server), it would make a big
difference as to whether the request was lost or the reply was lost.
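One of several ways to place a timeout on recvfrom is the SO_RCVTIMEO socket option. The sketch below assumes that approach; the helper name recvfrom_timeout and the -2 "timed out" return code are conventions invented for this example. Note that it only detects the timeout; as explained above, it cannot tell whether the request or the reply was lost.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: recvfrom() that gives up after ms milliseconds.
 * Returns the datagram length, -2 on timeout, or -1 on any other error.
 * The timeout stays set on the socket after the call returns. */
ssize_t recvfrom_timeout(int sockfd, void *buf, size_t len, int ms,
                         struct sockaddr *from, socklen_t *fromlen)
{
    struct timeval tv;
    ssize_t n;

    assert(buf != NULL && ms > 0);
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == -1)
        return -1;

    n = recvfrom(sockfd, buf, len, 0, from, fromlen);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;                 /* nothing arrived in time */
    return n;
}
```

A client would typically retransmit its request a limited number of times when -2 is returned, which is safe only if the request can be repeated without harm.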
An asynchronous error is not returned on a UDP socket unless the socket has been connected.
Indeed, we are able to call connect for a UDP socket. But this does not result in anything like
a TCP connection: There is no three-way handshake. Instead, the kernel just checks for any
immediate errors (e.g., an obviously unreachable destination), records the IP address and port
number of the peer (from the socket address structure passed to connect), and returns
immediately to the calling process.
Overloading the connect function with this capability for UDP sockets is confusing. If
the convention that sockname is the local protocol address and peername is the foreign
protocol address is used, then a better name would have been setpeername. Similarly, a better
name for the bind function would be setsockname. With this capability, we must now
distinguish between an unconnected UDP socket (the default when a UDP socket is created)
and a connected UDP socket (the result of calling connect on a UDP socket).
With a connected UDP socket, three things change, compared to the default unconnected
UDP socket:
1. We can no longer specify the destination IP address and port for an output operation.
That is, we do not use sendto, but write or send instead. Anything written to a
connected UDP socket is automatically sent to the protocol address (e.g., IP address
and port) specified by connect.
2. We do not need to use recvfrom to learn the sender of a datagram; we use read, recv, or
recvmsg instead. The only datagrams returned by the kernel for an input operation on
a connected UDP socket are those arriving from the protocol address specified in
connect. Datagrams destined to the connected UDP socket's local protocol address
(e.g., IP address and port) but arriving from a protocol address other than the one to
which the socket was connected are not passed to the connected socket. This limits a
connected UDP socket to exchanging datagrams with one and only one peer.
3. Asynchronous errors are returned to the process for connected UDP sockets. The
corollary, as we previously described, is that unconnected UDP sockets do not receive
asynchronous errors.
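The first two points above can be tried out on the loopback interface. This is a sketch under assumptions: the helper names udp_loopback_socket and udp_connect are invented for illustration, and 127.0.0.1 with kernel-chosen ports stands in for a real client and server.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to 127.0.0.1 on an
 * ephemeral port and report the address the kernel chose in *bound. */
int udp_loopback_socket(struct sockaddr_in *bound)
{
    socklen_t len = sizeof *bound;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    assert(bound != NULL);
    if (fd == -1)
        return -1;
    memset(bound, 0, sizeof *bound);
    bound->sin_family = AF_INET;
    bound->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    bound->sin_port = 0;                       /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)bound, sizeof *bound) == -1 ||
        getsockname(fd, (struct sockaddr *)bound, &len) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Record peer as the only address this UDP socket talks to. No packets
 * are exchanged; the kernel just stores the address and port. */
int udp_connect(int fd, const struct sockaddr_in *peer)
{
    return connect(fd, (const struct sockaddr *)peer, sizeof *peer);
}
```

After udp_connect(), plain send() and recv() are enough on that socket, and datagrams arriving from any address other than the recorded peer are not delivered to it.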
With respect to the client:
SLOW: the bit rate is lower.
FAST: the bit rate is higher.
With respect to the server:
SLOW: the receive buffer (window) size is smaller.
FAST: the receive buffer (window) size is larger.
In CASE 2, most of the datagrams are lost. This is the normal situation in UDP
communication.
In CASE 1, the datagrams are retained and delivered to the receiver (as there will be flow
control).
Consider the following example for CASE 2:
The client sent 2,000 datagrams, but the server application received only 30 of these, for a
98% loss rate. There is no indication whatsoever to the server application or to the client
application that these datagrams were lost. As we have said, UDP has no flow control and it is
unreliable. It is trivial, as we have shown, for a UDP sender to overrun the receiver.
If we look at the netstat output, the total number of datagrams received by the server host (not
the server application) is 2,000 (73,208 - 71,208). The counter "dropped due to full socket
buffers" indicates how many datagrams were received by UDP but were discarded because
the receiving socket's receive queue was full (p. 775 of TCPv2). This value is 1,970 (3,491 -
1,521), which, when added to the 30 datagrams counted by the application, equals the 2,000
datagrams received by the host.
The first set of lines shows the counters before this communication, when the client's
datagrams have not yet been received.
This clearly shows the lack of flow control in the UDP service.
In the above figure, the UDP client calls connect() to record the UDP server as its peer. For
the datagrams to move from the UDP client to the UDP server, they must still pass through
intermediate routers such as R1. The routers only forward the datagrams, however: the peer
recorded by connect() remains the UDP server, not R1.
UNIT-VI
Resource Records
Entries in the DNS are known as resource records (RRs). There are only a few types of RRs
that we are interested in.
The resolver code reads its system-dependent configuration files to determine the location of
the organization's name servers. (We use the plural "name servers" because most
organizations run multiple name servers, even though we show only one local server in the
figure. Multiple name servers are absolutely required for reliability and redundancy.) The file
/etc/resolv.conf normally contains the IP addresses of the local name servers.
It might be nice to use the names of the name servers in the /etc/resolv.conf file, since the
names are easier to remember and configure, but this introduces a chicken-and-egg problem
of where to go to do the name-to-address conversion for the server that will do the name and
address conversion! The resolver sends the query to the local name server using UDP. If the
local name server does not know the answer, it will normally query other name servers across
the Internet, also using UDP. If the answers are too large to fit in a UDP packet, the resolver
will automatically switch to TCP.
gethostbyname differs from the other socket functions that we have described in that it does
not set errno when an error occurs. Instead, it sets the global integer h_errno to one of the
following constants defined by including <netdb.h>:
HOST_NOT_FOUND
TRY_AGAIN
NO_RECOVERY
NO_DATA (identical to NO_ADDRESS)
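A minimal sketch of checking h_errno rather than errno after gethostbyname follows. The helper name first_ipv4 and its "return 0 or the h_errno code" convention are assumptions made for this example; it also assumes an IPv4-only lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: look up name and write its first IPv4 address, in
 * dotted-decimal form, into out. Returns 0 on success; on failure returns
 * an h_errno code (HOST_NOT_FOUND, TRY_AGAIN, NO_RECOVERY or NO_DATA). */
int first_ipv4(const char *name, char *out, size_t outlen)
{
    struct hostent *hp;

    assert(name != NULL && out != NULL);
    hp = gethostbyname(name);
    if (hp == NULL)
        return h_errno;            /* errno is not meaningful here */
    if (hp->h_addrtype != AF_INET || hp->h_addr_list[0] == NULL)
        return NO_DATA;
    if (inet_ntop(AF_INET, hp->h_addr_list[0], out, (socklen_t)outlen) == NULL)
        return NO_RECOVERY;        /* out was too small for the text form */
    return 0;
}
```

A caller might print hstrerror() of the returned code, and retry only when the code is TRY_AGAIN.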
gethostbyaddr returns a pointer to the same hostent structure that we described with
gethostbyname. The field of interest in this structure is normally h_name, the canonical
hostname.
The addr argument is not a char*, but is really a pointer to an in_addr structure containing the
IPv4 address. len is the size of this structure: 4 for an IPv4 address. The family argument is
AF_INET. In terms of the DNS, gethostbyaddr queries a name server for a PTR record in the
in-addr.arpa domain.
The service name servname must be specified. If a protocol is also specified (protoname is a
non-null pointer), then the entry must also have a matching protocol. Some Internet services
are provided using either TCP or UDP, while others support only a single protocol (e.g., FTP
requires TCP). If protoname is not specified and the service supports multiple protocols, it is
implementation-dependent as to which port number is returned. Normally this does not
matter, because services that support multiple protocols often use the same TCP and UDP
port number, but this is not guaranteed.
The main field of interest in the servent structure is the port number. Since the port number is
returned in network byte order, we must not call htons when storing this into a socket address
structure.
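The byte-order point is easy to get wrong, so here is a short sketch. The helper name set_service_port is invented for illustration, and whether a given service is found depends on the local services database (typically /etc/services).

```c
#include <assert.h>
#include <stddef.h>
#include <netdb.h>
#include <netinet/in.h>

/* Hypothetical helper: fill in the port of *sa from the services database.
 * Returns 0 on success, -1 if the service/protocol pair is unknown. */
int set_service_port(struct sockaddr_in *sa, const char *servname,
                     const char *protoname)
{
    struct servent *sp;

    assert(sa != NULL && servname != NULL);
    sp = getservbyname(servname, protoname);   /* protoname may be NULL */
    if (sp == NULL)
        return -1;
    /* s_port is already in network byte order: store it without htons() */
    sa->sin_port = (in_port_t)sp->s_port;
    return 0;
}
```

To print the port for a human, convert it the other way with ntohs(sa->sin_port).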
The next function, getservbyport, looks up a service given its port number and an optional
protocol.
UNIT-VII
File Locking
File locking provides a very simple yet incredibly useful mechanism for coordinating file
accesses. Before I begin to lay out the details, let me fill you in on some file locking secrets:
There are two types of locking mechanisms: mandatory and advisory. Mandatory systems
will actually prevent read()s and write()s to a file. Several Unix systems support them.
Nevertheless, I'm going to ignore them throughout this document, preferring instead to talk
solely about advisory locks. With an advisory lock system, processes can still read and write
from a file while it's locked. Useless? Not quite, since there is a way for a process to check
for the existence of a lock before a read or write. See, it's a kind of cooperative locking
system. This is easily sufficient for almost all cases where file locking is necessary.
Since that's out of the way, whenever I refer to a lock from now on in this document, I'm
referring to advisory locks. So there.
Now, let me break down the concept of a lock a little bit more. There are two types of
(advisory!) locks: read locks and write locks (also referred to as shared locks and exclusive
locks, respectively.) The way read locks work is that they don't interfere with other read
locks. For instance, multiple processes can have a file locked for reading at the same time.
However, when a process has a write lock on a file, no other process can activate either a
read or write lock until it is relinquished. One easy way to think of this is that there can be
multiple readers simultaneously, but there can only be one writer at a time.
One last thing before beginning: there are many ways to lock files in Unix systems. System V
likes lockf(), which, personally, I think sucks. Better systems support flock() which offers
better control over the lock, but still lacks in certain ways. For portability and for
completeness, I'll be talking about how to lock files using fcntl(). I encourage you, though, to
use one of the higher-level flock()-style functions if it suits your needs, but I want to portably
demonstrate the full range of power you have at your fingertips. (If your System V Unix
doesn't support the POSIX-y fcntl(), you'll have to reconcile the following information with
your lockf() man page.)
Setting a lock
The fcntl() function does just about everything on the planet, but we'll just use it for file
locking. Setting the lock consists of filling out a struct flock (declared in fcntl.h) that
describes the type of lock needed, open()ing the file with the matching mode, and
calling fcntl() with the proper arguments.
fd = open("filename", O_WRONLY);
What just happened? Let's start with the struct flock since the fields in it are used
to describe the locking action taking place. Here are some field definitions:
l_type This is where you signify the type of lock you want to set. It's
either F_RDLCK, F_WRLCK, or F_UNLCK if you want to set a read lock,
write lock, or clear the lock, respectively.
l_whence This field determines where the l_start field starts from (it's like an
offset for the offset). It can be either SEEK_SET, SEEK_CUR,
or SEEK_END, for beginning of file, current file position, or end of file.
l_start This is the starting offset in bytes of the lock, relative to l_whence.
l_len This is the length of the lock region in bytes (which starts
from l_start, which is relative to l_whence). A length of 0 means "to end of file".
l_pid The process ID of the process dealing with the lock. Use getpid() to get
this.
The next step is to open() the file, since fcntl() needs a file descriptor of the file that's
being locked. Note that when you open the file, you need to open it in the same mode as you
have specified in the lock, as shown in the table, below. If you open the file in the wrong
mode for a given lock type, fcntl() will return -1 and errno will be set to EBADF.
l_type mode
F_RDLCK O_RDONLY or O_RDWR
F_WRLCK O_WRONLY or O_RDWR
Finally, the call to fcntl() actually sets, clears, or gets the lock. See, the second argument
(the cmd) to fcntl() tells it what to do with the data passed to it in the struct flock.
The following list summarizes what each fcntl() cmd does:
F_SETLKW This argument tells fcntl() to attempt to obtain the lock requested in
the struct flock structure. If the lock cannot be obtained (since someone
else has it locked already), fcntl() will wait (block) until the lock has
cleared, then will set it itself. This is a very useful command. I use it all the
time.
F_SETLK This function is almost identical to F_SETLKW. The only difference is that this
one will not wait if it cannot obtain a lock. It will return immediately with -1.
This function can be used to clear a lock by setting the l_type field in
the struct flock to F_UNLCK.
F_GETLK If you want to only check to see if there is a lock, but don't want to set one, you
can use this command. It looks through all the file locks until it finds one that
conflicts with the lock you specified in the struct flock. It then copies the
conflicting lock's information into the struct and returns it to you. If it can't
find a conflicting lock, fcntl() returns the struct as you passed it, except it
sets the l_type field to F_UNLCK.
In our above example, we call fcntl() with F_SETLKW as the argument, so it blocks until it
can set the lock, then sets it and continues.
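The three commands can be wrapped in two small functions. This is only a sketch; lock_whole_file and write_lock_holder are hypothetical names, and both assume the lock covers the entire file (l_start and l_len of 0).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: apply type (F_RDLCK, F_WRLCK or F_UNLCK) to the
 * whole file. If wait is non-zero use F_SETLKW (block until granted),
 * otherwise use F_SETLK (fail immediately with -1 on a conflict). */
int lock_whole_file(int fd, short type, int wait)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                  /* 0 means "through end of file" */
    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}

/* Ask, via F_GETLK, whether another process holds a lock that would
 * conflict with a write lock on the whole file. Returns the holder's
 * PID, 0 if there is no conflict, or -1 on error. */
pid_t write_lock_holder(int fd)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}
```

F_GETLK never reports locks held by the calling process itself, so write_lock_holder() returns 0 even while the caller holds the lock.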
Clearing a lock
Whew! After all the locking stuff up there, it's time for something easy: unlocking! Actually,
this is a piece of cake in comparison. I'll just reuse that first example and add the code to
unlock it at the end:
Now, I left the old locking code in there for high contrast, but you can tell that I just changed
the l_type field to F_UNLCK (leaving the others completely unchanged!) and
called fcntl() with F_SETLK as the command. Easy!
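To see that clearing a lock really makes the file available again, another process has to try for the lock. The sketch below does that with fork(); set_lock and other_process_can_lock are hypothetical names chosen for this example, not functions from any library.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set (F_RDLCK / F_WRLCK) or clear (F_UNLCK) a whole-file lock without
 * blocking. Only l_type differs between locking and unlocking. */
int set_lock(int fd, short type)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;        /* l_start = 0, l_len = 0: entire file */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical probe: fork a child that opens path itself and tries a
 * non-blocking write lock. Returns 1 if the child got the lock, 0 if it
 * was refused, -1 on error. The child's lock vanishes when it exits. */
int other_process_can_lock(const char *path)
{
    int status;
    pid_t pid;

    assert(path != NULL);
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                            /* child process */
        int fd = open(path, O_RDWR);
        if (fd == -1)
            _exit(2);
        _exit(set_lock(fd, F_WRLCK) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

While the parent holds a write lock the probe returns 0; after set_lock(fd, F_UNLCK) it returns 1.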
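#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET }; /* whole file */
int fd;
fl.l_pid = getpid();
if (argc > 1)
fl.l_type = F_RDLCK; /* any argument: take a read lock instead */
/* assumption: the file being locked is the program's own source, lockdemo.c */
if ((fd = open("lockdemo.c", O_RDWR)) == -1) { perror("open"); exit(1); }
printf("Press <RETURN> to try to get lock: ");
getchar();
if (fcntl(fd, F_SETLKW, &fl) == -1) { perror("fcntl"); exit(1); }
printf("got lock\n");
printf("Press <RETURN> to release lock: ");
getchar();
fl.l_type = F_UNLCK; /* unlock the same region */
if (fcntl(fd, F_SETLK, &fl) == -1) { perror("fcntl"); exit(1); }
printf("Unlocked.\n");
close(fd);
return 0;
}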
Compile that puppy up and start messing with it in a couple windows. Notice that when
one lockdemo has a read lock, other instances of the program can get their own read locks
with no problem. It's only when a write lock is obtained that other processes can't get a lock
of any kind.
Another thing to notice is that you can't get a write lock if there are any read locks on the
same region of the file. The process waiting to get the write lock will wait until all the read
locks are cleared. One upshot of this is that you can keep piling on read locks (because a read
lock doesn't stop other processes from getting read locks) and any processes waiting for a
write lock will sit there and starve. There isn't a rule anywhere that keeps you from adding
more read locks if there is a process waiting for a write lock. You must be careful.
Practically, though, you will probably mostly be using write locks to guarantee exclusive
access to a file for a short amount of time while it's being updated; that is the most common
use of locks.
Shared memory is the fastest form of IPC available. Once the memory is mapped into the
address space of the processes that are sharing the memory region, no kernel involvement
occurs in passing data between the processes. What is normally required, however, is some
form of synchronization between the processes that are storing and fetching information to
and from the shared memory region.
• The server reads from the input file. The file data is read by the kernel into its memory and
then copied from the kernel to the process.
• The server writes this data in a message, using a pipe, FIFO, or message queue. These forms
of IPC normally require the data to be copied from the process to the kernel.
• The client reads the data from the IPC channel, normally requiring the data to be copied
from the kernel to the process.
• Finally, the data is copied from the client's buffer, the second argument to the write
function, to the output file.
A total of four copies of the data are normally required. Additionally, these four copies are
done between the kernel and a process, often an expensive copy (more expensive than
copying data within the kernel, or copying data within a single process).
The problem with these forms of IPC—pipes, FIFOs, and message queues—is that for two
processes to exchange information, the information has to go through the kernel.
Shared memory provides a way around this by letting two or more processes share a region of
memory. The processes must, of course, coordinate or synchronize the use of the shared
memory among themselves. (Sharing a common piece of memory is similar to sharing a disk
file, such as the sequence number file used in all the file locking examples.)
The sections that follow will explore the shared memory system calls and discuss how they
were applied to this utility program. The discussion covers the following areas:
Shared memory must be created, or it must be located if another process has already created
it. The program is given an IPC ID to refer to when it has been created or located. Once you
have this IPC ID, it is possible to inquire about the shared memory region attributes and
change some of them, such as the ownership and permissions. Before shared memory can be
read from or written to, it must be attached to the memory space of your current process. This
involves the selection of a starting address for your shared memory region.
When a process is finished with a shared memory region, it is able to detach it from its
memory space. Once all processes have finished with the shared memory region and detached
it, the region can be destroyed to give the memory back to the kernel.
Shared memory is created, or accessed if it already exists, using the shmget(2) function. Its
function synopsis is as follows:
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
int shmget(key_t key, size_t size, int flag);
The argument key is the value of the IPC key to use, or the value IPC_PRIVATE. The size
argument specifies the minimum size of the shared memory region required. The actual size
created will be rounded up to a platform-specific multiple of a virtual memory page size. The
flag option must contain the permission bits if shared memory is being created. Additional
flags that may be used include IPC_CREAT and IPC_EXCL, when shared memory is being
created.
The return value is the IPC ID of the shared memory region when the call is successful (this
includes the value zero). The value -1 is returned if the call fails, with errno set.
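The difference between creating a region and locating an existing one comes down to the flag argument. A minimal sketch follows; shm_create_exclusive and shm_locate are hypothetical wrapper names, and the 0600 permission bits are just an example choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper: create a brand-new region for key, failing with
 * errno set to EEXIST if one already exists. 0600 grants read and write
 * to the owner only. Returns the IPC ID, or -1 on error. */
int shm_create_exclusive(key_t key, size_t size)
{
    assert(size > 0);
    return shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);
}

/* Hypothetical helper: locate a region that some process has already
 * created for key. A size of 0 accepts whatever size it was created with. */
int shm_locate(key_t key)
{
    return shmget(key, 0, 0600);
}
```

Because 0 is a valid IPC ID, callers should compare the result against -1 rather than testing for a non-zero value.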
Attributes of the shared memory, including its permissions and actual size, are obtained using
the shmctl(2) system call. Its function synopsis is as follows:
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
int shmctl(int shmid, int cmd, struct shmid_ds *buf);
The argument shmid specifies the shared memory IPC ID, which is obtained from shmget(2).
The cmd is a shmctl(2) command value, while buf is an argument used with certain
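commands. The valid commands for shmctl(2) are:
IPC_STAT Copy the current attributes of the region into the shmid_ds structure pointed to by buf.
IPC_SET Change attributes such as the ownership and permissions, using the values in buf.
IPC_RMID Mark the region for destruction once the last process has detached from it.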
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
void * shmat(int shmid, void *addr, int flag);
The argument shmid specifies the IPC ID of the shared memory that you want to attach to
your process. The argument addr indicates the address that you want to use for this. A null
pointer for addr specifies that the UNIX kernel should pick the address instead. The flag
argument permits the option flag SHM_RND to be specified. Specify 0 for flag if no options
apply.
When shmat(2) succeeds, a (void *) address is returned that represents the starting address of
the shared memory region. If the function fails, the value (void *)(-1) is returned instead.
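The combination of the addr and the flag option SHM_RND allows three possible ways for
the memory region to be attached:
1. addr is a null pointer: the kernel chooses the address. This is the most portable choice.
2. addr is non-null and SHM_RND is not set: the region is attached at exactly addr.
3. addr is non-null and SHM_RND is set: the region is attached at addr rounded down to a
multiple of SHMLBA.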
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
int shmdt(void *addr);
The shmdt(2) function simply accepts the address of the shared memory, as it was attached
by shmat(2), in argument addr. The return value is 0 when successful. Otherwise, -1 is
returned and errno holds the error code.
Notice that argument three (buf) is not required by the IPC_RMID command for shmctl(2).
This code is exercised by the -r option of the globvar utility.
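The whole life cycle (create, attach, use, detach, destroy) fits in one short function. This is a sketch under assumptions, not the globvar utility's code: shm_round_trip is a hypothetical name, the region is private (IPC_PRIVATE), and waitpid() stands in for the synchronization that real programs need.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: a child process writes msg into a private shared
 * memory region and the parent reads it back into out.
 * Returns 0 on success, -1 on any failure. */
int shm_round_trip(const char *msg, char *out, size_t outlen)
{
    int rc = -1;
    int shmid;
    char *mem;
    pid_t pid;

    assert(msg != NULL && out != NULL && outlen > strlen(msg));
    shmid = shmget(IPC_PRIVATE, outlen, IPC_CREAT | 0600);   /* create */
    if (shmid == -1)
        return -1;
    mem = shmat(shmid, NULL, 0);       /* attach; the kernel picks the address */
    if (mem != (void *)-1) {
        pid = fork();                  /* the child inherits the attachment */
        if (pid == 0) {
            strcpy(mem, msg);          /* plain memory write, no system call */
            _exit(0);
        }
        /* waiting for the child is the only synchronization used here */
        if (pid > 0 && waitpid(pid, NULL, 0) == pid) {
            strncpy(out, mem, outlen - 1);
            out[outlen - 1] = '\0';
            rc = 0;
        }
        if (shmdt(mem) == -1)          /* detach */
            rc = -1;
    }
    if (shmctl(shmid, IPC_RMID, NULL) == -1)   /* destroy; buf is not needed */
        rc = -1;
    return rc;
}
```

Calling shm_round_trip("hello", buf, sizeof buf) should leave "hello" in buf when every step succeeds.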
UNIT – VIII
REMOTE LOGIN
http://book.chinaunix.net/special/ebook/addisonWesley/APUE2/0201433079/ch18lev1sec1.html
Introduction
The handling of terminal I/O is a messy area, regardless of the operating system. The
UNIX System is no exception. The manual page for terminal I/O is usually one of the
longest in most editions of the programmer's manuals.
With the UNIX System, a schism formed in the late 1970s when System III developed a
different set of terminal routines from those of Version 7. The System III style of
terminal I/O continued through System V, and the Version 7 style became the standard
for the BSD-derived systems. As with signals, this difference between the two worlds has
been conquered by POSIX.1. In this chapter, we look at all the POSIX.1 terminal
functions and some of the platform-specific additions.
Part of the complexity of the terminal I/O system occurs because people use terminal
I/O for so many different things: terminals, hardwired lines between computers,
modems, printers, and so on.
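Terminal I/O has two modes: canonical mode, in which input is processed as lines, and
noncanonical mode, in which input characters are not assembled into lines.
If we don't do anything special, canonical mode is the default. For example, if the shell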
redirects standard input to the terminal and we use read and write to copy standard input
to standard output, the terminal is in canonical mode, and each read returns at most one
line. Programs that manipulate the entire screen, such as the vi editor, use noncanonical
mode, since the commands may be single characters and are not terminated by
newlines. Also, the editor doesn't want the system to process the special
characters, since they may overlap with the editor commands. For example, the
Control-D character is often the end-of-file character for the terminal, but it's also
a vi command to scroll down one-half screen.
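The read and write copy mentioned above can be sketched as follows. The function name copy_fd and the 4096-byte buffer are assumptions made for illustration. The loop itself is the same in either mode; when the input descriptor is a terminal in canonical mode, the terminal driver makes each read return at most one line.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from fd `in` to fd `out`.
 * Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in, int out)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    /* read returns 0 at end of file (Control-D on an empty line at a
     * terminal in canonical mode). */
    while ((n = read(in, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        ssize_t off = 0;
        while (off < n) {           /* write may accept fewer bytes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

Calling copy_fd(STDIN_FILENO, STDOUT_FILENO) from a shell with the terminal as standard input echoes each line back only after the newline is typed, which is the canonical-mode behavior described above.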
The Version 7 and older BSD-style terminal drivers supported three modes for terminal
input: (a) cooked mode (the input is collected into lines, and the special characters are
processed), (b) raw mode (the input is not assembled into lines, and there is no
processing of special characters), and (c) cbreak mode (the input is not assembled into
lines, but some of the special characters are processed). Figure 18.20 shows a POSIX.1
function that places a terminal in cbreak or raw mode.
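The figure is not reproduced in these notes. As a rough sketch of the idea only (this is not the function from Figure 18.20), cbreak-style input can be approximated by clearing ECHO and ICANON while leaving signal generation enabled. The names make_cbreak, tty_cbreak_sketch, and tty_restore_sketch are hypothetical.

```c
#include <termios.h>
#include <unistd.h>

/* Edit a termios copy for cbreak-style input: no line assembly, no echo.
 * ISIG is left untouched, so signal-generating characters still work. */
void make_cbreak(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)(ECHO | ICANON);
    t->c_cc[VMIN] = 1;   /* read returns once 1 byte is available */
    t->c_cc[VTIME] = 0;  /* no timer */
}

/* Put fd into cbreak-style mode, saving the old settings in *saved.
 * Returns 0 on success, -1 on error (for example, fd is not a terminal). */
int tty_cbreak_sketch(int fd, struct termios *saved)
{
    struct termios t;

    if (tcgetattr(fd, saved) < 0)
        return -1;
    t = *saved;
    make_cbreak(&t);
    /* TCSAFLUSH: apply after output drains and discard pending input. */
    return tcsetattr(fd, TCSAFLUSH, &t);
}

/* Restore the settings saved by tty_cbreak_sketch. */
int tty_restore_sketch(int fd, const struct termios *saved)
{
    return tcsetattr(fd, TCSAFLUSH, saved);
}
```

Note that tcsetattr can report success when only some of the requested changes were applied, so a more careful version would call tcgetattr again and compare the result.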
POSIX.1 defines 11 special input characters, 9 of which we can change. We've been
using some of these throughout the text: the end-of-file character (usually Control-D)
and the suspend character (usually Control-Z), for example. Section 18.3 describes each
of these characters.
We can think of a terminal device as being controlled by a terminal driver, usually within
the kernel. Each terminal device has an input queue and an output queue, shown
in Figure 18.1.
If echoing is enabled, there is an implied link between the input queue and the
output queue.
The size of the input queue, MAX_INPUT (see Figure 2.11), is finite. When the input
queue for a particular device fills, the system behavior is implementation
dependent. Most UNIX systems echo the bell character when this happens.
There is another input limit, MAX_CANON, that we don't show here. This limit is the
maximum number of bytes in a canonical input line.
Although the size of the output queue is finite, no constants defining that size are
accessible to the program, because when the output queue starts to fill up, the
kernel simply puts the writing process to sleep until room is available.
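The two input limits can be queried at run time with fpathconf. The sketch below assumes a hypothetical helper name, tty_input_limits; the values returned depend on the system, and some systems report the limit as indeterminate.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: fetch MAX_INPUT and MAX_CANON for the terminal
 * open on fd. Returns 0 if both limits were determined, -1 otherwise. */
int tty_input_limits(int fd, long *max_input, long *max_canon)
{
    errno = 0;
    *max_input = fpathconf(fd, _PC_MAX_INPUT);   /* input queue size */
    if (*max_input < 0)
        return -1;   /* error, or limit is indeterminate (errno == 0) */

    errno = 0;
    *max_canon = fpathconf(fd, _PC_MAX_CANON);   /* longest canonical line */
    if (*max_canon < 0)
        return -1;
    return 0;
}
```

A typical call is tty_input_limits(STDIN_FILENO, &in, &canon) from a program started at a terminal.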
We'll see how the tcflush function allows us to flush either the input queue or
the output queue. Similarly, when we describe the tcsetattr function, we'll see how
we can tell the system to change the attributes of a terminal device only after the
output queue is empty. (We want to do this, for example, if we're changing the
output attributes.) We can also tell the system to discard everything in the input
queue when changing the terminal attributes. (We want to do this if we're
changing the input attributes or changing between canonical and noncanonical
modes, so that previously entered characters aren't interpreted in the wrong
mode.)
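The choice between waiting for output to drain and also discarding input corresponds to the TCSADRAIN and TCSAFLUSH actions of tcsetattr. The following sketch uses hypothetical helper names (choose_when, apply_attrs, discard_pending_input) to make that choice explicit.

```c
#include <termios.h>
#include <unistd.h>

/* Pick the tcsetattr action: TCSADRAIN waits for pending output to be
 * transmitted; TCSAFLUSH does the same and also discards unread input. */
int choose_when(int changes_input)
{
    return changes_input ? TCSAFLUSH : TCSADRAIN;
}

/* Apply new attributes. Pass changes_input != 0 when input flags or the
 * canonical/noncanonical mode change, so stale input is not misread. */
int apply_attrs(int fd, const struct termios *t, int changes_input)
{
    return tcsetattr(fd, choose_when(changes_input), t);
}

/* Throw away data received but not yet read, without changing attributes.
 * TCOFLUSH would discard unsent output; TCIOFLUSH discards both. */
int discard_pending_input(int fd)
{
    return tcflush(fd, TCIFLUSH);
}
```

Both tcsetattr and tcflush return -1 when fd does not refer to a terminal.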
Most UNIX systems implement all the canonical processing in a module called
the terminal line discipline. We can think of this module as a box that sits between the
kernel's generic read and write functions and the actual device driver (see Figure 18.2).
All the terminal device characteristics that we can examine and change are contained in
a termios structure. This structure is defined in the header <termios.h>, which we use
throughout this chapter:
struct termios {
    tcflag_t c_iflag;    /* input flags */
    tcflag_t c_oflag;    /* output flags */
    tcflag_t c_cflag;    /* control flags */
    tcflag_t c_lflag;    /* local flags */
    cc_t     c_cc[NCCS]; /* control characters */
};
Roughly speaking, the input flags control the input of characters by the terminal device
driver (strip eighth bit on input, enable input parity checking, etc.), the output flags
control the driver output (perform output processing, map newline to CR/LF, etc.), the
control flags affect the RS-232 serial lines (ignore modem status lines, one or two stop
bits per character, etc.), and the local flags affect the interface between the driver and
the user (echo on or off, visually erase characters, enable terminal-generated signals,
job control stop signal for background output, etc.).
The type tcflag_t is big enough to hold each of the flag values and is often defined as
an unsigned int or an unsigned long. The c_cc array contains all the special characters that we
can change. NCCS is the number of elements in this array and is typically between 15 and
20 (since most implementations of the UNIX System support more than the 11 POSIX-
defined special characters). The cc_t type is large enough to hold each special character
and is typically an unsigned char.
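To make the structure concrete, the sketch below reads a few local flags and two special characters out of a termios structure. The struct tty_summary type and the function names are assumptions introduced for this example; the flag and index names (ICANON, ECHO, ISIG, VEOF, VSUSP) come from <termios.h>.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical summary type for a few termios settings. */
struct tty_summary {
    int  canonical;  /* ICANON set: input is assembled into lines */
    int  echo;       /* ECHO set: typed characters are echoed */
    int  signals;    /* ISIG set: terminal-generated signals enabled */
    cc_t eof_char;   /* end-of-file character, usually Control-D */
    cc_t susp_char;  /* suspend character, usually Control-Z */
};

/* Extract the summary from an already filled termios structure. */
void summarize_termios(const struct termios *t, struct tty_summary *s)
{
    s->canonical = (t->c_lflag & ICANON) != 0;
    s->echo      = (t->c_lflag & ECHO) != 0;
    s->signals   = (t->c_lflag & ISIG) != 0;
    s->eof_char  = t->c_cc[VEOF];
    s->susp_char = t->c_cc[VSUSP];
}

/* Fetch the attributes of fd and summarize them.
 * Returns 0 on success, -1 if fd is not a terminal or tcgetattr fails. */
int summarize_fd(int fd, struct tty_summary *s)
{
    struct termios t;

    if (!isatty(fd) || tcgetattr(fd, &t) < 0)
        return -1;
    summarize_termios(&t, s);
    return 0;
}
```

On a terminal with default settings, summarize_fd(STDIN_FILENO, &s) would normally report canonical, echo, and signals all enabled.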
Overview
Pseudo Terminal
The term pseudo terminal implies that it looks like a terminal to an application program,
but it's not a real terminal. The diagram shows the typical arrangement of the processes
involved when a pseudo terminal is being used. The key points in this figure are the
following.
Normally, a process opens the pseudo-terminal master and then calls fork. The child
establishes a new session, opens the corresponding pseudo-terminal slave, duplicates
the file descriptor to the standard input, standard output, and standard error, and then
calls exec. The pseudo-terminal slave becomes the controlling terminal for the child
process.
It appears to the user process above the slave that its standard input, standard output,
and standard error are a terminal device. The process can issue all the terminal I/O
functions on these descriptors. But since there is not a real terminal device beneath the
slave, functions that don't make sense (change the baud rate, send a break character,
set odd parity, etc.) are just ignored.
Anything written to the master appears as input to the slave and vice versa. Indeed, all
the input to the slave comes from the user process above the pseudo-terminal master.
This behaves like a bidirectional pipe, but with the terminal line discipline module
above the slave, we have additional capabilities over a plain pipe.
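The sequence described above (open the master, fork, create a session in the child, open the slave, duplicate it onto the standard descriptors, exec) can be sketched with the POSIX pseudo-terminal functions. This is an illustrative sketch under several assumptions: the helper name run_on_pty is hypothetical, _XOPEN_SOURCE is defined to expose posix_openpt and related functions, and opening the slave after setsid is assumed to make it the controlling terminal, which holds on Linux and System V derived systems (BSD-derived systems may need an additional ioctl).

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv on a new pseudo terminal and collect up
 * to outsz-1 bytes of its output. Returns bytes collected, or -1. */
long run_on_pty(char *const argv[], char *out, size_t outsz)
{
    char slave_name[128];
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    const char *name = NULL;
    if (grantpt(master) == 0 && unlockpt(master) == 0)
        name = ptsname(master);
    if (name == NULL) {
        close(master);
        return -1;
    }
    strncpy(slave_name, name, sizeof slave_name - 1);
    slave_name[sizeof slave_name - 1] = '\0';

    pid_t pid = fork();
    if (pid < 0) {
        close(master);
        return -1;
    }
    if (pid == 0) {                     /* child */
        close(master);
        if (setsid() < 0)               /* new session, no controlling tty */
            _exit(127);
        int slave = open(slave_name, O_RDWR);
        if (slave < 0)
            _exit(127);
        dup2(slave, STDIN_FILENO);
        dup2(slave, STDOUT_FILENO);
        dup2(slave, STDERR_FILENO);
        if (slave > STDERR_FILENO)
            close(slave);
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    /* Parent: whatever the child writes to the slave is read here. */
    size_t total = 0;
    ssize_t n;
    while (total < outsz - 1 &&
           (n = read(master, out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(master);
    waitpid(pid, NULL, 0);
    return (long)total;
}
```

Because the line discipline sits above the slave, output read from the master normally has each newline mapped to a carriage return and newline pair, something a plain pipe would not do.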
http://book.chinaunix.net/special/ebook/addisonWesley/APUE2/0201433079/main.html