CSCI 240 Lecture Notes

C strings are arrays of characters that end with a null character. They can be manipulated using array notation with subscripts or pointer notation. Pointer notation is often preferable since many string functions use pointers. A string can be initialized by assigning it an array of characters ending with '\0' or assigning it a string literal in double quotes, which automatically adds the null character.

Uploaded by

Okiring Silas
CSCI 240 Lecture Notes - Part 10

C strings
In standard C strings are created as arrays of char.
A null character (ASCII 0) in the array signals the effective end of the array of char;
that is, it marks the end of the string. Characters in the array beyond the null are
ignored by the functions designed to work on these strings.
A set of standard string manipulation functions are provided in the interface defined in
<string.h>.
Since a string is an array of char, you can manipulate it with array notation
(subscripts).
However, it is often easier to use the string functions - which mostly
use pointer notation rather than array notation.
We will look at both approaches.
So how do you create a string?
1. Define an array of char.
char s[80];

Note that we hereby reserve 80 bytes of memory for this string. We are not obligated
to use it all, but we better not overfill it or we will be in trouble (because we would
overwrite memory after it).
2. Put some chars in it and end it with a null. For now, let's look at the string as an
array. So we could do this:
s[0] = 'J'; // J as a char
s[1] = 'i';
s[2] = 'm';
s[3] = 0;

Note that because chars and ints are interchangeable in C, we can assign the integer 0
to s[3]. That is the null character.
However, we cannot do this to insert the null char:
s[3] = '0'; // Char '0' is ASCII 48, not ASCII 0

There is one other common way to represent the null character:


s[3] = '\0'; // backslash-zero

This emphasizes the fact that it is a char.


So we have an 80-byte array, and we use only the first 4 bytes.
However, we can now print it using cout:
cout << s; // prints: Jim

If we had forgotten to put in the terminating null, this cout could print something like:
Jim. %x6^^* #8, ,d @ b,,.....
That is, it would print bytes in memory starting at s, interpreted as chars, until it found
(by chance) a null (zero-valued byte).
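To make the role of the terminating null concrete, here is a minimal sketch of a function that measures the effective string the same way the library functions do: it walks the array and stops at the first null. The name effectiveLength is made up for illustration; it is not part of the standard library (the standard equivalent is strlen()).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts chars up to, but not including, the first null.
// Anything stored in the array after that null is never looked at.
std::size_t effectiveLength(const char s[])
{
    assert(s != nullptr);      // precondition: s must point at a real array
    std::size_t n = 0;
    while (s[n] != '\0')       // stop at the terminating null
        n++;
    return n;
}
```

For an 80-byte array holding 'J', 'i', 'm', '\0' followed by other chars, this returns 3. If the null were missing, the loop would run past the intended end of the string, just as the cout example above does.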
There are simpler ways to initialize a string:
1. Based on the fact that a string is an array of char, we can do the following:
char s[80] = {'J', 'i', 'm', '\0'}; // here we must use '\0'

in this case, the rest of the array would contain all 0's.
2. There is a shortcut that is allowed by C:
char s[80] = "Jim";

in this case also, the rest of the array would contain all 0's.
3. Looking ahead, there is a string function that will copy one string value into
another:
char s[80];
...

strcpy(s, "Jim");

in this case, the rest of the array (after the terminating null) would contain whatever
was there before the strcpy().
Since a string is an array of chars, we can look at individual chars in the string using
array notation.
cout << s[2]; // displays: m

Suppose we want a function that will return the Ith character of a string. We could
write it as follows:
char IthChar(char s[], int i)
{
return s[i];
}

And we could call it as:


cout << IthChar(s, 2); // displays: m

Note how we designed the function:
it returns a char
it takes 2 arguments:
  o an array of char (the string)
  o the subscript of the char desired
Note also what would happen if we used:
char ch;
ch = IthChar(s, 10); // or
ch = IthChar(s, j);  // if j has a value > 3

It would return some random byte (char) that happened to be in memory at 10 bytes
(or j bytes) beyond s - at s[10] or s[j]. But the actual effective string is just "Jim". So
we don't know what is in s[10] or s[j]. And so we don't know what it would be.

We could find out, of course, if we cout-ed the character returned, but it would be
meaningless. It was never assigned a value, so its value is random.
One way to fix the function so we couldn't make this mistake is to re-design the
function to return a code (0 = fail, 1 = ok, for example), and to pass back the char (if
found). We need to know if the subscript i is beyond the end of the string (the null).
int IthChar(char s[], int i, char *ch)
{
    int p;
    for (p = 0; p <= i; p++)
        // if null is before ith pos in s, return(fail)
        if (s[p] == '\0')
            return 0;
    // if we're here, we got to ith char
    // without encountering '\0'
    *ch = s[i];
    return 1;
}

This considers s simply as an array of char, recognizes that the 0 terminates the
effective string, and makes use of array notation to do all this.
We could (again, looking ahead) use one of the standard C string functions to help. In
particular, there's a function called strlen() that can help:
int IthChar(char s[], int i, char *ch)
{
    if (i > strlen(s)-1)  // ?? check this carefully
        return 0;         // fail
    *ch = s[i];           // pass back ith char
    return 1;             // success
}

Let's check the if-condition carefully. Suppose s has the string "dog"
subscript: 0 1 2 3 4
string:    d o g 0 ? ? etc.

strlen(s) is 3 so we want this condition to be true if i is 3 or more (and then return 0).
Another way to say it is that we want the condition to be false if i is 0, or 1, or 2, but
not if it's more than that (so that we can go on and pass back one of the chars in the
string).

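So if (i > 2) - for this example - we want the condition to be true so that we will
execute the return(0). Since strlen(s) is 3, we can code strlen(s) - 1 to represent 2.
Alternately (verify this yourself) we could use >= strlen(s). The >= form is also
safer: strlen() returns an unsigned value, so for an empty string strlen(s) - 1 does
not give -1 but wraps around to a huge positive number, and the check would never
reject i.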
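As a sketch of the >= alternative, the version below also rejects negative subscripts and compares like with like by converting i to the unsigned type that strlen() returns. The name IthCharChecked and the negative-subscript check are additions for illustration, not part of the original notes.

```cpp
#include <cassert>
#include <cstring>

// Sketch of the strlen()-based check written with >= instead of "> ... - 1".
// Returns 1 and stores the char through ch on success, 0 on failure.
int IthCharChecked(const char s[], int i, char *ch)
{
    assert(s != nullptr && ch != nullptr);
    if (i < 0 || static_cast<std::size_t>(i) >= std::strlen(s))
        return 0;              // fail: i is not inside the effective string
    *ch = s[i];                // pass back ith char
    return 1;                  // success
}
```

With s holding "dog", subscripts 0, 1 and 2 succeed and 3 or more fail. With an empty string every subscript fails, including 0.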

Another Way to Represent strings as Arguments.


Recall that the unsubscripted name of an array is the address of the array's first
element. It is the same as &array[0].
So the name of a char array is really a pointer to a char (the address of the 1st char in
the array).
So we can refer to a string and manipulate it using an alternate (pointer-based)
notation instead of the subscript notation used so far.
For example, suppose we want to write a function to change a string to all uppercase.
char s[80] = "some stuff";
...
strUpper(s); // the calling statement; pass address of s

void strUpper(char *str)
{
    char *i; // i will point to a char in str
    // Note: the i++ below incr's a ptr var
    // This is legal (but new)
    for (i = str; *i != '\0'; i++)
        *i = toupper(*i);
}

Ok. Now:
str is a ptr to a char (1st char in the passed string = 1st in the char array)
i also is a ptr to a char
in the for loop, i is initialized to point to the same place str points to: store the
address in str into i.
the loop continues as long as "the thing pointed to by i" is not the null
character. That is, as long as we have not hit the end of the string

each time through the loop, "the char pointed to by i" is given to
the toupper() function which returns the upper-case version of the char, and
then that returned char is stored into "the char pointed to by i"
and then i is incremented so that it points to the next char in the array. See
below - Pointer Arithmetic.
Note: if you're wondering if we could have avoided the use of i and just used str, the answer is
yes; the for loop would look like this:
for (; *str != '\0'; str++) // no initialization needed
*str = toupper(*str);

-- but there are some subtle issues here that we won't cover for a while, so you can ignore this
for now.

The exact same result could also be done using the more familiar subscript notation:
void strUpper(char s[])
{
int i;
// used as a subscript here
for (i = 0; s[i] != '\0'; i++)
s[i] = toupper(s[i]);
}

Most beginners prefer the subscript notation. However, most of the standard string
functions use pointer notation, so if you want to use them (or have to use them) you
have to understand the pointer notation. It is best not to mix notations unless you have
a specific good reason.
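For more practice reading pointer notation, here is a small sketch in the same style as the pointer version of strUpper(). It only reads the string, so the parameter is declared const. The function name countChar is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: counts how many times target occurs in str,
// walking the string with a pointer instead of a subscript.
int countChar(const char *str, char target)
{
    assert(str != nullptr);
    int count = 0;
    const char *p;                       // p will point to a char in str
    for (p = str; *p != '\0'; p++)       // stop at the terminating null
        if (*p == target)
            count++;
    return count;
}
```

Calling countChar(s, 's') when s holds "some stuff" gives 2.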
Note: One important difference between strings (arrays of char) and char * variables:
Consider the following:
char s[80] = "jim";
char *cptr = s;

We can picture them like this:


s --> j i m
      ^
      |
     cptr

So both s and cptr point to the 0th char (the j).


However, the s should not be thought of as a "normal" pointer variable - because of
the following:
We cannot change what s points to, because it is connected with the allocation of the
80 bytes of memory. That is, we cannot change the value of s (the address stored in
it).
So s++; is a compile error, as is any attempt to change s. But of course,
you can change the contents of s[0], s[1], etc., or of *s (which would be s[0]).
We can change what cptr points to, because it is just an area in memory that can
contain an address (the address of any character).
So cptr++; is ok. As is any attempt to store the address of a char into cptr.
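The difference matters in practice: a pointer variable can be moved along a string while the array itself stays where it is. The following is a minimal sketch with a made-up function name, skipLeadingSpaces.

```cpp
#include <cassert>

// Hypothetical helper: returns a pointer to the first non-space char in str
// (or to the terminating null if str contains only spaces).
char *skipLeadingSpaces(char *str)
{
    assert(str != nullptr);
    char *cptr = str;          // a pointer variable: we may change where it points
    while (*cptr == ' ')
        cptr++;                // legal: cptr is a variable that holds an address
    return cptr;
}
```

If s is declared as char s[80] and holds two spaces followed by jim, the call skipLeadingSpaces(s) returns the address s + 2. The array s is not moved or changed; only the pointer variable inside the function advances.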

Pointer Arithmetic
Incrementing a pointer variable makes it point at the next memory address for its type.
The actual amount depends on the number of bytes an instance of the particular data
type occupies in memory. In many versions of C:
incrementing a char * adds 1
incrementing an int * adds 4
But the important idea is that the pointer now points to a new address, exactly where
the next occurrence of that data type would be. This is exactly what you want when
you write code that goes through an array - just increment a pointer and you are
pointing at the next element. You don't even have to know how big the data type is -
C++ knows. In fact you could take code that works for Quincy (with 4-byte integers)
which uses a pointer variable that "walks through" an array and recompile it without
change on a system that uses 2-byte or 8-byte integers and the code would still work
fine.
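The sketch below illustrates both points under the assumption of an ordinary int array: a pointer that walks through the array with ++, and a way to see how many bytes one such step covers on the system at hand. Both function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: adds up n ints by walking a pointer through the array.
int sumInts(const int *first, int n)
{
    assert(first != nullptr);
    int total = 0;
    const int *p;
    for (p = first; p < first + n; p++)  // p++ moves to the next int, whatever its size
        total += *p;
    return total;
}

// Hypothetical helper: how many bytes does one p++ step cover for an int *?
// p must point at an element of an int array.
std::size_t intStepInBytes(const int *p)
{
    const char *here = reinterpret_cast<const char *>(p);
    const char *next = reinterpret_cast<const char *>(p + 1);
    return static_cast<std::size_t>(next - here);   // equals sizeof(int)
}
```

The loop in sumInts() never mentions the size of an int, so it compiles unchanged whether ints are 2, 4, or 8 bytes. intStepInBytes() reports that size; it is 4 on systems with 4-byte integers.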
So given a pointer to a memory location, you can add or subtract to/from it and make
it point to a different place in memory.
char str[80] = "beet";

(str) or (str + 0) points to the 'b'


(str + 1) points to 1st 'e'
(str + 2) points to 2nd 'e'
etc.
Notice that these expressions are always of the form (ptr-to-something + int)
So:
*(str + 2) = 'a';

changes "beet" to "beat". It's the same effect as the subscript notation:
str[2] = 'a';

So in general, if ar is the address (the name) of an array and i is an int,
*(ar + i) and ar[i]
are alternate ways to reference the same element.


Recall strUpper()? Now we could re-write it using this notation:
void strUpper(char *str)
{
int i;
for (i = 0; *(str + i) != '\0'; i++)
*(str + i) = toupper(*(str + i));
}

Summary:
Given
char str[80] = "frog";
char *cptr;
int sub;

We have several ways to access and alter the chars in str. Suppose we want to
change str to "aaaa" All of the following will work:
for (sub = 0; sub < strlen(str); sub++)
    str[sub] = 'a';

for (cptr = str; *cptr != '\0'; cptr++)
    *cptr = 'a';

for (sub = 0; sub < strlen(str); sub++)
    *(str + sub) = 'a';

Note: when the C compiler sees subscript notation (as in the first example) it
internally changes it into the *(str + sub) notation. So internally, C is always using
pointer notation.
Now that we have covered pointer variables, be sure that you note the following:

you can change where a pointer points: pointer++ makes pointer point to a different
address in memory

you cannot change the value of an array name. Any attempt to do so will cause a
compile error.

The ANSI C string library functions


C has standard library functions to manipulate strings which are stored as null-terminated arrays of char.
Every vendor who markets a C compiler implements these functions, which are then
callable from any C program.
With C standard string functions you have to
create memory for each string by declaring a char array,
be sure (one way or the other) that the array of chars has a null terminator
be sure that you don't overflow the array.
Here is a summary of the most important functions:
All args called s1, s2, ... are names of arrays of char or char * variables. A "valid string"
is one with a null terminator.
Actually, a char array guarantees that memory is allocated; a pointer-to-a-char can actually point
anywhere in memory, and C++ will regard that place as a char.

char * strcpy(s1, s2) - s1 may or may not have a valid string in it. s2 must be a valid
string. Copies s2 to s1. s2 must not be bigger than s1 or array overflow will result. The
returned value (a pointer to s1) may be useful if the result of the strcpy() is to be used
as an argument to a function call - in other words, if the call to strcpy() is an argument
to a function. (See Version 2 below.)
char * strcat(s1, s2) - both s1 and s2 must be valid strings. Concatenates s2 to the end
of s1. There must be room in s1 for the result. The returned value (a pointer to s1) may
be useful if the result of the strcat() is to be used as an argument to a function call -
in other words, if the call to strcat() is an argument to a function. (See Version 2 below.)
For example, suppose you have
char s1[9], s2[9];
and both have valid strings in them. Suppose you want to concatenate s2 to the end of
s1. You could do a test:
if (strlen(s1) + strlen(s2) < 9)
strcat(s1, s2);
else
cout << "not enough room";
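The same test can be wrapped in a function so that the array size is passed in rather than hard-coded as 9. This is only a sketch; the name appendIfRoom and its parameters are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: appends src to dest only if the result, including
// the terminating null, fits in destSize bytes. Returns true on success.
bool appendIfRoom(char *dest, std::size_t destSize, const char *src)
{
    assert(dest != nullptr && src != nullptr);
    if (std::strlen(dest) + std::strlen(src) < destSize)
    {
        std::strcat(dest, src);
        return true;
    }
    return false;              // not enough room; dest is left unchanged
}
```

With char s1[9], a call such as appendIfRoom(s1, sizeof s1, s2) performs the same check as the if statement above. Note that sizeof gives the array size only when applied to the array itself, not to a pointer to it.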

Suppose you want to build a big string from a couple of little ones. Here are two ways
to do it:
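char s1[10] = "John";
char s2[20] = "Smith"; // big enough to hold the whole result (see Version 2)
char r[20];

Version 1:
strcpy(r, s2);   // r: "Smith"
strcat(r, ", "); // r: "Smith, "
strcat(r, s1);   // r: "Smith, John"

Version 2 uses the return value from strcat() directly, i.e. a pointer
to a char = pointer to its 1st argument:
strcpy(r, strcat(strcat(s2, ", "), s1));
// 3      2      1

1 - the inner strcat() makes s2 = "Smith, "
2 - the outer strcat() adds s1 (John) onto its first argument, the result from 1
("Smith, "), to get "Smith, John"
3 - strcpy() copies all that to r, so r = "Smith, John"
Note that Version 2 builds the result in s2 itself, so s2 must have room for all 12
bytes ("Smith, John" plus the null). That is why s2 is declared with 20 bytes here;
10 would overflow.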

int strlen(s) - returns the length of s. This is the number of chars in s, not including
the null terminator. (Strictly, the return type is the unsigned integer type size_t.)

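int strcmp(s1,s2) - compares two strings alphabetically (strictly, by character code,
so in ASCII all uppercase letters come before all lowercase ones) and returns pos, neg,
or 0 depending on whether s1 is greater than, less than, or equal to s2.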
char * strchr(s, ch) - returns a pointer to the first occurrence of char ch in s or NULL
if ch is not in s.
char * strstr(s1, s2) - returns a pointer to the first occurrence of s2 in s1 or NULL if s2
is not in s1.
Note in the last two, a pointer is returned (not a subscript). If you wanted to save that
position, you would do something like this:
char *p;
char s[80] = "Some junk";
p = strstr(s, "junk"); // p pts to j in junk
cout << p; // prints: junk
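If a subscript is wanted rather than a pointer, it can be computed by subtracting the start of the string from the returned pointer. The sketch below does this for strchr(); the name indexOfChar and the choice of -1 as the "not found" code are assumptions for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns the subscript of the first ch in s, or -1.
int indexOfChar(const char *s, char ch)
{
    assert(s != nullptr);
    const char *p = std::strchr(s, ch);  // pointer to the match, or NULL
    if (p == nullptr)
        return -1;
    return static_cast<int>(p - s);      // pointer difference = subscript
}
```

For the string "Some junk", indexOfChar(s, 'j') is 5, which is the subscript of the j that p pointed to in the strstr() example.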

char * strncpy(s1, s2, max) - like strcpy() but won't copy more than max chars into s1.
If s2 has max or more chars, it won't copy a '\0' and you will need to do it.
char * strncat(s1, s2, max) - like strcat() but it will not copy more than max chars
from s2 to s1. Unlike strncpy(), it always adds a '\0' after the chars it copies, so there
must be room in s1 for its current contents, up to max more chars, and the null.
In the example above, you could do the following to get as much of s2 into s1 as
possible:
if (strlen(s1) + strlen(s2) < 9)
    strcat(s1, s2);
else
{
    strncat(s1, s2, 8-strlen(s1));
    s1[8] = '\0'; // not strictly needed (strncat() adds the null), but harmless
}
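For strncpy(), where the terminating null really can be missing, a common pattern is to copy at most one char fewer than the array holds and then store the null yourself. The following is a sketch of that pattern; copyTruncated is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies as much of src as fits into dest (destSize bytes)
// and always leaves dest null-terminated.
void copyTruncated(char *dest, std::size_t destSize, const char *src)
{
    assert(dest != nullptr && src != nullptr && destSize > 0);
    std::strncpy(dest, src, destSize - 1);  // copies at most destSize-1 chars
    dest[destSize - 1] = '\0';              // strncpy may not have added a null
}
```

With a 5-byte destination, copying "Smith, John" leaves "Smit" in the array. A shorter source such as "Al" is copied whole.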

Two other useful functions: sscanf() and atoi()


Given a string consisting of digits, it is often useful to be able to convert the string to
numeric format and store it in a numeric variable. Suppose you ask the user to enter a
number. You don't want to use cin because it won't detect invalid characters in the
input (recall 2w3). So you read the data as a string, and then write some function (or
use isdigit() in a loop) to validate it.

sscanf() is a C function that reads values out of a string under the control of a
format specifier, so it can convert ints, floats, and doubles.
atoi() is a C and C++ function and is simpler to use. It only works for integers. There
is another similar function for floating-point values (atof(), which returns a double).
char s[80];
int i;
int num;
cout << "Enter a number: ";
cin.getline(s, 80);
for (i = 0; i < strlen(s); i++)
if (!isdigit(s[i]))
{
cout << "oops";
exit(0);
}
// Now you know it's all digits, so..
sscanf(s, "%d", &num); //now num has int value

/*
sscanf() ("string-scanf") is related to the standard C input function scanf(). We have
not studied it, but it is not hard to use. Its arguments are
the name (address) of the char array (the string) from which it is to get the chars
to convert.
a "format specifier" that tells the function what kind of data the chars in the
array are to be converted to. "%d" is used for integers, "%f" for floats, "%lf" for
doubles.
the address of the variable into which the converted value is to be placed.
*/
//Alternately, in place of sscanf(), you can use the simpler function:
num = atoi(s);
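The validate-then-convert steps can also be packaged as one function that reports failure instead of exiting. This is a sketch under two assumptions that go beyond the notes: an empty string is treated as invalid, and the digits are assumed to describe a value small enough to fit in an int. The name parseDigits is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

// Hypothetical helper: returns true and stores the value through num only if
// s is non-empty and consists entirely of digits.
bool parseDigits(const char *s, int *num)
{
    assert(s != nullptr && num != nullptr);
    if (*s == '\0')
        return false;                        // empty input is not a number
    for (const char *p = s; *p != '\0'; p++)
        if (!std::isdigit(static_cast<unsigned char>(*p)))
            return false;                    // e.g. the 'w' in "2w3"
    *num = std::atoi(s);                     // every char is a digit at this point
    return true;
}
```

The caller decides what to do on failure, for example print a message and ask again. The cast to unsigned char is there because isdigit() expects a non-negative character code.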

Command Line Arguments


Most systems that support C and C++ allow the user to specify certain values when
the program is run, so that it is easy to alter certain run-time behaviors. Otherwise,
you would have to alter and recompile the program, or resort to some other
inconvenient way to provide this flexibility.
These values are supplied from the command line, after the program name. An
example might be:
c:/>myprog dog cat frog

Obviously, the user cannot just type anything. The user must know what valid values
are, and the program must be coded to detect and respond to them.
When the program starts running, it can look at the user-supplied command line
values and do one thing or another (a decision) based on their values.
(Note that this is different from I/O redirection, which uses the special < or > symbols
and only allows a file to be specified.)
For example, you may want to control how many numbers per line print in the
program. You might invoke the program as:
c:/>myprog 10

or
c:/>myprog 20

The program can get this value (as a string) and set a program variable (as an integer)
so that it can control how many numbers are displayed per line.
How does a program get these values?
They are passed to the main() program as arguments.
int main(int argc, char *argv[])
{
...
}

These arguments are usually named as shown.


The first argument tells how many separate arguments were passed - it is the count of
arguments.

The second argument represents an array of strings. That is, an array of pointers to
char.
The strings in the array are the values passed: "10" or "dog" "cat" "frog"
Actually, the first (i.e. the 0th) argument is the name and path of the
program: c:\myprog. So argc is always 1 or more because there is always at least the
program name.
You could access and display these arguments with the following code:
for (i = 0; i < argc; i++)
cout << "\n" << argv[i];

You could store the nth string (if argc is at least n+1, so that argv[n] exists) into another string:
char s[80];
strcpy(s, argv[n]);
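Because the user controls how long each argument is, it is worth checking both that argv[n] exists and that it fits before copying. The sketch below shows one way to do that; the function copyArg and its parameter list are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies argv[n] into dest if that argument exists and
// fits (with its null) in destSize bytes. Returns true on success.
bool copyArg(int argc, char *argv[], int n, char *dest, std::size_t destSize)
{
    assert(argv != nullptr && dest != nullptr);
    if (n < 0 || n >= argc)
        return false;                        // there is no nth argument
    if (std::strlen(argv[n]) >= destSize)
        return false;                        // would overflow dest
    std::strcpy(dest, argv[n]);
    return true;
}
```

From main() it could be called as copyArg(argc, argv, 1, s, sizeof s), where s is the 80-byte array declared above.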

Example: get a number from the command line to use to control the number of
numbers per line to print:
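#include <iostream>
#include <cstring>  // strlen(), strcpy()
#include <cctype>   // isdigit()
#include <cstdio>   // sscanf()
#include <cstdlib>  // exit()
using namespace std;

int main(int argc, char *argv[])
{
    int numPerLine;
    char cmdArg[4];
    int i;
    // need an argument, and it must fit in cmdArg (3 chars + null)
    if (argc > 1 && strlen(argv[1]) <= 3)
    {
        strcpy(cmdArg, argv[1]);
        for (i = 0; i < strlen(cmdArg); i++)
            if (!isdigit(cmdArg[i]))
            {
                cout << "bad char in cmd line arg";
                exit(0);
            }
        sscanf(cmdArg, "%d", &numPerLine); // or atoi()
    }
    else
    {
        cout << "missing or too-long cmd line arg";
        exit(0);
    }
    // now use numPerLine as an int in code...
    ...
    return 0;
} // end main()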
