CSCI 240 Lecture Notes
C strings
In standard C, strings are created as arrays of char.
A null character (ASCII 0) in the array signals the effective end of the array of char;
that is, it marks the end of the string. Characters in the array beyond the null are
ignored by the functions designed to work on these strings.
A set of standard string manipulation functions is provided in the interface defined in
<string.h>.
Since a string is an array of char, you can manipulate it with array notation
(subscripts).
However, it is often easier to use the string functions - which mostly
use pointer notation rather than array notation.
We will look at both approaches.
So how do you create a string?
1. Define an array of char.
char s[80];
Note that we hereby reserve 80 bytes of memory for this string. We are not obligated
to use it all, but we had better not overfill it or we will be in trouble (because we would
overwrite memory after it).
2. Put some chars in it and end it with a null. For now, let's look at the string as an
array. So we could do this:
s[0] = 'J'; // J as a char
s[1] = 'i';
s[2] = 'm';
s[3] = 0;
Note that because chars and ints are interchangeable in C, we can assign the integer 0
to s[3]. That is the null character.
However, we cannot do this to insert the null char:
s[3] = '0'; // Char '0' is ASCII 48, not ASCII 0
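We can, however, use the escape sequence for the null character:
s[3] = '\0'; // backslash-zero
If we had forgotten to put in the terminating null, displaying the string with cout << s; could print something like: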
Jim. %x6^^* #8, ,d @ b,,.....
That is, it would print bytes in memory starting at s, interpreted as chars, until it found
(by chance) a null (zero-valued byte).
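As a small check of the points above, the following sketch builds "Jim" one char at a time and asks strlen() how long the resulting string is. The function name visibleLength is made up for this illustration; it is not part of these notes or of any library.

```cpp
#include <cstring>

// Hypothetical helper (name invented for this example).
// Builds "Jim" by hand and reports how many chars the string
// functions treat as part of the string.
std::size_t visibleLength()
{
    char s[80];
    s[0] = 'J';
    s[1] = 'i';
    s[2] = 'm';
    s[3] = '\0';   // terminating null (same as assigning the integer 0)
    s[4] = 'X';    // stored after the null, so string functions ignore it
    return std::strlen(s);   // counts chars up to, not including, the null
}
```

Calling visibleLength() gives 3: the 'X' at s[4] is in the array but is not part of the string.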
There are simpler ways to initialize a string:
1. Based on the fact that a string is an array of char, we can do the following:
char s[80] = {'J', 'i', 'm', '\0'}; // here we use '\0' (or 0), not '0'
In this case, the rest of the array would contain all 0's.
2. There is a shortcut that is allowed by C:
char s[80] = "Jim";
In this case also, the rest of the array would contain all 0's.
3. Looking ahead, there is a string function that will copy one string value into
another:
char s[80];
...
strcpy(s, "Jim");
In this case, the rest of the array (after the terminating null) would contain whatever
was there before the strcpy().
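The difference between the three initializations can be checked with a short sketch. All of the function names here are invented for illustration, and memset() is used only as a stand-in for "whatever was there before".

```cpp
#include <cstring>

// Hypothetical helper: true if every byte from the terminating null
// to the end of the array is zero.
bool restIsZero(const char s[], std::size_t size)
{
    for (std::size_t i = std::strlen(s); i < size; i++)
        if (s[i] != 0)
            return false;
    return true;
}

// Methods 1 and 2: an initializer zero-fills the unused part of the array.
bool initializerZeroFills()
{
    char a[80] = {'J', 'i', 'm', '\0'};
    char b[80] = "Jim";
    return restIsZero(a, sizeof a) && restIsZero(b, sizeof b);
}

// Method 3: strcpy() writes "Jim" and its null, and nothing else.
bool strcpyLeavesOldBytes()
{
    char c[80];
    std::memset(c, 'x', sizeof c);   // pretend the array held old data
    std::strcpy(c, "Jim");
    return c[3] == '\0' && c[4] == 'x';   // byte after the null is untouched
}
```

Both functions return true. Either way the string is "Jim", since the string functions stop at the first null.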
Since a string is an array of chars, we can look at individual chars in the string using
array notation.
cout << s[2]; // displays: m
Suppose we want a function that will return the Ith character of a string. We could
write it as follows:
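char IthChar(char s[], int i)
{
    return s[i];
}
A call with a valid subscript works as expected:
cout << IthChar(s, 2); // displays: m
But consider what these calls would display:
cout << IthChar(s, 10);
// or
cout << IthChar(s, j); // if j has a value > 3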
It would return some random byte (char) that happened to be in memory at 10 bytes
(or j bytes) beyond s - at s[10] or s[j]. But the actual effective string is just "Jim". So
we don't know what is in s[10] or s[j]. And so we don't know what it would be.
We could find out, of course, if we cout-ed the character returned, but it would be
meaningless. It was never assigned a value, so its value is random.
One way to fix the function so we couldn't make this mistake is to re-design the
function to return a code (0 = fail, 1 = ok, for example), and to pass back the char (if
found). We need to know if the subscript i is beyond the end of the string (the null).
int IthChar(char s[], int i, char *ch)
{
    int p;
    for (p = 0; p <= i; p++)
    {
        // if null is at or before ith pos in s, return(fail)
        if (s[p] == '\0')
            return 0;
    }
    // if we're here, we got to ith char
    // without encountering '\0'
    *ch = s[i];
    return 1;
}
This considers s simply as an array of char, recognizes that the 0 terminates the
effective string, and makes use of array notation to do all this.
We could (again, looking ahead) use one of the standard C string functions to help. In
particular, there's a function called strlen() that can help:
int IthChar(char s[], int i, char *ch)
{
    if (i > strlen(s)-1) // ?? check this carefully
        return 0; // fail
    *ch = s[i];
    return 1;
}
Let's check the if-condition carefully. Suppose s has the string "dog"
subscript: 0 1 2 3 4 5
string:    d o g 0 ? ? etc.
strlen(s) is 3 so we want this condition to be true if i is 3 or more (and then return 0).
Another way to say it is that we want the condition to be false if i is 0, or 1, or 2, but
not if it's more than that (so that we can go on and pass back one of the chars in the
string).
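So if (i > 2) - for this example - we want the condition to be true so that we will
execute the return(0). Since strlen(s) is 3, we can code strlen(s) - 1 to represent 2.
Alternately (verify this yourself) we could use >= strlen(s). That form is also safer:
strlen() returns an unsigned value, so for an empty string strlen(s) - 1 wraps around to
a very large number instead of -1, and the > test would never report failure.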
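Here is a sketch of the >= form of the test. The names IthCharSafe and ithOr are invented for this example, and the extra check for a negative subscript is an addition that the version above does not have.

```cpp
#include <cstring>

// Variant of IthChar() using the >= test (name invented for this example).
// Returns 1 and passes back the char through ch, or returns 0 on failure.
int IthCharSafe(const char s[], int i, char *ch)
{
    // A negative i, or an i at or beyond the null, is out of range.
    if (i < 0 || static_cast<std::size_t>(i) >= std::strlen(s))
        return 0;   // fail
    *ch = s[i];
    return 1;
}

// Hypothetical wrapper: returns the ith char, or fallback if i is out of range.
char ithOr(const char s[], int i, char fallback)
{
    char ch;
    return IthCharSafe(s, i, &ch) ? ch : fallback;
}
```

With the string "dog", subscripts 0, 1 and 2 succeed, while 3 (the null), anything larger, and any subscript into an empty string all fail.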
OK. Now:
str is a pointer to a char (1st char in the passed string = 1st in the char array)
i also is a pointer to a char
in the for loop, i is initialized to point to the same place str points to: store the
address in str into i.
the loop continues as long as "the thing pointed to by i" is not the null
character. That is, as long as we have not hit the end of the string
each time through the loop, "the char pointed to by i" is given to
the toupper() function which returns the upper-case version of the char, and
then that returned char is stored into "the char pointed to by i"
and then i is incremented so that it points to the next char in the array. See
below - Pointer Arithmetic.
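The steps above can be sketched as follows. The function name strUpper and its signature are assumptions, chosen to match the subscript version later in these notes; upperDemo is an invented helper that stands in for a main(). The cast to unsigned char is a precaution that toupper() needs for chars with negative values.

```cpp
#include <cctype>
#include <cstring>

// Pointer-notation sketch (assumed name and signature).
void strUpper(char *str)
{
    char *i;   // a pointer to a char, used to walk through the string
    for (i = str; *i != '\0'; i++)
        *i = static_cast<char>(std::toupper(static_cast<unsigned char>(*i)));
}

// Hypothetical check: upper-cases a local string and compares the result.
bool upperDemo()
{
    char s[80] = "Jim, 7";
    strUpper(s);
    return std::strcmp(s, "JIM, 7") == 0;   // non-letters are left alone
}
```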
Note: if you're wondering if we could have avoided the use of i and just used str, the answer is
yes; the for loop would look like this:
for (; *str != '\0'; str++) // no initialization needed
*str = toupper(*str);
-- but there are some subtle issues here that we won't cover for a while, so you can ignore this
for now.
The exact same result could also be done using the more familiar subscript notation:
void strUpper(char s[])
{
    int i; // used as a subscript here
    for (i = 0; s[i] != '\0'; i++)
        s[i] = toupper(s[i]);
}
Most beginners prefer the subscript notation. However, most of the standard string
functions use pointer notation, so if you want to use them (or have to use them) you
have to understand the pointer notation. It is best not to mix notations unless you have
a specific good reason.
Note: One important difference between strings (arrays of char) and char * variables:
Consider the following:
char s[80] = "jim";
char *cptr = s;
Pointer Arithmetic
Incrementing a pointer variable makes it point at the next memory address for its type.
The actual amount depends on the number of bytes an instance of the particular data
type occupies in memory: In many versions of C:
incrementing a char * adds 1
incrementing an int * adds 4
But the important idea is that the pointer now points to a new address, exactly where
the next occurrence of that data type would be. This is exactly what you want when
you write code that goes through an array - just increment a pointer and you are
pointing at the next element. You don't even have to know how big the data type is - C++ knows. In fact you could take code that works for Quincy (with 4-byte integers)
which uses a pointer variable that "walks through" an array and recompile it without
change on a system that uses 2-byte or 8-byte integers and the code would still work
fine.
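To see how far an increment really moves a pointer, we can measure the distance in bytes between a pointer and its incremented copy. This is only a sketch: charStep and intStep are invented names, and the result for int depends on the system.

```cpp
#include <cstddef>

// Bytes between a char pointer and the same pointer after one increment.
std::size_t charStep()
{
    char arr[2] = {'a', 'b'};
    char *p = arr;
    char *next = p;
    next++;                                   // now points at arr[1]
    return static_cast<std::size_t>(next - p);
}

// Same measurement for an int pointer. Both addresses are viewed as
// char * so that the subtraction counts bytes rather than ints.
std::size_t intStep()
{
    int arr[2] = {1, 2};
    int *p = arr;
    int *next = p;
    next++;                                   // now points at arr[1]
    return static_cast<std::size_t>(
        reinterpret_cast<char *>(next) - reinterpret_cast<char *>(p));
}
```

charStep() is always 1, and intStep() is whatever sizeof(int) is on the system where the code is compiled (4 on many systems).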
So given a pointer to a memory location, you can add or subtract to/from it and make
it point to a different place in memory.
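For example, given
char str[80] = "beet";
the statement
*(str + 2) = 'a';
changes "beet" to "beat". It's the same effect as the subscript notation:
str[2] = 'a';
In general, for an array ar and a subscript i,
*(ar + i)
and
ar[i]
refer to the same element.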
Summary:
Given
char str[80] = "frog";
char *cptr;
int sub;
We have several ways to access and alter the chars in str. Suppose we want to
change str to "aaaa". All of the following will work:
for (sub = 0; sub < strlen(str); sub++)
str[sub] = 'a';
for (cptr = str; *cptr != '\0'; cptr++)
*cptr = 'a';
for ( sub = 0; sub < strlen(str); sub++)
*(str + sub) = 'a';
Note: when the C compiler sees subscript notation (as in the first example) it
internally changes it into the *(str + sub) notation. So internally, C is always using
pointer notation.
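One way to convince yourself of this equivalence is to compare addresses. In this sketch (both function names are invented), &str[sub] and str + sub are tested for equality, and a char stored with one notation is read back with the other.

```cpp
// True if subscript notation and pointer notation name the same element.
bool sameElement(char str[], int sub)
{
    return &str[sub] == str + sub;
}

// Hypothetical check that stores with one notation and reads with the other.
bool sameElementDemo()
{
    char str[80] = "frog";
    str[1] = 'a';        // subscript notation
    *(str + 2) = 'a';    // pointer notation
    return sameElement(str, 0) && sameElement(str, 3)
        && *(str + 1) == 'a' && str[2] == 'a';
}
```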
Now that we have covered pointer variables, be sure that you note the following:
you can change where a pointer points: pointer++ makes pointer point to a different
address in memory
you cannot change the value of an array name. Any attempt to do so will cause a
compile error.
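A minimal sketch of the two rules, using the s and cptr declarations shown earlier (the function name secondChar is invented). The line that would not compile is left as a comment.

```cpp
// Moves a pointer variable; the array name itself stays fixed.
char secondChar()
{
    char s[80] = "jim";
    char *cptr = s;   // cptr holds the address of s[0]
    cptr++;           // fine: cptr now points at s[1]
    // s++;           // compile error: an array name cannot be changed
    return *cptr;
}
```

secondChar() returns 'i', the char that cptr points to after the increment.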
char * strcpy(s1, s2) - s1 may or may not have a valid string in it. s2 must be a valid
string. Copies s2 to s1, including the terminating null. s1 must be large enough to hold
s2 and its null, or array overflow will result. Returns s1 (a pointer to char). The
returned value may be useful if the result of the strcpy() is to be used as an argument
to another function.
Version 1 (here s1 holds "John", s2 holds "Smith", and r is a char array large enough for the result):
strcpy(r, s2);   // r: "Smith"
strcat(r, ", "); // r: "Smith, "
strcat(r, s1);   // r: "Smith, John"
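Version 2 uses the return value from strcat() directly, i.e. a pointer
to a char = pointer to 1st argument:
strcpy(r, strcat(strcat(s2, ", "), s1));
// 3        2      1
The calls execute in the order numbered 1, 2, 3: the inner strcat(), then the outer
strcat(), then strcpy(). Note that Version 2 also changes s2, which ends up holding the
whole result, so s2 must have room for it.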
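Both versions can be tried out with the sketch below. The function names are invented, and the names "John" and "Smith" are taken from the comments in Version 1. The second function shows the side effect of nesting the calls: the first argument of strcat() is changed.

```cpp
#include <cstring>

// Version 1 as a function. r must have room for both names,
// the ", " separator and the terminating null.
void lastFirst(char r[], const char s1[], const char s2[])
{
    std::strcpy(r, s2);     // r: last name
    std::strcat(r, ", ");   // r: last name, comma, space
    std::strcat(r, s1);     // r: last name, comma, space, first name
}

bool lastFirstDemo()
{
    char r[80];
    lastFirst(r, "John", "Smith");
    return std::strcmp(r, "Smith, John") == 0;
}

// Version 2: same result in r, but s2 is changed along the way.
bool nestedVersionChangesS2()
{
    char s1[80] = "John";
    char s2[80] = "Smith";
    char r[80];
    std::strcpy(r, std::strcat(std::strcat(s2, ", "), s1));
    return std::strcmp(r, "Smith, John") == 0
        && std::strcmp(s2, "Smith, John") == 0;   // s2 no longer "Smith"
}
```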
size_t strlen(s) - returns the length of s as an unsigned integer (size_t). This is the number of chars in s, not including
the null terminator.
// prints: junk
char * strncpy(s1, s2, max) - like strcpy() but won't copy more than max chars into s1.
If it hits max, it won't copy a '\0' and you will need to do it.
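A common way to handle this is to copy at most one char fewer than the array holds and then store the null yourself. The sketch below does that; copyBounded and copyBoundedDemo are invented names, and the string "junkyard" is just sample data.

```cpp
#include <cstring>

// Copies as much of src as fits into dest (an array of size chars)
// and always leaves dest null-terminated.
void copyBounded(char dest[], const char src[], std::size_t size)
{
    if (size == 0)
        return;                            // no room even for the null
    std::strncpy(dest, src, size - 1);     // may stop without writing '\0'
    dest[size - 1] = '\0';                 // so we add it ourselves
}

bool copyBoundedDemo()
{
    char small[5];                         // room for 4 chars plus the null
    copyBounded(small, "junkyard", sizeof small);
    return std::strcmp(small, "junk") == 0;
}
```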
char * strncat(s1, s2, max) - like strcat() but it will not copy more than max chars
from s2 to s1. Unlike strncpy(), it always adds a terminating '\0' after the chars it
copies, so s1 needs room for strlen(s1) + max + 1 chars.
In the example above, you could do the following to get as much of s2 into s1 as
possible:
if (strlen(s1) + strlen(s2) < 9)
    strcat(s1, s2);
else
{
    strncat(s1, s2, 8-strlen(s1));
    s1[8] = '\0'; // the char '\0', not the string "\0"
}
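The same idea can be written for an array of any size. This is a sketch under the assumption that the caller passes the true size of the destination array; appendBounded and appendBoundedDemo are invented names.

```cpp
#include <cstring>

// Appends as much of src as fits in dest, an array of size chars
// that already holds a valid string.
void appendBounded(char dest[], const char src[], std::size_t size)
{
    std::size_t used = std::strlen(dest);
    if (used + 1 < size)                              // at least one free slot
        std::strncat(dest, src, size - 1 - used);     // strncat adds the '\0'
}

bool appendBoundedDemo()
{
    char s1[9] = "abcde";                  // 5 chars used, room for 3 more
    appendBounded(s1, "fghijk", sizeof s1);
    return std::strcmp(s1, "abcdefgh") == 0;
}
```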
sscanf() comes from the C standard library (it is also available in C++). Its format
specifier gives some control over how the chars in the string are interpreted (for
example as an int, a float, or a double, with an optional maximum field width).
atoi() is a C and C++ function and is simpler to use. It only works for integers. There
is another similar function for floating-point values (atof(), which returns a double).
char s[80];
int i;
int num;
cout << "Enter a number: ";
cin.getline(s, 80);
for (i = 0; i < strlen(s); i++)
    if (!isdigit(s[i]))
    {
        cout << "oops";
        exit(0);
    }
// Now you know it's all digits, so..
sscanf(s, "%d", &num); // now num has int value
/*
sscanf() ("string-scanf") is related to the standard C input function scanf(). We have
not studied it, but it is not hard to use. Its arguments are
the name (address) of the char array (the string) from which it is to get the chars
to convert.
a "format specifier" that tells the function what kind of data the chars in the
array are to be converted to. "%d" is used for integers, "%f" for floats, "%lf" for
doubles.
the address of the variable into which the converted value is to be placed.
*/
//Alternately, in place of sscanf(), you can use the simpler function:
num = atoi(s);
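The check-then-convert steps can be packaged as a function that reports failure instead of exiting. This is a sketch: parseDigits and parseOr are invented names, and rejecting an empty string is an extra check the code above does not make. Like the code above, it does not guard against numbers too large for an int.

```cpp
#include <cctype>
#include <cstdlib>

// Returns 1 and passes back the value through num if s is a non-empty
// string of digits; returns 0 otherwise.
int parseDigits(const char s[], int *num)
{
    if (s[0] == '\0')
        return 0;                          // nothing to convert
    for (int i = 0; s[i] != '\0'; i++)
        if (!std::isdigit(static_cast<unsigned char>(s[i])))
            return 0;                      // found a non-digit
    *num = std::atoi(s);                   // all digits, safe to convert
    return 1;
}

// Hypothetical wrapper: the converted value, or fallback on bad input.
int parseOr(const char s[], int fallback)
{
    int num;
    return parseDigits(s, &num) ? num : fallback;
}
```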
you would have to alter and recompile the program, or resort to some other
inconvenient way to provide this flexibility.
These values are supplied from the command line, after the program name. An
example might be:
c:\>myprog dog cat frog
Obviously, the user cannot just type anything. The user must know what valid values
are, and the program must be coded to detect and respond to them.
When the program starts running, it can look at the user-supplied command line
values and do one thing or another (a decision) based on their values.
(Note that this is different from I/O redirection, which uses the special < or > symbols
and only allows a file to be specified.)
For example, you may want to control how many numbers per line print in the
program. You might invoke the program as:
c:\>myprog 10
or
c:\>myprog 20
The program can get this value (as a string) and set a program variable (as an integer)
so that it can control how many numbers are displayed per line.
How does a program get these values?
They are passed to the main() program as arguments.
int main(int argc, char *argv[])
{
...
}
The first argument, argc, is the number of strings passed. The second argument, argv,
represents an array of strings. That is, an array of pointers to char.
The strings in the array are the values passed: "10" or "dog" "cat" "frog"
Actually, the first (i.e. the 0th) string in the array is the name and path of the
program: c:\myprog. So argc is always 1 or more because there is always at least the
program name.
You could access and display these arguments with the following code:
for (i = 0; i < argc; i++)
cout << "\n" << argv[i];
You could store the nth string (if argc is at least n+1, so that argv[n] exists) into another string:
char s[80];
strcpy(s, argv[n]);
Example: get a number from the command line and use it to control how many
numbers print per line:
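#include <iostream>
#include <cstring>   // strlen(), strcpy()
#include <cctype>    // isdigit()
#include <cstdio>    // sscanf()
#include <cstdlib>   // exit()
using namespace std;

int main(int argc, char *argv[])
{
    int numPerLine;
    char cmdArg[4];   // up to 3 digits plus the terminating null
    int i;
    if (argc > 1 && strlen(argv[1]) <= 3)
    {
        strcpy(cmdArg, argv[1]);
        for (i = 0; i < strlen(cmdArg); i++)
            if (!isdigit(cmdArg[i]))
            {
                cout << "bad char in cmd line arg";
                exit(0);
            }
        sscanf(cmdArg, "%d", &numPerLine); // or atoi()
    }
    else
    {
        cout << "missing or too-long cmd line arg";
        exit(0);
    }
    // now use numPerLine as an int in code...
    ...
    return 0;
} // end main()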
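The argument handling can also be moved out of main() into a function, which makes it easy to try without typing commands. This is a sketch, not the program above: numPerLineFromArgs and argsDemo are invented names, and returning a fallback value instead of calling exit() is a design choice made for the example.

```cpp
#include <cctype>
#include <cstdio>
#include <cstring>

// Returns the number given in argv[1] (1 to 3 digits), or fallback
// if the argument is missing, too long, empty, or not all digits.
int numPerLineFromArgs(int argc, char *argv[], int fallback)
{
    if (argc < 2)
        return fallback;                   // only the program name was given
    const char *arg = argv[1];
    std::size_t len = std::strlen(arg);
    if (len == 0 || len > 3)
        return fallback;
    for (std::size_t i = 0; i < len; i++)
        if (!std::isdigit(static_cast<unsigned char>(arg[i])))
            return fallback;
    int value;
    if (std::sscanf(arg, "%d", &value) != 1)
        return fallback;
    return value;
}

// Hypothetical check that builds argv arrays by hand, the way the
// system would for "myprog 10", "myprog 1x" and plain "myprog".
bool argsDemo()
{
    char prog[] = "myprog";
    char good[] = "10";
    char bad[]  = "1x";
    char *withGood[] = { prog, good, nullptr };
    char *withBad[]  = { prog, bad, nullptr };
    char *noArg[]    = { prog, nullptr };
    return numPerLineFromArgs(2, withGood, 5) == 10
        && numPerLineFromArgs(2, withBad, 5) == 5
        && numPerLineFromArgs(1, noArg, 5) == 5;
}
```

In a real program, main() would simply pass its own argc and argv along: numPerLine = numPerLineFromArgs(argc, argv, 5);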