Characters and Strings

Background

As you likely know from your work in Scheme, strings (and their constituent characters) are an important data type in almost any programming language. The C programming language provides little extra packaging for working with strings, as your reading from the textbooks will explain.

Character Arrays, Strings, char* and Storage

Program string-intro.c shows several variations related to the declaration of character arrays, strings, and char* variables.

string-intro.c
/* program illustrating arrays, strings, and pointers */

#include <stdio.h>

int
main (void)
{
  char first [4] = {'C', 'o', 'l', 'd'};    /* first as an array of characters */
  char second[6] = "World"; /* second as a string: char array ending with null */
  char third [16] = {'C', 'o', 'm', 'p', 'u', 't', 'e', 'r', /* third as array */
                     ' ', 'S', 'c', 'i', 'e', 'n', 'c', 'e'}; /* of characters */
  char * fourth = second;     /* fourth as a pointer to an array of characters */
  char * fifth = "Hello";            /* fifth as a pointer to a string literal */


  printf ("first 3 characters in each array\n");
  printf ("   first: %c%c%c\n",  first[0],  first[1],  first[2]);
  printf ("  second: %c%c%c\n", second[0], second[1], second[2]);
  printf ("   third: %c%c%c\n",  third[0],  third[1],  third[2]);
  printf ("  fourth: %c%c%c\n", fourth[0], fourth[1], fourth[2]);
  printf ("   fifth: %c%c%c\n",  fifth[0],  fifth[1],  fifth[2]);

  printf ("Variable addresses and array base addresses\n");
  /* %p expects a void * argument, so each address is cast explicitly */
  printf ("   first address: %p,   array base address: %p\n", (void *) &first,  (void *) first);
  printf ("  second address: %p,   array base address: %p\n", (void *) &second, (void *) second);
  printf ("   third address: %p,   array base address: %p\n", (void *) &third,  (void *) third);
  printf ("  fourth address: %p,   array base address: %p\n", (void *) &fourth, (void *) fourth);
  printf ("   fifth address: %p,   array base address: %p\n", (void *) &fifth,  (void *) fifth);

  printf ("variables printed as strings\n");
  printf ("   first: %s\n", first);
  printf ("  second: %s\n", second);
  printf ("   third: %s\n", third);
  printf ("  fourth: %s\n", fourth);
  printf ("   fifth: %s\n", fifth);

  return 0;
} // main

One run of this program produced the following output (hexadecimal addresses have been converted to base-ten integers for readability):

first 3 characters in each array
   first: Col
  second: Wor
   third: Com
  fourth: Wor
   fifth: Hel
Variable addresses and array base addresses
   first address: 359157264,   array base address: 359157264
  second address: 359157248,   array base address: 359157248
   third address: 359157232,   array base address: 359157232
  fourth address: 359157224,   array base address: 359157248
   fifth address: 359157216,   array base address: 4196464
variables printed as strings
   first: Cold\ufffd
  second: World
   third: Computer ScienceWorld
  fourth: World
   fifth: Hello

Understanding this program and its output can provide substantial insight into how C works with arrays, characters, strings, and pointers.

Storage

The schematic memory diagram at the end of this section shows (in extreme detail) the allocation of memory for program string-intro.c, based upon the above run. Starting at the top of the program:

  • first is allocated space for four characters, beginning in storage location 359157264 (see bottom part of the table). Following the normal approach of initializing arrays, the letters, C, o, l, and d are stored in these locations. The program does not specify what data might be located after this part of memory.
  • second is allocated space for six characters, beginning in storage location 359157248. In C, a string contains a sequence of characters, followed by a null character (code zero). Since World contains five characters, the string requires six characters to include the code 0 at the end.
  • In organizing memory, the compiler decided not to use the space between second and first for data storage. Although these memory locations are present, the data in those unallocated memory addresses may be left over from the work of previous programs.
  • third is allocated space for sixteen characters, beginning in storage location 359157232. As with first, this space is initialized with specified characters. As an array of characters (not a string), no code zero is placed in memory at the end of this array.
  • fourth specifies the address of a character (i.e., a pointer to the character). In this case, fourth is given the address that begins the string second defined earlier. Note that the variable fourth itself occupies a location in memory (359157224), and the address of second (359157248) is stored at that location.
  • fifth specifies the address of a character. The address of a character can be the base address of a character array. A char * may be considered either the location of a single character or the starting point for a string. In this case, information for variable fifth is located at 359157216, and that location contains the starting location 4196464 for the literal string "Hello". Compilers often reserve a separate part of main memory for literal data, such as literal strings.

Output

The first set of printf statements accesses the first three characters in each character array. Within a printf statement, the %c format prints exactly one data element as a character, so three characters are printed for each printf statement here. Note that arrays and subscripts work the same whether the variable is declared as an array or as the base address of an array found elsewhere.

The second set of printf statements displays where each variable is mapped in main memory. The output shown above maps to the schematic memory diagram at the end of this section.

The third set of printf statements prints data as C strings. In C, a string variable identifies a starting or base address, and the string is considered to continue until a code 0 or null character is encountered.

  • For the variables second, fourth, and fifth, the character data were stored with a null character at the end, and these character strings are printed without difficulty.
  • For the variable third, the initialization placed characters in the array, but no null character was at the end. Rather, from the mapping of memory identified in the table, the string "World" was located immediately after the characters in the third array. When printing third, the printf started with the first character of third (i.e., the C character) and continued character by character until reaching a null. Since no null character was encountered in the processing of the third array, printing continued with the data from the second array.
  • For the variable first, the array declaration specified four characters, without a null character at the end. Although this works fine for arrays, work with strings requires processing to continue until a null is found. In this case, nothing in the program specifies what follows first in memory. Thus, printf keeps printing whatever bytes happen to follow the array until a null is found (in the run above, one unprintable character appeared after Cold). Reading past the end of an array in this way is undefined behavior in C, so the result may differ from run to run or from machine to machine.

Schematic Memory Diagram

variable    value stored                 memory address
----------  ---------------------------  -----------------------
[section of memory for literal strings]
            H                            4196464
            e                            4196465
            l                            4196466
            l                            4196467
            o                            4196468
            \0 (code 0)                  4196469
[section of memory for program variables]
fifth       integer value 4196464        359157216 - 359157223
fourth      integer value 359157248      359157224 - 359157231
third       C                            359157232
            o                            359157233
            m                            359157234
            p                            359157235
            u                            359157236
            t                            359157237
            e                            359157238
            r                            359157239
            <space>                      359157240
            S                            359157241
            c                            359157242
            i                            359157243
            e                            359157244
            n                            359157245
            c                            359157246
            e                            359157247
second      W                            359157248
            o                            359157249
            r                            359157250
            l                            359157251
            d                            359157252
            \0 (code 0)                  359157253
            not specified                359157254 - 359157263
first       C                            359157264
            o                            359157265
            l                            359157266
            d                            359157267
            not specified                359157268 - 359157269