Scheme-like Lists in C

Summary

Your experience with Scheme through the prior course should indicate that lists can be a particularly helpful structure for the storage and processing of many data types. Lists provide a very flexible context for processing, and unlike arrays, we do not need to specify a maximum size of a list when we create it. This reading discusses how lists might be implemented in C, following an approach analogous to the lists we studied previously in Scheme.

Because Scheme incorporates lists as a built-in data structure, Scheme supplies several built-in operations (e.g., cons, car, and cdr) for list processing, and we could program in Scheme using lists without considering the mechanics of how lists were implemented. To process lists in C, however, we first need to consider some internal details of Scheme lists. We then can translate these details to C.

Background: Lists in Scheme

When you call cons in Scheme, you are building a structure in memory with two parts, one of which refers to the first argument to cons and the other of which refers to the second. This structure is called a cons cell or a pair. Such lists in Scheme can be modelled graphically with a box-and-pointer representation. The basic idea is to use a rectangle—divided in half—to represent the result of the cons operation. From the first half of the rectangle, we draw an arrow to the head of a list; from the second half of the rectangle, we draw an arrow to the rest of the list. For example, (cons 'a '()) would be represented as follows:

the list (a)

Here, the line to the symbol 'a indicates that 'a is the head of the list. The diagonal line through the right half of the rectangle indicates that nothing comes later in this list. Since (cons 'a '()) gives the list '(a), this diagram represents '(a) as well.

Now consider the list (cons 'b (cons 'a null)) or the list '(b a). Here, we draw another rectangle, where the head points to the symbol 'b and the tail points to the representation of '(a) that we already have seen. The result is:

the list (b a)

Similarly, the list '(d c b a) is constructed as

(cons 'd (cons 'c (cons 'b (cons 'a '()))))
and would be drawn as follows:

the list (d c b a)

We may use a similar approach for lists having sublists as components. For example, consider the list '((a) b (c d) e) This is a list with four components, so at the top level we will need four rectangles, just as in the previous example for the list '(d c b a). Here, however, the first component designates the list '(a), which itself involves the box-and-pointer diagram already discussed. Similarly, the list '(c d) has two boxes for its two components, just as we discussed for '(b a) previously. The resulting diagram follows:

the list ((a) b (c d) e)

Throughout these diagrams, the null list is represented by a null pointer or line. Thus, the list containing the null list, ( ( ) )—that is (cons '() '())—is represented by a rectangle with lines through both halves:

the list containing the null list

Representing a Box-and-Pointer Diagram in C

In computer science, this box-and-pointer representation is a primary mechanism used to describe lists—not just in Scheme, but in most contexts. An implementation of lists in C typically utilizes this graphical perspective and involves three main elements:

A typical node contains two elements—one for data and the other to identify the next node on a list. Because C requires that we declare the type of data fields, we must tailor the data to the application at hand. For the remainder of this unit (today's reading and lab), we assume the data will be a string. The following declarations capture these elements.

#define STRMAX 50    /* maximum size of a string within a list node */

typedef struct node node_t;

struct node {
  char data [STRMAX+1]; /* Additional space for the terminating character */
  node_t *next;
};

To clarify these lines, a node is a structure with data and a next field, and node_t is a synonym for struct node. (It is simpler to write node_t than the two keywords struct node, and conceptually it seems cumbersome to have to write the struct keyword for each declaration.)

Some Designs for Scheme-like Lists

While the struct node or node_t type provides appropriate support to build lists that implement box-and-pointer representations, the design of a list may combine these node_ts in one of several ways. For example, here are some basic issues:

To clarify this second point, consider the Scheme statements:

(define x '(b c))
(define y (cons 'a x))

Thus, we can consider y to be the list '(a b c). The following figure shows two possible structures that could result:

Two alternatives

In the first option, the nodes of the original list are copied, and thus are explicitly distinct from those in the new list. In the second option, a new node is created for the cons node, a new value is added within that node, but the next part of that list refers to the old list.

For the example shown, both options may be reasonable. However, suppose we now change the second element of x from c to d, using Scheme's set-cdr! operation. (That is, the new x is the list (b d)). In the first option, y is not affected, while in the new approach y becomes the list (a b d). Because y refers to x when nodes are reused, any change to x also affects y. This may or may not be the desired result of changing x.

Overall both approaches have some advantages in certain cases. However, the first approach requires considerable overhead to duplicate nodes. Furthermore, in a purely functional context, lists are not altered during processing. In such a context, we could reuse nodes without fear of altering other lists unexpectedly, as old lists are never changed. Both of these observations explain why Scheme uses the second approach—reusing nodes when possible.

Implementing Scheme List Operations in C

To illustrate how to implement Scheme-style list operations in C, program scheme-lists.c shows implementations of the operations cons, car, and cdr. In addition, the program contains function listInit that initializes a list and listPrint that prints the elements of a list in Scheme format. Finally, whereas C requires programmers to handle all issues of memory allocation and deallocation, the program contains function listDelete that deallocates all nodes in a list and then sets the list variable to NULL.

Before considering specific details of the C functions, we review some elements of C syntax, based on the box-and-pointer representation of the Scheme list (a b c).

Modeling the list (a b c)

In this diagram, first is a variable that points to a node_t. C notates this type by adding an asterisk * to the declaration:

struct node * first;

Alternatively, since we used a typedef statement to define node_t as struct node, we could declare first as

node_t * first;

With either of these declarations, first is a pointer to a node_t - meaning it is a variable containing an address. The dereferencing operation *first accesses the node_t struct itself. Within this node_t, (*first).data yields the data field within the struct node, and (*first).next yields the next field. Alternatively, an arrow notation accomplishes the same result in a slightly cleaner form: first->data and first->next.

With this notation, we now review various details of the C functions. Full details of these functions are in program scheme-lists.c.

car

Because a node contains a string as data, the car function must return a pointer to a string (i.e., a char *) as its result. Altogether, we can access and return the car of a struct node as:

return list->data;

cdr

The cdr operation returns the next - a pointer to a struct node which has type node_ t*. Accessing and returning this field follows the same approach as car.

cons

For the cons, C first requires that we allocate space explicitly. C's malloc function accomplishes this task when we give it the amount of space to allocate. The relevant line is:

node_t * newNode =  malloc (sizeof(node_t));

Once the space is allocated, we need to fill the data and next fields. Following the above discussion, the next field will point to the head of the next node. For the data field, we copy the head string into the array.

listPrint

To print, we need a temporary variable current that starts at the beginning of a list and then progresses node-by-node until the end. By convention in C, a pointer that does not specify any node has the value NULL. Also, given one position in the list, the next node is obtained by looking in the next field. Putting these details together, the main structure of a list iteration loop is:

for ( node_t * current = list ; current != NULL ; current = current->next )
{
  /* printing details go here */
}

If we are to print results in the format given by Scheme, we should enclose an entire list in parentheses and separate successive list elements by a space. These details require some care.

listInit

Initialization requires some thought and care. One approach would be to assign

first = NULL;

in the main program.

Although this will work fine, we might want to accomplish initialization in a procedure. In this case, passing first makes a copy of first within the listInit procedure. Instead, we must pass the address of first and within the function we must place NULL at that address.

For an explanation of the ** notation (which is a pointer to a pointer), see section 17.6 of King and this example: doublepointers.c

listDelete

Deletion requires that we explicitly deallocate space for each node and then set the first pointer to NULL. Because the last step changes the first variable, we must pass the address of first to listDelete paralleling the approach used for listInit.

For completeness, we should first deallocate space for the rest of a list before deallocating the space for the first node. (If we proceeded in the other order, once we deallocated the first node, we could not be confident that the next field had valid data, so working down the list would be unreliable.) Proceeding recursively down the list handles subsequent nodes cleanly and easily.

Finally, the deallocation of memory uses the standard C function free.

Full Program

These definitions and methods combine to give program scheme-lists.c:

scheme-lists.c
/* Definition of data structure and operations
   for working with Scheme-like lists
*/

/* libraries for standard I/O, strings, and memory allocation */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>


#define STRMAX 50    /* maximum size of a string within a list node */

typedef struct node node_t;

struct node {
  char data [STRMAX+1]; /* Additional slot for sentinel */
  node_t * next;
};


/* ---------------------------------------------------------------- */
/* function prototypes, listed in alphabetical order */

char *
car (node_t * list);
/* pre-condition:  list is an initialized list (actually list node)
   post-condition: head of the list is returned
*/

node_t *
cdr (node_t * list);
/* pre-condition:  list is an initialized list (actually list node)
   post-condition: tail of the list is returned
*/

node_t *
cons (char * newData, node_t * rest);
/* pre-condition:  newData is an initialized string
                   rest is an intialized list
   post-condition: returns new node with newData copied to data field
                   and next field pointing to rest
*/

/* function prototypes, listed in alphabetical order */

void
listDelete (node_t ** listPtr);
/* pre-condition:  *listPtr is an initialized list
   post-condition: *listPtr is changed to a NULL list
                   with any previously-defined nodes deallocated
*/

void
listInit (node_t ** listPtr);
/* pre-condition:  none
   post-condition: *listPtr is initialized to a null list
*/

void
listPrint (node_t * list);
/* pre-condition:  list is an initialized list
   post-condition: data in each node is printed, Scheme-style, within ()
*/

/* ---------------------------------------------------------------- */
/* function definitions */

char *
car (node_t * list) {
/* pre-condition:  list is an initialized list (actually list node)
   post-condition: head of the list is returned
*/
  return list->data;
} // car

node_t *
cdr (node_t * list) {
/* pre-condition:  list is an initialized list (actually list node)
   post-condition: tail of the list is returned
*/
  return list->next;
} // cdr

node_t *
cons (char * newData, node_t * rest) {
/* pre-condition:  newData is an initialized string
                   rest is an initialized list
   post-condition: returns new node with newData copied to data field
                   and next field pointing to rest
                   if new node cannot be allocated, returns NULL and prints error
*/
  node_t * newNode = (node_t*) malloc (sizeof(node_t));

  if (newNode == NULL) {
    perror ("Unable to allocate node");
    return NULL;
  }
  strncpy (newNode->data, newData, STRMAX+1); /* limit copy to avoid overflow */
  newNode->next = rest;  /* define next field of node as the rest of the list */
  return newNode;        /* return newly initialized node */
} // cons

void
listDelete (node_t ** listPtr) {
/* pre-condition:  *listPtr is an initialized list
   post-condition: *listPtr is changed to a NULL list
                   with any previously-defined nodes deallocated
*/
  if (*listPtr != NULL)
  {
    listDelete ( &((*listPtr)->next)); /* recursively remove rest of list */
    free (*listPtr);            /* deallocate the space for the node itself */
    *listPtr = NULL;    /* nullify list pointer, avoiding wayward refereces */
  }
} // listDelete

void
listInit (node_t ** listPtr) {
  /* pre-condition:  none
     post-condition: *listPtr is initialized to a null list
  */
  *listPtr = NULL;
} // listInit

void
listPrint (node_t * list) {
/* pre-condition:  list is an initialized list
   post-condition: data in each node is printed, Scheme-style, within ()
*/
  
  char * separator = ""; /* Initialize to no separator */

  printf ("(");
  for ( node_t * current = list ; current != NULL ; current = current->next )
  {
    printf ("%s\"%s\"", separator, current->data);
    separator = " "; /* Set separator for intermediate elements */
  }
  printf (")\n");
} // listPrint
  
/* ---------------------------------------------------------------- */
/* main program: testing for lists-like-Scheme operations */

int
main (void)
{
  printf ("testing of lists-like-scheme operations\n");
  
  /* declarations */
  node_t * a;
  node_t * b;
  node_t * c;
  node_t * d;
  node_t * e;
  char * str;

  /* create lists */
  listInit (&a);
  b = cons ("node B", a);
  c = cons ("node C", b);
  d = cons ("node D", c);
  e = cdr (d);
  
  /* check list creation */
  printf ("list a:  ");
  listPrint (a);
  
  printf ("list b:  ");
  listPrint (b);
  
  printf ("list c:  ");
  listPrint (c);
  
  printf ("list d:  ");
  listPrint (d);
  
  printf ("list e:  ");
  listPrint (e);
  
  /* test car operation (cdr tested earlier) */
  str = car (d);
  printf ("car of list d:  %s\n", str);

  /* clean up */
  listDelete (&d);
  a = NULL;
  b = NULL;
  c = NULL;  
  e = NULL;
  printf ("list d:  ");
  listPrint (d);

  printf ("end of testing\n");
} // main