Linked List Algorithm

     A linked list, is an object oriented list of objects that have three required properties: prev, next, and data. Each link is an object with prev, a pointer to the previous link, next, a pointer to the next link, and data, the data or object which the link contains.

     Why would one want to implement a linked list? A linked list is an efficient way to handle ordered or relative data. Think of a group of children; if the teacher wanted them to line up in a specific order for an event on PTA night, all she has to do is line them up and instruct each child to remember who is in front of them (prev) and who is behind them (next). The child him/herself is the data property. The child in front and back would remember their position and assign null to their prev and next objects respectively.

     Now, what if eight new children joined the class two days before PTA night? If a linked-list approach had not been taken and one student was responsible for remembering the positions of every child, it would be much harder for her to re-remember this new list (eight more children's names/faces and where they are positioned). In the linked-list approach, each child next to the new student(s) would only have to re-remember one, possibly two new children. The children had to remember two names anyway, in place of one child remembering before+8 names and their order.

     The singly linked list is a convenient way to store an unbounded array, that is create an array where one doesn't know in advance how large the array will need to be. The disadvantage of the linked list is that data can only be accessed sequentially and not in random order. To read the millionth element of a linked list, you must read the 999,999 elements that precede it. A linked list usually consists of a root node, allocated on the stack (an ordinary C variable) and one or more records, allocated on the heap (by calling malloc). The record has two parts, the link field to the subsequent record and the data field[s] containing the information you want stored in the linked list. The end of the linked list is indicated by setting the link field to NULL.

This explanation will use a short example program containing the basic operations on a linked list. It will maintain a database in an ascii file and allow the user to search, add, and delete records. The database is read from an ascii file and written back to it at conclusion of the program. While running, the database is held in memory as a singly linked list. The data structure describing the singly linked list is defined by the following code.

struct dbase_rec {
    struct dbase_rec	*next;
    char		*record;
};

typedef struct dbase_rec dbase;

The next field is a link to the next record and the record field is a pointer to a string containing the information read from the ascii file. Note that the type of the next field is dbase_rec *, or pointer to dbase_rec. The type of the link field in a linked list must be a pointer to whatever type the list element is. Usually I prefer defining structures as new types with a typedef statement. Normally this can be done as a single statement, such as

typedef struct {
int                     foo;
char                    bar[64];
} foobar;

However, this will not work with defining a linked list structure because the new datatype is needed to define the link's datatype before the definition is complete. So it is necessary to define the structure first and then create the new type in a separate statement, as shown above.

Before explaining the linked list's code, I will first explain several of the functions needed to manipulate the records. Since the dbase structure only contains a pointer to a record, each record must be allocated on the heap. When the program is finished with the record, the memory that held it must be released. These operations are performed by the functions new_record and drop_record.

char *
new_record (char *buffer) {
    char *rec = (char *) malloc (SZ_RECORD+1);

    strcpy (rec, buffer);
    return rec;
}

void
drop_record (char *rec) {
    free (rec);
}

Each record is divided into fields separated by blanks. The first field in the record is defined to be the record key. When searching the database, the code nees to know if a specified key matches the record's key. This function is performed by cmp_record, which is patterned on the C library function strcmp.

int 
cmp_record (char *key, char *rec) {
    for ( ;*key && *rec && *rec == *key; rec ++, key ++)
	;

    if ((*key == ' ' || *key == '\0') && 
	(*rec == ' ' || *rec == '\0'))
	return 0;

    return *key - *rec;
}

The simplest operation on a linked list is a search, implemented by function find_dbase. This function returns the record whose key matches the specified key or NULL if no record matches the key.

char *
find_dbase (dbase *db, char *key) {
    dbase *ptr = NULL;

    for (ptr = db; ptr != NULL; ptr = ptr->next) {
        if (cmp_record (key, ptr->record) == 0)
            return ptr->record;
    }

    return NULL;
}

The first argument to the function is the root of the linked list. It has type dbase *, as it points to the first record of the list. The for loop shows the typical way of stepping through a linked list. A pointer is initialized to the list root. Then on each iteration of the loop the pointer is set to the link field in the node currently being examined. When the link field NULL, all elements in the list have been examined and the loop terminates. Notice that the fields of the linked list element are accessed by the arrow notation instead of the dot notation. The is because ptr contains a pointer to the element and not the element itself.

The next operation on a linked list is element deletion. The function del_dbase illustrates this operation by deleting a record from the database. The code for this function is:

int
del_dbase (dbase **db, char *key) {
    dbase *ptr = NULL;
    dbase *prevptr = NULL;

    for (ptr = *db; ptr != NULL; ptr = ptr->next) {
        if (cmp_record (key, ptr->record) == 0)
            break;

       prevptr = ptr;
    }

    if (ptr == NULL)
        return 0;

    if (prevptr != NULL) {
        prevptr->next = ptr->next;
    } else {
        *db = ptr->next;
    }

    drop_record (ptr->record);
    free (ptr);

    return -1;
}

The function returns an integer value to indicate if a record has been deleted or no record was deleted because the record was not found. The root node is passed to the function as a doubly indirect pointer (dbase **db) because the function may modify the value of the linked list root. If the first element is deleted from the linked list, the root must be modified to point to the second element, or be set to NULL if there is no second element. The for loop that searches for the matching record is identical to the code in find_dbase, except that an additional pointer, prevptr, is being kept to the list. This ponter points to the list element before ptr. It is set at the end of the for loop for the next iteration of the loop. On the first iteration of the loop it has the value NULL, indicating there is no previous element in the list. On exiting the loop, the function checks to see if the element matching the specified key was found. If the for loop took a premature exit by executing the break statement, ptr will have a value other than NULL. If the loop exitied normally, its value will be NULL. If the element was not found, there is nothing to delete and so the function returns to its caller.

The next section of code deletes the element from the linked list. This is done by setting the link field of the element pointed to by prevptr to the link field of the element pointed to by ptr. This has the result of removing the element from the list. To illustrate this process in a graph, the list

    +------+    +------+    +------+
    |      |--->|      |--->|      |
    +------+    +------+    +------+
       ^           ^
    prevptr       ptr

becomes the list

                +--------+
    +------+   / +------+ \   +------+
    |      |--+  |      |  +->|      |
    +------+     +------+     +------+
       ^            ^
    prevptr        ptr

Note that this operation works properly even if the element pointed to by ptr is the last element in the list. In this case, the link field of the element pointed to by prevptr is set to NULL. If the element to be deleted is the first element in the linked list, then the root of the linked list must be modified rather than the previous element.

All that is left is to free the memory that holds the deleted element and the record it point to, which the last section of code in del_dbase does.

The last and most complicated operation on the linked list is adding an element. This combines the two distinct operations of inserting an element into a list that does not contain it and replacing an element where the element is already in the list. Thus adding an element to the linked list combines the operation of searching, deletion, and the new operation of insertion. The function add_dbase illustrates adding an element to a linked list.

int
add_dbase (dbase **db, char *rec) {
    int cmp;
    int status;
    dbase *ptr = NULL;
    dbase *row = NULL;
    dbase *prevptr = NULL;
    dbase *nextptr = NULL;

    row = (dbase *) malloc (sizeof (dbase));
    row->next = NULL;
    row->record = new_record (rec);

    cmp = 1;
    for (ptr = *db; ptr != NULL; ptr = ptr->next) {
        cmp = cmp_record (rec, ptr->record);
        if (cmp <= 0) 
            break;

        prevptr = ptr;
    }

    if (cmp != 0) {
        nextptr = ptr;
        status = 1;

    } else {
        nextptr = ptr->next;
        status = 0;

        drop_record (ptr->record);
        free (ptr);
    }

    row->next = nextptr;

    if (prevptr != NULL) {
        prevptr->next = row;
    } else {
        *db = row;
    }

    return status;
}

The first section of code creates the new list element and sets its fields. Then the for loop searches for the elements that bracket the location of the new element. The pointer prevptr points to the element whose key is less than the key of new element and the pointer ptr points to the element whose key is greater than or equal to the key of the new element. If the key of the new element is less than the keys of all the elements on the list then prevptr will be NULL when the for loop exits. Similarly, if the key of the new element is greater than all the keys on the list, then ptr will be NULL. In the case that the list is empty, both pointers will be NULL. The variable cmp is set to 1 before the loop so the case where the list is empty is handled properly. In this case it is arbitrary to say whether the new element is less than or greater than the elements on the list, though it would be an error to say that it is equal to an element on the list.

The next block of code distinguishes between the case where the new element is less than the element pointed to by ptr and the case where the new element is equal to the element pointed to by ptr. In the first case, the function adds the new element to the list and in the second case, the function replaces the existing element with the new element. Both branches of the code set nextptr to point to the element that is strictly greater than the new element. The second branch also releases the memory held by the element to be replaced.

At the end of the if block prevptr points to the element that precedes the new element and nextptr points to the element that follows it. All that is left is to update the pointers of the linked list to reflect this. The last section of code in add_dbase does this, paying attention to the case where prevptr is null, in which case the root of the database must be changed. If you study the code yo will see that the case where ptr and nextptr are null are also handled correctly.