CS360 Lecture notes -- Malloc Lecture #2

  • James S. Plank
  • Directory: /home/plank/cs360/notes/Malloc2
  • Lecture notes: http://web.eecs.utk.edu/~jplank/plank/classes/cs360/360/notes/Malloc2/lecture.html
  • Original Notes: 1996.
  • Last updated: Mon Mar 28 17:31:50 EDT 2022
  • The material Stephen Marz used when he taught this lecture in 2017.

    More on malloc() and free().

    Last class I discussed how malloc keeps a big buffer of memory in the heap and carves it up and doles it out whenever the user calls malloc(). If there is not enough room in the buffer for what the user desires, then sbrk() is called to get heap storage for a big enough buffer.

    Malloc actually works in a different way. What it really does is maintain a linked list of free memory. When malloc() is called, it looks on its list for a piece of memory that is big enough. If it finds one, then it removes that memory from the linked list and returns it to the user. When free() is called, the memory is put back on the linked list. Now, to be efficient, if there is a chunk of memory on the free list that is much bigger than what is requested, then it breaks up that chunk into two chunks -- one that is the size of the request (padded to a multiple of 8), and the remainder. The remainder is put on the free list and the one the size of the request is returned to the user. This is the standard way to view malloc -- it manages a free list of memory. Malloc() takes memory from the free list and gives it to the user, and free() puts memory back on the free list.

    Initially, the free list is empty. When the first malloc() is called, we call sbrk() to get a new chunk of memory for the free list. This memory is split up so that some is returned to the user, and the rest goes back onto the free list.

    Before going into an example, I should say something about how the free list is implemented. There will be a global variable malloc_head, which is the head of the free list. Initially, malloc_head is NULL. When malloc() is first called, sbrk() is called so that some memory can be put on the free list. The way to turn memory into a free list element is to use the first few bytes as a list structure. In other words, if you have a chunk of memory on the free list and that chunk always has at least 12 bytes, then you can treat the first twelve bytes of the memory as a list structure: a size, a forward link and a backward link. (This assumes 4-byte integers and 4-byte pointers, as in the pictures below.)

    How do you do this? You set up a typedef like the following:
    typedef struct flist {
       int size;
       struct flist *flink;
       struct flist *blink;
    } *Flist;
    
    And when you need to treat a chunk of memory starting at location s (where s is a (char *) or (void *) or caddr_t) as a free list element, you cast it:
      Flist f;
      
      f = (Flist) s;
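
    Here is a minimal sketch of that cast in use. The helper name make_free_chunk is hypothetical (it is not part of these notes), and the sketch assumes the caller hands it memory that is suitably aligned for a struct flist. Note that on a machine with 8-byte pointers, sizeof(struct flist) is larger than the 12 bytes assumed in these notes.

```c
#include <assert.h>
#include <stddef.h>

typedef struct flist {
   int size;
   struct flist *flink;
   struct flist *blink;
} *Flist;

/* Hypothetical helper (not from the notes): treat the first bytes of the
   memory at s as a free list node describing a chunk of `size` bytes.
   Assumes s is suitably aligned for a struct flist. */
Flist make_free_chunk(void *s, int size)
{
  Flist f;

  assert(s != NULL && (size_t) size >= sizeof(struct flist));
  f = (Flist) s;        /* The cast: nothing is copied, f and s are the same address */
  f->size = size;
  f->flink = NULL;      /* Not linked to anything yet */
  f->blink = NULL;
  return f;
}
```

    The point to notice is that no extra memory is used for the list node: the node lives inside the free chunk itself.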
    

    A Detailed Example

    Now, we're going to run through an example of running the program program.c. The whole program is below, but we're going to show it working in bits and pieces.

    #include <stdio.h>
    #include <stdlib.h>
    
    int main()
    {
      int *p;
      unsigned char *q;
      int *r;
      int i;
    
      p = (int *) malloc(sizeof(int) * 4);      // Allocate 16 bytes for p and fill some of them in
      p[0] = 678;
      p[1] = 0x6824abcd;
      p[3] = 5555;
    
      q = (unsigned char *) malloc(sizeof(unsigned char) * 6);   // Now 6 bytes for q
      q[2] = 0x23;
      q[7] = 0xfe;              // This writes past the "allocated" region but doesn't harm anything
    
      free(p);                  // free p to see its memory chunk go onto the free list
    
      p = (int *) malloc(sizeof(int) * 6);      // Allocate 24 bytes and fill them in
      for (i = 0; i < 6; i++) p[i] = 100+i;
    
      r = (int *) malloc(sizeof(int) * 2);      // Allocate 8 bytes and fill in 12 bytes past what's allocated
      for (i = 0; i < 5; i++) r[i] = 200+i;
      
      q[8] = 127;                               // This is going to damage the next allocated chunk
      return 0;
    }
    

    Heap memory starts as follows: malloc_head equals NULL. In other words, the free list is empty. We run the first malloc() in the program:

      p = (int *) malloc(sizeof(int) * 4);  
    

    Your malloc program sees that there is no memory on the free list, so it calls sbrk(8192) to get 8K of heap storage for free memory. Suppose sbrk(8192) returns 0x6100. Then you have the following view of memory:

              malloc_head == NULL
    
             |---------------|
             |               | 0x6100 (start of heap)
             |               | 0x6104
             |               | 0x6108
             |               | 0x610c
             |               | 0x6110
             |               |
                   .....        
             |               |
             |---------------| 0x8100 (end of heap -- sbrk(0))
    
    To put this hunk of memory on the free list, you typecast it to an Flist, and then do the work to link it onto the list. At the end of this process, memory will look as follows (in this and the remaining pictures, I'm going to color memory that has been changed to blue):
              malloc_head == 0x6100
    
             |---------------|
             |    8192       | 0x6100 (start of heap)
             |    NULL       | 0x6104
             |    NULL       | 0x6108
             |               | 0x610c
             |               | 0x6110
             |               |
                   .....        
             |               |
             |---------------| 0x8100 (end of heap -- sbrk(0))
    
    You need to satisfy the user's request for 16 bytes. You split this chunk of memory into two chunks -- one that is 24 bytes (16 for the user and 8 for bookkeeping), and one that is the remaining 8192-24 = 8168 bytes. You put the latter chunk onto the free list, and return a pointer to the 16 bytes allocated for the user to the user:
              malloc_head == 0x6118
    
             |---------------|
             |      24       | 0x6100 (start of heap)
             |     NULL      | 0x6104
             |     NULL      | 0x6108  <-- beginning of 16 bytes for the user
             |               | 0x610c
             |               | 0x6110
             |               | 0x6114
             |     8168      | 0x6118
             |     NULL      | 0x611c
             |     NULL      | 0x6120
                   .....        
             |               |
             |---------------| 0x8100 (end of heap -- sbrk(0))
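
    The size arithmetic can be sketched as a small function. The name chunk_size is hypothetical; the rule it encodes (pad the request to a multiple of 8, then add 8 bytes of bookkeeping) is the one used in this example.

```c
/* Hypothetical helper: how many bytes to carve off the free list for a
   request, following the arithmetic in these notes. */
unsigned int chunk_size(unsigned int request)
{
  unsigned int padded;

  padded = (request + 7) & ~7u;   /* Round the request up to a multiple of 8 */
  return padded + 8;              /* Add the 8 bookkeeping bytes that precede the user's memory */
}
```

    With this rule, chunk_size(16) is 24, which is why 8192-24 = 8168 bytes remain on the free list. The other requests in program.c work out the same way: 6 bytes need a 16-byte chunk, 24 bytes need 32, and 8 bytes need 16.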
    
    So after the malloc() call, p is set to 0x6108. You'll note, I left the NULL's in memory at 0x6104 and 0x6108. That's because they are typically left there. Sometimes that can help you debug. The next few lines set some of p's elements:

      p[0] = 678;
      p[1] = 0x6824abcd;
      p[3] = 5555;
    

    Here's memory:

              malloc_head == 0x6118
    
             |---------------|
             |      24       | 0x6100 (start of heap)
             |     NULL      | 0x6104
           p |     678       | 0x6108  
             |   0x6824abcd  | 0x610c
             |               | 0x6110
             |     5555      | 0x6114
             |     8168      | 0x6118
             |     NULL      | 0x611c
             |     NULL      | 0x6120
                   .....        
             |               |
             |---------------| 0x8100 (end of heap -- sbrk(0))
    
    Now, in the next piece of code, the user calls malloc(6).

      q = (unsigned char *) malloc(sizeof(unsigned char) * 6);
    

    You do the same thing -- carve 16 bytes off of the chunk of memory in the free list, and return a pointer to 8 of them as the return value for malloc. The other 8 bytes are for bookkeeping. The remaining 8152 bytes are put back on the free list. In other words, after malloc(6) is called, the heap looks like:

              malloc_head == 0x6128
             |---------------|
             |      24       | 0x6100 (start of heap)
             |     NULL      | 0x6104
           p |     678       | 0x6108
             |   0x6824abcd  | 0x610c
             |               | 0x6110
             |     5555      | 0x6114
             |      16       | 0x6118
             |     NULL      | 0x611c
           q |     NULL      | 0x6120 <----------- This is returned from malloc
             |               | 0x6124
             |     8152      | 0x6128
             |     NULL      | 0x612c
             |     NULL      | 0x6130
                   .....        
             |               |
             |---------------| 0x8100 (end of heap -- sbrk(0))
    
    I left in the two "NULL's" at 0x611c and 0x6120, because they were part of the free node before the 16 bytes were carved off, and they aren't overwritten.

    Next, we set two bytes:

      q[2] = 0x23;
      q[7] = 0xfe;              // This writes past the "allocated" region but doesn't harm anything
    

    Let's see the effect on memory:

              malloc_head == 0x6128
             |---------------|
             |      24       | 0x6100 (start of heap)
             |     NULL      | 0x6104
           p |     678       | 0x6108
             |   0x6824abcd  | 0x610c
             |               | 0x6110
             |     5555      | 0x6114
             |      16       | 0x6118
             |     NULL      | 0x611c
           q |   0x00230000  | 0x6120 
             |   0xfe??????  | 0x6124
             |     8152      | 0x6128
             |     NULL      | 0x612c
             |     NULL      | 0x6130
                   .....        
             |               |
             |---------------| 0x8100 (end of heap -- sbrk(0))
    
    Remember that NULL is 0x00000000. So when we set q[2] to 0x23, that will set byte 2 to 0x23, but the others will remain at zero. Why 0x00230000 rather than 0x00002300? Because our machines are little endian.
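
    To make the byte ordering concrete, here is a small sketch that builds a 32-bit word from four bytes both ways. The function names are hypothetical; the code does the shifting by hand, so it gives the same answers no matter what kind of machine it runs on.

```c
#include <stdint.h>

/* Interpret four bytes as a 32-bit word the way a little endian machine
   does: byte 0 is the least significant byte. */
uint32_t word_little_endian(const unsigned char b[4])
{
  return (uint32_t) b[0] | ((uint32_t) b[1] << 8) |
         ((uint32_t) b[2] << 16) | ((uint32_t) b[3] << 24);
}

/* The same four bytes as a big endian machine sees them: byte 0 is the
   most significant byte. */
uint32_t word_big_endian(const unsigned char b[4])
{
  return ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16) |
         ((uint32_t) b[2] << 8) | (uint32_t) b[3];
}
```

    If the four bytes start at zero and byte 2 is set to 0x23, then word_little_endian() yields 0x00230000 and word_big_endian() yields 0x00002300.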

    When we set q[7] to 0xfe, there are two things to notice. First, I put question marks for the other three bytes in the word, because we don't know what they were initially. Second, no harm was done with our misuse of memory -- we allocated 6 bytes, but we received 8 bytes. So when we set q[7], we're lucky and nothing is wrong. You shouldn't rely on this -- who knows when they'll change malloc!

    The next thing we do is:

      free(p);                  // free p to see its memory chunk go onto the free list
    

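    Remember, p is 0x6108, so we will back up eight bytes to see the size of the chunk: 24 bytes. We need to put this 24-byte chunk onto the free list. We'll put it on the front, since that's nice and easy. So, we'll set the chunk's flink to the old head of the list (0x6128) and its blink to NULL, set the blink of the chunk at 0x6128 to point back to 0x6100, and set malloc_head to 0x6100.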

    Here's what memory looks like:
              malloc_head == 0x6100
             |---------------|
             |      24       | 0x6100 (start of heap)
             |    0x6128     | 0x6104
             |     NULL      | 0x6108
             |   0x6824abcd  | 0x610c
             |               | 0x6110
             |     5555      | 0x6114
             |      16       | 0x6118
             |     NULL      | 0x611c
           q |   0x00230000  | 0x6120 
             |   0xfe??????  | 0x6124
             |     8152      | 0x6128
             |     NULL      | 0x612c
             |    0x6100     | 0x6130
                   .....        
             |               |
             |---------------| 0x8100 (end of heap -- sbrk(0))
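
    The pointer manipulation for putting a chunk on the front of the list can be sketched as follows. This is only an illustration under some assumptions: free_sketch is a hypothetical name, the chunk's size is assumed to already sit 8 bytes before the pointer, and the chunk is assumed to be big enough to hold a struct flist (which takes more than 12 bytes when pointers are 8 bytes).

```c
#include <stddef.h>

typedef struct flist {
   int size;
   struct flist *flink;
   struct flist *blink;
} *Flist;

Flist malloc_head = NULL;   /* Head of the free list */

/* Hypothetical sketch of the list manipulation only. It assumes ptr was
   handed out by an allocator that stored the chunk's size 8 bytes before
   ptr, and that the chunk can hold a struct flist. */
void free_sketch(void *ptr)
{
  Flist f;

  if (ptr == NULL) return;
  f = (Flist) ((char *) ptr - 8);   /* Back up 8 bytes to the start of the chunk */
  f->flink = malloc_head;           /* The new node points forward to the old head */
  f->blink = NULL;                  /* Nothing comes before the new head */
  if (malloc_head != NULL) malloc_head->blink = f;
  malloc_head = f;                  /* f->size is left untouched */
}
```

    Putting the chunk on the front means there is no searching: the cost is a handful of pointer assignments no matter how long the list is.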
    
    Next, we allocate p again, this time requesting 24 bytes:

      p = (int *) malloc(sizeof(int) * 6);  
    

    We need a chunk with 32 bytes, so we have to carve 32 bytes off the big 8152-byte chunk. That will leave a chunk with 8120 bytes as the second node of the free list. As before, I'll color the changed values of memory in blue.

              malloc_head == 0x6100
             |---------------|
             |      24       | 0x6100 (start of heap)
             |    0x6148     | 0x6104
             |     NULL      | 0x6108
             |   0x6824abcd  | 0x610c
             |               | 0x6110
             |     5555      | 0x6114
             |      16       | 0x6118
             |     NULL      | 0x611c
           q |   0x00230000  | 0x6120 
             |   0xfe??????  | 0x6124
             |      32       | 0x6128
             |     NULL      | 0x612c
           p |    0x6100     | 0x6130 <------ Value returned from malloc(24)
             |               | 0x6134
             |               | 0x6138
             |               | 0x613c
             |               | 0x6140
             |               | 0x6144
             |     8120      | 0x6148
             |     NULL      | 0x614c
             |    0x6100     | 0x6150
                   .....        
             |               |
             |---------------| 0x8100 (end of heap -- sbrk(0))
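
    The search that skipped the 24-byte node and settled on the big chunk can be sketched like this. The notes don't name a search policy, so this assumes the simplest one that matches the example: walk the list from the head and take the first chunk that is big enough. The name find_chunk is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct flist {
   int size;
   struct flist *flink;
   struct flist *blink;
} *Flist;

/* Hypothetical helper: walk the free list from head and return the first
   chunk with at least `needed` bytes, or NULL if none is big enough (in
   which case malloc() would have to call sbrk() for more memory). */
Flist find_chunk(Flist head, int needed)
{
  Flist f;

  assert(needed > 0);
  for (f = head; f != NULL; f = f->flink) {
    if (f->size >= needed) return f;
  }
  return NULL;
}
```

    With a 24-byte node followed by an 8152-byte node, a search for 32 bytes returns the second node, while a search for 16 bytes returns the first.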
    
    The next chunk of code simply writes those 24 bytes:

      for (i = 0; i < 6; i++) p[i] = 100+i;  
    

    Here's memory:

              malloc_head == 0x6100
             |---------------|
             |      24       | 0x6100 (start of heap)
             |    0x6148     | 0x6104
             |     NULL      | 0x6108
             |   0x6824abcd  | 0x610c
             |               | 0x6110
             |     5555      | 0x6114
             |      16       | 0x6118
             |     NULL      | 0x611c
           q |   0x00230000  | 0x6120 
             |   0xfe??????  | 0x6124
             |      32       | 0x6128
             |     NULL      | 0x612c
           p |     100       | 0x6130 
             |     101       | 0x6134
             |     102       | 0x6138
             |     103       | 0x613c
             |     104       | 0x6140
             |     105       | 0x6144
             |     8120      | 0x6148
             |     NULL      | 0x614c
             |    0x6100     | 0x6150
                   .....        
             |               |
             |---------------| 0x8100 (end of heap -- sbrk(0))
    
    Our last chunk of code allocates 8 bytes:

      r = (int *) malloc(sizeof(int) * 2);  
    

    We need 16 bytes for this, and we can get it from the first node on the free list, since it has 24 bytes. However there is a question -- should we carve off 16 bytes from it, or just use all 24? The answer is that we should use all 24 bytes (give the user 16 bytes even though he or she will only think it's 8). Why? Because if we carve off 16 bytes, we'll only have 8 leftover, and that's not enough to store the pointers to hook it into the free list. Thus, we take the entire 24 bytes off the free list and give it to the user:

              malloc_head == 0x6148
             |---------------|
             |      24       | 0x6100 (start of heap)
             |    0x6148     | 0x6104
           r |     NULL      | 0x6108 <------------ Return value of malloc(8)
             |   0x6824abcd  | 0x610c
             |               | 0x6110
             |     5555      | 0x6114
             |      16       | 0x6118
             |     NULL      | 0x611c
           q |   0x00230000  | 0x6120 
             |   0xfe??????  | 0x6124
             |      32       | 0x6128
             |     NULL      | 0x612c
           p |     100       | 0x6130 
             |     101       | 0x6134
             |     102       | 0x6138
             |     103       | 0x613c
             |     104       | 0x6140
             |     105       | 0x6144
             |     8120      | 0x6148
             |     NULL      | 0x614c
             |     NULL      | 0x6150
                   .....        
             |               |
             |---------------| 0x8100 (end of heap -- sbrk(0))
    
    Note how malloc_head and the pointers in the chunk starting at 0x6148 are changed to reflect the chunk being removed from the free list.
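
    Here is a sketch of the two decisions involved: whether to split a chunk, and how to unlink a chunk that is handed out whole. Both function names are hypothetical. The minimum remainder is computed with sizeof, so it is 12 bytes in the 32-bit layout of these notes and larger on a machine with 8-byte pointers.

```c
#include <stddef.h>

typedef struct flist {
   int size;
   struct flist *flink;
   struct flist *blink;
} *Flist;

Flist malloc_head = NULL;   /* Head of the free list */

/* Smallest remainder worth keeping: it must hold a whole free list node. */
#define MIN_FREE_CHUNK ((int) sizeof(struct flist))

/* Hypothetical helper: how many bytes to hand out from a free chunk of
   chunk_bytes when `needed` bytes (padding and bookkeeping included) are
   required. Returns 0 if the chunk is too small. */
int bytes_to_allocate(int chunk_bytes, int needed)
{
  if (chunk_bytes < needed) return 0;                              /* Doesn't fit */
  if (chunk_bytes - needed < MIN_FREE_CHUNK) return chunk_bytes;   /* Leftover too small: give it all */
  return needed;                                                   /* Split: the remainder stays free */
}

/* Hypothetical helper: remove f from the doubly linked free list. */
void unlink_chunk(Flist f)
{
  if (f->blink != NULL) f->blink->flink = f->flink;
  else malloc_head = f->flink;                    /* f was the head of the list */
  if (f->flink != NULL) f->flink->blink = f->blink;
}
```

    For the 24-byte chunk and a 16-byte need, bytes_to_allocate() returns 24, so the chunk is unlinked rather than split. For the 8152-byte chunk and a 32-byte need, it returns 32.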

    The last few lines of code are really bad, because they write over bytes that aren't allocated for r and q.

      for (i = 0; i < 5; i++) r[i] = 200+i;  
      
      q[8] = 127;
    

    Let's see what happens:

              malloc_head == 0x6148
             |---------------|
             |      24       | 0x6100 (start of heap)
             |    0x6148     | 0x6104
           r |     200       | 0x6108
             |     201       | 0x610c
             |     202       | 0x6110
             |     203       | 0x6114
             |     204       | 0x6118  -- This is the "size" of q's chunk
             |     NULL      | 0x611c
           q |   0x00230000  | 0x6120 
             |   0xfe??????  | 0x6124
             |     127       | 0x6128  -- This is the "size" of p's chunk
             |     NULL      | 0x612c
           p |     100       | 0x6130 
             |     101       | 0x6134
             |     102       | 0x6138
             |     103       | 0x613c
             |     104       | 0x6140
             |     105       | 0x6144
             |     8120      | 0x6148
             |     NULL      | 0x614c
             |     NULL      | 0x6150
                   .....        
             |               |
             |---------------| 0x8100 (end of heap -- sbrk(0))
    
    I've colored 8 of the bytes green (the words at 0x6110 and 0x6114) -- these are bytes that the user shouldn't have written, but since he/she was allocated 16 bytes instead of 8, it doesn't hurt. I've colored 8 of the bytes red (the words at 0x6118 and 0x6128), as these are disastrous mistakes. In each case, the user has overwritten the size field of the next chunk in memory. Think about what happens next if q or p is freed. Disaster.

    More on free()

    Free() must be called with addresses that have been returned from malloc(). Why? Because free expects that eight bytes behind the address will contain the size of the memory to be freed. If you call free with an address that wasn't allocated with malloc(), then free() may well do something strange.

    For example, look in badfree.c. It calls free() with an address in the middle of an array, and after a while, this causes malloc() to dump core. It takes a while since malloc() seems to allocate from its buffer first, before attempting to use the (bogus) free'd blocks.

    In class it was suggested that malloc() keep track of the addresses that it allocates, and free() should check its argument to see whether or not it is a valid one. This is a good suggestion. However, malloc() will have to keep track of these addresses in a red-black tree structure or a hash table to make lookup fast. Otherwise, free() will take too long. You should note that red-black trees as implemented in libfdr.a cannot be used for this purpose, because the jrb routines call malloc() and free().

    Another way for free() to do error checking is to use the bookkeeping bytes to help you. You'll notice that the word 4 bytes before the address returned by malloc() is unused. What you can do is set that word to a checksum when you call malloc(). Then when free() is called, it will check that word to see if it has the desired checksum. If not, it will know that free() is being called with a bad address and can flag the error.
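
    A minimal sketch of the checksum idea follows. Everything here is an assumption made for illustration: the names checked_malloc and checked_free, the constant CHECK_MAGIC, and the choice of "size XOR a constant" as the checksum. To keep it short, the sketch gets its memory from the system malloc() rather than from a free list.

```c
#include <stdint.h>
#include <stdlib.h>

#define CHECK_MAGIC 0x5a5a5a5au   /* Arbitrary constant chosen for this sketch */

/* Hypothetical wrapper: get size+8 bytes from the system malloc(), store
   the size in the first word and a checksum in the second, and return the
   address 8 bytes in. */
void *checked_malloc(uint32_t size)
{
  uint32_t *words;

  words = (uint32_t *) malloc((size_t) size + 8);
  if (words == NULL) return NULL;
  words[0] = size;
  words[1] = size ^ CHECK_MAGIC;   /* The otherwise unused word 4 bytes before the user's memory */
  return (void *) (words + 2);
}

/* Returns 1 and frees the memory if the checksum matches, 0 otherwise.
   Assumes the 8 bytes before ptr are at least readable. */
int checked_free(void *ptr)
{
  uint32_t *words;

  words = (uint32_t *) ptr - 2;
  if (words[1] != (words[0] ^ CHECK_MAGIC)) return 0;   /* Not an address we handed out */
  free(words);
  return 1;
}
```

    A check like this is not airtight, since the two words before a bad address could match by coincidence, but it catches most bad calls at the cost of one comparison.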


    Coalescing free blocks

    When you call free(), you put a chunk of memory back on the free list. There may be times when the chunk immediately before it in memory, and/or the chunk immediately after it in memory are also free. If so, it makes sense to try to merge the free chunks into one free chunk, rather than having three contiguous free chunks on the free list. This is called ``coalescing'' free chunks.

    Here is one way that you can perform coalescing. (You should not do this in your lab. However, if you want to give it a try on your own, it is a very good exercise in hacking.) Instead of having all 8 of your bookkeeping bytes before the memory allocated for the user, keep 4 before and 4 after. When a memory chunk is allocated, both words are set to be the size of the chunk.

    Free chunks on the free list have a different layout. They must be at least 16 bytes in size. The first 12 bytes should be as described above, except (-size) should be kept in the first four bytes instead of size. The last 4 bytes of the chunk should also hold -size.

    What this lets you do is the following. When you free a chunk, you can look 8 bytes behind the pointer passed to free(). That word should be either the size of the previous chunk, if that chunk is allocated, or -size, if that chunk is free.

    In other words, if the word 8 bytes before the given pointer is positive, then the chunk before the one being freed has been allocated. Otherwise, it is free, and can be coalesced. If the word after the current chunk is negative, then the subsequent chunk is free and it can be coalesced. Otherwise, it has been allocated. Cool, no?
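
    The two checks can be sketched as follows. The function names are hypothetical, and the sketch assumes the layout just described: each chunk begins and ends with a 4-byte word holding size if the chunk is allocated or -size if it is free, and the user's pointer is 4 bytes past the start of the chunk. It deliberately ignores the beginning and end of the heap.

```c
#include <assert.h>
#include <stdint.h>

/* Is the chunk just before the one containing ptr free? The word 8 bytes
   before ptr is the previous chunk's trailing size word. */
int prev_chunk_is_free(const void *ptr)
{
  const int32_t *user = (const int32_t *) ptr;

  return user[-2] < 0;
}

/* Is the chunk just after the one containing ptr free? The word 4 bytes
   before ptr is this chunk's own size, so the next chunk's leading size
   word is size bytes past the start of this chunk. */
int next_chunk_is_free(const void *ptr)
{
  const int32_t *user = (const int32_t *) ptr;
  int32_t size = user[-1];

  assert(size > 0);                  /* The chunk being freed must be allocated */
  return user[-1 + size / 4] < 0;
}
```

    Both checks cost one memory read each, which is what makes this layout attractive: there is no need to search the free list to find out whether a neighbor is free.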

    Obviously, you need to take care at the beginning and end of the heap to make sure that your checks all work. This is not difficult.