What's News

The zoning commission for Stroustroupville has formalized a proposal for new procedures on determining how to adjudicate wasted property. After a public comment period, the commissioners will vote to adopt the rules which are intended to discourage property profligacy.

Dynamic Memory -- Leaky, Leaky

Dynamic memory is a powerful tool for programmers. Sometimes it is simply impossible to know how much memory a task will require until the task is already in progress. Because the compiler needs to know that amount in order to generate code that safely and correctly manages space for us, we have no choice but to resort to dynamic memory, where we handle the allocations and deallocations ourselves.

Here is a completely fictional, useless function named liddy that

  1. takes a size parameter (as an int),
  2. allocates a size-sized array of integers, and
  3. returns a pointer to that array to the caller for their use.
int *liddy(int size) {
    int *dynamic_integer_array{new int[size]};
    return dynamic_integer_array;
}

Note: Be sure that you are familiar with the syntax of this function before reading further in this edition of the C++ Times. In particular, make sure that you are comfortable with the new operator that forms the new expression that performs the memory allocation.

liddy will create a variable with dynamic storage duration on the heap and store a pointer to that variable in dynamic_integer_array. liddy then returns the pointer to that dynamically allocated variable (which is an array of int-egers) to the caller without deallocating it. Because liddy allocates memory on the heap but does not release it back to the system before completing execution, someone else must be responsible for doing that cleanup.

Use It Or (And?) Lose It

Here's a (yes, again, simple) example of code that would use the liddy function:

int main() {
    auto *hunt{liddy(5)};
    std::cout << "I made space!\n";

    hunt = liddy(7);
    std::cout << "I made (more) space!\n";

    return 0;
}

The code really doesn't do anything but ask for memory (twice) from liddy. And liddy is only too happy to oblige.

Depending on how well your eyes are tuned for spotting memory leaks, you may already see the problem.

One If By Stack, Two If By Heap

We have looked at length at the difference between the stack and the heap and at what determines where the space for a variable's values is created. The space for every automatic variable's values is on the stack, and the space for every dynamic variable's values is on the heap.

But that's really only half the story. In most cases, when a programmer uses dynamic memory, they write several C++ statements that really involve two variables. There is the anonymous variable (a variable without a name) whose space for its values is allocated on the heap (the array allocated by the call to liddy in this case), and then there is a named variable (hunt in this example) whose space for its values is allocated on the stack and which points to the anonymous variable. Programmers usually think that the former is the most important piece. However, without the latter variable (hunt, the variable that links the programmer to that variable in the heap), the programmer would be unable to use their newly acquired resource.

(Again, what we are describing here is what happens in most cases -- there are plenty of exotic ways to use pointers.)
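To make the two-variable picture concrete, here is a minimal sketch. The function name two_variables is hypothetical (it is not part of the original program), and it reuses liddy exactly as defined earlier. It prints where the named pointer variable itself is stored and where the anonymous array it points to is stored, and reports whether those two locations differ. The actual addresses vary from run to run and from system to system.

```cpp
#include <iostream>

// liddy, as defined earlier in this edition.
int *liddy(int size) {
    int *dynamic_integer_array{new int[size]};
    return dynamic_integer_array;
}

// Hypothetical helper: returns true when the named pointer variable and the
// anonymous array it points to occupy different locations in memory.
bool two_variables() {
    int *hunt{liddy(5)};  // hunt: named and automatic; the array: anonymous and dynamic

    const void *address_of_hunt{&hunt};    // where the pointer variable itself is stored
    const void *address_hunt_holds{hunt};  // where the anonymous array is stored

    std::cout << "hunt lives at " << address_of_hunt << "\n";
    std::cout << "hunt points to " << address_hunt_holds << "\n";

    const bool distinct{address_of_hunt != address_hunt_holds};
    delete[] hunt;  // release the array while we still hold a pointer to it
    return distinct;
}
```

Calling two_variables() from main should report two different addresses: one for hunt and one for the array that hunt points to.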

The space for the value of the named variable, what programmers call the pointer variable, is actually on the stack (again, hunt in this example). The anonymous variable is, obviously, on the heap. That's great, but does it matter?

It matters a great deal. The compiler generates code for automatically managing the space for storing values of automatic variables. When an automatic variable goes out of scope, the space that variable uses to hold its values is deallocated by code generated by the compiler.

If the automatic variable that the compiler deallocates holds the only (last?) pointer to a dynamic variable, then, when the compiler deallocates it, the programmer will lose all access to the space in memory associated with the anonymous variable and, therefore,

  1. have no way to use it and, more importantly,
  2. have no way to release its space back to the operating system.

In other words, the programmer has created a memory leak!
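The situation just described can be sketched in a few lines. Everything here is hypothetical and for illustration only: Tracker is a made-up type that counts how many of its objects are alive, and lose_the_pointer and keep_the_pointer are made-up function names. The counter simply makes the leak observable without any extra tools.

```cpp
// Hypothetical type that counts how many of its objects are currently alive.
struct Tracker {
    static int live;
    Tracker() { ++live; }
    ~Tracker() { --live; }
};
int Tracker::live{0};

// Leaks: the only pointer to the array is an automatic variable, and it
// disappears when the function returns.
void lose_the_pointer() {
    Tracker *only_pointer{new Tracker[3]};
    (void)only_pointer;  // silence unused-variable warnings
}  // only_pointer is deallocated here; the three Trackers on the heap are not

// Does not leak: the array is released while the pointer is still in scope.
void keep_the_pointer() {
    Tracker *only_pointer{new Tracker[3]};
    delete[] only_pointer;
}
```

After a call to lose_the_pointer(), Tracker::live stays at 3 and nothing in the program can ever bring it back down. A call to keep_the_pointer() leaves the count unchanged because it cleans up after itself.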

Plug The Leak

Memory leaks are relatively easy to fix but that's only if you can find them! And finding them is sometimes nearly impossible!

Enter valgrind, an amazing tool that you can use to help you find the source of memory leaks.

valgrind is incredibly powerful and easy to use.

For the remainder of this edition of the C++ Times, we will assume that entering

$ ./liddy

will execute the program that we wrote above and that we have entered all the code for the program in a file named liddy.cpp.

For example,

$ ./liddy
I made space!
I made (more) space!

The program looks like it runs okay, but we suspect a massive memory leak. Let's use valgrind to confirm. Running valgrind could not be easier:

$ valgrind ./liddy
==2989== Memcheck, a memory error detector
==2989== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2989== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==2989== Command: ./liddy
==2989== 
I made space!
I made (more) space!


==2989== 
==2989== HEAP SUMMARY:
==2989==     in use at exit: 48 bytes in 2 blocks
==2989==   total heap usage: 4 allocs, 2 frees, 73,776 bytes allocated
==2989== 
==2989== LEAK SUMMARY:
==2989==    definitely lost: 48 bytes in 2 blocks
==2989==    indirectly lost: 0 bytes in 0 blocks
==2989==      possibly lost: 0 bytes in 0 blocks
==2989==    still reachable: 0 bytes in 0 blocks
==2989==         suppressed: 0 bytes in 0 blocks
==2989== Rerun with --leak-check=full to see details of leaked memory
==2989== 
==2989== For lists of detected and suppressed errors, rerun with: -s
==2989== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

valgrind is loquacious but we can cut through the extraneous information and find the most important piece of that output:

==2989==    definitely lost: 48 bytes in 2 blocks

This output confirms our suspicion that we have a memory leak!

Note: We leak 48 bytes because we call liddy twice. The first time we call it with a size argument of 5 -- which ultimately allocates 20 bytes (four bytes for each of the 5 integers in the array that liddy allocates). The second time we call it with a size argument of 7 -- which ultimately allocates 28 bytes (four bytes for each of the 7 integers in the array that liddy allocates).
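The arithmetic in that note can be checked with a short sketch. The helper name bytes_for_int_array is hypothetical, and the 48-byte total assumes a platform where sizeof(int) is 4, which is what the valgrind output above implies for the machine it ran on.

```cpp
#include <cstddef>

// Hypothetical helper: the number of bytes that new int[size] requests for
// the array elements themselves.
constexpr std::size_t bytes_for_int_array(int size) {
    return static_cast<std::size_t>(size) * sizeof(int);
}

// The two calls in main: liddy(5) and liddy(7).
constexpr std::size_t total_leaked{bytes_for_int_array(5) + bytes_for_int_array(7)};

// Where sizeof(int) is 4, this is 20 + 28 = 48 bytes.
static_assert(total_leaked == 12 * sizeof(int), "5 + 7 ints are leaked in total");
```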

Knowing Is Half The Battle

In a program larger than the one that we are debugging here, the allocation that is being leaked may be hard to find (more code gives more places for errors to hide out!).

But don't worry: valgrind can (in cases where our programs contain so-called debugging symbols, which compilers such as g++ and clang++ include when given the -g flag) tell us where the leaking allocation occurs. To have valgrind tell us that information, we use a slightly more complicated invocation:

$ valgrind --leak-check=full ./liddy

When we run that command, we see

$ valgrind --leak-check=full ./liddy
==3013== Memcheck, a memory error detector
==3013== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3013== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==3013== Command: ./liddy
==3013== 
I made space!
I made (more) space!
==3013== 
==3013== HEAP SUMMARY:
==3013==     in use at exit: 48 bytes in 2 blocks
==3013==   total heap usage: 4 allocs, 2 frees, 73,776 bytes allocated
==3013== 
==3013== 20 bytes in 1 blocks are definitely lost in loss record 1 of 2
==3013==    at 0x48462F3: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3013==    by 0x1091FE: liddy(int) (liddy.cpp:4)
==3013==    by 0x10921E: main (liddy.cpp:9)
==3013== 
==3013== 28 bytes in 1 blocks are definitely lost in loss record 2 of 2
==3013==    at 0x48462F3: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3013==    by 0x1091FE: liddy(int) (liddy.cpp:4)
==3013==    by 0x109245: main (liddy.cpp:13)
==3013== 
==3013== LEAK SUMMARY:
==3013==    definitely lost: 48 bytes in 2 blocks
==3013==    indirectly lost: 0 bytes in 0 blocks
==3013==      possibly lost: 0 bytes in 0 blocks
==3013==    still reachable: 0 bytes in 0 blocks
==3013==         suppressed: 0 bytes in 0 blocks
==3013== 
==3013== For lists of detected and suppressed errors, rerun with: -s
==3013== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

Again, there is one section of output more important than the others:

==3013== 20 bytes in 1 blocks are definitely lost in loss record 1 of 2
==3013==    at 0x48462F3: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3013==    by 0x1091FE: liddy(int) (liddy.cpp:4)
==3013==    by 0x10921E: main (liddy.cpp:9)

We interpret this output, reading from the bottom line up, as if it were an outline of the runtime stack (recall: stack frames [a.k.a. activation records]) at the moment the leaking allocation occurs:

  1. main is the function that started executing first, and it is paused at line 9 waiting for a call to liddy to complete;
  2. liddy, in turn, is paused at line 4 waiting for operator new[] (the function that actually performs the allocation) to complete.

We can interpret that as a treasure map where x marks the spot, in the source code, of the allocation that is being leaked:

    int *dynamic_integer_array{new int[size]};

from the body of the liddy function! Wow, how cool!?

Doing Is The Other Half

Now that we have found the allocation that is the source of the leak, let's clean it up. Even though the example code in main does not actually do anything with the memory allocated by the liddy function, it could. In other words, the engineering of the liddy function is not at fault. In fact, you will often see this idiom: A function dynamically allocates some space during the course of its computation to hold the results and returns a pointer to that space to the caller as a means of returning the result. Obviously, therefore, we cannot release the memory allocated dynamically in the function itself. If we cleaned up that allocation before we left the function that allocated it, it would be completely useless to the caller. Plus, we might fall victim to a use-after-free error -- a serious bug and potential security vulnerability.
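Here is a minimal sketch of that idiom with a function that (unlike liddy) actually computes something. The names squares and sum_of_squares are hypothetical, and the sketch assumes count is not negative. The allocating function deliberately does not release the array; the caller does, once it has finished with the results.

```cpp
// Hypothetical example of the idiom: allocate space for the results, fill it
// in, and hand the pointer back to the caller.
int *squares(int count) {
    int *results{new int[count]};
    for (int i{0}; i < count; ++i) {
        results[i] = i * i;
    }
    return results;  // not released here: the caller still needs it
}

// The caller uses the results and is then responsible for releasing them.
int sum_of_squares(int count) {
    int *results{squares(count)};
    int sum{0};
    for (int i{0}; i < count; ++i) {
        sum += results[i];
    }
    delete[] results;  // the caller's cleanup duty
    return sum;
}
```

For example, sum_of_squares(4) adds up 0, 1, 4 and 9.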

It's fairly reasonable to assume that we are done using the memory allocated by the first call to liddy at the time that we make the second call to liddy. We are, after all, intentionally overwriting the value of the pointer to that memory. So, let's add some code to clean up the memory allocated in the first liddy call right before the second liddy call:

int main() {
    auto *hunt{liddy(5)};
    std::cout << "I made space!\n";

    delete[] hunt;

    hunt = liddy(7);
    std::cout << "I made (more) space!\n";

    return 0;
}

The delete[] syntax deallocates the memory dynamically allocated for an array. Note that caveat -- for an array. The [] after the delete keyword is required when you are freeing space previously allocated for an array with new[]. It is omitted when you are freeing space allocated with new for anything else (even a single std::vector object). Mixing the two forms is undefined behavior.
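As a quick illustration of which form goes with which allocation, here is a small sketch. The function name matched_deletes and the values in it are made up for this example.

```cpp
#include <vector>

// Hypothetical demonstration of matching each new with the right delete.
int matched_deletes() {
    int *single{new int{42}};                          // one int: new
    int *array{new int[3]{1, 2, 3}};                   // an array of int: new[]
    std::vector<int> *numbers{new std::vector<int>{4, 5}};  // one vector object: new

    const int total{*single + array[0] + array[1] + array[2]
                    + (*numbers)[0] + (*numbers)[1]};

    delete single;   // allocated with new   -> delete
    delete[] array;  // allocated with new[] -> delete[]
    delete numbers;  // a single object, even though it holds several values
    return total;
}
```

The std::vector case is the one that tends to surprise people: the vector manages many values internally, but what new created is a single object, so plain delete is the match.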

To keep up the bug hunting, we certainly know that the memory from the second allocation is no longer needed at the time that the program terminates. So, let's free the memory allocated in the second call to liddy right before the program ends:

int main() {
    auto *hunt{liddy(5)};
    std::cout << "I made space!\n";

    delete[] hunt;

    hunt = liddy(7);
    std::cout << "I made (more) space!\n";

    delete[] hunt;
    return 0;
}

To confirm that we successfully plugged the leak, let's run valgrind again and check its output:

$ valgrind ./liddy
==2835== Memcheck, a memory error detector
==2835== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2835== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==2835== Command: ./liddy
==2835== 
I made space!
I made (more) space!
==2835== 
==2835== HEAP SUMMARY:
==2835==     in use at exit: 0 bytes in 0 blocks
==2835==   total heap usage: 4 allocs, 4 frees, 73,776 bytes allocated
==2835== 
==2835== All heap blocks were freed -- no leaks are possible
==2835== 
==2835== For lists of detected and suppressed errors, rerun with: -s
==2835== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Now. That. Is. Cool!

BONUS: Detecting That Nasty Use-After-Free

Okay, true confession: I am not always the most careful programmer. Sometimes I commit the dreaded use-after-free error. Let's say that I make such a mistake in the application we are writing. What would that error look like? Something like this:

int main() {
    auto *hunt{liddy(5)};
    std::cout << "I made space!\n";

    delete[] hunt;

    std::cout << "The value of the 0th element of hunt is " << hunt[0] << "\n";

    hunt = liddy(7);
    std::cout << "I made (more) space!\n";

    delete[] hunt;
    return 0;
}

See it? I am accessing the memory allocated by the first call to the liddy function after I have released it! That's not good!

Yes, it is very hard to spot these types of errors. Fortunately, valgrind helps us in these cases, too:

$ valgrind ./liddy
==3014== Memcheck, a memory error detector
==3014== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3014== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==3014== Command: ./liddy
==3014== 
I made space!
==3014== Invalid read of size 4
==3014==    at 0x1092AF: main (liddy.cpp:14)
==3014==  Address 0x4db4c80 is 0 bytes inside a block of size 20 free'd
==3014==    at 0x4848A8F: operator delete[](void*) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3014==    by 0x10928E: main (liddy.cpp:13)
==3014==  Block was alloc'd at
==3014==    at 0x48462F3: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3014==    by 0x10923E: liddy(int) (liddy.cpp:4)
==3014==    by 0x10925E: main (liddy.cpp:9)
==3014== 
The value of the 0th element of hunt is 0
I made (more) space!
==3014== 
==3014== HEAP SUMMARY:
==3014==     in use at exit: 0 bytes in 0 blocks
==3014==   total heap usage: 4 allocs, 4 frees, 73,776 bytes allocated
==3014== 
==3014== All heap blocks were freed -- no leaks are possible
==3014== 
==3014== For lists of detected and suppressed errors, rerun with: -s
==3014== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

As usual with valgrind, one bit of the output is more important than the rest:

==3014== Invalid read of size 4
==3014==    at 0x1092AF: main (liddy.cpp:14)
==3014==  Address 0x4db4c80 is 0 bytes inside a block of size 20 free'd
==3014==    at 0x4848A8F: operator delete[](void*) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3014==    by 0x10928E: main (liddy.cpp:13)
==3014==  Block was alloc'd at
==3014==    at 0x48462F3: operator new[](unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3014==    by 0x10923E: liddy(int) (liddy.cpp:4)
==3014==    by 0x10925E: main (liddy.cpp:9)

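Would you look at that? It tells us exactly what we did wrong and where we did it!! The invalid read happens at line 14 of liddy.cpp, the block being read was freed by the delete[] at line 13, and that block was originally allocated at line 4 in liddy (called from line 9 in main). Wow. Talk about a super tool!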
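For completeness, here is a minimal sketch of the corrected order: use the memory first, release it second. The function name read_then_release is hypothetical, and the assignment to hunt[0] is our own addition so that the element holds a known value (liddy itself never initializes the array).

```cpp
// liddy, as defined earlier in this edition.
int *liddy(int size) {
    int *dynamic_integer_array{new int[size]};
    return dynamic_integer_array;
}

// Hypothetical corrected version of the first half of main: read the element
// while the allocation is still alive, and only then release it.
int read_then_release() {
    int *hunt{liddy(5)};
    hunt[0] = 11;              // give the element a known value (our addition)
    const int first{hunt[0]};  // use ...
    delete[] hunt;             // ... and only then free
    return first;              // the copy is still safe to use after the delete[]
}
```

Because first is an ordinary automatic variable holding a copy of the value, it remains valid after the array is gone. Running a program built around this version under valgrind should no longer produce the invalid-read report.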