
Advanced Memory Management: Dynamic Allocation, Part 1


By Andrei Milea

malloc and free, new and delete


Dynamic allocation is one of the three ways of using memory provided by the C/C++ standard (the
other two being static and automatic storage). In C it is done with the malloc function and in C++
with the new keyword. Both allocate a contiguous block of memory; malloc takes the size in bytes as
its parameter:
int *data = new int; //C++
int *data = (int*) malloc(sizeof(int)); //C; notice the use of sizeof for portability

This memory block remains available for the rest of the program's execution or until it is explicitly
deallocated, unlike automatic memory, which is available only inside the function or block of
instructions where it was declared. A program that keeps allocating dynamic storage every time it
needs more, without ever releasing any, can eventually run out of available space. To prevent this,
C++ provides the delete operator, whose job is to recycle a segment of memory allocated with new:
delete data;

The C function for memory deallocation is free(). Like delete, it releases the space pointed to by
data for future use (although, unlike delete, it does not run a destructor):
free(data);

If allocated memory is not freed when it is no longer needed, the result is a memory leak. The
standard does not specify what happens to leaked memory, but contemporary operating systems reclaim
it when the program terminates. Memory leaks can be dangerous in long-running programs because the
process, and eventually the system, may run out of memory. To help avoid them, C++ gives every class
a destructor, in which the programmer should deallocate all the memory the class allocated. Other
languages, such as Java or C#, use a garbage collector that figures out which memory blocks are no
longer needed and reclaims them, taking the burden of deallocation off the programmer's shoulders at
the cost of some runtime overhead. In C++, you can use smart pointers that hold on to a piece of
memory and deallocate it in their destructors.
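
As a minimal sketch of the smart pointer idea, the example below uses std::unique_ptr from the
standard header <memory>. The Tracked class and the live_inside_scope function are hypothetical
names introduced only for this illustration; the class simply counts how many of its instances are
alive so the effect of the automatic cleanup can be observed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class that counts how many of its instances are alive.
struct Tracked {
    static int live;
    Tracked()  { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Creates an object owned by a smart pointer and reports how many objects
// are alive while the pointer is in scope. No delete is written anywhere:
// the unique_ptr destructor releases the object when the function returns.
int live_inside_scope() {
    std::unique_ptr<Tracked> owner(new Tracked);
    assert(owner != nullptr);  // plain new either succeeds or throws
    return Tracked::live;
}
```

Calling live_inside_scope() reports one live object, and once it has returned Tracked::live is back
to zero even though the code never calls delete.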

Even though malloc and free are available in C++, their use is not recommended; new and delete are
preferred, especially when working with objects. Also, notice the cast to int* in the malloc
allocation. This is not required in C because the C standard allows implicit conversion between
void *, which is the type returned by malloc, and other object pointer types. In fact, in C, casting
the result of malloc is considered undesirable. But if we want to use malloc in C++ code we must
explicitly cast the pointer returned by malloc to the appropriate type. In general the use of void*
pointers in C++ is not recommended because it can break multiple inheritance: in a multiple
inheritance hierarchy the base class subobjects can see different values of the this pointer (the
original value plus some offset), and a cast through void* bypasses the adjustments and type checks
C++ would otherwise perform. If a void* is unavoidable in this situation, convert it back as soon as
possible with a static_cast to exactly the type it was converted from (dynamic_cast cannot be applied
to a void* operand), and use ordinary typed conversions from there so that the this pointer is
adjusted automatically.

Backward compatibility with malloc and free is useful in C++ for supporting legacy C code and for
implementing (overloading) the new and delete operators using calls to malloc/free. Because new does
more than just allocate memory (it also calls the object's constructor), you must not use free on
memory allocated with new, or delete on memory allocated with malloc.

The advantages of using new and delete over their older relatives malloc/free are the following:

• new returns a pointer of the correct type (it is type safe), so casts are not necessary.
• new invokes the constructor, allowing the initialization of objects, and delete invokes the
destructor.
• new figures out the size it needs to allocate, so the programmer doesn't have to specify it.
• new throws an exception of type std::bad_alloc when it fails, instead of only returning a NULL
pointer like malloc, so the programmer doesn't have to write the old-style check
if(NULL==data) error();
after every allocation.

• You can specialize the behavior of new by overloading it for your specific class (this will be
discussed further in a following article) or even replacing the global one (this can generate
problems, however, because someone might rely on the default behavior).

In some cases throwing an exception is not desirable (for example when working with legacy codebases
that do not expect or handle exceptions, or to avoid the overhead of supporting exceptions). For this
situation the standard provides a non-throwing version of new and new[], which returns a null pointer
on failure and is selected by passing std::nothrow:
void* operator new(std::size_t size, const std::nothrow_t&) noexcept; //throw() before C++11
void* operator new[](std::size_t size, const std::nothrow_t&) noexcept;
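
A short sketch of how the non-throwing form might be used follows. The function name
sum_with_nothrow is hypothetical; std::nothrow itself is declared in the standard header <new>. On
failure the expression yields a null pointer, so the result is checked before use.

```cpp
#include <cstddef>
#include <new>

// Allocates `count` ints with the non-throwing form of new[], fills them
// with 0..count-1 and returns their sum, or -1 if the allocation failed.
long sum_with_nothrow(std::size_t count) {
    int* data = new (std::nothrow) int[count];
    if (data == nullptr) {
        return -1;  // failure is reported through the return value
    }
    long sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = static_cast<int>(i);
        sum += data[i];
    }
    delete[] data;  // memory obtained with new[] is released with delete[]
    return sum;
}
```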

Pointers are not limited to data; they can refer to functions too, and such a pointer can itself be
stored in dynamic memory. A pointer that points to a function (a function pointer) can be declared
like this:
int (*f)(int,int); //this is a pointer to a function taking 2 ints as arguments and returning one
f = &add; //assuming a function int add(int,int) exists, it can now be called using the pointer f

Even though C++ offers other means that avoid the use of function pointers, the new operator can be
used to allocate memory for a pointer to a function:
int (**f)(int,int) = new (int (*) (int,int));

The above line can be quite confusing for people not used to function pointers. To get a better view
of them, see the function pointers tutorial.
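
To make the two declarations above more concrete, here is a small sketch. The functions add,
multiply, apply and call_through_heap_pointer are hypothetical and exist only for this example; the
last one stores a function pointer in dynamically allocated memory, as described in the text.

```cpp
// Two hypothetical functions with the signature int(int, int).
int add(int a, int b)      { return a + b; }
int multiply(int a, int b) { return a * b; }

// Calls whichever function `op` points to.
int apply(int (*op)(int, int), int x, int y) {
    return op(x, y);
}

// Stores a function pointer in dynamically allocated memory, calls the
// function through it, and releases the storage again.
int call_through_heap_pointer(int x, int y) {
    int (**slot)(int, int) = new (int (*)(int, int));
    *slot = &multiply;
    int result = (*slot)(x, y);
    delete slot;
    return result;
}
```

Note that only the pointer lives in dynamic memory here; the code of multiply itself stays where the
compiler placed it.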

Dynamic arrays
Creating a single dynamic object is different from creating an array of objects, and C++ handles the
two situations differently. The new and delete operators create and destroy dynamic instances of
classes or built-in types, while new[] and delete[] create and destroy dynamic arrays.
Myclass *my_class;
my_class = new Myclass [size]; //size must have an integral type
//do work with my_class
delete [] my_class;

The call to new [] in the above example allocates memory for the entire array and then calls the
default constructor for every object in the array in increasing index order. The returned value is a
pointer to the beginning of the allocated storage, that is, to the first element in the array.
Finally, the delete [] operator calls the destructor for each object in reverse order and then
deallocates the memory.
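
The construction and destruction order can be observed with a small experiment. This is only a
sketch: Element, event_log and trace_array_lifetime are hypothetical names, and the log format ("+"
for a constructor call, "-" for a destructor call) was chosen for this example.

```cpp
#include <cstddef>
#include <string>

std::string event_log;  // records construction and destruction order

// Hypothetical element type: each instance takes the next id on construction.
struct Element {
    static int next_id;
    int id;
    Element() : id(next_id++) { event_log += "+" + std::to_string(id); }
    ~Element()                { event_log += "-" + std::to_string(id); }
};
int Element::next_id = 0;

// Builds and destroys an array of `count` elements and returns the log.
std::string trace_array_lifetime(std::size_t count) {
    event_log.clear();
    Element::next_id = 0;
    Element* items = new Element[count];  // default constructors, first to last
    delete[] items;                       // destructors, last to first
    return event_log;
}
```

For three elements the log reads "+0+1+2-2-1-0": constructed first to last, destroyed last to first.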

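There are two limitations in creating a dynamic array of objects. One of them is that you can't create
a multidimensional array whose dimensions are all chosen at run time in a single expression, the way
you would declare an automatic (stack-based) array:
Myclass **my_class;
my_class = new Myclass[size1][size2]; //this will yield a compilation error

new[] accepts only the first dimension as a run-time value, and even with a constant size2 the result
would be a pointer to an array of size2 objects, not a Myclass**. A Myclass** table is instead built
from separate allocations, so its storage is not contiguous: each element of the outer array contains
a pointer to another array.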

To obtain a 2-dimensional array you can do something similar to this:
my_class = new Myclass* [size1]; //note that the type is a Myclass pointer
for(int i=0; i<size1; i++)
    my_class[i] = new Myclass[size2];

To delete the allocated memory you must go through the first array and apply delete[] to every
element and then delete[] the main one:
for(int i=0; i<size1; i++)
    delete[] my_class[i];
delete[] my_class;

The above methods can easily be generalized to obtain and destroy an n-dimensional array.
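
A self-contained sketch of the same pattern, using int elements, is shown below. The helper names
make_table and destroy_table are hypothetical. For brevity the sketch does not clean up the rows
already allocated if a later allocation throws.

```cpp
#include <cstddef>

// Allocates a rows x cols table of ints as an array of row pointers.
int** make_table(std::size_t rows, std::size_t cols) {
    int** table = new int*[rows];
    for (std::size_t i = 0; i < rows; ++i) {
        table[i] = new int[cols]();  // () value-initializes each row to zero
    }
    return table;
}

// Releases the rows first, then the array of row pointers.
void destroy_table(int** table, std::size_t rows) {
    for (std::size_t i = 0; i < rows; ++i) {
        delete[] table[i];
    }
    delete[] table;
}
```

Elements are then accessed with the usual table[i][j] syntax, and every make_table call should be
paired with a destroy_table call that is given the same number of rows.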

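The other important restriction concerns initialization: new[] has no syntax for passing the same
constructor arguments to every element, so an array built this way has its elements created with the
default constructor (since C++11, a braced initializer list can supply initializers for individual
elements).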

Dynamic Memory Allocation and Virtual Memory
by Andrei Milea

Virtual Memory - looking "Under the Hood"


Every application running on your operating system has its own address space, which it sees as one
contiguous block of memory. In fact the memory is not physically contiguous (it is fragmented); this
is just the impression the operating system gives to every program, and it is called virtual memory.
The size of the virtual memory is the maximum size your computer can address using pointers (usually,
on a 32-bit processor, each process can address 4 GB of memory). The natural question that arises is
what happens when a process wants to access more memory than your machine physically has available
as RAM. Thanks to the virtual address space, parts of the hard disk can be mapped together with real
memory, and the process doesn't have to know whether an address is physically stored in RAM or on the
hard disk. The operating system maintains a table in which virtual addresses are mapped to their
corresponding physical addresses, and this table is consulted whenever a request is made to read or
write a memory address.

The virtual memory available to a process is called its address space. Each process's address space
is typically organized in 6 sections that are illustrated in the next picture: an environment
section, used to store environment variables and command line arguments; the stack, used to store
function arguments, return values, and automatic variables; the heap (free store), used for dynamic
allocation; two data sections (for initialized and uninitialized static and global variables); and a
text section, where the actual code is kept.

The Heap
To understand why dynamic memory allocation is time consuming, let's take a closer look at what is
actually happening. The memory area where new gets its blocks of memory for allocation (usually
called the free store or heap) is illustrated in the following picture:
When new is invoked, it starts looking for a free memory block that fits the size of your request.
Supposing that such a block of memory is found, it is marked as reserved and a pointer to that
location is returned. There are several algorithms to accomplish this because a compromise has to be
made between scanning the whole memory to find the smallest free block that is still big enough for
your object (best fit) and returning the first block in which the requested memory fits (first fit).
To speed up the search, allocators keep track of the free and reserved areas in bookkeeping data
structures such as linked lists or trees of free blocks (despite the name, the memory heap is not the
same thing as the heap data structure). The various algorithms for finding free memory are beyond the
scope of this article; you can find a thorough discussion of them in D. Knuth's monograph The Art of
Computer Programming -- Vol.1, Fundamental Algorithms. This overhead, combined with the risk of
memory leaks, makes automatic memory (allocated on the stack) preferable whenever possible, as long
as the allocation is not large.
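
The trade-off between taking the first block that fits and searching for the smallest one can be
sketched on a simplified model in which the free blocks are represented only by their sizes. This is
an illustration, not how any particular allocator is implemented; first_fit and best_fit are
hypothetical helper names.

```cpp
#include <cstddef>
#include <vector>

// Both functions receive the sizes of the currently free blocks and return
// the index of the chosen block, or -1 if no block is large enough.

// First fit: stop at the first block that can hold the request.
int first_fit(const std::vector<std::size_t>& free_sizes, std::size_t request) {
    for (std::size_t i = 0; i < free_sizes.size(); ++i) {
        if (free_sizes[i] >= request) return static_cast<int>(i);
    }
    return -1;
}

// Best fit: scan every block and keep the smallest one that still fits.
int best_fit(const std::vector<std::size_t>& free_sizes, std::size_t request) {
    int best = -1;
    for (std::size_t i = 0; i < free_sizes.size(); ++i) {
        if (free_sizes[i] < request) continue;  // too small
        if (best == -1 || free_sizes[i] < free_sizes[static_cast<std::size_t>(best)]) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With free blocks of 64, 16, 32 and 24 bytes and a request for 20 bytes, first fit stops immediately at
the 64-byte block, while best fit examines all four blocks and picks the 24-byte one, wasting less
space at the price of a longer search.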

How much Virtual Memory do you get


Even though every application has its own 4 GB (on 32-bit systems) of virtual memory, that does not
necessarily mean that your program can actually use all of it. For example, on 32-bit Windows, the
upper 2 GB of the address space are by default reserved for the operating system kernel and are
unavailable to the process. (Therefore, any address from 0x80000000 upward is unavailable in user
space.) On 32-bit Linux, the upper 1 GB is typically kernel address space. Operating systems usually
provide means for changing these defaults (such as the /3GB switch on Windows). It is rare, however,
that you really want or need to do so.
Address Space Fragmentation
Another concern with memory allocation is that as blocks of different sizes are allocated and freed,
"holes" develop in the address space over time. For example, if you allocate 10 KB and it is taken
from the middle of a 20 MB chunk of free memory, then you can no longer allocate those 20 MB as one
chunk. When this happens often enough, no free region of 20 MB is left, and such an allocation can
fail even though the total amount of free memory is sufficient. Note that this is true even with
virtual memory, because what matters is that you need a contiguous block of addresses, not a
contiguous block of physical memory.

One way to address this problem is to avoid the kinds of requests that suffer most from
fragmentation, such as very large allocations; in a 32-bit address space, anything more than a few
tens of MB is asking for trouble. Second, many heap implementations already help you with this by
reserving a large chunk of virtual address space and carving it up for you (usually the heap obtains
address space from the operating system and then hands out smaller chunks when requested). And if you
know that a class will have a lot of small instances, you could overload operator new and preallocate
a large contiguous chunk of memory, splitting off a small piece for each instance from that chunk.

Why Customize Memory Allocation by Overloading New and Delete?
At times, you will have classes for which you want to specialize memory allocation. Why? You know
something about how the class is used. For instance, you might specialize memory allocation for a
class in order to squeeze some extra performance out of your program. Suppose you have a linked
list and you want to speed up the allocation of new nodes. One way to do this is to maintain a list of
deleted nodes, whose memory can be reused when new nodes are allocated. Instead of using the
default new and delete, new will be overloaded to try to get a node from the list of deleted nodes;
only if no deleted nodes are available would it dynamically allocate memory. Delete will simply add the
node to the deleted nodes. This way instead of allocating new memory from the heap (which is pretty
time consuming) we will use the already allocated space. The technique is usually called caching.
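
A minimal sketch of this caching technique follows. The Node class, its FreeBlock helper and the
cached() counter are hypothetical names used only for illustration; the sketch is not thread safe,
and blocks kept in the cache are never returned to the system.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical linked-list node whose storage is recycled through a cache.
class Node {
public:
    int   value = 0;
    Node* next  = nullptr;

    static void* operator new(std::size_t size) {
        if (size == sizeof(Node) && cache_ != nullptr) {
            FreeBlock* reused = cache_;  // pop a previously deleted block
            cache_ = reused->next;
            --cached_;
            return reused;
        }
        void* raw = std::malloc(size);   // cache empty: allocate fresh memory
        if (raw == nullptr) throw std::bad_alloc();
        return raw;
    }

    static void operator delete(void* ptr, std::size_t size) {
        if (ptr == nullptr) return;
        if (size != sizeof(Node)) { std::free(ptr); return; }
        // The Node is already destroyed; reuse its storage as a list link.
        cache_ = ::new (ptr) FreeBlock{cache_};
        ++cached_;
    }

    static int cached() { return cached_; }  // number of blocks waiting for reuse

private:
    struct FreeBlock { FreeBlock* next; };
    static FreeBlock* cache_;
    static int        cached_;
};

Node::FreeBlock* Node::cache_  = nullptr;
int              Node::cached_ = 0;
```

After delete on a Node, cached() goes up by one; the next new Node takes that block back from the
cache instead of calling malloc, and the constructor still runs on the recycled storage as usual.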

Another useful reimplementation of new/delete operators is to provide garbage collection for your
objects. That is, you can implement a garbage collector similar to the one that Java or C# uses so
your objects will be deleted automatically when they are no longer used. (You might wonder why you
need to provide your own allocator to write a garbage collector.)

One more example usage is creating an arena allocator that makes memory allocations and
deallocations lightning fast at the cost of temporarily holding on to more memory than necessary by
allocating a large block up front and then carving out one piece at a time.
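
The arena idea can be sketched as follows. Arena and its member functions are hypothetical names,
the capacity is fixed when the arena is created, and the sketch hands out raw memory only: it does
not construct or destroy objects, and it assumes alignments no larger than
alignof(std::max_align_t).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical fixed-capacity arena: one big block obtained up front,
// handed out piece by piece, and released all at once.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : base_(static_cast<char*>(std::malloc(capacity))),
          capacity_(base_ ? capacity : 0),
          used_(0) {}
    ~Arena() { std::free(base_); }

    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Returns `size` bytes whose offset is a multiple of `align` (a power of
    // two), or nullptr when the arena is exhausted.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t start = (used_ + align - 1) & ~(align - 1);  // round up
        if (start > capacity_ || size > capacity_ - start) return nullptr;
        used_ = start + size;
        return base_ + start;
    }

    void reset() { used_ = 0; }                 // "frees" everything at once
    std::size_t used() const { return used_; }  // bytes handed out so far

private:
    char*       base_;
    std::size_t capacity_;
    std::size_t used_;
};
```

An allocation is just an addition and a comparison, and reset() releases every piece in constant
time, which is where the speed comes from; the price is that no individual piece can be returned
early.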

In order to specialize allocation, you overload operator new and operator delete. (For more on
operator overloading, see introduction to operator overloading.) Operator new is used to perform all
memory allocation when the new keyword is used, and operator delete is used to deallocate that
memory when delete is used. As with the rest of the operators, new and delete can be overloaded to
specialize them for a specific class. But first, let's clarify the exact distinction between operator new
and the new keyword.
The relationship between Operator New and the New Keyword
Don't be confused by the fact that there is both a new keyword and an operator new. When you
write:

MyClass *x = new MyClass;

there are actually two things that happen--memory allocation and object construction; the new
keyword is responsible for both. One step in the process is to call operator new in order to allocate
memory; the other step is to actually invoke the constructor. Operator new lets you change the
memory allocation method, but does not have any responsibility for calling the constructor. That's the
job of the new keyword.

As it turns out, it is actually possible to invoke the constructor without calling operator new. That's the
job of placement new (covered below).

Changing the default behavior of new and delete
Here's an example of what it would look like to overload new and delete for a particular class.

class Myclass
{
public:
    void* operator new(size_t);
    void operator delete(void*);
};

Both are implicitly static member functions, so they do not receive a this pointer. Overloading can
be used for many purposes. For example, we may need to alter the exception thrown in case of failure,
std::bad_alloc, and throw something else:

void* Myclass::operator new(size_t size)
{
    void *storage = malloc(size);
    if(NULL == storage) {
        throw "allocation fail : no free memory";
    }
    return storage; //hand the raw memory back so the object can be constructed in it
}

Usually we do this in a base class so that all derived classes get the functionality of the
overloaded new. The default global operator new is implicitly declared in every file, but to use the
other new-related declarations (such as std::bad_alloc and std::nothrow) you must include the header
<new>; size_t is declared in <cstddef>.
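
The following sketch illustrates how derived classes pick up the operators of a base class. Counted
and Widget are hypothetical classes; the base class merely counts outstanding allocations and
records the size it was asked for, so that the inheritance effect is visible.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical base class that tracks allocations made through its operators.
class Counted {
public:
    virtual ~Counted() {}

    static void* operator new(std::size_t size) {
        void* storage = std::malloc(size);
        if (storage == nullptr) throw std::bad_alloc();
        ++outstanding;
        last_size = size;  // size of the most derived object being created
        return storage;
    }

    static void operator delete(void* ptr) {
        if (ptr == nullptr) return;
        --outstanding;
        std::free(ptr);
    }

    static int         outstanding;  // allocations not yet released
    static std::size_t last_size;    // size passed to the latest operator new
};

int         Counted::outstanding = 0;
std::size_t Counted::last_size   = 0;

// Derived class: declares no operators of its own, inherits those of Counted.
class Widget : public Counted {
public:
    double payload[8];
};
```

Creating a Widget with new goes through Counted::operator new, which receives sizeof(Widget) rather
than sizeof(Counted). The virtual destructor lets delete on a Counted* that points to a Widget find
the right destructor and deallocation function.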

The new operator can be implemented using malloc or another C function, called realloc, that handles
memory allocation:

void * realloc ( void * ptr, size_t size );

This can be used to change the size of the allocated block at address ptr to the size given in the
second parameter. The block may be moved someplace else, in which case the returned value is the new
address. If realloc fails it returns NULL, like malloc, but it does not free the original memory.
Therefore, when you use realloc, be sure to keep the previous pointer value until you know the call
succeeded, so that you do not leak memory.
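
A common way to follow this advice is to store the result of realloc in a temporary pointer first.
The helper below is a sketch; the name resize_ints is hypothetical, and overflow of the size
computation is not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Resizes a buffer of ints to `new_count` elements.
// On success `*buffer` is updated and true is returned; on failure `*buffer`
// still points to the original, untouched block and false is returned.
bool resize_ints(int** buffer, std::size_t new_count) {
    assert(buffer != nullptr && new_count > 0);  // size 0 is left out of this sketch
    void* resized = std::realloc(*buffer, new_count * sizeof(int));
    if (resized == nullptr) {
        return false;  // the old block is still valid and still owned by the caller
    }
    *buffer = static_cast<int*>(resized);
    return true;
}
```

Writing data = (int*) realloc(data, n) directly would overwrite the only copy of the old address
with NULL on failure, which is exactly the leak described above.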

Note that in general, if you overload new, you will likely need to overload delete, and vice versa,
because by changing how you allocate memory, you will typically also change how you free memory.

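Operator new is invoked implicitly whenever a new expression is evaluated. It can also be called
directly like any other function, for example ::operator new(sizeof(Myclass)), in which case it only
allocates raw memory and no constructor runs.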

Placement New
Standard C++ also supports a second version of new, called placement new, which constructs an object
in preallocated storage. For this to work we must provide the address where we want the object to be
constructed as a pointer parameter:

my_class = new (place) Myclass;

So why would you want to use placement new? Placement new is useful for constructing objects in a
pre-allocated block of memory. This bypasses the work of operator new by allowing the person
constructing the object to choose the memory that it is initialized into. You might do this if you have a
pool of memory you want to use for constructing some objects of a class, but don't want to overload
operator new for the whole class.
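
As a small sketch of placement new in use, the example below constructs an object in a local buffer.
Label and build_in_buffer are hypothetical names. An object created this way must be destroyed with
an explicit destructor call rather than with delete, because its storage did not come from operator
new.

```cpp
#include <new>
#include <string>

// Hypothetical class used only to show construction in caller-provided storage.
struct Label {
    std::string text;
    explicit Label(const std::string& t) : text(t) {}
};

// Constructs a Label inside a local buffer, reads it, and destroys it.
std::string build_in_buffer(const std::string& text) {
    // Raw storage with the size and alignment a Label requires.
    alignas(Label) unsigned char place[sizeof(Label)];

    Label* label = new (place) Label(text);  // constructor runs in `place`
    std::string copy = label->text;
    label->~Label();                         // explicit destructor call, no delete
    return copy;
}
```

No memory is allocated for the Label object itself here (its std::string member may still allocate
internally); the buffer disappears automatically when the function returns.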
