C++’s Memory Areas
June 30, 2008
The following table summarizes a C++ program’s major distinct memory areas. Note that some of the names (for example, “heap”) do not appear as such in the standard; in particular, “heap” and “free store” are common and convenient shorthands for distinguishing between two kinds of dynamically allocated memory.
| Memory Area | Characteristics and Object Lifetimes |
|---|---|
| Const Data | The const data area stores string literals and other data whose values are known at compile-time. No objects of class type can exist in this area. All data in this area is available during the entire lifetime of the program. Further, all this data is read-only, and the results of trying to modify it are undefined. This is in part because even the underlying storage format is subject to arbitrary optimization by the implementation. For example, a particular compiler may choose to store string literals in overlapping objects as an optional optimization. |
| Stack | The stack stores automatic variables. Objects are constructed immediately at the point of definition and destroyed immediately at the end of the same scope, so there is no opportunity for programmers to directly manipulate allocated but uninitialized stack space (barring willful tampering using explicit destructors and placement new). Stack memory allocation is typically much faster than for dynamic storage (heap or free store) because each stack memory allocation involves only a stack pointer increment rather than more-complex management. |
| Free Store | The free store is one of the two dynamic memory areas; it is the one allocated/freed by new/delete. Object lifetime can be less than the time the storage is allocated. That is, free store objects can have memory allocated without being immediately initialized, and they can be destroyed without the memory being immediately deallocated. During the period when the storage is allocated but outside the object’s lifetime, the storage may be accessed and manipulated through a void*, but none of the proto-object’s nonstatic members or member functions may be accessed, have their addresses taken, or be otherwise manipulated. |
| Heap | The heap is the other dynamic memory area; it is the one allocated/freed by malloc()/free() and their variants. Note that while the default global operators new and delete might be implemented in terms of malloc() and free() by a particular compiler, the heap is not the same as free store, and memory allocated in one area cannot be safely deallocated in the other. Memory allocated from the heap can be used for objects of class type by placement new construction and explicit destruction. If so used, the notes about free store object lifetime apply similarly here. |
| Global/Static | Global or static variables and objects have their storage allocated at program startup, but may not be initialized until after the program has begun executing. For instance, a static variable in a function is initialized only the first time program execution passes through its definition. The order of initialization of global variables across translation units is not defined, and special care is needed to manage dependencies between global objects (including class statics). As always, uninitialized proto-objects’ storage may be accessed and manipulated through a void*, but no nonstatic members or member functions may be used or referenced outside the object’s actual lifetime. |
It’s important to distinguish between the “heap” and the “free store,” because the standard deliberately leaves unspecified the question of whether these two areas are related. For example, when memory is deallocated via ::operator delete(), the final note in section 18.4.1.1 of the C++ standard states:
“It is unspecified under what conditions part or all of such reclaimed storage is allocated by a subsequent call to operator new or any of calloc, malloc, or realloc, declared in <cstdlib>.”
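The distinction between storage and object lifetime described in the table can be sketched in a few lines. This is only an illustration under stated assumptions: `Widget` and `widget_on_heap` are hypothetical names that do not come from the standard or from any library, and `nullptr` assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new
#include <string>

// Hypothetical class type, used only for this illustration.
struct Widget {
    std::string name;
    explicit Widget(const std::string& n) : name(n) {}
};

// Builds a Widget in heap storage and returns the length of its name.
std::size_t widget_on_heap(const std::string& name) {
    // 1. Storage is allocated from the heap; no Widget exists yet.
    void* raw = std::malloc(sizeof(Widget));
    if (raw == nullptr) return 0;

    // 2. Placement new starts the object's lifetime inside that storage.
    Widget* w = new (raw) Widget(name);
    assert(static_cast<void*>(w) == raw);  // the object lives in the malloc()'d block
    std::size_t length = w->name.size();

    // 3. The explicit destructor call ends the lifetime; the storage stays allocated.
    w->~Widget();

    // 4. The storage goes back to the area it came from: free(), never delete.
    std::free(raw);
    return length;
}
```

Calling `widget_on_heap("gadget")` walks through all four steps. Between steps 1 and 2, and again between steps 3 and 4, the block is allocated but holds no live object, which is the window the table warns about.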
UTF-8 and Extended characters
June 12, 2008
| Character Range (hex) | Unicode (UCS-2/UTF-16) | UTF-8 |
|---|---|---|
| 0-7F | 00000000 0xxxxxxx | 0xxxxxxx |
| 80-7FF | 00000xxx xxxxxxxx | 110xxxxx 10xxxxxx |
| 800-FFFF | xxxxxxxx xxxxxxxx | 1110xxxx 10xxxxxx 10xxxxxx |
| 10000-1FFFFF | - out of range - | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
| 200000-3FFFFFF | - out of range - | 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
| 4000000-7FFFFFFF | - out of range - | 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
Note that all bytes of multi-byte UTF-8 characters have the high-bit set to one, and only the first byte of a multi-byte character has both its highest bits set. This means there can never be confusion about where a character starts. So in UTF-8, the combined Greek and Latin sequence aβcδe is represented by the following seven bytes, and looking at the high bits you can pick out the extended characters without too much trouble:
01100001 11001110 10110010 01100011 11001110 10110100 01100101
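The bit patterns above translate fairly directly into code. The following is a minimal sketch, not a complete encoder: the function names are hypothetical, only the one- to four-byte forms are handled, and surrogate code points are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one code point (at most U+10FFFF) to `out`.
void append_utf8(std::string& out, unsigned long cp) {
    assert(cp <= 0x10FFFF);                // this sketch stops at four bytes
    if (cp < 0x80) {                       // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// A byte starts a character unless it is a continuation byte (10xxxxxx).
bool starts_character(unsigned char byte) {
    return (byte & 0xC0) != 0x80;
}

// Counts characters by counting the bytes that start one.
std::size_t count_characters(const std::string& utf8) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        if (starts_character(static_cast<unsigned char>(utf8[i]))) ++n;
    }
    return n;
}

// Encodes the sample sequence a, beta (U+03B2), c, delta (U+03B4), e.
std::string encode_sample() {
    const unsigned long code_points[] = {0x61, 0x3B2, 0x63, 0x3B4, 0x65};
    std::string out;
    for (std::size_t i = 0; i < sizeof(code_points) / sizeof(code_points[0]); ++i) {
        append_utf8(out, code_points[i]);
    }
    return out;
}
```

`encode_sample()` yields the seven bytes 61 CE B2 63 CE B4 65 (hex), matching the binary listing above, and `count_characters` reports five characters for them because it skips the two continuation bytes.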
Now the really clever bit about UTF-8 is that it can pass unharmed through ASCII-only systems (programs which don’t even recognize UTF-8), thanks to the fact that each character beyond U+007F looks like a valid sequence of extended ASCII when read as one byte per character. This is in stark contrast to other Unicode encodings such as UCS-2, which are full of zero bytes for ASCII-range text and therefore wreak havoc with ASCII processing systems. To an 8-bit system that assumes an extended ASCII code page such as Latin-1, the UTF-8 representation of aβcδe parses as aÎ²cÎ´e. On the surface this may seem like a corruption, but the important thing to note is that no bytes that are illegal to an 8-bit ASCII system appear in a UTF-8 bytestream, and so the same string can be read and written out again as raw bytes and then decoded later as the original UTF-8. With the exception of 7-bit text systems (a legacy email standard, unfortunately, for which the hideous UTF-7 had to be invented), UTF-8 should be able to pass through ASCII systems unscathed.
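To make the pass-through property concrete, here is a small sketch of a routine that knows nothing about UTF-8 and only touches the ASCII bytes a to z. The names `ascii_to_upper` and `check_pass_through` are hypothetical. Because every byte of a multi-byte character has its high bit set, none of them can fall in that range.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Upper-cases ASCII letters byte by byte; every other byte is copied as-is.
std::string ascii_to_upper(std::string bytes) {
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b >= 'a' && b <= 'z') {
            bytes[i] = static_cast<char>(b - 'a' + 'A');
        }
    }
    return bytes;
}

// Runs the routine over the UTF-8 bytes of the sample Greek/Latin sequence.
void check_pass_through() {
    // The literal is split so that 'c' and 'e' are not read as hex digits.
    const std::string utf8 = "a\xCE\xB2" "c\xCE\xB4" "e";
    const std::string upper = ascii_to_upper(utf8);
    assert(upper.size() == utf8.size());           // no bytes added or lost
    assert(upper == "A\xCE\xB2" "C\xCE\xB4" "E");  // Greek bytes untouched
}
```

The Latin letters change case while the two-byte sequences for β and δ come out exactly as they went in, so the result still decodes as valid UTF-8.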
Exception safety
June 9, 2008
Never allow an exception to escape from a destructor or from an overloaded operator delete() or operator delete[](); write every destructor and deallocation function as though it had an exception specification of “throw()”.
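One way to follow this rule is to catch everything inside the destructor. The sketch below assumes a hypothetical `Connection` class whose `close()` step can fail. It uses `noexcept`, the C++11 successor to the “throw()” specification. The counter exists only so the example can show that the failure was absorbed.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource whose cleanup step may throw.
class Connection {
public:
    Connection(bool fail_on_close, int* swallowed)
        : open_(true), fail_on_close_(fail_on_close), swallowed_(swallowed) {}

    // Callers who care about failures call close() themselves and handle them.
    void close() {
        if (!open_) return;
        open_ = false;
        if (fail_on_close_) throw std::runtime_error("close failed");
    }

    // The destructor absorbs any failure instead of letting it escape.
    ~Connection() noexcept {
        try {
            close();
        } catch (...) {
            ++*swallowed_;  // record the failure; never rethrow from here
        }
    }

private:
    bool open_;
    bool fail_on_close_;
    int* swallowed_;
};

// Destroys one failing Connection and reports how many failures were absorbed.
int count_swallowed_failures() {
    int swallowed = 0;
    {
        Connection c(true, &swallowed);
    }  // c is destroyed here; the exception from close() stays inside ~Connection
    assert(swallowed == 1);
    return swallowed;
}
```

Keeping a public `close()` alongside the destructor is a design choice: code that needs to react to a failed close can call it explicitly, while the destructor remains a safe fallback.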
In each function, take all the code that might emit an exception and do all that work safely off to the side. Only then, when you know that the real work has succeeded, should you modify the program state (and clean up) using only non-throwing operations.
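A common way to apply this guideline is to build the new state in a temporary and commit it with a swap. The sketch below uses a hypothetical `NameList` class and relies on `std::vector::swap` not throwing, which holds for the default allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container of validated names.
class NameList {
public:
    // Replaces the contents with `candidates`, or throws and changes nothing.
    void assign(const std::vector<std::string>& candidates) {
        // Work off to the side: anything here may throw without harm.
        std::vector<std::string> temp;
        temp.reserve(candidates.size());
        for (std::size_t i = 0; i < candidates.size(); ++i) {
            if (candidates[i].empty()) throw std::invalid_argument("empty name");
            temp.push_back(candidates[i]);
        }
        // Commit: swap does not throw, so the state changes all at once.
        names_.swap(temp);
    }

    std::size_t size() const { return names_.size(); }

private:
    std::vector<std::string> names_;
};

// Shows that a failed assign() leaves the previous contents in place.
void check_all_or_nothing() {
    NameList list;
    std::vector<std::string> good;
    good.push_back("ada");
    good.push_back("bjarne");
    list.assign(good);
    assert(list.size() == 2);

    std::vector<std::string> bad;
    bad.push_back("grace");
    bad.push_back("");  // invalid entry triggers the exception
    bool threw = false;
    try {
        list.assign(bad);
    } catch (const std::invalid_argument&) {
        threw = true;
    }
    assert(threw);
    assert(list.size() == 2);  // still the two names from the first call
}
```

The old contents end up in `temp` after the swap and are released when `assign()` returns, which is the "clean up" step the guideline mentions.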