C++: Deciphering Pointer Declarations
July 30, 2008
The right-left rule is a simple rule that allows you to interpret any declaration. It runs as follows:
Start reading the declaration from the innermost parentheses, go right, and then go left. When you encounter parentheses, the direction should be reversed. Once everything in the parentheses has been parsed, jump out of it. Continue till the whole declaration has been parsed.
One small change to the right-left rule: When you start reading the declaration for the first time, you have to start from the identifier, and not the innermost parentheses.
Take the example given in the introduction:
int * (* (*fp1) (int) ) [10];
This can be interpreted as follows:
- Start from the variable name: fp1
- Nothing to the right but ), so go left and find *: "is a pointer"
- Jump out of the parentheses and encounter (int): "to a function that takes an int as argument"
- Go left, find *: "and returns a pointer"
- Jump out of the parentheses, go right and hit [10]: "to an array of 10"
- Go left, find *: "pointers to"
- Go left again, find int: "ints."
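One way to check a reading produced by the right-left rule is to rebuild the type from small aliases and let the compiler compare the two. The sketch below assumes a C++11 compiler; the alias names are made up for illustration.

```cpp
#include <type_traits>

// The declaration from the text.
int * (* (*fp1) (int) ) [10];

// The same type, assembled from the inside of the English reading outwards.
// All alias names are hypothetical.
using IntPtr        = int*;                 // pointer to int
using ArrayOfIntPtr = IntPtr[10];           // array of 10 pointers to int
using FuncType      = ArrayOfIntPtr* (int); // function taking int, returning pointer to that array
using Fp1Type       = FuncType*;            // pointer to such a function

// Compiles only if the hand-built type matches the original declaration.
static_assert(std::is_same<decltype(fp1), Fp1Type>::value,
              "fp1: pointer to function(int) returning pointer to array of 10 int*");
```

If the reading were wrong, the static_assert would stop the build, which makes this a cheap sanity check for any declaration you have decoded by hand.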
Here’s another example:
int *( *( *arr[5])())();
- Start from the variable name: arr
- Go right, find the array subscript: "is an array of 5"
- Go left, find *: "pointers"
- Jump out of the parentheses, go right and find (): "to functions"
- Go left, encounter *: "that return pointers"
- Jump out, go right, find (): "to functions"
- Go left, find *: "that return pointers"
- Continue left, find int: "to ints."
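The second declaration can be checked the same way. This is a sketch assuming C++11; the alias names are hypothetical and exist only to mirror the English reading.

```cpp
#include <type_traits>

// The declaration from the text.
int *( *( *arr[5])())();

// Hypothetical aliases that spell out the reading step by step.
using InnerFunc = int* ();         // function returning pointer to int
using OuterFunc = InnerFunc* ();   // function returning pointer to InnerFunc
using ArrType   = OuterFunc* [5];  // array of 5 pointers to OuterFunc

// Compiles only if the aliases describe the same type as the declaration.
static_assert(std::is_same<decltype(arr), ArrType>::value,
              "arr: array of 5 pointers to functions returning pointers to functions returning int*");
```

In real code the alias version is usually the one worth keeping, since each line can be read on its own.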
Continued reading at: http://www.codeproject.com/KB/cpp/complex_declarations.aspx#right_left_rule
More: http://www.codeproject.com/KB/cpp/PointerArticle.aspx#11
Network Architectural Model
July 25, 2008
The TCP/IP model is basically a shorter version of the OSI model. It consists of four layers instead of seven. Despite their architectural differences, both models have interchangeable transport and network layers, and their operation is based upon packet-switched technology. The list below indicates how the TCP/IP layers relate to the OSI layers:
- Application Layer: The Application layer deals with representation, encoding and dialog control issues. All these issues are combined into a single layer in the TCP/IP model, whereas three distinct layers are defined in the OSI model.
- Host-to-Host: The Host-to-Host layer in the TCP/IP model provides more or less the same services as its equivalent, the Transport layer, in the OSI model. Its responsibilities include application data segmentation, transmission reliability, and flow and error control.
- Internet: Again, the Internet layer in the TCP/IP model provides the same services as the OSI's Network layer. Their purpose is to route packets to their destination independent of the path taken.
- Network Access: The network access layer deals with all the physical issues concerning data termination on network media. It includes all the concepts of the data link and physical layers of the OSI model for both LAN and WAN media.
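The layer correspondence described above can be written down as a small lookup. This is only an illustrative sketch: the enum and function names are invented here and do not come from any networking API.

```cpp
// Hypothetical enums; OSI layers are numbered 1 (Physical) to 7 (Application).
enum class OsiLayer {
    Physical = 1, DataLink, Network, Transport, Session, Presentation, Application
};
enum class TcpIpLayer { NetworkAccess, Internet, HostToHost, Application };

// Maps an OSI layer to the TCP/IP layer that covers its responsibilities.
constexpr TcpIpLayer toTcpIp(OsiLayer layer) {
    return layer <= OsiLayer::DataLink   ? TcpIpLayer::NetworkAccess  // Physical + Data Link
         : layer == OsiLayer::Network    ? TcpIpLayer::Internet
         : layer == OsiLayer::Transport  ? TcpIpLayer::HostToHost
         : TcpIpLayer::Application;      // Session, Presentation, Application
}

static_assert(toTcpIp(OsiLayer::Presentation) == TcpIpLayer::Application,
              "the three upper OSI layers collapse into one TCP/IP layer");
```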
For OSI Model:
Application Layer
- Serves as a window for applications to access network services.
- Handles general network access, flow control and error recovery.
Presentation Layer
- Determines the format used to exchange data among the networked computers.
- Translates data from the Application layer's format into an intermediate format.
- Responsible for protocol conversion, data translation, data encryption, data compression, character conversion, and graphics expansion.
- Redirector operates at this level.
Session Layer
- Allows two applications running on different computers to establish, use, and end a connection called a session.
- Performs name recognition and security.
- Provides synchronization by placing checkpoints in the data stream.
- Implements dialog control between communicating processes.
Transport Layer
- Responsible for packet creation.
- Provides an additional connection level beneath the Session layer.
- Ensures that packets are delivered error free, in sequence with no losses or duplications.
- Unpacks, reassembles and sends receipt of messages at the receiving end.
- Provides flow control, error handling, and solves transmission problems.
Network Layer
- Responsible for addressing messages and translating logical addresses and names into physical addresses.
- Determines the route from the source to the destination computer.
- Manages traffic such as packet switching, routing and controlling the congestion of data.
Data Link Layer
- Sends data frames from the Network layer to the Physical layer.
- Packages raw bits into frames for the Network layer at the receiving end.
- Responsible for providing error free transmission of frames through the Physical layer.
Physical Layer
- Transmits the unstructured raw bit stream over a physical medium.
- Relates the electrical, optical, mechanical and functional interfaces to the cable.
- Defines how the cable is attached to the network adapter card.
- Defines data encoding and bit synchronization.
Nagle’s algorithm
July 24, 2008
Nagle’s algorithm, named after John Nagle, is a means of improving the efficiency of TCP/IP networks by reducing the number of packets that need to be sent over the network.
Nagle’s document, Congestion Control in IP/TCP Internetworks (RFC 896), describes what he called the “small packet problem”, where an application repeatedly emits data in small chunks, frequently only 1 byte in size. Since TCP packets have a 40 byte header (20 bytes for TCP, 20 bytes for IPv4), this results in a 41 byte packet for 1 byte of useful information, a huge overhead. This situation often occurs in Telnet sessions, where most keypresses generate a single byte of data which is transmitted immediately. Worse, over slow links, many such packets can be in transit at the same time, potentially leading to congestion collapse.
Nagle’s algorithm works by coalescing a number of small outgoing messages, and sending them all at once. Specifically, as long as there is a sent packet for which the sender has received no acknowledgment, the sender should keep buffering its output until it has a full packet’s worth of output, so that output can be sent all at once.
Algorithm
if there is new data to send
    if the window size >= MSS and available data is >= MSS
        send complete MSS segment now
    else
        if there is unconfirmed data still in the pipe
            enqueue data in the buffer until an acknowledge is received
        else
            send data immediately
        end if
    end if
end if
where MSS = Maximum segment size.
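The pseudocode above can be restated as a single decision function. This is a minimal sketch for illustration, not code from any TCP stack: the names NagleAction and nagleDecision are invented here, and the function assumes it is only called when there is new data to send.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for the three outcomes in the pseudocode.
enum class NagleAction { SendFullSegment, Buffer, SendNow };

// Decides what to do with pending output, following the pseudocode branch by branch.
// Precondition (an assumption of this sketch): availableData > 0 and mss > 0.
NagleAction nagleDecision(std::size_t windowSize, std::size_t availableData,
                          std::size_t mss, bool unackedDataInFlight) {
    assert(mss > 0 && availableData > 0);
    if (windowSize >= mss && availableData >= mss) {
        return NagleAction::SendFullSegment;  // a full segment never waits
    }
    if (unackedDataInFlight) {
        return NagleAction::Buffer;           // hold small data until the ACK arrives
    }
    return NagleAction::SendNow;              // idle connection: send immediately
}
```

For example, with an MSS of 1460 bytes, a single keystroke typed while an earlier packet is still unacknowledged yields NagleAction::Buffer, while the same keystroke on an idle connection yields NagleAction::SendNow.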
This algorithm interacts badly with TCP delayed acknowledgments, a feature introduced into TCP at roughly the same time in the early 1980s, but by a different group. With both algorithms enabled, applications which do two successive writes to a TCP connection, followed by a read, experience a constant delay of up to 500 milliseconds, the “ACK delay”. For this reason, TCP implementations usually provide applications with an interface to disable the Nagle algorithm. This is typically called the TCP_NODELAY option. The first major application to run into this problem was the X Window System.
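On POSIX systems the TCP_NODELAY option mentioned above is set with setsockopt. The sketch below assumes a POSIX socket API (the headers differ on Windows); the helper names disableNagle and nagleDisabled are made up for this example.

```cpp
#include <cassert>
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_NODELAY
#include <sys/socket.h>   // socket, setsockopt, getsockopt
#include <unistd.h>       // close

// Hypothetical helper: turns Nagle's algorithm off for one TCP socket.
// Returns true if the option was set.
bool disableNagle(int fd) {
    assert(fd >= 0);
    int flag = 1;  // non-zero enables TCP_NODELAY, i.e. disables Nagle
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof flag) == 0;
}

// Hypothetical helper: reports whether TCP_NODELAY is currently set.
bool nagleDisabled(int fd) {
    assert(fd >= 0);
    int flag = 0;
    socklen_t len = sizeof flag;
    return getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) == 0 && flag != 0;
}
```

A typical caller would create the socket with socket(AF_INET, SOCK_STREAM, 0), call disableNagle(fd) before the first write, and close the descriptor when done.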
The tinygram problem and silly window syndrome are sometimes confused. The tinygram problem occurs when the window is almost empty. Silly window syndrome occurs when the window is almost full.
(From Wikipedia)
C++’s Memory Areas
June 30, 2008
The following table summarizes a C++ program’s major distinct memory areas. Note that some of the names (for example, “heap”) do not appear as such in the standard; in particular, “heap” and “free store” are common and convenient shorthands for distinguishing between two kinds of dynamically allocated memory.
| Memory Area | Characteristics and Object Lifetimes |
|---|---|
| Const Data | The const data area stores string literals and other data whose values are known at compile-time. No objects of class type can exist in this area. All data in this area is available during the entire lifetime of the program. Further, all this data is read-only, and the results of trying to modify it are undefined. This is in part because even the underlying storage format is subject to arbitrary optimization by the implementation. For example, a particular compiler may choose to store string literals in overlapping objects as an optional optimization. |
| Stack | The stack stores automatic variables. Objects are constructed immediately at the point of definition and destroyed immediately at the end of the same scope, so there is no opportunity for programmers to directly manipulate allocated but uninitialized stack space (barring willful tampering using explicit destructors and placement new). Stack memory allocation is typically much faster than for dynamic storage (heap or free store) because each stack memory allocation involves only a stack pointer increment rather than more-complex management. |
| Free Store | The free store is one of the two dynamic memory areas allocated/freed by new/delete. Object lifetime can be less than the time the storage is allocated. That is, free store objects can have memory allocated, without being immediately initialized, and they can be destroyed, without the memory being immediately deallocated. During the period when the storage is allocated but outside the object’s lifetime, the storage may be accessed and manipulated through a void*, but none of the proto-object’s nonstatic members or member functions may be accessed, have their addresses taken, or be otherwise manipulated. |
| Heap | The heap is the other dynamic memory area, allocated/freed by malloc()/free() and their variants. Note that while the default global operators new and delete might be implemented in terms of malloc() and free() by a particular compiler, the heap is not the same as free store, and memory allocated in one area cannot be safely deallocated in the other. Memory allocated from the heap can be used for objects of class type by placement new construction and explicit destruction. If so used, the notes about free store object lifetime apply similarly here. |
| Global/Static | Global or static variables and objects have their storage allocated at program startup, but may not be initialized until after the program has begun executing. For instance, a static variable in a function is initialized only the first time program execution passes through its definition. The order of initialization of global variables across translation units is not defined, and special care is needed to manage dependencies between global objects (including class statics). As always, uninitialized proto-objects’ storage may be accessed and manipulated through a void*, but no nonstatic members or member functions may be used or referenced outside the object’s actual lifetime. |
It’s important to distinguish between the “heap” and the “free store,” because the standard deliberately leaves unspecified the question of whether these two areas are related. For example, when memory is deallocated via ::operator delete(), the final note in section 18.4.1.1 of the C++ standard states:
“It is unspecified under what conditions part or all of such reclaimed storage is allocated by a subsequent call to operator new or any of calloc, malloc, or realloc, declared in <cstdlib>.”
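To make the table concrete, here is a small sketch that touches each area once. It assumes C++11; the names kGreeting, g_calls and memoryAreasDemo are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new
#include <string>

const char* const kGreeting = "hello";  // the string literal lives in const data
static int g_calls = 0;                 // global/static storage

// Builds one std::string per area and returns the sum of their lengths.
std::size_t memoryAreasDemo() {
    std::string onStack(kGreeting);                          // stack: automatic variable

    std::string* onFreeStore = new std::string(kGreeting);   // free store: new/delete

    void* raw = std::malloc(sizeof(std::string));            // heap: raw storage only
    assert(raw != nullptr);
    std::string* onHeap = new (raw) std::string(kGreeting);  // object lifetime starts here

    std::size_t total = onStack.size() + onFreeStore->size() + onHeap->size();
    ++g_calls;

    onHeap->~basic_string();  // lifetime ends, storage is still allocated
    std::free(raw);           // heap memory goes back through free()
    delete onFreeStore;       // free store memory goes back through delete
    return total;
}
```

Note how each allocation is released by the function that matches it: free() for malloc(), delete for new, and an explicit destructor call for the placement-new object.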
UTF-8 and Extended characters
June 12, 2008
| Character Range (hex) | Unicode (UCS-2/UTF-16) | UTF-8 |
|---|---|---|
| 0-7F | 00000000 0xxxxxxx | 0xxxxxxx |
| 80-7FF | 00000xxx xxxxxxxx | 110xxxxx 10xxxxxx |
| 800-FFFF | xxxxxxxx xxxxxxxx | 1110xxxx 10xxxxxx 10xxxxxx |
| 10000-1FFFFF | - out of range - | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
| 200000-3FFFFFF | - out of range - | 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
| 4000000-7FFFFFFF | - out of range - | 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx |
Note that all bytes of multi-byte UTF-8 characters have the high-bit set to one, and only the first byte of a multi-byte character has both its highest bits set. This means there can never be confusion about where a character starts. So in UTF-8, the combined Greek and Latin sequence aβcδe is represented by the following seven bytes, and looking at the high bits you can pick out the extended characters without too much trouble:
01100001 11001110 10110010 01100011 11001110 10110100 01100101
Now the really clever bit about UTF-8 is that it is capable of passing unharmed through ASCII-only systems [programs which don’t even recognize UTF-8], thanks to the fact that each character beyond U+007F looks like a valid sequence of extended ASCII when read as a byte-per-character. This is in stark contrast to other Unicode encodings such as UCS-2, which are full of zero bytes and therefore wreak havoc with ASCII processing systems. To a system that reads one byte per character (as Latin-1, for example), the UTF-8 representation of aβcδe parses as aÎ²cÎ´e. On the surface this may seem like a corruption, but the important thing to note is that no illegal extended-ASCII characters appear in a UTF-8 bytestream, and so the same string can be read and written out again as raw bytes and then decoded later as the original UTF-8. With the exception of 7-bit text systems [a legacy email standard, unfortunately, for which the hideous UTF-7 had to be invented], UTF-8 should be able to pass through ASCII systems unscathed.
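The bit patterns in the table translate directly into shifts and masks. The following is a minimal sketch of an encoder for the first four rows (up to four bytes); the function name encodeUtf8 is made up here, and the code does no validation beyond a range check.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a single code point as UTF-8 bytes.
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {                 // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {         // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                          // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        assert(cp <= 0x1FFFFF);       // longer forms are not handled in this sketch
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Encoding U+03B2 (β) this way gives the two bytes 11001110 10110010, matching the second and third bytes of the sequence shown earlier.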
Exception safety
June 9, 2008
Never allow an exception to escape from a destructor or from an overloaded operator delete() or operator delete[](); write every destructor and deallocation function as though it had an exception specification of “throw()”.
In each function, take all the code that might emit an exception and do all that work safely off to the side. Only then, when you know that the real work has succeeded, should you modify the program state (and clean up) using only non-throwing operations.
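The "do the work off to the side, then commit" advice can be sketched as follows. The Roster class and its members are invented for illustration; the only library facts relied on are that copying a std::vector may throw and that swapping two vectors with the default allocator does not.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical class used only to illustrate the idiom.
class Roster {
public:
    void add(const std::string& name) { names_.push_back(name); }
    std::size_t size() const { return names_.size(); }

    // Appends every name, or none of them if any name is rejected.
    void appendAll(const std::vector<std::string>& incoming) {
        std::vector<std::string> temp(names_);  // work on a copy, off to the side
        for (const std::string& name : incoming) {
            if (name.empty()) throw std::invalid_argument("empty name");
            temp.push_back(name);               // may throw; names_ is untouched
        }
        names_.swap(temp);                      // commit with a non-throwing operation
    }

private:
    std::vector<std::string> names_;
};

// Shows that a failed appendAll leaves the roster exactly as it was.
void rosterDemo() {
    Roster r;
    r.add("ann");
    std::vector<std::string> bad;
    bad.push_back("bob");
    bad.push_back("");  // rejected halfway through
    try { r.appendAll(bad); } catch (const std::invalid_argument&) {}
    assert(r.size() == 1);
}
```

The cost of this approach is the extra copy; in exchange, callers never observe a half-updated object.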
C++: RAII with auto_ptr and shared_ptr
May 16, 2008
RAII: Resource Acquisition Is Initialization. The technique ties the acquisition and release of resources to the initialization and destruction of objects.
auto_ptr: a pointer-like object (a smart pointer) whose destructor automatically calls delete on what it points to. It’s important that there never be more than one auto_ptr pointing to an object, because an auto_ptr automatically deletes what it points to when the auto_ptr is destroyed. Copying an auto_ptr therefore transfers ownership and sets the source to null:
std::auto_ptr<Student> p1(new Student); // p1 points to a Student object
std::auto_ptr<Student> p2(p1); // p2 now points to the object, p1 is now null
p1 = p2; // p1 now points to the object, p2 is now null
shared_ptr: a reference-counting smart pointer that keeps track of how many objects point to a particular resource and automatically deletes the resource when nobody is pointing to it any longer. (This is like garbage collection, except that such pointers can’t break cycles of references, e.g., two otherwise unused objects that point to one another.)
std::tr1::shared_ptr<Student>p1(new Student); //p1 points to a Student object
std::tr1::shared_ptr<Student>p2(p1); //both p1 and p2 point to the object
p1 = p2; //same
Both auto_ptr and tr1::shared_ptr use delete in their destructors, not delete[]. Therefore, they should not be used with dynamically allocated arrays:
std::auto_ptr<std::string> aps(new std::string[10]); //bad
std::tr1::shared_ptr<int>spi(new int[1024]); //bad
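The same ownership rules can be observed at run time. Note the substitution: this sketch uses the C++11 types std::unique_ptr and std::shared_ptr instead of auto_ptr and tr1::shared_ptr, since those are what current compilers ship; the Student counter and the function name are invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical class that counts how many instances are alive.
struct Student {
    static int alive;
    Student()  { ++alive; }
    ~Student() { --alive; }
};
int Student::alive = 0;

void smartPointerDemo() {
    {
        std::unique_ptr<Student> p1(new Student);
        std::unique_ptr<Student> p2(std::move(p1));  // explicit transfer; p1 is now null
        assert(!p1 && p2);
        assert(Student::alive == 1);
    }                                                // p2 destroyed: the Student is deleted
    assert(Student::alive == 0);
    {
        std::shared_ptr<Student> s1(new Student);
        std::shared_ptr<Student> s2(s1);             // both point to the same object
        assert(s1.use_count() == 2);
        assert(Student::alive == 1);
    }                                                // last owner gone: the Student is deleted
    assert(Student::alive == 0);
    {
        std::unique_ptr<Student[]> group(new Student[3]);  // array form uses delete[]
        assert(Student::alive == 3);
    }
    assert(Student::alive == 0);
}
```

The array specialization std::unique_ptr<Student[]> addresses the delete versus delete[] mismatch described above; a std::vector is often the simpler choice.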
C++ placement new
May 15, 2008
Operator new allocates memory from the heap, on which an object is constructed. Standard C++ also supports the placement new operator, which constructs an object in a pre-allocated buffer. This is useful when building a memory pool, a garbage collector or simply when performance and exception safety are paramount (there’s no danger of allocation failure since the memory has already been allocated, and constructing an object in a pre-allocated buffer takes less time):
char *buf = new char[1000]; // pre-allocated buffer
std::string *p = new (buf) std::string("hi"); // placement new
std::string *q = new std::string("hi"); // ordinary heap allocation
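An object created with placement new must also be destroyed by hand, because delete would try to free memory that new never allocated. The sketch below shows the full life cycle, assuming C++11 for alignas; the function name is hypothetical.

```cpp
#include <cassert>
#include <new>       // placement new
#include <string>

// Hypothetical helper: constructs a std::string inside a local buffer,
// copies its value out, and destroys it again.
std::string roundTripThroughBuffer(const char* text) {
    assert(text != nullptr);
    // Pre-allocated storage with the size and alignment std::string needs.
    alignas(std::string) unsigned char buf[sizeof(std::string)];

    std::string* p = new (buf) std::string(text);  // construct only, no allocation of the object itself
    std::string copy = *p;
    p->~basic_string();  // explicit destructor call; never `delete p` here
    return copy;
}
```

Calling roundTripThroughBuffer("hi") returns a std::string equal to "hi", and the buffer itself disappears with the stack frame.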
C++ new cast forms
May 13, 2008
C++ offers four new cast forms (often called new-style or C++-style casts):
1. const_cast<T>(expr): used to cast away the constness of objects. It is the only C++-style cast that can do this.
2. dynamic_cast<T>(expr): used to perform “safe downcasting”, i.e., to determine whether an object is of a particular type in an inheritance hierarchy. It is the only cast that cannot be performed using the old-style syntax. It is also the only cast that may have a significant runtime cost.
3. reinterpret_cast<T>(expr): used for low-level casts that yield implementation-dependent (i.e., unportable) results, e.g., casting a pointer to an int. Such casts should be rare outside low-level code.
4. static_cast<T>(expr): used to force implicit conversions (e.g., non-const object to const object, int to double, etc.). It can also be used to perform the reverse of many such conversions (e.g., void* pointers to typed pointers, pointer-to-base to pointer-to-derived), though it cannot cast from const to non-const objects. (Only const_cast can do that.)
Prefer C++-style casts to old-style casts because they are easier to see and more specific about what they do.
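Here is one small use of each cast, as a sketch assuming C++11. The Shape, Circle and Square types and the function name castsDemo are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical polymorphic hierarchy; dynamic_cast needs a virtual function.
struct Shape  { virtual ~Shape() {} };
struct Circle : Shape { double radius = 1.5; };
struct Square : Shape {};

bool castsDemo() {
    // static_cast: force an ordinary conversion (int to double).
    int total = 7, count = 2;
    double mean = static_cast<double>(total) / count;

    // const_cast: remove constness; safe here because `value` itself is not const.
    int value = 10;
    const int* readOnly = &value;
    *const_cast<int*>(readOnly) = 11;

    // dynamic_cast: checked downcast, yields null when the type does not match.
    Circle c;
    Shape* s = &c;
    Circle* asCircle = dynamic_cast<Circle*>(s);
    Square* asSquare = dynamic_cast<Square*>(s);
    assert(asCircle != nullptr);

    // reinterpret_cast: pointer to integer and back, implementation-dependent.
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(s);
    Shape* back = reinterpret_cast<Shape*>(bits);

    return mean == 3.5 && value == 11 && asCircle->radius == 1.5 &&
           asSquare == nullptr && back == s;
}
```

Each cast names its intent, so a search for const_cast or reinterpret_cast finds exactly the risky spots in a code base.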
C/C++ Const-ness
May 13, 2008
const char* : non-const pointer to a const value
char const*: same as const char*
char* const: const pointer to a non-const value
const char* const: const pointer to a const value
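The four spellings can be verified with type traits: std::is_const looks at the pointer itself, and std::remove_pointer exposes the type being pointed to. The alias names below are made up; the sketch assumes C++11.

```cpp
#include <type_traits>

// Hypothetical aliases for the four declarations in the list.
using PtrToConst      = const char*;
using PtrToConstAlt   = char const*;
using ConstPtr        = char* const;
using ConstPtrToConst = const char* const;

// const char* and char const* are the same type.
static_assert(std::is_same<PtrToConst, PtrToConstAlt>::value, "same type, two spellings");

// const char*: the pointer can be reseated, the pointee cannot be modified.
static_assert(!std::is_const<PtrToConst>::value, "pointer is not const");
static_assert(std::is_const<std::remove_pointer<PtrToConst>::type>::value, "pointee is const");

// char* const: the pointer is fixed, the pointee can be modified.
static_assert(std::is_const<ConstPtr>::value, "pointer is const");
static_assert(!std::is_const<std::remove_pointer<ConstPtr>::type>::value, "pointee is not const");

// const char* const: neither can change.
static_assert(std::is_const<ConstPtrToConst>::value, "pointer is const");
static_assert(std::is_const<std::remove_pointer<ConstPtrToConst>::type>::value, "pointee is const");
```

A handy reading aid: const applies to whatever is immediately to its left, or to its right if nothing is on the left.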
