MFC Programmer's SourceBook : Thinking in C++
Bruce Eckel's Thinking in C++, 2nd Ed

Elimination of the definition block

In C (before the C99 standard), you must define all the variables at the beginning of a block, after the opening brace. This is not an uncommon requirement in programming languages, and the reason given has always been that it’s “good programming style.” On this point, I have my suspicions. It has always seemed inconvenient to me, as a programmer, to pop back to the beginning of a block every time I need a new variable. I also find code more readable when the variable definition is close to its point of use.

Perhaps these arguments are stylistic. In C++, however, there’s a significant problem in being forced to define all objects at the beginning of a scope. If a constructor exists, it must be called when the object is created. However, if the constructor takes one or more initialization arguments, how do you know you will have that initialization information at the beginning of a scope? In the general programming situation, you won’t. Because C has no concept of private, this separation of definition and initialization is no problem. However, C++ guarantees that when an object is created, it is simultaneously initialized. This ensures you will have no uninitialized objects running around in your system. C doesn’t care; in fact, C encourages this practice by requiring you to define variables at the beginning of a block before you necessarily have the initialization information.

Generally C++ will not allow you to create an object before you have the initialization information for the constructor, so you don’t have to define variables at the beginning of a scope. In fact, the style of the language would seem to encourage the definition of an object as close to its point of use as possible. In C++, any rule that applies to an “object” automatically refers to an object of a built-in type, as well. This means that any class object or variable of a built-in type can also be defined at any point in a scope. It also means that you can wait until you have the information for a variable before defining it, so you can always define and initialize at the same time:

//: C06:DefineInitialize.cpp
// Defining variables anywhere
#include <cstdio>
#include <cstdlib>
#include "../require.h"
using namespace std;

class G {
  int i;
public:
  G(int ii);
};

G::G(int ii) { i = ii; }

int main() {
  #define SZ 100
  char buf[SZ];
  printf("initialization value? ");
  // fgets() replaces gets(), which is unsafe and was removed in C++14
  char* retval = fgets(buf, SZ, stdin);
  require(retval != 0);
  int x = atoi(buf);
  int y = x + 3;
  G g(y);
} ///:~ 

You can see that buf is defined, then some code is executed, then x is defined and initialized using a function call, then y and g are defined. Traditional C (before C99), of course, would not allow a variable to be defined anywhere except at the beginning of the scope.

Generally, you should define variables as close to their point of use as possible, and always initialize them when they are defined. (This is a stylistic suggestion for built-in types, where initialization is optional.) This is a safety issue. By reducing the duration of the variable’s availability within the scope, you are reducing the chance it will be misused in some other part of the scope. In addition, readability is improved because the reader doesn’t have to jump back and forth to the beginning of the scope to know the type of a variable.

for loops

In C++, you will often see a for loop counter defined right inside the for expression:

for(int j = 0; j < 100; j++) {
    printf("j = %d\n", j);
}
for(int i = 0; i < 100; i++)
    printf("i = %d\n", i);

These two loops are important special cases that often confuse new C++ programmers.

The variables i and j are defined directly inside the for expression (which you cannot do in C). They are then available for use in the for loop. It’s a very convenient syntax because the context removes all question about the purpose of i and j, so you don’t need to use such ungainly names as i_loop_counter for clarity.

The problem is the lifetime of the variables, which was formerly determined by the enclosing scope. That earlier rule reflected a compiler writer’s view of what is logical, whereas as a programmer you obviously intend i to be used only inside the statement(s) of the for loop. Unfortunately, if you took this approach under the old rule and wrote

for(int i = 0; i < 100; i++)
    printf("i = %d\n", i);
// ....
for(int i = 0; i < 100; i++){
    printf("i = %d\n", i);
}

(with or without curly braces) within the same scope, compilers written for the old specification gave you a multiple-definition error for i. The Standard C++ specification says that a loop counter defined within the control expression of a for loop is visible only until the end of the statement the loop controls, so the above statements will work with a conforming compiler. Watch out, though, for local variables that hide variables in the enclosing scope.

I find small scopes an indicator of good design. If you have several pages for a single function, perhaps you’re trying to do too much with that function. More granular functions are not only more useful, they also make bugs easier to find.

Storage allocation

A variable can now be defined at any point in a scope, so it might seem initially that the storage for a variable is not allocated until its point of definition. It’s more likely that the compiler will follow the practice in C of allocating all the storage for a block at the opening brace of that block. It doesn’t matter because, as a programmer, you can’t access the storage (a.k.a. the object) until it has been defined. Although the storage is allocated at the beginning of the block, the constructor call doesn’t happen until the sequence point where the object is defined because the identifier isn’t available until then. The compiler even checks to make sure you don’t put the object definition (and thus the constructor call) where execution only conditionally passes through it, such as in a switch statement or somewhere a goto can jump past it. Uncommenting the statements in the following code will generate a warning or an error:

//: C06:Nojump.cpp {O}
// Can't jump past constructors

class X {
public:
  X() {}
};

void f(int i) {
  if(i < 10) {
   //! goto jump1; // Error: goto bypasses init
  }
  X x1;  // Constructor called here
 jump1:
  switch(i) {
    case 1 :
      X x2;  // Constructor called here
      break;
  //! case 2 : // Error: case bypasses init
      X x3;  // Constructor called here
      break;
  }
} ///:~ 

In the above code, both the goto and the switch can potentially jump past the sequence point where a constructor is called. That object will then be in scope even if the constructor hasn’t been called, so the compiler gives an error message. This once again guarantees that an object cannot be created unless it is also initialized.

All the storage allocation discussed here happens, of course, on the stack. The storage is allocated by the compiler by moving the stack pointer “down” (a relative term, which may indicate an increase or decrease of the actual stack pointer value, depending on your machine). Objects can also be allocated on the heap, but that’s the subject of Chapter 11.

© Copyright 1997-1999 CodeGuru