Wednesday, July 23, 2008

C++ Looping Beauty

(TODO: categorize posts)

/* This is for any programming folks out there! For the record, I realize that very few people reading this will understand more than 2% of this. So I won't be offended if you stop reading right here. I promise. Sometimes I'll write about programming in non-programming lingo, but this is very lingo-infested. */

Every Tuesday at Chief Architect we developers have a development talk (consisting of c++ libraries or Chief program architecture or design patters) over lunch (consisting of Qdoba, Pizza Hut, or a local deli). Earlier this summer, one of our developers gave a talk on high-level coding practices in C++, and made a couple shocking statement in particular.

1) Don't write loops.
2) Don't write conditionals

Loops and conditionals, of course, are two fundamental control structues and the buildling blocks of all computer programs since the 1960s or so. So of course it's not really possible to avoid them, and the shocking claims are better rewritten as follows:

1) Favor standard algorithms over manual loops.
2) Use polymorphism instead of conditionals.

Rule #1-- in C++, this often means using the for_each standard algorithm with functors. Suppose you have a list of images that you want to generate thumbnails of. The manual method, and the first that comes to mind for most programmers, looks something ilke this:

// Method #1
for (int i=0; i < images.size(), ++i) {
Bitmap thumb = shrinkImage(images[i]);
saveThumbnail(thumb);
}

This works fine, but has at least one major downside. Suppose your "images" container is a list, which is good because you can add and remove from lists in constant time. However, to access the i'th element of a list, you have to traverse through each element in the list from the beginning to element i--no big deal if you're at say element 5, big problem if you're at element 10,000. So--under the hood--the first time through the loop you just grab element 0; the second time, you step over #0 to get #1; the third time, you step over #0 and #1 to get #2; the hundredth time, you step over #0 and #1 and all the way to #98 to get #99. (This is the same whether you're in C++ or Java or C# or Lisp or Basic or whatever, whenever you're working with a linked list data type.) Now, in C++ you can use iterators to solve this really quick:

// Method #2
for (list::iterator i = images.begin(); i != images.end(); ++i) {
Bitmap thumb = copyAndShrink(*i);
saveThumbnail(thumb);
}

This time, by dereferencing the iterator (*i) you can work with items of the list as you're walking through it, which is phemonenally faster than before. But this is still a manual loop, and Rule #1 is about avoiding manual loops, like so:

// Method #3
void generateThumbnail(Bitmap& image)
{
Bitmap thumb = copyAndShrink(image);
saveThumbnail(thumb);
}
...
std::for_each(images.begin(), images.end(), generateThumbnail)

THIS is what the rule's all about. Put your looping code into its own function or function object--functor (see example below). Why? For one thing, in Method #2, you had to worry about what kind of iterator you're using, which in that case was a list iterator. When you're using for_each, you can change your "images" datatype from a list to a vector to a raw array to a deque, or any container with a .begin() and a .end() method, without having to change your loop a bit.

This guideline doesn't just apply to for_each (which means, do something to each item in a container). The C++ Standard Template Library has a whole slew of functions that can replace manual loops: find, find_if, remove, remove_if, reverse, etc etc. These functions are written by people whose entire job is to make the most efficient algorithm mathematicaly possible, for whatever kind of iterator you're working with (which means whatever kind of container you're working with). And this is why it's better to use a library function rather than a loop whenever possible.

Now if you've actually understood all this, and unless you've used the STL a lot, you're probably thinking that there's no way this can replace ALL loop cases--which is (for pratical purposes) true. But it works well for a heck of a lot more cases than I originally thought. For instance, suppose your thumbnail loop needs to make thumbnail objects and place them in another collection, say a list. That is, you want to do this:

// Manual loop (a C++ vector is a commonly used container, not a vector like in physics)
vector images = loadImages();
vector thumbnails;
for(vector::iterator i = images.begin(); i != images.end(); ++i) {
Bitmap thumb = copyAndShrink(*i);
thumbnails.push_back();
}

In this case, you can't just write an ordinary to do what you want in the loop, because the function will know nothing about the vector "thumbnails." Soooo you can make a function object--functor--and pass it a reference to thumbnails in its constructor. Then you "call" this function object just like you would an ordinary function, with operator() like so.

// Super cool and effecient loop
class LoadThumbnails : std::unary_function // necessary to use in a STL loop for typedefs
{
public:
LoadThumbnails(vector& thumbsList) : mThumbsList(thumbsList) {}
void operator()(Bitmap& bmp)
{
Bitmap thumb = copyAndShrink(bmp);
mThumbsList.push_back(thumb);
}
private:
vector& mThumbsList;
}
...
vector images = loadImages();
vector thumbnails;
std::for_each(images.begin(), images.end(), LoadThumbnails(thumbnails));

I'll talk about Rule #2 later. But I'll leave with just a couple quick observations (this is so much fun). (I know, I'm a nerd.) First, in the manual loops I used ++i instead of i++, because the pre-increment is one processor cycle faster than post-increment. (Post-increment copies the stored value, increments the stored value, and returns the copied value. Pre-increment increments the stored value and returns the stored value.) Second, these collections of Bitmaps should really just store pointers to bitmaps, because bitmaps are large and you don't want to keep copying them around in memory (which STL containers will do). If you're in a garbage-collected language like Java or C# then you're always got pointers. In C++ you have to keep track of who owns the memory to know who's responsible for deleting it... OR you could use a fancy smart pointer (like a boost::shared_ptr) which manages the memory by keeping a reference count, and every time its destructor is hit, it decrements the reference count, and when the reference count hits zero, voila! Deallocate the memory.

1 comment:

Kyle Ryan said...

For some reason blogger didn't like my spacing... there should of course be nice indenting for each new bracket { } combination...