#078 - The non-modifying side-effect rule in STL algorithms (LWG 242)

The STL algorithms have a quietly subtle rule about the function objects you pass them. Read the standard literally and you’d think std::transform and friends require completely side-effect-free function objects — no modifying any variable, no I/O, no temporaries. Read the standard as it was actually fixed by Library Working Group issue 242, and the rule is far more sensible: the function object may have internal side effects all it wants, but it must not modify the sequences the algorithm is operating on, or invalidate iterators into them.

Knowing where the line really sits — and why an old draft of the standard accidentally drew it in the wrong place — is the difference between writing algorithm calls that are correct by construction and ones that happen to work because the implementation is forgiving.

Three things to take away:

LWG 242 narrowed the side-effect rule for transform, accumulate, inner_product, partial_sum, and adjacent_difference from “no side effects at all” to “no modification of the input/output ranges or their iterators.”
The function object may freely modify captures, locals, and external state; what it cannot do is reach into the algorithm’s own sequences.
The standard also permits the algorithm to copy your function object as it pleases, which means stateful function objects are fragile across calls — prefer pure functions when possible.

The original problem

The first edition of the C++ standard required the function object passed to numeric and transformation algorithms to be free of side effects. As Angelika Langer pointed out in 2000 when she opened LWG 242, that requirement quoted the language’s own definition of “side effect,” which read approximately:

Accessing an object designated by a volatile lvalue, modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects.

The standard’s definition is broad on purpose — it covers anything observable about program execution. Applied verbatim to a function object, it forbids almost everything useful:

// Under the original literal rule, all of these are non-conforming:
std::transform(v.begin(), v.end(), out.begin(),
    [](int x) {
        int tmp = x * 2;  // modifying a local — a "side effect"
        return tmp + 1;
    });

std::transform(v.begin(), v.end(), out.begin(),
    [counter = 0](int x) mutable {
        ++counter;        // modifying a capture — a "side effect"
        return x * counter;
    });

Both lambdas modify objects (tmp and counter respectively), both are technically “side effects” by the standard’s own definition, and both were therefore prohibited by the literal pre-242 reading. The first lambda is one of the simplest meaningful transformations imaginable. If the rule truly excluded it, half the textbook examples of transform would be non-conforming.

In practice no implementation enforced anything that strict — the algorithms worked with stateful and side-effecting function objects, and code relying on that behaviour shipped widely. LWG 242’s job was to fix the wording so the standard described what implementations actually did, and what users actually needed.

What LWG 242 changed

The committee replaced the blanket “no side effects” wording with a precise prohibition. For transform, the new rule reads (paraphrased from the issue’s proposed resolution):

In the ranges [first1, last1], [first2, first2 + (last1 - first1)], and [result, result + (last1 - first1)], op and binary_op shall neither modify elements nor invalidate iterators or subranges.

Three things changed at once:

The rule is scoped to the ranges the algorithm uses. The function object can modify anything it wants outside the input and output ranges — its own captures, global state, a logger, a counter, a temporary it computed.
The forbidden actions are named explicitly. Modifying elements of the algorithm’s own sequences is out; invalidating iterators into those sequences is out. Everything else is in.
The ranges are fully closed ([first, last], not [first, last)). A footnote in the issue makes this intentional: side effects must not modify the one-past-the-end element either, because that would invalidate last itself.

The same revision applies to accumulate, inner_product, partial_sum, and adjacent_difference — the five algorithms LWG 242 specifically addressed. The principle generalises to the rest of the algorithms library: a function object you pass to an algorithm must not interfere with the sequence the algorithm is walking over.

What this means in practice

The post-242 rule is effectively common sense: don’t reach into the algorithm’s own data while it’s running. Concretely, the following patterns are all fine:

// Fine: modifying a captured counter
int callCount = 0;
std::transform(v.begin(), v.end(), out.begin(),
    [&](int x) { ++callCount; return x * x; });

// Fine: writing to an unrelated stream
std::transform(v.begin(), v.end(), out.begin(),
    [](int x) { std::clog << "saw " << x << '\n'; return x + 1; });

// Fine: computing through temporaries
std::accumulate(v.begin(), v.end(), 0,
    [](int sum, int x) {
        int tmp = x * x;
        return sum + tmp;
    });

And the following pattern is the one to avoid:

// Not fine: the function object writes into the same range
//           the algorithm is iterating over.
std::transform(v.begin(), v.end(), v.begin(),  // input == output
    [&v](int x) {
        v.push_back(x);            // mutates v while iterating
        return x * 2;
    });

In-place transforms (v.begin() as both input and output) are allowed — transform is specified to handle that case — but only because the algorithm itself is performing the writes. Reaching around the algorithm to push into the same vector is a different thing entirely, and it invalidates the iterators the algorithm is using.

The LWG 242 rule covers the five named algorithms; a related defect report (LWG 475) clarified that for_each is not subject to this restriction — its function object may modify the elements it’s called with, through the dereferenced iterators. This is one of the few places where the algorithms diverge: for_each is for in-place mutation, transform is for producing a new sequence, and the rules around side effects reflect that intent.

A second, related rule: function objects may be copied

While we’re talking about what algorithms are allowed to do with your function object, there’s a second rule worth internalising: algorithms may copy the function object freely. A function object might be copied at the start of an algorithm, copied again partway through, called on a copy that the algorithm later discards, or invoked through a different copy than the one you passed in. The standard does not require any particular discipline about which copy is used.

The practical consequence is that stateful function objects are brittle:

struct Counter {
    int n = 0;
    bool operator()(int){ ++n; return true; }
};

Counter c;
std::count_if(v.begin(), v.end(), c);
// c.n may be unchanged — the algorithm may have called a copy.

If you need state to survive the call, capture by reference into a lambda, or store the state outside the function object entirely. Better still, prefer functions that depend only on their arguments — the algorithms compose more cleanly when their callable arguments are pure.

Takeaway

The pre-LWG-242 reading of the standard banned almost everything from algorithm function objects; the post-242 reading bans the one thing that genuinely matters — modifying the algorithm’s own sequence or invalidating its iterators. The function object is otherwise free to maintain captures, log, compute through temporaries, and have side effects on the rest of the program. Pair that rule with the one about algorithms being permitted to copy your function object freely, and the practical guidance is straightforward: write algorithm callables that depend only on their arguments where possible, keep external state external, and never reach back into the range the algorithm is currently walking.

The rule is simple: non-modifying doesn’t mean no side effects — it means don’t touch what the algorithm is using.