This one was dense with lessons, so I'll examine my first working version (and the thinking that went into it), and then go through some of the changes that turned that 216-byte program into a 109-byte program.
Rule 30 is a very simple one-dimensional cellular automaton, where successive values of the central cell (https://oeis.org/A051023) appear to work surprisingly well as a pseudorandom number generator. (That is, judged by the quality of its "randomness" relative to the complexity of the algorithm. Judged by speed, it seems to take O(n^2) time to produce n bits, which is not very practical.) I decided to output the sequence as a stream of raw bytes which can be redirected to a data file, each byte containing 8 bits of the sequence, with the first of the 8 in the most significant bit.
The Rule 30 automaton has a 3-cell neighborhood: each cell's value at a given tick depends on its value and those of its left and right neighbors at the previous tick, according to this table:
Previous tick values: 111 110 101 100 011 010 001 000
Current tick value:    0   0   0   1   1   1   1   0
It's called Rule 30 because 00011110 is the binary translation of (decimal) 30.
The assumed starting condition for this sequence is an infinite field of 0s with a single 1 cell in the middle, and it's that cell whose successive values form the sequence we're computing.
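As a reference for what the program must produce, here is a rough Python model (hypothetical helper names, not the brainfuck program's method; it just takes the definition above literally): run the automaton on a sparse row starting from a single 1, read off the central cell each tick, and pack the bits into bytes MSB-first.

```python
def rule30_central_bits(n):
    """First n values of the central cell of Rule 30, started from a
    single 1 in an infinite field of 0s (OEIS A051023)."""
    cells = {0: 1}                       # sparse row; absent cells are 0
    out = []
    for _ in range(n):
        out.append(cells.get(0, 0))      # central cell at this tick
        new = {}
        # The nonzero region grows by one cell per side per tick.
        for i in range(min(cells) - 1, max(cells) + 2):
            l, c, r = (cells.get(i + d, 0) for d in (-1, 0, 1))
            # Transition table packed into the rule number: 30 = 0b00011110.
            new[i] = (30 >> (l * 4 + c * 2 + r)) & 1
        cells = new
    return out

def pack_bytes(bits):
    """Pack bits into bytes MSB-first, as the program outputs them."""
    return bytes(
        int("".join(map(str, bits[i:i + 8])), 2)
        for i in range(0, len(bits) - 7, 8)
    )
```

The first eight central values come out as 1, 1, 0, 1, 1, 1, 0, 0, so the first byte the program should emit is 0b11011100.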
Right away I notice that, because 000 produces a 0, that infinite field of 0s will remain as 0s except where the influence of that one cell has propagated. I also notice that both 001 and 100 produce a 1, so that influence propagates both left and right at a rate of one cell per tick, disrupting the emptiness in favor of complexity.

Corollaries, for the purposes of implementation in brainfuck: after the nth tick, all the nonzero cells are confined to a region 2n+1 cells wide, and the contents of that region are all I need to store/map at a given time. Also, since that region grows to the left and right, whereas the standard brainfuck array only extends to the right, I need to move each cell of the automaton right one cell per tick to keep that region within the brainfuck array, or (equivalently) I need to change the automaton's defined neighborhood so each cell's value depends on its previous value and those of the two cells to its left, using the rule
Previous tick values: 111 110 101 100 011 010 001 000
Current tick value:    0   0   0   1   1   1   1   0
Either way, this means the region of disruption, as stored in the brainfuck array, grows to the right only, at a rate of two automaton cells per tick.
So, the memory layout: for ease and simplicity, I allocated one brainfuck cell per automaton cell, to hold the cell values, and a second brainfuck cell per automaton cell to (A) be nonzero to mark the part of the array I'm actually using, (B) serve as extra space for calculations, and (C) mark the middle cell I'm extracting successive values from. (This second cell is set to 1 for most automaton cells and 2 for the middle cell. And yes, this is similar to the T cells from fib.b.)
Also similar to fib.b: since the new value of each cell is dependent only on its previous value and those of cells to the left, I can do the update in place, one cell at a time, provided I do it from right to left. (I.e., I don't need to allocate extra space for all the updated values, calculate and store all the new values, and then swap them into the old values' places.) And incidentally, I can extract the old value of the center cell right before updating it.
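Under the assumptions above, that shifted, in-place scheme can be sketched in Python (hypothetical names; one list standing in for the brainfuck array): the row only grows rightward, the update runs right to left so each cell still sees old values to its left, and the center drifts right one cell per tick.

```python
def rule30_shifted(nticks):
    """Central-cell values using the left-shifted neighborhood: each
    cell's new value depends on its own old value and the two old values
    to its left, so the active region grows rightward only (by 2 cells
    per tick) and the update can be done in place, right to left."""
    row = [0, 0, 1]                 # two permanent 0s of padding, then the lone 1
    bits = []
    for t in range(nticks):
        bits.append(row[2 + t])     # the center has drifted right t cells
        row += [0, 0]               # region grows right by 2 per tick
        for j in range(len(row) - 1, 1, -1):   # right to left, in place
            l, c, r = row[j - 2], row[j - 1], row[j]
            row[j] = (30 >> (l * 4 + c * 2 + r)) & 1   # same table, 30 = 0b00011110
    return bits
```

Because cells are updated right to left, `row[j - 2]` and `row[j - 1]` still hold previous-tick values when `row[j]` is overwritten, so no scratch copy of the row is needed.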
Besides the values of the automaton cells, I need to keep the accumulated center bits that I'm grouping into a byte to output; and I need to keep a counter to let me know when each output byte is finished. I stash these at the left to keep them out of the way of the data growing to the right.
Thus the first working draft has the basic memory layout:
0 c n 0 0 0 t v t v ... t v 0 0 0 ...
where c is the bit counter, n is the partly completed byte to output later, v cells hold the values of the automaton cells, and t cells are nonzero when not being used as temporary variables.
Now, let's look at this draft first, and second-guess it later.
>++++++++>>>>>++>+<[
Initialization. I start the bit counter at 8 (number of bits left to extract before I output a byte). I start with t=2 and v=1, since the lone 1 cell is the center cell by definition. Then I start the main loop, executed once per tick; it starts at the rightmost nonzero t cell.
>>+>>+[
Here I lengthen the array by two automaton cells, by setting two more t cells as markers, and start the update loop that executes once per cell to compute its new value.
-[->>+<[<<<[<<]<<+>>>>[>>]+>-]<[>+<-]]
If this t cell was 1, I zero it and skip this loop. If it was 2, this is the (old) center cell, so I need to extract it; I still zero this t cell, and increase the next t cell to the right, to 2. (Because it will be the center cell next tick, and that t cell won't be messed with until then, since I update from right to left.)
Having updated the center marker, I then extract the value from the old center cell; if it's 1, I go back to the left and increment n, and also set this t cell to 1 before zeroing this v cell. If the center v cell was zero, I skip all that. Either way, I've moved the current value from the v cell to the t cell as well as adding it to the n cell, so I use another loop to move it back from t to v. Now I have the center bit extracted, the t cell zeroed, and the v cell restored.
<<<[>>>++++<<<-]>>[>++<-]>>[<+>-]
Now to update the current v cell, based on its current value and the two values to the left. First I combine all three old values into a single number from 0 to 7 (call it P; these numbers correspond to the "previous tick values" line of the automaton's transition table, treated as a 3-digit binary number). I put P in the zeroed t cell. Then I'll use a case statement to set all three to their correct values based on the combination in P (the left two get set to their old values, the right/current one to its new value). Note that I don't need to make separate copies of the (old) values of the left two cells: they are recoverable from P.
So, to make the combination P, the left-hand bit of the three gets multiplied by 4 and moved to t, because that bit is in the fours place of P (in binary). The middle bit is the twos place and the right-hand bit is the ones place. All three v cells are zeroed while combining them into P, which encodes all three.
<[>+<-[<+>-[-[<-<<+>>>-[>-<-[<+>-[-]]]]]]]+<<
This is the case statement that restores the v bit values based on P. Notice that, for instance, if P is 2, we will go into the outermost 2 loops but skip the 3rd and those inside it; not coincidentally, that 2nd loop leaves the three values at the right values for the 2 case, i.e. 0 1 1. "0 1" from the old values of 2 = 010, and the right-hand "1" the new value according to the automaton's transition table. That table again, for quick reference and comparison to the case statement:
Previous tick values: 111 110 101 100 011 010 001 000
(Decimal equiv in P:)   7   6   5   4   3   2   1   0
Current tick value:    0   0   0   1   1   1   1   0
(I.e. set values to:) 110 110 100 101 011 011 001 000
(Because I'm updating the right-hand cell at present; for the moment the left two stay the same as in the first row, so I restore them to those values, the values from the previous tick. They'll be updated on the next two passes through the update loop, and in that context they'll each BE the right-hand cell.)
Notice that this case statement would generalize easily to any automaton of this type (Rules 0-255). Each case would just have to set the right-hand value to the correct new value; I'd still update the left two in cases 2, 4, and 6 exactly as I do here. Rule 30 is a fairly concise one, since all 4 of those 1s are contiguous in the "current tick" line of the table; the case statement only needs to change the right-hand cell in cases 1 and 5, as all others have the same new-tick value as the previous case.
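That generalization can be stated compactly; this Python sketch (a hypothetical helper, not part of the program) computes, for any rule number, the triple the case statement must write back for each combined value P: the two old left bits recovered from P, plus the new right-hand bit taken from the rule number.

```python
def case_table(rule):
    """For each P = 4*left + 2*center + 1*right (the old values), the
    values to write back: left and center restored from P's bits, and
    the right-hand cell set to bit P of the rule number."""
    return {p: (p >> 2, (p >> 1) & 1, (rule >> p) & 1) for p in range(8)}
```

For rule 30, case_table(30)[0b010] gives (0, 1, 1), matching the 2 case in the "set values to" line above.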
Anyway, after restoring the left two v cells to their old values and setting the right-hand one to its new value, I restore the t cell to 1 and move on to the next t cell for the next iteration of the update loop, which will update the next v cell to the left.
Incidentally, in that memory map: 0 c n 0 0 0 t v t v ... t v 0 0 0 ...
Notice that these two zero cells:         ^ ^
...have their values checked when calculating new values for the leftmost two v cells (incidentally, always "1"); but I never calculate a new value for those zeroes themselves, as the update loop terminates before that. These two zeroes represent the zero cells just to the left of the "disrupted" range of cells whose values I update; if I were to calculate new values for them, they would always calculate as 000 -> 0, and these cells would still remain zero. (Plus I would have to have even more zero cells still farther left to look at during that calculation.)
]<+<<-[>>-<<[<+>-]]<[>+<-]
When the update loop exits, all the "disrupted" cells have been updated to their status at the next tick, and I've extracted the old middle bit and put it in n. So the next question is whether it's time to output n yet. First, I tentatively and temporarily set the fourth cell to 1 as a flag. Then I decrement the bit counter c, and use a loop to check if that brings it down to zero. If so, I skip that loop; but if c is still 1-7, I zero the flag and move c left to the first cell. Then, whether c is zero or not, I copy it from the first cell back to the second. After this line, the fourth cell flag is nonzero if and only if c is zero.
>>>[<.[-]<++++++++>>-]<[>+<-]>[<++>-]>>>[>>]<<
Now I use that flag to control a loop: if c has been reduced to 0, that means I've gathered 8 bits in n; and in that case the flag will be 1. So I output n, zero it (after it's output, the user has it and I don't need to store it internally), and restore c to 8 in order to count the next 8 bits. (Then clear the flag.) If I haven't gathered 8 bits yet, the flag will be 0 and I skip all that.
Next I need to shift all the bits in n one bit left, in order to make space for the next bit I'll extract. A 1-bit left shift of a binary number is the same as multiplying by two (and vice versa); so I move n to the right and then multiply it by 2 when moving it back left. (Slightly faster than the other order; if n is 90, for instance, I do these loops 90 and 90 times respectively, whereas they'd be done 90 and 180 times if I doubled n in the first loop rather than the second. A tiny detail.)
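The counter-and-shift bookkeeping amounts to the following (a Python sketch with hypothetical names, equivalent to the program's add-then-shift order since n starts at 0): double n before folding in each new bit, decrement c, and emit n whenever c reaches zero.

```python
def pack_stream(bits):
    """Gather bits into bytes as the program does: a 1-bit left shift of
    n (i.e. doubling) makes room for each arriving bit, and a counter c
    triggers output of the finished byte every 8th bit."""
    out, n, c = [], 0, 8
    for bit in bits:
        n = n * 2 + bit       # shift left; the new bit lands in the 1s place
        c -= 1
        if c == 0:            # 8 bits gathered: emit the byte and reset
            out.append(n)
            n, c = 0, 8
    return out
```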
After that, I go back to the rightmost t cell in order to start the outermost loop again and compute the following tick.
]
End of main loop. Again, this program never terminates, because the sequence doesn't terminate. The main loop executes once per tick (updating all the cells in the growing disrupted range each time, right-to-left, using the update loop, and extracting the center bit when the pointer comes to it); every 8 ticks I output the byte accumulated from those 8 center bits.
So. That's the first working version. Here it is again without comments.
>++++++++>>>>>++>+<[
>>+>>+[
-[->>+<[<<<[<<]<<+>>>>[>>]+>-]<[>+<-]]
<<<[>>>++++<<<-]>>[>++<-]>>[<+>-]
<[>+<-[<+>-[-[<-<<+>>>-[>-<-[<+>-[-]]]]]]]+<<
]<+<<-[>>-<<[<+>-]]<[>+<-]
>>>[<.[-]<++++++++>>-]<[>+<-]>[<++>-]>>>[>>]<<
]
This is not terrible. There are some tiny improvements that could be made (e.g. I should change that -[-] to just [-] and that [>>>++++<<<-] to [>>++<<-] and I should move that [>>]<< to the second line to make [>>]<<>>+>>+[ which becomes [>>]+>>+[). But there are bigger and more interesting issues.
First, look at the technique in lines 3 and 6 of that uncommented listing. We use a value as the condition for an "if", move it elsewhere to zero its cell so we can exit the "if" loop, then move the value back after. This is popular, but not the best way to do things. If we need a zero to get out of the loop, why not just move the pointer to a zero and end the loop there? That would save the overhead of moving things back and forth.
The obvious downside is that then we won't know where the pointer is after the loop: it will be in one place if we went through the loop, and a different place if we skipped the loop. But! Since those are two different places, they'll have different patterns of surrounding data, and probably even different patterns of nearby zeroes vs nonzeroes. The program can look at those to figure out where the pointer is, and reunify the possibilities with a second loop. Writing the program, we can choose our data layout to make that easy.
Two simple forms of this:
A. Make a second loop that will execute only if the first one did, which moves the pointer to where it'd be if the first loop had been skipped.
B. Make a second loop that will execute only if the first was skipped, which moves the pointer to where it'd be if the first loop had been executed.
(We can also put more actions into either of these loops; B in particular lets us use the second loop as an "else" for our "if".)
I have previously called this technique "nondestructive flow control", because it means we don't have to zero the cell holding the condition. Maybe "pointer juggling" would be a better name, since we lose track of the pointer's location in a controlled way and then regain it.
For example, in line 3, we can and should replace
-[->>+<[<<<[<<]<<+>>>>[>>]+>-]<[>+<-]]
with something like
-[->>+<[<<<[<<]<<+>]<[>>>>[>>]]] (type A, incidentally)
whereas in lines 6-7, we should swap out
<+<<-[>>-<<[<+>-]]<[>+<-]>>>[<.[-]<++++++++>>-]
in favor of something like
<<<-[>>>]<[>++++++++>.[-]>] (type B)
(which also requires initializing the leftmost cell of the brainfuck array to 1 rather than 0).
This may seem like a small thing, but it will prove perpetually valuable. It's maybe the most vital technique I learned from writing this program.
Next: it's a bad sign that we're having to specify 8 as the value of the bit counter in two different places. This hints at a bad loop structure: it's cleaner to put the 8 near the start, use it as the counter for an inner loop that's executed 8 times (computing one tick each time), and then output n after leaving that loop. (So we have a "per cell" loop inside a "per bit" loop inside a "per byte" loop.)
Also, the generality of the way we compute the new cell values is a bit suspicious. Since we're computing the rule 30 automaton specifically, maybe we can do better if we take specifics into account? The transition rule turns out to be equivalent to "right = left XOR (center OR right)". We can compute that without having to spell out all 8 cases.
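That claimed equivalence is easy to check exhaustively; this Python snippet (illustration only, with hypothetical names) compares the boolean identity against the rule's transition table for all eight neighborhoods.

```python
def rule30_formula(l, c, r):
    """Rule 30 via the boolean identity, rather than the 8-case table."""
    return l ^ (c | r)

# Compare against the packed table (30 = 0b00011110) for every neighborhood.
mismatches = [p for p in range(8)
              if rule30_formula(p >> 2, (p >> 1) & 1, p & 1) != (30 >> p) & 1]
```

An empty mismatch list confirms that "left XOR (center OR right)" and the 8-case table are the same function.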
Actually, we can go beyond shoring up weaknesses, and grab some non-obvious opportunities. Rephrase that rule as ((right) OR center) XOR left. The three old values can have their effects sequentially from right to left. And that means, instead of a loop done once per cell that looks at three old values and finds one new value, we can write a loop done once per cell that looks at one old value and processes its effects on three new values. Rather than "right = ((right) OR center) XOR left", we can have the update loop do "left = left; center = center OR left; right = right XOR left". We can process all the effects of an old value at once rather than processing all the causes of a new value at once. Each cell then gets updated piecemeal over the course of three successive iterations of the update loop.

This is nice because looking at values is somewhat expensive (requiring at least two loops, with or without pointer-juggling); it is also nice because a zero value turns out not to have any effects! That "left = left" is superfluous on the face of it, but if left is 0, then "center = center OR left" reduces to "center = center" and "right = right XOR left" reduces to "right = right". Thus the whole thing reduces to "if (left) {center = 1; right = NOT right}".

As an extra bonus, this means we can dispense with those two zeroes to the left of the disrupted portion. We know they won't change any values so we don't even need to store them.
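This inside-out, effects-based update can be sketched in Python too (hypothetical names; assumptions as in the text): scan the old values right to left, and wherever a value is nonzero, set the cell to its right to 1 and flip the cell two to its right. The central column comes out the same.

```python
def rule30_effects(nticks):
    """Same central column, computed with the 'effects' formulation:
    if (left) {center = 1; right = NOT right}, applied right to left so
    each cell is still old when its own effects are processed."""
    row = [1]                  # the lone 1; no left padding needed now
    bits = []
    for t in range(nticks):
        bits.append(row[t])    # the center drifts right one cell per tick
        row += [0, 0]          # region grows right by 2 per tick
        # Visit each old cell from right to left and apply its effects.
        for j in range(len(row) - 3, -1, -1):
            if row[j]:         # a zero old value has no effects at all
                row[j + 1] = 1         # center = center OR left
                row[j + 2] ^= 1        # right = right XOR left
    return bits
```

Note that nothing ever writes to index 0, matching the observation that the leftmost value cell is never changed.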
One more thing. With this scheme, the leftmost value cell is never changed; we only look at it to notice it's 1 (and consequently set the second value cell to 1 and flip the third value cell). Really it only needs to be nonzero. And our bit counter c is always nonzero during the inner loop where we're processing this transition rule. So we can put c in our leftmost v cell. It will serve both functions smoothly.
Our new memory layout is
0 n c/v t v t v t ... v t 0 0 0 ...
at the start of the main loop; n is doubled and moved into the leftmost cell almost immediately though.
>>>++[
Set first t=2 (central cell) and start per-byte loop
<++++++++[
Set counter/value to 8 and start per-bit loop
<[<++>-]
Double n to shift left, moving to leftmost cell
>>[>>]+>>+
go to right end of array and lengthen by 2 cells (initial values 0)
[
start update loop
-[
decrement t; if still nonzero start loop to extract central bit
->>+
move the middle-cell marker one cell right
<<<[<[<<]<+>]
If current v cell nonzero, go increment n;
>[>[>>]]
If we did, this brings us back to t;
]
End bit extraction loop. t is now 0.
<[>>[-]]
If left v cell is nonzero, go to middle and clear it. Using pointer juggling again.
>[
If left v cell is nonzero, that will have put us at the next t cell right and we'll enter this loop. Whereas if left v cell was zero we will be back at our zeroed t cell and skip this loop. Thus, this loop needs to (1) handle the processing appropriate for a nonzero left v cell, and (2) leave the pointer at the zeroed t cell to reunify the strands.
>[-<<]
Since left was nonzero we need to flip right and set center=1. If right was 1 this loop will zero it and move left to the (pre-zeroed) center.
>[<+<]
The initial > brings us to one nonzero t cell or another. If right was 1, it got zeroed and we're at middle's t cell; then this loop will execute once, setting middle to 1, and drop us at left's t cell. If right was 0, we're at right's t cell; this loop will execute twice, setting right AND middle to 1, and drop us at left's t cell anyway. This is another variation on the pointer juggling theme. And nested inside another one as a bonus :)
]
In any case, we've now set middle and inverted right, and returned to left's zeroed t cell. So we end this loop; the update calculation has been done properly and concisely. (If this is at all fuzzy, trace through the different possible cases with a handwritten map of the brainfuck cells and values.)
+<<
Restore t=1 and go to next t cell for the next iteration of update loop.
]
End update loop. After this, the whole array has been updated 1 tick.
<[>+<-]
Move n back to 2nd brainfuck cell.
>>-
Decrement v/c (bit counter also serving as cell value)
]
End per-bit loop. On leaving this we have 8 bits gathered.
<.[-]>>
output those 8 bits from n, clear n, go back to t.
(Note that if we did not clear n, these bits would continue to be shifted left. This would still give the right answers on most implementations--those with byte cells and also those with bigger cells that only output the last 8 bits on a '.' command--but would cause mild slowdown on implementations with byte cells, baddish slowdown on 16-bit cells, and really horrific slowdown on 32-bit. Because I aim to avoid making programs depend on cell size to run at reasonable speed, we clear n here.)
]
And again this closes the once-per-byte loop; the program itself never terminates.
Here it is again, without comments:
>>>++[
<++++++++[
<[<++>-]>>[>>]+>>+[
-[->>+<<<[<[<<]<+>]>[>[>>]]]
<[>>[-]]>[>[-<<]>[<+<]]+<<
]<[>+<-]>>-
]<.[-]>>
]
Some lessons from this program:
Pointer juggling! It turns out to be indispensable for good brainfuck programming. It's been useful in pretty much every substantial program I've written since then.
Solve problems at the right level of specificity.
Get down into the weeds! This is an absurdly low-level language, and to use it effectively you need to handle it directly. Don't build layers of abstraction between yourself and the language.
Turning your algorithm into code, you want to be thinking less "how do I instantiate this structure" and more "what code will have the needed behavior". If you can cut parts out and it will still do the right thing in all cases, great!
Try turning your program inside-out/rightside out. Maybe the loop structure wants to be less involuted. (Sometimes it wants to be MORE involuted. The most intuitive or natural order to do things in is not always the best.)
Even if a program looks good, it can probably be improved. It can probably be improved a lot, in a whole variety of ways. Stare at it longer. Second-guess every part. Second-guess how the parts are arranged. Then do that some more over time. This one, and many of mine, have been improved intermittently over the course of weeks and years.
(I won't always note explicitly how many different data layouts I've tried for a given program, but it's usually a good number. E.g. here the v cells ended up left of the matching t cells because I couldn't make the other way be as concise.)