[golden.b -- compute golden ratio
(c) 2019 Daniel B. Cristofani
http://brainfuck.org/]
This program computes the "golden ratio" (https://oeis.org/A001622).
The core of this one is a square root algorithm. This is another one where it was worthwhile to bang the algorithm into shape before coding.
Trying to use simple machinery, I first thought of the progression for natural n: n^2=1+3+...+(2n-1). This uses addition in place of multiplication: promising for reducing apparatus and storage space. (It'd be too slow for square roots of large numbers, but we're thinking of small ones here.) Visualize this as a square cut into (or built up out of) n nested L shapes in its grid.
/\
/ /\
/ / /\
/ / / /\
\ \ \ \/
\ \ \/
\ \/
\/
The algorithm, then, given we want to compute sqrt(x), would be to store x, store n (initially 0), and store our sum so far of 1+3+...+(2n-1) (call that z). We check whether z+2n+1<=x; if so, we increase z by 2n+1, increment n, and check again. As soon as z+2n+1>x, we know we have the right value for this digit of n. (Notice we don't need a separate upper bound here.)
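In ordinary code, that first pass looks something like this (a Python sketch for illustration, not part of the program):

```python
# Integer square root by summing odd numbers: n^2 = 1 + 3 + ... + (2n-1).
def isqrt_by_odds(x):
    n = 0  # root so far
    z = 0  # running sum 1 + 3 + ... + (2n-1), i.e. n^2
    while z + 2 * n + 1 <= x:  # would the next L-shape still fit?
        z += 2 * n + 1
        n += 1
    return n
```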
Then to get the next digit we sort of magnify that image: (10(n/10))^2=1+3+...+(20(n/10)-1); so we multiply the n we've built up so far by 10, and multiply x and z by 100, and start adding thinner L-shapes than before, until it overflows again to give us our next digit. That's easy. These numbers are most easily stored in binary again, since they won't be output directly; and we already have a nice multiply-binary-number-by-10 algorithm left over from e.b, along with some other useful binary arithmetic code. Overall this program is a close heir to e.b and is maybe best studied after it.
Trying to cut down on storage space and calculations, I notice first that we don't need to keep n; it's enough to keep 2n and the current decimal digit of n. Second, we don't need to keep x and z, and compare z+2n+1 with x; instead, we can just store x-z and compare that with 2n+1; if x-z>=2n+1 we set x-z to (x-z)-(2n+1), and set 2n to 2n+2. Basically we're storing only the not-yet-extracted residue of x, as in e.b.
Working up to each digit gradually, by checking 1, 2, 3..., is reasonably fast: we check a maximum of 10 terms per output digit, an average of 5.5; if we wanted to do a little binary search for our next digit, by checking 5 first, then 8 or 3 depending whether 5 overflowed and so on, we'd still add an average of 3.4 terms per digit. This might possibly be faster--dunno--but even so it doesn't seem worth the extra hassle. Likewise this much simpler possible tweak: if our digit is 9, we'll find that out by pushing on to check if it's 10 and then failing; we could instead keep a counter set to 9, which becomes 0 either after checking 9 terms or by being explicitly zeroed as soon as we overflow, to save <1/55 of our time. Not worthwhile.
For this program, we're computing sqrt(5)/2+.5. Rather than divide explicitly, we'll just compute sqrt(5/4)+.5, so we keep some bits left of the binary point. We'll keep the halves bit of 2n, and the halves and quarters bits of x-z. The .5 gets added specially at the same time as we output the decimal point.
This time, rather than check which number is larger before subtracting, we will proceed with the subtraction speculatively, and on underflow, reverse the current subtraction in progress and output the current digit.
The basic memory layout is
f d 0 0 0 0 0 a b a b a b a ...
where f is a flag indicating that we haven't output the decimal point yet, d is the next digit, a is x-z and b is 2n or often 2n+1, both stored in nonzero binary (cell value 2 (cv2) = 1-bit, cell value 1 (cv1) = 0-bit). Remember, the ones bit of a is that third a, and the ones bit of b is the second b.
In the course of this program, we'll generally keep b free of leading zeroes (cv1), whereas a will pick some up during subtractions and tend to get them cleared out in the multiplications.
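The nonzero-binary encoding itself, setting aside the interleaving of a and b, is simple; hypothetical helpers for illustration, with the leftmost (first) cell most significant:

```python
def encode(n, bits):
    # 1-bit -> cell value 2, 0-bit -> cell value 1, most significant first
    return [2 if (n >> i) & 1 else 1 for i in range(bits - 1, -1, -1)]

def decode(cells):
    # inverse: subtract 1 from each cell to recover the bit
    v = 0
    for c in cells:
        v = 2 * v + (c - 1)
    return v
```

Keeping every stored cell nonzero is what lets the program find the ends of a number with simple scan loops like [<<].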
+>>>>>>>++>+>+>+>++<[
Initialize f=1 a=5/4 b=0.0
+[
Increment b (known to be even, so no carry) and start subtracting b from a speculatively.
--[++>>--]->--[
find 1-bit of b (cv2); set to -1 and look for a matching 1-bit of a. Enter loop if not a 1-bit.
+[
if 0-bit (cv1) we skip this loop and go on looking for 1-bits of a.
If cv0 (underflow past the end of a) we enter it, because in that case b>a and
we need to reverse this last subtraction.
+<+[-<<+]++
Restore cv0, go back to current 1-bit of b and reset it to cv2.
<<[-[->-[>>-]++<[<<]++<<-]+<<]
For each previous 1-bit of b, add it back to a.
This relies on the leftmost bit of b being a 0 (cv1).
Again these techniques are basically the same as in e.b.
>>>>-<<<<
Reverse the incrementing of b since we'll need to multiply it by 10 soon.
<++<-<<
This is slightly convoluted: because we are now deep in the code for a normal,
successful subtraction, we set up a tiny artificial habitat to let the outer
loops terminate smoothly.
We could, instead, do the output and the multiplications by 10 in here, and
re-increment b, and do the first little bit of the next subtraction up to the
point where it's time to re-enter this loop. So far, I haven't managed to make
that version quite as concise as this one.
So. We now have:
f d 0' 0 -1 2 0 a b a b a
and you can see it play out soon.
++++++[<++++++++>-]<.---<[->.[-]+++++>]>[[-]>>]
Now we go ahead and add 48 to d and output it. If flag was 1, we clear it and
output the decimal point and then set d to 5, i.e. we're adding 5 to what will
be the second digit output, or .5 to the whole number, to complete it.
This turns ~1.12 into ~1.62, so it works correctly, no carries needed.
If flag was 0 we just clear d.
We will do the multiplications by 10 and 100 later, outside the subtraction loop.
]+>>--
In the normal subtraction case we are still looking for a 1-bit (cv2) of a to
subtract the current 1-bit of b from. (In the underflow case we step forward to
our artificial 2, zero it, and exit the loop.)
]+<+[-<+<+]++>>
In the normal subtraction case we now set the found 1-bit of a to 0 (cv1), then go back
to the -1 marker at the 1-bit of b we're subtracting, adding the borrowed 1s to a on
the way. Then restore that 1-bit of b to cv2 and go on looking for the next 1-bit of b.
In the underflow case, we increment, increment our artificial -1 to 0, skip that little
loop, set to 2, move right 2, exit the next loop. So in the underflow case we end up with:
f d 0 1 2 1 0' a b a b a
and some of this clutter will be useful soon.
]<<<<[[<<]>>[-[+++<<-]+>>-]++[<<]<<<<<+>]
Now, in the normal case, we're going back to the left end, and incrementing b again (it's odd
now so we need the full carry apparatus this time, including the number-lengthening part).
This brings it to 2n+2, or 2n for the next value of n.
Notice: the first bit of b is the halves bit so we want to leave it alone and start adding
at the ones bit. Fortunately, by merely omitting a - before the increment loop it will treat
the halves bit of b (always cv1) as a special case and skip it smoothly.
After incrementing b we go back to the left and increment d, our current digit.
In the underflow case, we went back to the third cell and then skipped the rest of the line.
>[->>[[>>>[>>]+[-[->>+>>>>-[-[+++<<[-]]+>>-]++[<<]]+<<]<-]<]]>>>>>>>
Conversely, we skip this line in the normal case but do it in the underflow case.
Again, in that case we have
f d 0' 1 2 1 0 a b a b a
and we use the second 1 and the 2 as counters, to multiply b by 10 once, and a twice.
Then in either case we go back to the ones bit of b to restart the main loop.
]
And that's it.
Without comments:
+>>>>>>>++>+>+>+>++<[
+[
--[++>>--]->--[
+[
+<+[-<<+]++<<[-[->-[>>-]++<[<<]++<<-]+<<]>>>>-<<<<
<++<-<<++++++[<++++++++>-]<.---<[->.[-]+++++>]>[[-]>>]
]+>>--
]+<+[-<+<+]++>>
]<<<<[[<<]>>[-[+++<<-]+>>-]++[<<]<<<<<+>]
>[->>[[>>>[>>]+[-[->>+>>>>-[-[+++<<[-]]+>>-]++[<<]]+<<]<-]<]]>>>>>>>
]