One thing to remember is that the customer cares about the probability of correct reception after the error-correcting code has done its magic.

The 5-dimensional Reed-Muller code $R(4,1)$ of length 16 and minimum Hamming distance 8 is capable of correcting 3 erroneous bits. The 6-dimensional Reed-Muller code $R(5,1)$ of length 32 and minimum Hamming distance 16 is capable of correcting 7 erroneous bits. Let us assume a very simplistic model in which each bit is received erroneously with the same probability $p\in(0,1)$, independently of the others. The probability of correctly decoding a received word of $R(4,1)$ is thus

$$P_4(p)=\sum_{i=0}^3{16\choose i}p^i(1-p)^{16-i},$$

and the same probability with the code $R(5,1)$ is

$$P_5(p)=\sum_{i=0}^7{32\choose i}p^i(1-p)^{32-i}.$$

Unless I made a mistake, we have $P_5(p)>P_4(p)$ for small values of $p$.
For example, $P_5(0.1)=0.9883$ and $P_4(0.1)=0.9316$. At $p=0.01$ we have
$1-P_5(0.01)=8.5\cdot10^{-10}$ and $1-P_4(0.01)=1.7\cdot10^{-5}$.
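The figures above can be rechecked with a few lines of Python. This is only a sketch of the simplistic model (independent bit errors, decoding succeeds exactly when at most $t$ bits are wrong); the function names `p_correct` and `p_fail` are mine. The failure probability is summed directly over the tail, because computing $1-P$ in floating point loses digits when $P$ is very close to 1.

```python
from math import comb

def p_correct(n: int, t: int, p: float) -> float:
    """Probability of at most t bit errors among n independent bits."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1))

def p_fail(n: int, t: int, p: float) -> float:
    """Probability of more than t bit errors, summed over the tail directly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1, n + 1))

for p in (0.1, 0.01):
    # R(4,1): n = 16, t = 3;  R(5,1): n = 32, t = 7
    print(f"p = {p}")
    print(f"  P_4 = {p_correct(16, 3, p):.4f}   1 - P_4 = {p_fail(16, 3, p):.2e}")
    print(f"  P_5 = {p_correct(32, 7, p):.4f}   1 - P_5 = {p_fail(32, 7, p):.2e}")
```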

So when transmitting an image of 1000 x 1000 pixels at $p=0.01$, we expect to receive an image free of errors when using $R(5,1)$, but expect a few dozen garbled pixels when using $R(4,1)$. Furthermore, to correctly receive 5 pixels worth of image data (30 bits), we need to correctly receive 5 blocks of $R(5,1)$, each carrying 6 bits, instead of 6 blocks of $R(4,1)$, each carrying 5 bits. In other words, in terms of payload the fair comparison should be made between $P_4(p)^6$ and $P_5(p)^5$.
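A rough way to check the image estimate is to count the expected number of wrongly decoded blocks. The sketch below assumes 6 bits per pixel (which is what the 5-blocks-versus-6-blocks count implies) and the same independent-error model; the variable names are illustrative only. A failed $R(4,1)$ block can straddle two pixels, so the number of garbled pixels may be somewhat larger than the number of failed blocks.

```python
from math import comb, expm1, log1p

def p_fail(n: int, t: int, p: float) -> float:
    """Probability of more than t bit errors among n independent bits."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1, n + 1))

p = 0.01
pixels = 1000 * 1000
payload_bits = 6 * pixels          # assumption: 6 bits per pixel

blocks_r5 = payload_bits // 6      # R(5,1) carries 6 information bits per block
blocks_r4 = payload_bits // 5      # R(4,1) carries 5 information bits per block

expected_bad_r5 = blocks_r5 * p_fail(32, 7, p)
expected_bad_r4 = blocks_r4 * p_fail(16, 3, p)
print(f"expected failed blocks, R(5,1): {expected_bad_r5:.2e}")
print(f"expected failed blocks, R(4,1): {expected_bad_r4:.1f}")

# Payload-fair comparison for 5 pixels: 1 - P_5^5 versus 1 - P_4^6.
# expm1/log1p avoid the cancellation in 1 - (1 - q)**k for tiny q.
fail_r5_payload = -expm1(5 * log1p(-p_fail(32, 7, p)))
fail_r4_payload = -expm1(6 * log1p(-p_fail(16, 3, p)))
print(f"1 - P_5^5 = {fail_r5_payload:.2e}")
print(f"1 - P_4^6 = {fail_r4_payload:.2e}")
```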

Above I assumed a decoding logic that decodes up to the guaranteed error-correction capability only. One might attempt a more complicated receiver (using soft input) doing full soft decision decoding (which in this case amounts to a simple Walsh-Hadamard transform). I don't know whether that would change the verdict, though.
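For the curious, here is a minimal sketch of what such a Walsh-Hadamard based decoder for $R(m,1)$ could look like. The bit labelling is my own assumption: the codeword bit at position $x$ is $a_0 + u\cdot x \pmod 2$, where the integer `u` packs the remaining $m$ message bits, and bit 0 is sent as $+1$, bit 1 as $-1$. The decoder correlates the received real vector with all linear functions at once and picks the largest correlation in absolute value. It says nothing about how much this improves the error rates above.

```python
def fht(v):
    """Fast Walsh-Hadamard transform: F[u] = sum_x v[x] * (-1)^popcount(u & x)."""
    v = list(v)
    n, h = len(v), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def encode(a0: int, u: int, m: int):
    """Codeword of R(m,1): bit at position x is a0 + <u, x> mod 2 (assumed labelling)."""
    return [(a0 + bin(u & x).count("1")) % 2 for x in range(2 ** m)]

def decode_soft(r):
    """Maximum-correlation decoding of a received real vector r of length 2^m."""
    F = fht(r)
    u = max(range(len(F)), key=lambda k: abs(F[k]))
    a0 = 0 if F[u] > 0 else 1   # a negative peak means the constant term is 1
    return a0, u

# Usage: encode, map bits to +1/-1, flip three symbols, decode.
codeword = encode(1, 0b1011, 4)
received = [1.0 - 2 * c for c in codeword]
for k in (0, 5, 9):
    received[k] = -received[k]
print(decode_soft(received))
```

With hard sign flips only, this recovers the message whenever at most 3 of the 16 symbols are flipped, which matches the guaranteed capability of $R(4,1)$; the point of soft input is that the same routine also accepts unquantized received values.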

**Moral:** long codes often work better. The reason is that in a short block the number of erroneous bits has a higher (relative) variance, and thus it is easier for the number of errors to exceed the error-correcting capability of the code. Put another way, a short code with the same relative Hamming distance will run into problems handling a burst of errors.
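To see the variance effect numerically, one can look at the first-order Reed-Muller codes of lengths 16, 32 and 64, which all have relative distance $1/2$ and correct $t = n/4 - 1$ errors. The sketch below uses the same independent-error model with an arbitrarily chosen $p = 0.05$; the length-64 code is included only as an extra data point and is not part of the Mariner comparison.

```python
from math import comb, sqrt

def p_fail(n: int, t: int, p: float) -> float:
    """Probability of more than t bit errors among n independent bits."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1, n + 1))

p = 0.05  # arbitrary illustrative bit error probability
results = []
for m in (4, 5, 6):
    n = 2 ** m
    t = n // 4 - 1                       # guaranteed correction for distance n/2
    std_fraction = sqrt(p * (1 - p) / n)  # std. deviation of the fraction of bad bits
    results.append((n, t, std_fraction, p_fail(n, t, p)))
    print(f"n = {n:2d}, t = {t:2d}, std of error fraction = {std_fraction:.4f}, "
          f"block failure = {results[-1][3]:.2e}")
```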

But the question of the code rate is not without merit either! In the Mariner application it would have meant that it takes longer to transmit a single image using $R(5,1)$ than it would with $R(4,1)$. The eager astronomers can wait a bit longer to get the image, but a serious concern is that the probe spends a fixed amount of battery energy per transmitted bit. This also needs to be taken into account, so my figures above are not fair to the shorter code. In terrestrial communication systems we carry out extensive simulations before we choose one coding scheme over another, and plot the probability of an error vs. energy per transmitted bit. With Mariner, we could try to estimate the relation between $p$ and energy consumption per bit, but I don't have the time to get into that.

Let me try to add a more meaningful comparison. Let's have Mariner transmit 5 pixels worth of bits. Using $R(5,1)$ it needs to transmit a total of $5\cdot32=160$ bits as opposed to the $6\cdot 16=96$ bits required when using $R(4,1)$. Therefore, for a fair comparison, Mariner can spend $160/96=5/3$ times as much energy per bit when using $R(4,1)$. So we can assume that using $R(5,1)$ Mariner transmits a real number $+1$ or $-1$ according to whether a bit zero or one is intended. Then using $R(4,1)$ it can transmit $\pm\sqrt{5/3}$ for the same total energy consumption. The receiver interprets a positive received number as the bit zero and a negative one as the bit one.

Assume that the noise is Gaussian with zero mean and standard deviation $\sigma=0.5$. With $R(5,1)$ we then get a bit error when the noise pushes the received value across zero, that is, when it exceeds $2\sigma$ in the unfavourable direction, so this happens with probability $p_5=1-\Phi(2.0)=0.0228$. The corresponding bit error probability when using $R(4,1)$ is $p_4=1-\Phi(2.0\sqrt{5/3})=1-\Phi(2.58)=0.0049$, because this time we need $\sqrt{5/3}$ times as much noise as earlier to receive a bit incorrectly. The test is then to compare
$$
1-P_5(0.0228)^5=2.35\cdot10^{-6}
$$
to
$$
1-P_4(0.0049)^6=6.01\cdot10^{-6}.
$$
We see that we do have a better chance of correctly receiving 5 pixels worth of data using the longer code, but the difference is not nearly as dramatic as the earlier figures, disregarding the energy consumption, would have indicated.
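The energy-normalized comparison can be reproduced along the following lines. This is a sketch under the same assumptions as the text (Gaussian noise with $\sigma=0.5$, amplitudes $1$ and $\sqrt{5/3}$, decoding up to the guaranteed capability); the helper names are mine. Feeding in the rounded values $0.0228$ and $0.0049$ should give the two figures quoted above, while the unrounded bit error probabilities give slightly different numbers.

```python
from math import comb, erfc, expm1, log1p, sqrt

def q_func(x: float) -> float:
    """Gaussian tail probability 1 - Phi(x)."""
    return 0.5 * erfc(x / sqrt(2))

def p_fail(n: int, t: int, p: float) -> float:
    """Probability of more than t bit errors among n independent bits."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1, n + 1))

def payload_fail(n: int, t: int, blocks: int, p: float) -> float:
    """1 - P^blocks, computed without cancellation for tiny failure rates."""
    return -expm1(blocks * log1p(-p_fail(n, t, p)))

sigma = 0.5
p5 = q_func(1.0 / sigma)            # amplitude 1 with R(5,1)
p4 = q_func(sqrt(5 / 3) / sigma)    # amplitude sqrt(5/3) with R(4,1)
print(f"p_5 = {p5:.4f}, p_4 = {p4:.4f}")

# 5 pixels = 5 blocks of R(5,1) or 6 blocks of R(4,1)
print(f"rounded p:   1 - P_5^5 = {payload_fail(32, 7, 5, 0.0228):.2e}, "
      f"1 - P_4^6 = {payload_fail(16, 3, 6, 0.0049):.2e}")
print(f"unrounded p: 1 - P_5^5 = {payload_fail(32, 7, 5, p5):.2e}, "
      f"1 - P_4^6 = {payload_fail(16, 3, 6, p4):.2e}")
```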
