Performance of checksums and CRCs over real data

Abstract
Checksum and cyclic redundancy check (CRC) algorithms have historically been studied under the assumption that the data fed to the algorithms was uniformly distributed. This paper examines the behavior of checksums and CRCs over real data from various UNIX file systems. We show that, when given real data in small to modest pieces (e.g., 48 bytes), all the checksum algorithms have skewed distributions. These results have implications for CRCs and checksums when applied to real data. The skew can also cause a spectacular failure rate for both the TCP and ones-complement Fletcher (1983) checksums when trying to detect certain types of packet splices. When measured over several large file systems, the 16-bit TCP checksum performed about as well as a 10-bit CRC. We show that for fragmentation-and-reassembly error models, the checksum contribution of each fragment is, in effect, colored by the fragment's offset in the splice. This coloring explains the performance of Fletcher's sum on nonuniform data, and shows that placing checksum fields in a packet trailer is theoretically no worse than a header checksum field. In practice, the TCP trailer sums outperform even Fletcher header sums.
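To make the idea of offset "coloring" concrete, the following is a minimal sketch and not the paper's measurement code. It assumes the RFC 1071 form of the TCP/Internet checksum and the common modulo-255 formulation of the 8-bit Fletcher checksum (true ones-complement arithmetic differs only in that 0 and 255 both encode zero). The function names are hypothetical. The sketch shows that the TCP sum ignores where an even-length fragment sits, while Fletcher's second sum weights each byte by its distance from the end of the data.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones-complement checksum in the style of RFC 1071 (used by TCP)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add one big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold the end-around carry
    return ~total & 0xFFFF


def fletcher16(data: bytes) -> int:
    """Fletcher checksum with two 8-bit running sums, modulo 255 (assumed variant)."""
    a = b = 0
    for byte in data:
        a = (a + byte) % 255  # first sum: plain byte sum, position independent
        b = (b + a) % 255     # second sum: accumulates the running first sum
    return (b << 8) | a


def fletcher_second_sum_by_weights(data: bytes) -> int:
    """Closed form of the second sum: byte i (0-based) is weighted by n - i."""
    n = len(data)
    return sum((n - i) * byte for i, byte in enumerate(data)) % 255


if __name__ == "__main__":
    # Two 16-bit-aligned fragments, reassembled in both orders.
    spliced = bytes([1, 2, 3, 4])
    swapped = bytes([3, 4, 1, 2])
    # The TCP-style sum cannot tell the two orders apart.
    print(hex(internet_checksum(spliced)), hex(internet_checksum(swapped)))
    # Fletcher's first sum matches, but the position-weighted second sum differs.
    print(hex(fletcher16(spliced)), hex(fletcher16(swapped)))
    print(fletcher_second_sum_by_weights(spliced), fletcher_second_sum_by_weights(swapped))
```

Running it, both orders give the same Internet checksum, 0xfbf9. The Fletcher values are 0x140a and 0x1c0a: the first sums agree (10), but the second sums differ (20 versus 28). That difference is the per-fragment weighting by offset that the abstract calls coloring.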
