Abstract
The authors present novel scaled discrete cosine transform (DCT) and inverse scaled DCT algorithms designed for fused multiply/add architectures. Because the most common case in image processing involves 8×8 blocks (both the emerging JPEG and MPEG standards call for DCT coding on blocks of this size), the authors discuss this case in detail. The scaled DCT and the inverse scaled DCT each use 416 operations; adding the 64 operations for scaling or descaling (one per coefficient of the 8×8 block) brings each to 480 operations. For the inverse, the descaling can be combined with the computation of the inverse DCT (IDCT). If the multiplicative constants, which depend on the quantization matrix, can be computed offline, then the descaling and IDCT can be computed simultaneously with 417 operations.
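The offline step mentioned above can be illustrated with a minimal sketch. It shows only the general idea of folding descaling constants into a dequantization table ahead of time, so that one multiplication per coefficient replaces separate dequantize and descale passes. It does not reproduce the authors' 417-operation algorithm or its fused multiply/add scheduling. The scale factors, the quantization matrix, and all function names here are hypothetical, chosen only for illustration.

```python
import math

N = 8  # block size discussed in the abstract

# Hypothetical per-frequency scale factors (illustrative, not the paper's constants).
scale = [1.0] + [math.sqrt(2.0) * math.cos(k * math.pi / 16) for k in range(1, N)]

# Hypothetical quantization matrix; a real codec would supply its own table.
quant = [[16 + 2 * (i + j) for j in range(N)] for i in range(N)]


def fold_descaling(quant, scale):
    """Offline step: merge dequantization and descaling into a single table."""
    return [[quant[i][j] * scale[i] * scale[j] for j in range(N)] for i in range(N)]


def dequantize_then_descale(levels, quant, scale):
    """Reference path: dequantize each coefficient, then descale it separately."""
    return [
        [(levels[i][j] * quant[i][j]) * (scale[i] * scale[j]) for j in range(N)]
        for i in range(N)
    ]


def dequantize_folded(levels, combined):
    """Online path: one multiplication per coefficient using the folded table."""
    return [[levels[i][j] * combined[i][j] for j in range(N)] for i in range(N)]


# Small made-up block of quantized coefficient levels in the range -3..3.
levels = [[(3 * i - 2 * j) % 7 - 3 for j in range(N)] for i in range(N)]

combined = fold_descaling(quant, scale)  # computed once, offline
two_pass = dequantize_then_descale(levels, quant, scale)
one_pass = dequantize_folded(levels, combined)

# Largest disagreement between the two paths (expected to be rounding noise only).
max_diff = max(
    abs(two_pass[i][j] - one_pass[i][j]) for i in range(N) for j in range(N)
)
```

In this sketch the folded table costs nothing per block at run time, since it depends only on the quantization matrix and the scale factors. The result would then be handed to a scaled IDCT, which is not implemented here.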
