Fast and accurate skew detection algorithm for a text document or a document with straight lines
- 23 March 1994
- proceedings article
- Published by SPIE-Intl Soc Optical Eng
- p. 133-140
- https://doi.org/10.1117/12.171101
Abstract
Bit-mapped images are becoming more popular in offices. Skew is a major problem for many otherwise promising applications. To remove the skew, we propose a new algorithm that makes use of both printed characters and straight line(s). Lines on a document are decomposed into small segments of black runs. By checking their connectivity, we can easily tell whether those runs are from the same line or not. To remove any bad effect from variation in line width, we sample a number of different x-y coordinates along the black runs, adjacent to white pixels. Those coordinates determine a correlation function which is used to find the correlation value. If the value is close to 1.0, we compute the higher-probability regression coefficient using the same parameters. The algorithm is effective for both horizontal and vertical lines. The coefficients can also be used to align character lines. The rectangles formed by connected black pixels are extracted using two or three different compression ratios. By checking the coordinates of the rectangles in the multiple compressed images, we can tell whether those characters are from the same character line or not.
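The correlation-then-regression step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `estimate_skew`, the threshold `r_threshold`, and the rule for choosing between the horizontal and vertical case are assumptions made for illustration. The sketch takes x-y coordinates sampled along one edge of a line, computes the Pearson correlation value, and returns a skew angle from the least-squares regression coefficient only when the correlation is close to 1.0.

```python
import math

def estimate_skew(points, r_threshold=0.95):
    """Hypothetical helper: estimate skew from (x, y) edge samples.

    Returns (r, angle_degrees, orientation), or None when the samples
    do not look like a single straight line.
    """
    n = len(points)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # The same sums serve both the correlation and the regression.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0 or syy == 0:
        # Perfectly axis-aligned samples: correlation is undefined.
        # A caller could reasonably treat this case as zero skew.
        return None
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) < r_threshold:
        return None  # not close enough to 1.0 to trust a line fit
    if sxx >= syy:
        # Assumed near-horizontal line: regress y on x.
        slope = sxy / sxx
        orientation = "horizontal"
    else:
        # Assumed near-vertical line: regress x on y,
        # so the angle is measured from the vertical axis.
        slope = sxy / syy
        orientation = "vertical"
    return r, math.degrees(math.atan(slope)), orientation

# Samples along a line with slope 0.05 (roughly 2.86 degrees of skew).
samples = [(x, 0.05 * x + 10) for x in range(0, 200, 10)]
print(estimate_skew(samples))
```

Regressing on whichever axis has the larger spread is one simple way to cover both horizontal and vertical lines with the same sums; the paper itself may handle the two cases differently.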