IDR/QR: an incremental dimension reduction algorithm via QR decomposition
- 1 August 2005
- journal article
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Knowledge and Data Engineering
- Vol. 17 (9) , 1208-1222
- https://doi.org/10.1109/tkde.2005.148
Abstract
Dimension reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-dimensional data. In the literature, a well-known dimension reduction algorithm is linear discriminant analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of singular value decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental dimension reduction algorithm, called IDR/QR, which applies QR decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically.Keywords
This publication has 31 references indexed in Scilit:
- An algorithm for suffix strippingProgram: electronic library and information systems, 2006
- CSVD: clustering and singular value decomposition for approximate similarity search in high-dimensional spacesIEEE Transactions on Knowledge and Data Engineering, 2003
- Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression DataJournal of the American Statistical Association, 2002
- Learn++: an incremental learning algorithm for supervised neural networksIEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), 2001
- PCA versus LDAPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2001
- Merging and splitting eigenspace modelsPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2000
- Dimensionality Reduction for Similarity Searching in Dynamic DatabasesComputer Vision and Image Understanding, 1999
- On self-organizing algorithms and networks for class-separability featuresIEEE Transactions on Neural Networks, 1997
- Indexing by latent semantic analysisJournal of the American Society for Information Science, 1990
- Regularized Discriminant AnalysisJournal of the American Statistical Association, 1989