An efficient implementation of the direct-SCF algorithm on parallel computer architectures