Abstract
Generalization is one of the most important problems in neural-network research. It is influenced by several factors in the network design, such as network size and the weight-decay factor, among others. We show here that, for gradient descent training algorithms, the initial weight distribution is another factor that influences generalization. The initial conditions guide the training algorithm toward particular regions of the weight space. For instance, small initial weights tend to result in low-complexity networks and can therefore effectively act as a regularization factor. We propose a novel network complexity measure, which helps shed light on this phenomenon and is also useful for studying other aspects of generalization.