Abstract
We prove that neural networks with a single hidden layer are capable of providing an optimal order of approximation for functions assumed to possess a given number of derivatives, if the activation function evaluated by each principal element satisfies certain technical conditions. Under these conditions, it is also possible to construct networks that provide a geometric order of approximation for analytic target functions. The permissible activation functions include the squashing function (1 − e−x)−1 as well as a variety of radial basis functions. Our proofs are constructive. The weights and thresholds of our networks are chosen independently of the target function; we give explicit formulas for the coefficients as simple, continuous, linear functionals of the target function.