Network properties of written human language

Abstract
We investigate the nature of written human language within the framework of complex network theory. In particular, we analyze the topology of Orwell’s 1984, focusing on the local properties of the network, such as the properties of the nearest neighbors and the clustering coefficient. We find composite power-law behavior for both the average nearest-neighbor degree and the average clustering coefficient as functions of the vertex degree. This implies the existence of different functional classes of vertices. Furthermore, we find that second-order vertex correlations are an essential component of the network architecture. To model our empirical results, we extend a model for language previously introduced by Dorogovtsev and Mendes. We propose an accelerated growing network model that contains three growth mechanisms: linear preferential attachment, local preferential attachment, and the random growth of a predetermined small finite subset of initial vertices. We find that these elementary stochastic rules suffice to produce a network showing syntactic-like structures.
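
The two local quantities named in the abstract, the average nearest-neighbor degree and the average clustering coefficient as functions of the vertex degree k, can be illustrated with a minimal sketch. The abstract does not state how the word network is built, so the construction here (an undirected link between any two distinct words that appear next to each other) is an assumption, and the function names `build_word_network`, `local_clustering`, and `degree_profiles` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict
import re


def build_word_network(text):
    """Assumed construction: link distinct words that appear next to each other."""
    words = re.findall(r"[a-z']+", text.lower())
    neighbors = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # skip self-loops from immediately repeated words
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def local_clustering(neighbors, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = list(neighbors[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in neighbors[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def degree_profiles(neighbors):
    """Return {k: (k_nn(k), C(k))}, each averaged over all vertices of degree k."""
    knn_by_k = defaultdict(list)
    c_by_k = defaultdict(list)
    for v, nbrs in neighbors.items():
        k = len(nbrs)
        # Mean degree of the vertices adjacent to v.
        knn_by_k[k].append(sum(len(neighbors[u]) for u in nbrs) / k)
        c_by_k[k].append(local_clustering(neighbors, v))
    return {
        k: (sum(knn_by_k[k]) / len(knn_by_k[k]), sum(c_by_k[k]) / len(c_by_k[k]))
        for k in sorted(knn_by_k)
    }
```

Applied to a full text, the pairs returned by `degree_profiles` could be plotted against k on logarithmic axes to look for the composite power-law behavior described above; the sketch itself makes no claim about what such a plot would show.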
