Prediction of the subcellular localization of eukaryotic proteins using sequence signals and composition

Abstract
A tool called Locfind for the sequence-based prediction of the localization of eukaryotic proteins is introduced. It is based on bidirectional recurrent neural networks trained to read sequentially the amino acid sequence and produce localization information along the sequence. Systematic variation of the network architecture in combination with an efficient learning algorithm lead to a 91% correct localization prediction for novel proteins in fivefold cross-validation. The data and evaluation procedure are the same as the non-plant part of the widely used TargetP tool by Emanuelsson et al. The Locfind system is available on the WWW for predictions (http://www.stepc.gr/~synaptic/locfind.html).