Molecular linguistics: Extracting information from gene and protein sequences