Technical terminology: some linguistic properties and an algorithm for identification in text
- 1 March 1995
- journal article
- research article
- Published by Cambridge University Press (CUP) in Natural Language Engineering
- Vol. 1 (1), 9-27
- https://doi.org/10.1017/s1351324900000048
Abstract
This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; rarely do terms contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase. The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.
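The two properties named in the abstract, a preferred phrase structure and repetition, can be illustrated with a minimal sketch. It is not the paper's implementation. The one-letter tag set, the exact tag pattern, the `min_count` and `max_len` thresholds, and the function name `candidate_terms` are all assumptions made for illustration, and the input is assumed to be already part-of-speech tagged.

```python
import re
from collections import Counter

# Assumed one-letter tag set (hypothetical, for illustration only):
#   A = adjective, N = noun, P = preposition, O = any other word class.
# Assumed structural filter: a multi-word sequence of adjectives and nouns,
# optionally containing one noun-preposition pair, always ending in a noun.
PATTERN = re.compile(r"(?:[AN]+|[AN]*NP[AN]*)N")


def candidate_terms(tagged, min_count=2, max_len=5):
    """Return {phrase: count} for repeated phrases that pass the tag filter.

    `tagged` is a list of (word, tag) pairs using the one-letter tags above.
    `min_count` and `max_len` are illustrative thresholds, not values taken
    from the abstract.
    """
    words = [word.lower() for word, _ in tagged]
    tags = "".join(tag for _, tag in tagged)
    if len(tags) != len(words):
        raise ValueError("every tag must be exactly one character")

    counts = Counter()
    for start in range(len(tags)):
        # Only multi-word spans (length >= 2) are considered.
        last_end = min(start + max_len, len(tags))
        for end in range(start + 2, last_end + 1):
            if PATTERN.fullmatch(tags[start:end]):
                counts[" ".join(words[start:end])] += 1

    # Repetition filter: keep phrases seen at least `min_count` times.
    return {phrase: n for phrase, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample = [
        ("the", "O"), ("central", "A"), ("processing", "N"), ("unit", "N"),
        ("executes", "O"), ("code", "N"), ("while", "O"),
        ("the", "O"), ("central", "A"), ("processing", "N"), ("unit", "N"),
        ("idles", "O"),
    ]
    for phrase, n in sorted(candidate_terms(sample).items()):
        print(n, phrase)
```

This sketch counts every matching span, so nested sub-phrases of a longer candidate are reported as well. A fuller implementation would need a real tagger and a policy for such nested candidates.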