External Validation of a Measurement Tool to Assess Systematic Reviews (AMSTAR)
Open Access
- 26 December 2007
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 2 (12) , e1350
- https://doi.org/10.1371/journal.pone.0001350
Abstract
Thousands of systematic reviews have been conducted in all areas of health care. However, the methodological quality of these reviews is variable and should routinely be appraised. AMSTAR is a measurement tool to assess systematic reviews.

AMSTAR was used to appraise 42 reviews focusing on therapies to treat gastro-esophageal reflux disease, peptic ulcer disease, and other acid-related diseases. Two assessors applied AMSTAR to each review. Two other assessors, plus a clinician and/or methodologist, independently applied a global assessment to each review.

The sample of 42 reviews covered a wide range of methodological quality. The overall scores on AMSTAR ranged from 0 to 10 (out of a maximum of 11) with a mean of 4.6 (95% CI: 3.7 to 5.6) and median 4.0 (range 2.0 to 6.0). The inter-observer agreement of the individual items ranged from moderate to almost perfect agreement. Nine items scored a kappa of >0.75 (95% CI: 0.55 to 0.96). The reliability of the total AMSTAR score was excellent: kappa 0.84 (95% CI: 0.67 to 1.00) and Pearson's R 0.96 (95% CI: 0.92 to 0.98). The overall scores for the global assessment ranged from 2 to 7 (out of a maximum score of 7) with a mean of 4.43 (95% CI: 3.6 to 5.3) and median 4.0 (range 2.25 to 5.75). Agreement on the global assessment was lower, with a kappa of 0.63 (95% CI: 0.40 to 0.88). Construct validity was shown by AMSTAR convergence with the results of the global assessment: Pearson's R 0.72 (95% CI: 0.53 to 0.84). For the AMSTAR total score, the limits of agreement were −0.19±1.38. This translates to a minimum detectable difference between reviews of 0.64 ‘AMSTAR points’.

Further validation of AMSTAR is needed to assess its validity, reliability and perceived utility by appraisers and end users of reviews across a broader range of systematic reviews.
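For readers unfamiliar with the agreement statistics named in the abstract, the following is a minimal Python sketch of two of them: unweighted Cohen's kappa for a single yes/no item rated by two assessors, and Bland-Altman style limits of agreement for two assessors' total scores. It is not the authors' analysis. The function names and all ratings below are hypothetical, invented only for illustration, and the abstract does not state which kappa variant or which limits-of-agreement convention was used, so the unweighted kappa and the 1.96 standard deviation half-width here are assumptions.

```python
from statistics import mean, stdev


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def limits_of_agreement(scores_a, scores_b):
    """Return (bias, lower, upper): mean difference and bias +/- 1.96 SD."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample standard deviation of differences
    return bias, bias - half_width, bias + half_width


# Hypothetical ratings of one checklist item across ten reviews.
item_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
item_b = ["yes", "yes", "no", "yes", "yes", "no", "yes", "no", "no", "no"]

# Hypothetical total scores from the same two assessors for five reviews.
total_a = [4, 7, 2, 9, 5]
total_b = [5, 7, 2, 8, 5]

kappa = cohen_kappa(item_a, item_b)
bias, lower, upper = limits_of_agreement(total_a, total_b)
print(f"kappa = {kappa:.2f}")
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is reported per item, while the limits of agreement describe how far two assessors' total scores can be expected to differ on the scale of the score itself.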