The Basics: How to interpret a chart review

I can’t stress enough how important it is to understand how to read the literature; it makes a world of difference in interpreting the results. When I read a study I read only the methods and results sections, then make my own interpretation and see whether it matches the authors’ conclusion. But to draw conclusions you need to know a little something about methods. In this review, I want to go over which methods matter for the interpretation of chart reviews.

Remember that chart reviews are not only among the weakest types of study, they are unable to establish cause and effect. They are observational studies and therefore can only show ASSOCIATION, not CAUSATION. There are lots of different checklists, but unlike RCTs there is no universally agreed-upon methodology for chart reviews (systematic reviews, by contrast, have PRISMA). Fortunately, we have two great papers in our own journal on which to base appropriate chart reviews: Gilbert et al. (1996) and Kaji et al. (2014). Both are must-reads for methodologists. In this review I want to summarize them and formulate my own simple checklist (see Table 1) for validating chart reviews. The table below summarizes the 5 sources of bias that should be accounted for when looking at a chart review. Without rigorous methods, a study severely lacks the ability to answer its stated question. For statistical purposes, when I use the word bias here I mean the tendency of a measured sample (the study) to INCORRECTLY estimate the population (the true effect).

Is there INVESTIGATOR bias?

Why did the investigators do the study? There could be many reasons, so look for financial bias or even intellectual bias (perhaps they have published in that area before). Check for financial disclosures or honoraria. Is a chart review even the appropriate design to answer the proposed question? And, even more importantly, was a primary outcome proposed in advance?

Is there CHART bias?

What method was utilized to find the charts? Were ICD codes or chief complaints used? Was a sufficient sample of charts used? Were the charts chosen as a convenience sample or consecutively? The best approach is random sampling of the charts. This is often not possible, but how and which charts were chosen should be described. How many charts are needed to answer the intended question? A power calculation should be supplied by the paper. The included and excluded charts should be delineated, along with why each chart was included or excluded; inclusion and exclusion criteria should be determined ahead of time (a priori). The paper should have a table of the clinical/baseline characteristics of the charts or patients being sampled. Finally, there should be a flow diagram showing how the charts were selected.
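To make the sample-size and random-sampling points concrete, here is a minimal sketch in Python. The numbers (worst-case proportion, 5% margin of error, 95% confidence, a pool of 5,000 eligible charts) are my own illustrative assumptions, not values from any particular study:

```python
import math
import random

def charts_needed(p=0.5, margin=0.05, z=1.96):
    """Charts needed to estimate a proportion p to within +/- margin
    at ~95% confidence (z = 1.96). p = 0.5 is the worst case."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

n = charts_needed()  # 385 charts under the worst-case assumptions above

# Random sampling from a hypothetical pool of 5,000 eligible chart IDs:
random.seed(42)  # fixed seed so the sample is reproducible and auditable
sample = random.sample(range(1, 5001), n)
```

The point of the fixed seed is that a reader (or a second abstractor) can regenerate exactly the same chart list, which makes the sampling step itself verifiable.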

Is there DATA bias?

Once the charts are picked, the data to be extracted need to be determined, and this should be done ahead of time. A data collection tool (DAT) should be built and then trialed in a small pilot study to ensure it captures the intended data. That same DAT should be provided in the paper or in a supplement. What if some of the data are missing? How this is handled should also be determined a priori, and a sensitivity analysis should be done to see how much the missing data could negatively impact the results of the study. What if there are conflicting data within the same chart? How will that be handled, and what impact will it have?
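As a toy illustration of the best/worst-case idea behind such a sensitivity analysis (the counts are hypothetical, and real analyses are usually more sophisticated than this), one can bound an outcome proportion by assuming the missing charts were all negative and then all positive:

```python
def outcome_bounds(positives, complete, missing):
    """Bound an outcome proportion under extreme assumptions about
    charts with a missing outcome."""
    total = complete + missing
    observed = positives / complete         # complete-case estimate
    worst = positives / total               # every missing chart was negative
    best = (positives + missing) / total    # every missing chart was positive
    return observed, worst, best

# Hypothetical: 40 positive outcomes in 80 complete charts, 20 charts missing
print(outcome_bounds(40, 80, 20))  # (0.5, 0.4, 0.6)
```

If the study’s conclusion survives both extremes, the missing data probably don’t threaten it; if the bounds straddle the decision threshold, they might.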

Is there ABSTRACTOR bias?

The abstractors should not be the study investigators. Often they are not even medical providers, so they should be trained appropriately in how to collect the data and how to use the data collection tool (DAT). The abstractors should also be blinded to the study hypothesis so they cannot introduce bias. Finally, the abstractors should be monitored, and the monitoring process should be described in the paper, especially for a long study; repeat training may even be warranted. How well do the abstractors agree with each other? Would different abstractors get the same results? This is inter-rater reliability, and a measure of it should be given in the study; at minimum, both the percent agreement and a kappa score should be reported. Lastly, the ability of an abstractor to agree with his or her own prior abstraction should be measured; that is intra-rater reliability.
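Percent agreement and Cohen’s kappa are both simple to compute; a minimal sketch, using two hypothetical abstractors’ codings of a single yes/no variable from 10 charts:

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Fraction of charts where both abstractors recorded the same value."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Chance-corrected agreement (Cohen's kappa) between two abstractors."""
    n = len(r1)
    po = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Expected agreement if both abstractors coded at random,
    # keeping their observed marginal frequencies:
    pe = sum((c1[k] / n) * (c2[k] / n) for k in set(r1) | set(r2))
    return (po - pe) / (1 - pe)

# Hypothetical codings of one variable from 10 charts:
rater_a = ["y", "y", "n", "y", "n", "n", "y", "y", "n", "y"]
rater_b = ["y", "n", "n", "y", "n", "y", "y", "y", "n", "y"]
print(percent_agreement(rater_a, rater_b))        # 0.8
print(round(cohens_kappa(rater_a, rater_b), 3))   # 0.583
```

Note how kappa (0.58) is lower than raw agreement (0.80): it discounts the agreement the two abstractors would get by chance alone, which is why both numbers should be reported.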

Is there RELIABILITY bias?

The data should be reproducible and reliable, and kappa values and percent agreement should be reported for the data as well. How many variables should be checked for reliability? Ideally all of them; at minimum, the ones important for the study hypothesis. Also, how many of the data points should be checked? The consensus seems to be that at least 10% of the data should be re-abstracted for reliability. Finally, is the kappa level the authors chose appropriate, or should it be higher or lower?

Table 1.

The Checklist:
1.     Investigator bias
        a.     Question appropriate for chart review?
        b.     Financial/Intellectual disclosures supplied?
2.     Chart bias
        a.     Method of chart identification (chief complaint vs. ICD-10 codes)?
        b.     Sufficiently sampled?
        c.     Was a power calculation supplied?
        d.     A priori inclusion criteria?
        e.     A priori exclusion criteria?
        f.      Table of clinical characteristics?
        g.     Flow diagram delineating how the study population was derived?
3.     Data bias
        a.     Defined a priori?
        b.     Coding guide for abstractors?
        c.     Coding guide provided?
        d.     Standardized data collection tool (DAT)?
        e.     Was the DAT pilot tested?
        f.      Was the DAT provided?
        g.     Is there missing or conflicting data?
        h.     How is missing data handled (sensitivity analysis)?
4.     Abstractor bias
        a.     Blinded to study hypothesis?
        b.     Trained appropriately?
        c.     Monitored?
        d.     Inter-rater Reliability?
        e.     Intra-rater reliability?
5.     Reliability bias
        a.     Kappa and percent agreement calculated for the data?
        b.     What level of reliability and why was that level chosen?
        c.     Which of the collected variables were checked for reliability?
        d.     What percent of the data was checked for reliability?


Gilbert EH, Lowenstein SR, Koziol-McLain J, Barta DC, Steiner J. Chart reviews in emergency medicine research: Where are the methods? Ann Emerg Med. 1996; 27: 305-308.

Kaji AH, Schriger D, Green S. Looking Through the Retrospectoscope: Reducing Bias in Emergency Medicine Chart Review Studies. Ann Emerg Med. 2014; 64: 292-298.
