Welcome to the third in our series on Literature Search and Analysis. In the first two articles we covered the search process – how to decide the best approach to your question, and then how to retrieve your data effectively and ethically. In this article we address the actual data extraction: how to build a worksheet that best supports evaluation and analysis.

We should point out here that this advice is NOT intended for researchers undertaking Cochrane reviews, which are in a class of their own and have a set methodology for data extraction and analysis, as well as dedicated software for statistical analysis. In this series, we assume you are a beginner.

So here you are, with a huge pile of papers and some sort of report to write at the end of it that answers your research question. What’s the best way to begin to make sense of it all?

There are 5 key factors to consider for the analysis of your data that will help ensure you assess them properly:

  • Decide your parameters – go large first?
  • Spreadsheets and filters
  • Consistency in your definitions
  • An independent eye
  • Keeping the end in mind

The items that you want to include in your data analysis and extraction are decided by the research question and the inclusion and exclusion criteria.

The research question will undoubtedly have the most influence on the parameters you want in your spreadsheet, and the degree to which you want to analyse them. For example, if you are considering the quality of life associated with a condition, do you need to separate out: The methods used to measure it? The age of the patient? The comorbid conditions? The types of treatment the patients are receiving? If you are not very familiar with your topic at the start of the project, you may find that there are factors that could impinge on the answer to your research question that you were unaware of (let’s face it, this is probably the reason you are doing the literature review in the first place!). The point here is to ensure that you don’t analyse a large number of papers and then find out that you have to go back and review them all again, either because you did not analyse them in enough detail or because you overlooked an important factor that may affect the interpretation of your findings. Hence our suggestion to “go large”, at least initially, rather than keep your number of parameters (columns on a spreadsheet) small. A pilot study of about 10 representative papers, before you perform the research in earnest, can help you decide which parameters to include or exclude.
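One way to review a pilot extraction is to look at how often each column was actually filled in. The following is only a minimal sketch: the column names, the sample rows and the helper `fill_rates` are all invented for illustration, and a low fill rate is a prompt to think about a parameter, not a rule for dropping it.

```python
# Hypothetical pilot rows: one dict per paper, one key per spreadsheet column.
# An empty string means the paper did not report that parameter.
pilot_rows = [
    {"paper_no": 1, "measurement_method": "questionnaire", "patient_age": "adult", "comorbidities": ""},
    {"paper_no": 2, "measurement_method": "interview", "patient_age": "adult", "comorbidities": ""},
    {"paper_no": 3, "measurement_method": "questionnaire", "patient_age": "", "comorbidities": "diabetes"},
    {"paper_no": 4, "measurement_method": "questionnaire", "patient_age": "adult", "comorbidities": ""},
]


def fill_rates(rows):
    """Return, for each column, the fraction of rows with a non-empty value."""
    columns = sorted({column for row in rows for column in row})
    return {
        column: sum(1 for row in rows if str(row.get(column, "")).strip()) / len(rows)
        for column in columns
    }


for column, rate in fill_rates(pilot_rows).items():
    print(f"{column}: reported in {rate:.0%} of pilot papers")
```

A column that is rarely filled may be a candidate for removal, or it may be telling you that the literature reports that factor poorly, which is itself worth noting in your report.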

Let’s consider the inclusion and exclusion criteria. These should have been set clearly at the beginning, but sometimes it’s only when you get right into the data that you find you should have been more stringent (or expansive) on some of these elements.

For example, we have been working with a client on a particular project that was designed to look at the methodology used to determine costs associated with a common hospital-acquired infection. The aim was to determine true costs of the infection, by looking at when and how the infection was contracted – either as a consequence of hospitalisation, or as the original reason the patient was hospitalised. However, as the data were collated it was found that, in a large number of studies, costs could not be reasonably attributed to infection from the way they were reported. In this instance, perhaps the criteria could be refined to exclude these studies.

However, more often you may find that your exclusion criteria have been too rigid, and there is so little evidence that no conclusion can be formed. In this instance you may need to go back and re-think your inclusion criteria – perhaps, for example, also including data from observational studies.

When thinking about inclusion criteria for studies, you should think in terms of levels of evidence for your own clarity of mind. While the perception of value of data at different levels has changed in recent years, it still forms a useful classification, particularly when considering inclusion criteria. The Centre for Evidence-Based Medicine publishes a classification at http://www.cebm.net/index.aspx?o=1025, which is very complex; there are plenty of others out there that are simpler. Just look for levels of evidence in Google images and you will find a large number to choose from. My advice is to find one from a reasonable source and stick to it. And of course, explain what you have done in your report methodology, which we will discuss in the next issue.

The important issues when having to rethink inclusion or exclusion criteria are:

  • Do we need to revisit all the studies again? (the answer to this is usually yes, just to be sure)
  • Will the alteration in the criteria affect the analysis or assessment method? (probably)
  • Can we still answer our question with certainty? (yes, or you shouldn’t alter them)

The second key success factor is the design of the spreadsheet to be used for the data extraction. The important thing here is to make the content of the worksheet as simple, and as consistent from one paper to the next, as possible without deviating from what the authors meant. The aim is to create something that is filterable (e.g. by patient characteristics, stage of disease, treatment etc.). Perhaps this isn’t as important when you only have 10 studies to analyse, but it’s vital when you have a large number and are trying to draw conclusions from it all.
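To make the idea of a filterable worksheet concrete, here is a minimal sketch in Python. The column names, the sample rows and the helper `filter_rows` are assumptions made up for this illustration; in practice the same thing is done with the filter buttons in your spreadsheet software.

```python
# Hypothetical extraction rows: one dict per paper, one key per spreadsheet column.
rows = [
    {"paper_no": 1, "disease_stage": "early", "treatment": "drug A", "age_group": "adult"},
    {"paper_no": 2, "disease_stage": "late", "treatment": "drug A", "age_group": "adult"},
    {"paper_no": 3, "disease_stage": "early", "treatment": "drug B", "age_group": "paediatric"},
    {"paper_no": 4, "disease_stage": "early", "treatment": "drug A", "age_group": "paediatric"},
]


def filter_rows(rows, **criteria):
    """Return the rows whose columns match every given criterion exactly."""
    return [
        row
        for row in rows
        if all(row.get(column) == wanted for column, wanted in criteria.items())
    ]


early_on_drug_a = filter_rows(rows, disease_stage="early", treatment="drug A")
print([row["paper_no"] for row in early_on_drug_a])  # prints [1, 4]
```

Note that the match is exact: a cell containing "Early stage" or "early " would be missed, which is why simple, uniform cell contents matter so much once the number of papers grows.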

Some simple words of advice:

  • Keep the flow of the spreadsheet columns logical and consistent with your papers, and you will find it easier to fill. The less jumping between different worksheets you have to do, the fewer inaccuracies you will generate by putting data inadvertently in the wrong columns. If more than one person is extracting data, make sure that you have the columns in the same order in everyone’s copy of the worksheets.

  • If you are dealing with a lot of papers, use a system that keeps them under control and easily found again. Rather than put them in alphabetical order, I tend to give them a number and ensure that the number column is on every worksheet, particularly if I have so many parameters that I need a number of worksheets to keep them manageable.

  • Keep one worksheet exclusively for the citation reference, breaking this down so that authors are in different columns to the title, the date, the journal and citation details. It’s very helpful if you can put all this information straight into your worksheet from PubMed; this keeps the details accurate, which will save you time in constructing your bibliography. You can pull citation details straight from PubMed without having to cut and paste, by using reference management software – and if you have this, and find it easy to use, it’s certainly an advantage. However, it’s unlikely that you will be able to add and evaluate all your parameters within such a database, unless you are using the RevMan software used for Cochrane reviews. (And if you are, this article is WAY below your level of capability – stop reading now!) So normally another step is required to extract the information from the reference management software into your working spreadsheet; not a difficult task in itself, but make sure you know what you are doing before you attempt it. There is nothing worse than ruining all your work by carrying out an extraction that corrupts your data.

  • Don’t forget copyright constraints, and don’t be tempted to cut and paste parts of the text of the article straight into your worktable – this is illegal. You can add an author’s comment in quotes if you plan to quote directly in your report, but even this is bordering on breach of copyright if too extensive. Paraphrase to be on the safe side, but make this as accurate a reflection of the authors’ intention as you can. See below for hints on consistency.
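The paper-number advice above can be sketched as follows. This is an illustration only: the citation worksheet and the outcomes worksheet are represented as made-up Python data, and `join_on_paper_no` is a hypothetical helper, not a feature of any reference manager or spreadsheet package.

```python
# Hypothetical citation worksheet, keyed by the paper number you assigned.
citations = {
    1: {"authors": "Author A; Author B", "title": "Example title one", "year": 2010, "journal": "Journal X"},
    2: {"authors": "Author C", "title": "Example title two", "year": 2012, "journal": "Journal Y"},
}

# Hypothetical parameter worksheet; every row carries the same paper number.
outcomes = [
    {"paper_no": 1, "outcome_measure": "questionnaire X"},
    {"paper_no": 2, "outcome_measure": "questionnaire Y"},
    {"paper_no": 3, "outcome_measure": "questionnaire X"},  # no citation row yet
]


def join_on_paper_no(citations, outcomes):
    """Attach citation details to each outcome row; report numbers with no citation."""
    merged, orphans = [], []
    for row in outcomes:
        citation = citations.get(row["paper_no"])
        if citation is None:
            orphans.append(row["paper_no"])
        else:
            merged.append({**row, **citation})
    return merged, orphans


merged, orphans = join_on_paper_no(citations, outcomes)
print(len(merged), "rows linked; paper numbers without a citation:", orphans)
```

Because the number is the only thing linking the worksheets, a check like the `orphans` list is a cheap way to catch a paper that was extracted but never logged in the citation sheet.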

Always record the definitions that you are using somewhere on your worksheet. For example, definitions of disease severity (if these are not standardised), types of procedures, etc. It can be easy, particularly where the area is complex and researchers often vary in their definitions, for these to slide during your data extraction and analysis. You will be grateful to have these in front of you when you are working on your 300th paper in the dead of night.
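If your worksheet lives somewhere you can script, the recorded definitions can double as a check on what gets typed in. The sketch below is an assumption-laden example: the `DEFINITIONS` table, its placeholder wording and the `check_entry` helper are invented here, and your own definitions should replace them.

```python
# Hypothetical definitions table: the allowed categories for each column,
# each with the working definition agreed at the start of the project.
DEFINITIONS = {
    "severity": {
        "mild": "placeholder: your agreed definition of mild disease",
        "moderate": "placeholder: your agreed definition of moderate disease",
        "severe": "placeholder: your agreed definition of severe disease",
    },
}


def check_entry(column, value):
    """Return the tidied value if it is a defined category, else raise ValueError."""
    normalised = value.strip().lower()
    if normalised not in DEFINITIONS[column]:
        allowed = ", ".join(sorted(DEFINITIONS[column]))
        raise ValueError(f"{value!r} is not a defined {column} category ({allowed})")
    return normalised


print(check_entry("severity", " Severe "))  # prints severe
```

An entry such as "critical" would raise an error here, forcing a decision: either map it to an existing category or add a new, written-down definition.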

If you intend to do statistical analyses on your findings, rather than just a descriptive evaluation, you need to be extra careful about your parameters, and your data entry must be absolutely accurate.
Setting your definitions will be helped by…

Having another experienced person check your reasoning, your definitions of your parameters, and your first dozen or so data extractions is of infinite value.

This person should be unassociated with the project, so they bring a fresh perspective to it. Our writers, when doing these reviews, always check out the parameters and the first ten entries in two separate stages, so our clients get two chances to amend the parameters before the bulk of the work is done. And our internal reviewers have a lot to say at this point, also. Nonetheless, often a third independent look at the work in progress will discover a potential issue in the analysis that the two earlier ones did not.
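When a second person re-extracts the first few papers, the two sets of entries can be compared cell by cell. The following is a rough sketch under stated assumptions: both extractions are keyed by paper number, the sample values are made up, and `find_disagreements` is a hypothetical helper rather than part of any review tool.

```python
# Hypothetical extractions of the same papers by two people, keyed by paper number.
extractor_a = {
    1: {"severity": "mild", "study_design": "RCT"},
    2: {"severity": "severe", "study_design": "cohort"},
}
reviewer_b = {
    1: {"severity": "mild", "study_design": "RCT"},
    2: {"severity": "moderate", "study_design": "cohort"},
}


def find_disagreements(first, second):
    """Return (paper_no, column, first_value, second_value) for every differing cell."""
    disagreements = []
    for paper_no in sorted(first.keys() & second.keys()):
        a, b = first[paper_no], second[paper_no]
        for column in sorted(a.keys() | b.keys()):
            if a.get(column) != b.get(column):
                disagreements.append((paper_no, column, a.get(column), b.get(column)))
    return disagreements


for item in find_disagreements(extractor_a, reviewer_b):
    print(item)  # prints (2, 'severity', 'severe', 'moderate')
```

Each disagreement is a conversation starter: it usually points to a definition that is being read differently, which is far cheaper to resolve after a dozen papers than after three hundred.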

In formal systematic literature reviews, a second reviewer is often part of the methodology, and what to (or what not to) include is a collaborative decision. However, I would advocate also having someone else review this. It is easy, particularly with a very large project, for both researchers to become so involved in the study that they no longer see the bigger picture and may wander from the point, which brings me to the last key success factor.

The last thing to remember is to keep your research question uppermost in your mind at all times. It can be very easy to follow interesting but distracting side routes with the data that leave you feeling intrigued and informed, but haven’t helped you find the answer to your research question!

Our advice to “go large” with your parameters at first can encourage this; it is always a choice between wandering from the point (by including too many parameters) and having to re-analyse your papers (because your extraction wasn’t detailed or comprehensive enough). Keeping your research question at the forefront will do a lot to help you steer your way through the extraction and assessment processes successfully, and ensure that your final report answers the research question as fully as possible.