
Info 435/635: Seminar on Applications of Information Science
Notes on Reading a Research Paper
These notes are based on observing my own reading
habits, which are influenced by the types of paper that I read, the disciplines
that are of interest to me, and the fact that almost everything that I want
to refer to is online.
Why are you reading the paper?
There is never enough time
to read everything in depth. Here are some reasons for reading a paper:
- Awareness – to
know that there is a paper on a specific topic by certain authors. It may
be sufficient to read the title, the list of authors, and the abstract. A
glance at the bibliography will give context to the paper.
- Skim – to
understand the main themes of the paper. Reading quickly through the
paper will give a general view of the methodology and the achievements claimed.
Details of experiments or the proof of mathematical results will probably
be skipped over.
- Thorough – for one's own understanding. This is the theme
of the rest of this Web page.
- Source for one's own work. When making
use of a paper in one's own work (e.g., an algorithm or some mathematics),
there is often a part that needs to be understood in great detail, while
other sections can be skimmed.
- Review. The purpose of a review is two-fold:
to make suggestions to the author for changes; and to recommend that the
paper be published in a journal or accepted for a conference. This is beyond
the scope of this Web page.
Who wrote the paper and why?
Papers are written by
people for a purpose. Who are the authors? Why did they write this paper? Where
was it published?
Superficially, research papers are written to tell the world about new scholarship
and research, but that is not the whole story. There are cultural differences
between various research groups and between different disciplines; papers will
reflect these cultural differences.
Journals and conferences have criteria for
accepting papers; any sensible author adheres to the guidelines and emphasizes
aspects that will appeal to the reviewers.
Academic authors are seeking
the prestige that helps their careers; they may divide a program of work into
small sections, so that they can publish several papers, or write different
versions for different audiences. A junior researcher (e.g., a Ph.D. student)
has a different view than an established authority.
Context
When reading a paper thoroughly, expect to spend considerable time reading
things other than the paper itself, for example:
- Surveying
papers that are cited by the paper. They will be listed in the bibliography
or in footnotes.
- Looking for papers that cite this paper. Citeseer is
a useful tool for this purpose.
- Reading other papers by the same
authors. Authors usually list their publications on their personal Web sites.
- Checking
terminology in dictionaries or by online searching.
- Checking my understanding of techniques, algorithms, or mathematical methods used by the paper. Mathworld
and Wikipedia are useful sources for mathematics.
For these reasons, it is of great help to be online when reading a paper.
Note taking
Some people take extensive notes when reading a paper; others rely more
on their memory. (When reading thoroughly, I will usually save a digital
copy of the paper which I annotate in red, highlighting points that I find
particularly relevant or interesting.)
To understand a technical paper, it helps to have a sheet of paper
to expand the ideas in your own manner. Many papers are very compressed.
Often this is because the authors are writing for a journal or conference
that has a limit on the length of a paper. To fully comprehend their work,
you need to expand the steps that they have left out. For example:
- Build understanding of an algorithm or complex mathematics by working
a simple numerical example.
- Derive mathematical steps that the paper skips over.
- Draw a diagram to illustrate some aspect of a computer system.
- Sketch a graph to display tabular data.
- Reformulate mathematics in a different notation.
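As one illustration of working a simple numerical example, the sketch below traces an algorithm on small numbers and records every intermediate state. Euclid's algorithm is used only as a stand-in for whatever algorithm a paper describes, and the function name gcd_trace and the inputs 48 and 18 are invented for this note.

```python
def gcd_trace(a, b):
    """Euclid's algorithm, recording every intermediate (a, b) pair."""
    steps = [(a, b)]
    while b != 0:
        a, b = b, a % b          # one step of the algorithm
        steps.append((a, b))
    return a, steps


# Small, hand-checkable input (hypothetical values chosen for illustration).
result, steps = gcd_trace(48, 18)
for a, b in steps:
    print(f"a={a:2d}  b={b:2d}")
print("gcd =", result)           # prints "gcd = 6"
```

Writing out the four intermediate pairs by hand, or printing them as here, often reveals the invariant that a compressed description in a paper leaves implicit.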
Empirical data and models
Many papers include empirical data. To understand the paper you need to understand
the data.
- How was the data collected and how has it been treated? The process
used to gather the data determines how it can be used. Social
science data may be derived from a few case studies or a large randomized
study following a structured statistical design. Selected variables may have
been randomized. The treatment of outliers or missing data may have an impact
on the results. Scientific data will depend on the equipment used and how
it is calibrated. Large data sets may have been preprocessed algorithmically,
e.g. by filtering.
- Which data is chosen for publication and how is
it displayed? Often
a huge amount of data is reduced to a few tables and graphs for publication.
How was this selection made? Do you understand the variables in tables
and the scales on graphs? Do you consider that they represent the experiment
well or do the authors appear to have presented the data in a tendentious
manner?
Empirical data is often used to suggest models or to validate models that
have been suggested by theory. In either case, what is the evidence that the
model represents the data?
- Suppose that the paper makes a statement such as,
"This figure shows a linear relationship between the two variables." Do
you agree that the figure supports this observation? If so, what is the evidence
that this is a general relationship rather than just a property of this particular
set of data? Is there a strong relationship?
- If a statistical test is used to evaluate the fit, what are the assumptions
behind the test? Many statistical tests have very strict mathematical assumptions
(e.g., the residuals must follow a certain distribution). Do they apply to
this data?
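One way to probe a claimed linear relationship is to refit the line yourself and look at the residuals. The sketch below is a minimal illustration under stated assumptions: the data is invented (points lying exactly on the curve y = x^2), the function name linear_fit is hypothetical, and ordinary least squares stands in for whatever fitting method a given paper actually uses.

```python
from statistics import mean


def linear_fit(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b, r_squared, residuals)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    r_squared = 1 - ss_res / syy       # fraction of variance explained
    return a, b, r_squared, residuals


# Invented data for illustration: the true relationship is quadratic.
xs = [1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]

a, b, r2, res = linear_fit(xs, ys)
print(f"slope={b:.2f} intercept={a:.2f} R^2={r2:.3f}")
print("residuals:", [round(r, 2) for r in res])
```

On this made-up data the fit explains about 96% of the variance, yet the residuals are positive at both ends and negative in the middle. That systematic pattern suggests the relationship is not linear, and it also violates the assumption of many statistical tests that the residuals are independent and follow a fixed distribution.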
Reading mathematics
Mathematics in a research paper can be anything from trivial to highly obscure.
At a first reading, try
to understand the shape of the mathematics, without following every line. What
are the main results? What is new about this paper and what is a restatement
of previous work? Are there intermediate results on which they depend? Is there
a general result followed by one or more special cases? It is impossible to
read any serious mathematics without being clear about the notation and
the definitions.
To fully understand a deep mathematical concept takes time.
Enthusiasm and skepticism
Hopefully, every paper that you read will have some interesting ideas or report
on valuable experiments. Papers that are published in reputable publications
should have been reviewed both for correctness and for the value of the research.
This may be by peer-review or by an editorial process.
But also be skeptical. Authors are enthusiasts about their work. There is
great pressure on authors to present results in the most favorable light. Reviewers
are not always as thorough as they should be, and have no way to check experiments
or observe how empirical data was gathered.
William Y. Arms
Draft: February 6, 2006