How to write a scientific article.
Writing is a creative process that feeds back into, and helps guide, your science.
A good paper is like a delicious buffet where the audience can be enriched just by looking at it (an informative Title), have their sense of smell entranced by just walking around the table (Abstract) and then be fully satisfied by just the starter, main course and dessert of their choosing (e.g. reading the last paragraph of the Introduction, looking at one main Figure and the first paragraph of the Discussion). A point to bear in mind, as this analogy continues to be drawn out, is that it is rare for a reader to have sufficient appetite for your paper to consume the whole buffet in one sitting. In summary, a great ‘dinner party’ will increase your popularity with reviewers and editors and help you get accepted. It will also get your message across better to the eventual readers, who will end up liking and citing you.
The writer’s job is to prepare the raw ingredients for the reader and reduce the reader’s effort.
First-off then, psych yourself up!
For anyone who experiences writer’s block I recommend the following book by Stephen B. Heard. It focuses a lot on the discipline you need to write a paper i.e. the writing process itself.
But otherwise steel yourself and just get on with it! Here’s what to do in detail!
Step 1 – Figures sequence
Whether or not you have some written material already, to really get started writing you need to have done some experiments, recorded the evidence and prepared your Figures and Tables. If and when you have distilled your Results into a set of Figures with which you are happy, then the real job of writing can start in earnest. You can feel a rush, and that the hitherto virtual paper has finally come palpably into ‘existence’, when you identify the Main Figure. If you have a main figure that expresses the main idea of your work you now have the germ of an accepted manuscript! You can imagine a dream paper with just one Figure. You are sure to run into problems, with writing a paper and getting it accepted, if you cannot identify a single main figure around which to build your story.
This main figure is a visual representation of the main idea of your paper-to-be. Note, if you feel you have two main figures the first thing to ask is whether they make the same point, or whether one confirms the other and is subordinate to it and thus not the main figure after all. If the two figures make the same point they can be combined into one figure. If the two main figures really are irreconcilably at the top of the pile the answer is to write two papers. The most common example of the two-main-figure problem is when you are reporting a new method and a new result that you found with it at the same time. In this case it is important to decide, for your approach and for the letter to the editor, if you are submitting a methods paper or a results paper and to subordinate one part to the other. It is however even better, for several reasons, to write two papers, in each of which the other subordinate part is played down. This gives you two papers, probably in different journals, and it minimizes the risk for the editor, who will prefer to build your results on the ‘solid ground’ of an already-published method.
Before writing the first draft, then, it is important to identify a hierarchy of figures, in which up to three extra figures confirm your Main Figure. These 4 Figures, max total, then are the core of your paper. If you have further figures that you want to use that do not directly confirm the main figure, leave them out, at least at this point where you need a clear story skeleton to get the bare bones down in a rough first draft. Finally, you can shuffle your select pertinent figures into an order that tells that story. This does not have to be the order you did the experiments in. Now you can begin the writing.
Remember that at the start the journal editor is just looking for the weakest link.
Further advice on Figures.
- Make figures ‘square’ (or rectangular) in layout so that like things line up with like, in rows and columns, i.e. no irregular shaped figures with bumps and holes please. Also try to group similar techniques together in individual figures rather than having every figure as a mix of methods. It will be visually more pleasing and less confusing overall. For example if you did the same variety of techniques on two different genes, combine both genes into each figure part. The hardest part of reading is understanding the reason for experiments, and presenting the same experiments over and over is laborious.
- Note also that a figure needs to make a point and have something in it you can literally point at to point something out. No matter how descriptive your study, if you cannot point at something in your figure there is no point to it being in print. For example if you are describing a new and better X-ray structure determination technique, put in an arrow that points to a hydrogen bond that was not previously visible. Please don’t just put the picture in all its uninterpreted beauty. Figures without virtual or real arrows can be converted to a Table or moved to Supplementary.
- The figure should be understandable as much as possible at the ‘buffet’ level (see above), without reading the figure legend, i.e. put words into the figure that indicate key differences as near to the heart of the action as possible (e.g., on graph lines rather than in a legend if non-cluttering permits). Remove any redundant cluttering e.g. repeated words on multiple panels – line things up.
- Figure Legends should describe every word and object, including the virtual or real arrow (see above), that appears unexplained in the Figure. The figure and its legend are a unit that should be fully understood in isolation and stand alone, i.e., put in information that is specific to your study and avoid general figure titles like ‘Genomic Analysis’ that could be excised from absolutely any paper.
Step 2 – Results
- The Main Figure in part 1 above contains the most important idea of your paper. This idea can be translated into a question and this question is really a hypothesis (that the answer to your question is yes). The hypothesis responds to a knowledge gap/problem in the field. So start your Results – ‘In order to attack problem a, we asked the question b. To do this we used such and such materials and methods.’
- After the first two sentences give the positive results, leaving mention of controls until the end of the paragraph, i.e. ‘we found x and y’, then ‘the control z was OK’ (optional). Try not to mention the controls at all if they can be clearly indicated in the figure legends.
- Finish the paragraph with a take-home message such as ‘we found p, therefore our (question) q is moving forward in such and such a way…’.
- Start the next paragraph by reminding the reader where we have got to, i.e. ‘To further delve into our question q we did x and found y’, in a revolving-door scenario.
- The Results section is the main part of your story and it should be a story and not an itemized stock-take of the data. Point out trends and patterns as you go along, highlighting and making comparisons to keep the reader engaged. Note however that this is not the full in-depth Discussion of course.
- For a brilliant demonstration of what the Results section is for, including a definition of good and bad witches(!), see the USF writing commons series on scientific writing with Kristin Sainani on YouTube.
Step 3 – Introduction and Discussion
The Introduction and Discussion are the creative hub of your paper that package your Results and Figures. The Introduction and Discussion are the Before and After. They are like a descent into the Grand Canyon (skip this bit if you don’t like analogies) and the ascent on the other side: the contours of both journeys should correspond in reverse order. Starting the Introduction at ground level the reader is invited into a wide general field with a view of the study’s significance at the end of the Discussion on the horizon. Descending into the subject the Introduction stealthily and ineluctably points down toward a specific question/problem at the bottom end of the Introduction. After wading through to the other side of the Results the reader is given the specific answer/solution and main idea of the paper at the beginning of the Discussion and is gently but surely then taken back up to ground level to see the wider implications to researchers and to society.
The Introduction is not a full review of a subject.
- 2 double spaced pages, no more – What is the field you are working in and why? Where is the knowledge gap/problem? Just enough words to get the reader from, not thinking about your work, to ‘the question’.
- Last paragraph. What methods you used to solve the problem.
- In the last line it is optional to repeat what you found. You don’t need to say what you found at the end of the Introduction as it is already there at the end of the Abstract. The end of the Introduction is the one place in a scientific paper where a hint of mystery and leading-the-reader-on is OK.
The Discussion should also be just 2 double-spaced pages (i.e. fewer than 1000 words). Every sentence in the Discussion should be clearly related to your work and not exclusively about other work. If your Discussion contains a sentence that states a fact and finishes with a reference, consider how things could be improved. Delete that sentence or add a clause that relates it to your current work as discussed in the previous and following sentences. When talking about other work it should be clear how it relates back to your current paper.
- First line – What is ‘the answer’ to the problem. Note – this is not the same as ‘what were your results?’ (again). This is a place for interpretation of your results.
- Second line and on – justify your answer. Develop, discuss shortcomings.
- Conclusion/End of Discussion- Paper must come to a clear end. Take a look forward. Re-emphasize the newness/advance. Don’t finish (any section) with references to other work. This is about your work.
Check that Introduction and Discussion correspond (slopes of the Grand Canyon).
As much as possible, everything in the Introduction should be dealt with in the Discussion and everything in the Discussion alluded to in the Introduction. Make sure you have always described your work from one consistent ‘point of view’ so that the most obstinate of readers or reviewers cannot fail to follow your argument. For example, if the fundamental question/hypothesis that your Introduction poses, is the following,
Q- We set out to discover if binge drinking the night before a race reduces the performance of top athletes –
do not report the answer in the Discussion like this,
A1- We found that an alcohol-free early night led to times that were close to athletes’ personal bests.
Report the answer rather like this,
A2- We found that binge drinking the night before a race reduced the performance of top athletes–
i.e., in exactly the same form and from the same point of view as the question.
With the Introduction, Results and Discussion all complete, as with the figure-shuffling at the beginning of this writing process, this is a moment to reflect. Have you done all the experiments you need?
Step 4 – Abstract and Title
The Abstract is at once the ‘Contents’ of your paper and its representation/advertisement online.
As the Contents it should have a snippet from the Introduction, followed by a bit of the Methods and Results, rounded off by the main point from the Discussion, and strictly in that order. As online advertisement for your paper, Stephen B. Heard likens the Abstract to speed-dating and the Title to a pick-up line – ‘A weak title clears its throat politely in a noisy bar and will certainly be going home alone’ he says poignantly. The title should be 10 words or fewer because that is all the eye can scan. Note the main structure of every sentence should be in this scannable early window of <10 words (see below). For a high impact title, that maximizes the chance of appearing in a PubMed search, try making a list of Keywords and put as many of them in the Title as possible and juggle them (see below also) as near the beginning of the title as possible. Then put just the unused keywords in your actual keywords list.
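The title advice above (10 words or fewer, keywords in the Title, unused keywords kept for the keywords list) can be checked with a few lines of Python. This is only a minimal sketch: the function name `title_report`, the example title and the keyword list are all made up for illustration, and keyword matching here is a plain case-insensitive substring test.

```python
import re

def title_report(title, keywords, max_words=10):
    """Summarize title length and which keywords the title already covers."""
    # Count runs of letters/digits (allowing inner hyphens and apostrophes) as words.
    words = re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", title)
    lowered = title.lower()
    used = [k for k in keywords if k.lower() in lowered]
    unused = [k for k in keywords if k.lower() not in lowered]
    return {
        "word_count": len(words),
        "within_limit": len(words) <= max_words,
        "keywords_in_title": used,
        "keywords_for_list": unused,  # candidates for the actual keywords list
    }

# Hypothetical title and keywords, for illustration only.
report = title_report(
    "Binge drinking reduces race performance in top athletes",
    ["binge drinking", "athletes", "alcohol", "race performance"],
)
print(report)
```

The sketch does not judge how near the start of the title each keyword sits; that juggling is still best done by eye.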
Step 5 – Methods and References
Materials and Methods – This section is basically a protocol in the past tense. It usually comes before the Results so it allows you to mention the methods-used very fleetingly in the Results section so that your focus there remains on interpreting the significance of your results.
References – References should be ‘precise and to the point’, which in slightly out-of-date English used to be the meaning of the word ‘nice’. References should be nice to the reader in the modern sense too. They should be there because the reader might want to look up what you have said. Journal editors and readers both prefer fewer references. Journal editors do not want to print two pages of titles and authors. Readers do not want sentences clumsily punctuated by numbers or first-author names. So my take-home message is to reduce references to a minimum and try to do without them (terms and conditions apply). Avoid placing references in the middle of sentences. References can be made to appear at sentence ends by splitting what are otherwise probably overly long or complex sentences in two. Avoid having more than 1-3 references in any one place. If you have 6 references to point the reader to at the end of a sentence, there is probably more to say about these references, and it deserves another sentence. A six-reference ‘dump’ at the end of a sentence is akin to over-preliminary data in a Results section. It is an unprocessed note-to-self that needs further work before presenting to the reader (see advice in next paragraph). A trick to avoid citing six references by the same author is to refer to the most recent and add ‘and references therein’. If the recent paper only cites four of the other five, you can reduce your citations to the recent one, with this phrase, and the one that was missed.
Reference software packages such as EndNote are easy to use and essential to use because they are so good. I recommend a third and final thinking point before subjecting your manuscript to scrutiny by peers, freelance editors (like myself) and journals (reviewers and editors). That is the following – Your references are a fingerprint of your paper and you should master them just as you master your Introduction, Results and Discussion. If you go to ‘Output Styles’ in EndNote and choose ‘Annotated’ you can print all the Abstracts of the papers you have cited in one go.
I suggest you then read your sequence of abstracts like a paper, with a marker and pen, for a final distillation of your thoughts. You will find there are References to leave out and things in other Abstracts you should mention in the text and further papers that you should read in more detail right now to complete your visual assessment of your scientific buffet. This may all lead you to look at references within your references that you had not previously considered. Reading your own reference list (at least at the Abstract level) is the third and final reflection point where the writing process feeds in to improve your science, as once noted over 50 years ago.
This concludes the first section of this tutorial on improving scientific writing, which was specific to the main sections of a typical scientific paper. There is more widely applicable and at the same time more detailed advice below.
Step 6 – Flow and order
Having established the gross structure of the paper in the above sections, as the reader expects to find them, it is time to start fine-tuning the sentences while maintaining the view from 10 km above. Before perfecting each sentence in isolation, then, it is important to think about how your sentences join together and flow into one another.
At the gross level, your Introduction has introduced the following in the correct order: the field, the problem/knowledge gap, proposed hypothesis and method of attack, results and significance. It is now important to check that there are transitions at the micro or intra-sentence level. These transitions are important to keep the reader permanently ‘in the know’ as to what is being talked about and so they never have to finish a sentence or paragraph just to start to understand it. The reader thus never needs to waste time re-reading.
The most important rule for the reader’s bearings is to reiterate old before new so that sentences link to each other and flow together directionally in time from start to finish. For example my following sentences are admittedly nonsense but due to the placement of old before new they form readable nonsense.
‘The field of hydrodynamics is in flux and it suffers from a lot of blockages that lead to inappropriate pressure. In this study we proposed to ease that pressure with fluidics. Our results show that fluidics ease pressure and release all blockages’. This is a satisfying string that resolves the ideas it introduces while keeping the reader on the page, i.e. it flows! This is because each new concept is introduced in the context of old concepts.
When next considering sentences in isolation it is also paramount to consider the reader experience. In your lab book you likely have phrases like ‘so x causes y, eureka’, inspired by the exciting long-sought discovery of x as the answer to your long-internalized problem y. However no editor wants to publish your lab book and this needs ‘translating’ for the reader into reverse order, i.e. ‘We found the phenomenon y is caused by x’. For the researcher the answer from years at the bench solves the problem, but for the (hands-off) reader the cause and effect needs repackaging so that the problem leads ‘magically’ to the answer. The problem needs to be presented as the old and the answer as the new (see above).
Within a sentence this intersects with the concept of topic and stress, where the topic (or subject, grammatically speaking) is at the beginning of the sentence and the stress (or object) is at the end. For example, a lawyer would sum up by saying ‘My client was mistreated by this crook’ rather than ‘This crook mistreated my client’ to point the finger and stress that this crook should go to jail.
The flow and order between and in your sentences then affects their meaning, so the main priority for the writer is to grapple with the syntax to get the words in the sentences in their rightful positions. The lawyer George Gopen likens this process to Feng Shui where the contents of rooms are shuffled to perfection. I like this analogy as applied to punctuation where commas, for example, are like separators that partition different parts of a sentence or room without sealing them off completely. Note that the legal and scientific worlds are also analogous as they are both quests for truth: researchers doing experiments and writing papers as both detective and prosecutor, whilst the editor and reviewers act as judge and jury. The Feng-shui of grammar that George Gopen coined, though, is the switch that we inadvertently used twice in the last paragraph, namely between the active and passive voice.
The Active voice is subject then verb with or without object afterwards e.g. dogs hate cats.
The Passive voice is basically an active voice sentence in reverse order, i.e. with object before verb and subject after, where the verb is some version of ‘to be’ followed by the past participle of another verb, e.g. cats ‘are’ hated by dogs.
When referring to your own work, please do not use the passive voice and refer to yourself in the third person as this sounds insecure. For example do not say ‘x was demonstrated by Blogs et al’, if you could take credit and say ‘we demonstrated x (Blogs et al.)’. The active voice is also best to move the story forward in the Results e.g. – ‘Next we performed three thousand PCRs’.
The passive can be preferable in Methods where who the agent is is irrelevant.
e.g. – ‘3000 PCRs were performed in high throughput’.
Generally the active is desirable as it is lively, shorter and easier to read.
However do not hesitate to use the passive to make sentences flow together with old before new or to give sentences their desired emphasis with topic before stress.
Avoid the ‘ultra-passive’ form of verbs known as Nominalizations, in which the verb is turned into a noun, as this can really deaden your text. In this ultra-passive voice the active ‘Dogs hate cats’ becomes ‘Dogs have a hatred for cats’. This might work if the subject of your paragraph is really hatred in all its forms. However if you are actually really talking about dogs or cats the active is far nicer. I could imagine that the ultra-passive (nominalization) could be applicable in a meta-study of techniques where you are comparing performances of different techniques. Generally, however, nominalizations put too much emphasis on the researchers’ acts and this makes the prose sound snobby. Typical verb nominalizations to avoid in scientific writing include analysis (analyze is better), assessment (assess..), decision (decide), formation (form), inhibition (inhibit), measurement (measure), removal (remove) and suggestion (suggest).
As mentioned above, flow and order are important to save the reader effort and avoid overloading their short-term memory, which leads them to stall and re-read passages. As also mentioned above, a title should be less than ten words so that the visual cortex can process it instantly, and the same goes for the structural elements of a sentence, which should be less than ten words and appear at the start of the sentence. The reader will thus instantaneously understand the import of your sentences and never have to re-read them. For this reason it is important to avoid split predicates, i.e. do not split the verb from its subject. The verb is the life-blood or point of the sentence, so lively prose generally gets to the verb in the first 9 words.
‘Oranges, despite being garishly colored and full of citric acid, make a refreshing drink’
has a split predicate.
It is easier to grasp when scanning text when the verb (make) arrives in the first 9 words before the dependent phrase as in..
Oranges make a refreshing drink, despite being garishly colored and full of citric acid.
Another principle that stacks the sentence structure into the first 10-word unit is to keep short clauses before long. For example, version 2 below, with short before long, is better because in version 1 the reader has to get to the end to understand the structure.
1. ‘these viruses affect some whales, including Blues, Greys, Fin whales, Belugas, Humpbacks, Minkes and Sperm whales and all porpoises’.
2. ‘these viruses affect all porpoises and some whales, including Blues, Greys, Fin whales, Belugas, Humpbacks, Minkes and Sperm whales’.
To summarize the most important points of the above section, I would say that sentences should link to the sentences before and after to maintain flow and that their messages are best delivered in bite-sized 10-word chunks. An ‘ideal sentence’ should be around 20 words to incorporate both the message and the linkages. This brings me to Eric Lichtfouse’s 20+10 rule for sentence length from his short primer on scientific writing, available in English or French.
The 20+10 rule is like a Richter or decibel scale for incomprehensible sentences. It says that for each 10 words above 20, a sentence becomes twice as hard to read. So for a 70-word sentence, which is not uncommon, there is an ‘excess of 50 words’, i.e. 5 extra 10s, or a 2 × 2 × 2 × 2 × 2 = 32-fold increase in difficulty! You want to avoid this, especially if you are not a native English speaker, where a single out-of-place word among dozens can lead to ambiguous relationships that may cause information overload.
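The arithmetic of the 20+10 rule can be written as a one-line formula: difficulty doubles for every 10 words beyond 20. The sketch below is one possible reading of the rule, not a published implementation. The function name `relative_difficulty` is made up, and treating part-steps smoothly (e.g. 25 words as 2 to the power 0.5) is an assumption, since the rule is only stated per whole 10 words.

```python
def relative_difficulty(word_count, base=20, step=10):
    """Reading difficulty relative to a 20-word sentence under the 20+10 rule.

    Assumption: difficulty doubles per `step` words beyond `base`, with
    fractional steps interpolated; sentences at or below `base` score 1.0.
    """
    excess = max(0, word_count - base)
    return 2 ** (excess / step)

# A few sentence lengths, including the 70-word case from the text.
for n in (20, 30, 50, 70):
    print(n, "words:", relative_difficulty(n), "x")
```

For the 70-word sentence the excess is 50 words, i.e. five doublings, which reproduces the 32-fold figure.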
In a typical successful newspaper like the New York Times the average sentence length is just 15 words. Try to keep sentence length down to 22 words on average in your scientific papers to maintain reader avidity and start to split sentences into separate points when they exceed 25 words. An example of such a sentence might be of the form ‘A and b lead to the questions p and q, which is consistent with hypothesis y and not z’. This could be split into the following sentence pair.
‘A and b lead to the questions p and q. This is consistent with hypothesis y and not z’.
The two sentences make for easier reading but are also problematic, it transpires. By splitting into smaller units we realize that the logical arguments are not fully expounded in the original and that the trivial implied word ‘this’ has an unclear antecedent. To resolve this ambiguity our simpler sentences can now add needed specific logic as in the revised version
‘A and b lead to the questions p and q. These inferences/questions are consistent with hypothesis y and not z’.
Now we have a chance to specify if we meant that ‘a and b leading to’ (inferences) or just ‘the resulting p and q’ (questions) are consistent with the best hypothesis. I would go further to argue that splitting sentences can go a long, long way to making text more informative as it uncovers layers of hidden logic. For example a further step would now be to add reasons why hypothesis y is better suited. What would be the case if z were true instead? It all needs spelling out simply, from one who is kept awake at night by these problems to one who has never thought about them before. Educate as much as possible. Experienced experts and novices alike will appreciate it.
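The sentence-length targets mentioned above (an average of about 22 words, and splitting once a sentence exceeds 25 words) can be checked mechanically before the hand-editing starts. The following is a rough sketch only: the function names are made up, and the splitter is deliberately naive (it breaks on any full stop, question mark or exclamation mark followed by a space), so abbreviations such as ‘e.g.’ or ‘et al.’ will be miscounted as sentence ends.

```python
import re

def sentence_lengths(text):
    """Return (sentence, word_count) pairs using a naive punctuation splitter."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(s, len(s.split())) for s in sentences]

def audit(text, split_above=25):
    """Return the average sentence length and the sentences worth splitting."""
    pairs = sentence_lengths(text)
    counts = [n for _, n in pairs]
    average = sum(counts) / len(counts) if counts else 0.0
    too_long = [s for s, n in pairs if n > split_above]
    return average, too_long

sample = (
    "A and b lead to the questions p and q. "
    "These questions are consistent with hypothesis y and not z."
)
average, too_long = audit(sample)
print("average words per sentence:", average)
print("sentences to consider splitting:", too_long)
```

A tool like this only finds the long sentences. Deciding where to split them, and which hidden logic to spell out, remains the writer’s job.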
Indeed, terms that indicate logical relationships between ideas are not just crutches; they all need to be used correctly. Connector words like this, therefore, thus, because, hence, similarly, then, until then, for example, that is, specifically, in addition, also, next, in contrast, despite this, however, although, on the other hand, nevertheless, in conclusion etc. all signal relationships that need to be clear to the reader.
A serious tool for the committed obfuscator is to condense complex logical relationships into strings of adjectives and nouns acting as adjectives. These ‘noun clusters’ or ‘noun strings’ are a form of abbreviation or jargon with inherent ambiguities. For example the worst one I could find, from a book by Janice and Robert Matthews entitled Successful Scientific Writing, was ‘two week old single comb white leghorn specific pathogen free chickens’. There is maybe a serious ambiguity here about what is ‘specific’ to what, but the main point is that noun strings will force readers to re-read the list. So avoid even three-word noun clusters as they are likely to be ambiguous and probably act as speed bumps.
Commonly noun clusters go to 5 words like ‘molecular biology quality absolute ethanol’. Now it depends how familiar you are with the field whether you understand these beasts, so they should be spelled out the first time, i.e. in reverse order as in ‘ethanol that is absolutely pure and of sufficient quality for molecular biology (or even biology of the molecular sub-type)’. Note that French is practically the same language as English except everything is done in the opposite order, and the French have it right with their noun clusters and would say éthanol absolu de qualité biologie moléculaire, which is much better as you know from the start that the subject is ‘ethanol’.
Abbreviations and brackets
Avoid them to smooth the reader experience. Experienced journal editor Eric Lichtfouse asks for a maximum of 3 abbreviations per paper, excluding completely well-known things like DNA. Spell out all abbreviations that appear fewer than 10 times in a paper. Avoid using an abbreviation 5 times in a paragraph, etc.
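The abbreviation limits above (at most 3 per paper, and spell out anything used fewer than 10 times) lend themselves to a quick count. The sketch below is a heuristic under stated assumptions: the function name is made up, an ‘abbreviation’ is taken to be any run of two or more capital letters (optionally followed by digits or a plural ‘s’), and the list of well-known exceptions defaults to DNA only. Capitalized headings or Roman numerals would be counted as false positives.

```python
import re
from collections import Counter

def abbreviation_report(text, well_known=("DNA",), min_uses=10, max_distinct=3):
    """Count likely abbreviations and flag those worth spelling out."""
    # Heuristic: 2+ capitals, optional digits, optional plural 's' (PCRs -> PCR).
    found = re.findall(r"\b([A-Z]{2,}[0-9]*)s?\b", text)
    counts = Counter(a for a in found if a not in well_known)
    rarely_used = sorted(a for a, n in counts.items() if n < min_uses)
    return {
        "counts": dict(counts),
        "spell_out": rarely_used,                  # used fewer than min_uses times
        "too_many": len(counts) > max_distinct,    # more than max_distinct in total
    }

# Made-up sample text, for illustration only.
sample = (
    "We extracted DNA and ran PCRs. The PCR products were checked by SDS gel. "
    "ROS and ATP were measured, then GFP was imaged."
)
report = abbreviation_report(sample)
print(report)
```

In a full manuscript the same count could be run per paragraph to catch an abbreviation repeated many times in one place.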
If I can keep condensing my above message to a shorter and shorter take-home message then it would be ‘avoid all forms of possible confusion and your paper will be accepted’. So one more form of obfuscation is bulgers, where a simple word like ‘if’ is replaced by the needless ‘in the event of’ or where ‘most’ becomes ‘the majority of’. Another space-taker is over-hedging. Scientists get it if you are hedging, so do not overdo it with phrases like ‘suggests it may be’. If ‘suggests’ and ‘may be’ both mean a 50% chance, does ‘suggests it may be’ mean a 25% chance? If so it is better to choose a ‘25% word’ like ‘possibly’. Why use two words when one will do?
Possible exceptions to the above simplification rules are where the scientific paper becomes too casual. It is a fine line, but the words show, find, did, tried, got and made can be correctly replaced with the more formal demonstrate, discover, perform, test, obtain and construct.
A final form of casualness that I find imprecise is anthropomorphism (humanizing) of lab objects like proteins, cells and tissues that elevates them to the level of actors in plays. For example why say ‘tumors are associated with poor survival’ like ‘Billy associates with bad company’ when you can say the more precise and impersonal ‘tumors correlate with poor survival’. Like much in this tutorial it is my subjective view but to me this kind of fluff adds false liveliness that obscures the buzz you get from understanding the wonders of nature through simple clear prose. I notice this tendency even in British and American writers to replace the auxiliary verbs to have and to be with more ‘exciting’ verbs and thus displace the real biological action verb to an anthropomorphic action verb that trivializes the text.
For example ‘the cells show/display/exhibit an aggressive phenotype’ could be more biological using just the auxiliary verb as in ‘the cells have an aggressive phenotype’ or ‘the cells are aggressive’. Having made this simplification you can further dehumanize your cells and say simply ‘the cells are invasive’. Having removed the ‘exhibit an aggressive phenotype’ anthropomorphism you could start adding real biological information by putting the action in the verb and adding a real object such as in ‘the cells invade both Matrigel and stroma’. Note that de-embellishing sentences of theatrical words allows you to think more clearly about what you mean to say and to add real information in the same way as splitting sentences into their core messages.
Here are some further examples of anthropomorphism.
‘Protein degradation represents an important process’
vs. ‘Protein degradation is important’?
‘The intron/plasmid harbors/contains/possesses weak splice sites/a LacZ gene’.
or ‘The plasmid has a LacZ gene’?
Finally, if this wave of advice about clear writing is a bit overwhelming, help is at hand from human and machine editors.
Try this fun site to tighten up your writing
You can paste in any text and start editing it online until all the creases are ironed out and all the colors disappear from the text. It’s free and fun – try it!
And the take-home message is this.
Writing is constructive and as such it is actually fun.
If you want to know more about scientific writing I have two more books to recommend.
This book was written 25 years ago but it reads fresh today – the classic reference.
and this book was published last year and is pretty comprehensive –