Poisson regression

Pamela Warner

doi:10.1136/jfprhc-2015-101262

Article Text

PDF

Noteworthy statistics

Poisson regression

Free

Pamela Warner

Reader in Medical Statistics, Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK

Correspondence to Dr Pamela Warner, Centre for Population Health Sciences, University of Edinburgh, Medical School, Teviot Place, Edinburgh EH8 9AG, UK; p.warner{at}ed.ac.uk

https://doi.org/10.1136/jfprhc-2015-101262

Statistics from Altmetric.com

Request Permissions

If you wish to reuse any or all of this article please use the link below which will take you to the Copyright Clearance Center’s RightsLink service. You will be able to get a quick price and instant permission to reuse the content in many different ways.

statistics

Background

The Rashed et al.1 article in this issue uses Poisson regression as a statistical tool. These notes are intended to provide some supplementary explanation of this method (see Box 1 for a glossary of terms used in this article).

Box 1

Glossary of statistical terms used in this article*

Confidence interval (95%)	This defines a range of values within which we are 95% confident the true population value lies (e.g. for an estimated population mean, rate, difference in proportions).
Cross-classified/Cross-tabulation	Subdividing individuals studied in terms of two (or more) categorical variables, jointly. If two variables then an R×C table of counts is produced, where R=number of rows and C=number of columns. Each cell of the table is termed a frequency count of individuals with that combination of row and column values.
Estimation (inference)	This involves use of a summary value calculated from the sample data (e.g. mean, rate) to make an inference about the true value for that entity (parameter) in the background population from which the sample was drawn. The single parameter is a ‘point estimate’, but it is good practice to provide also an interval estimate. See Confidence interval.
Explanatory variable	A feature potentially associated with outcome (e.g. calendar year).
Frequency count	See Cross-classified.
Hypothesis test/Significance test	The process of testing aims to enable a binary decision to be made about the null hypothesis (NH) – reject NH or not. This decision is based on the significance probability (p value) obtained via the test.
Independent (events)	The occurrence of any one event does not influence the occurrence of other events.
Multivariate modelling	A method of analysis that allows assessment of the association between an outcome and some explanatory variable(s) of interest (or here, event rate). The analysis is comparing rates between risk factor subgroups, and these comparisons are expressed as rate ratios.
Null hypothesis (NH)	A statement, prior to testing, of no effect (e.g. ‘no association’ between row and column classifications). See also Significance probability.
Odds (of an outcome)	The number of occurrences of that outcome, divided by non-occurrences (i.e. cases÷non-cases).
Poisson distribution	A discrete probability distribution for the number of events that occur.
p value	See Significance probability.
Rate	A rate summarises a quantity in relation to another unit of measurement, usually but not always in relation to time. For example, births per year is a rate per time unit, whereas perinatal mortality expresses neonatal deaths per live births.
Response variable	The outcome variable to be modelled/tested.
Significance probability (p value)	The probability, if the NH is true, of obtaining the observed data or something more ‘extreme’ (i.e. further from the NH). The smaller the p value is, then the less likely this data would be under the NH, and so the greater our doubts that the NH is indeed true.
Valid method of analysis	Used here loosely to mean a method that is suited to the research question and data variable(s) to be analysed, the latter including that the data to be analysed satisfy the data assumptions required for the method, so that results will not be misleading, on that account.

*Definitions of other terms used can be found in previous Noteworthy Statistics articles.3–5

What is Poisson regression?

When the response variable is a count of the number of occurrences of an event (such as a births, or prescriptions for contraception), Poisson regression is commonly used to model the association between that count of events and any potential explanatory variables.2 Poisson regression is also suitable for analysis of rate data, where the rate is the count of events occurring per some unit of observation – for example, a specified time period (see also Warner3). In the Rashed et al. article, what is modelled is rate of prescriptions for contraception within calendar years.1

When/why is it useful?

This multivariate method has a range of uses, but the main one is comparison of rates between circumstances (e.g. historical time, i.e. years 2002 to 2011)1 or between specific subgroup characteristics (e.g. younger and older teenagers).1 It can also be used to control for confounding, and to provide population estimates [with confidence intervals (CIs)].

There are many similarities in approach between logistic regression4 and Poisson regression. The key difference between the methods is that logistic regression focuses on a binary outcome (e.g. being a ‘case’ or not, such as positive test result for chlamydia), and models this odds of being a case, within subgroups cross-classified in terms of potential explanatory variables (e.g. use of condoms, etc.), whereas Poisson regression focuses on how many cases there are (per cross-classification, and perhaps per unit of measurement, e.g. calendar year). Poisson regression is particularly useful for fairly rare events.

What precautions are needed?

In order to be valid, Poisson regression requires that the count variable has a Poisson distribution, and that the occurrence of events is independent. However, many research variables exhibit more dispersion than they should, if truly following a Poisson distribution. The impact of this on the results of analysis would be (slightly) misleading CIs, narrower than they truly should be to provide the stated ‘confidence’ in the estimate. Dispersion therefore needs to be checked as a preliminary to Poisson regression analysis. Other pitfalls with Poisson regression are the same as those encountered with other methods of multivariate modelling, including logistic regression.4 ,5

Example

Figure 1 in the Rashed et al. article shows the estimates (with CIs) that were obtained for rates of prescription for contraception by calendar year for the two teenage subgroups.1 The model has also been used to compare prescription rates between the two teenage subgroups, combined across calendar years. Not surprisingly, given the graph, the null hypothesis of no difference between prescription rates for older and younger teenagers can be rejected (p<0.001, Results paragraph 1). In Figure 1 the increase in prescribing across years appears to have been modest,1 and no p value is reported, so presumably there was no statistically significant difference across calendar years.

Overview

Poisson regression is a useful method for analysis of event rates and comparison across levels of explanatory variables.

References

↵
1. Rashed AN,
2. Hsia Y,
3. Wilton L, et al
. Trends and patterns of hormonal contraception prescribing for adolescents in primary care in the UK. J Fam Plann Reprod Health Care 2015:41:216–222. doi:10.1136/jfprhc-2012-100486
OpenUrl Abstract/FREE Full Text
↵
1. Kirkwood BR,
2. Sterne JAC
. Essential Medical Statistics. Oxford, UK: Blackwell Science, 2003.
↵
1. Warner P
. Rate and rate ratio. J Fam Plann Reprod Health Care 2009;35:111–113. doi:10.1783/147118909787931609
OpenUrl FREE Full Text
↵
1. Warner P
. Testing and quantifying association in binary data. J Fam Plann Reprod Health Care 2009;35:26–27. doi:10.1783/147118909787072414
OpenUrl FREE Full Text
↵
1. Warner P
. Modelling with multiple explanatory variables. J Fam Plann Reprod Health Care 2011;37:32–34. doi:10.1136/jfprhc.2010.0034
OpenUrl FREE Full Text

Footnotes

Competing interests None declared.
Provenance and peer review Commissioned; internally peer reviewed.

Linked Articles

Highlights from this issue
In this issue

British Medical Journal Publishing Group
BMJ Sexual & Reproductive Health 2015; 41 159-160 Published Online First: 22 Jun 2015. doi: 10.1136/jfprhc-2015-101277

[1] ↵
Rashed AN,
Hsia Y,
Wilton L, et al
. Trends and patterns of hormonal contraception prescribing for adolescents in primary care in the UK. J Fam Plann Reprod Health Care 2015:41:216–222. doi:10.1136/jfprhc-2012-100486
OpenUrl Abstract/FREE Full Text

[2] Rashed AN,

[3] Hsia Y,

[4] Wilton L, et al

[5] ↵
Kirkwood BR,
Sterne JAC
. Essential Medical Statistics. Oxford, UK: Blackwell Science, 2003.

[6] Kirkwood BR,

[7] Sterne JAC

[8] ↵
Warner P
. Rate and rate ratio. J Fam Plann Reprod Health Care 2009;35:111–113. doi:10.1783/147118909787931609
OpenUrl FREE Full Text

[9] Warner P

[10] ↵
Warner P
. Testing and quantifying association in binary data. J Fam Plann Reprod Health Care 2009;35:26–27. doi:10.1783/147118909787072414
OpenUrl FREE Full Text

[11] Warner P

[12] ↵
Warner P
. Modelling with multiple explanatory variables. J Fam Plann Reprod Health Care 2011;37:32–34. doi:10.1136/jfprhc.2010.0034
OpenUrl FREE Full Text

[13] Warner P

Log in using your username and password

Main menu

Log in using your username and password

You are here

Statistics from Altmetric.com

Request Permissions

Background

Glossary of statistical terms used in this article*

What is Poisson regression?

When/why is it useful?

What precautions are needed?

Example

Overview

References

Footnotes

Linked Articles

Read the full text or download the PDF:

Log in using your username and password