Sample Size in Logistic Regression: A Simple Binary Approach


This article will guide you through calculating the sample size for a simple binary logistic regression. We will use G*Power, a popular and freely available program that is among the most widely used for this purpose. We wrote this article after noticing that many online tutorials for G*Power-based sample size calculations are inaccurate.

Selecting The Logistic Regression Analysis

After downloading and installing G*Power, open it and select the sample size calculation for logistic regression by choosing Tests: Correlation and regression: Logistic regression from the menu.

GPower Logistic Regression Sample Size Calculation 01

One-tailed or Two-tailed?

Next, enter the following parameters. In Tail(s), select One for a one-tailed test or Two for a two-tailed test. A one-tailed test is appropriate for a directional alternative hypothesis, such as “a higher value of X corresponds to a higher probability of the event occurring.” A two-tailed test is suitable for a non-directional alternative hypothesis, such as “X influences event Y,” with no direction specified in advance. Base your hypothesis on existing knowledge in your field. If unsure, choose Two (two-tailed).


Significance Level

The significance level (α) represents the probability of rejecting the null hypothesis when it is true, leading to a type I error. Typically, α is set at 0.05 or 0.01. An α of 0.05, for instance, indicates a 5% risk of concluding a significant relationship exists when none actually does.


Statistical Power

Statistical power (1 – β) is the probability of rejecting the null hypothesis when it is false, where β is the probability of a type II error. Commonly used values range from 0.80 to 0.99. Higher power is preferable, but it also increases the required sample size.
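To illustrate how the significance level, the number of tails, and the power together drive sample size, the Python sketch below uses a common normal-approximation rule of thumb in which the required sample size is proportional to the squared sum of two standard normal quantiles. This is only an illustration of the trade-off, not G*Power's own computation, and the function name `relative_sample_size` is hypothetical.

```python
from statistics import NormalDist

def relative_sample_size(alpha, power, tails=2):
    """Return (z_alpha + z_power)**2, the factor that scales n in
    normal-approximation sample size formulas (illustrative only)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / tails)  # critical value of the test
    z_power = z.inv_cdf(power)              # quantile matching the target power
    return (z_alpha + z_power) ** 2

# Compare a few settings against a two-tailed test with alpha 0.05 and power 0.80.
baseline = relative_sample_size(alpha=0.05, power=0.80, tails=2)
for alpha, power, tails in [(0.05, 0.80, 2), (0.05, 0.80, 1),
                            (0.05, 0.95, 2), (0.01, 0.80, 2)]:
    ratio = relative_sample_size(alpha, power, tails) / baseline
    print(f"alpha={alpha}, power={power}, tails={tails}: "
          f"{ratio:.2f} x the baseline sample size")
```

Under this approximation, a one-tailed test needs fewer observations than a two-tailed test at the same α, while raising the power or lowering α increases the requirement.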


R² of Other Explanatory Variables (X)

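This parameter is the proportion of variance in X that is explained by the other predictors in the model. Since a simple binary logistic regression has only one independent variable, set this value to zero.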


Variable X Distribution

Select the distribution type for variable X, the independent (predictor) variable in the model. Choose Binomial for binary variables, Normal for continuous quantitative variables, and Poisson for count variables. Use the other available distributions only if necessary.


Pilot Study?

The last four parameters require population estimates from a pilot study, similar research, or theoretical calculations. If possible, use data from a pilot study, as we will demonstrate here.

Estimated Mean and Standard Deviation

Enter variable X’s mean and standard deviation from the pilot study data for the Population Mean of Variable X and Population Standard Deviation of Variable X parameters, respectively.
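As a minimal sketch of this step, the Python below computes the mean and sample standard deviation of X using only the standard library. The values in `pilot_x` are hypothetical and stand in for your own pilot measurements.

```python
import statistics

# Hypothetical pilot measurements of the predictor X (not from this article).
pilot_x = [4.1, 5.3, 6.0, 4.8, 5.5, 6.2, 5.1, 4.9, 5.7, 5.4]

mean_x = statistics.mean(pilot_x)
sd_x = statistics.stdev(pilot_x)  # sample standard deviation (n - 1 denominator)

print(f"Mean of X: {mean_x:.3f}")
print(f"SD of X:   {sd_x:.3f}")
```

The two printed values are the ones to type into the mean and standard deviation fields in G*Power.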


Preliminary Analysis

Obtain the final two parameters by running a preliminary simple logistic regression on the pilot study data. For our demonstration, we will use PSPP, which is free and easy to use, but any software capable of running logistic regression will work.
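If you would rather not use PSPP, the sketch below shows one way to fit a simple logistic regression in plain Python by Newton-Raphson. The function `fit_simple_logistic` and the pilot data are hypothetical illustrations, not the data behind this article's example, and a dedicated statistics package remains the safer choice for real analyses.

```python
import math

def fit_simple_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate the gradient (g0, g1) of the log-likelihood and the
        # entries (h00, h01, h11) of the information matrix X'WX.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system H * delta = g.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical pilot data: predictor X and binary outcome y.
pilot_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pilot_y = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

b0, b1 = fit_simple_logistic(pilot_x, pilot_y)
print(f"Intercept B0 = {b0:.3f}")
print(f"Slope B1 = {b1:.3f}, Exp(B1) = {math.exp(b1):.3f}")
```

The intercept and Exp(B1) printed here correspond to the constant and the Exp(B) column that PSPP reports for the same model.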

Odds Ratio

The odds ratio quantifies the association between the predictor and the outcome and serves as the effect size: it is the factor by which the odds of the event change for a one-unit increase in X. Run a preliminary analysis on the pilot data in PSPP, read the odds ratio from the Exp(B) value for variable X (1.48 in this example), and enter it in G*Power.
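If your software reports only the coefficient B rather than Exp(B), the conversion is a single exponentiation. The short Python sketch below goes in both directions using the 1.48 from this example; the variable names are illustrative.

```python
import math

odds_ratio = 1.48                      # Exp(B) for X in this article's example
b_coefficient = math.log(odds_ratio)   # matching coefficient on the log-odds scale

print(f"B = {b_coefficient:.3f}")
print(f"Exp(B) = {math.exp(b_coefficient):.2f}")
```

An odds ratio of 1.48 therefore corresponds to a coefficient of roughly 0.39 on the log-odds scale.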

Probability of y = 1 under H0

The final parameter is the probability of occurrence of the dependent variable (y = 1) when the null hypothesis (H0) is true, i.e., when the coefficient of the independent variable (X) equals 0, and the model only contains the intercept. To calculate this value, enter the constant (intercept) estimate B from the previous step into the following Excel formula: =EXP(B)/(1+EXP(B))

For our example: =EXP(-1.85)/(1+EXP(-1.85)) =0.1355
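The same conversion can be done in Python; the sketch below applies the inverse logit to the intercept. The function name `probability_from_intercept` is hypothetical. Note that with the intercept rounded to -1.85 the result comes out at about 0.136, so the 0.1355 shown above presumably reflects an intercept carried to more decimal places (an assumption on our part).

```python
import math

def probability_from_intercept(b0):
    """Inverse logit: convert a log-odds intercept into a probability."""
    return math.exp(b0) / (1 + math.exp(b0))

intercept = -1.85  # rounded intercept from this article's example
p_h0 = probability_from_intercept(intercept)
print(f"P(y = 1 | H0) = {p_h0:.4f}")
```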

Conclusion

With all parameters entered in G*Power, click Calculate to obtain the required sample size. In our example, detecting the estimated odds ratio requires 97 individuals randomly sampled from the target population.

By following these steps, you can calculate an appropriate sample size for a simple binary logistic regression analysis, which helps you plan your study design, control the risk of type I and type II errors, and improve the validity of your findings. Understanding how each parameter affects the sample size also deepens your grasp of logistic regression as a whole.
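As a rough cross-check on the G*Power result, the sketch below implements the large-sample approximation of Hsieh, Bloch and Larsen (1998) for a single continuous predictor. It is not necessarily the procedure G*Power applies with its default options, so the two numbers need not agree. The function name `hsieh_sample_size` is hypothetical, and the standard deviation of X used in the example call is an assumed value, since this article does not state it.

```python
import math
from statistics import NormalDist

def hsieh_sample_size(p1, odds_ratio, sd_x, alpha=0.05, power=0.80,
                      tails=2, r2_other=0.0):
    """Approximate n for logistic regression with a continuous predictor X.

    p1         : event probability at the mean of X
    odds_ratio : odds ratio per one-unit increase in X
    sd_x       : standard deviation of X
    r2_other   : R² of X on the other predictors (0 for a single predictor)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / tails)
    z_power = z.inv_cdf(power)
    beta_star = math.log(odds_ratio) * sd_x  # log odds ratio per 1 SD of X
    n = (z_alpha + z_power) ** 2 / (p1 * (1 - p1) * beta_star ** 2)
    # Inflate for correlation with other predictors, then round up.
    return math.ceil(n / (1 - r2_other))

# Odds ratio and probability from this article; sd_x = 1.0 is an assumption.
n_required = hsieh_sample_size(p1=0.1355, odds_ratio=1.48, sd_x=1.0)
print(f"Approximate required sample size: {n_required}")
```

Because the required sample size shrinks with the square of the standardized effect, the result is very sensitive to the standard deviation of X, which is one reason to estimate it carefully from pilot data.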
