5B Coding for Experiments

Estimated time: ~10 hours | Topics: Power, Surveys, Randomization | Level: Intermediate-Advanced

Learning Objectives

Calculate statistical power and determine required sample sizes
Program surveys and integrate with platforms like Qualtrics
Implement randomization with proper stratification
Distribute experiments via Prolific and other platforms
Verify randomization and check balance of covariates

Running experiments requires careful planning and precise implementation. This module covers the programming side of experimental research: determining how many participants you need, building your survey, randomizing treatments, and verifying that everything worked correctly.

Essential Reference

This module draws heavily on:
Duflo, E., Glennerster, R., & Kremer, M. (2007). "Using Randomization in Development Economics Research: A Toolkit." Handbook of Development Economics, Vol. 4.
Available at: J-PAL Resources

Module Overview

This module is organized into four subpages, each covering a critical aspect of experimental implementation:

5B.1 Power Analysis

How many participants do you need? Learn to calculate statistical power, determine minimum detectable effects, and plan sample sizes for simple and complex designs.

Formulas, simulation, Stata/R/Python tools

5B.2 Survey Programming

Build surveys programmatically, integrate with Qualtrics, manage URL parameters for treatment assignment, and connect with panel platforms like Prolific.

Qualtrics, Prolific, URL parameters
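A common way to connect randomization code to a survey platform is to pass each participant's assigned arm through the survey link. The sketch below uses Python's standard-library urllib.parse to build and then parse such a link. The base URL and the parameter names `arm` and `pid` are hypothetical placeholders. In Qualtrics, a query-string value is typically captured by declaring an Embedded Data field with the same name in the Survey Flow.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical survey link; substitute your own survey's URL.
BASE_URL = "https://example.qualtrics.com/jfe/form/SV_XXXXXXXX"


def build_survey_url(base_url: str, params: dict) -> str:
    """Append query-string parameters (e.g. treatment arm, participant ID) to a link."""
    return f"{base_url}?{urlencode(params)}"


def read_params(url: str) -> dict:
    """Parse the query string back into a dict, as the survey platform would."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()}


# Hypothetical participant P001, assigned to the treatment arm.
url = build_survey_url(BASE_URL, {"arm": "treatment", "pid": "P001"})
print(url)               # https://example.qualtrics.com/jfe/form/SV_XXXXXXXX?arm=treatment&pid=P001
print(read_params(url))  # {'arm': 'treatment', 'pid': 'P001'}
```

urlencode percent-encodes special characters. Platform placeholders that must appear literally in a link, such as the ones Prolific replaces with participant IDs, should therefore be added to the URL by hand rather than passed through this function.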

5B.3 Randomization

Implement proper randomization: simple random assignment, stratified randomization, block randomization, and cluster randomization. Verify the integrity of each assignment.

randtreat, randomizr, stratification

5B.4 Balance Checks

Verify that randomization worked: create balance tables, test for systematic differences, handle attrition, and produce publication-ready tables.

iebaltab, cobalt, attrition analysis
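Differential attrition, where participants drop out at different rates in treatment and control, is one of the most common threats to an otherwise valid randomization. A minimal sketch, assuming a large-sample two-proportion z-test (one of several reasonable tests), compares attrition rates between arms using only the Python standard library. The function name and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def differential_attrition(n_treat, lost_treat, n_ctrl, lost_ctrl):
    """Two-proportion z-test: do attrition rates differ between arms?

    Returns (treatment attrition rate, control attrition rate, z, two-sided p).
    Uses a large-sample normal approximation.
    """
    rate_t = lost_treat / n_treat
    rate_c = lost_ctrl / n_ctrl
    # Pooled attrition rate under the null hypothesis of equal rates.
    pooled = (lost_treat + lost_ctrl) / (n_treat + n_ctrl)
    se = sqrt(pooled * (1 - pooled) * (1 / n_treat + 1 / n_ctrl))
    if se == 0:
        raise ValueError("No variation in attrition (all or none lost); test undefined.")
    z = (rate_t - rate_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_t, rate_c, z, p_value


# Hypothetical counts: 500 randomized per arm, 50 vs. 30 lost to follow-up.
rate_t, rate_c, z, p_value = differential_attrition(500, 50, 500, 30)
print(f"attrition {rate_t:.2f} vs {rate_c:.2f}, z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value here suggests that the participants who remain may no longer be comparable across arms, even if baseline balance looked fine.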

The Experimental Pipeline

A well-executed experiment follows this workflow:

Power Analysis (Pre-registration)
Before data collection: Determine how many participants you need to detect your effect of interest. Write a pre-analysis plan.
Survey/Instrument Design
Before launch: Program your survey, set up treatment arms, test thoroughly with pilot participants.
Randomization
At recruitment: Randomly assign participants to treatment and control groups, potentially stratifying on key variables.
Data Collection
During experiment: Monitor response rates, check for technical issues, manage panel recruitment.
Balance Verification
After collection: Confirm that randomization produced comparable groups before analyzing outcomes.
Analysis
Final stage: Estimate treatment effects following your pre-analysis plan. See Module 6.

Key Concepts Preview

Statistical Power

Power is the probability of detecting an effect when it truly exists. The standard target is 80% power at a 5% significance level. The minimum detectable effect (MDE) depends on:

Sample size (N): More participants = smaller detectable effects
Outcome variance (σ²): More noise = larger detectable effects for a given N
Treatment allocation (P): A 50-50 split is optimal for simple designs (when costs and outcome variances are equal across arms)
Significance level (α): Usually 0.05
Power (κ): Usually 0.80
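Because power is a probability, it can be estimated directly by simulation: generate many hypothetical experiments with a true effect, test each one, and count how often the null hypothesis is rejected. The sketch below assumes normally distributed outcomes, equal arm sizes, and a large-sample two-sided z-test. The function names, default values, and seed are illustrative choices, not a standard tool.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def sample_var(xs, mean):
    """Unbiased sample variance."""
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def simulated_power(n_per_arm, effect, sigma=1.0, alpha=0.05, reps=2000, seed=42):
    """Share of simulated experiments in which a two-sided test detects a true effect."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, sigma) for _ in range(n_per_arm)]
        m_t, m_c = fmean(treated), fmean(control)
        # Standard error of the difference in means (equal arm sizes).
        se = sqrt((sample_var(treated, m_t) + sample_var(control, m_c)) / n_per_arm)
        if abs(m_t - m_c) / se > crit:
            rejections += 1
    return rejections / reps


# A 0.5 SD effect with 63 participants per arm (N = 126).
power = simulated_power(n_per_arm=63, effect=0.5)
print(f"Estimated power: {power:.2f}")
```

With these settings the estimate should land near 0.80; a larger n_per_arm or a larger effect raises it. Simulation is most useful when a design (clustering, unequal arms, non-normal outcomes) has no simple closed-form power formula.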

Power Formula (Simple RCT)

$$\text{MDE} = \left(t_{1-\kappa} + t_{\alpha/2}\right)\sqrt{\frac{\sigma^2}{P(1-P)\,N}}$$

Where:
t_{1-κ} ≈ 0.84 for 80% power (κ = 0.80)
t_{α/2} ≈ 1.96 for 5% significance (two-sided)
P = proportion treated (often 0.5)
σ² = outcome variance
N = total sample size
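The formula translates directly into code. This sketch uses standard-normal quantiles from Python's statistics.NormalDist in place of t critical values (a large-sample approximation), and it also solves the formula for the total N needed to detect a target effect. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def critical_values(power=0.80, alpha=0.05):
    """Return (t_{1-kappa}, t_{alpha/2}) using standard-normal quantiles."""
    quantile = NormalDist().inv_cdf
    return quantile(power), quantile(1 - alpha / 2)  # about 0.84 and 1.96


def mde(n, sigma2, p=0.5, power=0.80, alpha=0.05):
    """Minimum detectable effect for total sample size n and treated share p."""
    t_power, t_alpha = critical_values(power, alpha)
    return (t_power + t_alpha) * sqrt(sigma2 / (p * (1 - p) * n))


def required_n(effect, sigma2, p=0.5, power=0.80, alpha=0.05):
    """Smallest total N whose MDE is at or below `effect` (formula solved for N)."""
    t_power, t_alpha = critical_values(power, alpha)
    return ceil((t_power + t_alpha) ** 2 * sigma2 / (p * (1 - p) * effect ** 2))


# Standardized outcome (sigma2 = 1), so effects are in standard-deviation units.
print(round(mde(1000, 1.0), 3))         # 0.177: N = 1000, 50-50 split
print(round(mde(1000, 1.0, p=0.3), 3))  # 0.193: same N, 30% treated
print(required_n(0.2, 1.0))             # 785: total N to detect 0.2 SD
```

The comparison of p = 0.5 and p = 0.3 shows why a 50-50 split minimizes the MDE: the term P(1-P) in the denominator is largest at P = 0.5.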

Randomization Methods

Method	Description	Use When
Simple	Coin flip for each unit	Large samples, no important stratifying variables
Stratified	Randomize within subgroups	Want balance on key variables (gender, region)
Block	Assign a fixed number of units to each arm within each block	Want exact treatment proportions in each stratum
Cluster	Randomize groups, not individuals	Treatment at group level (schools, villages)
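The four methods can be sketched in plain Python. This is a minimal illustration, not a replacement for randtreat or randomizr. It assumes units have sortable IDs, uses a fixed seed so the assignment is reproducible, and implements stratified and block randomization the same way: complete randomization, with a fixed number treated, within each stratum. All function names and the example sample are hypothetical.

```python
import random
from collections import defaultdict


def simple_assign(ids, p=0.5, seed=2024):
    """Simple randomization: an independent coin flip (probability p) per unit."""
    rng = random.Random(seed)
    return {i: int(rng.random() < p) for i in sorted(ids)}


def complete_assign(ids, rng, share=0.5):
    """Treat exactly round(share * n) units, chosen at random."""
    ids = sorted(ids)  # fixed starting order so a given seed always gives the same draw
    rng.shuffle(ids)
    # With odd-sized groups, round() decides the leftover unit deterministically;
    # dedicated tools such as randtreat offer options for these "misfits".
    n_treat = round(share * len(ids))
    return {i: int(k < n_treat) for k, i in enumerate(ids)}


def stratified_assign(stratum_of, share=0.5, seed=2024):
    """Stratified/block randomization: complete randomization within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit, stratum in stratum_of.items():
        strata[stratum].append(unit)
    assignment = {}
    for stratum in sorted(strata):
        assignment.update(complete_assign(strata[stratum], rng, share))
    return assignment


def cluster_assign(cluster_of, share=0.5, seed=2024):
    """Cluster randomization: assign whole clusters, then copy each arm to members."""
    rng = random.Random(seed)
    cluster_arm = complete_assign(set(cluster_of.values()), rng, share)
    return {unit: cluster_arm[c] for unit, c in cluster_of.items()}


# Hypothetical sample: 20 participants, stratified by gender, in 4 schools of 5.
units = [f"u{i:02d}" for i in range(20)]
gender = {u: ("F" if i % 2 == 0 else "M") for i, u in enumerate(units)}
school = {u: f"s{i // 5}" for i, u in enumerate(units)}

simple = simple_assign(units)
stratified = stratified_assign(gender)
clustered = cluster_assign(school)
```

Here each gender stratum gets exactly 5 treated participants, and exactly 2 of the 4 schools are treated, with every student in a school sharing its arm. Simple randomization guarantees neither, which is why it suits only large samples.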

Balance Tables

A balance table compares treatment and control groups on baseline characteristics. It typically shows:

Mean (or proportion) in each group
Difference between groups
P-value from a test of whether the difference is zero
Standardized difference (effect size)
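For a single covariate, the four quantities can be computed as sketched below. This uses only the standard library and a normal approximation for the p-value. With small samples, a Welch t-test (for example scipy.stats.ttest_ind with equal_var=False) is more appropriate. The standardized difference divides by sqrt((s_t² + s_c²)/2), which is one common convention among several. The function name and the example ages are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def balance_row(treated, control):
    """One row of a balance table for a single baseline covariate."""
    mean_t, mean_c = fmean(treated), fmean(control)
    sd_t, sd_c = stdev(treated), stdev(control)
    diff = mean_t - mean_c
    # Unequal-variance standard error; normal approximation for the p-value.
    se = sqrt(sd_t ** 2 / len(treated) + sd_c ** 2 / len(control))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    # Standardized difference: mean difference in units of the pooled SD.
    std_diff = diff / sqrt((sd_t ** 2 + sd_c ** 2) / 2)
    return {"mean_t": mean_t, "mean_c": mean_c, "diff": diff,
            "p_value": p_value, "std_diff": std_diff}


# Hypothetical baseline ages in each arm.
row = balance_row([30, 32, 34, 36, 38], [29, 31, 33, 35, 37])
print({k: round(v, 3) for k, v in row.items()})
```

A full balance table repeats this for every baseline covariate, which is what iebaltab and cobalt automate.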

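Successful randomization should produce no systematic differences, only chance ones. Across many baseline variables, the p-values should be roughly uniformly distributed, so about 5% of them should fall below 0.05. You should not see more significant differences than chance alone predicts.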
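This can be checked with a quick simulation: draw many independent baseline covariates, randomize once, and test each covariate. The sketch below is illustrative only. The sample sizes, seed, and function names are arbitrary assumptions, and it uses a large-sample z-test.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def p_value_diff(xs, ys):
    """Two-sided large-sample p-value for a difference in means."""
    mx, my = fmean(xs), fmean(ys)
    vx = sum((x - mx) ** 2 for x in xs) / (len(xs) - 1)
    vy = sum((y - my) ** 2 for y in ys) / (len(ys) - 1)
    z = (mx - my) / sqrt(vx / len(xs) + vy / len(ys))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def chance_imbalance(n_units=400, n_covariates=200, alpha=0.05, seed=7):
    """Randomize once, test every covariate; return p-values and share below alpha."""
    rng = random.Random(seed)
    arms = [1] * (n_units // 2) + [0] * (n_units - n_units // 2)
    rng.shuffle(arms)  # complete randomization: exactly half treated
    p_values = []
    for _ in range(n_covariates):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        treated = [v for v, a in zip(x, arms) if a == 1]
        control = [v for v, a in zip(x, arms) if a == 0]
        p_values.append(p_value_diff(treated, control))
    share = sum(p < alpha for p in p_values) / n_covariates
    return p_values, share


p_values, share = chance_imbalance()
print(f"Share significant at 5%: {share:.3f}")  # should be close to 0.05
```

Re-running with different seeds shows the share fluctuating around 5%. In a real balance table with 20 covariates, one significant difference is therefore exactly what chance alone predicts (20 × 0.05 = 1).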

Essential Tools

Stata: randtreat (randomization), iebaltab (balance tables)
R: randomizr (randomization), cobalt (balance assessment)
Python: the standard-library random module or numpy.random (randomization), custom implementations
Survey platforms: Qualtrics, SurveyMonkey, Google Forms
Panels: Prolific, MTurk, CloudResearch