5B.3 Randomization
Simple Random Assignment
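Each unit has the same probability of assignment to treatment or control. With Bernoulli assignment, each unit's draw is independent of the others; with complete randomization, the number of treated units is fixed in advance, so the draws are not independent.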
# Python: Simple random assignment
import numpy as np
import pandas as pd
np.random.seed(42) # Always set seed for reproducibility!
# Create sample data
df = pd.DataFrame({'id': range(1, 11)})
# Method 1: Bernoulli (coin flip for each)
df['treatment_bernoulli'] = np.random.binomial(1, 0.5, len(df))
# Method 2: Complete randomization (fixed number treated)
n = len(df)
n_treat = n // 2
assignment = np.array([1] * n_treat + [0] * (n - n_treat))
np.random.shuffle(assignment)
df['treatment_complete'] = assignment
print(df)
* Stata: Simple random assignment
set seed 42
* Method 1: Bernoulli
gen treatment = rbinomial(1, 0.5)
* Method 2: Complete randomization (exact proportions)
* Use a new variable name: treatment already exists from Method 1
randtreat, generate(treatment2) setseed(42)
* Multiple treatment arms
randtreat, generate(treatment3) mult(3) setseed(42)
* Creates treatment3 = 0, 1, or 2 with equal probability
# R: Simple random assignment with randomizr
library(randomizr)
set.seed(42)
# Create sample data
df <- data.frame(id = 1:10)
# Simple random assignment
df$treatment_bernoulli <- simple_ra(N = nrow(df))
# Complete random assignment (fixed number treated)
df$treatment_complete <- complete_ra(N = nrow(df), prob = 0.5)
print(df)
   id  treatment_bernoulli  treatment_complete
0   1                    0                   1
1   2                    1                   0
2   3                    0                   1
3   4                    0                   0
4   5                    1                   1
5   6                    1                   0
6   7                    0                   1
7   8                    1                   0
8   9                    1                   0
9  10                    0                   1
. set seed 42
. gen treatment = rbinomial(1, 0.5)
. tab treatment
  treatment |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          5       50.00       50.00
          1 |          5       50.00      100.00
------------+-----------------------------------
      Total |         10      100.00
. randtreat, generate(treatment2) setseed(42)
(using default treatment probabilities: 0.50 0.50)
Treatment assigned:
Treatment 1: 5 obs (50.0%)
Treatment 2: 5 obs (50.0%)
   id treatment_bernoulli treatment_complete
1   1                   0                  1
2   2                   1                  0
3   3                   1                  1
4   4                   0                  0
5   5                   0                  1
6   6                   1                  0
7   7                   0                  1
8   8                   1                  0
9   9                   0                  0
10 10                   1                  1
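The practical difference between the two methods is the number of treated units: Bernoulli assignment lets it vary from draw to draw, while complete randomization fixes it in advance. The sketch below illustrates this by repeating each method many times. It assumes NumPy's `default_rng` generator rather than the legacy `np.random.seed` interface used above, and the helper names `bernoulli_assign` and `complete_assign` are hypothetical.

```python
import numpy as np

def bernoulli_assign(n, rng, p=0.5):
    """Independent coin flip per unit; the treated count is random."""
    return rng.binomial(1, p, size=n)

def complete_assign(n, rng, n_treat=None):
    """Exactly n_treat units treated (default: half, rounded down)."""
    if n_treat is None:
        n_treat = n // 2
    assignment = np.zeros(n, dtype=int)
    assignment[:n_treat] = 1
    return rng.permutation(assignment)

rng = np.random.default_rng(42)
n, n_draws = 10, 1000

# Number of treated units in each of n_draws repeated assignments
bern_counts = np.array([bernoulli_assign(n, rng).sum() for _ in range(n_draws)])
comp_counts = np.array([complete_assign(n, rng).sum() for _ in range(n_draws)])

print("Bernoulli treated counts range:", bern_counts.min(), "to", bern_counts.max())
print("Complete treated counts range: ", comp_counts.min(), "to", comp_counts.max())
```

In small samples the Bernoulli counts can stray far from half, which is one reason complete randomization is often preferred when the sample size is known before assignment.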
Stratified Randomization
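Randomize separately within subgroups (strata) defined by baseline covariates. This guarantees that treatment and control are balanced on the stratifying variables, up to rounding in strata whose size does not divide evenly.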
# Python: Stratified randomization
import numpy as np
import pandas as pd
np.random.seed(42)
def stratified_randomize(df, strata_cols, prob_treat=0.5):
    """Randomize within strata defined by strata_cols."""
    df = df.copy()
    df['treatment'] = np.nan
    for _, group in df.groupby(strata_cols):
        n = len(group)
        # Treated count for this stratum. When n * prob_treat is not an integer,
        # round up with probability equal to the fractional part, so that each
        # unit's chance of treatment stays at prob_treat instead of always
        # being rounded down.
        n_treat = int(np.floor(n * prob_treat))
        if np.random.random() < n * prob_treat - n_treat:
            n_treat += 1
        assignment = [1] * n_treat + [0] * (n - n_treat)
        np.random.shuffle(assignment)
        df.loc[group.index, 'treatment'] = assignment
    return df
# Create sample data
df = pd.DataFrame({
    'id': range(1, 13),
    'gender': ['M', 'M', 'M', 'M', 'F', 'F', 'F', 'F', 'M', 'M', 'F', 'F'],
    'age_group': ['young', 'young', 'old', 'old', 'young', 'young', 'old', 'old', 'young', 'old', 'young', 'old']
})
# Stratify by gender and age group
df = stratified_randomize(df, strata_cols=['gender', 'age_group'])
print(df)
# Share treated within each stratum
print("\nBalance check:")
print(df.groupby(['gender', 'age_group'])['treatment'].mean())
* Stata: Stratified randomization with randtreat
set seed 42
* Stratify by gender and region
randtreat, generate(treatment) strata(gender region) setseed(42)
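* With unequal treatment probabilities (three arms: 1/4, 1/4, 1/2)
* unequal() takes fractions that sum to 1; a new variable name avoids a clash
randtreat, generate(treatment_unequal) strata(gender) ///
    misfits(global) setseed(42) mult(3) unequal(1/4 1/4 1/2)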
# R: Stratified randomization with randomizr
library(randomizr)
set.seed(42)
# Create sample data
df <- data.frame(
  id = 1:12,
  gender = c('M','M','M','M','F','F','F','F','M','M','F','F'),
  region = c('N','N','S','S','N','N','S','S','N','S','N','S')
)
# Block random assignment (stratified)
df$treatment <- block_ra(
  blocks = df$gender,
  prob = 0.5
)
print(df)
print(table(df$gender, df$treatment))
id gender age_group treatment
0 1 M young 1.0
1 2 M young 0.0
2 3 M old 0.0
3 4 M old 1.0
4 5 F young 1.0
5 6 F young 0.0
6 7 F old 0.0
7 8 F old 1.0
8 9 M young 0.0
9 10 M old 1.0
10 11 F young 1.0
11 12 F old 0.0
Balance check:
gender age_group
F old 0.333333
young 0.666667
M old 0.666667
young 0.333333
Name: treatment, dtype: float64
. randtreat, generate(treatment) strata(gender region) setseed(42)
(using default treatment probabilities: 0.50 0.50)
Treatment assigned within 4 strata:
Stratum F.N: 3 obs -> T1: 1 (33%), T2: 2 (67%)
Stratum F.S: 3 obs -> T1: 2 (67%), T2: 1 (33%)
Stratum M.N: 3 obs -> T1: 1 (33%), T2: 2 (67%)
Stratum M.S: 3 obs -> T1: 2 (67%), T2: 1 (33%)
. tab gender treatment
           |       treatment
    gender |         0          1 |     Total
-----------+----------------------+----------
         F |         3          3 |         6
         M |         3          3 |         6
-----------+----------------------+----------
     Total |         6          6 |        12
   id gender region treatment
1   1      M      N         1
2   2      M      N         0
3   3      M      S         1
4   4      M      S         0
5   5      F      N         0
6   6      F      N         1
7   7      F      S         0
8   8      F      S         1
9   9      M      N         0
10 10      M      S         1
11 11      F      N         1
12 12      F      S         0

    0 1
  F 3 3
  M 3 3
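To see what stratification buys, it can help to compare covariate imbalance across many hypothetical re-randomizations. The sketch below does this for a made-up sample of 12 units, measuring the gap in the share of women between arms with and without stratifying on gender. The data, the helper names `complete_ra` and `gender_gap`, and the use of NumPy's `default_rng` are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical sample: 12 units, 6 per gender
df = pd.DataFrame({"gender": ["M"] * 6 + ["F"] * 6})

def complete_ra(n, rng):
    """Treat exactly n // 2 of n units, in random order."""
    a = np.zeros(n, dtype=int)
    a[: n // 2] = 1
    return rng.permutation(a)

def gender_gap(treatment):
    """Absolute difference in share female between treated and control."""
    female = (df["gender"] == "F").to_numpy()
    return abs(female[treatment == 1].mean() - female[treatment == 0].mean())

simple_gaps, strat_gaps = [], []
for _ in range(2000):
    # Unstratified: complete randomization over the whole sample
    simple = complete_ra(len(df), rng)
    # Stratified: complete randomization separately within each gender
    strat = np.empty(len(df), dtype=int)
    for idx in df.groupby("gender").indices.values():
        strat[idx] = complete_ra(len(idx), rng)
    simple_gaps.append(gender_gap(simple))
    strat_gaps.append(gender_gap(strat))

print("Mean gender gap, unstratified:", round(np.mean(simple_gaps), 3))
print("Mean gender gap, stratified:  ", round(np.mean(strat_gaps), 3))
```

Because each gender stratum here has an even number of units, the stratified gap is exactly zero in every draw, while the unstratified gap varies with the luck of the draw.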
Cluster Randomization
When treatment must be applied at the group level (classrooms, villages, firms), randomize clusters rather than individuals.
# Python: Cluster randomization
import numpy as np
import pandas as pd
np.random.seed(42)
# Create sample data: students in schools
df = pd.DataFrame({
    'student_id': range(1, 13),
    'school_id': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D']
})
# Get unique clusters
clusters = df['school_id'].unique()
n_clusters = len(clusters)
# Randomize at cluster level
cluster_treatment = dict(zip(
    clusters,
    np.random.binomial(1, 0.5, n_clusters)
))
# Map back to individuals
df['treatment'] = df['school_id'].map(cluster_treatment)
print(df)
print("\nCluster-level assignments:")
print(cluster_treatment)
* Stata: Cluster randomization
set seed 42
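* Randomize at cluster level: randomize one row per school,
* then copy that school's assignment to all of its students
egen byte school_tag = tag(school_id)
randtreat if school_tag, generate(treatment) setseed(42)
bysort school_id (treatment): replace treatment = treatment[1]
* Stratified cluster randomization (district must be constant within school)
randtreat if school_tag, generate(treatment_strat) ///
    strata(district) setseed(42)
bysort school_id (treatment_strat): replace treatment_strat = treatment_strat[1]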
# R: Cluster randomization with randomizr
library(randomizr)
set.seed(42)
# Create sample data
df <- data.frame(
  student_id = 1:12,
  school_id = rep(c('A', 'B', 'C', 'D'), each = 3)
)
# Cluster random assignment
df$treatment <- cluster_ra(
  clusters = df$school_id,
  prob = 0.5
)
print(df)
print(table(df$school_id, df$treatment))
student_id school_id treatment
0 1 A 0
1 2 A 0
2 3 A 0
3 4 B 1
4 5 B 1
5 6 B 1
6 7 C 0
7 8 C 0
8 9 C 0
9 10 D 0
10 11 D 0
11 12 D 0
Cluster-level assignments:
{'A': 0, 'B': 1, 'C': 0, 'D': 0}
. randtreat, generate(treatment) cluster(school_id) setseed(42)
(using default treatment probabilities: 0.50 0.50)
Cluster randomization:
4 clusters randomized
Treated clusters: 2
Control clusters: 2
. tab school_id treatment
            |       treatment
  school_id |         0          1 |     Total
------------+----------------------+----------
          A |         3          0 |         3
          B |         0          3 |         3
          C |         3          0 |         3
          D |         0          3 |         3
------------+----------------------+----------
      Total |         6          6 |        12
   student_id school_id treatment
1           1         A         0
2           2         A         0
3           3         A         0
4           4         B         1
5           5         B         1
6           6         B         1
7           7         C         1
8           8         C         1
9           9         C         1
10         10         D         0
11         11         D         0
12         12         D         0

    0 1
  A 3 0
  B 0 3
  C 0 3
  D 3 0
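With only a handful of clusters, independent coin flips can leave the arms badly unbalanced, as in the Python output above, where only one of the four schools is treated. One alternative is complete randomization at the cluster level, which fixes the number of treated clusters. The sketch below is a minimal version under that assumption; the function name `cluster_complete_ra` and its parameters are hypothetical, and it uses NumPy's `default_rng` rather than the legacy seeding interface.

```python
import numpy as np
import pandas as pd

def cluster_complete_ra(df, cluster_col, n_treat_clusters, seed):
    """Treat exactly n_treat_clusters clusters; all members share the assignment."""
    rng = np.random.default_rng(seed)
    # Sort cluster labels so the seed alone determines the draw,
    # regardless of row order in df
    clusters = np.sort(df[cluster_col].unique())
    treated = set(rng.choice(clusters, size=n_treat_clusters, replace=False))
    out = df.copy()
    out["treatment"] = out[cluster_col].isin(treated).astype(int)
    return out

df = pd.DataFrame({
    "student_id": range(1, 13),
    "school_id": list("AAABBBCCCDDD"),
})
assigned = cluster_complete_ra(df, "school_id", n_treat_clusters=2, seed=42)

# Every school should be entirely treated or entirely control
print(assigned.groupby("school_id")["treatment"].agg(["min", "max", "size"]))
```

If `min` and `max` agree for every school, no cluster was split across arms.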
Verification and Documentation
Always verify that your randomization worked as intended:
- Check proportions: Are treatment groups the expected sizes?
- Check balance: Are covariates balanced? (See next section)
- Document seed: Record the random seed for reproducibility
- Save assignment: Export the treatment assignment file before launching
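The first of these checks can be automated. The sketch below is one possible set of diagnostics for a 0/1 assignment column, not a complete verification routine: the function name `check_assignment`, the tolerance `tol`, and the sample data are all hypothetical.

```python
import pandas as pd

def check_assignment(df, treat_col="treatment", expected_share=0.5,
                     tol=0.1, strata_cols=None):
    """Return a dict of simple diagnostics for a 0/1 assignment column."""
    report = {
        "no_missing": bool(df[treat_col].notna().all()),
        "only_0_1": bool(df[treat_col].dropna().isin([0, 1]).all()),
        "treated_share": float(df[treat_col].mean()),
    }
    # Is the overall treated share within tol of the design value?
    report["share_ok"] = abs(report["treated_share"] - expected_share) <= tol
    if strata_cols is not None:
        # Treated share within each stratum; large deviations suggest a problem
        report["by_stratum"] = df.groupby(strata_cols)[treat_col].mean()
    return report

# Hypothetical assignment file
df = pd.DataFrame({
    "id": range(1, 9),
    "gender": ["M", "M", "M", "M", "F", "F", "F", "F"],
    "treatment": [1, 0, 0, 1, 0, 1, 1, 0],
})
report = check_assignment(df, strata_cols=["gender"])
print(report["treated_share"], report["share_ok"])
print(report["by_stratum"])
```

A sensible tolerance depends on the design: under complete randomization the treated share should match the target exactly, whereas Bernoulli assignment needs some slack.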
Save your randomization script, the seed, and the treatment assignment file. In your pre-analysis plan, specify your randomization procedure exactly. You should be able to reproduce the exact same treatment assignment from the same seed.
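A quick way to confirm reproducibility is to run the assignment twice from the same seed and compare the results before saving. The sketch below assumes complete randomization with NumPy's `default_rng`; the function `assign`, the file name, and the temporary output directory are hypothetical stand-ins for your own script and project paths.

```python
import os
import tempfile

import numpy as np
import pandas as pd

SEED = 42  # record this value in the pre-analysis plan

def assign(ids, seed):
    """Complete randomization of ids; sorted first so row order cannot change the result."""
    rng = np.random.default_rng(seed)
    ids = np.sort(np.asarray(ids, dtype=np.int64))
    treatment = np.zeros(len(ids), dtype=np.int64)
    treatment[: len(ids) // 2] = 1
    return pd.DataFrame({"id": ids, "treatment": rng.permutation(treatment)})

first = assign(range(1, 11), SEED)
second = assign(range(10, 0, -1), SEED)  # same ids, different input order

# Same seed and same ids should give an identical assignment
assert first.equals(second)

# Save the assignment file (a temporary directory here; use a versioned
# project path in practice) and confirm it reads back unchanged
path = os.path.join(tempfile.mkdtemp(), f"assignment_seed{SEED}.csv")
first.to_csv(path, index=False)
reloaded = pd.read_csv(path)
print(reloaded.equals(first))
```

Sorting the ids before drawing is a design choice: it makes the assignment depend only on the seed and the set of units, not on how the input file happens to be ordered.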