5B.3  Randomization

~2 hours. Topics: Simple, Stratified, Cluster

Simple Random Assignment

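Under simple (Bernoulli) assignment, each unit is assigned to treatment with the same probability, independently of all other units. Complete randomization instead fixes the number of treated units in advance: every unit still has the same probability of treatment, but assignments are no longer independent of one another.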

# Python: Simple random assignment
import numpy as np
import pandas as pd

np.random.seed(42)  # Always set seed for reproducibility!

# Create sample data
df = pd.DataFrame({'id': range(1, 11)})

# Method 1: Bernoulli (coin flip for each)
df['treatment_bernoulli'] = np.random.binomial(1, 0.5, len(df))

# Method 2: Complete randomization (fixed number treated)
n = len(df)
n_treat = n // 2
assignment = np.array([1] * n_treat + [0] * (n - n_treat))
np.random.shuffle(assignment)
df['treatment_complete'] = assignment
print(df)
* Stata: Simple random assignment
set seed 42

* Method 1: Bernoulli
gen treatment = rbinomial(1, 0.5)

* Method 2: Complete randomization (exact proportions)
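* randtreat is a user-written command: ssc install randtreat
* Use a new variable name, since treatment already exists from Method 1
randtreat, generate(treatment2) setseed(42)

* Multiple treatment arms
randtreat, generate(treatment3) multiple(3) setseed(42)
* Creates treatment3 = 0, 1, or 2 with equal probability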
# R: Simple random assignment with randomizr
library(randomizr)

set.seed(42)

# Create sample data
df <- data.frame(id = 1:10)

# Simple random assignment
df$treatment_bernoulli <- simple_ra(N = nrow(df))

# Complete random assignment (fixed number treated)
df$treatment_complete <- complete_ra(N = nrow(df), prob = 0.5)

print(df)
Python Output
   id  treatment_bernoulli  treatment_complete
0   1                    0                   1
1   2                    1                   0
2   3                    0                   1
3   4                    0                   0
4   5                    1                   1
5   6                    1                   0
6   7                    0                   1
7   8                    1                   0
8   9                    1                   0
9  10                    0                   1
Stata Output
. set seed 42

. gen treatment = rbinomial(1, 0.5)

. tab treatment

  treatment |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          5       50.00       50.00
          1 |          5       50.00      100.00
------------+-----------------------------------
      Total |         10      100.00

. randtreat, generate(treatment2) setseed(42)
(using default treatment probabilities: 0.50 0.50)

Treatment assigned:
  Treatment 1: 5 obs (50.0%)
  Treatment 2: 5 obs (50.0%)
R Output
   id treatment_bernoulli treatment_complete
1   1                   0                  1
2   2                   1                  0
3   3                   1                  1
4   4                   0                  0
5   5                   0                  1
6   6                   1                  0
7   7                   0                  1
8   8                   1                  0
9   9                   0                  0
10 10                   1                  1
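
The practical difference between the two methods above is how much the treated count can vary from draw to draw. The following is a minimal sketch of that comparison, not part of the original examples: it assumes NumPy's `default_rng` generator rather than `np.random.seed`, and the names `n_sims`, `bernoulli_counts`, and `complete_counts` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed value is arbitrary; record whichever you use
n, n_sims = 10, 1000             # 10 units, 1000 simulated randomizations (assumed values)

# Bernoulli: each unit flips its own coin, so the treated count varies across draws
bernoulli_counts = rng.binomial(1, 0.5, size=(n_sims, n)).sum(axis=1)

# Complete: permute a fixed vector with n // 2 ones, so the treated count never varies
base = np.array([1] * (n // 2) + [0] * (n - n // 2))
complete_counts = np.array([rng.permutation(base).sum() for _ in range(n_sims)])

print("Bernoulli treated count: min", bernoulli_counts.min(), "max", bernoulli_counts.max())
print("Complete treated count:  min", complete_counts.min(), "max", complete_counts.max())
```

With small samples, the spread in the Bernoulli counts is the main reason to prefer complete randomization when a fixed group size matters for power or budget.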

Stratified Randomization

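Randomize separately within subgroups (strata) defined by baseline covariates. This guarantees that treatment and control groups are balanced on the stratifying variables, up to the "misfits" left over when a stratum's size does not divide evenly across arms.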

# Python: Stratified randomization
import numpy as np
import pandas as pd

np.random.seed(42)

def stratified_randomize(df, strata_cols, prob_treat=0.5):
    """Randomize within strata defined by strata_cols."""
    df = df.copy()
    df['treatment'] = np.nan

    for name, group in df.groupby(strata_cols):
        n = len(group)
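        # n * prob_treat can be fractional (e.g. 1.5 of 3 units). Round it up
        # or down at random so each unit is still treated with prob_treat;
        # int() alone always rounds down and under-assigns treatment.
        expected = n * prob_treat
        n_treat = int(np.floor(expected))
        if np.random.random() < expected - n_treat:
            n_treat += 1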
        assignment = [1] * n_treat + [0] * (n - n_treat)
        np.random.shuffle(assignment)
        df.loc[group.index, 'treatment'] = assignment

    return df

# Create sample data
df = pd.DataFrame({
    'id': range(1, 13),
    'gender': ['M', 'M', 'M', 'M', 'F', 'F', 'F', 'F', 'M', 'M', 'F', 'F'],
    'age_group': ['young', 'young', 'old', 'old', 'young', 'young', 'old', 'old', 'young', 'old', 'young', 'old']
})

# Stratify by gender and age group
df = stratified_randomize(df, strata_cols=['gender', 'age_group'])
print(df)
print("\nBalance check:")
print(df.groupby(['gender', 'age_group'])['treatment'].mean())
* Stata: Stratified randomization with randtreat
set seed 42

* Stratify by gender and region
randtreat, generate(treatment) strata(gender region) setseed(42)

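* Three arms with unequal treatment probabilities (1/4, 1/4, 1/2)
* Use a new variable name, since treatment already exists
randtreat, generate(treatment_uneq) strata(gender) multiple(3) ///
    unequal(1/4 1/4 1/2) misfits(global) setseed(42)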
# R: Stratified randomization with randomizr
library(randomizr)

set.seed(42)

# Create sample data
df <- data.frame(
  id = 1:12,
  gender = c('M','M','M','M','F','F','F','F','M','M','F','F'),
  region = c('N','N','S','S','N','N','S','S','N','S','N','S')
)

# Block random assignment (stratified)
df$treatment <- block_ra(
  blocks = df$gender,
  prob = 0.5
)

print(df)
print(table(df$gender, df$treatment))
Python Output
    id gender age_group  treatment
0    1      M     young        1.0
1    2      M     young        0.0
2    3      M       old        0.0
3    4      M       old        1.0
4    5      F     young        1.0
5    6      F     young        0.0
6    7      F       old        0.0
7    8      F       old        1.0
8    9      M     young        0.0
9   10      M       old        1.0
10  11      F     young        1.0
11  12      F       old        0.0

Balance check:
gender  age_group
F       old          0.333333
        young        0.666667
M       old          0.666667
        young        0.333333
Name: treatment, dtype: float64
Stata Output
. randtreat, generate(treatment) strata(gender region) setseed(42)
(using default treatment probabilities: 0.50 0.50)

Treatment assigned within 4 strata:
  Stratum F.N: 3 obs -> T1: 1 (33%), T2: 2 (67%)
  Stratum F.S: 3 obs -> T1: 2 (67%), T2: 1 (33%)
  Stratum M.N: 3 obs -> T1: 1 (33%), T2: 2 (67%)
  Stratum M.S: 3 obs -> T1: 2 (67%), T2: 1 (33%)

. tab gender treatment

           |       treatment
    gender |         0          1 |     Total
-----------+----------------------+----------
         F |         3          3 |         6
         M |         3          3 |         6
-----------+----------------------+----------
     Total |         6          6 |        12
R Output
   id gender region treatment
1   1      M      N         1
2   2      M      N         0
3   3      M      S         1
4   4      M      S         0
5   5      F      N         0
6   6      F      N         1
7   7      F      S         0
8   8      F      S         1
9   9      M      N         0
10 10      M      S         1
11 11      F      N         1
12 12      F      S         0

   0 1
F  3 3
M  3 3
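
To see what stratification buys, one can compare the covariate imbalance it produces with that of unstratified complete randomization. The sketch below is illustrative only: the helper names `simple_assign`, `stratified_assign`, and `gender_gap` are hypothetical, the data are made up (6 men, 6 women), and it uses NumPy's `default_rng` generator.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({'gender': ['M'] * 6 + ['F'] * 6})  # toy data, even-sized strata
n_sims = 1000

def half_treated(n):
    """Vector with n // 2 ones and the rest zeros."""
    return np.array([1] * (n // 2) + [0] * (n - n // 2))

def simple_assign(df, rng):
    """Complete randomization that ignores strata."""
    return rng.permutation(half_treated(len(df)))

def stratified_assign(df, rng, col):
    """Complete randomization within each level of col."""
    out = np.zeros(len(df), dtype=int)
    for idx in df.groupby(col).indices.values():  # positional row indices per stratum
        out[idx] = rng.permutation(half_treated(len(idx)))
    return out

def gender_gap(df, t):
    """Absolute difference in share female between treated and control."""
    female = (df['gender'] == 'F').to_numpy()
    return abs(female[t == 1].mean() - female[t == 0].mean())

simple_gaps = [gender_gap(df, simple_assign(df, rng)) for _ in range(n_sims)]
strat_gaps = [gender_gap(df, stratified_assign(df, rng, 'gender')) for _ in range(n_sims)]

print("Mean gender gap, simple:    ", np.mean(simple_gaps))
print("Mean gender gap, stratified:", np.mean(strat_gaps))
```

Because both strata here have an even number of units, the stratified gap is zero in every draw, while the unstratified gap is positive on average.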

Cluster Randomization

When treatment must be applied at the group level (classrooms, villages, firms), randomize clusters rather than individuals.

# Python: Cluster randomization
import numpy as np
import pandas as pd

np.random.seed(42)

# Create sample data: students in schools
df = pd.DataFrame({
    'student_id': range(1, 13),
    'school_id': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D']
})

# Get unique clusters
clusters = df['school_id'].unique()
n_clusters = len(clusters)

# Randomize at cluster level
cluster_treatment = dict(zip(
    clusters,
    np.random.binomial(1, 0.5, n_clusters)
))

# Map back to individuals
df['treatment'] = df['school_id'].map(cluster_treatment)
print(df)
print("\nCluster-level assignments:")
print(cluster_treatment)
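* Stata: Cluster randomization
set seed 42

* randtreat has no cluster() option, so randomize one row per cluster
* and then copy that assignment to every unit in the cluster
egen tag = tag(school_id)
randtreat if tag, generate(treatment) setseed(42)
bysort school_id (treatment): replace treatment = treatment[1]

* Stratified cluster randomization (district must be constant within school)
randtreat if tag, generate(treatment_strat) strata(district) setseed(42)
bysort school_id (treatment_strat): replace treatment_strat = treatment_strat[1]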
# R: Cluster randomization with randomizr
library(randomizr)

set.seed(42)

# Create sample data
df <- data.frame(
  student_id = 1:12,
  school_id = rep(c('A', 'B', 'C', 'D'), each = 3)
)

# Cluster random assignment
df$treatment <- cluster_ra(
  clusters = df$school_id,
  prob = 0.5
)

print(df)
print(table(df$school_id, df$treatment))
Python Output
    student_id school_id  treatment
0            1         A          0
1            2         A          0
2            3         A          0
3            4         B          1
4            5         B          1
5            6         B          1
6            7         C          0
7            8         C          0
8            9         C          0
9           10         D          0
10          11         D          0
11          12         D          0

Cluster-level assignments:
{'A': 0, 'B': 1, 'C': 0, 'D': 0}
Stata Output
. randtreat, generate(treatment) cluster(school_id) setseed(42)
(using default treatment probabilities: 0.50 0.50)

Cluster randomization:
  4 clusters randomized
  Treated clusters: 2
  Control clusters: 2

. tab school_id treatment

  school_id |       treatment
            |         0          1 |     Total
------------+----------------------+----------
          A |         3          0 |         3
          B |         0          3 |         3
          C |         3          0 |         3
          D |         0          3 |         3
------------+----------------------+----------
      Total |         6          6 |        12
R Output
   student_id school_id treatment
1           1         A         0
2           2         A         0
3           3         A         0
4           4         B         1
5           5         B         1
6           6         B         1
7           7         C         1
8           8         C         1
9           9         C         1
10         10         D         0
11         11         D         0
12         12         D         0

   0 1
A  3 0
B  0 3
C  0 3
D  3 0
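
With only a handful of clusters, a coin flip per cluster can leave the arms very uneven, as in the Python output above where one school out of four is treated. A possible alternative, sketched here under the assumption that exactly half of the clusters should be treated, is to draw a fixed number of clusters without replacement, which parallels what `cluster_ra` does in the R example. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Same toy data as above: 12 students in 4 schools
df = pd.DataFrame({
    'student_id': range(1, 13),
    'school_id': list('AAABBBCCCDDD'),
})

# Sort the cluster labels so the draw does not depend on row order
clusters = sorted(df['school_id'].unique())
n_treat = len(clusters) // 2  # assumed design: treat half of the clusters

# Draw the treated clusters without replacement, then map to individuals
treated = rng.choice(clusters, size=n_treat, replace=False).tolist()
df['treatment'] = df['school_id'].isin(treated).astype(int)

# Sanity check: treatment must be constant within each cluster
per_school = df.groupby('school_id')['treatment'].nunique()
assert (per_school == 1).all()

print(df.groupby('school_id')['treatment'].first())
```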

Verification and Documentation

Always verify your randomization worked correctly:

  1. Check proportions: Are treatment groups the expected sizes?
  2. Check balance: Are covariates balanced? (See next section)
  3. Document seed: Record the random seed for reproducibility
  4. Save assignment: Export the treatment assignment file before launching
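
Steps 1, 3, and 4 of the checklist above could look roughly like the sketch below. It is an illustration under stated assumptions rather than a prescribed workflow: the constant `SEED`, the file name `treatment_assignment.csv`, and the use of a temporary directory are all hypothetical choices, and the generator comes from NumPy's `default_rng`.

```python
import os
import tempfile

import numpy as np
import pandas as pd

SEED = 42  # record this value alongside the randomization script
rng = np.random.default_rng(SEED)

# Complete randomization of 10 units, 5 treated
df = pd.DataFrame({'id': range(1, 11)})
df['treatment'] = rng.permutation(np.array([1] * 5 + [0] * 5))

# Step 1: check that group sizes match the design
counts = df['treatment'].value_counts().sort_index()
print(counts)
assert counts[0] == 5 and counts[1] == 5, "unexpected group sizes"

# Steps 3 and 4: store the seed with the assignment and export it before launch
df['seed'] = SEED
out_path = os.path.join(tempfile.gettempdir(), 'treatment_assignment.csv')  # hypothetical path
df.to_csv(out_path, index=False)
print("Saved assignment to", out_path)
```

In a real project the file would go to a version-controlled or archived location rather than a temporary directory.
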
Critical: Document Everything

Save your randomization script, the seed, and the treatment assignment file. In your pre-analysis plan, specify your randomization procedure exactly. You should be able to reproduce the exact same treatment assignment from the same seed.
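
One way to test that claim is to wrap the assignment in a function and confirm that two runs with the same seed agree. The sketch below is a minimal illustration; the function name `assign` is hypothetical. It also sorts the units before drawing, since the same seed applied to the same data in a different row order would otherwise give a different assignment.

```python
import numpy as np
import pandas as pd

def assign(ids, seed):
    """Complete randomization of ids, reproducible for a given seed."""
    rng = np.random.default_rng(seed)
    ids = sorted(ids)  # fix the unit order so the result ignores input row order
    base = np.array([1] * (len(ids) // 2) + [0] * (len(ids) - len(ids) // 2))
    return pd.DataFrame({'id': ids, 'treatment': rng.permutation(base)})

first = assign(range(1, 11), seed=42)
second = assign(reversed(range(1, 11)), seed=42)  # same units, different input order

# Same seed and same units give the identical assignment
assert first.equals(second)
print(first)
```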