5B.3  Randomization

~2 hours | Simple, Stratified, Cluster

Simple Random Assignment

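Each unit has the same probability of being assigned to treatment or control. Under Bernoulli assignment each unit's draw is independent of the others, so the group sizes vary from run to run; under complete randomization the number of treated units is fixed in advance and only their identity is random.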

# Python: Simple random assignment
import numpy as np
import pandas as pd

np.random.seed(42)  # Always set seed for reproducibility!

# Create sample data
df = pd.DataFrame({'id': range(1, 11)})

# Method 1: Bernoulli (coin flip for each)
df['treatment_bernoulli'] = np.random.binomial(1, 0.5, len(df))

# Method 2: Complete randomization (fixed number treated)
n = len(df)
n_treat = n // 2
assignment = np.array([1] * n_treat + [0] * (n - n_treat))
np.random.shuffle(assignment)
df['treatment_complete'] = assignment
print(df)
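* Stata: Simple random assignment
* randtreat is a user-written command (ssc install randtreat)
set seed 42

* Method 1: Bernoulli (coin flip for each unit)
gen treatment = rbinomial(1, 0.5)

* Method 2: Complete randomization (exact proportions)
* Use a new variable name: treatment already exists from Method 1
randtreat, generate(treatment2) setseed(42)

* Multiple treatment arms
randtreat, generate(treatment3) multiple(3) setseed(42)
* Creates treatment3 with three equally sized arms, coded 0, 1, and 2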
# R: Simple random assignment with randomizr
library(randomizr)

set.seed(42)

# Create sample data
df <- data.frame(id = 1:10)

# Simple random assignment
df$treatment_bernoulli <- simple_ra(N = nrow(df))

# Complete random assignment (fixed number treated)
df$treatment_complete <- complete_ra(N = nrow(df), prob = 0.5)

print(df)
Python output:
   id  treatment_bernoulli  treatment_complete
0   1                    0                   1
1   2                    1                   0
2   3                    0                   1
3   4                    0                   0
4   5                    1                   1
5   6                    1                   0
6   7                    0                   1
7   8                    1                   0
8   9                    1                   0
9  10                    0                   1
Stata output:
. set seed 42

. gen treatment = rbinomial(1, 0.5)

. tab treatment

  treatment |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          5       50.00       50.00
          1 |          5       50.00      100.00
------------+-----------------------------------
      Total |         10      100.00

. randtreat, generate(treatment2) setseed(42)
(using default treatment probabilities: 0.50 0.50)

Treatment assigned:
  Treatment 1: 5 obs (50.0%)
  Treatment 2: 5 obs (50.0%)
R output:
   id treatment_bernoulli treatment_complete
1   1                   0                  1
2   2                   1                  0
3   3                   1                  1
4   4                   0                  0
5   5                   0                  1
6   6                   1                  0
7   7                   0                  1
8   8                   1                  0
9   9                   0                  0
10 10                   1                  1
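
The practical difference between the two methods is easiest to see by repeating the assignment many times. The sketch below is illustrative only: the function name `treated_counts`, the number of repetitions, and the use of NumPy's `default_rng` generator (instead of the global seed used above) are assumptions, not part of the original examples.

```python
import numpy as np

def treated_counts(n_units=10, n_reps=1000, seed=42):
    """Return treated-group sizes under Bernoulli and complete assignment."""
    rng = np.random.default_rng(seed)
    n_treat = n_units // 2
    base = np.array([1] * n_treat + [0] * (n_units - n_treat))
    bernoulli_sizes = np.empty(n_reps, dtype=int)
    complete_sizes = np.empty(n_reps, dtype=int)
    for r in range(n_reps):
        # Bernoulli: one independent coin flip per unit, so the size varies
        bernoulli_sizes[r] = rng.binomial(1, 0.5, n_units).sum()
        # Complete: shuffle a fixed vector, so the size never changes
        complete_sizes[r] = rng.permutation(base).sum()
    return bernoulli_sizes, complete_sizes

bern, comp = treated_counts()
print("Bernoulli treated-group size: min", bern.min(), "max", bern.max())
print("Complete treated-group size:  min", comp.min(), "max", comp.max())
```

With small samples, the Bernoulli range can be wide enough to cost real statistical power, which is one reason to prefer complete randomization when the sample is known in advance.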

Stratified Randomization

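Randomize separately within subgroups (strata) defined by covariates such as gender or age group. This guarantees that treatment and control groups are balanced on the stratifying variables, up to rounding when a stratum's size is not divisible by the number of arms.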

# Python: Stratified randomization
import numpy as np
import pandas as pd

np.random.seed(42)

def stratified_randomize(df, strata_cols, prob_treat=0.5):
    """Randomize within strata defined by strata_cols."""
    df = df.copy()
    df['treatment'] = np.nan

    for name, group in df.groupby(strata_cols):
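        n = len(group)
        # int() alone always rounds down, so odd-sized strata would be
        # systematically under-treated; round up or down at random instead
        n_treat = int(np.floor(n * prob_treat + np.random.rand()))
        assignment = [1] * n_treat + [0] * (n - n_treat)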
        np.random.shuffle(assignment)
        df.loc[group.index, 'treatment'] = assignment

    return df

# Create sample data
df = pd.DataFrame({
    'id': range(1, 13),
    'gender': ['M', 'M', 'M', 'M', 'F', 'F', 'F', 'F', 'M', 'M', 'F', 'F'],
    'age_group': ['young', 'young', 'old', 'old', 'young', 'young', 'old', 'old', 'young', 'old', 'young', 'old']
})

# Stratify by gender and age group
df = stratified_randomize(df, strata_cols=['gender', 'age_group'])
print(df)
print("\nBalance check:")
print(df.groupby(['gender', 'age_group'])['treatment'].mean())
* Stata: Stratified randomization with randtreat
set seed 42

* Stratify by gender and region
randtreat, generate(treatment) strata(gender region) setseed(42)

* With unequal treatment probabilities (three arms: 1/4, 1/4, 1/2)
* randtreat takes the shares as fractions in unequal()
randtreat, generate(treatment_unequal) strata(gender) ///
    misfits(global) setseed(42) unequal(1/4 1/4 1/2)
# R: Stratified randomization with randomizr
library(randomizr)

set.seed(42)

# Create sample data
df <- data.frame(
  id = 1:12,
  gender = c('M','M','M','M','F','F','F','F','M','M','F','F'),
  region = c('N','N','S','S','N','N','S','S','N','S','N','S')
)

# Block random assignment (stratified)
df$treatment <- block_ra(
  blocks = df$gender,
  prob = 0.5
)

print(df)
print(table(df$gender, df$treatment))
Python output:
    id gender age_group  treatment
0    1      M     young        1.0
1    2      M     young        0.0
2    3      M       old        0.0
3    4      M       old        1.0
4    5      F     young        1.0
5    6      F     young        0.0
6    7      F       old        0.0
7    8      F       old        1.0
8    9      M     young        0.0
9   10      M       old        1.0
10  11      F     young        1.0
11  12      F       old        0.0

Balance check:
gender  age_group
F       old          0.333333
        young        0.666667
M       old          0.666667
        young        0.333333
Name: treatment, dtype: float64
Stata output:
. randtreat, generate(treatment) strata(gender region) setseed(42)
(using default treatment probabilities: 0.50 0.50)

Treatment assigned within 4 strata:
  Stratum F.N: 3 obs -> T1: 1 (33%), T2: 2 (67%)
  Stratum F.S: 3 obs -> T1: 2 (67%), T2: 1 (33%)
  Stratum M.N: 3 obs -> T1: 1 (33%), T2: 2 (67%)
  Stratum M.S: 3 obs -> T1: 2 (67%), T2: 1 (33%)

. tab gender treatment

           |       treatment
    gender |         0          1 |     Total
-----------+----------------------+----------
         F |         3          3 |         6
         M |         3          3 |         6
-----------+----------------------+----------
     Total |         6          6 |        12
R output:
   id gender region treatment
1   1      M      N         1
2   2      M      N         0
3   3      M      S         1
4   4      M      S         0
5   5      F      N         0
6   6      F      N         1
7   7      F      S         0
8   8      F      S         1
9   9      M      N         0
10 10      M      S         1
11 11      F      N         1
12 12      F      S         0

   0 1
F  3 3
M  3 3
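
To illustrate what stratification buys, the sketch below compares the share of women assigned to treatment across repeated draws, with and without stratifying on gender. It is a minimal simulation under assumed settings (12 units, 6 per gender, 1,000 repetitions, NumPy's `default_rng`); the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
gender = np.array(['M'] * 6 + ['F'] * 6)

unstratified_share = []
stratified_share = []
for _ in range(1000):
    # Unstratified: shuffle 6 treated slots across all 12 units
    simple = rng.permutation(np.repeat([1, 0], 6))
    unstratified_share.append(simple[gender == 'F'].mean())

    # Stratified: shuffle 3 treated slots separately within each gender
    blocked = np.empty(len(gender), dtype=int)
    for g in ('M', 'F'):
        in_stratum = gender == g
        blocked[in_stratum] = rng.permutation(np.repeat([1, 0], 3))
    stratified_share.append(blocked[gender == 'F'].mean())

print("Unstratified share of women treated: min",
      min(unstratified_share), "max", max(unstratified_share))
print("Stratified share of women treated:   min",
      min(stratified_share), "max", max(stratified_share))
```

Without stratification the share of women treated moves around from draw to draw; with stratification it is fixed by design.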

Cluster Randomization

When treatment must be applied at the group level (classrooms, villages, firms), randomize clusters rather than individuals.

# Python: Cluster randomization
import numpy as np
import pandas as pd

np.random.seed(42)

# Create sample data: students in schools
df = pd.DataFrame({
    'student_id': range(1, 13),
    'school_id': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D']
})

# Get unique clusters
clusters = df['school_id'].unique()
n_clusters = len(clusters)

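# Randomize at cluster level (one coin flip per school, so with few
# clusters the number of treated schools can be far from half)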
cluster_treatment = dict(zip(
    clusters,
    np.random.binomial(1, 0.5, n_clusters)
))

# Map back to individuals
df['treatment'] = df['school_id'].map(cluster_treatment)
print(df)
print("\nCluster-level assignments:")
print(cluster_treatment)
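* Stata: Cluster randomization
set seed 42

* randtreat has no cluster() option, so randomize one row per
* cluster and copy the result to the other members
egen byte pick = tag(school_id)
randtreat if pick, generate(treatment) setseed(42)
bysort school_id (treatment): replace treatment = treatment[1]

* Stratified cluster randomization
* (district must be constant within school)
randtreat if pick, generate(treatment_strat) strata(district) setseed(42)
bysort school_id (treatment_strat): replace treatment_strat = treatment_strat[1]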
# R: Cluster randomization with randomizr
library(randomizr)

set.seed(42)

# Create sample data
df <- data.frame(
  student_id = 1:12,
  school_id = rep(c('A', 'B', 'C', 'D'), each = 3)
)

# Cluster random assignment
df$treatment <- cluster_ra(
  clusters = df$school_id,
  prob = 0.5
)

print(df)
print(table(df$school_id, df$treatment))
Python output:
    student_id school_id  treatment
0            1         A          0
1            2         A          0
2            3         A          0
3            4         B          1
4            5         B          1
5            6         B          1
6            7         C          0
7            8         C          0
8            9         C          0
9           10         D          0
10          11         D          0
11          12         D          0

Cluster-level assignments:
{'A': 0, 'B': 1, 'C': 0, 'D': 0}
Stata output:
. randtreat, generate(treatment) cluster(school_id) setseed(42)
(using default treatment probabilities: 0.50 0.50)

Cluster randomization:
  4 clusters randomized
  Treated clusters: 2
  Control clusters: 2

. tab school_id treatment

  school_id |       treatment
            |         0          1 |     Total
------------+----------------------+----------
          A |         3          0 |         3
          B |         0          3 |         3
          C |         3          0 |         3
          D |         0          3 |         3
------------+----------------------+----------
      Total |         6          6 |        12
R output:
   student_id school_id treatment
1           1         A         0
2           2         A         0
3           3         A         0
4           4         B         1
5           5         B         1
6           6         B         1
7           7         C         1
8           8         C         1
9           9         C         1
10         10         D         0
11         11         D         0
12         12         D         0

   0 1
A  3 0
B  0 3
C  0 3
D  3 0
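
With only a handful of clusters, independent coin flips can leave one arm with very few clusters, or none at all. One possible alternative, sketched here under assumptions, is complete randomization at the cluster level: fix the number of treated schools and draw which ones they are. The choice of half the clusters and the use of `default_rng` are illustrative, not taken from the examples above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Same structure as the example data: 4 schools with 3 students each
df = pd.DataFrame({
    'student_id': range(1, 13),
    'school_id': list('AAABBBCCCDDD'),
})

clusters = df['school_id'].unique()
n_treat = len(clusters) // 2  # fixed number of treated schools

# Draw the treated schools without replacement
treated = set(rng.choice(clusters, size=n_treat, replace=False))

# Every student inherits the assignment of their school
df['treatment'] = df['school_id'].isin(treated).astype(int)

print(df.groupby('school_id')['treatment'].first())
```

Because assignment happens once per school, all students in a school always share the same treatment status.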

Verification and Documentation

Always verify your randomization worked correctly:

  1. Check proportions: Are treatment groups the expected sizes?
  2. Check balance: Are covariates balanced? (See next section)
  3. Document seed: Record the random seed for reproducibility
  4. Save assignment: Export the treatment assignment file before launching
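
The first two checks can be sketched in a few lines. This is a rough illustration on simulated data: the `age` covariate, the sample size, and the use of a standardized mean difference as the balance measure are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 100

df = pd.DataFrame({
    'id': range(1, n + 1),
    'age': rng.integers(18, 65, size=n),  # hypothetical covariate
})
df['treatment'] = rng.permutation(np.repeat([1, 0], n // 2))

# Check 1: group sizes match the design (50 / 50 here)
sizes = df['treatment'].value_counts().sort_index()
print(sizes)

# Check 2: standardized difference in covariate means between arms
means = df.groupby('treatment')['age'].mean()
pooled_sd = np.sqrt(df.groupby('treatment')['age'].var().mean())
std_diff = (means[1] - means[0]) / pooled_sd
print(f"Standardized difference in age: {std_diff:.3f}")
```
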
Critical: Document Everything

Save your randomization script, the seed, and the treatment assignment file. In your pre-analysis plan, specify your randomization procedure exactly. Anyone rerunning the script with the same seed, the same data in the same row order, and the same software version should reproduce exactly the same treatment assignment.
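
A minimal sketch of this workflow follows. The helper `assign`, the file name, and the choice to store a SHA-256 digest of the saved file are hypothetical; the point is only that the assignment is a deterministic function of the seed and the sorted IDs, and that the saved file can later be checked for changes.

```python
import hashlib
import os
import tempfile

import numpy as np
import pandas as pd

SEED = 42  # record this value alongside the script

def assign(ids, seed):
    """Complete randomization that depends only on the seed and the IDs."""
    ids = sorted(ids)  # row order affects the result, so fix it first
    rng = np.random.default_rng(seed)
    n = len(ids)
    treatment = rng.permutation(np.repeat([1, 0], [n // 2, n - n // 2]))
    return pd.DataFrame({'id': ids, 'treatment': treatment, 'seed': seed})

first = assign(list(range(1, 11)), SEED)
second = assign(list(range(10, 0, -1)), SEED)  # same IDs, different input order
print("Reproduced exactly:", first.equals(second))

# Save the assignment before launch and keep a digest of the file
path = os.path.join(tempfile.gettempdir(), 'treatment_assignment.csv')
first.to_csv(path, index=False)
with open(path, 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("Saved to", path, "with SHA-256", digest)
```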