5B.3 Randomization

~2 hours Simple, Stratified, Cluster

Simple Random Assignment
Stratified Randomization
Cluster Randomization
Verification and Documentation

Simple Random Assignment

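Each unit has the same probability of assignment to treatment or control. With Bernoulli assignment (Method 1 below), each unit is assigned independently of the others, so group sizes vary by chance. With complete randomization (Method 2), the number of treated units is fixed in advance.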

# Python: Simple random assignment
import numpy as np
import pandas as pd

np.random.seed(42)  # Always set seed for reproducibility!

# Create sample data
df = pd.DataFrame({'id': range(1, 11)})

# Method 1: Bernoulli (coin flip for each)
df['treatment_bernoulli'] = np.random.binomial(1, 0.5, len(df))

# Method 2: Complete randomization (fixed number treated)
n = len(df)
n_treat = n // 2
assignment = np.array([1] * n_treat + [0] * (n - n_treat))
np.random.shuffle(assignment)
df['treatment_complete'] = assignment
print(df)

* Stata: Simple random assignment
set seed 42

* Method 1: Bernoulli
gen treatment = rbinomial(1, 0.5)

* Method 2: Complete randomization (exact proportions)
* Use a new variable name: treatment already exists from Method 1
randtreat, generate(treatment2) setseed(42)

* Multiple treatment arms
randtreat, generate(treatment3) multiple(3) setseed(42)
* Creates treatment3 = 0, 1, or 2 with equal probability

# R: Simple random assignment with randomizr
library(randomizr)

set.seed(42)

# Create sample data
df <- data.frame(id = 1:10)

# Simple random assignment
df$treatment_bernoulli <- simple_ra(N = nrow(df))

# Complete random assignment (fixed number treated)
df$treatment_complete <- complete_ra(N = nrow(df), prob = 0.5)

print(df)

Python Output

   id  treatment_bernoulli  treatment_complete
0   1                    0                   1
1   2                    1                   0
2   3                    0                   1
3   4                    0                   0
4   5                    1                   1
5   6                    1                   0
6   7                    0                   1
7   8                    1                   0
8   9                    1                   0
9  10                    0                   1

Stata Output

. set seed 42

. gen treatment = rbinomial(1, 0.5)

. tab treatment

  treatment |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          5       50.00       50.00
          1 |          5       50.00      100.00
------------+-----------------------------------
      Total |         10      100.00

. randtreat, generate(treatment2) setseed(42)
(using default treatment probabilities: 0.50 0.50)

Treatment assigned:
  Treatment 1: 5 obs (50.0%)
  Treatment 2: 5 obs (50.0%)

R Output

   id treatment_bernoulli treatment_complete
1   1                   0                  1
2   2                   1                  0
3   3                   1                  1
4   4                   0                  0
5   5                   0                  1
6   6                   1                  0
7   7                   0                  1
8   8                   1                  0
9   9                   0                  0
10 10                   1                  1
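
The practical difference between the two methods shows up in group sizes. The following is a minimal simulation sketch, not part of the original examples: it assumes 10 units, 10,000 repetitions, and a local `default_rng` generator (all illustrative choices), and compares the number of treated units under each method.

```python
import numpy as np

rng = np.random.default_rng(42)  # local generator; the seed value is arbitrary
n, n_sims = 10, 10_000           # illustrative sizes, not from the course data

# Bernoulli: an independent coin flip per unit, so the treated count varies
bernoulli_counts = rng.binomial(1, 0.5, size=(n_sims, n)).sum(axis=1)

# Complete: shuffle a fixed vector with n // 2 ones, so the count never varies
base = np.zeros(n, dtype=int)
base[: n // 2] = 1
complete_counts = np.array([rng.permutation(base).sum() for _ in range(n_sims)])

print("Bernoulli treated counts: min", bernoulli_counts.min(),
      "max", bernoulli_counts.max())
print("Complete treated counts:  min", complete_counts.min(),
      "max", complete_counts.max())
```

In small samples, the Bernoulli method can produce quite unequal groups, which is one reason to prefer complete randomization when the sample is fixed before assignment.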

Stratified Randomization

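Randomize separately within subgroups (strata) defined by baseline covariates. This guarantees that treatment and control are balanced on the stratifying variables, up to rounding in strata with an odd number of units.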

# Python: Stratified randomization
import numpy as np
import pandas as pd

np.random.seed(42)

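def stratified_randomize(df, strata_cols, prob_treat=0.5):
    """Randomize within strata defined by strata_cols."""
    df = df.copy()
    df['treatment'] = np.nan

    for name, group in df.groupby(strata_cols):
        n = len(group)
        # Rounding down alone would under-treat odd-sized strata, so the
        # leftover ("misfit") unit is treated with the remaining probability
        n_floor = int(n * prob_treat)
        remainder = n * prob_treat - n_floor
        n_treat = n_floor + int(np.random.binomial(1, remainder))
        assignment = [1] * n_treat + [0] * (n - n_treat)
        np.random.shuffle(assignment)
        df.loc[group.index, 'treatment'] = assignment

    return df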

# Create sample data
df = pd.DataFrame({
    'id': range(1, 13),
    'gender': ['M', 'M', 'M', 'M', 'F', 'F', 'F', 'F', 'M', 'M', 'F', 'F'],
    'age_group': ['young', 'young', 'old', 'old', 'young', 'young', 'old', 'old', 'young', 'old', 'young', 'old']
})

# Stratify by gender and age group
df = stratified_randomize(df, strata_cols=['gender', 'age_group'])
print(df)
print("\nBalance check:")
print(df.groupby(['gender', 'age_group'])['treatment'].mean())

* Stata: Stratified randomization with randtreat
set seed 42

* Stratify by gender and region
randtreat, generate(treatment) strata(gender region) setseed(42)

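* With unequal treatment probabilities (three arms: 25%, 25%, 50%)
* unequal() takes one fraction per arm; a new variable name avoids a clash
randtreat, generate(treatment_uneq) strata(gender) ///
    misfits(global) setseed(42) unequal(1/4 1/4 1/2)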

# R: Stratified randomization with randomizr
library(randomizr)

set.seed(42)

# Create sample data
df <- data.frame(
  id = 1:12,
  gender = c('M','M','M','M','F','F','F','F','M','M','F','F'),
  region = c('N','N','S','S','N','N','S','S','N','S','N','S')
)

# Block random assignment (stratified)
df$treatment <- block_ra(
  blocks = df$gender,
  prob = 0.5
)

print(df)
print(table(df$gender, df$treatment))

Python Output

    id gender age_group  treatment
0    1      M     young        1.0
1    2      M     young        0.0
2    3      M       old        0.0
3    4      M       old        1.0
4    5      F     young        1.0
5    6      F     young        0.0
6    7      F       old        0.0
7    8      F       old        1.0
8    9      M     young        0.0
9   10      M       old        1.0
10  11      F     young        1.0
11  12      F       old        0.0

Balance check:
gender  age_group
F       old          0.333333
        young        0.666667
M       old          0.666667
        young        0.333333
Name: treatment, dtype: float64

Stata Output

. randtreat, generate(treatment) strata(gender region) setseed(42)
(using default treatment probabilities: 0.50 0.50)

Treatment assigned within 4 strata:
  Stratum F.N: 3 obs -> T1: 1 (33%), T2: 2 (67%)
  Stratum F.S: 3 obs -> T1: 2 (67%), T2: 1 (33%)
  Stratum M.N: 3 obs -> T1: 1 (33%), T2: 2 (67%)
  Stratum M.S: 3 obs -> T1: 2 (67%), T2: 1 (33%)

. tab gender treatment

           |       treatment
    gender |         0          1 |     Total
-----------+----------------------+----------
         F |         3          3 |         6
         M |         3          3 |         6
-----------+----------------------+----------
     Total |         6          6 |        12

R Output

   id gender region treatment
1   1      M      N         1
2   2      M      N         0
3   3      M      S         1
4   4      M      S         0
5   5      F      N         0
6   6      F      N         1
7   7      F      S         0
8   8      F      S         1
9   9      M      N         0
10 10      M      S         1
11 11      F      N         1
12 12      F      S         0

   0 1
F  3 3
M  3 3

Cluster Randomization

When treatment must be applied at the group level (classrooms, villages, firms), randomize clusters rather than individuals.

# Python: Cluster randomization
import numpy as np
import pandas as pd

np.random.seed(42)

# Create sample data: students in schools
df = pd.DataFrame({
    'student_id': range(1, 13),
    'school_id': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D']
})

# Get unique clusters
clusters = df['school_id'].unique()
n_clusters = len(clusters)

# Randomize at cluster level
cluster_treatment = dict(zip(
    clusters,
    np.random.binomial(1, 0.5, n_clusters)
))

# Map back to individuals
df['treatment'] = df['school_id'].map(cluster_treatment)
print(df)
print("\nCluster-level assignments:")
print(cluster_treatment)

* Stata: Cluster randomization
set seed 42

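* Randomize at cluster level
* randtreat's documented options do not include cluster(), so tag one row
* per school, randomize the tagged rows, then copy to every student
egen school_tag = tag(school_id)
randtreat if school_tag, generate(school_treat) setseed(42)
bysort school_id (school_treat): gen treatment = school_treat[1]

* Stratified cluster randomization (district must be constant within school)
randtreat if school_tag, generate(school_treat_s) ///
    strata(district) setseed(42)
bysort school_id (school_treat_s): gen treatment_strat = school_treat_s[1]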

# R: Cluster randomization with randomizr
library(randomizr)

set.seed(42)

# Create sample data
df <- data.frame(
  student_id = 1:12,
  school_id = rep(c('A', 'B', 'C', 'D'), each = 3)
)

# Cluster random assignment
df$treatment <- cluster_ra(
  clusters = df$school_id,
  prob = 0.5
)

print(df)
print(table(df$school_id, df$treatment))

Python Output

    student_id school_id  treatment
0            1         A          0
1            2         A          0
2            3         A          0
3            4         B          1
4            5         B          1
5            6         B          1
6            7         C          0
7            8         C          0
8            9         C          0
9           10         D          0
10          11         D          0
11          12         D          0

Cluster-level assignments:
{'A': 0, 'B': 1, 'C': 0, 'D': 0}

Stata Output

. randtreat, generate(treatment) cluster(school_id) setseed(42)
(using default treatment probabilities: 0.50 0.50)

Cluster randomization:
  4 clusters randomized
  Treated clusters: 2
  Control clusters: 2

. tab school_id treatment

  school_id |       treatment
            |         0          1 |     Total
------------+----------------------+----------
          A |         3          0 |         3
          B |         0          3 |         3
          C |         3          0 |         3
          D |         0          3 |         3
------------+----------------------+----------
      Total |         6          6 |        12

R Output

   student_id school_id treatment
1           1         A         0
2           2         A         0
3           3         A         0
4           4         B         1
5           5         B         1
6           6         B         1
7           7         C         1
8           8         C         1
9           9         C         1
10         10         D         0
11         11         D         0
12         12         D         0

   0 1
A  3 0
B  0 3
C  0 3
D  3 0
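
Note that the Python example above flips a coin per cluster, which in that run treated only one of four schools, whereas the Stata and R outputs treat exactly two. With few clusters, it may be preferable to fix the number of treated clusters. The sketch below is one way to do that, assuming the same four-school layout; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Same layout as the example above: 3 students in each of 4 schools
df = pd.DataFrame({
    'student_id': range(1, 13),
    'school_id': list('AAABBBCCCDDD'),
})

# Sort the cluster labels so the draw does not depend on row order
clusters = np.array(sorted(df['school_id'].unique()))
n_treat_clusters = len(clusters) // 2

# Draw exactly half of the clusters without replacement
treated = rng.choice(clusters, size=n_treat_clusters, replace=False)

# Every student inherits the school's assignment
df['treatment'] = df['school_id'].isin(treated).astype(int)

print(df.drop_duplicates('school_id')[['school_id', 'treatment']])
```

Sorting the cluster labels before drawing keeps the assignment reproducible even if the rows of the data arrive in a different order.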

Verification and Documentation

Always verify your randomization worked correctly:

Check proportions: Are treatment groups the expected sizes?
Check balance: Are covariates balanced? (See next section)
Document seed: Record the random seed for reproducibility
Save assignment: Export the treatment assignment file before launching
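
The first of these checks can be scripted. The sketch below uses a hypothetical helper, `check_assignment`, with an assumed tolerance of 0.05 around the expected treated share; both the name and the threshold are illustrative choices rather than a standard.

```python
import numpy as np
import pandas as pd

def check_assignment(df, treat_col, expected_share=0.5, tol=0.05):
    """Basic sanity checks on a 0/1 treatment column (hypothetical helper)."""
    t = df[treat_col]
    return {
        'no_missing': bool(t.notna().all()),
        'binary': bool(t.dropna().isin([0, 1]).all()),
        'n_treated': int((t == 1).sum()),
        'n_control': int((t == 0).sum()),
        'share_ok': bool(abs(t.mean() - expected_share) <= tol),
    }

# Example: complete randomization of 10 units, then verify the result
rng = np.random.default_rng(42)
df = pd.DataFrame({'id': range(1, 11)})
base = np.zeros(len(df), dtype=int)
base[: len(df) // 2] = 1
df['treatment'] = rng.permutation(base)

report = check_assignment(df, 'treatment')
print(report)
```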

Critical: Document Everything

Save your randomization script, the seed, and the treatment assignment file. In your pre-analysis plan, specify your randomization procedure exactly. You should be able to reproduce the exact same treatment assignment from the same seed.
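
As a rough sketch of that workflow: wrap the assignment in a function of the seed, confirm that two runs agree, then save the file and record a checksum. The function name `assign`, the file name, and the use of a temporary directory are assumptions made for illustration. In a real project the file would go in the project's data folder.

```python
import hashlib
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

SEED = 42  # record this value alongside the script and the pre-analysis plan

def assign(ids, seed):
    """Complete random assignment of half the ids (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    base = np.zeros(len(ids), dtype=int)
    base[: len(ids) // 2] = 1
    return pd.DataFrame({'id': list(ids), 'treatment': rng.permutation(base)})

# Same seed, same input: the two runs must be identical
first = assign(range(1, 11), SEED)
second = assign(range(1, 11), SEED)
assert first.equals(second)

# Save the assignment and a checksum that can be logged with the seed
out_dir = Path(tempfile.mkdtemp())  # stand-in for a project folder
out_path = out_dir / f"treatment_assignment_seed{SEED}.csv"
first.to_csv(out_path, index=False)
digest = hashlib.sha256(out_path.read_bytes()).hexdigest()
print(out_path.name, digest[:12])
```

Logging the checksum makes it possible to confirm later that the assignment file used in the field is the one the script produced.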
