Stata Interface Guide

~30 min For Stata users

Stata is one of the most widely used statistical packages in academic economics. It is particularly strong for econometrics, panel data analysis, and survey data. If you're going into applied economics research, you will almost certainly encounter Stata.

What is Stata?

Unlike R and Python, Stata is both a programming language and a complete software package. When you buy Stata, you get everything: the language, the interface, the documentation, and thousands of built-in statistical commands. This "all-in-one" approach is part of why economists love it—everything just works together.

For the nerds: more on Stata's language layers

Stata's language operates at two levels:

Interactive commands / do-files — The primary way users interact with Stata. You type commands like regress y x1 x2 or summarize income, and you can save sequences of commands in .do files to create reproducible scripts.
Ado-files — Stata's higher-level programming language for writing new commands. Most of Stata's built-in commands are actually written in ado, and users can write their own. Ado is interpreted (not compiled).

Mata then sits underneath as a compiled, C-like matrix language for performance-critical code.

So when people say "Stata," they're typically referring to both the software application and its scripting/command language. Mata is a separate but integrated language within the Stata ecosystem.
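As a rough illustration of these layers, the sketch below runs an ordinary command, defines a small ado-style command, and then evaluates a few lines of Mata. The command name show_mean is made up for this example, and auto is the example dataset that ships with Stata. Run it from the Do-file Editor rather than the Command window, since // comments are only understood in do-files.

```stata
* Layer 1: ordinary commands, as typed interactively or saved in a do-file
sysuse auto, clear
summarize price

* Layer 2: an ado-style program (show_mean is a hypothetical name)
capture program drop show_mean
program define show_mean
    syntax varname                      // accept exactly one variable
    quietly summarize `varlist'
    display "Mean of `varlist': " r(mean)
end

show_mean price

* Layer 3: Mata, the matrix language
mata:
X = (1, 2 \ 3, 4)
X' * X                                  // displays the 2 x 2 cross-product
end
```

In practice a reusable command would live in its own file named show_mean.ado somewhere on Stata's ado-path; defining it inline is just the shortest way to try the idea.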

Stata is Commercial Software

Stata requires a license ($125+ for students, more for professionals). However, most universities provide free access through site licenses. Check with your IT department or library. If you're at Sciences Po, Stata is available on campus computers and through remote desktop.

Stata Versions

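Stata comes in several editions (distinct from release numbers such as Stata 18). They differ mainly in how large a dataset they can handle and, for Stata/MP, in the ability to use multiple processor cores:

Edition	Maximum variables	Best for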
Stata/BE (Basic Edition)	2,048 variables	Small datasets, learning
Stata/SE (Standard Edition)	32,767 variables	Most research projects
Stata/MP (Multiprocessor)	120,000 variables	Large datasets, parallel processing

For most coursework and research, Stata/SE is sufficient. Your university likely provides either SE or MP.
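If you are unsure which edition a campus machine is running, Stata can report it. The following is a minimal sketch. The labels in the display lines are just illustrative text, and the reported variable limit reflects the current maxvar setting rather than the edition's theoretical maximum.

```stata
* Print the edition, release number, and license details
about

* Some of the same information is available programmatically
display "Release number:          " c(stata_version)
display "Running Stata/MP (1/0):  " c(MP)
display "Current variable limit:  " c(maxvar)
```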

The Stata Interface

When you open Stata, you'll see a window with five main panels:

Panel	What it shows
Review	History of the commands you have run, such as `use "data.dta"` and `describe`; click one to send it back to the Command panel
Results	The output of each command, such as the sums-of-squares table and R-squared from `reg income education experience`
Command	The prompt where you type commands interactively
Variables	The variables in the loaded dataset, such as income, education, experience, age, and female
Properties	Details of the dataset and the selected variable, such as the number of variables, the number of observations, and the file size

The Do-file Editor: Where Real Work Happens

The Command window is fine for quick tests, but all serious Stata work should be done in Do-files. A Do-file is a script that contains a sequence of Stata commands. Using Do-files makes your work reproducible, shareable, and easier to debug.

To open the Do-file Editor:

Go to Window > Do-file Editor > New Do-file Editor
Or press Ctrl+9 (Windows) / Cmd+9 (Mac)
Or type doedit in the Command window
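The doedit command also accepts a filename, and its companion do runs a file without opening it. A short sketch, assuming a do-file called my_analysis.do (a placeholder name) already exists in the working directory:

```stata
* Open a new, empty Do-file Editor window
doedit

* Open an existing do-file for editing (placeholder filename)
doedit "my_analysis.do"

* Run the same do-file from start to finish
do "my_analysis.do"
```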

Essential Stata Commands

Here are the commands every Stata user needs to know:

Command	What it does	Example
`use`	Load a dataset	`use "data.dta", clear`
`describe`	Show variable names and types	`describe`
`summarize`	Summary statistics	`summarize income age`
`tabulate`	Frequency tables	`tab education`
`generate`	Create a new variable	`gen log_income = log(income)`
`replace`	Modify existing variable	`replace age = . if age < 0`
`regress`	OLS regression	`reg income education age`
`help`	Get help on any command	`help regress`
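To try these commands without your own data, you can chain them on the auto example dataset that ships with Stata. This is only an illustrative session. The variable log_price is created here, and the other variable names come from that dataset rather than from the wage examples in this guide.

```stata
* Load the built-in example dataset, discarding anything in memory
sysuse auto, clear

* Inspect the data
describe
summarize price mpg
tabulate foreign

* Create a variable, then modify it
generate log_price = log(price)
replace log_price = round(log_price, 0.01)

* OLS regression of price on mileage and weight
regress price mpg weight

* Open the help file for the command just used
help regress
```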

A Sample Do-file

Here's what a typical Do-file looks like. I recommend always starting with the commands shown below:

/*******************************************************************************
 * Project: Analysis of Wage Data
 * Author:  Your Name
 * Date:    January 2026
 * Purpose: Explore determinants of wages
 *******************************************************************************/

* Clear everything and set up
clear all
set more off

* Set working directory (adjust to your path)
cd "/Users/yourname/research/wage_project"

* Start a log file to save all output
log using "analysis_log.txt", text replace

* Load the data
use "wage_data.dta", clear

* Examine the data
describe
summarize

* Summary statistics for key variables
summarize wage education experience, detail

* Run a simple regression
regress wage education experience i.female

* Close the log file
log close

Stata Results Window

. clear all

. set more off

. cd "/Users/yourname/research/wage_project"
/Users/yourname/research/wage_project

. log using "analysis_log.txt", text replace
      name:  <unnamed>
       log:  /Users/yourname/research/wage_project/analysis_log.txt
  log type:  text
 opened on:  27 Jan 2026, 14:32:15

. use "wage_data.dta", clear

. describe
Contains data from wage_data.dta
 Observations:         1,000
    Variables:             4
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
wage            float   %9.0g                 Hourly wage (dollars)
education       byte    %9.0g                 Years of education
experience      byte    %9.0g                 Years of experience
female          byte    %9.0g      sex        1 = Female

. summarize

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      1,000       22.47       12.35       5.25      98.50
   education |      1,000       13.24        2.68          8         20
  experience |      1,000       17.82       11.45          0         45
      female |      1,000        0.48        0.50          0          1

. summarize wage education experience, detail
  (output omitted)

. regress wage education experience i.female

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(3, 996)       =    142.62
       Model |   45892.12         3  15297.37     Prob > F        =    0.0000
    Residual |  106834.88       996    107.26     R-squared       =    0.3005
-------------+----------------------------------   Adj R-squared   =    0.2984
       Total |  152727.00       999    152.88     Root MSE        =    10.357

------------------------------------------------------------------------------
        wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |    2.4521     0.1423    17.23   0.000      2.1728      2.7314
  experience |    0.3845     0.0412     9.33   0.000      0.3036      0.4654
    1.female |   -3.2156     0.6534    -4.92   0.000     -4.4982     -1.9330
       _cons |   -8.7234     2.0145    -4.33   0.000    -12.6781     -4.7687
------------------------------------------------------------------------------

. log close
      name:  <unnamed>
       log:  /Users/yourname/research/wage_project/analysis_log.txt
  log type:  text
 closed on:  27 Jan 2026, 14:32:18

Best Practices for Do-files

Always start with clear all to ensure a clean environment
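Use set more off to prevent Stata from pausing long output (this is already the default from Stata 16 onward, but it is harmless and still needed on older releases)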
Start a log file to save all your output
Comment liberally using * or /* */
Use relative paths after setting your working directory
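One practical gap in the list above: if a do-file stops on an error, its log stays open, and the next run fails at log using. A common workaround is to begin with capture log close. The sketch below folds that into the header used earlier and shows the comment styles. It reuses the placeholder path and filenames from the sample do-file, so adjust them to your own project.

```stata
* Close any log left open by an earlier, interrupted run.
* capture suppresses the error Stata raises when no log is open.
capture log close

clear all
set more off

/* Block comments
   can span several lines */

cd "/Users/yourname/research/wage_project"
log using "analysis_log.txt", text replace

use "wage_data.dta", clear        // relative path, resolved against cd

* Long commands can be split across lines with ///
regress wage education experience ///
    i.female

log close
```

The // and /// forms only work inside do-files, not when typed into the Command window.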

Understanding Log Files

A log file is a text record of everything that happens in your Stata session: every command you run and all the output it produces. Log files are essential for reproducibility—they let you (and others) see exactly what you did and what results you got.

Creating a Log File

You start and stop logging with simple commands:

* Start logging to a text file (overwrites if exists)
log using "my_analysis.log", text replace

* ... run your analysis here ...

* Stop logging
log close

The text option creates a plain text file that any editor can read. Without it, Stata's default is a .smcl file (Stata Markup and Control Language), which preserves formatting such as bold text and clickable links but is designed to be viewed in Stata.
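If you already have a SMCL log, you do not need to rerun the analysis to get a text copy. Stata's translate command converts between the two formats. A minimal sketch, with filenames chosen for illustration and the auto example dataset standing in for real data:

```stata
* Make sure no log is open from a previous run
capture log close

* Record a session in SMCL format
log using "my_analysis.smcl", smcl replace
sysuse auto, clear
summarize price
log close

* Convert the SMCL log to plain text; the translator is
* inferred from the two file extensions
translate "my_analysis.smcl" "my_analysis.log", replace
```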

Sample Log File

Here's what a typical log file looks like after running some basic analysis:

my_analysis.log

-------------------------------------------------------------------------------
      name:  <unnamed>
       log:  /Users/researcher/projects/wages/my_analysis.log
  log type:  text
 opened on:  15 Jan 2026, 10:32:15

. * Load the dataset
. use "wages.dta", clear

. * Describe the data
. describe

Contains data from wages.dta
 Observations:         1,000
    Variables:             5                  15 Jan 2026 09:15
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
wage            float   %9.0g                 Hourly wage (dollars)
education       byte    %9.0g                 Years of education
experience      byte    %9.0g                 Years of experience
female          byte    %9.0g      sex        1 = Female
age             byte    %9.0g                 Age in years
-------------------------------------------------------------------------------
Sorted by:

. * Summary statistics
. summarize wage education experience

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      1,000       22.47       12.35       5.25      98.50
   education |      1,000       13.24        2.68          8         20
  experience |      1,000       17.82       11.45          0         45

. * Run regression
. regress wage education experience female

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(3, 996)       =    142.62
       Model |   45892.123         3  15297.374   Prob > F        =    0.0000
    Residual |  106834.877       996   107.263   R-squared       =    0.3005
-------------+----------------------------------   Adj R-squared   =    0.2984
       Total |  152727.000       999   152.880   Root MSE        =    10.357

------------------------------------------------------------------------------
        wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |    2.4521     0.1423    17.23   0.000      2.1728      2.7314
  experience |    0.3845     0.0412     9.33   0.000      0.3036      0.4654
      female |   -3.2156     0.6534    -4.92   0.000     -4.4982     -1.9330
       _cons |   -8.7234     2.0145    -4.33   0.000    -12.6781     -4.7687
------------------------------------------------------------------------------

. log close
      name:  <unnamed>
       log:  /Users/researcher/projects/wages/my_analysis.log
  log type:  text
 closed on:  15 Jan 2026, 10:32:18
-------------------------------------------------------------------------------

Key Parts of a Log File

Header — Shows when and where the log was created
Commands — Each command you ran appears after a dot (.)
Output — The results of each command appear directly below it
Footer — Shows when the log was closed

Log File Tips

Use log using "filename", append to add to an existing log instead of replacing it
Name your logs descriptively: analysis_v2_2026-01-15.log
Store logs alongside your do-files for easy reference
If you forget to close a log, use log close _all to close any open logs
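The tips above can be sketched as a two-part session: one part creates the log and a later part appends to it. The log filename follows the naming pattern suggested above, and the auto example dataset is used only so the sketch runs anywhere.

```stata
* Start clean: close anything left open
capture log close _all

* First session: create the log (replace overwrites any old copy)
log using "analysis_v2_2026-01-15.log", text replace
sysuse auto, clear
summarize price
log close

* Later session: add to the same file instead of overwriting it
log using "analysis_v2_2026-01-15.log", text append
regress price mpg weight

* List the logs currently open, then close them all
log query _all
log close _all
```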

Essential Keyboard Shortcuts

Shortcut (Windows)	Shortcut (Mac)	Action
`Ctrl + D`	`Cmd + Shift + D`	Run selected code in Do-file Editor
`Ctrl + 9`	`Cmd + 9`	Open Do-file Editor
`Ctrl + S`	`Cmd + S`	Save current Do-file
`Page Up`	`Page Up`	Previous command (in Command window)
`F1`	`F1`	Help for selected command

Getting Help in Stata

Stata has excellent built-in documentation. To get help on any command:

help regress — Opens help for the regress command
search panel data — Searches all documentation for "panel data"
findit xtreg — Searches for user-written commands
