
Stata Interface Guide

~30 min | For Stata users

Stata is one of the most widely used statistical packages in academic economics. It is particularly strong for econometrics, panel data analysis, and survey data. If you are going into applied economics research, you will almost certainly use Stata at some point.

What is Stata?

Unlike R and Python, Stata is both a programming language and a complete software package. When you buy Stata, you get everything: the language, the interface, the documentation, and thousands of built-in statistical commands. This "all-in-one" approach is part of why economists love it—everything just works together.

For the nerds: more on Stata's language layers

Stata's language operates at two levels:

  • Interactive commands / do-files: The primary way users interact with Stata. You type commands like regress y x1 x2 or summarize income, and you can save sequences of commands in .do files to create reproducible scripts.
  • Ado-files: Stata's programming layer for writing new commands. Many of Stata's official commands are themselves written as ado-files, and users can write their own. Ado code is interpreted (not compiled).

Mata then sits underneath as a compiled, C-like matrix language for performance-critical code.

So when people say "Stata," they're typically referring to both the software application and its scripting/command language. Mata is a separate but integrated language within the Stata ecosystem.
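
To make the layers concrete, here is a minimal sketch that touches each one from a single do-file. It assumes the auto example dataset that ships with Stata (loaded with sysuse); the program name meanof is hypothetical and chosen only for illustration.

```stata
* --- Command level: ordinary commands, as typed interactively or in a do-file
sysuse auto, clear
summarize price

* --- Ado level: define a small new command (meanof is a made-up name).
* Saved as meanof.ado on the ado-path, it would be available in every session.
capture program drop meanof
program define meanof
    syntax varname
    quietly summarize `varlist'
    display "Mean of `varlist': " r(mean)
end
meanof price

* --- Mata level: matrix code that works on the same data
mata:
    X = st_data(., "price")   // copy price into a Mata column vector
    mean(X)                   // display its mean
end
```

All three parts report the mean of price; the difference is only in which layer does the work.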

Stata is Commercial Software

Stata requires a license ($125+ for students, more for professionals). However, most universities provide free access through site licenses. Check with your IT department or library. If you're at Sciences Po, Stata is available on campus computers and through remote desktop.

Stata Versions

Stata comes in several editions (not to be confused with release numbers such as Stata 18). The editions differ mainly in how large a dataset they can hold and in whether they use multiple processor cores:

Edition | Variable limit | Best for
Stata/BE (Basic Edition) | 2,048 variables | Small datasets, learning
Stata/SE (Standard Edition) | 32,767 variables | Most research projects
Stata/MP (Multiprocessor) | 120,000 variables | Large datasets, parallel processing

For most coursework and research, Stata/SE is sufficient. Your university likely provides either SE or MP.
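
If you are unsure which edition your university installation provides, Stata can report it. The following is a small sketch using the about command and a few c() system values; the set maxvar line is left commented out because it only applies to SE and MP, and the value 10000 is an arbitrary illustration.

```stata
about                        // release number, edition, and license details
display c(stata_version)     // release number of the running Stata
display c(SE)                // 1 if running Stata/SE or Stata/MP, 0 otherwise
display c(MP)                // 1 if running Stata/MP, 0 otherwise
display c(maxvar)            // current maximum number of variables

* In SE and MP the variable limit can be raised before loading data:
* set maxvar 10000
```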

The Stata Interface

When you open Stata, you'll see a window with five main panels. The numbers below refer to the regions of the interface illustration (a mock-up of Stata/MP 18.0):

  • 1. Review: lists the commands you have already run, for example use "data.dta", describe, and reg income education.
  • 2. Results: shows the output of each command, for example the regression table and R-squared (0.1842) from reg income education experience.
  • 3. Command: where you type commands at the . prompt.
  • 4. Variables: lists the variables in the loaded dataset (here income, education, experience, age, and female).
  • 5. Properties: shows details of the dataset (here 12 variables, 1,000 observations, 68 KB).

The Do-file Editor: Where Real Work Happens

The Command window is fine for quick tests, but all serious Stata work should be done in Do-files. A Do-file is a script that contains a sequence of Stata commands. Using Do-files makes your work reproducible, shareable, and easier to debug.

To open the Do-file Editor:

  • Go to Window > Do-file Editor > New Do-file Editor
  • Or press Ctrl+9 (Windows) / Cmd+9 (Mac)
  • Or type doedit in the Command window

Essential Stata Commands

Here are the commands every Stata user needs to know:

Command | What it does | Example
use | Load a dataset | use "data.dta", clear
describe | Show variable names and types | describe
summarize | Summary statistics | summarize income age
tabulate | Frequency tables | tab education
generate | Create a new variable | gen log_income = log(income)
replace | Modify an existing variable | replace age = . if age < 0
regress | OLS regression | reg income education age
help | Get help on any command | help regress
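
To try these commands without your own data, the sketch below runs each of them on the auto example dataset that ships with Stata. Using sysuse and the auto variables (price, mpg, weight, foreign) is an assumption made for illustration; substitute your own dataset and variable names.

```stata
sysuse auto, clear                 // load a built-in example dataset
describe                           // variable names, types, and labels
summarize price mpg                // summary statistics
tabulate foreign                   // frequency table
generate log_price = log(price)    // create a new variable
replace mpg = . if mpg < 0         // set impossible values to missing (none here)
regress price mpg weight           // OLS regression
help regress                       // open the help file for regress
```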

A Sample Do-file

Here's what a typical Do-file looks like. I recommend always starting with the commands shown below:

/*******************************************************************************
 * Project: Analysis of Wage Data
 * Author:  Your Name
 * Date:    January 2026
 * Purpose: Explore determinants of wages
 *******************************************************************************/

* Clear everything and set up
clear all
set more off

* Set working directory (adjust to your path)
cd "/Users/yourname/research/wage_project"

* Close any log left open by an earlier run, then start a log file to save all output
capture log close
log using "analysis_log.txt", text replace

* Load the data
use "wage_data.dta", clear

* Examine the data
describe
summarize

* Summary statistics for key variables
summarize wage education experience, detail

* Run a simple regression
regress wage education experience i.female

* Close the log file
log close

Best Practices for Do-files
  • Always start with clear all to ensure a clean environment
  • Use set more off to prevent Stata from pausing output
  • Start a log file to save all your output
  • Comment liberally using * or /* */
  • Use relative paths after setting your working directory
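
The relative-path advice can be sketched as follows. This example assumes a hypothetical project layout with an output subfolder and uses the auto dataset that ships with Stata so that it runs anywhere; the cd line is commented out because the path is specific to your machine.

```stata
clear all
set more off
capture log close

* Set the project root once; every path after this is relative to it
* cd "/Users/yourname/research/wage_project"

* Create the output folder if it does not exist yet (capture ignores the error if it does)
capture mkdir "output"

log using "output/practice_log.txt", text replace

sysuse auto, clear
generate log_price = log(price)        // a small derived variable
save "output/auto_clean.dta", replace  // relative path, so the project folder stays portable

log close
```

Because only the cd line depends on the machine, a collaborator can run the same do-file after changing a single path.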

Understanding Log Files

A log file is a text record of everything that happens in your Stata session: every command you run and all the output it produces. Log files are essential for reproducibility—they let you (and others) see exactly what you did and what results you got.

Creating a Log File

You start and stop logging with simple commands:

* Start logging to a text file (overwrites if exists)
log using "my_analysis.log", text replace

* ... run your analysis here ...

* Stop logging
log close

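The text option creates a plain text file (readable anywhere). The alternative format is SMCL (Stata Markup and Control Language, .smcl), which preserves formatting but is meant to be viewed in Stata. If you specify neither option, Stata picks the format from the file extension (.log gives text, .smcl gives SMCL) and falls back to SMCL when the filename has no extension, unless you have changed the default with set logtype.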
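
If you already have a SMCL log and need a plain text copy, Stata's translate command can convert it. A minimal sketch, assuming the hypothetical file name demo_log and the auto example dataset:

```stata
capture log close

* Log in SMCL format; Stata adds the .smcl extension
log using "demo_log", smcl replace
sysuse auto, clear
summarize price
log close

* Convert the SMCL log to plain text and show the result
translate "demo_log.smcl" "demo_log.log", replace
type "demo_log.log"
```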

Sample Log File

Here's what a typical log file looks like after running some basic analysis:

my_analysis.log
-------------------------------------------------------------------------------
      name:  <unnamed>
       log:  /Users/researcher/projects/wages/my_analysis.log
  log type:  text
 opened on:  15 Jan 2026, 10:32:15

. * Load the dataset
. use "wages.dta", clear

. * Describe the data
. describe

Contains data from wages.dta
 Observations:         1,000
    Variables:             5                  15 Jan 2026 09:15
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
wage            float   %9.0g                 Hourly wage (dollars)
education       byte    %9.0g                 Years of education
experience      byte    %9.0g                 Years of experience
female          byte    %9.0g      sex        1 = Female
age             byte    %9.0g                 Age in years
-------------------------------------------------------------------------------
Sorted by:

. * Summary statistics
. summarize wage education experience

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      1,000       22.47       12.35       5.25      98.50
   education |      1,000       13.24        2.68          8         20
  experience |      1,000       17.82       11.45          0         45

. * Run regression
. regress wage education experience female

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(3, 996)       =    142.56
       Model |   45892.123         3  15297.374   Prob > F        =    0.0000
    Residual |  106834.877       996   107.263   R-squared       =    0.3005
-------------+----------------------------------   Adj R-squared   =    0.2984
       Total |  152727.000       999   152.880   Root MSE        =    10.357

------------------------------------------------------------------------------
        wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |    2.4521     0.1423    17.23   0.000      2.1728      2.7314
  experience |    0.3845     0.0412     9.33   0.000      0.3036      0.4654
      female |   -3.2156     0.6534    -4.92   0.000     -4.4982     -1.9330
       _cons |   -8.7234     2.0145    -4.33   0.000    -12.6781     -4.7687
------------------------------------------------------------------------------

. log close
      name:  <unnamed>
       log:  /Users/researcher/projects/wages/my_analysis.log
  log type:  text
 closed on:  15 Jan 2026, 10:32:18
-------------------------------------------------------------------------------

Key Parts of a Log File

  • Header — Shows when and where the log was created
  • Commands — Each command you ran appears after a dot (.)
  • Output — The results of each command appear directly below it
  • Footer — Shows when the log was closed

Log File Tips
  • Use log using "filename", append to add to an existing log instead of replacing it
  • Name your logs descriptively: analysis_v2_2026-01-15.log
  • Store logs alongside your do-files for easy reference
  • If you forget to close a log, use log close _all to close any open logs

Essential Keyboard Shortcuts

Shortcut (Windows) | Shortcut (Mac) | Action
Ctrl + D | Cmd + Shift + D | Run selected code in the Do-file Editor
Ctrl + 9 | Cmd + 9 | Open the Do-file Editor
Ctrl + S | Cmd + S | Save the current Do-file
Page Up | Page Up | Recall the previous command (in the Command window)
F1 | F1 | Open Stata's help (by default, F1 runs help advice)

Getting Help in Stata

Stata has excellent built-in documentation. To get help on any command:

  • help regress — Opens help for the regress command
  • search panel data — Searches all documentation for "panel data"
  • findit xtreg — Searches for user-written commands
