0  Languages & Platforms

Overview Foundations Beginner

Before diving into coding, let's get oriented. This module introduces the three programming languages I use throughout the course—Python, Stata, and R—and the software environments where you'll write and run your code.

If you're completely new to programming, this can feel overwhelming. Three languages? Multiple software environments? Don't worry. By the end of this module, you'll understand what each tool is for and which ones to focus on first. You don't need to master everything; you need to know where to start.

What You'll Learn

  • The differences between Python, Stata, and R—and when to use each
  • How to navigate the software environments (IDEs) for each language
  • How to set up your computer for coding, or use cloud-based alternatives

The Three Languages

Each language in this course emerged from a different community with different goals. Understanding their origins helps you choose the right one for each task.

Aspect | Python | Stata | R
Born | 1991 | 1985 | 1993
Origin | General-purpose scripting | Statistical analysis | Statistical computing
License | Free, open-source | Commercial (paid) | Free, open-source
Strength | ML, versatility, AI tools | Econometrics, panel data | Statistics, visualization
Learning curve | Gentle | Gentle for basics | Steeper initially
AI assistance | Excellent | Good | Good

My Recommendation: Start with Python and Stata

One of the most common questions I get is "Which language should I learn first?" Here's my answer as of the beginning of this course (Jan 2026):

Start with Python, follow with Stata/R. Here's why:
  • Python is excellent for beginners and is the language that AI tools (ChatGPT, Claude, Copilot) understand best. Depending on your subfield, you may not strictly need it for your economics research, but I still encourage everyone to prioritize learning it as an investment: when you ask an LLM to help you code, it will be most fluent in Python. This makes debugging and learning much faster.
  • Stata is probably still the dominant language in academic economics. You'll need it to replicate published papers, work as a research assistant, and collaborate with senior researchers. Most replication packages from top journals are in Stata.

R is great for statistical analysis, and is superior if you want advanced visualization (ggplot2) or specific causal inference packages (fixest, rdrobust). Also, it's *free*!

The Software Environments (IDEs)

An IDE (Integrated Development Environment) is the software where you write and run your code. Think of it like a word processor for code—it provides syntax highlighting, error checking, and tools to run your programs.

Each language has a preferred environment, but some IDEs (like VS Code) can handle multiple languages. I've created detailed guides for each:

RStudio

The standard IDE for R. Free, powerful, and designed specifically for statistical analysis.

Read the RStudio Guide →

Stata

Stata has its own built-in IDE. Learn the interface and the essential Do-file Editor.

Read the Stata Guide →

Visual Studio Code

My recommended editor for Python. Free, works with any language, great AI tool integration.

Read the VS Code Guide →

Jupyter & Google Colab

Interactive notebooks for Python. Colab requires no installation—just open in your browser.

Read the Notebooks Guide →

(Note: Part 2 of the course, ProTools ER2, will cover AI-powered IDEs like "Claude code desktop" and "LM studio". For the moment, however, it is important to learn coding outside of fully AI-assisted environments. Hold your FOMO! ("Fear Of Missing Out", for boomers.))

Quick Start: No Installation Needed

If you want to start coding immediately without installing anything, use these cloud-based options:

Language | Cloud Option | Link
Python | Google Colab | colab.research.google.com
R | Posit Cloud | posit.cloud
Stata | University remote desktop | Check with your IT department

Which Tool for Which Task?

Here are some tips for choosing tools for common research tasks. These suggestions assume you are a PhD candidate in Economics, but even so, take them with a pinch of salt: the "most needed" tools vary across subfields. In the end, you need to figure out what's best for you!

Task | (Probably) best tool | Why
Replicating an economics paper | Stata | Most replication packages are in Stata (true as of now; expect things to change)
Machine learning / Deep learning | Python | Python has excellent libraries. You can run code in Colab (an online IDE) if you want to use cloud GPUs (however, you'll need to pay ...)
Publication-quality visualizations | R | Great visualization capabilities (e.g., ggplot2)
Web scraping | Python | Best libraries (BeautifulSoup, Selenium)
Panel data econometrics | Stata or R | Purpose-built for this
Quick exploratory analysis | I use Python, but use whatever you are most comfortable with. I often end up using an online IDE for these tasks (e.g., Colab) | No setup, immediate results
Collaborating with economists | Popularity is still Stata >= R >> Python (probably; my guess) | Any language, really; expect Python to gain momentum
Learning with AI assistance | Python | AI tools are most fluent in Python

File Formats Reference

As you work with code and data, you'll encounter many different file types. The key distinction to remember:

CODE files: your instructions to the computer (.py, .do, .R, .ipynb)
DATA files: the information you analyze (.csv, .dta, .xlsx, .rds)

Files by Language

Each programming language has its own file formats. Here's what belongs to what:

Python
  • .py (CODE): Python script, your main code file
  • .ipynb (CODE): Jupyter Notebook, code + notes + output

Stata
  • .do (CODE): Do-file, Stata commands to run
  • .dta (DATA): Stata dataset, preserves labels!
  • .ado (CODE): Program file, reusable commands
  • .log (OUTPUT): Log file, record of your session

R
  • .R (CODE): R script, your main code file
  • .Rmd (CODE): R Markdown, code + formatted text
  • .rds (DATA): R data, single object saved
  • .RData (DATA): R workspace, multiple objects

Universal (works with any language)
  • .csv (DATA): Comma-separated, universal format!
  • .xlsx (DATA): Excel spreadsheet, multiple sheets
  • .json (DATA): Structured text, from web APIs
  • .txt (DATA): Plain text, tab or space delimited

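
If it helps to see the listing above as something you can run, here is a minimal Python sketch that looks up a file's language and kind from its extension. The dictionary simply copies the entries above; the function name `describe_file` and the example filenames are made up for illustration.

```python
from pathlib import Path

# Extension -> (language, kind), copied from the listing above.
# Keys are lowercase so that ".R" and ".RData" match regardless of case.
FILE_KINDS = {
    ".py": ("Python", "CODE"),
    ".ipynb": ("Python", "CODE"),
    ".do": ("Stata", "CODE"),
    ".dta": ("Stata", "DATA"),
    ".ado": ("Stata", "CODE"),
    ".log": ("Stata", "OUTPUT"),
    ".r": ("R", "CODE"),
    ".rmd": ("R", "CODE"),
    ".rds": ("R", "DATA"),
    ".rdata": ("R", "DATA"),
    ".csv": ("Universal", "DATA"),
    ".xlsx": ("Universal", "DATA"),
    ".json": ("Universal", "DATA"),
    ".txt": ("Universal", "DATA"),
}


def describe_file(filename):
    """Return (language, kind) for a filename, or ("Unknown", "UNKNOWN")."""
    extension = Path(filename).suffix.lower()
    return FILE_KINDS.get(extension, ("Unknown", "UNKNOWN"))


# Hypothetical filenames, just to show the lookup in action.
for name in ["analysis.do", "survey.dta", "clean_data.R", "results.csv"]:
    language, kind = describe_file(name)
    print(f"{name}: {kind} file ({language})")
```
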
Quick Rules of Thumb
  • Sharing data with anyone? Use .csv - it works everywhere
  • Working in Stata? Save as .dta to keep your variable labels
  • Working in R? Save as .rds to preserve data types
  • Always save your code! The .py, .do, or .R file is more important than the data output
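
To see why .csv is universal but does not preserve data types, here is a small sketch using only Python's standard library. The rows are toy values made up for illustration, and an in-memory buffer stands in for a real .csv file so the example leaves nothing on disk.

```python
import csv
import io

# Toy rows, made up for illustration: an integer id and a decimal wage.
rows = [
    {"id": 1, "wage": 12.5},
    {"id": 2, "wage": 15.0},
]

# Write the rows as CSV text (the buffer plays the role of a .csv file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "wage"])
writer.writeheader()
writer.writerows(rows)

# Read the same CSV text back in.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))

# CSV is plain text, so every value comes back as a string.
print(loaded[0])
print(type(loaded[0]["id"]))

# The original types have to be restored by hand after loading.
restored = [{"id": int(r["id"]), "wage": float(r["wage"])} for r in loaded]
print(restored == rows)
```

Any program can read the text in the buffer, which is what makes the format so portable. The cost is that numbers, dates, and labels have to be rebuilt after loading, whereas formats like .dta and .rds store that information with the data.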

What's Next?

Now that you understand the landscape, here's what to do:

  1. Choose your starting point. If you're new to programming, I recommend starting with Python in Google Colab (zero setup required). However, the habits you form early tend to persist, so I encourage you to explore different environments early on to find what works best for you. I particularly enjoy working in Visual Studio Code because it provides a robust development environment with excellent debugging capabilities, and you can code in different languages, all in one environment.
  2. Read the relevant guide. Use the guide cards above to learn your chosen environment.
  3. Move to Module 1. Once you can run code, you're ready to start learning the basics.
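
If you are unsure whether you "can run code" yet, a quick check like the one below may help. It is only a sketch: paste it into a Colab cell or save it as a .py file and run it. If you see a greeting and a Python version, your environment works. The function name `environment_summary` is made up for this example.

```python
import platform
import sys


def environment_summary():
    """Return a one-line description of the Python environment running this code."""
    return f"Python {platform.python_version()} on {platform.system()}"


print("Hello, economics!")
print(environment_summary())
# Path of the Python interpreter that ran this script.
print(sys.executable)
```
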

Don't feel like you need to read all the IDE guides now. You can always come back to them when you need a specific tool. The most important thing is to start coding.

Look at the bottom right of the page: there is a chatbot assistant specially trained to answer questions on the course materials. Use it to ask questions as you go along! A log (i.e., a transcript) of the conversations will be stored, and I will use it to improve the course and expand the materials in the direction people need most. So, by using the chatbot you'll be contributing to a public good!