2b Working with APIs
Learning Objectives
- Understand what an API is and how it works technically
- Make API requests and parse JSON responses
- Use the World Bank API for economic research data
- Handle authentication and rate limits
What is an API?
API = Application Programming Interface
An API is a structured way for programs to communicate. Instead of downloading a CSV file manually, you send a request to a server, and it sends back data in a structured format (usually JSON).
Analogy: Think of a restaurant. You (the client) don't go into the kitchen. Instead, you tell the waiter (the API) what you want, and the waiter brings your food (the data) from the kitchen (the server).
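To make "structured format" concrete, here is a minimal sketch of turning a JSON response body into a Python object with the standard-library `json` module. The payload string is a made-up illustration, not real output from any API.

```python
import json

# Hypothetical response body: over the wire, JSON arrives as plain text
raw_body = '{"country": "USA", "gdp": 25e12}'

# json.loads converts the JSON text into Python objects (here, a dict)
parsed = json.loads(raw_body)

print(type(parsed).__name__)  # the container type Python chose
print(parsed["country"])      # values are looked up by key
print(parsed["gdp"])          # JSON numbers become int or float
```

When you use `requests`, calling `response.json()` performs this same conversion for you.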
How an API Request Works
The client sends a request such as GET /countries/USA/gdp, and the server replies with JSON such as {"gdp": 25e12}.
Understanding API URLs
API requests use URLs with specific structures:
https://api.worldbank.org/v2/country/USA/indicator/NY.GDP.MKTP.CD?format=json
- Protocol: https://
- Host: api.worldbank.org
- Version: v2
- Country: country/USA
- Indicator code: indicator/NY.GDP.MKTP.CD
- Parameters: ?format=json
Key Components
- Host: The server address (api.worldbank.org)
- Endpoint: The specific resource you want (/country/USA/indicator/...)
- Parameters: Options like format, date range (?format=json&date=2010:2020)
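The components listed above can be pulled out of a URL programmatically. The following is a small sketch using Python's standard-library `urllib.parse`; the URL combines the endpoint and parameters from the examples in this section.

```python
from urllib.parse import urlsplit, parse_qs

url = ("https://api.worldbank.org/v2/country/USA/indicator/NY.GDP.MKTP.CD"
       "?format=json&date=2010:2020")

parts = urlsplit(url)          # split the URL into its named components
query = parse_qs(parts.query)  # turn "key=value&key=value" into a dict of lists

print("Protocol:  ", parts.scheme)
print("Host:      ", parts.netloc)
print("Endpoint:  ", parts.path)
print("Parameters:", query)
```

Note that `parse_qs` returns each value as a list, because a query string may repeat the same key.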
Building URLs: Brick by Brick
In your code, you construct the API URL piece by piece before sending the request. Think of it like building with LEGO blocks:
- Base URL: The foundation (e.g., https://api.worldbank.org/v2/)
- Endpoint path: What you want (e.g., country/USA/indicator/NY.GDP.MKTP.CD)
- Query parameters: Filters and options (e.g., ?format=json&date=2015:2022)
Your code assembles these pieces, then makes a single HTTP request. The server processes the URL and returns the data.
# Python example: building the URL brick by brick
import requests

base = "https://api.worldbank.org/v2"               # Foundation
endpoint = "/country/USA/indicator/NY.GDP.MKTP.CD"  # What we want
params = "?format=json&date=2015:2022"              # Options
full_url = base + endpoint + params                 # Assemble the bricks
response = requests.get(full_url)                   # Send to server!
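String concatenation works for simple cases, but hand-written query strings are easy to get wrong. As an alternative sketch, the standard-library `urlencode` function can build the query part from a dictionary. The `safe=":"` argument is a choice made here for readability, not something the API requires.

```python
from urllib.parse import urlencode

base = "https://api.worldbank.org/v2"
endpoint = "/country/USA/indicator/NY.GDP.MKTP.CD"
params = {"format": "json", "date": "2015:2022"}

# safe=":" keeps the colon in the date range instead of encoding it as %3A
query_string = urlencode(params, safe=":")

full_url = f"{base}{endpoint}?{query_string}"
print(full_url)
```

Passing a `params` dictionary to `requests.get()` does the same job for you, which is the approach used in the rest of this section.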
Your First API Call
Let's fetch GDP data from the World Bank API:
# Import the requests library for making HTTP calls
import requests
import pandas as pd
# ═══════════════════════════════════════════════════════════════════
# STEP 1: BUILD THE URL (brick by brick)
# ═══════════════════════════════════════════════════════════════════
# Base URL: The API's address
base = "https://api.worldbank.org/v2"
# Endpoint: What we want (country/indicator)
country = "USA" # Try: "FRA", "CHN", "BRA"
indicator = "NY.GDP.MKTP.CD" # GDP in current US$
# Assemble the URL
url = f"{base}/country/{country}/indicator/{indicator}"
print(f"URL built: {url}")
# Parameters: Filters added to the URL as ?key=value&key=value
params = {
    "format": "json",     # Return data as JSON (not XML)
    "date": "2015:2022",  # Get data from 2015 to 2022
    "per_page": 100       # Maximum records per page
}
# ═══════════════════════════════════════════════════════════════════
# STEP 2: SEND THE REQUEST
# ═══════════════════════════════════════════════════════════════════
# requests.get() combines url + params and sends to server
response = requests.get(url, params=params)
print(f"Status Code: {response.status_code}") # 200 = success
# Parse the JSON response into a Python object
data = response.json()
# World Bank returns [metadata, records] - we want index 1
records = data[1]
# Convert to DataFrame using list comprehension
df = pd.DataFrame([
    {
        'country': r['country']['value'],  # Extract country name
        'year': int(r['date']),            # Convert year to integer
        'gdp': r['value']                  # GDP value
    }
    for r in records
    if r['value'] is not None              # Skip years with no data
])
print(df)
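The Python example above assumes the response always has the [metadata, records] shape. A slightly more defensive sketch is shown below. It runs offline on a placeholder payload whose values are made up for illustration. The helper name `extract_rows` is hypothetical, and the assumption that a failed request comes back without a usable second element should be checked against the World Bank API documentation.

```python
def extract_rows(payload):
    """Split a [metadata, records] payload into metadata and flat row dicts."""
    # Assumption: an error response lacks a second element (or it is null)
    if len(payload) < 2 or payload[1] is None:
        raise ValueError(f"No records in response: {payload}")
    metadata, records = payload[0], payload[1]
    rows = [
        {"country": r["country"]["value"], "year": int(r["date"]), "gdp": r["value"]}
        for r in records
        if r["value"] is not None  # skip years with no data
    ]
    return metadata, rows


# Placeholder payload with made-up values, shaped like the response described above
sample = [
    {"page": 1, "pages": 1, "per_page": 100, "total": 2},
    [
        {"country": {"id": "US", "value": "United States"}, "date": "2016", "value": 2.0},
        {"country": {"id": "US", "value": "United States"}, "date": "2015", "value": None},
    ],
]

metadata, rows = extract_rows(sample)
print(f"Pages available: {metadata['pages']}")
print(rows)
```

With a live request you would call `extract_rows(response.json())` after confirming that `response.status_code` is 200. If the metadata reports more than one page, the remaining records have to be requested separately.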
* Stata requires additional packages for API calls
* The wbopendata package makes World Bank data easy to access
* Install the World Bank Open Data package (run once)
ssc install wbopendata
* ═══════════════════════════════════════════════════════════════════
* In Stata, wbopendata BUILDS THE URL FOR YOU behind the scenes
* You provide the "bricks" as options:
* ═══════════════════════════════════════════════════════════════════
* Define our "bricks"
local country "USA" // Try: "FRA", "CHN", "BRA"
local indicator "NY.GDP.MKTP.CD" // GDP in current US$
local years "2015:2022" // Time range
* wbopendata combines these into an API call
* The long option returns one row per country-year, which the list command below expects
wbopendata, indicator(`indicator') country(`country') year(`years') long clear
* Display the data
list countryname year ny_gdp_mktp_cd
# Load required libraries
library(httr)
library(jsonlite)
library(tidyverse)
# ═══════════════════════════════════════════════════════════════════
# STEP 1: BUILD THE URL (brick by brick)
# ═══════════════════════════════════════════════════════════════════
# Base URL: The API's address
base <- "https://api.worldbank.org/v2"
# Endpoint: What we want (country/indicator)
country <- "USA" # Try: "FRA", "CHN", "BRA"
indicator <- "NY.GDP.MKTP.CD" # GDP in current US$
# Assemble the URL using paste0()
url <- paste0(base, "/country/", country, "/indicator/", indicator)
cat("URL built:", url, "\n")
# ═══════════════════════════════════════════════════════════════════
# STEP 2: SEND THE REQUEST
# ═══════════════════════════════════════════════════════════════════
# GET() combines url + query params and sends to server
response <- GET(url, query = list(
  format = "json",      # Request JSON format
  date = "2015:2022",   # Year range
  per_page = 100        # Max results
))
# Check response status
cat("Status:", status_code(response), "\n")
# Parse JSON response
data <- content(response, "text") %>% fromJSON()
# Extract records (second element of response list)
records <- data[[2]]
# Create tibble (tidyverse data frame)
df <- tibble(
  country = records$country$value,  # Country name
  year = as.integer(records$date),  # Year as integer
  gdp = records$value               # GDP value
) %>% filter(!is.na(gdp))           # Remove NAs
print(df)
| Indicator Code | Description |
|---|---|
| NY.GDP.MKTP.CD | GDP (current US$) |
| NY.GDP.PCAP.CD | GDP per capita (current US$) |
| EN.ATM.CO2E.KT | CO2 emissions (kt) |
| EN.ATM.CO2E.PC | CO2 emissions per capita (metric tons) |
| SP.POP.TOTL | Total population |
| EG.USE.PCAP.KG.OE | Energy use per capita (kg oil equivalent) |
| AG.LND.FRST.ZS | Forest area (% of land area) |
Full list: data.worldbank.org/indicator
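Because only the indicator code changes between requests, the same URL pattern can be reused for several indicators. The sketch below builds one URL per indicator without sending any requests. The codes come from the table above, while the short column names and the helper `indicator_url` are hypothetical choices for illustration.

```python
BASE = "https://api.worldbank.org/v2"

# Indicator codes from the table; the short names are illustrative labels
INDICATORS = {
    "NY.GDP.MKTP.CD": "gdp",
    "NY.GDP.PCAP.CD": "gdp_per_capita",
    "SP.POP.TOTL": "population",
}


def indicator_url(country, indicator):
    """Assemble the endpoint URL for one country and one indicator."""
    return f"{BASE}/country/{country}/indicator/{indicator}"


urls = {name: indicator_url("USA", code) for code, name in INDICATORS.items()}
for name, url in urls.items():
    print(f"{name}: {url}")
```

Each URL can then be passed to `requests.get()` with the same `params` dictionary used in the first API call.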
API Authentication
Why Authentication?
Many APIs require an API key to track usage and prevent abuse. The World Bank API is free and doesn't require authentication, but others (like FRED, Alpha Vantage, OpenAI) do. You typically get a key by registering on the provider's website.
# Example: Using an API key (FRED API)
import requests
import os
# NEVER hardcode API keys in your code!
# Instead, use environment variables for security
api_key = os.environ.get('FRED_API_KEY')
# Set in terminal: export FRED_API_KEY=your_key_here
url = "https://api.stlouisfed.org/fred/series/observations"
params = {
    "series_id": "GDP",   # The data series to fetch
    "api_key": api_key,   # Your authentication key
    "file_type": "json"   # Response format
}
response = requests.get(url, params=params)
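One caveat with the example above: `os.environ.get()` returns `None` when the variable is not set, and the request would then go out without a key. A small sketch of failing early instead is shown below. The helper name `require_env` and the variable `DEMO_API_KEY` are hypothetical, and the placeholder value is set in code only so the demo can run on its own.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or stop with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set; export it before running.")
    return value


# Demo only: in real use the variable is set in the terminal, never in the script
os.environ["DEMO_API_KEY"] = "placeholder-not-a-real-key"

api_key = require_env("DEMO_API_KEY")
print("Key loaded, length:", len(api_key))  # print the length, never the key itself
```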
Never commit API keys to Git! We haven't covered Git yet (that's in Module 8), but this is important enough to mention now. Git is a version control system that tracks all your code changes—including any secrets you accidentally include.
Use environment variables or a .env file (and add it to .gitignore). Exposed keys can be stolen and abused, potentially costing you money or compromising your accounts.
We'll explain Git, commits, and .gitignore properly in Module 8: Git & GitHub.
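To illustrate the .env approach, here is a minimal standard-library sketch that reads KEY=VALUE lines into the environment. The function `load_env_file` and the variable `DEMO_FRED_KEY` are hypothetical. In practice many projects use the third-party python-dotenv package for this, which handles more edge cases than this sketch does.

```python
import os
import tempfile


def load_env_file(path):
    """Read KEY=VALUE lines from a .env style file into os.environ."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and anything without an equals sign
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # setdefault: variables already set in the shell take precedence
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


# Demo: write a throwaway .env file with a placeholder value, then load it
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as f:
        f.write("# local secrets; this file belongs in .gitignore\n")
        f.write("DEMO_FRED_KEY=placeholder-value\n")
    load_env_file(env_path)

print(os.environ["DEMO_FRED_KEY"])
```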
Next Steps
In the next module (Data Cleaning), we'll take the data we fetched and:
- Handle missing values in our GDP and CO2 data
- Merge with additional datasets
- Create derived variables for analysis