2b  Working with APIs

~3 hours | APIs, HTTP, JSON | Intermediate

Learning Objectives

  • Understand what an API is and how it works technically
  • Make API requests and parse JSON responses
  • Use the World Bank API for economic research data
  • Handle authentication and rate limits

Research Project: Fetching Climate & Economic Data

In this module, we'll use the World Bank API to programmatically download CO2 emissions, GDP, and climate indicators for our research on climate vulnerability and economic growth.

What is an API?

API = Application Programming Interface

An API is a structured way for programs to communicate. Instead of downloading a CSV file manually, you send a request to a server, and it sends back data in a structured format (usually JSON).

Analogy: Think of a restaurant. You (the client) don't go into the kitchen. Instead, you tell the waiter (the API) what you want, and the waiter brings your food (the data) from the kitchen (the server).

How an API Request Works

Your Code (Client)
    |
    |  HTTP Request: GET /countries/USA/gdp
    v
World Bank Server (API)
    |
    |  JSON Response: {"gdp": 25e12}
    v
Your Code (Receives Data)
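
To make the final step of this flow concrete, here is a minimal sketch of what "receiving data" looks like in code: the response body arrives as JSON text, and the client parses it into ordinary Python objects. The payload is the simplified one from the diagram, not a real World Bank response.

```python
import json

# Illustrative response body from the diagram (not a real API payload)
response_text = '{"gdp": 25e12}'

# json.loads turns JSON text into Python objects (here: a dict)
payload = json.loads(response_text)

gdp = payload["gdp"]
print(type(payload).__name__)   # dict
print(f"GDP: {gdp:,.0f}")       # GDP: 25,000,000,000,000
```

When you use the requests library, response.json() performs this same parsing step for you.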

Understanding API URLs

API requests use URLs with specific structures:

https://api.worldbank.org/v2/country/USA/indicator/NY.GDP.MKTP.CD?format=json
|_____||________________||__||___||_____________________________||___________|
   |          |            |    |              |                        |
Protocol    Host       Version  Country   Indicator Code          Parameters

Key Components

  • Host: The server address (api.worldbank.org)
  • Endpoint: The specific resource you want (/country/USA/indicator/...)
  • Parameters: Options like format, date range (?format=json&date=2010:2020)
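
If you want to check how a URL breaks down into these components, Python's standard library can split it for you. The sketch below uses urllib.parse on the example URL from this section; nothing is sent over the network.

```python
from urllib.parse import urlsplit, parse_qs

url = ("https://api.worldbank.org/v2/country/USA/indicator/NY.GDP.MKTP.CD"
       "?format=json&date=2010:2020")

parts = urlsplit(url)

print(parts.scheme)            # protocol: https
print(parts.netloc)            # host: api.worldbank.org
print(parts.path)              # version + endpoint path
print(parse_qs(parts.query))   # parameters as a dict of lists
```

Note that parse_qs returns each parameter as a list, because a key may legally appear more than once in a query string.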

Building URLs: Brick by Brick

In your code, you construct the API URL piece by piece before sending the request. Think of it like building with LEGO blocks:

  1. Base URL — The foundation (e.g., https://api.worldbank.org/v2/)
  2. Endpoint path — What you want (e.g., country/USA/indicator/NY.GDP.MKTP.CD)
  3. Query parameters — Filters and options (e.g., ?format=json&date=2015:2022)

Your code assembles these pieces, then makes a single HTTP request. The server processes the URL and returns the data.

# Python example: building the URL brick by brick
import requests

base = "https://api.worldbank.org/v2"               # Foundation
endpoint = "/country/USA/indicator/NY.GDP.MKTP.CD"  # What we want
params = "?format=json&date=2015:2022"              # Options

full_url = base + endpoint + params       # Assemble the bricks
response = requests.get(full_url)         # Send to server!
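
Plain string concatenation works for simple cases, but it is easy to drop a "/" or forget to escape a special character. As a hedged alternative, the sketch below keeps the parameters in a dictionary and lets urllib.parse.urlencode build the query string. The helper name build_url is our own invention for illustration; it is not part of any library.

```python
from urllib.parse import urlencode

def build_url(base, endpoint, params):
    """Join base, endpoint path, and an encoded query string (hypothetical helper)."""
    # safe=":" keeps the colon in date ranges like 2015:2022 readable
    query = urlencode(params, safe=":")
    return f"{base.rstrip('/')}/{endpoint.lstrip('/')}?{query}"

full_url = build_url(
    "https://api.worldbank.org/v2",
    "/country/USA/indicator/NY.GDP.MKTP.CD",
    {"format": "json", "date": "2015:2022"},
)
print(full_url)
```

This prints the same URL as the concatenation example. Keeping parameters in a dictionary also makes it easy to change one option without editing a long string.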

Your First API Call

Let's fetch GDP data from the World Bank API. The same request is shown in Python, Stata, and R, with each program's output listed after the code:

Python

# Import requests for making HTTP calls and pandas for tabular data
import requests
import pandas as pd

# ═══════════════════════════════════════════════════════════════════
# STEP 1: BUILD THE URL (brick by brick)
# ═══════════════════════════════════════════════════════════════════

# Base URL: The API's address
base = "https://api.worldbank.org/v2"

# Endpoint: What we want (country/indicator)
country = "USA"                    # Try: "FRA", "CHN", "BRA"
indicator = "NY.GDP.MKTP.CD"       # GDP in current US$

# Assemble the URL
url = f"{base}/country/{country}/indicator/{indicator}"
print(f"URL built: {url}")

# Parameters: Filters added to the URL as ?key=value&key=value
params = {
    "format": "json",      # Return data as JSON (not XML)
    "date": "2015:2022",   # Get data from 2015 to 2022
    "per_page": 100        # Maximum records per page
}

# ═══════════════════════════════════════════════════════════════════
# STEP 2: SEND THE REQUEST
# ═══════════════════════════════════════════════════════════════════

# requests.get() combines url + params and sends to server
response = requests.get(url, params=params, timeout=30)
print(f"Status Code: {response.status_code}")  # 200 = success
response.raise_for_status()                    # Stop here on a 4xx/5xx error

# Parse the JSON response into a Python object
data = response.json()

# World Bank returns [metadata, records] - we want index 1
records = data[1]

# Convert to DataFrame using list comprehension
df = pd.DataFrame([{
    'country': r['country']['value'],  # Extract country name
    'year': int(r['date']),           # Convert year to integer
    'gdp': r['value']                 # GDP value
} for r in records if r['value'] is not None])

print(df)

Stata

* Stata requires additional packages for API calls
* The wbopendata package makes World Bank data easy to access

* Install the World Bank Open Data package (run once)
ssc install wbopendata

* ═══════════════════════════════════════════════════════════════════
* In Stata, wbopendata BUILDS THE URL FOR YOU behind the scenes
* You provide the "bricks" as options:
* ═══════════════════════════════════════════════════════════════════

* Define our "bricks"
local country "USA"              // Try: "FRA", "CHN", "BRA"
local indicator "NY.GDP.MKTP.CD" // GDP in current US$
local years "2015:2022"         // Time range

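* wbopendata combines these into an API call
* The long option returns one row per country-year (the default layout is wide),
* which gives us the year and ny_gdp_mktp_cd variables listed below
wbopendata, indicator(`indicator') country(`country') year(`years') long clear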

* Display the data
list countryname year ny_gdp_mktp_cd

R

# Load required libraries
library(httr)
library(jsonlite)
library(tidyverse)

# ═══════════════════════════════════════════════════════════════════
# STEP 1: BUILD THE URL (brick by brick)
# ═══════════════════════════════════════════════════════════════════

# Base URL: The API's address
base <- "https://api.worldbank.org/v2"

# Endpoint: What we want (country/indicator)
country <- "USA"                   # Try: "FRA", "CHN", "BRA"
indicator <- "NY.GDP.MKTP.CD"      # GDP in current US$

# Assemble the URL using paste0()
url <- paste0(base, "/country/", country, "/indicator/", indicator)
cat("URL built:", url, "\n")

# ═══════════════════════════════════════════════════════════════════
# STEP 2: SEND THE REQUEST
# ═══════════════════════════════════════════════════════════════════

# GET() combines url + query params and sends to server
response <- GET(url, query = list(
  format = "json",      # Request JSON format
  date = "2015:2022",   # Year range
  per_page = 100        # Max results
))

# Check response status
cat("Status:", status_code(response), "\n")

# Parse JSON response
data <- content(response, "text", encoding = "UTF-8") %>% fromJSON()

# Extract records (second element of response list)
records <- data[[2]]

# Create tibble (tidyverse data frame)
df <- tibble(
  country = records$country$value,      # Country name
  year = as.integer(records$date),      # Year as integer
  gdp = records$value                    # GDP value
) %>% filter(!is.na(gdp))             # Remove NAs

print(df)
Python Output

Status Code: 200
         country  year           gdp
0  United States  2022  2.546206e+13
1  United States  2021  2.331481e+13
2  United States  2020  2.126379e+13
3  United States  2019  2.152748e+13
4  United States  2018  2.056242e+13
5  United States  2017  1.961152e+13
6  United States  2016  1.872352e+13
7  United States  2015  1.823893e+13

Stata Output

. wbopendata, indicator(NY.GDP.MKTP.CD) country(USA) year(2015:2022) long clear
(8 vars, 8 obs)

. list countryname year ny_gdp_mktp_cd

     +------------------------------------------+
     |   countryname   year   ny_gdp_mktp_cd    |
     |------------------------------------------|
  1. | United States   2022       2.5462e+13    |
  2. | United States   2021       2.3315e+13    |
  3. | United States   2020       2.1264e+13    |
  4. | United States   2019       2.1527e+13    |
  5. | United States   2018       2.0562e+13    |
  6. | United States   2017       1.9612e+13    |
  7. | United States   2016       1.8724e+13    |
  8. | United States   2015       1.8239e+13    |
     +------------------------------------------+

R Output (RStudio Console)

Status: 200
# A tibble: 8 x 3
  country        year            gdp
  <chr>         <int>          <dbl>
1 United States  2022 25462056000000
2 United States  2021 23314810000000
3 United States  2020 21263790000000
4 United States  2019 21527480000000
5 United States  2018 20562420000000
6 United States  2017 19611520000000
7 United States  2016 18723520000000
8 United States  2015 18238930000000
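
The examples above index straight into the response (data[1] in Python, data[[2]] in R), which fails with a confusing error if the API returns something other than the usual [metadata, records] pair. The sketch below wraps that step in a small function with an explicit check. The function name extract_rows is hypothetical, the sample payload is hand-written in the same shape as a real response (GDP values copied from the output above, plus one placeholder row with no value to show the filtering), and the assumption that a failed request yields a list without usable records should be checked against the API documentation.

```python
def extract_rows(payload):
    """Turn a World Bank-style [metadata, records] payload into flat rows."""
    # Assumption: anything that is not a two-element list with records
    # (for example an error message or an empty result) is treated as a failure
    if not isinstance(payload, list) or len(payload) < 2 or payload[1] is None:
        raise ValueError(f"Unexpected payload: {payload!r}")

    metadata, records = payload[0], payload[1]
    rows = [
        {
            "country": r["country"]["value"],  # country name
            "year": int(r["date"]),            # year as integer
            "value": r["value"],               # indicator value
        }
        for r in records
        if r["value"] is not None              # skip years with no observation
    ]
    return metadata, rows

# Hand-written sample in the shape of a real response (not fetched live)
sample = [
    {"page": 1, "pages": 1, "per_page": 100, "total": 3},
    [
        {"country": {"id": "US", "value": "United States"}, "date": "2023", "value": None},
        {"country": {"id": "US", "value": "United States"}, "date": "2022", "value": 2.546206e13},
        {"country": {"id": "US", "value": "United States"}, "date": "2021", "value": 2.331481e13},
    ],
]

metadata, rows = extract_rows(sample)
print(f"Pages: {metadata['pages']}, rows kept: {len(rows)}")
for row in rows:
    print(row)
```

The metadata element is worth keeping: if pages is greater than 1, the request returned only part of the data, and you need to request the remaining pages.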

Research Project: Indicator Codes

For our "Climate Vulnerability and Economic Growth" project, we'll use these World Bank indicator codes. You plug these codes into your API URL to fetch specific data:

Indicator Code       Description
NY.GDP.MKTP.CD       GDP (current US$)
NY.GDP.PCAP.CD       GDP per capita (current US$)
EN.ATM.CO2E.KT       CO2 emissions (kt)
EN.ATM.CO2E.PC       CO2 emissions per capita (metric tons)
SP.POP.TOTL          Total population
EG.USE.PCAP.KG.OE    Energy use per capita (kg oil equivalent)
AG.LND.FRST.ZS       Forest area (% of land area)

Full list: data.worldbank.org/indicator
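
To show how these codes plug into the URL pattern from earlier, here is a short sketch that builds one request URL per indicator for a given country and date range. The short column names and the helper name indicator_urls are our own choices for illustration; only the indicator codes and the URL structure come from this module. Nothing is downloaded here, and you should confirm that each code is still served by the API before relying on it.

```python
from urllib.parse import urlencode

BASE = "https://api.worldbank.org/v2"

# Indicator codes from the table above, mapped to short column names
# (the short names are our own, not part of the API)
INDICATORS = {
    "NY.GDP.MKTP.CD": "gdp",
    "NY.GDP.PCAP.CD": "gdp_per_capita",
    "EN.ATM.CO2E.KT": "co2_kt",
    "EN.ATM.CO2E.PC": "co2_per_capita",
    "SP.POP.TOTL": "population",
    "EG.USE.PCAP.KG.OE": "energy_use_per_capita",
    "AG.LND.FRST.ZS": "forest_area_pct",
}

def indicator_urls(country, start, end):
    """Return {short_name: url} for every project indicator (hypothetical helper)."""
    query = urlencode(
        {"format": "json", "date": f"{start}:{end}", "per_page": 100},
        safe=":",
    )
    return {
        name: f"{BASE}/country/{country}/indicator/{code}?{query}"
        for code, name in INDICATORS.items()
    }

urls = indicator_urls("USA", 2015, 2022)
for name, url in urls.items():
    print(f"{name:24s} {url}")
```

Each URL can then be passed to requests.get() exactly as in the first API call.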

API Authentication

Why Authentication?

Many APIs require an API key to track usage and prevent abuse. The World Bank API is free and doesn't require authentication, but others (like FRED, Alpha Vantage, OpenAI) do. You typically get a key by registering on the provider's website.

# Example: Using an API key (FRED API)
import os
import requests

# NEVER hardcode API keys in your code!
# Instead, use environment variables for security
# Set in terminal first: export FRED_API_KEY=your_key_here
api_key = os.environ.get('FRED_API_KEY')
if api_key is None:
    raise RuntimeError("FRED_API_KEY is not set")

url = "https://api.stlouisfed.org/fred/series/observations"
params = {
    "series_id": "GDP",       # The data series to fetch
    "api_key": api_key,       # Your authentication key
    "file_type": "json"       # Response format
}

response = requests.get(url, params=params)

Security Warning

Never commit API keys to Git! We haven't covered Git yet (that's in Module 8), but this is important enough to mention now. Git is a version control system that tracks all your code changes, including any secrets you accidentally include.

Use environment variables or a .env file (and add it to .gitignore). Exposed keys can be stolen and abused, potentially costing you money or compromising your accounts.
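
As a rough illustration of the .env approach, the sketch below reads simple KEY=VALUE lines from a file into the process environment and fails with a clear message when a required key is missing. Both function names are hypothetical, and the parser handles only the simplest file format; in practice, a dedicated package such as python-dotenv is a common choice.

```python
import os

def load_env_file(path=".env"):
    """Read KEY=VALUE lines into os.environ; variables already set are kept."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and anything without "="
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            value = value.strip().strip('"').strip("'")
            os.environ.setdefault(key.strip(), value)

def require_key(name):
    """Return the named environment variable or raise a readable error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it or add it to .env")
    return value

load_env_file()                           # does nothing if no .env file exists
# api_key = require_key("FRED_API_KEY")   # uncomment once your key is configured
```

Because the key lives outside your script, you can share the code freely while the .env file stays on your machine.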

We'll explain Git, commits, and .gitignore properly in Module 8: Git & GitHub.
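
Rate limits are the other half of responsible API use: when a client sends too many requests, many servers answer with HTTP status 429 (Too Many Requests) and expect the client to slow down. One common response is to wait and retry, doubling the wait each time. The sketch below shows that idea with a stand-in for the real request so it runs without a network connection. The function name fetch_with_backoff and its parameters are assumptions for illustration, and each API documents its own limits.

```python
import time

def fetch_with_backoff(fetch, max_tries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it stops returning HTTP 429, doubling the wait each time.

    fetch is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_tries):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_tries - 1:
            sleep(base_delay * 2 ** attempt)   # waits 1s, 2s, 4s, ...
    return status, body                        # still rate-limited after max_tries

# Stand-in for a real request: rate-limited twice, then succeeds
responses = iter([(429, None), (429, None), (200, "ok")])
waits = []                                     # record waits instead of sleeping
result = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)                           # (200, 'ok') [1.0, 2.0]
```

With a real API, fetch would wrap a requests.get() call and return the response's status code and parsed body.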

Next Steps

In the next module (Data Cleaning), we'll take the data we fetched and:

  • Handle missing values in our GDP and CO2 data
  • Merge with additional datasets
  • Create derived variables for analysis