This repository contains 140+ million federal employee records from 1998-2024, processed from the official FedScope Employment Cube datasets.
You can use this data in two ways:
Download individual Parquet files directly from GitHub without cloning:
Browse available files: fedscope_data/parquet/
⚠️ Large Repository Warning: This repo is ~3.7GB due to the included data files.
Then load files locally:
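As a minimal sketch of locating the files after cloning, the snippet below lists whatever sits under fedscope_data/parquet/. The `.parquet` file extension and the helper name `list_parquet_files` are assumptions, not something this README specifies.

```python
from pathlib import Path


def list_parquet_files(root="fedscope_data/parquet"):
    """Return the Parquet files directly under `root`, sorted by file name."""
    # Assumption: the quarterly files use a ".parquet" extension.
    return sorted(Path(root).glob("*.parquet"))


if __name__ == "__main__":
    files = list_parquet_files()
    print(f"Found {len(files)} Parquet files")
```

With pandas and a Parquet engine such as pyarrow installed, any one of the returned paths can then be passed to `pandas.read_parquet`.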
- 72 quarterly snapshots from September 1998 through September 2024
- 1.7-2.3 million employees per quarter
- 52 fields including demographics, job details, and compensation
- Lookup tables joined for easier usage
🚀 Quick Start: Run examples.py for comprehensive usage examples!
💡 Note: The dataset stores numeric fields such as employment and salary as strings. See examples.py for proper handling.
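The string-to-number conversion might look roughly like the sketch below. It is not taken from examples.py; the lowercase column names `employment` and `salary`, the sample rows, and the `****` placeholder are illustrative assumptions.

```python
import math


def to_number(value):
    """Convert a string such as "85000" to a float; return None if it is not a usable number."""
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    # Treat a literal "nan" as missing rather than as a number.
    return None if math.isnan(number) else number


# Hypothetical rows; real records carry many more fields.
rows = [
    {"employment": "1", "salary": "85000"},
    {"employment": "1", "salary": ""},      # missing salary
    {"employment": "1", "salary": "****"},  # hypothetical non-numeric placeholder
]

salaries = [to_number(row["salary"]) for row in rows]
valid = [s for s in salaries if s is not None]
mean_salary = sum(valid) / len(valid)
```

When working in pandas, `pd.to_numeric(df["salary"], errors="coerce")` does the same job column-wide, turning unparseable values into NaN.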
- fedscope_data/parquet/ - 72 quarterly Parquet files (2.3GB total)
- fedscope_data/raw/ - Original ZIP files from OPM
- main.py - Processing pipeline to recreate Parquet files
- Additional Data Documentation
- 1998-2008: September only (annual snapshots)
- 2009: September, December
- 2010-2024: Full quarterly coverage (March, June, September, December), ending with September 2024
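The coverage rules above can be enumerated in a few lines, which is handy for checking that a local copy has every snapshot. This is a sketch; the function name `expected_snapshots` is hypothetical and not part of the repository.

```python
def expected_snapshots():
    """Build (year, month) pairs following the coverage rules listed above."""
    snapshots = []
    for year in range(1998, 2025):
        if year <= 2008:
            months = [9]             # annual September snapshots
        elif year == 2009:
            months = [9, 12]
        elif year == 2024:
            months = [3, 6, 9]       # data ends in September 2024
        else:
            months = [3, 6, 9, 12]   # full quarterly coverage
        snapshots.extend((year, month) for month in months)
    return snapshots


snapshots = expected_snapshots()
print(len(snapshots))  # 72
```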
The dataset contains both code fields (e.g., agelvl) and description fields (e.g., agelvlt). Use the description fields ending in 't' for analysis - they contain human-readable values.
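One way to find the code/description pairs programmatically is to look for columns whose name plus `t` also exists. In this sketch only `agelvl` and `agelvlt` come from the text above; the other column names and the helper `description_columns` are assumptions for illustration.

```python
def description_columns(columns):
    """Map each code column to its description column (same name plus 't'), if present."""
    names = set(columns)
    return {col: col + "t" for col in columns if col + "t" in names}


# Hypothetical subset of the 52 fields.
columns = ["agelvl", "agelvlt", "edlvl", "edlvlt", "salary"]
pairs = description_columns(columns)
print(pairs)  # {'agelvl': 'agelvlt', 'edlvl': 'edlvlt'}
```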
The 72 quarterly ZIP files are included in fedscope_data/raw/. To recreate the Parquet files:
Options:
Each quarterly dataset contains:
- FACTDATA_*.TXT: Main fact table with employee records (1.7M - 2.2M records per quarter)
- DT*.txt: Lookup tables providing descriptions for coded values
- DTagelvl.txt - Age levels
- DTagy.txt - Agencies
- DTedlvl.txt - Education levels
- DTgsegrd.txt - General Schedule grades
- DTloc.txt - Locations
- DTocc.txt - Occupations
- DTpatco.txt - PATCO categories
- DTpp.txt - Pay plans (from 2017 onward)
- DTppgrd.txt - Pay plans and grades
- DTsallvl.txt - Salary levels
- DTstemocc.txt - STEM occupations
- DTsuper.txt - Supervisory status
- DTtoa.txt - Types of appointment
- DTwrksch.txt - Work schedules
- DTwkstat.txt - Work status
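To illustrate how a lookup table attaches descriptions to coded values, here is a small standard-library sketch. It is not the code in main.py. The comma-delimited layout with a header row, the field names `AGELVL`, `AGELVLT`, and `SALARY`, and the sample values are all assumptions.

```python
import csv
import io

# Hypothetical contents of a lookup file such as DTagelvl.txt.
lookup_text = "AGELVL,AGELVLT\nA,Less than 20\nB,20-24\n"
# Hypothetical fact rows in the style of FACTDATA_*.TXT.
fact_text = "AGELVL,SALARY\nB,41000\nA,30000\nZ,50000\n"


def load_lookup(text, code_field, desc_field):
    """Read a lookup table into a {code: description} dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[code_field]: row[desc_field] for row in reader}


def join_description(text, lookup, code_field, desc_field):
    """Add the description for `code_field` to every fact row (left join)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Codes missing from the lookup table get None instead of being dropped.
        row[desc_field] = lookup.get(row[code_field])
        rows.append(row)
    return rows


age_lookup = load_lookup(lookup_text, "AGELVL", "AGELVLT")
joined = join_description(fact_text, age_lookup, "AGELVL", "AGELVLT")
```

Keeping unmatched rows, rather than dropping them, makes codes that are absent from a given quarter's lookup table easy to spot.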
- Source: U.S. Office of Personnel Management (OPM) FedScope Employment Cube
- Official Site: https://www.fedscope.opm.gov/
- License: Public domain (U.S. Government work)
This is an independent data processing project. For official federal employment statistics, visit fedscope.opm.gov.
