Replication Package for "Regulatory Leakage Among Financial Advisors: Evidence From FINRA Regulation of 'Bad' Brokers"
This replication package contains all code and documentation needed to reproduce the analysis and figures in the above-named study. The project examines how financial advisors with misconduct records migrate between different regulatory regimes (FINRA, SEC investment advisors, state investment advisors, and insurance producers) in response to increased regulatory scrutiny.
The code constructs the analysis files from multiple regulatory and public data sources primarily using SAS and R. The main notebooks run all code to generate the data for the figures and tables in the paper. Replication requires a large merged dataset and several hours of compute time (see below).
bad_brokers/
├── README.md # This file
├── 01_merge.ipynb # Data processing and merging
├── 02_analysis.ipynb # Main regression analyses and figures
├── pixi.toml # Package management configuration
└── data/ # Symlinked to shared data directory
    ├── input/ # Raw input data files
    │   ├── NAIC_state_year_clean.csv # NAIC state insurance department data
    │   ├── NPN_LOA_anon.csv # Insurance lines of authority (anonymized)
    │   ├── TX_compl.csv # Texas complaint data
    │   ├── bls_advisor_salary.xlsx # BLS wage data
    │   ├── drp_history_anon.csv # Disciplinary actions (anonymized)
    │   ├── exam_history_anon.csv # Professional exams (anonymized)
    │   ├── firm_crds.csv # Firm identifiers
    │   ├── firm_summary.csv # Firm characteristics
    │   ├── ia_bd_reg3_anon.csv # Investment advisor registrations (anonymized)
    │   ├── ins_assets.csv # Insurance industry assets data
    │   ├── ins_reg_history_anon.csv # Insurance registrations (anonymized)
    │   ├── new_registrations.csv # New registration statistics
    │   ├── producer_matches_anon.csv # Name matching results (anonymized)
    │   ├── reg_history.csv # Registration history
    │   ├── state_history_anon.csv # State registrations (anonymized)
    │   └── wgnd_2_0_name-gender-code.csv # Gender name dictionary
    └── output/ # SAS output files created by 01_merge.ipynb
| Data Name | File(s) | Provided? | Source/Access |
|---|---|---|---|
| FINRA BrokerCheck | data/input/drp_history_anon.csv, reg_history.csv, exam_history_anon.csv, firm_crds.csv, firm_summary.csv | Yes | FINRA, see https://brokercheck.finra.org/ |
| SEC IAPD | data/input/state_history_anon.csv, ia_bd_reg3_anon.csv | Yes | SEC IAPD, https://adviserinfo.sec.gov/ |
| State Insurance Producer | data/input/ins_reg_history_anon.csv, NPN_LOA_anon.csv, producer_matches_anon.csv | Yes | State insurance departments, NAIC |
| NAIC State Data | data/input/NAIC_state_year_clean.csv | Yes | National Association of Insurance Commissioners |
| Insurance Assets | data/input/ins_assets.csv | Yes | Industry reports |
| New Registrations | data/input/new_registrations.csv | Yes | Regulatory filings |
| Texas Complaints | data/input/TX_compl.csv | Yes | Texas Department of Insurance |
| BLS Wage Data | data/input/bls_advisor_salary.xlsx | Yes | U.S. Bureau of Labor Statistics |
| Gender Data | data/input/wgnd_2_0_name-gender-code.csv | Yes | Gender API, see file |
| Merged Analysis Data | data/output/all_reg.sas7bdat, last.sas7bdat, tx_all.sas7bdat, tx_recidivism.sas7bdat, cma_map.sas7bdat, cma_drop_last.sas7bdat, sankey2.sas7bdat | No | Derived |
All data files are described above. The main analysis dataset (all_reg.sas7bdat) is approximately 8GB and contains ~7 million advisor-year observations. The files in data/output/ are generated by 01_merge.ipynb and are not included in the archive.
See FOIA Log below for details on our public records requests for insurance producer registration data.
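Because the merge step runs for hours, it may help to confirm that every input file is in place before starting. The following is a minimal Python sketch, not part of the replication package: the file names are copied from the directory listing above, while the helper name `missing_inputs` and the default path `data/input` relative to the working directory are assumptions.

```python
from pathlib import Path

# File names copied from the data/input/ listing in this README.
EXPECTED_INPUTS = [
    "NAIC_state_year_clean.csv",
    "NPN_LOA_anon.csv",
    "TX_compl.csv",
    "bls_advisor_salary.xlsx",
    "drp_history_anon.csv",
    "exam_history_anon.csv",
    "firm_crds.csv",
    "firm_summary.csv",
    "ia_bd_reg3_anon.csv",
    "ins_assets.csv",
    "ins_reg_history_anon.csv",
    "new_registrations.csv",
    "producer_matches_anon.csv",
    "reg_history.csv",
    "state_history_anon.csv",
    "wgnd_2_0_name-gender-code.csv",
]


def missing_inputs(input_dir):
    """Return the expected input files that are not present in input_dir."""
    root = Path(input_dir)
    return [name for name in EXPECTED_INPUTS if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical location: run from the repository root.
    missing = missing_inputs("data/input")
    if missing:
        print("Missing input files:")
        for name in missing:
            print("  " + name)
    else:
        print("All", len(EXPECTED_INPUTS), "expected input files found.")
```

The check only tests for presence; it does not validate file contents or sizes.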
- R (v4.4+ recommended)
- data.table (v1.16.4+)
- haven (v2.5+)
- lubridate (v1.9+)
- lfe (v3.1.1+)
- fixest (v0.12+)
- stargazer (v5.2.3+)
- ggplot2 (v3.4+)
- scales (v1.2+)
- maps (v3.4+)
- shadowtext (v0.1+)
- ggthemes (v5.0+)
- IRdisplay (v1.1+)
- plotly (v4.10+)
- tidyverse (v2.0+)
- binsreg
- mapproj (v1.2.12+)
- marginaleffects (v0.10+, for ggfixest)
- dreamerr (for ggfixest)
- legendry (v0.2+, for ggfixest)
- ggfixest (v0.3+, installed from CRAN tarballs with pak)
- Python 3.12+ (for working with R/SAS kernels)
- SAS (for data prep)
- pixi (recommended: for environment management)
- OS: Linux (recommended), macOS, or Windows
- RAM: 32GB+ recommended for full data processing
- Disk: 20GB+ free space (main dataset is ~8GB)
- Runtime: Data processing 2-4 hours; analysis 30-60 minutes
This project uses pixi for environment management. To set up the environment:
# Install pixi (if not already installed)
# See https://pixi.sh/latest/
# Install all dependencies
pixi install
# Activate the environment and start Jupyter
pixi shell
jupyter lab

A portable environment archive (environment.tar, ~586MB) and the required data files (data.tar.gz, ~548MB) are available through the Zenodo DOI above. Download both archives from Zenodo, then set up the environment as follows:
# Install pixi-pack for unpacking (if not already installed)
# See https://github.com/Quantco/pixi-pack
# Extract the data files
tar -xzf data.tar.gz
# Unpack the environment
pixi-unpack environment.tar
# Install the custom ggfixest package from the included tarball
pixi shell
Rscript -e "pak::pak('local::r_packages/src/contrib/ggfixest_0.3.0.tar.gz')"
# Start Jupyter
jupyter lab

The environment contains all conda packages plus an offline CRAN repository with ggfixest and 83 dependency tarballs (~65MB), ready for installation with pak.
- Set up the environment using the portable environment from Zenodo or install packages manually (see Software Requirements).
- If using Zenodo archives, extract data.tar.gz to get all required data files in the data/input directory.
- Run 01_merge.ipynb to process and merge data (runtime: 2-4 hours).
- Run 02_analysis.ipynb to generate all tables and figures (runtime: 30-60 minutes).
- Outputs (tables and figures) are generated within the notebooks.
The analysis uses data from multiple regulatory sources:
- FINRA BrokerCheck: Individual broker records, misconduct, and employment history
- SEC Investment Adviser Public Disclosure (IAPD): SEC investment advisor records
- State Insurance Producer Databases: State-level insurance producer registrations
- NAIC Data: State insurance department resources and enforcement statistics
- Bureau of Labor Statistics: Wage data by occupation and state
- Lines of Authority (LOA) Data: Insurance product classifications
- indv_crd: Individual Central Registration Depository number (FINRA)
- has_srs_ever: Ever had serious misconduct (Specified Risk Event)
- form_cma: High-risk broker (2+ SREs or 1+ criminal matter)
- bc, ia, ins: Current FINRA, investment advisor, insurance registrations
- sec_ia, state_ia: SEC vs. state investment advisor registration
- add_ia, drop_bc: Indicators for adding IA or dropping BC registration

- has_mis_ever: Ever had misconduct disclosure
- has_srs_ever: Ever had serious misconduct
- has_drp_ever: Ever had disciplinary action
- count_srs: Count of serious misconduct events
- fine_*: Dollar amounts of fines by type

- female: Gender indicator
- years_exp: Years of experience in financial services
- n_exams: Number of professional exams passed
- retail_broker: Serves retail clients
- qual_va, qual_nasaa: Qualifications for variable annuities and state advisory work

- department_budget: State insurance department budget
- insurance_staff: Number of insurance department staff
- dollar_fines, n_fines: Dollar amount and count of fines issued
- n_complaints, n_inquiries: Consumer complaints and inquiries
- broker_p50, insurance_p50: Median wages by occupation
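As an illustration of the form_cma definition given above (2+ SREs or 1+ criminal matter), here is a minimal Python sketch. It is not taken from the package, which builds this variable in 01_merge.ipynb; the function name and the input names `n_sre` and `n_criminal` are hypothetical.

```python
def form_cma_flag(n_sre, n_criminal):
    """Return 1 if a broker meets the high-risk rule stated in this README:
    two or more Specified Risk Events, or one or more criminal matters."""
    return int(n_sre >= 2 or n_criminal >= 1)


if __name__ == "__main__":
    # Hypothetical (n_sre, n_criminal) pairs, for illustration only.
    for n_sre, n_criminal in [(0, 0), (1, 0), (2, 0), (0, 1)]:
        print(n_sre, n_criminal, form_cma_flag(n_sre, n_criminal))
```

Only the last two pairs are flagged: a single SRE without a criminal matter does not meet the threshold.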
The data merging process involves several key steps:
- Load FINRA BrokerCheck Data: Individual advisor records with employment history and misconduct
- Process IAPD Data: SEC investment advisor registrations and firm information
- Merge Insurance Data: State insurance producer records linked by name matching
- Add State-Level Data: NAIC insurance department characteristics
- Create Panel Dataset: Individual-year observations from 2012-2022
- Generate Analysis Variables: Migration indicators, misconduct measures, controls
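The panel-construction step can be pictured as expanding each registration spell into one row per year within the 2012-2022 window. The sketch below is a Python illustration under assumptions: the actual merge is performed in 01_merge.ipynb, and the spell layout (a start year and an optional end year per advisor) and the function name `expand_spell` are hypothetical.

```python
PANEL_START, PANEL_END = 2012, 2022  # panel window stated in this README


def expand_spell(indv_crd, start_year, end_year=None):
    """Expand one registration spell into (indv_crd, year) rows.

    A spell with no end year is treated as still open at the end of the
    panel. Years outside the panel window are dropped.
    """
    first = max(start_year, PANEL_START)
    last = PANEL_END if end_year is None else min(end_year, PANEL_END)
    return [(indv_crd, year) for year in range(first, last + 1)]


if __name__ == "__main__":
    # Hypothetical advisor 1001, registered from 2010 through 2014.
    print(expand_spell(1001, 2010, 2014))
    # -> [(1001, 2012), (1001, 2013), (1001, 2014)]
```

A spell that ends before 2012 yields no rows, so advisors who left before the panel window drop out naturally.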
The main analyses include:
Examine how misconduct relates to current regulatory registrations:
felm(bc ~ has_srs_ever + controls | firm_county_year | 0 | firm, data)
felm(ins ~ has_srs_ever + controls | firm_county_year | 0 | firm, data)

Analyze likelihood of adding/dropping registrations:
felm(add_ia ~ has_srs + controls | firm_county_year | 0 | firm, data[bc==1 & ia==0])
felm(drop_bc ~ has_srs + controls | firm_county_year | 0 | firm, data[bc==1])

Examine impact of 2019 regulatory changes on high-risk brokers:
felm(drop_bc ~ form_cma * post_2018 * insurance + controls | firm_county_year | 0 | firm, data)

Below is the list of states to which we submitted FOIA requests for insurance producer registration data. The Data Received column contains the date on which we received the data, 'Y' if we obtained the data but did not log the exact date, or 'N' if we did not receive any response. For some states we were able to download the data ourselves, and the link to the data is provided instead.
| State | FOIA Submission Date | Data Received |
|---|---|---|
| Alabama | 2022-04-25 | N |
| Alaska | 2022-04-25 | 2022-05-05 |
| Arizona | 2022-04-25 | N |
| Arkansas | 2022-04-25 | N |
| California | 2022-04-25 | N |
| Colorado | 2022-04-25 | 2022-04-26 |
| Connecticut | 2022-04-27 | 2022-04-27 |
| Delaware | 2022-04-25 | N |
| Florida | 2022-04-25 | https://licenseesearch.fldfs.com/BulkDownload |
| Georgia | 2022-04-25 | N |
| Hawaii | 2022-04-25 | N |
| Idaho | 2022-04-25 | Y |
| Illinois | 2022-04-27 | 2022-06-13 |
| Indiana | 2022-04-27 | 2022-07-08 |
| Iowa | 2022-04-27 | 2022-05-02 |
| Kansas | 2022-04-25 | N |
| Kentucky | 2022-04-25 | 2022-04-29 |
| Louisiana | 2022-04-25 | https://www.ldi.la.gov/industry/producer-adjuster/search-for-producers-and-adjusters/producer-adjuster-licensee-report |
| Maine | 2022-04-25 | Y |
| Maryland | 2022-04-25 | Y |
| Massachusetts | 2022-04-27 | 2022-05-21 |
| Michigan | 2022-04-25 | 2022-05-12 |
| Minnesota | 2022-04-25 | N |
| Mississippi | 2022-04-27 | 2022-04-27 |
| Missouri | 2022-04-27 | 2022-05-20 |
| Montana | 2022-04-27 | N |
| Nebraska | 2022-04-27 | 2022-04-27 |
| Nevada | 2022-04-26 | N |
| New Hampshire | 2022-04-26 | N |
| New Jersey | 2022-04-26 | 2022-04-27 |
| New Mexico | 2022-04-26 | N |
| New York | 2022-04-26 | 2022-06-24 |
| North Carolina | 2022-04-26 | 2022-07-07 |
| North Dakota | 2022-04-26 | 2022-04-29 |
| Ohio | 2022-04-26 | 2022-04-26 |
| Oklahoma | 2022-04-26 | 2022-05-03 |
| Oregon | 2022-04-26 | N |
| Pennsylvania | 2022-04-26 | N |
| Rhode Island | 2022-04-26 | 2022-05-03 |
| South Carolina | 2022-04-26 | N |
| South Dakota | 2022-04-26 | Y |
| Tennessee | 2022-04-26 | N |
| Texas | 2022-04-26 | 2022-04-27; https://data.texas.gov/dataset/Insurance-complaints-All-data/ubdr-4uff/about_data |
| Utah | 2022-04-26 | 2022-05-02 |
| Vermont | 2022-04-26 | https://dfr.vermont.gov/industry/insurance/producer-and-individual-licensing |
| Virginia | 2022-04-26 | N |
| Washington | 2022-04-26 | 2022-04-27 |
| West Virginia | 2022-04-26 | N |
| Wisconsin | 2022-04-26 | 2022-07-13 |
| Wyoming | 2022-04-26 | 2022-07-15 |
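The Data Received column mixes several kinds of values (a date, 'Y', 'N', a download link, or a date followed by a link), so it needs a small parser before it can be tabulated. The following Python sketch is one way to do that; it is not part of the replication package, and the function names and category labels are hypothetical.

```python
from datetime import date


def parse_received(value):
    """Classify one 'Data Received' cell as a (kind, detail) pair."""
    text = value.strip()
    if text == "N":
        return ("no_response", None)
    if text == "Y":
        return ("received_undated", None)
    if text.startswith("http"):
        return ("public_download", text)
    # Otherwise expect an ISO date, optionally followed by "; <link>".
    first = text.split(";")[0].strip()
    return ("received", date.fromisoformat(first))


def response_days(submitted, received_cell):
    """Days between submission and receipt, or None if no date was logged."""
    kind, detail = parse_received(received_cell)
    if kind != "received":
        return None
    return (detail - date.fromisoformat(submitted)).days


if __name__ == "__main__":
    # Alaska row from the table above.
    print(response_days("2022-04-25", "2022-05-05"))  # -> 10
```

A malformed date raises ValueError rather than being silently accepted, which makes transcription errors in the log easy to spot.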