Extract ICD codes and map them to phecodes for phenome-wide association studies.
Generate phecode counts from ICD code data. Maps ICD codes to phecodes and aggregates counts per person-phecode combination.
- `phecode_version`: Phecode version to use, "X" or "1.2" (str, default: "X")
- `icd_version`: ICD mapping version, "US", "WHO", or "custom" (str, default: "US")
- `phecode_map_file_path`: Path to custom phecode mapping table (str, optional)
- `output_file_path`: Path for output TSV file (str, optional)
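To make the aggregation step concrete, here is a minimal sketch in plain Python of what counting per person-phecode combination amounts to. It is not PheTK's implementation: the mapping dictionary, the `count_phecodes` helper, and the labels `PHE_A` and `PHE_B` are made-up placeholders rather than real phecodes, and counting one per record is an assumption.

```python
from collections import Counter

# Hypothetical ICD-to-phecode map keyed by (flag, ICD).
# "PHE_A" and "PHE_B" are placeholder labels, not real phecodes.
icd_to_phecode = {
    (9, "786.2"): "PHE_A",
    (10, "R05.1"): "PHE_A",
    (10, "E11.9"): "PHE_B",
}

# (person_id, flag, ICD) records, as they might appear in ICD event data.
icd_events = [
    (13579, 9, "786.2"),
    (13579, 10, "R05.1"),
    (13579, 10, "E11.9"),
    (24680, 10, "R05.1"),
    (24680, 10, "Z00.0"),  # not in the map, so it is dropped
]


def count_phecodes(events, mapping):
    """Count mapped events per (person_id, phecode) pair."""
    counts = Counter()
    for person_id, flag, icd in events:
        phecode = mapping.get((flag, icd))
        if phecode is not None:
            counts[(person_id, phecode)] += 1
    return counts


phecode_counts = count_phecodes(icd_events, icd_to_phecode)
for (person_id, phecode), n in sorted(phecode_counts.items()):
    print(person_id, phecode, n, sep="\t")
```

Two different ICD codes for the same person collapse into a single row when they map to the same phecode, which is why the output has fewer rows than the input.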
```python
from phetk.phecode import Phecode

# All of Us platform
phecode = Phecode(platform="aou")
phecode.count_phecode(
    phecode_version="X",
    icd_version="US",
    output_file_path="phecode_counts.tsv"
)
```

```shell
phetk phecode count-phecode \
    --platform "aou" \
    --phecode_version "X" \
    --icd_version "US" \
    --output_file_path "phecode_counts.tsv"
```

Calculate age at first phecode event for each participant. Adds an `age_at_first_event` column to the phecode counts.
- `phecode_count_file_path`: Path to phecode counts TSV file (str, required)
- `output_file_path`: Path for output file with age calculations (str, optional)
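As a rough illustration of the quantity being added, the sketch below computes an age at first event from a birth date and a list of event dates. The helper name `age_at_first_event` and the 365.25-day year are assumptions made for this example; how PheTK obtains birth dates and rounds the result is not covered here.

```python
from datetime import date


def age_at_first_event(birth_date, event_dates):
    """Age in years at the earliest event (assumes a 365.25-day year)."""
    first_event = min(event_dates)
    return (first_event - birth_date).days / 365.25


# Hypothetical participant: the earliest of the two events is used.
age = age_at_first_event(
    date(1970, 1, 11),
    [date(2017, 12, 4), date(2010, 1, 11)],
)
print(round(age, 2))
```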
```python
from phetk.phecode import Phecode

phecode = Phecode(platform="aou")
phecode.add_age_at_first_event(
    phecode_count_file_path="phecode_counts.tsv",
    output_file_path="phecode_counts_with_age.tsv"
)
```

```shell
phetk phecode add-age-at-first-event \
    --phecode_count_file_path "phecode_counts.tsv" \
    --output_file_path "phecode_counts_with_age.tsv"
```

Calculate time from study start to first phecode event for survival analysis. Adds a `phecode_time_to_event` column.
- `phecode_count_file_path`: Path to phecode counts CSV/TSV file (str, required)
- `cohort_file_path`: Path to cohort file with study start dates (str, required)
- `study_start_date_col`: Column name containing study start dates (str, required)
- `time_unit`: Time unit for calculations, "days" or "years" (str, default: "days")
- `output_file_path`: Path for output file (str, optional)
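The effect of `time_unit` can be sketched with the standard library alone. This is an illustration under stated assumptions, not PheTK's code: the `time_to_event` helper is hypothetical, and converting days to years by dividing by 365.25 is a common convention that may differ from what the library does.

```python
from datetime import date


def time_to_event(study_start, first_event, time_unit="days"):
    """Elapsed time from study start to first event in the requested unit."""
    days = (first_event - study_start).days
    if time_unit == "days":
        return days
    if time_unit == "years":
        return days / 365.25  # assumed conversion
    raise ValueError('time_unit must be "days" or "years"')


in_days = time_to_event(date(2010, 1, 11), date(2010, 1, 21))
in_years = time_to_event(date(2010, 1, 11), date(2014, 1, 11), time_unit="years")
print(in_days, in_years)
```

An event dated before the study start yields a negative value in this sketch; whether such rows should be kept or excluded is an analysis decision.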
```python
from phetk.phecode import Phecode

# Static method - no instance needed
Phecode.add_phecode_time_to_event(
    phecode_count_file_path="phecode_counts.tsv",
    cohort_file_path="cohort.tsv",
    study_start_date_col="study_start_date",
    time_unit="years",
    output_file_path="phecode_counts_with_time.tsv"
)
```

```shell
phetk phecode add-phecode-time-to-event \
    --phecode_count_file_path "phecode_counts.tsv" \
    --cohort_file_path "cohort.tsv" \
    --study_start_date_col "study_start_date" \
    --time_unit "years" \
    --output_file_path "phecode_counts_with_time.tsv"
```

```python
# All of Us platform
phecode = Phecode(platform="aou")

# Custom platform with user-provided ICD data
phecode = Phecode(platform="custom", icd_file_path="path/to/icd_data.tsv")
```

Required columns for custom ICD data:
- `person_id`: Participant identifier
- `date`: Date of ICD code occurrence
- `ICD`: ICD code value
- `vocabulary_id`: "ICD9CM" or "ICD10CM" (or use a `flag` column with values 9/10)
Example:
| person_id | date | vocabulary_id | ICD |
|---|---|---|---|
| 13579 | 2010-01-11 | ICD9CM | 786.2 |
| 13579 | 2017-12-04 | ICD10CM | R05.1 |
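A file with this layout can be produced with Python's standard `csv` module. The sketch below writes the two example rows as tab-separated text; the `write_icd_tsv` helper is a hypothetical name used for illustration, and the tab delimiter is assumed from the `.tsv` extension used elsewhere in this document.

```python
import csv
import io

FIELDNAMES = ["person_id", "date", "vocabulary_id", "ICD"]


def write_icd_tsv(rows, handle):
    """Write ICD rows as tab-separated text with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDNAMES, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"person_id": 13579, "date": "2010-01-11", "vocabulary_id": "ICD9CM", "ICD": "786.2"},
    {"person_id": 13579, "date": "2017-12-04", "vocabulary_id": "ICD10CM", "ICD": "R05.1"},
]

# An in-memory buffer keeps the example side-effect free;
# pass an open file handle instead to write to disk.
buffer = io.StringIO()
write_icd_tsv(rows, buffer)
icd_tsv_text = buffer.getvalue()
print(icd_tsv_text)
```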
Required columns for custom phecode mapping:
- `phecode`: Phecode value
- `ICD`: ICD code value
- `flag`: ICD version (9 or 10)
- `sex`: Applicable sex ("Male", "Female", or "Both")
- `phecode_string`: Phecode description
- `phecode_category`: Phecode category
- `exclude_range`: Exclusion range for phecode 1.2
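Before passing a custom mapping table to `phecode_map_file_path`, it can help to confirm that the header contains every column listed above. The check below is a standalone sketch, not part of PheTK; the `missing_mapping_columns` helper is hypothetical, and the tab delimiter is an assumption that should be adjusted to match the actual file.

```python
import csv
import io

REQUIRED_COLUMNS = {
    "phecode", "ICD", "flag", "sex",
    "phecode_string", "phecode_category", "exclude_range",
}


def missing_mapping_columns(handle, delimiter="\t"):
    """Return the required column names absent from the file's header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])
    return sorted(REQUIRED_COLUMNS - set(header))


# Header-only example that lacks the exclude_range column.
sample = io.StringIO("phecode\tICD\tflag\tsex\tphecode_string\tphecode_category\n")
missing = missing_mapping_columns(sample)
print(missing)
```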