# clinops
Clinical ML Pipeline Toolkit — production-grade data loading, preprocessing, and time-series feature engineering for healthcare AI research.
Every healthcare AI project starts with the same two weeks of plumbing: loading MIMIC tables without hitting memory limits, clipping physiologically impossible values before they corrupt your model, normalizing glucose from mmol/L to mg/dL across sites, building time-series windows that handle clinical missingness correctly, and splitting data without leaking patients across folds. clinops packages those hard-won patterns into a single, well-tested library so your first notebook is actual science.
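The unit normalization and plausibility clipping described above come down to a few pandas operations. The sketch below is a plain-pandas illustration, not clinops' implementation. The column names (`glucose`, `glucose_unit`, `heart_rate`) and the ranges in `BOUNDS` are hypothetical placeholders, not the library's built-in physiological table. The conversion factor follows from glucose's molar mass of about 180.16 g/mol, so 1 mmol/L ≈ 18.016 mg/dL.

```python
import pandas as pd

# mg/dL per mmol/L for glucose (molar mass ~180.16 g/mol); often rounded to 18.
GLUCOSE_MMOL_TO_MGDL = 18.016

# Hypothetical plausibility ranges for illustration only (not clinops' table).
BOUNDS = {
    "heart_rate": (0.0, 300.0),
    "glucose_mgdl": (10.0, 2000.0),
}


def normalize_and_clip(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Convert rows reported in mmol/L so every site shares one unit (mg/dL).
    is_mmol = out["glucose_unit"].str.lower() == "mmol/l"
    out.loc[is_mmol, "glucose"] = out.loc[is_mmol, "glucose"] * GLUCOSE_MMOL_TO_MGDL
    out["glucose_unit"] = "mg/dL"
    out = out.rename(columns={"glucose": "glucose_mgdl"})
    # Clip each feature into its range instead of dropping the row.
    for col, (lo, hi) in BOUNDS.items():
        out[col] = out[col].clip(lower=lo, upper=hi)
    return out


# Usage: one site reports mmol/L, and one heart rate is a charting error.
raw = pd.DataFrame({
    "heart_rate": [72.0, 950.0],
    "glucose": [5.5, 110.0],
    "glucose_unit": ["mmol/L", "mg/dL"],
})
clean = normalize_and_clip(raw)
```

Clipping keeps the row (and its other features) while bounding the bad value; nulling the value and leaving it to imputation is the usual alternative when a clipped reading would itself be misleading.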
## Modules
| Module | What it does |
|---|---|
| `clinops.ingest` | Loaders for MIMIC-IV, MIMIC-III, FHIR R4, and flat CSV/Parquet with schema validation |
| `clinops.preprocess` | Outlier clipping with physiological bounds, unit normalization, ICD-9→10 mapping |
| `clinops.temporal` | Sliding/tumbling windows, gap-aware imputation, lag features, cohort alignment |
| `clinops.split` | Temporal, patient-level, and stratified patient train/test splitting |
| `clinops.monitor` | Distribution drift detection (PSI + KS) and data quality alerting for production pipelines |
| `clinops.orchestrate` | GCS/S3 artifact storage and AWS Step Functions pipeline builder |
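The PSI half of the drift check in `clinops.monitor` can be sketched in NumPy. This is a generic Population Stability Index, assumed here rather than taken from clinops: bin edges come from deciles of the reference sample, and empty bins are floored at a small `eps` so the log term stays finite. The `psi` function and its parameters are hypothetical names. For the KS side, SciPy's `scipy.stats.ks_2samp` computes the two-sample Kolmogorov-Smirnov statistic.

```python
import numpy as np


def psi(expected: np.ndarray, actual: np.ndarray, n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` relative to the reference `expected`."""
    # Interior cut points at reference quantiles, so each reference bin holds ~equal mass.
    cuts = np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # digitize maps values to bins 0..n_bins-1; out-of-range values land in the end bins.
    e_frac = np.bincount(np.digitize(expected, cuts), minlength=n_bins) / len(expected)
    a_frac = np.bincount(np.digitize(actual, cuts), minlength=n_bins) / len(actual)
    # Floor empty bins so log(a/e) is finite.
    e_frac = np.clip(e_frac, eps, None)
    a_frac = np.clip(a_frac, eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


# Usage: a training-time reference versus two synthetic production batches.
rng = np.random.default_rng(0)
reference = rng.normal(100, 15, 5_000)  # e.g. heart rate at training time
stable = rng.normal(100, 15, 5_000)     # same distribution
shifted = rng.normal(115, 15, 5_000)    # mean shifted by one standard deviation
psi_stable = psi(reference, stable)
psi_shifted = psi(reference, shifted)
```

A common rule of thumb reads PSI below 0.1 as stable and above 0.25 as significant drift; those cutoffs are an industry convention, and clinops' own alert thresholds may differ.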
## Quickstart
```python
from clinops.ingest import MimicTableLoader
from clinops.preprocess import ClinicalOutlierClipper
from clinops.temporal import TemporalWindower, ImputationStrategy
from clinops.split import StratifiedPatientSplitter

# Load MIMIC-IV vitals
tbl = MimicTableLoader("/data/mimic-iv-2.2")
charts = tbl.chartevents(subject_ids=[10000032, 10000980])

# Clip physiologically impossible values
charts = ClinicalOutlierClipper(action="clip").fit_transform(charts)

# Build 24-hour windows with a 6-hour stride
windows = TemporalWindower(window_hours=24, step_hours=6).fit_transform(
    df=charts,
    id_col="subject_id",
    time_col="charttime",
    feature_cols=["heart_rate", "spo2", "resp_rate"],
)

# Patient-stratified split: no patient appears in both train and test.
# The splitter reads hospital_expire_flag (an admissions-level field in MIMIC-IV)
# from `windows`, so join it on by subject_id first if the loader does not include it.
result = StratifiedPatientSplitter(
    id_col="subject_id",
    outcome_col="hospital_expire_flag",
    test_size=0.2,
).split(windows)
```
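The windowing and splitting steps above can be approximated in plain pandas to show why the split has to happen at the patient level. With a 24-hour window and a 6-hour stride, consecutive windows share 18 hours of data, so a row-level random split would put near-duplicate windows from the same patient in both train and test. The helpers `sliding_windows` and `patient_split` are hypothetical and far simpler than `TemporalWindower` and `StratifiedPatientSplitter`: they take window means, leave features with no measurements as NaN instead of imputing, and do not stratify by outcome.

```python
import numpy as np
import pandas as pd


def sliding_windows(df, id_col, time_col, feature_cols, window_hours=24, step_hours=6):
    """Hypothetical helper: per-patient sliding windows summarized by their mean."""
    window = pd.Timedelta(hours=window_hours)
    step = pd.Timedelta(hours=step_hours)
    rows = []
    for pid, g in df.groupby(id_col):
        start, last = g[time_col].min(), g[time_col].max()
        while start <= last:
            in_win = g[(g[time_col] >= start) & (g[time_col] < start + window)]
            # NaN marks a feature with no measurements in this window (left for imputation).
            rows.append({id_col: pid, "window_start": start, **in_win[feature_cols].mean().to_dict()})
            start += step
    return pd.DataFrame(rows)


def patient_split(df, id_col, test_size=0.2, seed=0):
    """Hypothetical helper: assign whole patients, never individual rows, to test."""
    ids = df[id_col].unique()
    rng = np.random.default_rng(seed)
    test_ids = rng.choice(ids, size=max(1, round(len(ids) * test_size)), replace=False)
    is_test = df[id_col].isin(test_ids)
    return df[~is_test], df[is_test]


# Usage: 10 synthetic patients, one heart-rate reading every 2 hours for 36 hours.
base = pd.Timestamp("2150-01-01")
charts = pd.DataFrame([
    {"subject_id": pid, "charttime": base + pd.Timedelta(hours=2 * k), "heart_rate": 80.0 + k}
    for pid in range(1, 11)
    for k in range(19)
])
windows = sliding_windows(charts, "subject_id", "charttime", ["heart_rate"])
train, test = patient_split(windows, "subject_id", test_size=0.2)
```

Each patient yields 7 windows (starts at 0, 6, ..., 36 hours), and every window for a given patient lands on the same side of the split.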
## Examples
- Getting started notebook — module-by-module walkthrough with synthetic data, no MIMIC access needed
- ICU mortality pipeline notebook — end-to-end pipeline from raw EHR tables to ML-ready arrays
## Installation
Requires Python 3.12+.