clinops.ingest
clinops.ingest.mimic_tables.MimicTableLoader
Pre-built loader for the MIMIC-IV tables researchers use most.
Wraps :class:~clinops.ingest.MimicLoader and adds:
- Pre-validated schemas for chartevents, labevents, admissions, diagnoses_icd, and icustays.
- with_ref_range flag on labevents to retain or drop the reference range columns (noisy on many MIMIC exports).
- primary_only flag on diagnoses_icd to keep only seq_num == 1 (the principal diagnosis).
- with_los_band flag on icustays to add a categorical los_band column (<1d, 1-3d, 3-7d, >7d) useful as a stratification variable.
- A summary() method that prints row counts and null rates for all five tables without loading full data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mimic_path | str \| Path | Root directory of the MIMIC-IV dataset. | required |
| version | str | MIMIC-IV version string or | 'auto' |
| strict_validation | bool | Raise on missing required columns when | True |
| chunk_size | int \| None | Pass through to underlying :class: | None |
Examples:
>>> tbl = MimicTableLoader("/data/mimic-iv-2.2")
>>> charts = tbl.chartevents(subject_ids=[10000032, 10000980])
>>> dx = tbl.diagnoses_icd(subject_ids=[10000032], primary_only=True)
>>> stays = tbl.icustays(subject_ids=[10000032], with_los_band=True)
Source code in clinops/ingest/mimic_tables.py
chartevents
chartevents(
subject_ids=None,
hadm_ids=None,
stay_ids=None,
item_ids=None,
start_time=None,
end_time=None,
)
Load ICU charted observations with schema validation.
Returns a DataFrame with columns:
subject_id, hadm_id, stay_id, itemid,
charttime, valuenum, valueuom.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| subject_ids | Sequence[int] \| None | Restrict to these patients. | None |
| hadm_ids | Sequence[int] \| None | Restrict to these hospital admissions. | None |
| stay_ids | Sequence[int] \| None | Restrict to these ICU stays. | None |
| item_ids | Sequence[int] \| None | Restrict to these MIMIC itemids (see | None |
| start_time | str \| None | ISO datetime string; start of the time range filter. | None |
| end_time | str \| None | ISO datetime string; end of the time range filter. | None |
Source code in clinops/ingest/mimic_tables.py
labevents
labevents(
subject_ids=None,
hadm_ids=None,
item_ids=None,
start_time=None,
end_time=None,
with_ref_range=False,
)
Load hospital laboratory results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| with_ref_range | bool | If | False |
Source code in clinops/ingest/mimic_tables.py
admissions
Load hospital admission records.
Returns a DataFrame with columns:
subject_id, hadm_id, admittime, dischtime,
deathtime, admission_type, admission_location,
discharge_location, insurance, hospital_expire_flag.
Source code in clinops/ingest/mimic_tables.py
diagnoses_icd
Load ICD-9/ICD-10 diagnosis codes per hospital admission.
MIMIC-IV mixes ICD-9-CM and ICD-10-CM codes. The icd_version
column contains 9 or 10 — use
:class:~clinops.preprocess.ICDMapper to harmonize to a single
version before modelling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| primary_only | bool | If | False |
Source code in clinops/ingest/mimic_tables.py
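As a rough illustration of what primary_only and the icd_version column amount to, the sketch below applies a seq_num == 1 filter to a small hand-made DataFrame and counts codes per ICD version. The rows are invented for illustration, and this is not the loader's actual code.

```python
import pandas as pd

# Hand-made stand-in for a diagnoses_icd result (illustrative rows only).
dx = pd.DataFrame({
    "hadm_id": [100, 100, 101, 101],
    "seq_num": [1, 2, 1, 2],
    "icd_code": ["4280", "25000", "I509", "E119"],
    "icd_version": [9, 9, 10, 10],
})

# Keep only the principal diagnosis of each admission.
primary = dx[dx["seq_num"] == 1]

# Count how many codes come from each ICD version before harmonizing.
version_counts = dx["icd_version"].value_counts().to_dict()

print(primary)
print(version_counts)
```

Checking the version counts first shows how much of a cohort would need mapping before the two code systems can be modelled together.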
icustays
Load ICU stay metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| with_los_band | bool | If | False |
Source code in clinops/ingest/mimic_tables.py
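The four los_band labels (<1d, 1-3d, 3-7d, >7d) can be pictured as a simple binning of length of stay in days. The sketch below is one possible way to compute such a band. The function name los_band and the handling of stays of exactly 1, 3, or 7 days are assumptions, not taken from the library.

```python
from bisect import bisect_right

# Upper edges (in days) of the first three bands; the last band is open-ended.
# Assumption: a stay of exactly 1, 3 or 7 days falls into the higher band.
_EDGES = [1.0, 3.0, 7.0]
_LABELS = ["<1d", "1-3d", "3-7d", ">7d"]


def los_band(los_days: float) -> str:
    """Map a length of stay in days to one of the four band labels."""
    return _LABELS[bisect_right(_EDGES, los_days)]


bands = [los_band(x) for x in (0.4, 2.0, 5.5, 12.0)]
print(bands)
```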
summary
Print a quick-look table of row counts and null rates for all five tables without loading the full data into memory.
Uses pd.read_csv with nrows=0 to read headers, then scans
only the first 10,000 rows to estimate null rates.
Returns:
| Type | Description |
|---|---|
| DataFrame | Columns: |
Source code in clinops/ingest/mimic_tables.py
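The header-then-sample approach described for summary() can be sketched with plain pandas. The example below reads an in-memory CSV so that it is self-contained. The CSV content is made up, and the code only illustrates the technique; it is not the method's implementation.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for a MIMIC CSV file (made-up values).
csv_text = "subject_id,hadm_id,valuenum\n1,10,7.1\n2,,8.0\n3,30,\n4,40,9.5\n"

# Step 1: nrows=0 parses only the header row, so no data rows are loaded.
columns = list(pd.read_csv(io.StringIO(csv_text), nrows=0).columns)

# Step 2: read a bounded sample and estimate the null rate per column.
sample = pd.read_csv(io.StringIO(csv_text), nrows=10_000)
null_rates = sample.isna().mean()

print(columns)
print(null_rates)
```

Because the null rates come from a bounded sample, they are estimates: a column that is only sparsely populated later in the file may look different in the full data.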
clinops.ingest.mimic.MimicLoader
Loader for MIMIC-IV clinical database tables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mimic_path | str \| Path | Root directory of the MIMIC-IV dataset. Should contain | required |
| version | str | MIMIC-IV version string (e.g. | 'auto' |
| strict_validation | bool | If True (default), raise | True |
| chunk_size | int \| None | If set, return a | None |
Examples:
>>> loader = MimicLoader("/data/mimic-iv-2.2")
>>> charts = loader.chartevents(subject_ids=[10000032])
>>> labs = loader.labevents(hadm_ids=[20000019])
Source code in clinops/ingest/mimic.py
chartevents
chartevents(
subject_ids=None,
hadm_ids=None,
stay_ids=None,
item_ids=None,
start_time=None,
end_time=None,
)
Load ICU charted observations (vitals, GCS, ventilator settings, etc.).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| subject_ids | Sequence[int] \| None | Filter to these patients. | None |
| hadm_ids | Sequence[int] \| None | Filter to these hospital admissions. | None |
| stay_ids | Sequence[int] \| None | Filter to these ICU stays. | None |
| item_ids | Sequence[int] \| None | Filter to these MIMIC-IV itemids (see d_items). | None |
| start_time | str \| None | ISO datetime string; exclude rows before this time. | None |
| end_time | str \| None | ISO datetime string; exclude rows after this time. | None |
Source code in clinops/ingest/mimic.py
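To make the filter semantics concrete, the sketch below applies an ID filter and a start_time/end_time window to a small hand-made DataFrame. It assumes both bounds are inclusive, which matches "exclude rows before/after this time" but is not confirmed by the source. All rows are invented.

```python
import pandas as pd

# Hand-made stand-in for a chartevents table (illustrative rows only).
charts = pd.DataFrame({
    "subject_id": [1, 1, 1, 2],
    "itemid": [220045, 220045, 220210, 220045],
    "charttime": pd.to_datetime([
        "2180-07-23 08:00:00",
        "2180-07-23 12:00:00",
        "2180-07-24 09:00:00",
        "2180-07-23 10:00:00",
    ]),
})

start_time = "2180-07-23T09:00:00"
end_time = "2180-07-23T23:59:59"

# Combine the ID filter with an inclusive time window (assumed semantics).
mask = (
    charts["subject_id"].isin([1, 2])
    & (charts["charttime"] >= pd.Timestamp(start_time))
    & (charts["charttime"] <= pd.Timestamp(end_time))
)
filtered = charts[mask]
print(filtered)
```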
labevents
Load hospital laboratory results.
Source code in clinops/ingest/mimic.py
admissions
Load hospital admission records.
Source code in clinops/ingest/mimic.py
patients
icustays
Load ICU stay metadata including LOS.
Source code in clinops/ingest/mimic.py
prescriptions
Load medication prescriptions.
Source code in clinops/ingest/mimic.py
inputevents
Load ICU fluid input events.
Source code in clinops/ingest/mimic.py
d_items
clinops.ingest.mimic_iii.MimicIIILoader
Loader for the MIMIC-III Clinical Database.
Provides the same filtering interface as MimicLoader (MIMIC-IV) so
that analysis code can be reused across both datasets with minimal changes.
Key differences from MIMIC-IV:
- Flat directory — no hosp/ / icu/ split.
- Uppercase column names in source files — normalised to lowercase on load.
- ICU stay key is icustay_id (not stay_id).
- ICD-9-CM only — diagnoses_icd has icd9_code, not icd_code.
- inputevents is split into inputevents_mv (MetaVision) and
inputevents_cv (CareVue); use :meth:inputevents to get both merged.
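The column-name differences listed above can be illustrated with plain pandas. The sketch below lowercases MIMIC-III style headers and then, as an optional step that is not described in the source, renames icustay_id to stay_id so that downstream code written for MIMIC-IV can run unchanged. The row values are made up.

```python
import pandas as pd

# Stand-in for a raw MIMIC-III CSV with uppercase headers (made-up row).
raw = pd.DataFrame({
    "SUBJECT_ID": [40124],
    "HADM_ID": [198765],
    "ICUSTAY_ID": [200001],
    "ITEMID": [211],
})

# Lowercase the column names, as the loader is described as doing on load.
iii = raw.rename(columns=str.lower)

# Hypothetical extra step: align the ICU stay key with the MIMIC-IV name.
iv_style = iii.rename(columns={"icustay_id": "stay_id"})

print(list(iii.columns))
print(list(iv_style.columns))
```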
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mimic_path | str \| Path | Root directory of the MIMIC-III dataset. Should contain files like | required |
| strict_validation | bool | If | True |
| chunk_size | int \| None | If set, large tables ( | None |
Examples:
>>> loader = MimicIIILoader("/data/mimic-iii-clinical-database-1.4")
>>> charts = loader.chartevents(subject_ids=[40124])
>>> labs = loader.labevents(hadm_ids=[198765])
>>> dx = loader.diagnoses_icd(subject_ids=[40124], primary_only=True)
Source code in clinops/ingest/mimic_iii.py
chartevents
chartevents(
subject_ids=None,
hadm_ids=None,
icustay_ids=None,
item_ids=None,
start_time=None,
end_time=None,
)
Load ICU charted observations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| subject_ids | Sequence[int] \| None | Restrict to these patient IDs. | None |
| hadm_ids | Sequence[int] \| None | Restrict to these hospital admission IDs. | None |
| icustay_ids | Sequence[int] \| None | Restrict to these ICU stay IDs ( | None |
| item_ids | Sequence[int] \| None | Restrict to these | None |
| start_time | str \| None | Exclude rows with | None |
| end_time | str \| None | Exclude rows with | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | Columns (lowercase): |
Source code in clinops/ingest/mimic_iii.py
labevents
labevents(
subject_ids=None,
hadm_ids=None,
item_ids=None,
start_time=None,
end_time=None,
with_ref_range=False,
)
Load hospital laboratory results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| subject_ids | Sequence[int] \| None | Standard filters; see :meth: | None |
| hadm_ids | Sequence[int] \| None | Standard filters; see :meth: | None |
| item_ids | Sequence[int] \| None | Standard filters; see :meth: | None |
| start_time | str \| None | Standard filters; see :meth: | None |
| end_time | str \| None | Standard filters; see :meth: | None |
| with_ref_range | bool | If | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | Columns: |
Source code in clinops/ingest/mimic_iii.py
admissions
Load hospital admission and discharge records.
Returns:
| Type | Description |
|---|---|
| DataFrame | Columns: |
Source code in clinops/ingest/mimic_iii.py
diagnoses_icd
Load ICD-9-CM diagnosis codes.
MIMIC-III uses ICD-9-CM exclusively. The column name is icd9_code
(not icd_code as in MIMIC-IV). For cross-dataset compatibility,
a synthetic icd_version column (always 9) is added on load.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| subject_ids | Sequence[int] \| None | Standard filters. | None |
| hadm_ids | Sequence[int] \| None | Standard filters. | None |
| icd9_codes | Sequence[str] \| None | Restrict to these ICD-9-CM codes (exact match, case-insensitive). | None |
| primary_only | bool | If | False |
|
Returns:
| Type | Description |
|---|---|
| DataFrame | Columns: |
Source code in clinops/ingest/mimic_iii.py
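The synthetic icd_version column is what makes it possible to stack MIMIC-III and MIMIC-IV diagnoses in one frame. The sketch below shows one way this might look, using hand-made stand-ins for the two loader outputs. Renaming icd9_code to icd_code is an assumption made here for the concatenation, not something the loader is documented to do.

```python
import pandas as pd

# Hand-made stand-ins for the two loaders' outputs (illustrative rows only).
dx_iii = pd.DataFrame({
    "subject_id": [40124],
    "seq_num": [1],
    "icd9_code": ["4280"],
    "icd_version": [9],  # synthetic column, always 9 for MIMIC-III
})
dx_iv = pd.DataFrame({
    "subject_id": [10000032],
    "seq_num": [1],
    "icd_code": ["I509"],
    "icd_version": [10],
})

# Align the code column name (assumed step), then stack the two tables.
combined = pd.concat(
    [dx_iii.rename(columns={"icd9_code": "icd_code"}), dx_iv],
    ignore_index=True,
)
print(combined)
```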
icustays
Load ICU stay metadata including length of stay.
Note: The ICU stay key in MIMIC-III is icustay_id, not
stay_id as in MIMIC-IV.
Returns:
| Type | Description |
|---|---|
| DataFrame | Columns: |
Source code in clinops/ingest/mimic_iii.py
prescriptions
Load medication prescriptions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drugs | Sequence[str] \| None | Restrict to these drug names (case-insensitive substring match). | None |
|
Returns:
| Type | Description |
|---|---|
| DataFrame | Columns: |
Source code in clinops/ingest/mimic_iii.py
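A case-insensitive substring match means that a search term such as "HEPARIN" also selects "Heparin Sodium" and "heparin flush". The sketch below shows that behaviour on plain Python data. The helper name matches_any and the sample rows are hypothetical, not part of the library.

```python
def matches_any(drug_name: str, wanted: list[str]) -> bool:
    """Return True if any wanted term occurs in drug_name, ignoring case."""
    name = drug_name.casefold()
    return any(term.casefold() in name for term in wanted)


# Made-up prescription rows for illustration.
rows = [
    {"drug": "Heparin Sodium"},
    {"drug": "Insulin"},
    {"drug": "heparin flush"},
    {"drug": "Aspirin"},
]

kept = [r["drug"] for r in rows if matches_any(r["drug"], ["HEPARIN", "insulin"])]
print(kept)
```

Substring matching is broad by design, so a short term can pull in combination products or flushes that an analysis may want to exclude afterwards.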
inputevents
Load ICU fluid input events.
MIMIC-III stores MetaVision (INPUTEVENTS_MV) and CareVue
(INPUTEVENTS_CV) inputs in separate tables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | | 'mv' |
|
Returns:
| Type | Description |
|---|---|
| DataFrame | For MetaVision: |
Source code in clinops/ingest/mimic_iii.py
patients
Load patient demographics.
Returns:
| Type | Description |
|---|---|
| DataFrame | Columns: |
Source code in clinops/ingest/mimic_iii.py
d_items
Load the item dictionary (itemid → label).
Use this to resolve itemid values in :meth:chartevents.
Returns:
| Type | Description |
|---|---|
| DataFrame | Columns: |
Source code in clinops/ingest/mimic_iii.py
d_labitems
Load the lab item dictionary (itemid → label).
Use this to resolve itemid values in :meth:labevents.
Returns:
| Type | Description |
|---|---|
| DataFrame | Columns: |
Source code in clinops/ingest/mimic_iii.py
clinops.ingest.fhir.FHIRLoader
Load FHIR R4 resources from JSON bundles or NDJSON exports.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str \| Path | Path to a FHIR JSON Bundle file, an NDJSON file, or a directory of JSON resource files. | required |
Examples:
>>> loader = FHIRLoader("/data/fhir_export")
>>> observations = loader.observations()
>>> patients = loader.patients()
Source code in clinops/ingest/fhir.py
patients
Load Patient resources → DataFrame with demographics.
Source code in clinops/ingest/fhir.py
observations
Load Observation resources → long-format DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| category | str \| None | Filter to a FHIR observation category (e.g. "vital-signs", "laboratory"). | None |
| loinc_codes | list[str] \| None | Filter to specific LOINC codes. | None |
Source code in clinops/ingest/fhir.py
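To show what "long format" means for Observation resources, the sketch below flattens a minimal FHIR R4 Bundle into one row per observation code, with an optional LOINC filter. It uses only the standard library. The function name observation_rows and the output keys are hypothetical, and the real loader's columns may differ.

```python
import json

# Minimal made-up Bundle with one Observation and one Patient.
bundle_text = json.dumps({
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {
            "resourceType": "Observation",
            "subject": {"reference": "Patient/p1"},
            "category": [{"coding": [{"code": "vital-signs"}]}],
            "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4"}]},
            "effectiveDateTime": "2024-01-01T08:00:00Z",
            "valueQuantity": {"value": 72, "unit": "beats/minute"},
        }},
        {"resource": {"resourceType": "Patient", "id": "p1"}},
    ],
})


def observation_rows(bundle: dict, loinc_codes=None) -> list[dict]:
    """Flatten Observation entries into one dict per (observation, code)."""
    rows = []
    for entry in bundle.get("entry", []):
        res = entry.get("resource", {})
        if res.get("resourceType") != "Observation":
            continue  # skip Patient, Condition, and other resource types
        qty = res.get("valueQuantity", {})
        for coding in res.get("code", {}).get("coding", []):
            if loinc_codes is not None and coding.get("code") not in loinc_codes:
                continue
            rows.append({
                "patient": res.get("subject", {}).get("reference"),
                "code": coding.get("code"),
                "time": res.get("effectiveDateTime"),
                "value": qty.get("value"),
                "unit": qty.get("unit"),
            })
    return rows


rows = observation_rows(json.loads(bundle_text), loinc_codes={"8867-4"})
print(rows)
```

A list of flat dicts like this can be passed straight to pandas.DataFrame to obtain the long-format table.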
conditions
Load Condition resources → DataFrame with ICD/SNOMED codes.
Source code in clinops/ingest/fhir.py
clinops.ingest.flat.FlatFileLoader
Load clinical data from CSV or Parquet flat files with validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to a CSV (.csv, .csv.gz) or Parquet (.parquet, .pq) file. | required |
| schema | ClinicalSchema \| None | Optional ClinicalSchema for validation after loading. | None |
| id_col | str \| None | Name of the patient/subject identifier column. Used for deduplication reporting. | None |
| datetime_cols | list[str] \| None | Column names to parse as datetimes. If None, auto-detection is attempted for columns with "time", "date", or "dt" in the name. | None |
| strict | bool | If True, raise on schema violations. If False, warn and continue. | True |
Examples:
>>> loader = FlatFileLoader("vitals_export.csv", id_col="patient_id")
>>> df = loader.load()
>>> print(loader.summary())
Source code in clinops/ingest/flat.py
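The name-based auto-detection described for datetime_cols can be approximated as a substring check on column names. The sketch below is an illustration only. The helper name guess_datetime_columns is hypothetical, and treating the match as case-insensitive is an assumption.

```python
def guess_datetime_columns(columns: list[str]) -> list[str]:
    """Return the columns whose name contains 'time', 'date', or 'dt'."""
    tokens = ("time", "date", "dt")
    return [c for c in columns if any(t in c.lower() for t in tokens)]


guessed = guess_datetime_columns(
    ["patient_id", "charttime", "AdmitDate", "heart_rate", "discharge_dt"]
)
print(guessed)
```

A plain substring check would also flag a column named width, because it contains "dt". Passing datetime_cols explicitly avoids surprises of that kind.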
load
Load the file, apply cleaning and validation, return DataFrame.
Returns:
| Type | Description |
|---|---|
| DataFrame | |
Source code in clinops/ingest/flat.py
summary
Return a human-readable summary of the loaded DataFrame.
Source code in clinops/ingest/flat.py
clinops.ingest.schema.ClinicalSchema
dataclass
Declarative schema for a clinical data table.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Human-readable name for this schema (used in error messages). | required |
| columns | list[ColumnSpec] | List of ColumnSpec objects describing required and optional columns. | list() |
| allow_extra_columns | bool | If True (default), columns not in the spec are silently allowed. | True |
Example
>>> schema = ClinicalSchema(
...     name="vitals",
...     columns=[
...         ColumnSpec("subject_id", dtype="int64", nullable=False),
...         ColumnSpec("heart_rate", dtype="float64", min_value=0, max_value=300),
...     ]
... )
>>> schema.validate(df)
validate
Validate a DataFrame against this schema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame to validate. | required |
| strict | bool | If True, raise SchemaValidationError on the first violation. If False, collect all violations and return them as a list. | True |
Returns:
| Type | Description |
|---|---|
| list[str] | Empty list if valid; list of violation messages otherwise. |
Source code in clinops/ingest/schema.py
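The difference between strict=True (raise on the first violation) and strict=False (collect every violation) can be sketched with a small stand-in validator. Everything below is hypothetical: SketchColumn, SketchValidationError, and the validate function imitate the behaviour described above on a plain dict of lists and are not the library's classes.

```python
from dataclasses import dataclass
from typing import Optional


class SketchValidationError(ValueError):
    """Stand-in for SchemaValidationError (sketch only)."""


@dataclass
class SketchColumn:
    """Stand-in for ColumnSpec with a subset of its fields."""
    name: str
    nullable: bool = True
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def validate(table: dict, specs: list, strict: bool = True) -> list:
    """Raise on the first violation if strict, else return all messages."""
    violations = []

    def report(message: str) -> None:
        if strict:
            raise SketchValidationError(message)
        violations.append(message)

    for spec in specs:
        if spec.name not in table:
            report(f"missing column: {spec.name}")
            continue
        values = table[spec.name]
        if not spec.nullable and any(v is None for v in values):
            report(f"{spec.name}: nulls not allowed")
        present = [v for v in values if v is not None]
        if spec.min_value is not None and any(v < spec.min_value for v in present):
            report(f"{spec.name}: value below {spec.min_value}")
        if spec.max_value is not None and any(v > spec.max_value for v in present):
            report(f"{spec.name}: value above {spec.max_value}")
    return violations


table = {"subject_id": [1, None], "heart_rate": [72.0, 450.0]}
specs = [
    SketchColumn("subject_id", nullable=False),
    SketchColumn("heart_rate", min_value=0, max_value=300),
]
problems = validate(table, specs, strict=False)
print(problems)
```

Collecting violations is handy when profiling a new export, while the strict mode suits pipelines where any schema drift should stop the run.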
clinops.ingest.schema.ColumnSpec
dataclass
ColumnSpec(
name,
dtype=None,
nullable=True,
min_value=None,
max_value=None,
allowed_values=list(),
)
Specification for a single column in a clinical table.
clinops.ingest.schema.SchemaValidationError
Bases: ValueError
Raised when a loaded table does not match the expected schema.