clinops.preprocess
clinops.preprocess.outliers.ClinicalOutlierClipper
Detect and clip physiologically impossible values in clinical DataFrames.
Uses published physiological bounds to identify values that are impossible regardless of patient state. Values outside bounds are either clipped to the boundary (default) or replaced with NaN.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `bounds` | `dict[str, BoundSpec] \| None` | Dict mapping column name to `BoundSpec`. Defaults to the combined `VITAL_BOUNDS` + `LAB_BOUNDS`. Pass a custom dict to override. | `None` |
| `action` | `str` | What to do with out-of-range values: clip them to the violated bound (default) or replace them with NaN. | `'clip'` |
| `extra_bounds` | `dict[str, BoundSpec] \| None` | Additional `BoundSpec` entries to merge with the default bounds. Useful for site-specific or assay-specific ranges. | `None` |
| `strict` | `bool` | If `True`, raise `ValueError` when a column in `bounds` is not found in the DataFrame. If `False` (default), silently skip missing columns. | `False` |
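A minimal sketch of how `bounds`, `extra_bounds`, and `strict` might combine when choosing which ranges to apply. It assumes `BoundSpec` can be modeled as an inclusive `(low, high)` tuple and that `extra_bounds` entries win on a key clash; `resolve_bounds` is a hypothetical helper, and any numbers used with it are placeholders, not clinops' published bounds.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical stand-in for clinops' BoundSpec: an inclusive (low, high) pair.
Bound = Tuple[float, float]


def resolve_bounds(
    default_bounds: Dict[str, Bound],
    columns: Iterable[str],
    bounds: Optional[Dict[str, Bound]] = None,
    extra_bounds: Optional[Dict[str, Bound]] = None,
    strict: bool = False,
) -> Dict[str, Bound]:
    """Return the bounds that apply to the given columns."""
    # A custom `bounds` dict replaces the defaults entirely.
    merged = dict(default_bounds if bounds is None else bounds)
    # `extra_bounds` entries are merged on top (assumed to override on key clash).
    merged.update(extra_bounds or {})

    present = set(columns)
    missing = [col for col in merged if col not in present]
    if strict and missing:
        raise ValueError(f"Columns not found in DataFrame: {missing}")
    # Non-strict mode silently skips columns the data does not have.
    return {col: spec for col, spec in merged.items() if col in present}
```

For example, with placeholder defaults `{"heart_rate": (0.0, 300.0)}`, passing `extra_bounds={"lactate": (0.0, 30.0)}` for a frame with both columns keeps both ranges, while `strict=True` raises as soon as any bounded column is absent.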
Examples:
>>> from clinops.preprocess.outliers import ClinicalOutlierClipper
>>> clipper = ClinicalOutlierClipper()
>>> clean_df = clipper.fit_transform(vitals_df)
>>> print(clipper.report())
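The per-value rule behind `action` can be sketched as follows: with `'clip'`, an out-of-range value moves to the nearest bound; in the NaN mode it becomes NaN. The function `handle_value` and the `'nan'` action string are hypothetical; the source only confirms `'clip'` as the default.

```python
import math
from typing import Tuple


def handle_value(value: float, bound: Tuple[float, float], action: str = "clip") -> float:
    """Clip or blank a single measurement against an inclusive (low, high) bound."""
    low, high = bound
    if math.isnan(value) or low <= value <= high:
        return value  # missing or plausible values pass through unchanged
    if action == "clip":
        return min(max(value, low), high)
    if action == "nan":  # hypothetical name for the NaN-replacement mode
        return math.nan
    raise ValueError(f"Unknown action: {action!r}")
```

Clipping keeps the row usable for models that cannot handle missing values; replacing with NaN avoids inventing a boundary value that was never actually measured.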
fit_transform
Clip outliers in `df` to their bounds, or replace them with NaN, according to the configured bounds and `action`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input DataFrame. Only columns present in `bounds` are processed. | *required* |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with outliers handled according to `action`. |
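A dependency-free sketch of what `fit_transform` does across columns: copy the input, apply each bound only to columns that exist, and log every change so it can be reported later. Rows are plain dicts instead of a pandas DataFrame, `clip_records` is a hypothetical name, the log layout (`row`, `column`, `original`, `replaced_with`) is an assumption rather than clinops' report schema, and the ranges in the usage are placeholders.

```python
import math
from typing import Dict, List, Tuple

Bound = Tuple[float, float]
Row = Dict[str, float]


def clip_records(
    rows: List[Row], bounds: Dict[str, Bound], action: str = "clip"
) -> Tuple[List[Row], List[dict]]:
    """Return (cleaned copy of rows, log of every out-of-range value)."""
    cleaned: List[Row] = []
    log: List[dict] = []
    for i, row in enumerate(rows):
        new_row = dict(row)  # never mutate the caller's data
        for col, (low, high) in bounds.items():
            value = new_row.get(col)
            if value is None or math.isnan(value) or low <= value <= high:
                continue  # column absent, missing, or plausible
            # 'clip' moves to the nearest bound; any other action -> NaN (simplified)
            replacement = min(max(value, low), high) if action == "clip" else math.nan
            new_row[col] = replacement
            log.append({"row": i, "column": col, "original": value, "replaced_with": replacement})
        cleaned.append(new_row)
    return cleaned, log
```

With placeholder bounds `{"heart_rate": (0.0, 300.0), "temp_c": (25.0, 45.0)}`, a row `{"heart_rate": 999.0, "temp_c": 55.0}` comes back as `{"heart_rate": 300.0, "temp_c": 45.0}` with two log entries, and the input row is left untouched.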
report
Return a summary DataFrame of all detected outliers.
Returns an empty DataFrame if fit_transform has not been called or no outliers were detected.
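Assuming each detected outlier is logged as a dict with `column` and `original` keys (a hypothetical layout), a summary in the spirit of `report()` might count outliers per column and keep the range of offending raw values. An empty log yields an empty summary, mirroring the empty-DataFrame behavior described above. `summarize_outliers` and its output keys are illustrative names only.

```python
from collections import defaultdict
from typing import Dict, List


def summarize_outliers(log: List[dict]) -> Dict[str, dict]:
    """Per-column outlier count and the min/max of the offending raw values."""
    summary: Dict[str, dict] = defaultdict(
        lambda: {"n_outliers": 0, "min_seen": None, "max_seen": None}
    )
    for entry in log:
        stats = summary[entry["column"]]
        value = entry["original"]
        stats["n_outliers"] += 1
        stats["min_seen"] = value if stats["min_seen"] is None else min(stats["min_seen"], value)
        stats["max_seen"] = value if stats["max_seen"] is None else max(stats["max_seen"], value)
    return dict(summary)  # empty dict when nothing was detected
```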
add_bounds
clinops.preprocess.units.UnitNormalizer
Normalize clinical measurements to canonical units.
Detects non-standard units via a companion unit column or an explicit mapping, and converts the affected values in their original columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `column_unit_map` | `dict[str, str] \| None` | Dict mapping value column name → unit column name, e.g. `{"glucose": "glucose_unit"}`. | `None` |
| `explicit_conversions` | `dict[str, ConversionSpec] \| None` | Dict mapping value column name → `ConversionSpec` to apply unconditionally (ignores unit columns), e.g. `{"temperature": UNIT_CONVERSIONS["temperature__f__c"]}`. | `None` |
| `target_units` | `dict[str, str] \| None` | Dict mapping column name → target unit string. Defaults to | `None` |
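Many clinical unit conversions, including °F → °C, are affine: `target = value * scale + offset`. The sketch below models a hypothetical `ConversionSpec` with exactly those fields to show how one spec can cover both plain multiplicative conversions and temperature; the field names are assumptions, not clinops' actual definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversionSpec:
    """Hypothetical affine conversion: target = value * scale + offset."""

    from_unit: str
    to_unit: str
    scale: float
    offset: float = 0.0

    def apply(self, value: float) -> float:
        return value * self.scale + self.offset


# °F -> °C: C = (F - 32) * 5/9, i.e. scale 5/9 and offset -32 * 5/9.
F_TO_C = ConversionSpec("degF", "degC", scale=5 / 9, offset=-32 * 5 / 9)

# An explicit conversion applies to every value in the column, regardless of unit columns.
temperatures_f = [98.6, 212.0, 32.0]
temperatures_c = [round(F_TO_C.apply(t), 2) for t in temperatures_f]
```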
Examples:
Normalize a glucose column that is mixed mg/dL and mmol/L:
>>> from clinops.preprocess.units import UnitNormalizer
>>> normalizer = UnitNormalizer(column_unit_map={"glucose": "glucose_unit"})
>>> df = normalizer.transform(df)
Convert all temperatures from °F to °C unconditionally:
>>> normalizer = UnitNormalizer(
... explicit_conversions={"temperature": UNIT_CONVERSIONS["temperature__f__c"]}
... )
>>> df = normalizer.transform(df)
transform
Apply unit normalization to df.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input DataFrame. A modified copy is returned. | *required* |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Modified copy of `df` with units normalized. |
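Converting a mixed-unit column through its companion unit column (the `column_unit_map` mode) can be sketched as a per-row lookup: rows already in the target unit pass through, and the rest are scaled by a factor keyed on `(from_unit, to_unit)`. This sketch works on plain dicts, converts glucose to mg/dL purely for illustration (the library's canonical glucose unit is not stated here), and uses a hypothetical factor table and helper name. Like the documented `transform`, it returns a modified copy, plus a conversion count that a `report()`-style summary could use.

```python
from typing import Dict, List, Tuple

# Hypothetical factor table: (from_unit, to_unit) -> multiplier.
# Glucose: 1 mmol/L is about 18.016 mg/dL (molar mass about 180.16 g/mol).
FACTORS: Dict[Tuple[str, str], float] = {("mmol/L", "mg/dL"): 18.016}


def normalize_column(
    rows: List[dict], value_col: str, unit_col: str, target_unit: str
) -> Tuple[List[dict], int]:
    """Return (copied rows with value_col in target_unit, number of rows converted)."""
    out: List[dict] = []
    converted = 0
    for row in rows:
        new_row = dict(row)  # leave the caller's rows untouched
        unit = new_row.get(unit_col)
        if unit is not None and unit != target_unit:
            factor = FACTORS[(unit, target_unit)]  # KeyError for unknown unit pairs
            new_row[value_col] = new_row[value_col] * factor
            new_row[unit_col] = target_unit
            converted += 1
        out.append(new_row)
    return out, converted
```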
report
Return a summary of all conversions applied.
clinops.preprocess.icd.ICDMapper
Map ICD-9-CM diagnosis codes to ICD-10-CM equivalents.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mappings` | `list[tuple[str, str, str]] \| None` | Custom list of `(icd9, icd10, description)` tuples. If `None`, uses the built-in curated mapping table. | `None` |
| `default_value` | `str \| None` | Value to use when no mapping is found. | `None` |
Examples:
Map a column of ICD-9 codes to ICD-10:
Map in-place with version detection:
Get the ICD-10 chapter for a code:
from_gem_file
classmethod
Load from a CMS General Equivalence Mapping (GEM) file.
The official CMS GEM files are available at: https://www.cms.gov/medicare/coding-billing/icd-10-codes
The file should be a fixed-width or tab-delimited text file with columns: icd9_code, icd10_code, flags.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to the CMS GEM forward mapping file (2018 format). | *required* |
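The CMS diagnosis GEMs list one mapping per line: source code, target code, and a five-character flag string whose positions are approximate, no map, combination, scenario, and choice list. Codes appear without decimal points, and a set no-map flag marks source codes with no ICD-10 target. A minimal parser under those assumptions is sketched below; `parse_gem_lines` is a hypothetical helper, and keeping only the first target per ICD-9 code is a simplification (GEMs can be one-to-many) that may not match how `from_gem_file` resolves multiple targets.

```python
from typing import Dict, Iterable


def parse_gem_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Build an ICD-9 -> ICD-10 dict from GEM-style 'source target flags' lines."""
    mapping: Dict[str, str] = {}
    for line in lines:
        parts = line.split()  # handles tab- or space-separated columns
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        icd9, icd10, flags = parts[0], parts[1], parts[2]
        if len(flags) >= 2 and flags[1] == "1":
            continue  # no-map flag set: no ICD-10 equivalent
        # Keep the first target seen; one-to-many entries are simplified away.
        mapping.setdefault(icd9, icd10)
    return mapping
```

Because a file object iterates line by line, `with open(path) as f: table = parse_gem_lines(f)` would read a GEM file directly.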
map_code
Map a single ICD-9 code to its ICD-10 equivalent.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `icd9_code` | `str` | ICD-9-CM code string (with or without decimal point). | *required* |
Returns:
| Type | Description |
|---|---|
| `str` or `None` | ICD-10-CM code, or |
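Because `map_code` accepts codes with or without a decimal point, a lookup has to normalize both the table keys and the query. The sketch below strips the dot, surrounding whitespace, and case before looking up a small table built from `(icd9, icd10, description)` tuples, the same shape the `mappings` parameter takes. The two rows are common GEM pairs used only for illustration, not clinops' curated table; `lookup_icd10` is a hypothetical helper, and returning a caller-supplied default for unknown codes is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative (icd9, icd10, description) rows in the shape `mappings` expects.
MAPPINGS: List[Tuple[str, str, str]] = [
    ("250.00", "E11.9", "Type 2 diabetes mellitus without complications"),
    ("401.9", "I10", "Essential (primary) hypertension"),
]


def _normalize(code: str) -> str:
    """Canonical lookup key: no dot, no surrounding whitespace, upper case."""
    return code.strip().replace(".", "").upper()


LOOKUP: Dict[str, str] = {_normalize(icd9): icd10 for icd9, icd10, _ in MAPPINGS}


def lookup_icd10(icd9_code: str, default_value: Optional[str] = None) -> Optional[str]:
    """Map one ICD-9-CM code to ICD-10-CM, returning default_value if unmapped."""
    return LOOKUP.get(_normalize(icd9_code), default_value)
```

Mapping a whole column is then an element-wise application of the same function, e.g. `[lookup_icd10(c) for c in codes]`.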
map_series
Map a Series of ICD-9 codes to ICD-10.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `Series` | String Series of ICD-9-CM codes. | *required* |
Returns:
| Type | Description |
|---|---|
| `Series` | ICD-10-CM codes. Unmapped codes become |
harmonize
Harmonize a mixed ICD-9/ICD-10 column to ICD-10 in-place.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Input DataFrame. | *required* |
| `code_col` | `str` | Column containing ICD codes. | *required* |
| `version_col` | `str` | Column indicating the ICD version of each row. | *required* |
| `icd9_value` | `str` | Value in `version_col` that marks a row as ICD-9. | `'9'` |
| `icd10_value` | `str` | Value in `version_col` that marks a row as ICD-10. | `'10'` |
| `output_col` | `str \| None` | Column to write harmonized codes to. Defaults to | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with the harmonized ICD-10 codes. |
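Row by row, harmonizing a mixed column means: rows whose `version_col` equals `icd9_value` are translated, rows equal to `icd10_value` pass through, and the result lands in `output_col`. The sketch below uses plain dicts and a hypothetical one-entry lookup. Two behaviors are assumptions because the source does not show them: it overwrites `code_col` when `output_col` is `None` (suggested by the "in-place" wording), and it writes `None` for rows with any other version marker.

```python
from typing import Dict, List, Optional

# Hypothetical ICD-9 -> ICD-10 lookup keyed by dot-less codes.
ICD9_TO_10: Dict[str, str] = {"25000": "E11.9"}


def harmonize_rows(
    rows: List[dict],
    code_col: str,
    version_col: str,
    icd9_value: str = "9",
    icd10_value: str = "10",
    output_col: Optional[str] = None,
) -> List[dict]:
    target = output_col or code_col  # assumption: overwrite code_col by default
    out = []
    for row in rows:
        new_row = dict(row)
        # str() so integer version markers (9, 10) also match the string defaults.
        code, version = row[code_col], str(row[version_col])
        if version == icd9_value:
            new_row[target] = ICD9_TO_10.get(code.replace(".", ""))  # None if unmapped
        elif version == icd10_value:
            new_row[target] = code  # already ICD-10: pass through
        else:
            new_row[target] = None  # unknown version marker (assumed behavior)
        out.append(new_row)
    return out
```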
chapter
Return the ICD-10 chapter description for a code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `icd10_code` | `str` | ICD-10-CM code string (e.g. | *required* |
Returns:
| Type | Description |
|---|---|
| `str` | Chapter description, or |
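ICD-10-CM chapters are contiguous ranges of three-character categories, so a chapter lookup can compare the first three characters of a code against each range's endpoints. The sketch below lists only a few chapters for brevity; `icd10_chapter` is a hypothetical helper, and returning `None` for codes outside its table is an assumption, since the source does not show what `chapter` returns in that case.

```python
from typing import List, Optional, Tuple

# Partial ICD-10-CM chapter table: (first category, last category, description).
CHAPTERS: List[Tuple[str, str, str]] = [
    ("A00", "B99", "Certain infectious and parasitic diseases"),
    ("C00", "D49", "Neoplasms"),
    ("E00", "E89", "Endocrine, nutritional and metabolic diseases"),
    ("I00", "I99", "Diseases of the circulatory system"),
    ("J00", "J99", "Diseases of the respiratory system"),
]


def icd10_chapter(icd10_code: str) -> Optional[str]:
    """Return the chapter description for an ICD-10-CM code, or None if not listed."""
    category = icd10_code.strip().upper()[:3]  # e.g. "E11.9" -> "E11"
    for first, last, description in CHAPTERS:
        # Categories share a fixed three-character shape, so string comparison orders them.
        if first <= category <= last:
            return description
    return None
```

Range comparison on the category prefix works with or without the decimal point and keeps the table at one row per chapter.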