clinops.monitor
clinops.monitor.drift.DistributionDriftDetector
DistributionDriftDetector(
n_bins=10,
psi_threshold_medium=0.1,
psi_threshold_high=0.2,
run_ks_test=True,
columns=None,
)
Detect distribution drift between a reference dataset and a current batch.
Fit on a reference dataset (typically the training set), then call
detect() on each new batch to get per-column PSI and KS statistics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_bins | int | Number of equal-frequency bins used to compute PSI. Default 10. Use fewer bins for small datasets (< 500 rows). | 10 |
| psi_threshold_medium | float | PSI threshold for MEDIUM severity. Default 0.1. | 0.1 |
| psi_threshold_high | float | PSI threshold for HIGH severity. Default 0.2. | 0.2 |
| run_ks_test | bool | If True, run a KS two-sample test in addition to PSI. Default True. | True |
| columns | list[str] \| None | Explicit list of columns to monitor. If None, all numeric columns in the reference DataFrame are monitored. | None |
Examples:
>>> detector = DistributionDriftDetector()
>>> detector.fit(train_df)
>>> report = detector.detect(production_batch_df)
>>> print(report.summary())
>>> print(report.to_dataframe())
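
The PSI statistic that `n_bins` controls can be sketched in plain Python. This is an illustration only, not the code in clinops/monitor/drift.py: the helper names (`quantile_edges`, `bin_fractions`, `psi`) and the `eps` floor for empty bins are assumptions made for the example. Bin edges are taken from the reference data so each reference bin holds roughly the same number of rows, and the same edges are then applied to the current batch.

```python
import math
from bisect import bisect_right


def quantile_edges(reference, n_bins):
    """Interior cut points splitting `reference` into roughly equal-frequency bins."""
    ordered = sorted(reference)
    n = len(ordered)
    # Deduplicate so heavily tied data does not produce zero-width bins.
    return sorted({ordered[(i * n) // n_bins] for i in range(1, n_bins)})


def bin_fractions(values, edges, eps=1e-6):
    """Fraction of `values` per bin, floored at `eps` (assumed) to keep the log finite."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]


def psi(reference, current, n_bins=10):
    """Population Stability Index: sum of (cur - ref) * ln(cur / ref) over bins."""
    edges = quantile_edges(reference, n_bins)
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [float(i) for i in range(1000)]
same = list(reference)
shifted = [x + 300.0 for x in reference]

print(round(psi(reference, same), 6))  # identical data gives 0.0
print(psi(reference, shifted) > 0.2)   # True: a large shift exceeds the HIGH threshold
```

With few rows, each bin fraction rests on only a handful of observations, which is why fewer bins are advisable for small datasets.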
Source code in clinops/monitor/drift.py
fit
Compute reference statistics from a training/baseline DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Reference DataFrame (typically the training set). | required |
Returns:
| Type | Description |
|---|---|
| DistributionDriftDetector | Self, for method chaining. |
detect
Compute drift metrics for each fitted column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Current DataFrame to compare against the reference. | required |
Returns:
| Type | Description |
|---|---|
| DriftReport | |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If |
clinops.monitor.drift.DriftReport
dataclass
Structured result from DistributionDriftDetector.
Attributes:
| Name | Type | Description |
|---|---|---|
| results | list[ColumnDriftResult] | Per-column drift metrics. |
| n_columns_checked | int | Total number of numeric columns evaluated. |
drifted_columns
Return column names with drift at or above min_severity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| min_severity | DriftSeverity | Minimum severity level to include. Default: MEDIUM. | MEDIUM |
Returns:
| Type | Description |
|---|---|
| list[str] | |
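
Filtering "at or above min_severity" implies an ordering of the severity levels. The sketch below shows one way to express that ordering. It is a simplified stand-in rather than the library's implementation: `SEVERITY_RANK`, the free function, and the `(column, severity)` pairs are assumptions made for the example.

```python
# Assumed ordering of severity levels, lowest to highest.
SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def drifted_columns(results, min_severity="MEDIUM"):
    """Names of columns whose severity is at or above `min_severity`.

    `results` is a list of (column, severity) pairs, a simplified stand-in
    for the per-column results held by a drift report.
    """
    floor = SEVERITY_RANK[min_severity]
    return [col for col, sev in results if SEVERITY_RANK[sev] >= floor]


results = [("heart_rate", "LOW"), ("lactate", "MEDIUM"), ("creatinine", "HIGH")]
print(drifted_columns(results))          # ['lactate', 'creatinine']
print(drifted_columns(results, "HIGH"))  # ['creatinine']
```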
to_dataframe
Return per-column results as a DataFrame sorted by PSI descending.
summary
Human-readable drift summary.
clinops.monitor.drift.ColumnDriftResult
dataclass
ColumnDriftResult(
column,
psi,
ks_statistic,
ks_pvalue,
severity,
reference_mean,
current_mean,
reference_std,
current_std,
n_reference,
n_current,
)
Drift metrics for a single column.
Attributes:
| Name | Type | Description |
|---|---|---|
| column | str | Column name. |
| psi | float | Population Stability Index (lower is more stable). |
| ks_statistic | float \| None | KS two-sample test statistic, or None if not computed. |
| ks_pvalue | float \| None | KS test p-value, or None if not computed. Values below 0.05 indicate a statistically significant distributional difference. |
| severity | DriftSeverity | Drift severity based on PSI thresholds. |
| reference_mean | float | Mean of the column in the reference (training) dataset. |
| current_mean | float | Mean of the column in the current (production) dataset. |
| reference_std | float | Standard deviation in the reference dataset. |
| current_std | float | Standard deviation in the current dataset. |
| n_reference | int | Number of non-null observations in the reference dataset. |
| n_current | int | Number of non-null observations in the current dataset. |
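
For readers unfamiliar with `ks_statistic`, the two-sample KS statistic is the largest absolute gap between the empirical CDFs of the two samples. The following is a minimal pure-Python sketch of that definition. The function below is illustrative and is not taken from clinops; it does not compute the p-value.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest absolute gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in set(ref) | set(cur):
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d


reference = [1.0, 2.0, 3.0, 4.0]
print(ks_statistic(reference, reference))             # 0.0
print(ks_statistic(reference, [3.0, 4.0, 5.0, 6.0]))  # 0.5
print(ks_statistic(reference, [10.0, 11.0, 12.0]))    # 1.0
```

In practice a library routine such as `scipy.stats.ks_2samp` returns both the statistic and a p-value; whether clinops relies on it is an assumption here. With large batches even small shifts tend to produce p-values below 0.05, so it is worth reading `ks_pvalue` alongside `psi` rather than on its own.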
clinops.monitor.drift.DriftSeverity
Bases: StrEnum
Severity levels for distribution drift.
Based on standard PSI thresholds used in healthcare model validation:
| Level | PSI range | Interpretation |
|---|---|---|
| LOW | PSI < 0.1 | Distribution is stable; no action required. |
| MEDIUM | 0.1 <= PSI < 0.2 | Moderate shift; review the column and investigate whether the change is clinically meaningful. |
| HIGH | PSI >= 0.2 | Significant drift; model retraining or pipeline investigation is strongly recommended. |
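
The boundaries above translate into a small classification function. In this sketch the `Severity` enum is a hypothetical stand-in for `DriftSeverity` (its member values are assumed), and `classify_psi` is an illustrative helper, not part of the documented API. The defaults mirror `psi_threshold_medium` and `psi_threshold_high`.

```python
from enum import Enum


class Severity(str, Enum):
    """Hypothetical stand-in for DriftSeverity; member values are assumed."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def classify_psi(psi, medium=0.1, high=0.2):
    """Map a PSI value to a severity using the documented boundaries."""
    if psi >= high:
        return Severity.HIGH
    if psi >= medium:
        return Severity.MEDIUM
    return Severity.LOW


for value in (0.05, 0.1, 0.15, 0.2, 0.5):
    print(value, classify_psi(value).name)
```

Both lower bounds are inclusive, so a PSI of exactly 0.1 is MEDIUM and exactly 0.2 is HIGH.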
clinops.monitor.quality.DataQualityChecker
DataQualityChecker(
max_null_rate=0.5,
required_columns=None,
expected_dtypes=None,
min_rows=None,
max_rows=None,
)
Run data quality checks on a clinical DataFrame.
Can be used standalone (check(df) only) or fitted on a reference
DataFrame to also detect schema drift between pipeline runs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_null_rate | float | Null rate above which a column is flagged as a warning. Default 0.5. | 0.5 |
| required_columns | list[str] \| None | Columns that must be present and non-null. Any missing column is an error; any all-null required column is also an error. | None |
| expected_dtypes | dict[str, str] \| None | Dict mapping column name to expected dtype string (e.g. | None |
| min_rows | int \| None | Minimum number of rows expected. Fewer rows triggers an error. | None |
| max_rows | int \| None | Maximum number of rows expected. More rows triggers a warning. | None |
Examples:
>>> checker = DataQualityChecker(required_columns=["subject_id", "charttime"])
>>> checker.fit(train_df) # learn reference schema and row count
>>> report = checker.check(df)
>>> print(report.summary())
>>> if not report.passed:
... raise RuntimeError("Data quality check failed")
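
The null-rate and required-column rules described in the parameter table can be sketched without pandas. This is an illustration under stated assumptions, not the checker's implementation: the table is modeled as a dict of column name to list of values, and `null_rates`, `check_table`, and the `"error"`/`"warning"` labels are hypothetical names chosen for the example.

```python
def null_rates(table):
    """Fraction of None entries per column; `table` maps column name to a list of values."""
    return {
        col: (sum(v is None for v in values) / len(values) if values else 0.0)
        for col, values in table.items()
    }


def check_table(table, max_null_rate=0.5, required_columns=()):
    """Return (severity, column, detail) tuples for the null-rate and required-column rules."""
    issues = []
    rates = null_rates(table)
    for col in required_columns:
        if col not in table:
            issues.append(("error", col, "required column is missing"))
        elif rates[col] == 1.0:
            issues.append(("error", col, "required column is entirely null"))
    for col, rate in rates.items():
        if rate > max_null_rate:
            issues.append(("warning", col, f"null rate {rate:.2f} exceeds {max_null_rate}"))
    return issues


batch = {
    "subject_id": [1, 2, 3, 4],
    "lactate": [None, None, None, 1.8],
}
for issue in check_table(batch, required_columns=["subject_id", "charttime"]):
    print(issue)
```

Here `charttime` is reported as an error because it is required but absent, and `lactate` as a warning because three of its four values are null.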
Source code in clinops/monitor/quality.py
fit
Learn the reference schema and row count from a baseline DataFrame.
After fitting, check() will also report columns that were added
or removed relative to this reference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Reference DataFrame (typically the training set). | required |
Returns:
| Type | Description |
|---|---|
| DataQualityChecker | Self, for method chaining. |
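
Detecting columns that were added or removed relative to the reference reduces to a set difference over column names. The helper below is a hypothetical sketch of that idea, not the library's code, and the column names are invented for the example.

```python
def schema_diff(reference_columns, current_columns):
    """Columns added to or removed from the current batch relative to the reference."""
    reference = set(reference_columns)
    current = set(current_columns)
    return {
        "added": sorted(current - reference),
        "removed": sorted(reference - current),
    }


reference_columns = ["subject_id", "charttime", "heart_rate"]
current_columns = ["subject_id", "charttime", "spo2"]
print(schema_diff(reference_columns, current_columns))
# {'added': ['spo2'], 'removed': ['heart_rate']}
```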
check
Run all configured quality checks against df.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | DataFrame to check. | required |
Returns:
| Type | Description |
|---|---|
| QualityReport | |
clinops.monitor.quality.QualityReport
dataclass
Structured result from DataQualityChecker.
Attributes:
| Name | Type | Description |
|---|---|---|
| issues | list[QualityIssue] | All detected quality issues. |
| n_rows | int | Number of rows in the checked DataFrame. |
| n_columns | int | Number of columns in the checked DataFrame. |
| null_rates | dict[str, float] | Per-column null rate (fraction 0–1). |
to_dataframe
Return issues as a DataFrame.
summary
Human-readable quality summary.
clinops.monitor.quality.QualityIssue
dataclass
A single data quality issue detected by DataQualityChecker.
Attributes:
| Name | Type | Description |
|---|---|---|
| column | str | Affected column name, or |
| issue_type | str | One of |
| severity | str | |
| detail | str | Human-readable description of the issue. |