| Title: | Core Contracts, Parsers, and Scoring Primitives |
|---|---|
| Description: | Foundational package in the R4SUB (R for Regulatory Submission) ecosystem. Defines the core evidence table schema, parsers, indicator abstractions, and scoring primitives needed to quantify clinical submission readiness. Provides a standardized contract for ingesting heterogeneous sources (validation outputs, metadata, traceability) into a single evidence framework. |
| Authors: | Pawan Rama Mali [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-7864-5819>) |
| Maintainer: | Pawan Rama Mali <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-05-15 09:41:39 UTC |
| Source: | https://github.com/r4sub/r4subcore |
Computes summary scores from an evidence table, grouped by one or more columns.
aggregate_indicator_score(
  ev,
  by = "indicator_id",
  method = c("mean", "min", "weighted")
)

| ev | A valid evidence data.frame. |
|---|---|
| by | Character vector of column names to group by. Default: |
| method | Aggregation method: |

A data.frame with grouping columns plus score (0–1) and
n_evidence (count of rows).
ctx <- suppressMessages(r4sub_run_context("STUDY1", "DEV"))
ev <- suppressMessages(as_evidence(
  data.frame(
    asset_type = rep("validation", 3),
    asset_id = rep("ADSL", 3),
    source_name = rep("pinnacle21", 3),
    indicator_id = c("SD0001", "SD0001", "SD0002"),
    indicator_name = c("SD0001", "SD0001", "SD0002"),
    indicator_domain = rep("quality", 3),
    severity = c("high", "medium", "low"),
    result = c("fail", "warn", "pass"),
    stringsAsFactors = FALSE
  ),
  ctx = ctx
))
aggregate_indicator_score(ev, by = "indicator_id", method = "weighted")
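The three aggregation methods can be illustrated with a base R sketch that does not call the package. The score and weight mappings are the documented defaults of result_to_score() and severity_to_weight(); the helper name `sketch_aggregate` is hypothetical, and treating "weighted" as a severity-weighted mean is an assumption, since the package may combine scores and weights differently.

```r
# Hypothetical stand-in for an evidence table: only the columns needed here.
ev <- data.frame(
  indicator_id = c("SD0001", "SD0001", "SD0002"),
  severity     = c("high", "medium", "low"),
  result       = c("fail", "warn", "pass"),
  stringsAsFactors = FALSE
)

# Mappings copied from the documented defaults.
score_map  <- c(pass = 1, warn = 0.5, fail = 0, na = NA)
weight_map <- c(info = 0, low = 0.25, medium = 0.5, high = 0.75, critical = 1)

ev$score  <- unname(score_map[ev$result])
ev$weight <- unname(weight_map[ev$severity])

# ASSUMPTION: "weighted" is a severity-weighted mean of the row scores.
sketch_aggregate <- function(d, method = c("mean", "min", "weighted")) {
  method <- match.arg(method)
  groups <- split(d, d$indicator_id)
  score <- vapply(groups, function(g) {
    keep <- !is.na(g$score)
    switch(method,
      mean     = mean(g$score[keep]),
      min      = min(g$score[keep]),
      weighted = sum(g$score[keep] * g$weight[keep]) / sum(g$weight[keep])
    )
  }, numeric(1))
  data.frame(
    indicator_id = names(groups),
    score        = unname(score),
    n_evidence   = unname(vapply(groups, nrow, integer(1))),
    stringsAsFactors = FALSE
  )
}

# SD0001: (0 * 0.75 + 0.5 * 0.5) / 1.25 = 0.2; SD0002: 1
sketch_aggregate(ev, method = "weighted")
```

Under this assumption a group whose weights sum to zero (for example, only `info` rows) yields NaN, so a real implementation would need an explicit rule for that case.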
Takes a data.frame and coerces it into a valid evidence table. Fills in
missing nullable columns with NA of the correct type and validates
controlled vocabulary columns.
as_evidence(x, ctx = NULL, ...)

| x | A data.frame (or tibble) with at least the required evidence columns. |
|---|---|
| ctx | An optional r4sub_run_context. If provided, |
| ... | Additional columns to set (e.g., |

A data.frame conforming to the evidence schema.
ctx <- r4sub_run_context("STUDY1", "DEV")
df <- data.frame(
  asset_type = "validation",
  asset_id = "ADSL",
  source_name = "pinnacle21",
  indicator_id = "P21-001",
  indicator_name = "Missing variable",
  indicator_domain = "quality",
  severity = "high",
  result = "fail",
  message = "Variable AGEU missing",
  stringsAsFactors = FALSE
)
ev <- as_evidence(df, ctx = ctx)
Row-binds multiple evidence data.frames after validating each one.
bind_evidence(...)

| ... | Evidence data.frames to bind. |
|---|---|

A single combined evidence data.frame.
ctx <- suppressMessages(r4sub_run_context("STUDY1", "DEV"))
make_ev <- function(ind_id) {
  suppressMessages(as_evidence(
    data.frame(
      asset_type = "validation",
      asset_id = "ADSL",
      source_name = "pinnacle21",
      indicator_id = ind_id,
      indicator_name = ind_id,
      indicator_domain = "quality",
      severity = "low",
      result = "pass",
      stringsAsFactors = FALSE
    ),
    ctx = ctx
  ))
}
ev1 <- make_ev("IND-001")
ev2 <- make_ev("IND-002")
combined <- suppressMessages(bind_evidence(ev1, ev2))
nrow(combined)
Maps common result/status labels to the canonical set:
pass, fail, warn, na.
canon_result(x)

| x | Character vector of result values. |
|---|---|

Character vector with canonical result labels.
canon_result(c("PASS", "Failed", "Warning", "N/A"))
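A label canonicalizer of this kind can be sketched in base R as a case-insensitive lookup. The function name `sketch_canon_result` and its synonym table are illustrative assumptions covering only the labels in the example above; the package's actual synonym list is not shown in this manual.

```r
sketch_canon_result <- function(x) {
  # ASSUMPTION: illustrative synonym table, not the package's actual list.
  lookup <- c(
    pass = "pass", passed = "pass",
    fail = "fail", failed = "fail",
    warn = "warn", warning = "warn",
    na = "na", "n/a" = "na"
  )
  key <- tolower(trimws(x))   # normalize case and surrounding whitespace
  unname(lookup[key])         # labels not in the table come back as NA
}

sketch_canon_result(c("PASS", "Failed", "Warning", "N/A"))
# "pass" "fail" "warn" "na"
```

In this sketch an unrecognized label becomes NA; whether the package errors, warns, or maps such labels to `na` is not stated here.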
Maps common severity labels (case-insensitive) to the canonical set.
canon_severity(x)

| x | Character vector of severity values. |
|---|---|

Character vector with canonical severity labels.
canon_severity(c("HIGH", "Low", "warning", "Error"))
Reads a Define-XML 2.0/2.1 file and extracts dataset, variable, and derivation completeness checks into the standard evidence table format. Three indicators are evaluated for each asset:
Q-DEFINE-001: Dataset is present and has a non-empty label.
Q-DEFINE-002: Variable is documented (has label and dataType).
Q-DEFINE-003: Derivation text is present for derived variables.
define_xml_to_evidence(file, ctx, source_version = "2.1")

| file | Character. Path to a Define-XML file ( |
|---|---|
| ctx | An r4sub_run_context providing run and study metadata. |
| source_version | Character. Version label for the Define-XML standard. Default: |

A data.frame conforming to the evidence schema, one row per dataset-level check (Q-DEFINE-001), per variable check (Q-DEFINE-002), and per derivation check (Q-DEFINE-003).
# Build a minimal Define-XML 2.1 document
xml_txt <- '<?xml version="1.0" encoding="UTF-8"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.1">
  <Study OID="STUDY001">
    <MetaDataVersion OID="MDV.001" Name="Define-XML 2.1">
      <def:ItemGroupDef OID="IG.ADSL" Name="ADSL" SASDatasetName="ADSL"
                        Repeating="No" Purpose="Analysis"
                        def:Label="Subject-Level Analysis Dataset">
        <ItemRef ItemOID="IT.ADSL.USUBJID" Mandatory="Yes"/>
        <ItemRef ItemOID="IT.ADSL.AGE" Mandatory="Yes"/>
      </def:ItemGroupDef>
      <ItemDef OID="IT.ADSL.USUBJID" Name="USUBJID" DataType="text"
               Length="20" SASFieldName="USUBJID">
        <Description><TranslatedText>Unique Subject Identifier</TranslatedText></Description>
      </ItemDef>
      <ItemDef OID="IT.ADSL.AGE" Name="AGE" DataType="integer"
               Length="8" SASFieldName="AGE">
        <Description><TranslatedText>Age</TranslatedText></Description>
        <def:Origin Type="Derived">
          <def:Description><TranslatedText>Derived from RFSTDTC</TranslatedText></def:Description>
        </def:Origin>
      </ItemDef>
    </MetaDataVersion>
  </Study>
</ODM>'
tmp <- tempfile(fileext = ".xml")
writeLines(xml_txt, tmp)
ctx <- r4sub_run_context("STUDY001", "DEV")
ev <- define_xml_to_evidence(tmp, ctx)
nrow(ev)
Returns the column specification for the R4SUB evidence table. Each element describes a column's expected R type and, where applicable, the set of allowed values.
evidence_schema()
A named list. Each element is a list with type (character) and
optionally allowed (character vector) or nullable (logical).
str(evidence_schema())
Returns a summary data.frame with counts grouped by domain, severity, result, and source.
evidence_summary(ev)

| ev | A valid evidence data.frame. |
|---|---|

A data.frame with columns: indicator_domain, severity, result,
source_name, and n.
ctx <- suppressMessages(r4sub_run_context("STUDY1", "DEV"))
ev <- suppressMessages(as_evidence(
  data.frame(
    asset_type = "validation",
    asset_id = "ADSL",
    source_name = "pinnacle21",
    indicator_id = "SD0001",
    indicator_name = "SD0001",
    indicator_domain = "quality",
    severity = "high",
    result = "fail",
    stringsAsFactors = FALSE
  ),
  ctx = ctx
))
evidence_summary(ev)
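The shape of the summary (one row per combination of domain, severity, result, and source, with a count `n`) can be reproduced in base R with `aggregate()`. This is a sketch on a hypothetical four-column data.frame, not the package's implementation.

```r
# Hypothetical stand-in for an evidence table: only the grouping columns.
ev <- data.frame(
  indicator_domain = c("quality", "quality", "quality"),
  severity         = c("high", "high", "low"),
  result           = c("fail", "fail", "pass"),
  source_name      = rep("pinnacle21", 3),
  stringsAsFactors = FALSE
)

# Count rows per combination by summing a helper column of ones.
ev$n <- 1L
summary_df <- aggregate(
  n ~ indicator_domain + severity + result + source_name,
  data = ev,
  FUN = sum
)
summary_df
```

The two `high`/`fail` rows collapse into one summary row with `n` equal to 2, and the `low`/`pass` row keeps `n` equal to 1.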
Validates the evidence table then writes it to disk in the requested format.
Metadata attributes (exported_at, r4subcore_version, nrow) are
attached to the returned path value for traceability.
export_evidence(evidence, file, format = c("csv", "rds", "json"))

| evidence | A valid evidence data.frame (as produced by |
|---|---|
| file | Character. Destination file path (including extension). |
| format | Character. One of |

Invisibly returns file with attributes:
exported_at: POSIXct timestamp of export.
r4subcore_version: Package version string.
nrow: Number of rows written.
ctx <- suppressMessages(r4sub_run_context("STUDY1", "DEV"))
ev <- suppressMessages(as_evidence(
  data.frame(
    asset_type = "validation",
    asset_id = "ADSL",
    source_name = "pinnacle21",
    indicator_id = "SD0001",
    indicator_name = "SD0001",
    indicator_domain = "quality",
    severity = "high",
    result = "fail",
    stringsAsFactors = FALSE
  ),
  ctx = ctx
))
tmp_csv <- tempfile(fileext = ".csv")
out <- suppressMessages(export_evidence(ev, tmp_csv, format = "csv"))
file.exists(out)
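Returning the file path with metadata attached as attributes is a plain base R pattern. The sketch below is an assumption-labeled illustration for CSV only: `sketch_export_csv` is a hypothetical name, it skips schema validation, and it attaches only two of the three documented attributes.

```r
sketch_export_csv <- function(evidence, file) {
  stopifnot(is.data.frame(evidence), is.character(file), length(file) == 1L)
  utils::write.csv(evidence, file, row.names = FALSE)
  # Attach traceability metadata to the path itself.
  attr(file, "exported_at") <- Sys.time()
  attr(file, "nrow") <- nrow(evidence)
  invisible(file)
}

# Hypothetical two-row table standing in for validated evidence.
ev <- data.frame(
  asset_id = c("ADSL", "ADAE"),
  result   = c("fail", "pass"),
  stringsAsFactors = FALSE
)
out <- sketch_export_csv(ev, tempfile(fileext = ".csv"))
attr(out, "nrow")
file.exists(out)
```

Because the return value is still a character vector, it can be passed directly to functions such as `file.exists()` or `read.csv()`.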
Creates a deterministic hash from one or more character inputs. Uses MD5 via base R's digest-like approach for a lightweight, dependency-free implementation.
hash_id(..., prefix = NULL)

| ... | Character values to hash together. Concatenated with |
|---|---|
| prefix | Optional prefix prepended to the hash (e.g., |

A character string of the form prefix-hexhash or just hexhash.
hash_id("ADSL", "rule_001")
hash_id("my_study", "2024-01-01", prefix = "RUN")
Reads an evidence table that was previously saved by export_evidence(),
then validates it against the evidence schema.
import_evidence(file, format = c("csv", "rds", "json"))

| file | Character. Path to the file to read. |
|---|---|
| format | Character. One of |

A validated evidence data.frame conforming to the evidence schema.
ctx <- suppressMessages(r4sub_run_context("STUDY1", "DEV"))
ev <- suppressMessages(as_evidence(
  data.frame(
    asset_type = "validation",
    asset_id = "ADSL",
    source_name = "pinnacle21",
    indicator_id = "SD0001",
    indicator_name = "SD0001",
    indicator_domain = "quality",
    severity = "high",
    result = "fail",
    stringsAsFactors = FALSE
  ),
  ctx = ctx
))
tmp_rds <- tempfile(fileext = ".rds")
suppressMessages(export_evidence(ev, tmp_rds, format = "rds"))
ev2 <- suppressMessages(import_evidence(tmp_rds, format = "rds"))
nrow(ev2)
Converts an R object to a valid JSON string. Returns "{}" on failure
or for NULL/empty inputs.
json_safely(x)

| x | An R object to serialize. |
|---|---|

A single character string containing valid JSON.
json_safely(list(a = 1, b = "hello"))
json_safely(NULL)
Applies min-max normalization to a numeric vector, optionally clamping values to [0, 1].
normalize_01(x, direction = c("higher_better", "lower_better"), clamp = TRUE)

| x | Numeric vector. |
|---|---|
| direction | Character. |
| clamp | Logical. If |

Numeric vector normalized to 0–1.
normalize_01(c(10, 20, 30, 40, 50))
normalize_01(c(10, 20, 30), direction = "lower_better")
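Min-max normalization with a direction flag can be sketched in a few lines of base R. The name `sketch_normalize_01` is hypothetical, and two behaviors are assumptions rather than documented facts: `lower_better` is taken to mean the scaled values are flipped with `1 - x`, and a constant input returns NA.

```r
sketch_normalize_01 <- function(x,
                                direction = c("higher_better", "lower_better"),
                                clamp = TRUE) {
  direction <- match.arg(direction)
  rng <- range(x, na.rm = TRUE)
  # ASSUMPTION: a constant vector has no spread, so this sketch returns NA.
  if (rng[1] == rng[2]) return(rep(NA_real_, length(x)))
  scaled <- (x - rng[1]) / (rng[2] - rng[1])
  # ASSUMPTION: "lower_better" inverts the scale so the smallest input scores 1.
  if (direction == "lower_better") scaled <- 1 - scaled
  # With bounds taken from x itself this is a no-op; kept to mirror the signature.
  if (clamp) scaled <- pmin(pmax(scaled, 0), 1)
  scaled
}

sketch_normalize_01(c(10, 20, 30, 40, 50))
# 0.00 0.25 0.50 0.75 1.00
sketch_normalize_01(c(10, 20, 30), direction = "lower_better")
# 1.0 0.5 0.0
```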
Converts a data.frame of Pinnacle21-style validation results into the standard evidence table format. Column names are detected case-insensitively.
p21_to_evidence(
  p21_df,
  ctx,
  asset_type = "validation",
  source_version = NULL,
  default_domain = "quality"
)

| p21_df | A data.frame containing Pinnacle21 validation output. Expected columns (case-insensitive): |
|---|---|
| ctx | An r4sub_run_context providing run and study metadata. |
| asset_type | Character. Asset type label. Default: |
| source_version | Character or |
| default_domain | Character. Indicator domain. Default: |

A data.frame conforming to the evidence schema.
p21_raw <- data.frame(
  Rule = c("SD0001", "SD0002"),
  Message = c("Missing variable label", "Invalid format"),
  Severity = c("Error", "Warning"),
  Dataset = c("ADSL", "ADAE"),
  Variable = c("AGE", "AESTDTC"),
  Status = c("Failed", "Warning"),
  stringsAsFactors = FALSE
)
ctx <- r4sub_run_context("STUDY1", "DEV")
ev <- p21_to_evidence(p21_raw, ctx)
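Case-insensitive column detection, as described for this parser, can be sketched by comparing lower-cased names. The helper `find_col` and the mixed-case input below are hypothetical; how the package handles a missing column is not stated here, so the sketch simply returns NA.

```r
# Hypothetical input with inconsistent column-name casing.
p21_mixed <- data.frame(
  RULE     = c("SD0001", "SD0002"),
  severity = c("Error", "Warning"),
  DataSet  = c("ADSL", "ADAE"),
  stringsAsFactors = FALSE
)

# Return the actual column name matching `name`, ignoring case.
find_col <- function(df, name) {
  hit <- which(tolower(names(df)) == tolower(name))
  if (length(hit) == 0L) return(NA_character_)
  names(df)[hit[1L]]
}

wanted <- c("rule", "severity", "dataset", "status")
resolved <- vapply(wanted, function(nm) find_col(p21_mixed, nm), character(1))
resolved
```

Here `rule`, `severity`, and `dataset` resolve to `RULE`, `severity`, and `DataSet`, while `status` resolves to NA because the column is absent.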
A run context captures metadata for a particular evidence collection run.
It provides a unique run_id, study identifier, environment label, and
timestamps used throughout evidence ingestion.
r4sub_run_context(
  study_id,
  environment = c("DEV", "UAT", "PROD"),
  user = NULL,
  run_id = NULL,
  timestamp = Sys.time()
)

| study_id | Character. Study identifier (e.g., |
|---|---|
| environment | Character. One of |
| user | Character or |
| run_id | Character or |
| timestamp | POSIXct. Defaults to current time. |

A list of class r4sub_run_context with elements:
run_id, study_id, environment, user, created_at.
ctx <- r4sub_run_context(study_id = "STUDY001", environment = "DEV")
ctx$run_id
ctx$study_id
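A run context of this shape is a classed list, which can be sketched with `structure()`. Everything not visible in the signature above is an assumption: the constructor name `sketch_run_context`, the class name, the fallback user from `Sys.info()`, and the run_id format are illustrative only.

```r
sketch_run_context <- function(study_id,
                               environment = c("DEV", "UAT", "PROD"),
                               user = NULL,
                               run_id = NULL,
                               timestamp = Sys.time()) {
  stopifnot(is.character(study_id), length(study_id) == 1L)
  environment <- match.arg(environment)  # rejects labels outside the set
  # ASSUMPTION: fall back to the OS user name when none is given.
  if (is.null(user)) user <- Sys.info()[["user"]]
  # ASSUMPTION: illustrative run_id format; the package generates its own.
  if (is.null(run_id)) {
    run_id <- paste("RUN", study_id, format(timestamp, "%Y%m%dT%H%M%S"), sep = "-")
  }
  structure(
    list(
      run_id = run_id,
      study_id = study_id,
      environment = environment,
      user = user,
      created_at = timestamp
    ),
    class = "sketch_run_context"
  )
}

ctx_sketch <- sketch_run_context("STUDY001", "DEV")
names(ctx_sketch)
```

Using `match.arg()` means an unsupported environment label fails immediately rather than propagating into downstream evidence rows.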
Adds an indicator definition to the local in-memory registry.
register_indicator(
  indicator_id,
  domain,
  description,
  expected_inputs = character(0),
  default_thresholds = numeric(0),
  tags = character(0)
)

| indicator_id | Character. Stable identifier for the indicator. |
|---|---|
| domain | Character. One of |
| description | Character. Human-readable description. |
| expected_inputs | Character vector. Evidence source types this indicator expects. |
| default_thresholds | Named numeric vector. Optional thresholds. |
| tags | Character vector. Optional tags (e.g., |

The indicator definition list, invisibly.
register_indicator(
  indicator_id = "P21-001",
  domain = "quality",
  description = "Required variable is missing from dataset"
)
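An in-memory registry is commonly built on an R environment keyed by identifier. The sketch below shows that pattern under stated assumptions: `.sketch_registry` and `sketch_register` are hypothetical names, only the three required fields are stored, and no domain validation is performed.

```r
# A private environment acts as the mutable, session-local registry.
.sketch_registry <- new.env(parent = emptyenv())

sketch_register <- function(indicator_id, domain, description) {
  def <- list(
    indicator_id = indicator_id,
    domain = domain,
    description = description
  )
  assign(indicator_id, def, envir = .sketch_registry)
  invisible(def)  # mirror the documented "invisibly" return
}

sketch_register(
  indicator_id = "P21-001",
  domain = "quality",
  description = "Required variable is missing from dataset"
)

# Look the definition back up by its identifier.
get("P21-001", envir = .sketch_registry)$domain
```

Because the registry lives in memory, definitions registered this way do not persist across R sessions.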
Converts canonical result labels to numeric scores.
result_to_score(result)

| result | Character vector of canonical result values ( |
|---|---|

Numeric vector: pass=1, warn=0.5, fail=0, na=NA.
result_to_score(c("pass", "fail", "warn", "na"))
Converts canonical severity labels to numeric penalty multipliers on a 0–1 scale.
severity_to_weight(severity)

| severity | Character vector of canonical severity values ( |
|---|---|

Default mapping:
info = 0.00
low = 0.25
medium = 0.50
high = 0.75
critical = 1.00
Numeric vector of weights.
severity_to_weight(c("low", "high", "critical"))
Checks that a data.frame conforms to the evidence schema. Verifies column presence, types, and controlled vocabulary values.
validate_evidence(ev)

| ev | A data.frame to validate. |
|---|---|

TRUE invisibly if valid; throws an error otherwise.
ctx <- suppressMessages(r4sub_run_context("STUDY1", "DEV"))
ev <- suppressMessages(as_evidence(
  data.frame(
    asset_type = "validation",
    asset_id = "ADSL",
    source_name = "pinnacle21",
    indicator_id = "SD0001",
    indicator_name = "SD0001",
    indicator_domain = "quality",
    severity = "high",
    result = "fail",
    stringsAsFactors = FALSE
  ),
  ctx = ctx
))
validate_evidence(ev)
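The three checks named above (column presence, types, controlled vocabulary) can be sketched against a schema shaped like the list that evidence_schema() is documented to return. The two-column `mini_schema` and the function `sketch_validate` are hypothetical; only the allowed values for `result` are taken from this manual.

```r
# Hypothetical two-column schema using the documented type/allowed layout.
mini_schema <- list(
  asset_id = list(type = "character"),
  result   = list(type = "character", allowed = c("pass", "fail", "warn", "na"))
)

sketch_validate <- function(ev, schema) {
  if (!is.data.frame(ev)) stop("`ev` must be a data.frame")
  # 1. Column presence
  missing_cols <- setdiff(names(schema), names(ev))
  if (length(missing_cols) > 0L) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  for (col in names(schema)) {
    spec <- schema[[col]]
    # 2. Column type
    if (!inherits(ev[[col]], spec$type)) {
      stop("Column `", col, "` must be of type ", spec$type)
    }
    # 3. Controlled vocabulary, where one is defined
    if (!is.null(spec$allowed)) {
      bad <- setdiff(unique(ev[[col]]), spec$allowed)
      if (length(bad) > 0L) {
        stop("Column `", col, "` has invalid values: ", paste(bad, collapse = ", "))
      }
    }
  }
  invisible(TRUE)
}

ok <- data.frame(asset_id = "ADSL", result = "fail", stringsAsFactors = FALSE)
sketch_validate(ok, mini_schema)
```

Failing fast with `stop()` matches the documented contract of returning TRUE invisibly or throwing an error.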
Checks that an indicator definition list is well-formed.
validate_indicator(indicator)

| indicator | A list with required fields: |
|---|---|

TRUE invisibly if valid; throws an error otherwise.
validate_indicator(list(
  indicator_id = "P21-001",
  domain = "quality",
  description = "Missing required variable"
))