| Title: | Traceability Engine for Clinical Submission Readiness |
|---|---|
| Description: | Quantifies and explains end-to-end traceability between clinical submission artifacts (ADaM (Analysis Data Model) outputs, derivations, SDTM (Study Data Tabulation Model) sources, specs, code). Builds trace models from metadata and mapping sheets, computes trace levels, and emits standardized R4SUB (R for Regulatory Submission) evidence table rows via 'r4subcore'. |
| Authors: | Pawan Rama Mali [aut, cre, cph] (ORCID: <https://orcid.org/0000-0001-7864-5819>) |
| Maintainer: | Pawan Rama Mali <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.1 |
| Built: | 2026-05-15 09:42:35 UTC |
| Source: | https://github.com/r4sub/r4subtrace |
Constructs a directed trace model (nodes + edges + diagnostics) from ADaM metadata, SDTM metadata, and an optional mapping sheet.

    build_trace_model(
      adam_meta,
      sdtm_meta,
      mapping = NULL,
      spec = NULL,
      config = trace_config_default()
    )

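
| Argument | Description |
|---|---|
| adam_meta | A data.frame of ADaM variable metadata. Must contain at least the columns dataset and variable. |
| sdtm_meta | A data.frame of SDTM variable metadata. Must contain at least the columns dataset and variable. |
| mapping | An optional data.frame describing ADaM-to-SDTM mappings. Must contain the columns adam_dataset, adam_var, sdtm_domain, and sdtm_var. |
| spec | Reserved for future use (ADaM spec ingestion). |
| config | A trace_config object, as returned by trace_config_default(). |
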
A list of class "trace_model" with elements:
nodes: tibble of asset nodes (datasets and variables)
edges: tibble of relationships between nodes
diagnostics: list of tibbles (orphans, ambiguities, conflicts)
config: the configuration used

    adam_meta <- data.frame(
      dataset = "ADSL",
      variable = c("STUDYID", "USUBJID", "AGE"),
      label = c("Study ID", "Unique Subject ID", "Age")
    )
    sdtm_meta <- data.frame(
      dataset = "DM",
      variable = c("STUDYID", "USUBJID", "AGE"),
      label = c("Study ID", "Unique Subject ID", "Age")
    )
    map <- data.frame(
      adam_dataset = "ADSL",
      adam_var = c("STUDYID", "USUBJID", "AGE"),
      sdtm_domain = "DM",
      sdtm_var = c("STUDYID", "USUBJID", "AGE")
    )
    tm <- build_trace_model(adam_meta, sdtm_meta, mapping = map)
    tm$nodes
    tm$edges

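
To make the "nodes + edges + diagnostics" idea concrete, here is a minimal base R sketch of how such a model could be laid out. It is not the package's internal representation: the node id scheme, the edge direction (SDTM source to ADaM variable), and the helper node_id() are assumptions made for illustration only.

```r
# Hypothetical sketch: nodes and edges as plain data.frames.
adam_meta <- data.frame(
  dataset = "ADSL",
  variable = c("STUDYID", "USUBJID", "AGEGR1"),
  stringsAsFactors = FALSE
)
sdtm_meta <- data.frame(
  dataset = "DM",
  variable = c("STUDYID", "USUBJID"),
  stringsAsFactors = FALSE
)
map <- data.frame(
  adam_dataset = "ADSL",
  adam_var = c("STUDYID", "USUBJID"),
  sdtm_domain = "DM",
  sdtm_var = c("STUDYID", "USUBJID"),
  stringsAsFactors = FALSE
)

# Assumed node id scheme: "<DATASET>.<VARIABLE>".
node_id <- function(dataset, variable) paste(dataset, variable, sep = ".")

nodes <- rbind(
  data.frame(id = node_id(adam_meta$dataset, adam_meta$variable),
             kind = "adam", stringsAsFactors = FALSE),
  data.frame(id = node_id(sdtm_meta$dataset, sdtm_meta$variable),
             kind = "sdtm", stringsAsFactors = FALSE)
)

# Assumed direction: each edge points from the SDTM source to the ADaM variable.
edges <- data.frame(
  from = node_id(map$sdtm_domain, map$sdtm_var),
  to = node_id(map$adam_dataset, map$adam_var),
  stringsAsFactors = FALSE
)

# Orphans: ADaM nodes that no edge points to.
adam_ids <- nodes$id[nodes$kind == "adam"]
orphans <- setdiff(adam_ids, edges$to)
orphans
```

In this toy input, AGEGR1 has no mapping row, so it is the only orphan.
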
Assigns a traceability level (L0–L3) to each ADaM variable in the trace model based on available mapping, derivation text, and confidence scores.

    compute_trace_levels(trace_model)


| Argument | Description |
|---|---|
| trace_model | A trace_model object, as returned by build_trace_model(). |

Trace levels:
L0: No mapping and no derivation text.
L1: Derivation text present but no SDTM mapping.
L2: Mapping to SDTM variable/domain exists.
L3: Mapping exists AND (confidence >= threshold OR derivation text present alongside mapping).
A tibble with columns: adam_dataset, adam_var, trace_level,
has_mapping, has_derivation_text, n_candidates, max_confidence.

    adam_meta <- data.frame(
      dataset = "ADSL",
      variable = c("STUDYID", "USUBJID", "AGE", "AGEGR1"),
      label = c("Study ID", "Unique Subject ID", "Age", "Age Group")
    )
    sdtm_meta <- data.frame(
      dataset = "DM",
      variable = c("STUDYID", "USUBJID", "AGE"),
      label = c("Study ID", "Unique Subject ID", "Age")
    )
    map <- data.frame(
      adam_dataset = "ADSL",
      adam_var = c("STUDYID", "USUBJID", "AGE"),
      sdtm_domain = "DM",
      sdtm_var = c("STUDYID", "USUBJID", "AGE"),
      confidence = c(1.0, 1.0, 0.9)
    )
    tm <- build_trace_model(adam_meta, sdtm_meta, mapping = map)
    compute_trace_levels(tm)

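
The L0 to L3 rules listed for compute_trace_levels() can be written as a small decision function. The sketch below is a base R illustration of those documented rules only; classify_trace_level() is a hypothetical helper, not part of the package, and treating a missing confidence as "threshold not met" is an assumption.

```r
# Hypothetical helper illustrating the documented L0-L3 rules.
classify_trace_level <- function(has_mapping, has_derivation_text,
                                 max_confidence, threshold = 0.8) {
  # Assumption: a missing confidence never satisfies the threshold.
  confident <- !is.na(max_confidence) & max_confidence >= threshold
  ifelse(
    has_mapping & (confident | has_derivation_text), "L3",
    ifelse(has_mapping, "L2",
           ifelse(has_derivation_text, "L1", "L0"))
  )
}

vars <- data.frame(
  adam_var = c("STUDYID", "AGE", "AGEGR1", "TRT01P"),
  has_mapping = c(TRUE, TRUE, FALSE, FALSE),
  has_derivation_text = c(FALSE, FALSE, TRUE, FALSE),
  max_confidence = c(1.0, 0.5, NA, NA),
  stringsAsFactors = FALSE
)
vars$trace_level <- classify_trace_level(
  vars$has_mapping, vars$has_derivation_text, vars$max_confidence
)
vars[, c("adam_var", "trace_level")]
```

The L3 test comes first because every L3 variable also satisfies the L2 condition; checking the stricter rule first keeps the levels mutually exclusive.
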
Print Trace Model

    ## S3 method for class 'trace_model'
    print(x, ...)


| Argument | Description |
|---|---|
| x | A trace_model object. |
| ... | Ignored. |

Returns a list of default configuration values for trace model building and evidence emission.

    trace_config_default(
      severity_by_level = c(L0 = "high", L1 = "medium", L2 = "low", L3 = "info"),
      result_by_level = c(L0 = "fail", L1 = "warn", L2 = "warn", L3 = "pass"),
      confidence_threshold_L3 = 0.8,
      uppercase_datasets = TRUE
    )

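
| Argument | Description |
|---|---|
| severity_by_level | Named character vector mapping trace levels to severity. |
| result_by_level | Named character vector mapping trace levels to result. |
| confidence_threshold_L3 | Numeric threshold for L3 classification. A mapping with confidence >= this value qualifies for L3; per the trace-level rules, a mapping accompanied by derivation text also qualifies. |
| uppercase_datasets | Logical; if TRUE, dataset and domain names are uppercased. |
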
A list of class "trace_config" with elements:
severity_by_level, result_by_level, confidence_threshold_L3,
uppercase_datasets.

    cfg <- trace_config_default()
    cfg$severity_by_level

    # Override a single setting
    cfg2 <- trace_config_default(confidence_threshold_L3 = 0.9)

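
The two named vectors in the configuration act as lookup tables keyed by trace level. A minimal base R sketch of that lookup follows, using the default values from the usage above; the findings data.frame and its column names are illustrative assumptions, not the package's evidence schema.

```r
# Defaults copied from the trace_config_default() usage.
severity_by_level <- c(L0 = "high", L1 = "medium", L2 = "low", L3 = "info")
result_by_level <- c(L0 = "fail", L1 = "warn", L2 = "warn", L3 = "pass")

# Example trace levels for three variables (made-up input).
levels_found <- c("L3", "L2", "L0")

# Indexing a named vector by name translates each level in one step.
findings <- data.frame(
  trace_level = levels_found,
  severity = unname(severity_by_level[levels_found]),
  result = unname(result_by_level[levels_found]),
  stringsAsFactors = FALSE
)
findings
```

Overriding one entry of either vector changes how that level is reported without touching the classification itself.
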
Computes summary metrics from evidence rows generated by
trace_model_to_evidence(). Returns key traceability indicators.

    trace_indicator_scores(evidence)


| Argument | Description |
|---|---|
| evidence | A data.frame of evidence rows, as generated by trace_model_to_evidence(). |

A tibble with columns: indicator, value, description.

    library(r4subcore)
    ctx <- r4sub_run_context(study_id = "TEST001", environment = "DEV")
    adam_meta <- data.frame(
      dataset = "ADSL",
      variable = c("STUDYID", "AGE", "AGEGR1"),
      label = c("Study ID", "Age", "Age Group")
    )
    sdtm_meta <- data.frame(
      dataset = "DM",
      variable = c("STUDYID", "AGE"),
      label = c("Study ID", "Age")
    )
    map <- data.frame(
      adam_dataset = "ADSL",
      adam_var = c("STUDYID", "AGE"),
      sdtm_domain = "DM",
      sdtm_var = c("STUDYID", "AGE")
    )
    tm <- build_trace_model(adam_meta, sdtm_meta, mapping = map)
    ev <- trace_model_to_evidence(tm, ctx = ctx)
    trace_indicator_scores(ev)

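
The source does not list which indicators trace_indicator_scores() reports. As a rough illustration of the documented output shape (columns indicator, value, description), the base R sketch below computes one hypothetical indicator from a vector of trace levels; the name share_mapped and its definition are assumptions, not the package's actual metrics.

```r
# Made-up trace levels for four ADaM variables.
trace_levels <- c("L3", "L3", "L2", "L0")

# Hypothetical indicator: share of variables with an SDTM mapping (L2 or L3).
mapped_share <- mean(trace_levels %in% c("L2", "L3"))

scores <- data.frame(
  indicator = "share_mapped",
  value = mapped_share,
  description = "Proportion of ADaM variables at trace level L2 or L3",
  stringsAsFactors = FALSE
)
scores
```
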
Emits evidence rows compatible with r4subcore::validate_evidence() for
each ADaM variable's trace level, plus diagnostic rows for orphans,
ambiguities, and conflicts.

    trace_model_to_evidence(
      trace_model,
      ctx,
      source_name = "r4subtrace",
      source_version = NULL
    )

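
| Argument | Description |
|---|---|
| trace_model | A trace_model object, as returned by build_trace_model(). |
| ctx | An r4subcore run context, as created by r4subcore::r4sub_run_context(). |
| source_name | Character; the name of the evidence source. |
| source_version | Character or NULL; the version of the evidence source. |
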
A data.frame of evidence rows passing r4subcore::validate_evidence().

    library(r4subcore)
    ctx <- r4sub_run_context(study_id = "TEST001", environment = "DEV")
    adam_meta <- data.frame(
      dataset = "ADSL",
      variable = c("STUDYID", "AGE"),
      label = c("Study ID", "Age")
    )
    sdtm_meta <- data.frame(
      dataset = "DM",
      variable = c("STUDYID", "AGE"),
      label = c("Study ID", "Age")
    )
    map <- data.frame(
      adam_dataset = "ADSL",
      adam_var = c("STUDYID", "AGE"),
      sdtm_domain = "DM",
      sdtm_var = c("STUDYID", "AGE")
    )
    tm <- build_trace_model(adam_meta, sdtm_meta, mapping = map)
    ev <- trace_model_to_evidence(tm, ctx = ctx)
    r4subcore::validate_evidence(ev)

Checks that a mapping data.frame contains the required columns
(adam_dataset, adam_var, sdtm_domain, sdtm_var) and canonicalizes
names, trims whitespace, and optionally uppercases dataset/domain names.

    validate_mapping(df, uppercase_datasets = TRUE)


| Argument | Description |
|---|---|
| df | A data.frame describing ADaM-to-SDTM variable mappings. |
| uppercase_datasets | Logical; if TRUE, dataset and domain names are uppercased. |

A tibble with canonicalized column names and values.

    map <- data.frame(
      ADAM_DATASET = "adsl",
      ADAM_VAR = "AGE",
      SDTM_DOMAIN = "dm",
      SDTM_VAR = "AGE"
    )
    validate_mapping(map)

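
The canonicalization steps described for validate_mapping() (required-column check, lowercase column names, whitespace trimming, optional uppercasing of dataset and domain names) can be approximated in base R as follows. This is a sketch under stated assumptions, not the package's implementation: canonicalize_mapping() is a hypothetical name, it returns a plain data.frame rather than a tibble, and the error message wording is invented.

```r
# Hypothetical base R approximation of the documented behaviour.
canonicalize_mapping <- function(df, uppercase_datasets = TRUE) {
  # Canonicalize column names: trim and lowercase.
  names(df) <- tolower(trimws(names(df)))

  # Fail early if any required column is absent.
  required <- c("adam_dataset", "adam_var", "sdtm_domain", "sdtm_var")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }

  # Trim whitespace in the required columns.
  df[required] <- lapply(df[required], function(x) trimws(as.character(x)))

  # Optionally uppercase dataset and domain names only.
  if (uppercase_datasets) {
    df$adam_dataset <- toupper(df$adam_dataset)
    df$sdtm_domain <- toupper(df$sdtm_domain)
  }
  df
}

map <- data.frame(
  ADAM_DATASET = " adsl", ADAM_VAR = "AGE",
  SDTM_DOMAIN = "dm ", SDTM_VAR = "AGE",
  stringsAsFactors = FALSE
)
out <- canonicalize_mapping(map)
out
```

Normalizing names before checking for required columns means input sheets with headers such as ADAM_DATASET are accepted without a separate renaming step.
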
Checks that an ADaM or SDTM metadata data.frame contains the required
columns (dataset, variable) and canonicalizes column names to lowercase.

    validate_metadata(df, kind = c("adam", "sdtm"))


| Argument | Description |
|---|---|
| df | A data.frame of dataset metadata. |
| kind | Character; either "adam" or "sdtm". |

A tibble with canonicalized column names.

    meta <- data.frame(DATASET = "ADSL", VARIABLE = "SUBJID", LABEL = "Subject ID")
    validate_metadata(meta, kind = "adam")