Getting Started with xportr

This demo uses a small ADSL dataset that is part of the {admiral} package. The script that generates this ADSL dataset can be created with the command admiral::use_ad_template("adsl").

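The ADSL has 306 observations of 50 variables, stored as character, numeric, date, datetime and factor columns. Many of the variables have no label.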

To create a fully compliant v5 xpt ADSL dataset that was developed in R, we need to apply the 6 main functions of the xportr package: xportr_type(), xportr_length(), xportr_order(), xportr_format(), xportr_label() and xportr_write().

# Loading packages
library(dplyr)
library(labelled)
library(xportr)
library(admiral)

# Loading in our example data
adsl <- admiral::admiral_adsl




Preparing your Specification Files


To use the functions within xportr, you will need an R data frame that contains your specification file. After loading your spec files, you will most likely need to pre-process them so that they work with the xportr functions. Please see our example spec sheet at system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr") to see how xportr expects the specification sheets to be laid out.


var_spec <- readxl::read_xlsx(
  system.file(paste0("specs/", "ADaM_admiral_spec.xlsx"), package = "xportr"),
  sheet = "Variables"
) %>%
  dplyr::rename(type = "Data Type") %>%
  rlang::set_names(tolower)
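The same pre-processing can be tried on a small hand-built data frame, without the Excel file. This is only a sketch in base R: the column names Dataset and Variable, and the values in the type column, are assumptions made for illustration and are not taken from the example spec sheet.

```r
# Toy stand-in for a "Variables" sheet (hypothetical column names and values).
raw_spec <- data.frame(
  Dataset = c("ADSL", "ADSL", "ADSL"),
  Variable = c("STUDYID", "AGE", "TRTSDT"),
  Label = c("Study Identifier", "Age", "Date of First Exposure to Treatment"),
  `Data Type` = c("text", "integer", "integer"),
  Length = c(21, 8, 8),
  Order = c(1, 2, 3),
  Format = c(NA, NA, "DATE9."),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Rename the type column, then lower-case every column name,
# mirroring the dplyr::rename() and rlang::set_names(tolower) steps.
toy_spec <- raw_spec
names(toy_spec)[names(toy_spec) == "Data Type"] <- "type"
names(toy_spec) <- tolower(names(toy_spec))

names(toy_spec)
```

After these two steps the toy spec has the lower-case label, type, length, order and format columns that the function calls in the rest of this page rely on.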
  


Below is a quick snapshot of the specification file pertaining to the ADSL data set, which we will make use of in the 6 xportr function calls below. Take note of the order, label, type, length and format columns.



xportr_type()


To comply with the transport v5 specification, an xpt file can contain only two data types: character and numeric (dbl). Currently the ADSL dataset has chr, dbl, dttm, fct and date columns.

look_for(adsl, details = TRUE)
   pos variable label                      col_type values                    
   1   STUDYID  Study Identifier           chr      range: CDISCPILOT01 - CDI~
   2   USUBJID  Unique Subject Identifier  chr      range: 01-701-1015 - 01-7~
   3   SUBJID   Subject Identifier for th~ chr      range: 1001 - 1448        
   4   RFSTDTC  Subject Reference Start D~ chr      range: 2012-07-09 - 2014-~
   5   RFENDTC  Subject Reference End Dat~ chr      range: 2012-09-01 - 2015-~
   6   RFXSTDTC Date/Time of First Study ~ chr      range: 2012-07-09 - 2014-~
   7   RFXENDTC Date/Time of Last Study T~ chr      range: 2012-08-28 - 2015-~
   8   RFICDTC  Date/Time of Informed Con~ chr      range:                    
   9   RFPENDTC Date/Time of End of Parti~ chr      range: 2012-08-13 - 2015-~
   10  DTHDTC   Date/Time of Death         chr      range: 2013-01-14 - 2014-~
   11  DTHFL    Subject Death Flag         chr      range: Y - Y              
   12  SITEID   Study Site Identifier      chr      range: 701 - 718          
   13  AGE      Age                        dbl      range: 50 - 89            
   14  AGEU     Age Units                  chr      range: YEARS - YEARS      
   15  SEX      Sex                        chr      range: F - M              
   16  RACE     Race                       chr      range: AMERICAN INDIAN OR~
   17  ETHNIC   Ethnicity                  chr      range: HISPANIC OR LATINO~
   18  ARMCD    Planned Arm Code           chr      range: Pbo - Xan_Lo       
   19  ARM      Description of Planned Arm chr      range: Placebo - Xanomeli~
   20  ACTARMCD Actual Arm Code            chr      range: Pbo - Xan_Lo       
   21  ACTARM   Description of Actual Arm  chr      range: Placebo - Xanomeli~
   22  COUNTRY  Country                    chr      range: USA - USA          
   23  DMDTC    Date/Time of Collection    chr      range: 2012-07-06 - 2014-~
   24  DMDY     Study Day of Collection    dbl      range: -37 - -2           
   25  TRT01P   Description of Planned Arm chr      range: Placebo - Xanomeli~
   26  TRT01A   Description of Actual Arm  chr      range: Placebo - Xanomeli~
   27  TRTSDTM  —                          dttm     range: 2012-07-09 - 2014-~
   28  TRTSTMF  —                          chr      range: H - H              
   29  TRTEDTM  —                          dttm     range: 2012-08-28 23:59:5~
   30  TRTETMF  —                          chr      range: H - H              
   31  TRTSDT   —                          date     range: 2012-07-09 - 2014-~
   32  TRTEDT   —                          date     range: 2012-08-28 - 2015-~
   33  TRTDURD  —                          dbl      range: 1 - 212            
   34  SCRFDT   —                          date     range: 2012-08-13 - 2014-~
   35  EOSDT    —                          date     range: 2012-09-01 - 2015-~
   36  EOSSTT   —                          chr      range: COMPLETED - DISCON~
   37  FRVDT    —                          date     range: 2013-02-18 - 2014-~
   38  RANDDT   —                          date     range: 2012-07-09 - 2014-~
   39  DTHDT    —                          date     range: 2013-01-14 - 2014-~
   40  DTHADY   —                          dbl      range: 12 - 175           
   41  LDDTHELD —                          dbl      range: 0 - 2              
   42  LSTALVDT —                          date     range: 2012-09-01 - 2015-~
   43  AGEGR1   —                          fct      <18                       
                                                    18-64                     
                                                    >=65                      
   44  SAFFL    —                          chr      range: Y - Y              
   45  RACEGR1  —                          chr      range: Non-white - White  
   46  REGION1  —                          chr      range: NA - NA            
   47  LDDTHGR1 —                          chr      range: <= 30 - <= 30      
   48  DTH30FL  —                          chr      range: Y - Y              
   49  DTHA30FL —                          chr      range:                    
   50  DTHB30FL —                          chr      range: Y - Y


Using xportr_type() and the supplied specification file, we can coerce the variables in the ADSL dataset to be either numeric or character.


adsl_type <- xportr_type(adsl, var_spec, domain = "ADSL", verbose = "message") 


Now all appropriate types have been applied to the dataset as seen below.

look_for(adsl_type, details = TRUE)
   pos variable label col_type values                        
   1   STUDYID  —     dbl      range:                        
   2   USUBJID  —     dbl      range:                        
   3   SUBJID   —     dbl      range: 1001 - 1448            
   4   RFSTDTC  —     dbl      range:                        
   5   RFENDTC  —     dbl      range:                        
   6   RFXSTDTC —     dbl      range:                        
   7   RFXENDTC —     dbl      range:                        
   8   RFICDTC  —     dbl      range:                        
   9   RFPENDTC —     dbl      range:                        
   10  DTHDTC   —     dbl      range:                        
   11  DTHFL    —     dbl      range:                        
   12  SITEID   —     dbl      range: 701 - 718              
   13  AGE      —     dbl      range: 50 - 89                
   14  AGEU     —     dbl      range:                        
   15  SEX      —     dbl      range:                        
   16  RACE     —     dbl      range:                        
   17  ETHNIC   —     dbl      range:                        
   18  ARMCD    —     dbl      range:                        
   19  ARM      —     dbl      range:                        
   20  ACTARMCD —     dbl      range:                        
   21  ACTARM   —     dbl      range:                        
   22  COUNTRY  —     dbl      range:                        
   23  DMDTC    —     dbl      range:                        
   24  DMDY     —     dbl      range: -37 - -2               
   25  TRT01P   —     dbl      range:                        
   26  TRT01A   —     dbl      range:                        
   27  TRTSDTM  —     dbl      range: 1341792000 - 1409616000
   28  TRTSTMF  —     chr      range: H - H                  
   29  TRTEDTM  —     dbl      range: 1346198399 - 1425599999
   30  TRTETMF  —     chr      range: H - H                  
   31  TRTSDT   —     dbl      range: 15530 - 16315          
   32  TRTEDT   —     dbl      range: 15580 - 16499          
   33  TRTDURD  —     dbl      range: 1 - 212                
   34  SCRFDT   —     dbl      range: 15565 - 16181          
   35  EOSDT    —     dbl      range: 15584 - 16499          
   36  EOSSTT   —     dbl      range:                        
   37  FRVDT    —     dbl      range: 15754 - 16389          
   38  RANDDT   —     chr      range: 2012-07-09 - 2014-09-02
   39  DTHDT    —     dbl      range: 15719 - 16375          
   40  DTHADY   —     dbl      range: 12 - 175               
   41  LDDTHELD —     dbl      range: 0 - 2                  
   42  LSTALVDT —     dbl      range: 15584 - 16499          
   43  AGEGR1   —     dbl      range: 2 - 3                  
   44  SAFFL    —     dbl      range:                        
   45  RACEGR1  —     dbl      range:                        
   46  REGION1  —     dbl      range:                        
   47  LDDTHGR1 —     dbl      range:                        
   48  DTH30FL  —     dbl      range:                        
   49  DTHA30FL —     dbl      range:                        
   50  DTHB30FL —     dbl      range:
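To make the idea of spec-driven coercion concrete, here is a minimal base R sketch. It is not xportr's implementation: the target vector and the helper coerce_to() are hypothetical names introduced only for this illustration.

```r
# Toy data with the column kinds found in ADSL: chr, dbl, date and factor.
toy <- data.frame(
  SUBJID = c("1015", "1023"),
  AGE = c(63, 64),
  TRTSDT = as.Date(c("2014-01-02", "2012-08-05")),
  AGEGR1 = factor(c("18-64", ">=65"), levels = c("<18", "18-64", ">=65")),
  stringsAsFactors = FALSE
)

# Hypothetical target types, one entry per variable.
target <- c(
  SUBJID = "character", AGE = "numeric",
  TRTSDT = "numeric", AGEGR1 = "character"
)

# Hypothetical helper: coerce a vector to one of the two allowed types.
coerce_to <- function(x, type) {
  if (type == "numeric") as.numeric(x) else as.character(x)
}

toy_typed <- toy
for (v in names(target)) {
  toy_typed[[v]] <- coerce_to(toy[[v]], target[[v]])
}

# Every column is now either "character" or "numeric".
vapply(toy_typed, class, character(1))
```

Dates become the number of days since 1970-01-01 when converted with as.numeric(), and a factor converted with as.numeric() yields its integer level codes rather than its labels, so the target type for a factor deserves a second look.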

xportr_length()


Next we can apply the lengths from a variable-level specification file to the data frame. xportr_length() will identify variables that are missing from your specification file. The function will also tell you how many lengths have been applied successfully. Before we apply the lengths, let's verify that no lengths have been applied to the original data frame.


str(adsl)
  tbl_df [306 × 50] (S3: tbl_df/tbl/data.frame)
   $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ...
    ..- attr(*, "label")= chr "Study Identifier"
   $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ...
    ..- attr(*, "label")= chr "Unique Subject Identifier"
   $ SUBJID  : chr [1:306] "1015" "1023" "1028" "1033" ...
    ..- attr(*, "label")= chr "Subject Identifier for the Study"
   $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Subject Reference Start Date/Time"
   $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ...
    ..- attr(*, "label")= chr "Subject Reference End Date/Time"
   $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Date/Time of First Study Treatment"
   $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ...
    ..- attr(*, "label")= chr "Date/Time of Last Study Treatment"
   $ RFICDTC : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Informed Consent"
   $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ...
    ..- attr(*, "label")= chr "Date/Time of End of Participation"
   $ DTHDTC  : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Death"
   $ DTHFL   : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Subject Death Flag"
   $ SITEID  : chr [1:306] "701" "701" "701" "701" ...
    ..- attr(*, "label")= chr "Study Site Identifier"
   $ AGE     : num [1:306] 63 64 71 74 77 85 59 68 81 84 ...
    ..- attr(*, "label")= chr "Age"
   $ AGEU    : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ...
    ..- attr(*, "label")= chr "Age Units"
   $ SEX     : chr [1:306] "F" "M" "M" "M" ...
    ..- attr(*, "label")= chr "Sex"
   $ RACE    : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ...
    ..- attr(*, "label")= chr "Race"
   $ ETHNIC  : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ...
    ..- attr(*, "label")= chr "Ethnicity"
   $ ARMCD   : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Planned Arm Code"
   $ ARM     : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
   $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Actual Arm Code"
   $ ACTARM  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
   $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ...
    ..- attr(*, "label")= chr "Country"
   $ DMDTC   : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ...
    ..- attr(*, "label")= chr "Date/Time of Collection"
   $ DMDY    : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ...
    ..- attr(*, "label")= chr "Study Day of Collection"
   $ TRT01P  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
   $ TRT01A  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
   $ TRTSDTM : POSIXct[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTSTMF : chr [1:306] "H" "H" "H" "H" ...
   $ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" "2012-09-01 23:59:59" ...
   $ TRTETMF : chr [1:306] "H" "H" "H" "H" ...
   $ TRTSDT  : Date[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTEDT  : Date[1:306], format: "2014-07-02" "2012-09-01" ...
   $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ...
   $ SCRFDT  : Date[1:306], format: NA NA ...
   $ EOSDT   : Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ EOSSTT  : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ...
   $ FRVDT   : Date[1:306], format: NA "2013-02-18" ...
   $ RANDDT  : Date[1:306], format: "2014-01-02" "2012-08-05" ...
   $ DTHDT   : Date[1:306], format: NA NA ...
   $ DTHADY  : num [1:306] NA NA NA NA NA NA NA NA NA NA ...
   $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ...
   $ LSTALVDT: Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ AGEGR1  : Factor w/ 3 levels "<18","18-64",..: 2 2 3 3 3 3 2 3 3 3 ...
   $ SAFFL   : chr [1:306] "Y" "Y" "Y" "Y" ...
   $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ...
   $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ...
   $ LDDTHGR1: chr [1:306] NA NA NA NA ...
   $ DTH30FL : chr [1:306] NA NA NA NA ...
   $ DTHA30FL: chr [1:306] NA NA NA NA ...
   $ DTHB30FL: chr [1:306] NA NA NA NA ...


No lengths have been applied to the variables, as seen in the printout: the lengths would appear in the attr part of each variable. Let's now use xportr_length() to apply our lengths from the specification file.

adsl_length <- adsl %>% xportr_length(var_spec, domain = "ADSL", "message")
  
  ── Variable lengths missing from metadata. ──
  
3 lengths resolved
  Variable(s) present in dataframe but doesn't exist in `metadata`.
  ✖ Problem with `TRTSTMF`, `TRTETMF`, and `RANDDT`


str(adsl_length)
  tbl_df [306 × 50] (S3: tbl_df/tbl/data.frame)
   $ STUDYID : chr [1:306] "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" "CDISCPILOT01" ...
    ..- attr(*, "label")= chr "Study Identifier"
    ..- attr(*, "width")= num 21
   $ USUBJID : chr [1:306] "01-701-1015" "01-701-1023" "01-701-1028" "01-701-1033" ...
    ..- attr(*, "label")= chr "Unique Subject Identifier"
    ..- attr(*, "width")= num 30
   $ SUBJID  : chr [1:306] "1015" "1023" "1028" "1033" ...
    ..- attr(*, "label")= chr "Subject Identifier for the Study"
    ..- attr(*, "width")= num 8
   $ RFSTDTC : chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Subject Reference Start Date/Time"
    ..- attr(*, "width")= num 19
   $ RFENDTC : chr [1:306] "2014-07-02" "2012-09-02" "2014-01-14" "2014-04-14" ...
    ..- attr(*, "label")= chr "Subject Reference End Date/Time"
    ..- attr(*, "width")= num 19
   $ RFXSTDTC: chr [1:306] "2014-01-02" "2012-08-05" "2013-07-19" "2014-03-18" ...
    ..- attr(*, "label")= chr "Date/Time of First Study Treatment"
    ..- attr(*, "width")= num 19
   $ RFXENDTC: chr [1:306] "2014-07-02" "2012-09-01" "2014-01-14" "2014-03-31" ...
    ..- attr(*, "label")= chr "Date/Time of Last Study Treatment"
    ..- attr(*, "width")= num 19
   $ RFICDTC : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Informed Consent"
    ..- attr(*, "width")= num 19
   $ RFPENDTC: chr [1:306] "2014-07-02T11:45" "2013-02-18" "2014-01-14T11:10" "2014-09-15" ...
    ..- attr(*, "label")= chr "Date/Time of End of Participation"
    ..- attr(*, "width")= num 19
   $ DTHDTC  : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Date/Time of Death"
    ..- attr(*, "width")= num 19
   $ DTHFL   : chr [1:306] NA NA NA NA ...
    ..- attr(*, "label")= chr "Subject Death Flag"
    ..- attr(*, "width")= num 2
   $ SITEID  : chr [1:306] "701" "701" "701" "701" ...
    ..- attr(*, "label")= chr "Study Site Identifier"
    ..- attr(*, "width")= num 5
   $ AGE     : num [1:306] 63 64 71 74 77 85 59 68 81 84 ...
    ..- attr(*, "label")= chr "Age"
    ..- attr(*, "width")= num 8
   $ AGEU    : chr [1:306] "YEARS" "YEARS" "YEARS" "YEARS" ...
    ..- attr(*, "label")= chr "Age Units"
    ..- attr(*, "width")= num 10
   $ SEX     : chr [1:306] "F" "M" "M" "M" ...
    ..- attr(*, "label")= chr "Sex"
    ..- attr(*, "width")= num 1
   $ RACE    : chr [1:306] "WHITE" "WHITE" "WHITE" "WHITE" ...
    ..- attr(*, "label")= chr "Race"
    ..- attr(*, "width")= num 60
   $ ETHNIC  : chr [1:306] "HISPANIC OR LATINO" "HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" "NOT HISPANIC OR LATINO" ...
    ..- attr(*, "label")= chr "Ethnicity"
    ..- attr(*, "width")= num 100
   $ ARMCD   : chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Planned Arm Code"
    ..- attr(*, "width")= num 20
   $ ARM     : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
    ..- attr(*, "width")= num 200
   $ ACTARMCD: chr [1:306] "Pbo" "Pbo" "Xan_Hi" "Xan_Lo" ...
    ..- attr(*, "label")= chr "Actual Arm Code"
    ..- attr(*, "width")= num 20
   $ ACTARM  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
    ..- attr(*, "width")= num 200
   $ COUNTRY : chr [1:306] "USA" "USA" "USA" "USA" ...
    ..- attr(*, "label")= chr "Country"
    ..- attr(*, "width")= num 3
   $ DMDTC   : chr [1:306] "2013-12-26" "2012-07-22" "2013-07-11" "2014-03-10" ...
    ..- attr(*, "label")= chr "Date/Time of Collection"
    ..- attr(*, "width")= num 19
   $ DMDY    : num [1:306] -7 -14 -8 -8 -7 -21 NA -9 -13 -7 ...
    ..- attr(*, "label")= chr "Study Day of Collection"
    ..- attr(*, "width")= num 8
   $ TRT01P  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Planned Arm"
    ..- attr(*, "width")= num 40
   $ TRT01A  : chr [1:306] "Placebo" "Placebo" "Xanomeline High Dose" "Xanomeline Low Dose" ...
    ..- attr(*, "label")= chr "Description of Actual Arm"
    ..- attr(*, "width")= num 40
   $ TRTSDTM : POSIXct[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTSTMF : chr [1:306] "H" "H" "H" "H" ...
    ..- attr(*, "width")= num 200
   $ TRTEDTM : POSIXct[1:306], format: "2014-07-02 23:59:59" "2012-09-01 23:59:59" ...
   $ TRTETMF : chr [1:306] "H" "H" "H" "H" ...
    ..- attr(*, "width")= num 200
   $ TRTSDT  : Date[1:306], format: "2014-01-02" "2012-08-05" ...
   $ TRTEDT  : Date[1:306], format: "2014-07-02" "2012-09-01" ...
   $ TRTDURD : num [1:306] 182 28 180 14 183 26 NA 190 10 55 ...
    ..- attr(*, "width")= num 8
   $ SCRFDT  : Date[1:306], format: NA NA ...
   $ EOSDT   : Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ EOSSTT  : chr [1:306] "COMPLETED" "DISCONTINUED" "COMPLETED" "DISCONTINUED" ...
    ..- attr(*, "width")= num 200
   $ FRVDT   : Date[1:306], format: NA "2013-02-18" ...
   $ RANDDT  : Date[1:306], format: "2014-01-02" "2012-08-05" ...
   $ DTHDT   : Date[1:306], format: NA NA ...
   $ DTHADY  : num [1:306] NA NA NA NA NA NA NA NA NA NA ...
    ..- attr(*, "width")= num 8
   $ LDDTHELD: num [1:306] NA NA NA NA NA NA NA NA NA NA ...
    ..- attr(*, "width")= num 8
   $ LSTALVDT: Date[1:306], format: "2014-07-02" "2012-09-02" ...
   $ AGEGR1  : Factor w/ 3 levels "<18","18-64",..: 2 2 3 3 3 3 2 3 3 3 ...
    ..- attr(*, "width")= num 20
   $ SAFFL   : chr [1:306] "Y" "Y" "Y" "Y" ...
    ..- attr(*, "width")= num 2
   $ RACEGR1 : chr [1:306] "White" "White" "White" "White" ...
    ..- attr(*, "width")= num 200
   $ REGION1 : chr [1:306] "NA" "NA" "NA" "NA" ...
    ..- attr(*, "width")= num 80
   $ LDDTHGR1: chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   $ DTH30FL : chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   $ DTHA30FL: chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   $ DTHB30FL: chr [1:306] NA NA NA NA ...
    ..- attr(*, "width")= num 200
   - attr(*, "_xportr.df_arg_")= chr "ADSL"

Note the additional attr(*, "width")= line, showing the width, after each variable that received a length. These have been applied directly from the specification file that we loaded above!
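The width shown by str() is an ordinary R attribute. The following base R sketch, which uses a hypothetical two-row lengths_spec and is not xportr's own code, sets the same attribute by hand.

```r
toy <- data.frame(
  STUDYID = c("CDISCPILOT01", "CDISCPILOT01"),
  AGE = c(63, 64),
  stringsAsFactors = FALSE
)

# Hypothetical spec fragment holding one length per variable.
lengths_spec <- data.frame(
  variable = c("STUDYID", "AGE"),
  length = c(21, 8),
  stringsAsFactors = FALSE
)

# Attach each length as a "width" attribute on the matching column.
for (i in seq_len(nrow(lengths_spec))) {
  v <- lengths_spec$variable[i]
  if (v %in% names(toy)) {
    attr(toy[[v]], "width") <- lengths_spec$length[i]
  }
}

attr(toy$STUDYID, "width")

# Sanity check: no STUDYID value is longer than the declared width.
max(nchar(toy$STUDYID)) <= attr(toy$STUDYID, "width")
```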

xportr_order()

Please note that the order of the ADSL variables, shown above, does not match the order column of the specification file. We can quickly remedy this with a call to xportr_order(). Note that the variable SITEID, as well as many others, is moved to match the order column of the specification file.

adsl_order <- xportr_order(adsl, var_spec, domain = "ADSL", verbose = "message")
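As a rough illustration of what reordering by a spec involves, the base R sketch below sorts columns by a hypothetical order_spec. Placing variables that are absent from the spec at the end is an assumption of this sketch.

```r
toy <- data.frame(
  AGE = c(63, 64),
  EXTRA = c("a", "b"),
  STUDYID = c("CDISCPILOT01", "CDISCPILOT01"),
  SITEID = c("701", "701"),
  stringsAsFactors = FALSE
)

# Hypothetical spec fragment: the order column gives the target position.
order_spec <- data.frame(
  variable = c("SITEID", "AGE", "STUDYID"),
  order = c(2, 3, 1),
  stringsAsFactors = FALSE
)

# Variables in spec order, restricted to those present in the data.
spec_vars <- order_spec$variable[order(order_spec$order)]
in_spec <- intersect(spec_vars, names(toy))

# Variables not covered by the spec keep their relative order at the end.
not_in_spec <- setdiff(names(toy), spec_vars)

toy_ordered <- toy[, c(in_spec, not_in_spec), drop = FALSE]
names(toy_ordered)
```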

xportr_format()

Now we apply formats to the dataset. These will typically be DATE9., DATETIME20. or TIME5., but many others can be used. Notice that 8 Date/Time variables are missing a format in our ADSL dataset. Here we just take a peek at a few TRT variables, which have a NULL format.

attr(adsl$TRTSDT, "format.sas")
  NULL
attr(adsl$TRTEDT, "format.sas")
  NULL
attr(adsl$TRTSDTM, "format.sas")
  NULL
attr(adsl$TRTEDTM, "format.sas")
  NULL

Using xportr_format(), we apply our formats.

adsl_fmt <- adsl %>% xportr_format(var_spec, domain = "ADSL", "message")
attr(adsl_fmt$TRTSDT, "format.sas")
  [1] "DATE9."
attr(adsl_fmt$TRTEDT, "format.sas")
  [1] "DATE9."
attr(adsl_fmt$TRTSDTM, "format.sas")
  [1] "DATETIME20."
attr(adsl_fmt$TRTEDTM, "format.sas")
  [1] "DATETIME20."
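Rather than inspecting variables one at a time, the format.sas attribute can be checked across a whole data frame. The sketch below uses base R on toy data; format_spec is a hypothetical named vector standing in for the format column of a spec.

```r
toy <- data.frame(
  TRTSDT = as.Date(c("2014-01-02", "2012-08-05")),
  TRTSDTM = as.POSIXct(c("2014-01-02", "2012-08-05"), tz = "UTC"),
  AGE = c(63, 64)
)

# TRUE for every column that has no "format.sas" attribute yet.
missing_before <- vapply(
  toy, function(x) is.null(attr(x, "format.sas")), logical(1)
)

# Hypothetical spec fragment: variable name -> SAS format.
format_spec <- c(TRTSDT = "DATE9.", TRTSDTM = "DATETIME20.")

# Set the attribute on each variable named in the spec.
for (v in intersect(names(format_spec), names(toy))) {
  attr(toy[[v]], "format.sas") <- format_spec[[v]]
}

missing_after <- vapply(
  toy, function(x) is.null(attr(x, "format.sas")), logical(1)
)
names(toy)[missing_after]
```

Only AGE, which has no entry in the toy format_spec, is still without a format at the end.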

xportr_label()


Please observe that our ADSL dataset is missing many variable labels. Sometimes these labels are lost while using R's functions. However, a CDISC-compliant dataset needs a variable label on every variable.

look_for(adsl, details = FALSE)
   pos variable label                             
    1  STUDYID  Study Identifier                  
    2  USUBJID  Unique Subject Identifier         
    3  SUBJID   Subject Identifier for the Study  
    4  RFSTDTC  Subject Reference Start Date/Time 
    5  RFENDTC  Subject Reference End Date/Time   
    6  RFXSTDTC Date/Time of First Study Treatment
    7  RFXENDTC Date/Time of Last Study Treatment 
    8  RFICDTC  Date/Time of Informed Consent     
    9  RFPENDTC Date/Time of End of Participation 
   10  DTHDTC   Date/Time of Death                
   11  DTHFL    Subject Death Flag                
   12  SITEID   Study Site Identifier             
   13  AGE      Age                               
   14  AGEU     Age Units                         
   15  SEX      Sex                               
   16  RACE     Race                              
   17  ETHNIC   Ethnicity                         
   18  ARMCD    Planned Arm Code                  
   19  ARM      Description of Planned Arm        
   20  ACTARMCD Actual Arm Code                   
   21  ACTARM   Description of Actual Arm         
   22  COUNTRY  Country                           
   23  DMDTC    Date/Time of Collection           
   24  DMDY     Study Day of Collection           
   25  TRT01P   Description of Planned Arm        
   26  TRT01A   Description of Actual Arm         
   27  TRTSDTM  —                                 
   28  TRTSTMF  —                                 
   29  TRTEDTM  —                                 
   30  TRTETMF  —                                 
   31  TRTSDT   —                                 
   32  TRTEDT   —                                 
   33  TRTDURD  —                                 
   34  SCRFDT   —                                 
   35  EOSDT    —                                 
   36  EOSSTT   —                                 
   37  FRVDT    —                                 
   38  RANDDT   —                                 
   39  DTHDT    —                                 
   40  DTHADY   —                                 
   41  LDDTHELD —                                 
   42  LSTALVDT —                                 
   43  AGEGR1   —                                 
   44  SAFFL    —                                 
   45  RACEGR1  —                                 
   46  REGION1  —                                 
   47  LDDTHGR1 —                                 
   48  DTH30FL  —                                 
   49  DTHA30FL —                                 
   50  DTHB30FL —


Using the xportr_label() function, we can take the specification file and label all the variables it covers. xportr_label() will produce a warning message if a variable in the dataset is not in the specification file.


adsl_update <- adsl %>% xportr_label(var_spec, domain = "ADSL", "message")
  ── Variable labels missing from metadata. ──
  
3 labels skipped
  Variable(s) present in dataframe but doesn't exist in `metadata`.
  ✖ Problem with `TRTSTMF`, `TRTETMF`, and `RANDDT`
look_for(adsl_update, details = FALSE)
   pos variable label                                  
    1  STUDYID  Study Identifier                       
    2  USUBJID  Unique Subject Identifier              
    3  SUBJID   Subject Identifier for the Study       
    4  RFSTDTC  Subject Reference Start Date/Time      
    5  RFENDTC  Subject Reference End Date/Time        
    6  RFXSTDTC Date/Time of First Study Treatment     
    7  RFXENDTC Date/Time of Last Study Treatment      
    8  RFICDTC  Date/Time of Informed Consent          
    9  RFPENDTC Date/Time of End of Participation      
   10  DTHDTC   Date / Time of Death                   
   11  DTHFL    Subject Death Flag                     
   12  SITEID   Study Site Identifier                  
   13  AGE      Age                                    
   14  AGEU     Age Units                              
   15  SEX      Sex                                    
   16  RACE     Race                                   
   17  ETHNIC   Ethnicity                              
   18  ARMCD    Planned Arm Code                       
   19  ARM      Description of Planned Arm             
   20  ACTARMCD Actual Arm Code                        
   21  ACTARM   Description of Actual Arm              
   22  COUNTRY  Country                                
   23  DMDTC    Date/Time of Collection                
   24  DMDY     Study Day of Collection                
   25  TRT01P   Planned Treatment for Period 01        
   26  TRT01A   Actual Treatment for Period 01         
   27  TRTSDTM  Datetime of First Exposure to Treatment
   28  TRTSTMF                                         
   29  TRTEDTM  Datetime of Last Exposure to Treatment 
   30  TRTETMF                                         
   31  TRTSDT   Date of First Exposure to Treatment    
   32  TRTEDT   Date of Last Exposure to Treatment     
   33  TRTDURD  Total Duration of Trt  (days)          
   34  SCRFDT   Screen Failure Date                    
   35  EOSDT    End of Study Date                      
   36  EOSSTT   End of Study Status                    
   37  FRVDT    Final Retrievel Visit Date             
   38  RANDDT                                          
   39  DTHDT    Death Date                             
   40  DTHADY   Relative Day of Death                  
   41  LDDTHELD Elapsed Days from Last Dose to Death   
   42  LSTALVDT Date Last Known Alive                  
   43  AGEGR1   Pooled Age Group 1                     
   44  SAFFL    Safety Population Flag                 
   45  RACEGR1  Pooled Race Group 1                    
   46  REGION1  Geographic Region 1                    
   47  LDDTHGR1 Last Does to Death Group               
   48  DTH30FL  Under 30  Group                        
   49  DTHA30FL Over 30  Group                         
   50  DTHB30FL Over 30 plus 30 days Group
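The labels are stored in the label attribute of each column. A minimal base R sketch of the same idea follows. Here label_spec is a hypothetical named vector, and giving unmatched variables an empty label mirrors the blank entries in the printout above rather than any documented rule.

```r
toy <- data.frame(
  STUDYID = "CDISCPILOT01",
  TRTSDT = as.Date("2014-01-02"),
  RANDDT = as.Date("2014-01-02"),
  stringsAsFactors = FALSE
)

# Hypothetical spec fragment: variable name -> label. RANDDT is absent.
label_spec <- c(
  STUDYID = "Study Identifier",
  TRTSDT = "Date of First Exposure to Treatment"
)

# Variables in the data that the spec does not describe.
unlabelled <- setdiff(names(toy), names(label_spec))

for (v in names(toy)) {
  attr(toy[[v]], "label") <- if (v %in% names(label_spec)) label_spec[[v]] else ""
}

if (length(unlabelled) > 0) {
  message("No label in spec for: ", paste(unlabelled, collapse = ", "))
}

attr(toy$TRTSDT, "label")
```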

xportr_write()


Finally, we export the R data frame as an xpt file with xportr_write(). The xpt file will be written directly to your current working directory. To make it more interesting, we have chained all six functions together with the magrittr pipe, %>%. A user can now apply types, lengths, variable labels, ordering, formats and a dataset label, and write out the final xpt file, in one pipe! Appropriate warnings and messages are printed to the console for any potential issues before the file is sent to a standard clinical dataset validator application or to data reviewers.

adsl %>%
  xportr_type(var_spec, "ADSL", "message") %>%
  xportr_length(var_spec, "ADSL", "message") %>%
  xportr_label(var_spec, "ADSL", "message") %>%
  xportr_order(var_spec, "ADSL", "message") %>% 
  xportr_format(var_spec, "ADSL", "message") %>% 
  xportr_write("adsl.xpt", label = "Subject-Level Analysis Dataset")

That’s it! We now have an xpt file created in R with all appropriate types, lengths, labels, ordering and formats from our specification file.
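Before handing a file to a validator, a few version 5 transport limits can be checked directly on the data frame: variable names of at most 8 characters, labels of at most 40 characters and character values of at most 200 bytes. The helper check_v5() below is a hypothetical base R sketch written for this page. It is not part of xportr and covers only these three rules.

```r
# Hypothetical helper: returns one message per rule violation found.
check_v5 <- function(df) {
  problems <- character(0)
  for (v in names(df)) {
    x <- df[[v]]
    # Names: at most 8 characters, letters, digits and underscores only,
    # and not starting with a digit.
    if (nchar(v) > 8 || !grepl("^[A-Za-z_][A-Za-z0-9_]*$", v)) {
      problems <- c(problems, paste0(v, ": invalid variable name"))
    }
    # Labels: at most 40 characters.
    lbl <- attr(x, "label")
    if (!is.null(lbl) && nchar(lbl) > 40) {
      problems <- c(problems, paste0(v, ": label longer than 40 characters"))
    }
    # Character values: at most 200 bytes.
    if (is.character(x) && any(nchar(x, type = "bytes") > 200, na.rm = TRUE)) {
      problems <- c(problems, paste0(v, ": value longer than 200 bytes"))
    }
  }
  problems
}

toy <- data.frame(
  STUDYID = "CDISCPILOT01",
  TREATMENT_START = 1,
  stringsAsFactors = FALSE
)
attr(toy$STUDYID, "label") <- "Study Identifier"

check_v5(toy)
```

The toy data frame yields a single problem, for the 15-character name TREATMENT_START.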

As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or find any documentation unclear, submit an issue on xportr’s GitHub page.