# Introduction

## What is GGIR?

GGIR is an R-package to process multi-day raw accelerometer data for physical activity and sleep research. GGIR will write all output files into two sub-directories of ./meta and ./results. GGIR is increasingly being used by a number of academic institutes across the world.

## What is postGGIR?

postGGIR is an R-package to data processing after running GGIR for accelerometer data. In detail, all necessary R/Rmd/shell files were generated for data processing after running GGIR for accelerometer data. Then in part 1, all csv files in the GGIR output directory were read, transformed and then merged. In part 2, the GGIR output files were checked and summarized in one excel sheet. In part 3, the merged data was cleaned according to the number of valid hours on each night and the number of valid days for each subject. In part 4, the cleaned activity data was imputed by the average ENMO over all the valid days for each subject. Finally, a comprehensive report of data processing was created using Rmarkdown, and the report includes few explortatory plots and multiple commonly used features extracted from minute level actigraphy data in part 5-7. This vignette provides a general introduction to postGGIR.

# Setting up your work environment

## Install R and RStudio

Download and install R

Download and install RStudio (optional, but recommended)

Download GGIR with its dependencies, you can do this with one command from the console command line:

install.packages("postGGIR", dependencies = TRUE)

## Prepare folder structure

1. folder of .bin files for GGIR or a file listing all .bin files
• R program will check the missing in the GGIR output by comparing with all raw .bin files
2. foder of the GGIR output with two sub-folders
• meta (./basic, ./csv, etc)
• results (partsummary.csv)

# Quick start

## Create a template shell script of postGGIR

library(postGGIR)
create.postGGIR()

The function will create a template shell script of postGGIR in the current directory, names as STUDYNAME_part0.maincall.R.

cat STUDYNAME_part0.maincall.R
options(width=2000)
argv = commandArgs(TRUE);
print(argv)
print(paste("length=",length(argv),sep=""))
mode<-as.numeric(argv[1])
print(c("mode =", mode))
# (Note) Please remove the above lines if you are running this within R console
#        instead of submitting jobs to a cluster.

#########################################################################
# (user-define 1) you need to redefine this according different study!!!!
#########################################################################
# example 1
filename2id.1<-function(x)  unlist(strsplit(y1,"\\."))[1]

#  example 2 (use csv file =c("filename","ggirID"))
filename2id.2<-function(x) {
d<-read.csv("./postGGIR/inst/example/filename2id.csv",head=1,stringsAsFactors=F)
y1<-which(d[,"filename"]==x)
if (length(y1)==0) stop(paste("Missing ",x," in filename2id.csv file",sep=""))
if (length(y1)>=1) y2<-d[y1[1],"newID"]
return(as.character(y2))
}

#########################################################################
#  main call
#########################################################################

call.afterggir<-function(mode,filename2id=filename2id.1){

library(postGGIR)
#########################################################################
# (user-define 2) Fill in parameters of your ggir output
##########################################################################

currentdir =
studyname =
bindir =
outputdir =
setwd(currentdir)

rmDup=FALSE   # keep all subjects in postGGIR
PA.threshold=c(50,100,400)
part5FN="WW_L50M125V500_T5A5"
epochIn = 5
epochOut = 5
flag.epochOut = 60
use.cluster = FALSE
log.multiplier = 9250
QCdays.alpha = 7
QChours.alpha = 16
useIDs.FN<-NULL
Rversion="R"
desiredtz="US/Eastern"
RemoveDaySleeper=FALSE
part5FN=part5FN,
NfileEachBundle=20
trace=FALSE
#########################################################################
#   remove duplicate sample IDs for plotting and feature extraction
#########################################################################
if (mode==3 & rmDup){
# step 1: read ./summary/*remove_temp.csv file (output of mode=2)
keep.last<-TRUE #keep the latest visit for each sample
sumdir<-paste(currentdir,"/summary",sep="")
setwd(sumdir)
inFN<-paste(studyname,"_samples_remove_temp.csv",sep="")
useIDs.FN<-paste(sumdir,"/",studyname,"_samples_remove.csv",sep="")

#########################################################################
# (user-define 3 as rmDup=TRUE)  create useIDs.FN file
#########################################################################
# step 2: create the ./summary/*remove.csv file manually or by R commands
d<-read.csv(inFN,head=1,stringsAsFactors=F)
d<-d[order(d[,"Date"]),]
d<-d[order(d[,"newID"]),]
d[which(is.na(d[,"newID"])),]
S<-duplicated(d[,"newID"],fromLast=keep.last) #keep the last copy for nccr
d[S,"duplicate"]<-"remove"
write.csv(d,file=useIDs.FN,row.names=F)

}

#########################################################################
#   call afterggir
#########################################################################

setwd(currentdir)
afterggir(mode=mode,
useIDs.FN=useIDs.FN,
currentdir=currentdir,
studyname=studyname,
bindir=bindir,
outputdir=outputdir,
epochIn=epochIn,
epochOut=epochOut,
flag.epochOut=flag.epochOut,
log.multiplier=log.multiplier,
use.cluster=use.cluster,
QCdays.alpha=QCdays.alpha,
QChours.alpha=QChours.alpha,
QCnights.feature.alpha=QCnights.feature.alpha,
Rversion=Rversion,
filename2id=filename2id,
PA.threshold=PA.threshold,
desiredtz=desiredtz,
RemoveDaySleeper=RemoveDaySleeper,
part5FN=part5FN,
NfileEachBundle=NfileEachBundle,
trace=trace)

}
#########################################################################
call.afterggir(mode)
#########################################################################

#   Note:   call.afterggir(mode)
#        mode = 0 : creat sw/Rmd file
#        mode = 1 : data transform using cluster or not
#        mode = 2 : summary
#        mode = 3 : clean
#        mode = 4 : impu

## Edit shell script

Three places were marked as “user-define” and need to be edited by user in the STUDYNAME_part0.maincall.R file. Please rename the file by replacing your real studyname after the edition.

### 1. Define the function filename2id( )

This user-defined function will change the filename of the raw accelerometer file to the short ID. For example, the first example change “0002__026907_2016-03-11 13-05-59.bin” to new ID of “0002”. If you prefer to define new ID by other way, you could create a .CSV file including “filename” and “newID” at least and then defined this function as the second example. The new variable of “newID”, included in the output files, could be used as the key ID in the summary report of postGGIR and be used to define the duplicate samples as well.

### 2. Parameters of shell script

User needs to define the following parameters as follows,

Variables Description
rmDup Set rmDup = TRUE if user want to remove some samples such as duplicates. Set rmDup = FALSE if user want to keep all samples.
mode Specify which of the five parts need to be run, e.g. mode = 0 makes that all R/Rmd/sh files are generated for other parts. When mode = 1, all csv files in the GGIR output directory were read, transformed and then merged. When mode = 2, the GGIR output files were checked and summarized in one excel sheet. When mode = 3, the merged data was cleaned according to the number of valid hours on each night and the number of valid days for each subject. When mode = 4, the cleaned data was imputed.
useIDs.FN Filename with or without directory for sample information in CSV format, which including “filename” and “duplicate” in the headlines at least. If duplicate=“remove”, the accelerometer files will not be used in the data analysis of part 5-7. Defaut is NULL, which makes all accelerometer files will be used in part 5-7.
currentdir Directory where the output needs to be stored. Note that this directory must exist.
studyname Specify the study name that used in the output file names
bindir Directory where the accelerometer files are stored or list
outputdir Directory where the GGIR output was stored.
epochIn Epoch size to which acceleration was averaged (seconds) in GGIR output. Defaut is 5 seconds.
epochOut Epoch size to which acceleration was averaged (seconds) in part1. Defaut is 5 seconds.
flag.epochOut Epoch size to which acceleration was averaged (seconds) in part 3. Defaut is 60 seconds.
log.multiplier The coefficient used in the log transformation of the ENMO data, i.e. log( log.multiplier * ENMO + 1), which have been used in part 5-7. Defaut is 9250.
use.cluster Specify if part1 will be done by parallel computing. Default is TRUE, and the CSV file in GGIR output will be merged for every 20 files first, and then combined for all.
QCdays.alpha Minimum required number of valid days in subject specific analysis as a quality control step in part2. Default is 7 days.
QChours.alpha Minimum required number of valid hours in day specific analysis as a quality control step in part2. Default is 16 hours.
QCnights.feature.alpha Minimum required number of valid nights in day specific mean and SD analysis as a quality control step in the JIVE analysis. Default is c(0,0), i.e. no additional data cleaning in this step.
Rversion R version, eg. “R/3.6.3”. Default is “R”.
filename2id User defined function for converting filename to sample IDs. Default is NULL.
PA.threshold Threshold for light, moderate and vigorous physical activity. Default is c(50,100,400).
desiredtz desired timezone: see also https://en.wikipedia.org/wiki/Zone.tab. Used in g.inspectfile(). Default is “US/Eastern”.
RemoveDaySleeper Specify if the daysleeper nights are removed from the calculation of number of valid days for each subject. Default is FALSE.
part5FN Specify which output is used in the GGIR part5 results. Defaut is “WW_L50M125V500_T5A5”, which means that part5_daysummary_WW_L50M125V500_T5A5.csv and part5_personsummary_WW_L50M125V500_T5A5.csv are used in the analysis.
NfileEachBundle Number of files in each bundle when the csv data were read and processed in a cluster. Default is 20.
trace Specify if the intermediate results is printed when the function was executed. Default is FALSE.

### 3. Subset of samples (optional)

The postGGIR package not only simply transform/merge the activity and sleep data, but it also can do some prelimary data analysis such as principle componet analysis and feature extraction. Therefore, the basic data clean will be processed first as follows,

• data clean by removing valid days/samples defined by minimum required number of valid hours/days in the activity data
• remove duplicate samples

If you prefer to use all samples, just skip this part and use rmDup=FALSE as the default. Otherwise, if you want to remove some samples such as duplicates, there are two ways as follows,

• Edit R codes of “step 2” in this part. For example, the template will keep the later copy for duplicate samples
• Remove R codes of “step 2” in this part, and create studyname_samples_remove.csv file by filling “remove” in the “duplicate” column in the template file of studyname_samples_remove_temp.csv. The data will be kept unless duplicate=“remove”.

## Run R script

call.afterggir(mode,filename2id)   
Variables Description
mode Specify which of the five parts need to be run, e.g. mode = 0 makes that all R/Rmd/sh files are generated for other parts. When mode = 1, all csv files in the GGIR output directory were read, transformed and then merged. When mode = 2, the GGIR output files were checked and summarized in one excel sheet. When mode = 3, the merged data was cleaned according to the number of valid hours on each night and the number of valid days for each subject. When mode = 4, the cleaned data was imputed.
filename2id This user-defined function will change the filename of the raw accelerometer file to the short ID for the purpose of identifying duplicate IDs.

## Run script in a cluster

#!/bin/bash
#
#$-cwd #$ -j y
#\$ -S /bin/bash
source ~/.bash_profile

cd /postGGIR/inst/example/afterGGIR;
module load R ;
R --no-save --no-restore --args  < studyname_ggir9s_postGGIR.pipeline.maincall.R  0
R --no-save --no-restore --args  < studyname_ggir9s_postGGIR.pipeline.maincall.R  1
R --no-save --no-restore --args  < studyname_ggir9s_postGGIR.pipeline.maincall.R  2
R --no-save --no-restore --args  < studyname_ggir9s_postGGIR.pipeline.maincall.R  3
R --no-save --no-restore --args  < studyname_ggir9s_postGGIR.pipeline.maincall.R  4

R -e "rmarkdown::render('part5_studyname_postGGIR.report.Rmd'   )"
R -e "rmarkdown::render('part6_studyname_postGGIR.nonwear.report.Rmd'   )"
R -e "rmarkdown::render('part7a_studyname_postGGIR_JIVE_1_somefeatures.Rmd'   )"
R -e "rmarkdown::render('part7b_studyname_postGGIR_JIVE_2_allfeatures.Rmd'   )"
R -e "rmarkdown::render('part7c_studyname_postGGIR_JIVE_3_excelReport.Rmd'   )" 

# Software Functionalities

## Part 1

The functions of part 1 read the activity data measured by Euclidian norm minus one (ENMO) in long csv-spreadsheets and then the data was merged and transformed into a square matrix for all subjects. To be noted, a day could have 23 or 25 hours when daylight saving time started or ended. The ENMO data would take the average for those duplicate timestamps caused by daylight saving time so as to be merged systematically with other regular 24-hour days. In addition, the angle of the z-axis (ANGLEZ) relative to the horizontal plane (degrees) utilizing all three axes was also merged and saved into an excel file. Besides that, more data were output including the light mean, light peak, temperature mean and clipping score and Euclidian norm metric (EN) if available. All outputs were stored under the ./data directory for part 1.

## Part 2

The functions of part 2 introduced some descriptive variables of all accelerometer files in the GGIR output. In part 2, an excel file was output under the ./summary directory, which includes nine pages as follows, 1) List of files in the GGIR output (2) Summary of numbers of output files (3) List of duplicate IDs (4) ID errors (5) Number of valid days (6) Table of number of valid/missing days (7) Missing pattern (8) Frequency of the missing pattern (9) Description of all accelerometer files. Multiple plots were generated in a pdf file including the number of valid hours, days, missing pattern, etc. Technically, the raw accelerometer data was visited to obtain a complete input list for the purpose of examining missingness of GGIR output when user specified the path. Additionally, the summary results from GGIR output were also visited here to form a comprehensive data quality report in the part 2 of postGGIR.

## Part 3 and Part 4

The functions of part 3 introduced some ‘flag’ variables for data cleaning of the merged ENMO and ANGLEZ data. As default, only days with more than 16 hours were marked as valid days and subjects with more than 7 valid days were marked as valid samples. User could specify these two parameters in the main function. Further, the data could be aggregated into minute-level or hour-level data as required by users. In part 4, the ENMO data was imputed by taking the subject-level mean over all the valid days for each subject. All output was stored under the ./data directory. In the output file, a few description variables were included as follows (1) the number of valid hours; (2) missing pattern for each subject; (3) non-wear time in minutes; (4) an indicator variable to indicate if the visit should be removed for having multiple visits and (5) the number of missing values after imputation. Samples were removed when systematic missingness was observed, i.e., the activity data was missing on same timestamps among all days and therefore could not be imputed.

## Part 5

The functions of part 5 generated a comprehensive report in the .html format, which included data quality check and an exploratory data analysis. In the report, first, the file lists were summarized in each GGIR folders and summary files so that user could easily check the missingness in each GGIR parts. Second, the duplicate samples were checked and marked, where the duplication might be caused by having multiply visits for some samples but only one visit will be kept in the data analysis such as functional principal component analysis (FPCA). Thirdly, as shown in Figure 1, data quality was checked such as the number of valid days, non-wear time and missing pattern, which was helpful to better understand the data by the graphical presentation. As an exploratory analysis, the data correlation, and the output of FPCA analysis was plotted in the report.

## Part 6 (Optional)

The non-wear data was loaded from the M$$metalong$$nonwearscore variable of the R data that was stored in the folder of /meta/basic/meta of the GGIR output, and the matrix to clarify when data was imputed for each long epoch time window and the reason for imputation. This function will generate a non-wear matrix at minute level, and it could be skipped if user chose to use the imputation data in the JIVE application as default.

## Part 7

In part 7, about 100 features were extracted from minute level activity data of three domains of sleep, physical activity, and circadian rhythmicity, which were based on outputs from GGIR v2.4.0 and calculated by R ActFrag and ActCR packages (Di et al., 2019). The standard deviation across days on each subject was also created for each feature. The weekday and weekend specific features were extracted as well since most features in the sleep and physical activity showed significant difference between weekdays and weekends. In brief, sleep domain referred to sleep duration, midpoint, efficiency, etc. Physical activity referred to daily motor activity such as sedentary behavior, light, and moderate-to-vigorous physical activity (MVPA). Circadian rhythms were natural rhythms that regulates the sleep-wake cycle within every 24 hours. For example, the cosinor curve and FPCA analysis were used in modeling of biological rhythms. A comprehensive list of all features could be found in the supplementary table and the detailed definition could be found in the GGIR manual and Di et al.’s publication in 2019.

# Running postGGIR and Inspecting the results

## Input and output of part 0

• Command = call.afterggir(mode=0)
• Output folder = ./
Output Description
part1_data.transform.R (use.cluster=TRUE, optional) R code for data transformation and merge for every 20 files in each partition. When the number of .bin files is large ( > 1000), the data merge could take long time, user could split the job and submit the job to a cluster for parallel computing.
part1_data.transform.sw (use.cluster=TRUE, optional) Submit the job to a cluster for parallel computing
part1_data.transform.merge.sw (use.cluster=TRUE, optional) Merge all partitions for the ENMO and ANGLEZ data
part5_studyname_postGGIR.report.Rmd R markdown file for generate a comprehensive report of data processing and explortatory plots.
part6_studyname_postGGIR.nonwear.report.Rmd R markdown file for generate a report of nonwear score.
part7a_studyname_postGGIR_JIVE_1_somefeatures.Rmd Extract some features from the actigraphy data using R
part7b_studyname_postGGIR_JIVE_2_allfeatures.Rmd Extract other features from the GGIR output and merge all features together
part7c_studyname_postGGIR_JIVE_3_runJIVE.Rmd Perform JIVE Decomposition for All Features using r.jive
part7d_studyname_postGGIR_JIVE_4_somefeatures_weekday.Rmd Extract some weekday/weekend specific features from the actigraphy data using R

part9_swarm.sh | shell script to submit all jobs to the cluster

## Input and output of part 1

• Main input files: csv files under /meta/csv folders of GGIR output
• Command = call.afterggir(mode=1)
• Output folder = ./data
Output Description
studyname_filesummary_csvlist.csv File list in the ./csv folder of GGIR
studyname_filesummary_Rdatalist.csv File list in the ./basic folder of GGIR
All_studyname_ANGLEZ.data.csv Raw data of ANGLEZ after merge
All_studyname_ENMO.data.csv Raw data of ENMO after merge
nonwearscore_studyname_f0_f1_Xs.csv Data matrix of nonwearscore
nonwearscore_studyname_f0_f1_Xs.pdf Plots for nonwearscore
plot.nonwearVSnvalidhours.csv Nonwear data for plot
plot.nonwearVSnvalidhours.pdf Nonwear plots
lightmean_studyname_f0_f1_Xs.csv Data matrix of lightmean
lightpeak_studyname_f0_f1_Xs.csv Data matrix of lightpeak
temperaturemean_studyname_f0_f1_Xs.csv Data matrix of temperaturemean
clippingscore_studyname_f0_f1_Xs.csv Raw data of clippingscore
EN_studyname_f0_f1_Xs.csv Data matrix of EN

f0 and f1 are the file index to start and finish with
Xs is the epoch size to which acceleration was averaged (seconds) in GGIR output

## Input and output of part 2

• Main input files
+ ./data/All_studyname_ENMO.data.csv
+ GGIR results: part2, part4 and part5 (please specify $$part5FN$$ in the main function)
+ GGIR raw data when bindir was specified

• Command = call.afterggir(mode=2)

• Output folder = ./summary

Output Description
studyname_ggir_output_summary.xlsx Description of all accelerometer files in the GGIR output. This excel file includs 9 pages as follows, (1) List of files in the GGIR output (2) Summary of files (3) List of duplicate IDs (4) ID errors (5) Number of valid days (6) Table of number of valid/missing days (7) Missing patten (8) Frequency of the missing pattern (9) Description of all accelerometer files.
part2daysummary.info.csv Intermediate results for description of each accelerometer file.
studyname_ggir_output_summary_plot.pdf Some plots such as the number of valid days, which were included in the part2a_studyname_postGGIR.report.html file as well.
studyname_samples_remove_temp.csv Create studyname_samples_remove.csv file by filling “remove” in the “duplicate” column in this template. If duplicate=“remove”, the accelerometer files will not be used in the data analysis of part5.

## Input and output of part 3

• Main input file: ./data/All_studyname_ENMO.data.csv
• Command = call.afterggir(mode=3)
• Output folder = ./data
Output Description
flag_All_studyname_ANGLEZ.data.Xs.csv Adding flags for data cleaning of the raw ANGLEZ data
flag_All_studyname_ENMO.data.Xs.csv Adding flags for data cleaning of the raw ENMO data
IDMatrix.flag_All_studyname_ENMO.data.60s.csv ID matrix

*Xs is the epoch size to which acceleration was averaged (seconds) in GGIR output

## Input and output of part 4

• Main input file: ./data/flag_All_ studyname _ENMO.data.5s.csv
• Command = call.afterggir(mode=4)
• Output folder = ./data
Output Description
impu.flag_All_studyname_ENMO.data.60s.csv Imputation data for the merged ENMO data, and the missing values were imputated by the average ENMO over all the valid days for each subject.

## Description of flag variables in the output data

Variable Description
filename accelerometer file name
Date date recored from the GGIR part2.summary file
id IDs recored from the GGIR part2.summary file
calender_date date in the format of yyyy-mm-dd
N.valid.hours number of hours with valid data recored from the part2_daysummary.csv file in the GGIR output
N.hours number of hours of measurement recored from the part2_daysummary.csv file in the GGIR output
weekday day of the week-Day of the week
measurementday day of measurement-Day number relative to start of the measurement
newID new IDs defined as the user-defined function of filename2id(), e.g. substrings of the filename
Nmiss_c9_c31 number of NAs from the 9th to 31th column in the part2_daysummary.csv file in the GGIR output
missing “M” indicates missing for an invalid day, and “C” indicates completeness for a valid day
Ndays number of days of measurement
ith_day rank of the measurementday, for example, the value is 1,2,3,4,-3,-2,-1 for measurementday = 1,…,7
Nmiss number of missing (invalid) days
Nnonmiss number of non-missing (valid) days
misspattern indicators of missing/nonmissing for all measurement days at the subject level
RowNonWear number of columnns in the non-wearing matrix
NonWearMin number of minutes of non-wearing
remove16h7day indicator of a key qulity control output. If remove16h7day=1, the day need to be removed. If remove16h7day=0, the day need to be kept.
duplicate If duplicate=“remove”, the accelerometer files will not be used in the data analysis of part5.
ImpuMiss.b number of missing values on the ENMO data before imputation
ImpuMiss.a number of missing values on the ENMO data after imputation
KEEP The value is “keep”/“remove”, e.g. KEEP=“remove” if remove16h7day=1 or duplicate=“remove” or ImpuMiss.a>0

## Input and output of part 5

• Main input files
• ./summary/studyname_ggir_output_summary.xlsx
• ./summary/part24daysummary.info.csv
• ./data/plot.nonwearVSnvalidhours.csv
• ./data/impu.flag_All_studyname_ENMO.data.flag.epochOuts.csv
• Command: run part5_studyname_postGGIR.report.Rmd
• Output folder = ./
Output Description
part5_studyname_postGGIR.report.html A comprehensive report of data processing and explortatory plots.

## Input and output of part 6

• Main input file: nonwearscore_studyname_01_xx_900s.csv
• Command = run part6_studyname_postGGIR.nonwear.report.Rmd
• Output folder = ./
Folder Output Description
./ part6_studyname_postGGIR.nonwear.report.html A report of nonwear score.
./data JIVEraw_nonwearscore_studyname_f0_f1_Xs.csv Imputation data matrix of nonwearscore (1/0)
./data JIVEimpu_nonwearscore_studyname_f0_f1_Xs.csv Data matrix of nonwearscore (1/0/NA)

f0 and f1 are the file index to start and finish with
Xs is the epoch size to which acceleration was averaged (seconds) in GGIR output

## Input and output of part 7a

• Main input
• ./data/impu.flag_All_studyname_ENMO.data.flag.epochOuts.csv
• GGIR: /results/QC/part4_nightsummary_sleep_full.csv
• Command = run part7a_studyname_postGGIR_JIVE_1_somefeatures.Rmd
• Output folder = ./
Output Description
part7_studyname_all_features_dictionary.xlsx Description of features
part7a_studyname_postGGIR_JIVE_1_somefeatures.html Extract some features from the actigraphy data using R
part7a_studyname_some_features_page1_features.csv List of some features
part7a_studyname_some_features_page2_face_day_PCs.csv Function PCA at the day level using fpca.face( )
part7a_studyname_some_features_page3_face_subject_PCs.csv Function PCA at the subject level using fpca.face( )
part7a_studyname_some_features_page4_denseFLMM_day_PCs.csv Function PCA at the day level using denseFLMM( )
part7a_studyname_some_features_page5_denseFLMM_subject_PCs.csv Function PCA at the subject level using denseFLMM( )

## Input and output of part 7b

• Main inputs
• GGIR: part2_summary.csv
• GGIR: part2_daysummary.csv
• GGIR: part4_nightsummary_sleep_cleaned.csv
• GGIR: /results/QC/part4_nightsummary_sleep_full.csv
• GGIR: part5_daysummary_part5FN.csv
• part7_studyname_all_features_dictionary.xlsx
• Command = run part7b_studyname_postGGIR_JIVE_2_allfeatures.Rmd
• Output folder = ./
Output Description
part7b_studyname_postGGIR_JIVE_2_allfeatures.html Extract other features from the GGIR output and merge all features together
part7b_studyname_all_features_1.csv Raw data of all features
part7b_studyname_all_features_2.csv Keep sample with valid ENMO inputs
part7b_studyname_all_features_2.csv.log Log file of each variable of part5b_studyname_all_features_2.csv
plot_part7b_studyname_all_features_2.csv.pdf Plot of each variable of part5b_studyname_all_features_2.csv
part7b_studyname_all_features_3.csv Average variable at the subject level
part7b_studyname_all_features_3.csv.log Log file of each variable of part5b_studyname_all_features_3.csv
plot_part7b_studyname_all_features_3.csv.pdf Plot of each variable of part5b_studyname_all_features_3.csv
part7b_studyname_all_features_4.csv subject level SD of each feature

## Input and output of part 7c

• Main inputs
• part7b_studyname_all_features_3_subject.csv
• part7b_studyname_all_features_4_subjectSD.csv
• Command = run part7c_studyname_postGGIR_JIVE_3_runJIVE.Rmd
• Output folder = ./
Output Description
part7c_studyname_postGGIR_JIVE_4_outputReport.html Perform JIVE Decomposition for All Features using r.jive
part7c_studyname_jive_Decomposition.csv Joint and individual structure estimates
part7c_studyname_jive_predScore.csv PCA scores of JIVE ( missing when jive.predict failes)
part7c_studyname_jive_predScore.csv PCA scores of JIVE ( missing when jive.predict failes)

## Input and output of part 7d

• Main input: ./data/impu.flag_All_studyname_ENMO.data.flag.epochOuts.csv
• Command = run part7d_studyname_postGGIR_JIVE_5_somefeatures_weekday.Rmd
• Output folder = ./
Output Description
part7d_studyname_some_features_page1.csv Perform JIVE Decomposition for All Features using r.jive
part7d_weekday_studyname_all_features_3.csv subject level mean of each feature on weekday
part7d_weekday_studyname_some_features_page4_denseFLMM_day_PCs.csv Function PCA at the day level using denseFLMM( ) on weekday
part7d_weekday_studyname_some_features_page5_denseFLMM_subject_PCs.csv Function PCA at the subject level using denseFLMM( ) on weekday
part7d_weekend_studyname_all_features_3.csv subject level mean of each feature on weekend
part7d_weekend_studyname_some_features_page4_denseFLMM_day_PCs.csv Function PCA at the day level using denseFLMM( ) on weekend
part7d_weekend_studyname_some_features_page5_denseFLMM_subject_PCs.csv Function PCA at the subject level using denseFLMM( ) on weekend

## Description of features of domains of physical activity, sleep and circadian rhythmicity

Sleep Domain

Variable Description
sleeponset Detected onset of sleep expressed as hours since the midnight of the previous night.
wakeup Detected waking time (after sleep period) expressed as hours since the midnight of the previous night.
number_sib_wakinghours Number of sustained inactivity bouts during the day, with day referring to the time outside the Sleep Period Time window.
SleepDurationInSpt Total sleep duration, which equals the accumulated nocturnal sustained inactivity bouts within the Sleep Period Time
sleep_Midpoint Sleep Midpoint = (sleeponset + wakeup)/2.
sleep_efficiency Sleep Efficiency = SleepDurationInSpt / (wakeup - sleeponset), which was the fraction of time spent asleep in the Sleep Period Time window.
N_atleast5minwakenight Number of times awake during the night for at least 5 minutes
ACC_spt_sleep_mg Average acceleration during sleep (mg)
Nblocks_spt_sleep Number of blocks of night sleep within the Sleep period time window.

Physical Activity Domain

Variable Description
TAC Total volume of physical activity in the daytime
TLAC Total volume of log-transformed physical activity in the daytime
TAC_24h Total volume of physical activity within 24 hours
TLAC_24h Total volume of log-transformed physical activity within 24 hours
sed_dur Total daytime duration in minutes of sedentary activity
light_dur Total daytime duration in minutes of light activity
mod_dur Total daytime duration in minutes of moderate activity
vig_dur Total daytime duration in minutes of vigorous activity
MVPA_dur Total daytime duration in minutes of moderate to vigorous activity
sed_dur_M10 Total duration in minutes of sedentary activity within M10 window
light_dur_M10 Total duration in minutes of light activity within M10 window
mod_dur_M10 Total duration in minutes of moderate activity within M10 window
vig_dur_M10 Total duration in minutes of vigorous activity within M10 window
MVPA_dur_M10 Total duration in minutes of moderate to vigorous activity within M10 window
sed_dur_24h Total duration in minutes of sedentary activity within 24 hours
light_dur_24h Total duration in minutes of light activity within 24 hours
mod_dur_24h Total duration in minutes of moderate activity within 24 hours
vig_dur_24h Total duration in minutes of vigorous activity within 24 hours
MVPA_dur_24h Total duration in minutes of moderate to vigorous activity within 24 hours
mean_r Mean sedentary bout duration.
mean_a Mean active bout duration.
SATP Sedentary to active transition probabilities.
ASTP Active to sedentary transition probabilities.
Gini_r Gini index for active bout, absolute variability normalized to the average bout duration (resting).
Gini_a Gini index for sedentary bout, absolute variability normalized to the average bout duration (active).
alpha_r Power law parameter for sedentary bout.
alpha_a Power law parameter for active bout.
h_r Hazard function for sedentary bout.
h_a Hazard function for active bout.
dur_day_total_IN_min Total duration of day in minutes spent in total inactivity during the day
dur_day_total_LIG_min Total duration of day in minutes of light activity during the day
dur_day_total_MOD_min Total duration of day in minutes of moderate activity during the day
dur_day_total_VIG_min Total duration of day in minutes of vigorous activity during the day
dur_day_MVPA_bts_10_min Total duration in minutes of Moderate and Vigorous Physical Activity (MVPA) for bouts 10 minutes or more
dur_day_MVPA_bts_5_10_min Total duration in minutes of Moderate and Vigorous Physical Activity (MVPA) for bouts 5 to 10 minutes
dur_day_MVPA_bts_1_5_min Total duration in minutes of Moderate and Vigorous Physical Activity (MVPA) for bouts 1 to 5 minutes
Nbouts_day_IN_bts_30 Number of bouts of inactivity for bouts 30 minutes or more
Nbouts_day_IN_bts_20_30 Number of bouts of inactivity for bouts 20 to 30 minutes
Nbouts_day_IN_bts_10_20 Number of bouts of inactivity for bouts 10 to 20 minutes
Nbouts_day_LIG_bts_10 Number of bouts of light for bouts 10 minutes or more
Nbouts_day_LIG_bts_5_10 Number of bouts of light for bouts 5 to 10 minutes
Nbouts_day_LIG_bts_1_5 Number of bouts of light for bouts 1 to 5 minutes
Nbouts_day_MVPA_bts_10 Number of bouts of Moderate and Vigorous Physical Activity (MVPA) for bouts 10 minutes or more
Nbouts_day_MVPA_bts_5_10 Number of bouts of Moderate and Vigorous Physical Activity (MVPA) for bouts 5 to 10 minutes
Nbouts_day_MVPA_bts_1_5 Number of bouts of Moderate and Vigorous Physical Activity (MVPA) for bouts 1 to 5 minutes
Nblocks_day_total_IN Number of blocks of total inactivity during day
Nblocks_day_total_LIG Number of blocks of total light activity during day
Nblocks_day_total_MOD Number of blocks of total moderate activity during day
Nblocks_day_total_VIG Number of blocks of total vigorous activity during day

Circadian Rhythmicity Domain

Variable Description
L5TIME_num Timing of least active 5 hours
M10TIME_num Timing of most active 10 hours
L5VALUE Average acceleration value (mg) of least active 5 hours
M10VALUE Average acceleration value (mg) of most active 10 hours
RA_ggir Relative amplitude reflects the difference between M10 activity and L5 activity and RA_ggir = (M10VALUE-L5VALUE)/(M10VALUE+L5VALUE).
L5 Average acceleration value (mg) of least active 5 hours
M10 Average acceleration value (mg) of most active 10 hours
L5TIME Timing of least active 5 hours
M10TIME Timing of most active 10 hours
RA Relative amplitude: (M10-L5)/(M10+L5) where M10= most active 10 hrs, L5 = least active 5 hour.
IV Intra-daily variability measures fragmentation in the rest/activity rhythms.
IS Inter-daily stability measures fragmentation in the rest/activity rhythms between different days.
IV_intradailyvariability Intra-daily variability measures fragmentation in the rest/activity rhythms.
IS_interdailystability Inter-daily stability measures fragmentation in the rest/activity rhythms between different days.
mesor MESOR which is short for midline statistics of rhythm, which is a rhythm adjusted mean. This represents mean activity level.
amp Amplitude, a measure of half the extend of predictable variation within a cycle. This represents the highest activity one can achieve.
acro Acrophase, a measure of the time of the overall high values recurring in each cycle. Here it has a unit of radian. This represents time to reach the peak.
mesor_L Mesor in the cosinor model by using log-transformed data.
amp_L Amplitude log-transformed in the cosinor model by using log-transformed data.
acro_L Acrophase log-transformed in the cosinor model by using log-transformed data.
PC1 1st principal component score from functional principal component analysis (FPCA).
PC2 2nd principal component score from functional principal component analysis (FPCA).
PC3 3rd principal component score from functional principal component analysis (FPCA).
PC4 4th principal component score from functional principal component analysis (FPCA).
PC5 5th principal component score from functional principal component analysis (FPCA).
PC6 6th principal component score from functional principal component analysis (FPCA).
PC7 7th principal component score from functional principal component analysis (FPCA).
PC8 8th principal component score from functional principal component analysis (FPCA).
PC9 9th principal component score from functional principal component analysis (FPCA).
PC10 10th principal component score from functional principal component analysis (FPCA).

# Reference

• Wei Guo, Vadim Zipunnikov, Andrew Leroux, PhD, Kathleen R Merikangas (2021) postGGIR: An Open-Source R/R-Markdown Package for Accelerometer Data Processing after Running GGIR. (Manuscript)