Tidyndr

R-CMD-check Codecov test coverage CRAN status

The goal of {tidyndr} is to provide a specialized, simple and easy to use functions that wrap around existing functions in R for manipulation of the NDR patient line-list file allowing the user to focus on the tasks to be completed rather than the code/formula details.

The functions presented are similar to the PEPFAR MER indicators and are currently grouped into four categories:

Installation

You can install the released version of tidyndr from CRAN with:

install.packages("tidyndr")

Or the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("stephenbalogun/tidyndr",
build_vignette = TRUE)

Usage

library(tidyndr)
#> Attaching package: 'tidyndr' 
#> A package for analysis of the front-end patient-level data from the Nigeria National Data Repository.

read_ndr

read_ndr() reads the downloaded “.csv” file into R using vroom::vroom() behind the scene and passing appropriate column types to the col_types argument. It also formats the variable names using the snakecase style.

## read from a local file path (not run)

# file_path <- system.file("extdata", "ndr_example.csv", package = "tidyndr")

# read_ndr(file_path, time_stamp = "2021-02-15")

### read line-list available on the internet
path <- "https://raw.githubusercontent.com/stephenbalogun/example_files/main/ndr_example.csv"

ndr_example <- read_ndr(path, time_stamp = "2021-02-20")
#> 
#> Three new variables created: 
#> [1] `date_lost` 
#> [2] `appointment_date 
#> [2] `current_status

Treatment Indicators

The functions included in this group are:

## Subset "TX_NEW"
tx_new(ndr_example, from = "2021-07-01", to = "2021-09-30")
#> Warning: One or more parsing issues, see `problems()` for details
#> # A tibble: 0 x 52
#> # ... with 52 variables: ip <fct>, state <fct>, lga <fct>, facility <fct>,
#> #   datim_code <fct>, sex <fct>, patient_identifier <chr>,
#> #   hospital_number <chr>, date_of_birth <date>, age_at_art_initiation <dbl>,
#> #   current_age <dbl>, art_start_date <date>, art_start_date_source <fct>,
#> #   last_drug_pickup_date <date>, last_drug_pickup_date_q1 <date>,
#> #   last_drug_pickup_date_q2 <date>, last_drug_pickup_date_q3 <date>,
#> #   last_drug_pickup_date_q4 <date>, last_regimen <fct>, ...


## Generate line-list of clients with medication refill in October 2021
ndr_example %>%
  tx_appointment(from = "2021-10-01",
                 to = "2021-10-31"
                 )
#> # A tibble: 0 x 52
#> # ... with 52 variables: ip <fct>, state <fct>, lga <fct>, facility <fct>,
#> #   datim_code <fct>, sex <fct>, patient_identifier <chr>,
#> #   hospital_number <chr>, date_of_birth <date>, age_at_art_initiation <dbl>,
#> #   current_age <dbl>, art_start_date <date>, art_start_date_source <fct>,
#> #   last_drug_pickup_date <date>, last_drug_pickup_date_q1 <date>,
#> #   last_drug_pickup_date_q2 <date>, last_drug_pickup_date_q3 <date>,
#> #   last_drug_pickup_date_q4 <date>, last_regimen <fct>, ...

## Generate list of clients who were active at the beginning of October 2021 but became inactive at the end of December 2021.
  tx_ml(new_data = ndr_example,
        from = "2021-10-01",
        to = "2021-12-31")
#> # A tibble: 0 x 52
#> # ... with 52 variables: ip <fct>, state <fct>, lga <fct>, facility <fct>,
#> #   datim_code <fct>, sex <fct>, patient_identifier <chr>,
#> #   hospital_number <chr>, date_of_birth <date>, age_at_art_initiation <dbl>,
#> #   current_age <dbl>, art_start_date <date>, art_start_date_source <fct>,
#> #   last_drug_pickup_date <date>, last_drug_pickup_date_q1 <date>,
#> #   last_drug_pickup_date_q2 <date>, last_drug_pickup_date_q3 <date>,
#> #   last_drug_pickup_date_q4 <date>, last_regimen <fct>, ...

Viral Suppression Indicators

The tx_vl_eligible(), tx_pvls_den() and the tx_pvls_num() functions come in handy when you need to generate the line-list of clients who are eligible for viral load test at a given point for a given facility/state, those who have a valid viral load result (not more than 1 year for people aged 20 years and above and not more than 6 months for paediatrics and adolescents less or equal to 19 years), and those who are virally suppressed (out of those with valid viral load results). When the sample = TRUE attribute is supplied to the tx_vl_eligible() function, it generates the line-list of only those who are due for a viral load test out of all those who are eligible.

## Generate list of clients who are eligible for VL (i.e. expected to have a documented VL result)
ndr_example %>%
  tx_vl_eligible(ref = "2021-12-31")
#> # A tibble: 16,206 x 52
#>    ip     state lga   facility datim_code sex   patient_identif~ hospital_number
#>    <fct>  <fct> <fct> <fct>    <fct>      <fct> <chr>            <chr>          
#>  1 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 2001       0001           
#>  2 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ M     State 3001       0001           
#>  3 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 1002       0001           
#>  4 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 3005       0001           
#>  5 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 2005       0001           
#>  6 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ M     State 1004       0003           
#>  7 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 3006       0002           
#>  8 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 1005       0004           
#>  9 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 3009       0004           
#> 10 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 2009       0001           
#> # ... with 16,196 more rows, and 44 more variables: date_of_birth <date>,
#> #   age_at_art_initiation <dbl>, current_age <dbl>, art_start_date <date>,
#> #   art_start_date_source <fct>, last_drug_pickup_date <date>,
#> #   last_drug_pickup_date_q1 <date>, last_drug_pickup_date_q2 <date>,
#> #   last_drug_pickup_date_q3 <date>, last_drug_pickup_date_q4 <date>,
#> #   last_regimen <fct>, last_clinic_visit_date <date>,
#> #   days_of_arv_refill <dbl>, pregnancy_status <fct>, ...

## Generate list of clients that will be expected to have a viral load test done by March 2022
ndr_example %>%
  tx_vl_eligible("2022-03-31",
                 sample = TRUE)
#> # A tibble: 16,206 x 52
#>    ip     state lga   facility datim_code sex   patient_identif~ hospital_number
#>    <fct>  <fct> <fct> <fct>    <fct>      <fct> <chr>            <chr>          
#>  1 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 2001       0001           
#>  2 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ M     State 3001       0001           
#>  3 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 1002       0001           
#>  4 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 3005       0001           
#>  5 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 2005       0001           
#>  6 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ M     State 1004       0003           
#>  7 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 3006       0002           
#>  8 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 1005       0004           
#>  9 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 3009       0004           
#> 10 IP_na~ Stat~ LGA0~ Facilit~ datim_cod~ F     State 2009       0001           
#> # ... with 16,196 more rows, and 44 more variables: date_of_birth <date>,
#> #   age_at_art_initiation <dbl>, current_age <dbl>, art_start_date <date>,
#> #   art_start_date_source <fct>, last_drug_pickup_date <date>,
#> #   last_drug_pickup_date_q1 <date>, last_drug_pickup_date_q2 <date>,
#> #   last_drug_pickup_date_q3 <date>, last_drug_pickup_date_q4 <date>,
#> #   last_regimen <fct>, last_clinic_visit_date <date>,
#> #   days_of_arv_refill <dbl>, pregnancy_status <fct>, ...

### Calculate the Viral Load Coverage as of December 2021
no_of_vl_results <- tx_pvls_den(ndr_example,
                                ref = "2021-12-31") %>%
  nrow()
no_of_vl_eligible <- tx_vl_eligible(ndr_example,
                                    ref = "2021-12-31") %>%
  nrow()

vl_coverage <- scales::percent(no_of_vl_results / no_of_vl_eligible)

print(vl_coverage)
#> [1] "1%"

For all the ‘Treatment’ and ‘Viral Suppression’ indicators (except tx_ml_outcomes(), which should be use with tx_ml()), you have control over the level of action (state or facility) by supplying to the states and/or facilities arguments the values of interest . For more than one state or facility, combine the values with the c() e.g.

## subset clients that have medication appointment in between January and March of 2021 in 
## and are also due for viral load
ndr_example %>%
  tx_appointment(from = "2022-01-01",
                 to = "2022-03-31",
                 ) %>%
  tx_vl_eligible(sample = TRUE)
#> # A tibble: 0 x 52
#> # ... with 52 variables: ip <fct>, state <fct>, lga <fct>, facility <fct>,
#> #   datim_code <fct>, sex <fct>, patient_identifier <chr>,
#> #   hospital_number <chr>, date_of_birth <date>, age_at_art_initiation <dbl>,
#> #   current_age <dbl>, art_start_date <date>, art_start_date_source <fct>,
#> #   last_drug_pickup_date <date>, last_drug_pickup_date_q1 <date>,
#> #   last_drug_pickup_date_q2 <date>, last_drug_pickup_date_q3 <date>,
#> #   last_drug_pickup_date_q4 <date>, last_regimen <fct>, ...

Summarising your Indicators

You might want to generate a summary table of all the indicators you have pulled out. The summarise_ndr() (or summarize_ndr()) allows you to do this with ease. It accepts all the line-lists you are interested in creating a summary table for, the level at which you want the summary to be created (country/ip, state or facility), and the names you want to give to each of your summary column.

## generates line-list of TX_NEW between July and December 2021
new <- tx_new(ndr_example, from = "2021-07-01", to = "202112-31")

## generates line-list of currently active clients
curr <- tx_curr(ndr_example)

## generates line-list of clients who were active at the beginning of the October but inactive at end of December 2021
ml <- tx_ml(new_data = ndr_example, from = "2021-10-01", to = "2021-12-31") 

summarise_ndr(new, curr, ml,
              level = "state",
              names = c("tx_new", "tx_curr", "tx_ml"))
#> # A tibble: 1 x 5
#>   ip    state tx_new tx_curr tx_ml
#>   <chr> <chr>  <int>   <int> <int>
#> 1 Total -          0   16207     0

The disaggregate() allows you to summarise an indicator of interest into finer details based on “current_age”, “sex” “pregnancy_status”, “art_duration”, “months_dispensed (of ARV)” or “age_sex”. These are supplied to the by parameter of the function. The default disaggregates the variable of interest at the level of “states” but can also do this at “country/ip”, “lga” or “facility” level when any of this is supplied to the level parameter.

## generates line-list of TX_NEW between July and September 2021
new_clients <- tx_new(ndr_example, from = "2021-07-01", to = "2021-09-30")

disaggregate(new_clients,
             by = "current_age", pivot_wide = FALSE)
#> # A tibble: 1 x 4
#>   ip    state current_age number
#>   <chr> <chr> <chr>        <int>
#> 1 Total -     -                0

## disaggregate 'TX_CURR' by sex

ndr_example %>%
  tx_curr() %>%
  disaggregate(by = "sex")
#> # A tibble: 4 x 5
#>   ip      state    Male Female unknown
#>   <chr>   <chr>   <int>  <int>   <int>
#> 1 IP_name State 1  1050   2505       0
#> 2 IP_name State 2  1539   3307       1
#> 3 IP_name State 3  3354   4451       0
#> 4 Total   -        5943  10263       1

Code of Conduct

Please note that the tidyndr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.