This function retrieves and merges covariate data from one or more NHANES data files across one or more waves of the study. Variables are merged using the NHANES unique subject identifier (SEQN).

process_covar(
  waves = c("C", "D"),
  varnames = c("SDDSRVYR", "WTMEC2YR", "WTINT2YR", "SDMVPSU", "SDMVSTRA", "RIDAGEMN",
    "RIDAGEEX", "RIDRETH1", "RIAGENDR", "BMXWT", "BMXHT", "BMXBMI", "DMDEDUC2", "ALQ101",
    "ALQ110", "ALQ120Q", "ALQ120U", "ALQ130", "SMQ020", "SMD030", "SMQ040", "MCQ220",
    "MCQ160F", "MCQ160B", "MCQ160C", "PFQ049", "PFQ054", "PFQ057", "PFQ059", "PFQ061B",
    "PFQ061C", "DIQ010"),
  localpath = NULL,
  extractAll = FALSE
)

Arguments

waves

character vector whose entries are capital letters corresponding to the NHANES waves of interest. Defaults to c("C", "D"), corresponding to the NHANES 2003-2004 and 2005-2006 waves.

varnames

character vector indicating which column names to search for. All .XPT files located in the directory specified by localpath are checked. If extractAll = TRUE, this argument is effectively ignored. Defaults to the variables required to create the processed data matrices Covariate_C and Covariate_D. If "SEQN" is not included in varnames, it will be automatically added.

localpath

file path to the folder where covariate data are saved. If NULL (the default), the raw NHANES data included in the rnhanesdata package are searched. Covariate data must be in .XPT format and should be in their own folder. For example, PAXRAW_C.XPT should not be located in the folder with your covariate files. Doing so will not cause an error, but the code will take much longer to run.

extractAll

logical argument indicating whether all columns of all .XPT files in the search path should be returned. Defaults to FALSE.

Value

This function returns a list with one element per wave specified in the "waves" argument. Each element is named Covariate_\*, where \* is the corresponding element of the "waves" argument. If none of the variables listed in the "varnames" argument (including SEQN, which is added if it was not supplied) are found for a particular wave, the corresponding element of the returned list will be NULL. If none of the user-specified variables are found but subject identifiers (SEQN) are found, the corresponding element will still be NULL. See the examples below for illustrations of these scenarios.
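Because elements of the returned list can be NULL, downstream code may want to check for them before use. The following is a minimal sketch using a hand-built stand-in list, not real process_covar output; the object name covar_mock and all values in it are invented for illustration.

```r
# Stand-in for the list returned by process_covar(); all values are invented.
covar_mock <- list(
  Covariate_C = data.frame(SEQN = c(1L, 2L), RIAGENDR = c(1L, 2L)),
  Covariate_D = NULL,
  Covariate_E = NULL
)

# Flag the waves for which no data frame was returned.
is_missing <- vapply(covar_mock, is.null, logical(1))
missing_waves <- names(covar_mock)[is_missing]

# Keep only the waves that actually contain data.
covar_found <- covar_mock[!is_missing]
```

Here missing_waves holds the names of the NULL elements, and covar_found can be passed on without further NULL checks.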

Most variables in NHANES are measured once per individual. If a user requests a variable that has multiple records per subject, this function returns the variable in matrix format, with one row per participant and a number of columns equal to the number of observations per participant. This matrix is stored within each data frame as an object of class "AsIs" (see ?I for details). For a concrete example, see the examples below.
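To make the "AsIs" storage more concrete, here is a small base R sketch of a matrix held as a single data frame column. The values are invented, and the use of NA for participants with fewer records is an assumption made for this illustration only; it is not taken from the package.

```r
# Three hypothetical participants with up to two records each.
# NA marks a missing second record (an assumption for this sketch).
activity <- matrix(c(38, 43, 42,
                     12, NA, 15),
                   nrow = 3, ncol = 2)

# I() marks the matrix as "AsIs" so data.frame() keeps it as one column
# instead of splitting it into one column per matrix column.
df <- data.frame(SEQN = c(1L, 2L, 3L), PADACTIV = I(activity))

# The data frame has two columns; PADACTIV is itself a 3 x 2 matrix.
n_cols_df <- ncol(df)
dim_activity <- dim(df$PADACTIV)

# Number of non-missing records per participant.
n_records <- rowSums(!is.na(df$PADACTIV))
```

Row-wise summaries such as rowSums() or apply() work on the matrix column directly, and individual records can be accessed with df$PADACTIV[i, j].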

Details

This function searches all .XPT files in the data directory (either the directory given by the "localpath" argument or the raw NHANES data included in the rnhanesdata package) whose names match the NHANES naming convention for the waves supplied to the "waves" argument. Any file that matches the relevant naming convention AND has "SEQN" as its first column name will be searched for the variables requested in the "varnames" argument.
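The file-matching step can be sketched in base R as follows. This assumes the naming convention is a "_<wave letter>.XPT" suffix, as in DEMO_C.XPT and PAXRAW_C.XPT; the matching logic actually used by the package may differ. The file names are empty placeholders created in a temporary folder.

```r
# Create a fresh temporary folder containing empty placeholder files.
data_dir <- tempfile("nhanes_sketch")
dir.create(data_dir)
invisible(file.create(file.path(
  data_dir, c("DEMO_C.XPT", "BMX_C.XPT", "DEMO_D.XPT", "notes.txt")
)))

waves <- c("C", "D")

# For each wave, list files whose names end in "_<wave>.XPT"
# (the assumed naming convention).
files_by_wave <- lapply(waves, function(w) {
  sort(list.files(data_dir, pattern = paste0("_", w, "\\.XPT$")))
})
names(files_by_wave) <- waves
```

Files that do not match the pattern, such as notes.txt, are skipped. Checking that the first column is SEQN would require reading each file and is not shown here.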

When using process_covar to merge variables from a local directory, it is recommended that the directory include the demographic dataset for each wave (DEMO_C.XPT and DEMO_D.XPT for the 2003-2004 and 2005-2006 waves, respectively). Without the demographic dataset, there is no guarantee that all participants in a wave will be included in the returned results. If the demographic datasets are not in the directory specified by localpath, a warning will be produced. In addition, it is recommended that the local directory contain only .XPT files associated with NHANES.
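A quick way to confirm that the demographic files are present before merging is sketched below. The helper missing_demo_files is hypothetical and not part of rnhanesdata; it only builds the DEMO_<wave>.XPT file names mentioned above and tests whether they exist.

```r
# Hypothetical helper: report which demographic files are absent from a folder.
missing_demo_files <- function(localpath, waves = c("C", "D")) {
  expected <- paste0("DEMO_", waves, ".XPT")
  expected[!file.exists(file.path(localpath, expected))]
}

# Usage with a temporary folder that only holds a placeholder 2003-2004 file.
data_dir <- tempfile("covar_dir")
dir.create(data_dir)
invisible(file.create(file.path(data_dir, "DEMO_C.XPT")))

missing_files <- missing_demo_files(data_dir)
```

An empty result means every expected demographic file was found in the folder.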

Examples

library("rnhanesdata")

## retrieve default variables
covar_ls <- process_covar()
#> | | | 0%
#> For C cohort, 32 Covariates Found of 32 specified. Missing covariates:
#>
#> | |=================================== | 50%
#> For D cohort, 32 Covariates Found of 32 specified. Missing covariates:
#>
#> | |======================================================================| 100%

## re-code gender for both the 2003-2004 and 2005-2006 waves
covar_ls$Covariate_C$Gender <- factor(covar_ls$Covariate_C$RIAGENDR, levels=1:2,
                                      labels=c("Male","Female"), ordered=FALSE)
covar_ls$Covariate_D$Gender <- factor(covar_ls$Covariate_D$RIAGENDR, levels=1:2,
                                      labels=c("Male","Female"), ordered=FALSE)

## check that this matches the gender information in the processed data
identical(covar_ls$Covariate_C[,c("SEQN","Gender")], Covariate_C[,c("SEQN","Gender")])
#> [1] TRUE
identical(covar_ls$Covariate_D[,c("SEQN","Gender")], Covariate_D[,c("SEQN","Gender")])
#> [1] TRUE

## See the data processing package vignette
## for code to fully reproduce the processed data
## included in the package

## Example where only the participant identifier (SEQN) is found for
## the 2003-2004 and 2005-2006 waves, and no data is found for the 2007-2008 wave.
covar_ls2 <- process_covar(waves=c("C","D","E"), varnames=c("ThisIsNotValid"))
#> Warning: One or more demographic files were not found in the data directory (DEMO_C.XPT, DEMO_D.XPT, DEMO_E.XPT).
#> There is no guarantee all participants for a particular wave will be included in the returned object.
#> | | | 0%
#> No variables specified by the varnames argument was found for wave C
#>
#> | |======================= | 33%
#> No variables specified by the varnames argument was found for wave D
#>
#> | |=============================================== | 67%
#> No data associated with wave E was found.
#>
#> | |======================================================================| 100%
str(covar_ls2)
#> List of 3
#>  $ Covariate_C: NULL
#>  $ Covariate_D: NULL
#>  $ Covariate_E: NULL

## Example of variables with possibly multiple responses per participant.
## These variables correspond to self reported physical activity types:
##   PADACTIV: physical activity type (i.e. basketball, swimming, etc.)
##   PADLEVEL: intensity of activity identified by PADACTIV (moderate or vigorous)
##   PADTIMES: # of times activity identified by PADACTIV was done in the past 30 days
## See the codebook at https://wwwn.cdc.gov/Nchs/Nhanes/2003-2004/PAQIAF_C.htm#PADTIMES
## for additional descriptions of these variables for the 2003-2004 wave
covar_ls3 <- process_covar(waves=c("C","D"), varnames=c("PADACTIV","PADLEVEL","PADTIMES"))
#> | | | 0%
#> For C cohort, 3 Covariates Found of 3 specified. Missing covariates:
#>
#>
#> Variables with repeated observations per subject found for the following variables: PADACTIV,PADLEVEL,PADTIMES Note that these variables will be stored with class AsIs() objects in resulting data frames. See ?I for details on AsIs class.
#>
#> | |=================================== | 50%
#> For D cohort, 3 Covariates Found of 3 specified. Missing covariates:
#>
#>
#> Variables with repeated observations per subject found for the following variables: PADACTIV,PADLEVEL,PADTIMES Note that these variables will be stored with class AsIs() objects in resulting data frames. See ?I for details on AsIs class.
#>
#> | |======================================================================| 100%
str(covar_ls3)
#> List of 2
#>  $ Covariate_C:'data.frame': 4782 obs. of 4 variables:
#>   ..$ SEQN    : int [1:4782] 21007 21008 21009 21010 21012 21013 21015 21016 21024 21025 ...
#>   ..$ PADACTIV: 'AsIs' num [1:4782, 1:20] 38 43 42 42 42 15 36 12 42 12 ...
#>   ..$ PADLEVEL: 'AsIs' num [1:4782, 1:20] 2 1 1 1 1 1 1 1 1 2 ...
#>   ..$ PADTIMES: 'AsIs' num [1:4782, 1:20] 1 13 4 30 1 9 90 13 8 21 ...
#>  $ Covariate_D:'data.frame': 4901 obs. of 4 variables:
#>   ..$ SEQN    : int [1:4901] 31129 31132 31134 31136 31137 31139 31141 31143 31144 31148 ...
#>   ..$ PADACTIV: 'AsIs' num [1:4901, 1:19] 42 19 42 40 15 10 11 13 11 23 ...
#>   ..$ PADLEVEL: 'AsIs' num [1:4901, 1:19] 1 1 1 1 1 1 1 2 1 1 ...
#>   ..$ PADTIMES: 'AsIs' num [1:4901, 1:19] 13 20 13 13 17 30 3 30 1 2 ...