Introduction to Continuous NHANES

Deepayan Sarkar


The National Health and Nutrition Examination Survey (NHANES) is a program of the National Center for Health Statistics (NCHS), which is part of the US Centers for Disease Control and Prevention (CDC). It measures the health and nutritional status of adults and children in the United States in a series of surveys that combine interviews and physical examinations.

Although the program began in the early 1960s, its structure was changed in the 1990s. Since 1999, the program has been conducted on an ongoing basis, where a nationally representative sample of about 5,000 persons (across 15 counties) is examined each year, with public-use data released in two-year cycles. This phase of the program is referred to as continuous NHANES.

The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel. Although the details of the responses recorded vary from cycle to cycle, there is a substantial amount of consistency, making it possible to compare data across cycles. Sampling weights are provided along with demographic details for each participant; see the NHANES analytic guidelines for details.

Public-use data: web resources

NHANES makes a large volume of data available for download. However, rather than a single download, these data are made available as a number of separate SAS transport files, referred to as “data files”, for each cycle. Each such data file contains records for several related variables. A comprehensive list of data files available for download is available here, along with subsets broken up into the following “components”: Demographics, Dietary, Examination, Laboratory, and Questionnaire.

For each data file listed in these tables, a link to a Doc File (which is an HTML webpage describing the data file) and a link to a SAS transport file is provided. An additional list of limited access data files are documented here, but the corresponding data file download links are not available.

A table of variables is separately available for each component, and gives more detailed information about both the variables and the data files they are recorded in, although these tables do not provide download links directly: Demographics, Dietary, Examination, Laboratory, Questionnaire.

In addition, a search interface is also available.

For reasons not specified, NHANES releases data files as SAS transport files, and provides links to proprietary Windows-only software that can supposedly be used to convert these files to CSV files.

Public-use data: R resources

One of the goals of the Epiconductor project is to provide and document an alternative access path to NHANES data and documentation via the R ecosystem. It builds on the nhanesA R package, along with utilities such as SQL databases and docker, to enable efficient and reproducible analyses of NHANES data.

The nhanesA package

The nhanesA package provides a user-friendly interface to download and process data and documentation files from the NHANES website. To use the utilities in this package, we first need to know a few more details about how NHANES data and documentation are structured.

Each available data file, which we henceforth call an NHANES table, can be identified uniquely by a name. Generally speaking, each public-use table has a corresponding data file (a SAS transport file, with extension xpt) and a corresponding documentation file (a webpage, with extension htm). The URLs from which these files can be downloaded can usually be predicted from the table name, and the cycle it belongs to. Cycles are typically of 2-year duration, starting from 1999-2000.

Although there are exceptions, a table that is available for one cycle will typically be available for other cycles as well, with a suffix appended to the name of the table indicating the cycle. To make these details concrete, let us use the nhanesManifest() function in the nhanesA package to download the list of available tables and look at the names and URLs for the DEMO data files, which contain demographic information and sampling weights for each study participant.

library(nhanesA)
manifest <- nhanesManifest("public")
manifest <- manifest[order(manifest$Table), ]
subset(manifest, startsWith(Table, "DEMO"))
     Table                            DocURL                           DataURL     Years
354   DEMO   /Nchs/Nhanes/1999-2000/DEMO.htm   /Nchs/Nhanes/1999-2000/DEMO.XPT 1999-2000
353 DEMO_B /Nchs/Nhanes/2001-2002/DEMO_B.htm /Nchs/Nhanes/2001-2002/DEMO_B.XPT 2001-2002
352 DEMO_C /Nchs/Nhanes/2003-2004/DEMO_C.htm /Nchs/Nhanes/2003-2004/DEMO_C.XPT 2003-2004
350 DEMO_D /Nchs/Nhanes/2005-2006/DEMO_D.htm /Nchs/Nhanes/2005-2006/DEMO_D.XPT 2005-2006
351 DEMO_E /Nchs/Nhanes/2007-2008/DEMO_E.htm /Nchs/Nhanes/2007-2008/DEMO_E.XPT 2007-2008
355 DEMO_F /Nchs/Nhanes/2009-2010/DEMO_F.htm /Nchs/Nhanes/2009-2010/DEMO_F.XPT 2009-2010
356 DEMO_G /Nchs/Nhanes/2011-2012/DEMO_G.htm /Nchs/Nhanes/2011-2012/DEMO_G.XPT 2011-2012
357 DEMO_H /Nchs/Nhanes/2013-2014/DEMO_H.htm /Nchs/Nhanes/2013-2014/DEMO_H.XPT 2013-2014
358 DEMO_I /Nchs/Nhanes/2015-2016/DEMO_I.htm /Nchs/Nhanes/2015-2016/DEMO_I.XPT 2015-2016
359 DEMO_J /Nchs/Nhanes/2017-2018/DEMO_J.htm /Nchs/Nhanes/2017-2018/DEMO_J.XPT 2017-2018
            Date.Published
354 Updated September 2009
353 Updated September 2009
352 Updated September 2009
350 Updated September 2009
351         September 2009
355         September 2011
356   Updated January 2015
357           October 2015
358         September 2017
359          February 2020

The nhanesA package allows both data and documentation files to be accessed, either by specifying their URL explicitly, or simply using the table name, in which case the relevant URL is constructed from it. For example,

demo_b <- nhanesFromURL("/Nchs/Nhanes/2001-2002/DEMO_B.XPT", translated = FALSE)
demo_c <- nhanes("DEMO_C", translated = FALSE)
str(demo_b[1:10])
'data.frame':	11039 obs. of  10 variables:
 $ SEQN    : num  9966 9967 9968 9969 9970 ...
 $ SDDSRVYR: num  2 2 2 2 2 2 2 2 2 2 ...
 $ RIDSTATR: num  2 2 2 2 2 2 2 2 2 1 ...
 $ RIDEXMON: num  2 1 1 2 2 2 1 2 1 NA ...
 $ RIAGENDR: num  1 1 2 2 1 2 1 2 1 1 ...
 $ RIDAGEYR: num  39 23 84 51 16 14 44 63 13 80 ...
 $ RIDAGEMN: num  472 283 1011 612 200 ...
 $ RIDAGEEX: num  473 284 1012 612 200 ...
 $ RIDRETH1: num  3 4 3 3 2 2 3 1 4 3 ...
 $ RIDRETH2: num  1 2 1 1 5 5 1 3 2 1 ...
str(demo_c[1:10])
'data.frame':	10122 obs. of  10 variables:
 $ SEQN    : int  21030 21056 21076 21126 21217 21423 21479 21546 21597 21653 ...
 $ SDDSRVYR: num  3 3 3 3 3 3 3 3 3 3 ...
 $ RIDSTATR: num  1 1 1 1 1 1 1 1 1 1 ...
 $ RIDEXMON: num  NA NA NA NA NA NA NA NA NA NA ...
 $ RIAGENDR: num  2 1 1 1 2 1 1 2 2 2 ...
 $ RIDAGEYR: num  30 36 47 4 76 23 5 2 5 0 ...
 $ RIDAGEMN: num  361 439 573 51 922 277 70 33 60 0 ...
 $ RIDAGEEX: num  NA NA NA NA NA NA NA NA NA NA ...
 $ RIDRETH1: num  3 3 4 3 3 3 5 1 3 1 ...
 $ RIDRETH2: num  1 1 2 1 1 1 4 3 1 3 ...

The data in these files appear as numeric codes, and must be interpreted using codebooks available in the documentation files, which can be parsed as follows.

demo_b_codebook <-
    nhanesCodebookFromURL("/Nchs/Nhanes/2001-2002/DEMO_B.htm")
demo_b_codebook$RIDSTATR 
$`Variable Name:`
[1] "RIDSTATR"

$`SAS Label:`
[1] "Interview/Examination Status"

$`English Text:`
[1] "Interview and Examination Status of the Sample Person."

$`Target:`
[1] "Both males and females 0 YEARS -\r 150 YEARS"

$RIDSTATR
# A tibble: 3 × 5
  `Code or Value` `Value Description`               Count Cumulative `Skip to Item`
  <chr>           <chr>                             <int>      <int> <lgl>         
1 1               Interviewed Only                    562        562 NA            
2 2               Both Interviewed and MEC examined 10477      11039 NA            
3 .               Missing                               0      11039 NA            
demo_b_codebook$RIAGENDR
$`Variable Name:`
[1] "RIAGENDR"

$`SAS Label:`
[1] "Gender"

$`English Text:`
[1] "Gender of the sample person"

$`Target:`
[1] "Both males and females 0 YEARS -\r 150 YEARS"

$RIAGENDR
# A tibble: 3 × 5
  `Code or Value` `Value Description` Count Cumulative `Skip to Item`
  <chr>           <chr>               <int>      <int> <lgl>         
1 1               Male                 5331       5331 NA            
2 2               Female               5708      11039 NA            
3 .               Missing                 0      11039 NA            

By default, the data access step converts the raw data into more meaningful values using the corresponding codebook.

demo_c <- nhanes("DEMO_C", translated = TRUE)
str(demo_c[1:10])
'data.frame':	10122 obs. of  10 variables:
 $ SEQN    : int  21145 21538 21829 21597 21764 21767 21848 21076 21160 21217 ...
 $ SDDSRVYR: chr  "NHANES 2003-2004 Public Release" "NHANES 2003-2004 Public Release" "NHANES 2003-2004 Public Release" "NHANES 2003-2004 Public Release" ...
 $ RIDSTATR: chr  "Interviewed Only" "Interviewed Only" "Interviewed Only" "Interviewed Only" ...
 $ RIDEXMON: chr  NA NA NA NA ...
 $ RIAGENDR: chr  "Male" "Male" "Male" "Female" ...
 $ RIDRETH1: chr  "Non-Hispanic Black" "Mexican American" "Non-Hispanic White" "Non-Hispanic White" ...
 $ RIDRETH2: chr  "Non-Hispanic Black" "Mexican American" "Non-Hispanic White" "Non-Hispanic White" ...
 $ DMQMILIT: chr  "No" "No" "Yes" NA ...
 $ DMDBORN : chr  "Born in 50 US States or Washington DC" "Born in 50 US States or Washington DC" "Born in 50 US States or Washington DC" "Born in 50 US States or Washington DC" ...
 $ DMDCITZN: chr  "Citizen by birth or naturalization" "Citizen by birth or naturalization" "Citizen by birth or naturalization" "Citizen by birth or naturalization" ...

Further analysis can be performed on these resulting datasets which are regular R data frames.

Other tools that make it easier to work with these datasets by creating a local database, or creating a search interface, are described elsewhere. We conclude this document with a brief look at how frequently NHANES data files are published and / or updated, based on the information contained in the table manifest.

Frequency of NHANES data releases

Recall from above that the NHANES table manifest includes a Date.Published column. This allows us to tabulate NHANES data release dates. We expect that bulk releases of tables happen all together, generally in two year intervals, while some tables may be released or updated on a as-needed basis.

The release information (available by month of release) can be summarized by tabulating the Date.Published field:

xtabs(~ Date.Published, manifest) |> sort() |> tail(20)
Date.Published
           March 2008         December 2007             July 2010             June 2020 
                   12                    13                    13                    13 
 Updated October 2014             July 2022           August 2021         December 2018 
                   14                    15                    17                    17 
        November 2007         November 2021 Updated November 2020              May 2004 
                   17                    18                    19                    21 
            June 2002        September 2011          October 2015        September 2013 
                   34                    37                    38                    38 
       September 2017        September 2009         February 2020    Updated April 2022 
                   40                    41                    48                    59 

Parsing these dates systematically, we get

pubdate <- manifest$Date.Published
updates <- startsWith(pubdate, "Updated")
datesplit <- strsplit(pubdate, split = "[[:space:]]")
datesplit[updates] <- lapply(datesplit[updates], "[", -1)
pub_summary <-
    data.frame(updated = updates,
               year = sapply(datesplit, "[[", 2) |> as.numeric(),
               month = sapply(datesplit, "[[", 1) |> factor(levels = month.name))

Although there are a few too many months, we can plot the number of releases + updates by month as follows.

pubfreq <- xtabs(~ interaction(month, year, sep = "-") + updated, pub_summary)
npub <- rowSums(pubfreq)
npub.date <- as.Date(paste0("01", "-", names(npub)), format = "%d-%B-%Y")
xyplot(npub ~ npub.date, type = "h", grid = TRUE,
       xlab = "Month", ylab = "Number of tables published / updated") +
    latticeExtra::layer(panel.text(x[y > 30], y[y > 30],
                                   format(x[y > 30], "%Y-%m"),
                                   pos = 3, cex = 0.75))

We can also plot the release / update frequency by year as follows.

xtabs(~ year + updated, pub_summary) |>
    barchart(horizontal = FALSE, ylab = "Number of tables",
             auto.key = list(text = c("Original", "Update"), columns = 2),
             scales = list(x = list(rot = 45)))

A full table of number of releases by month is given by the following, showing that there is at least one update almost every month.

pubfreq0 <- pubfreq[rowSums(pubfreq) > 0, , drop = FALSE]
pubfreq0
                                   updated
interaction(month, year, sep = "-") FALSE TRUE
                     June-2002         34    0
                     February-2003      0    1
                     September-2003     1    0
                     January-2004       5    0
                     May-2004          21    3
                     June-2004          2    0
                     July-2004         10    1
                     September-2004     5    3
                     November-2004      2    0
                     December-2004      2    0
                     January-2005       3    1
                     February-2005      4    2
                     April-2005         1    0
                     June-2005          1    3
                     August-2005        1    0
                     October-2005       0    2
                     November-2005      7    0
                     December-2005      7    1
                     January-2006       2    0
                     February-2006      6    0
                     March-2006         3    3
                     April-2006         7    6
                     May-2006           2    2
                     June-2006          6    2
                     July-2006          8    1
                     August-2006       11    7
                     September-2006     4    1
                     November-2006      1    0
                     December-2006      4    0
                     January-2007       1    2
                     February-2007      1    0
                     March-2007         1    5
                     May-2007           1    1
                     June-2007          0    2
                     July-2007          2    3
                     August-2007        0    3
                     September-2007     0    1
                     October-2007       2    3
                     November-2007     17    7
                     December-2007     13    2
                     January-2008      12    1
                     February-2008      4    0
                     March-2008        12    2
                     April-2008        12    2
                     May-2008           4    2
                     June-2008          4    3
                     July-2008          7    0
                     August-2008        1    0
                     September-2008     2    1
                     October-2008       3    0
                     December-2008      4    0
                     January-2009       2    1
                     February-2009      1    0
                     March-2009         4    0
                     April-2009         4    0
                     May-2009           1    0
                     June-2009          1    4
                     July-2009          0    5
                     August-2009        1    1
                     September-2009    41    5
                     October-2009       3    1
                     December-2009      2    3
                     January-2010       8    0
                     February-2010      1    1
                     March-2010         5    0
                     April-2010         2    5
                     May-2010           2    5
                     June-2010          3    1
                     July-2010         13    4
                     August-2010        3    2
                     September-2010     6    1
                     October-2010       1    3
                     November-2010      2    2
                     March-2011         0    1
                     April-2011         0    3
                     June-2011          3    0
                     August-2011        2    1
                     September-2011    37    6
                     October-2011       4    1
                     November-2011      1    0
                     December-2011      5    0
                     January-2012      11    5
                     February-2012      3    1
                     March-2012         1    6
                     April-2012         6    0
                     May-2012           2    0
                     June-2012         12    0
                     July-2012          2    0
                     August-2012        4    1
                     September-2012     3    1
                     October-2012       1    0
                     November-2012      2    0
                     December-2012      1    0
                     January-2013       3    1
                     February-2013      4    1
                     March-2013         2    1
                     April-2013         2    2
                     May-2013           0    9
                     June-2013          3    3
                     July-2013          5    0
                     August-2013        0    1
                     September-2013    38    0
                     October-2013       2    2
                     November-2013      8    0
                     December-2013      1    1
                     January-2014       4    0
                     February-2014      3    2
                     March-2014         7    1
                     April-2014         1    1
                     May-2014           0    1
                     June-2014          1    0
                     July-2014          4    1
                     August-2014        1    1
                     September-2014     6    1
                     October-2014       0   14
                     November-2014      1    0
                     December-2014      4    4
                     January-2015       4    2
                     February-2015      4    6
                     March-2015         1    0
                     May-2015           0    5
                     June-2015          0    2
                     July-2015          1    0
                     August-2015        1    0
                     September-2015     1    1
                     October-2015      38    5
                     November-2015      2    0
                     December-2015      3    0
                     January-2016      10    1
                     February-2016      3    0
                     March-2016         7    3
                     April-2016         4    0
                     May-2016           3    1
                     June-2016          4    0
                     July-2016          2    0
                     August-2016        4    3
                     September-2016     5    4
                     October-2016       1    2
                     November-2016      0    1
                     December-2016      6    6
                     February-2017      3    1
                     March-2017         4    3
                     April-2017         4    0
                     June-2017          1    0
                     August-2017        1    4
                     September-2017    40    3
                     October-2017       2    0
                     December-2017      6    5
                     January-2018       1    0
                     February-2018      3    1
                     March-2018         3    0
                     April-2018         5    3
                     May-2018           3    0
                     June-2018          9    0
                     July-2018          5    0
                     September-2018     5    0
                     October-2018       4    0
                     November-2018      7    1
                     December-2018     17    0
                     January-2019       6    0
                     February-2019      4    6
                     March-2019         0    2
                     April-2019         5    1
                     May-2019           5    0
                     June-2019          1    1
                     August-2019        2    0
                     September-2019     9    0
                     October-2019       0    2
                     November-2019      3    4
                     December-2019      6    3
                     January-2020       2    1
                     February-2020     48    1
                     March-2020        11    0
                     April-2020         2    1
                     May-2020           3    0
                     June-2020         13    1
                     July-2020          5    0
                     August-2020        8    1
                     October-2020       5    0
                     November-2020      7   19
                     December-2020      3    0
                     February-2021      1    1
                     March-2021         0    1
                     April-2021         7    0
                     May-2021          10    1
                     June-2021         12    1
                     July-2021          8    0
                     August-2021       17    1
                     September-2021     9    1
                     October-2021       6    0
                     November-2021     18    7
                     December-2021      4    1
                     January-2022       2    0
                     February-2022      1    6
                     March-2022         6    0
                     April-2022         1   59
                     May-2022           1    5
                     June-2022          4    0
                     July-2022         15    9
                     August-2022        5    2
                     September-2022     8    0
                     October-2022       0    2
                     November-2022      2    0
                     December-2022      0    2
                     January-2023       1    0
                     February-2023      3    0
                     March-2023         1    0
                     April-2023         1    0
                     May-2023           2    3
                     July-2023          1    0
                     September-2023     3    2
                     October-2023       0    2
                     November-2023      2    0