Percentage of missing data in a data frame in R

To calculate the percentage of missing data in a dataframe in R, you can follow these steps:

  1. Use the is.na() function to identify missing values in the dataframe. This function returns a logical matrix with the same dimensions as the input dataframe, where TRUE represents missing values.

  2. Use the colSums() function to count the missing values in each column. It adds up the logical values within each column, treating TRUE as 1 and FALSE as 0, so each sum is the number of NA entries in that column.

  3. Divide the resulting sums by the number of rows in the dataframe and multiply by 100 to get the percentage of missing data for each column.

Here's an example code snippet that demonstrates these steps:

# Example dataframe
df <- data.frame(
  col1 = c(1, NA, 3, 4),
  col2 = c(NA, NA, 2, NA),
  col3 = c(5, 6, NA, NA)
)

# Step 1: Identify missing values
missing_values <- is.na(df)

# Step 2: Calculate column sums
missing_sums <- colSums(missing_values)

# Step 3: Calculate percentage of missing data
percentage_missing <- (missing_sums / nrow(df)) * 100

# Print the result
percentage_missing

This returns a named numeric vector with the percentage of missing data for each column. For the example dataframe, the result is 25 for col1, 75 for col2, and 50 for col3.
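
A more compact variant is sketched below. It relies on the fact that the mean of a logical vector is the proportion of TRUE values, so colMeans() can replace the colSums() and nrow() steps, and mean() over the whole logical matrix gives one overall figure for the dataframe. The names pct_by_column and pct_overall are illustrative only, and the example dataframe is recreated so the snippet runs on its own.

```r
# Same example dataframe as above, recreated so this snippet is self-contained
df <- data.frame(
  col1 = c(1, NA, 3, 4),
  col2 = c(NA, NA, 2, NA),
  col3 = c(5, 6, NA, NA)
)

# Per-column percentage: the mean of TRUE/FALSE values is the share of NAs
pct_by_column <- colMeans(is.na(df)) * 100

# Overall percentage: share of NA cells across the whole dataframe
pct_overall <- mean(is.na(df)) * 100

pct_by_column  # col1 = 25, col2 = 75, col3 = 50
pct_overall    # 50 (6 of the 12 cells are NA)
```

The per-column values match the step-by-step version. The overall figure is useful when you want a single number for the whole dataframe rather than one per column.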