percent of missing data in df r
To calculate the percentage of missing data in a dataframe in R, you can follow these steps:
1. Use the is.na() function to identify missing values in the dataframe. This function returns a logical matrix with the same dimensions as the input dataframe, where TRUE marks a missing value.
2. Use the colSums() function to count the missing values in each column. This function adds up the logical values within each column, treating TRUE as 1 and FALSE as 0.
3. Divide the resulting counts by the number of rows in the dataframe and multiply by 100 to get the percentage of missing data for each column.
Here's an example code snippet that demonstrates these steps:
# Example dataframe
df <- data.frame(
  col1 = c(1, NA, 3, 4),
  col2 = c(NA, NA, 2, NA),
  col3 = c(5, 6, NA, NA)
)
# Step 1: Identify missing values
missing_values <- is.na(df)
# Step 2: Calculate column sums
missing_sums <- colSums(missing_values)
# Step 3: Calculate percentage of missing data
percentage_missing <- (missing_sums / nrow(df)) * 100
# Print the result
percentage_missing
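This returns a named numeric vector with the percentage of missing data for each column in the dataframe. For the example above, the values are 25 for col1, 75 for col2, and 50 for col3.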
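The three steps can also be collapsed into a single expression. The sketch below is one possible shorthand, not the only way to do it: it relies on base R's colMeans() and mean(), which treat TRUE as 1 and FALSE as 0 just as colSums() does. The variable names pct_by_col and pct_overall are made up for illustration, and the example dataframe is repeated so the snippet runs on its own.

```r
# Same example dataframe as above
df <- data.frame(
  col1 = c(1, NA, 3, 4),
  col2 = c(NA, NA, 2, NA),
  col3 = c(5, 6, NA, NA)
)

# The mean of a logical column is the fraction of TRUE values,
# so colMeans() replaces colSums() followed by division by nrow(df)
pct_by_col <- colMeans(is.na(df)) * 100

# Percentage of missing cells across the whole dataframe
pct_overall <- mean(is.na(df)) * 100

pct_by_col
pct_overall
```

The per-column result matches percentage_missing from the longer version. The overall figure counts every cell equally, so it equals the average of the column percentages because all columns in a dataframe have the same number of rows.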