apply - R - Count at each row the number of columns in the "row neighbourhood" that contain only NA
How can I create, from a data frame, a vector that gives for each row the number of columns that are NA (or some custom value) in that row, in the n rows above, and in the m rows below?
So if m = n = 1 (i.e. how many columns in each row are NA and also have NA in the row before and the row after), and the data frame is
structure(list(x = 1:8, a = c(3L, NA, 10L, NA, 6L, NA, 5L, NA), b = c(6L, NA, NA, NA, 8L, NA, 13L, NA), c = c(NA, 12L, 14L, NA, NA, NA, 9L, NA), d = c(NA, NA, NA, NA, NA, 11L, 7L, NA)), .Names = c("x", "a", "b", "c", "d"), class = "data.frame", row.names = c(NA, -8L))
i.e.
  x  a  b  c  d
1 1  3  6 NA NA
2 2 NA NA 12 NA
3 3 10 NA 14 NA
4 4 NA NA NA NA
5 5  6  8 NA NA
6 6 NA NA NA 11
7 7  5 13  9  7
8 8 NA NA NA NA
I want the vector
count 0 1 2 1 1 0 0 0
(If the first and last entries are NAs, that's fine.) I'm trying to mimic the COUNTIFS function in Excel, i.e. COUNTIFS(B2:F2,"",B3:F3,"",B4:F4,"") for row 3.
I think I see what you mean.
Suppose your data frame is called x.
First, for each (row, column) in x, we need to see whether there is an NA in that cell, and also an NA in the same column in the n rows before and the m rows after.
Let's start with the case of a single row, say row i = 2. We have n = 1 and m = 1 (from the example in your question).
i <- 2
n <- 1
m <- 1
Let's count the number of NAs in each column for rows i - n to i + m inclusive (is.na returns TRUE if the current value is NA, and colSums gives the column sums):
y <- colSums(is.na(x[(i - n):(i + m), ]))
# x a b c d
# 0 1 2 1 3
Now, a column has an NA in the previous, current, and next row only if we counted 3 NAs for it (i.e. only column d qualifies here):
y == n + m + 1
#     x     a     b     c     d
# FALSE FALSE FALSE FALSE  TRUE
So the number of columns that satisfy our criteria (and hence the i-th element of the output) is:
sum(y == n + m + 1)
# 1
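We can use sapply to apply this to each row. Near the top and bottom of the data frame the window is clamped to the rows that exist, so it holds fewer than n + m + 1 rows and the count there is always 0: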
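countifs <- function (df, n, m) {
    nrows <- nrow(df)
    sapply(seq_len(nrows), function (i) {
        # clamp the window to the rows that actually exist
        startrow <- max(i - n, 1)
        endrow <- min(i + m, nrows)
        # number of NAs per column inside the window (df, not the global x)
        y <- colSums(is.na(df[startrow:endrow, ]))
        # columns that are NA in every one of the n + m + 1 rows
        sum(y == n + m + 1)
    })
}
countifs(x, 1, 1)
# [1] 0 1 2 1 1 0 0 0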
You mentioned you might want to compare against a custom value rather than NA. In that case, instead of doing is.na(x[...]), you can do x[...] == value (but not if value is NA, in which case use is.na).
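A minimal sketch of that variant, assuming a hypothetical helper name countifs_value and a value argument (neither appears in the answer above). It falls back to is.na when value is NA, and it treats NA cells as non-matches for any other value, since a comparison with NA would otherwise yield NA:

```r
# Example data from the question
x <- data.frame(
  x = 1:8,
  a = c(3L, NA, 10L, NA, 6L, NA, 5L, NA),
  b = c(6L, NA, NA, NA, 8L, NA, 13L, NA),
  c = c(NA, 12L, 14L, NA, NA, NA, 9L, NA),
  d = c(NA, NA, NA, NA, NA, 11L, 7L, NA)
)

# Hypothetical helper: like countifs(), but matches `value` instead of only NA
countifs_value <- function(df, n, m, value = NA) {
  # Logical matrix: TRUE where a cell matches `value`
  hit <- if (is.na(value)) is.na(df) else !is.na(df) & df == value
  nrows <- nrow(df)
  sapply(seq_len(nrows), function(i) {
    # clamp the window to the rows that exist
    startrow <- max(i - n, 1)
    endrow <- min(i + m, nrows)
    y <- colSums(hit[startrow:endrow, , drop = FALSE])
    # columns matching in all n + m + 1 rows of a full window
    sum(y == n + m + 1)
  })
}

res_na <- countifs_value(x, 1, 1)               # NA as the value
# [1] 0 1 2 1 1 0 0 0

x0 <- x
x0[is.na(x0)] <- 0L                             # swap NA for 0 to try a custom value
res_zero <- countifs_value(x0, 1, 1, value = 0L)
# [1] 0 1 2 1 1 0 0 0
```

Building the logical matrix once, outside sapply, also avoids recomputing the comparison for every window.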
Also, you can save a bit of work by using sapply only on rows n + 1 to nrow(df) - m, and setting the first n and last m elements to 0 automatically, since those rows never have a complete window.
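One way this shortcut might look, as a sketch only: the name countifs_inner is hypothetical, and it assumes that rows n + 1 through nrow(df) - m are exactly the rows whose window fits entirely inside the data frame, so every other row can be left at 0.

```r
# Example data from the question
x <- data.frame(
  x = 1:8,
  a = c(3L, NA, 10L, NA, 6L, NA, 5L, NA),
  b = c(6L, NA, NA, NA, 8L, NA, 13L, NA),
  c = c(NA, 12L, 14L, NA, NA, NA, 9L, NA),
  d = c(NA, NA, NA, NA, NA, 11L, 7L, NA)
)

# Hypothetical helper: only visits rows whose full window exists
countifs_inner <- function(df, n, m) {
  nrows <- nrow(df)
  out <- integer(nrows)                 # edge rows stay 0
  if (nrows < n + m + 1) return(out)    # no row has a complete window
  na <- is.na(df)                       # compute the NA matrix once
  inner <- (n + 1):(nrows - m)          # rows with a complete window
  out[inner] <- sapply(inner, function(i) {
    sum(colSums(na[(i - n):(i + m), , drop = FALSE]) == n + m + 1)
  })
  out
}

res_inner <- countifs_inner(x, 1, 1)
# [1] 0 1 2 1 1 0 0 0
```

No clamping with max and min is needed here, because every visited window lies inside the data frame by construction.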