apply - R - Count at each row the number of columns in the "row neighbourhood" that contain only NA
How can I create, from a data frame, a vector that gives for each row the number of columns that are NA (or some custom value) in that row, in the n rows above, and in the m rows below?

So if m = n = 1 (i.e. how many columns in each row are NA and also have NA in the row before and the row after), and the data frame is

    structure(list(x = 1:8,
                   a = c(3L, NA, 10L, NA, 6L, NA, 5L, NA),
                   b = c(6L, NA, NA, NA, 8L, NA, 13L, NA),
                   c = c(NA, 12L, 14L, NA, NA, NA, 9L, NA),
                   d = c(NA, NA, NA, NA, NA, 11L, 7L, NA)),
              .Names = c("x", "a", "b", "c", "d"),
              class = "data.frame", row.names = c(NA, -8L))

i.e.

      x  a  b  c  d
    1 1  3  6 NA NA
    2 2 NA NA 12 NA
    3 3 10 NA 14 NA
    4 4 NA NA NA NA
    5 5  6  8 NA NA
    6 6 NA NA NA 11
    7 7  5 13  9  7
    8 8 NA NA NA NA

I want the vector

    count
    0 1 2 1 1 0 0 0

(if the first and last entries are NAs, that's fine). I'm trying to mimic the COUNTIFS function in Excel, i.e. COUNTIFS(B2:F2,"",B3:F3,"",B4:F4,"") for row 3.
I think I know what you mean.
Suppose your data frame is called x.
For each (row, column) in x, we need to see if there is an NA in that cell, and also an NA in the same column in the n rows before and the m rows after.
Let's start with the case of a single row, say row i = 2. We have n = 1 and m = 1 (from the example in the question).
    i <- 2
    n <- 1
    m <- 1

Let's count the number of NAs in each column for rows i - n to i + m inclusive (is.na returns TRUE if the current value is NA, and colSums gives the column sums):

    y <- colSums(is.na(x[(i - n):(i + m), ]))
    # x a b c d
    # 0 1 2 1 3

Now, a column has an NA in the previous, current, and next row if we counted 3 NAs (i.e. only column d qualifies here):

    y == n + m + 1
    #     x     a     b     c     d
    # FALSE FALSE FALSE FALSE  TRUE

So the number of columns that satisfy our criteria (hence the ith element of the output) is:

    sum(y == n + m + 1)
    # 1

We can use sapply to apply this to each row:
    countifs <- function (df, n, m) {
        nrows <- nrow(df)
        sapply(1:nrows,
               function (i) {
                   # clamp the window to the rows that actually exist
                   startrow <- max(i - n, 1)
                   endrow   <- min(i + m, nrows)
                   # number of NAs per column within the window
                   y <- colSums(is.na(df[startrow:endrow, ]))
                   # columns that are NA in all n + m + 1 rows
                   sum(y == n + m + 1)
               })
    }

    countifs(x, 1, 1)
    # [1] 0 1 2 1 1 0 0 0

You mentioned you might want to compare against a custom value rather than NA. In that case, instead of doing is.na(df[...]), you can do df[...] == value (but not if value is NA, in which case use is.na).
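As a rough sketch of that custom-value variant: the helper name countifs_value and its value argument are made up for illustration and are not part of the code above. It assumes NA cells should never count as a match for a non-NA value, so the comparison is guarded with !is.na() to keep the column sums from becoming NA.

```r
# Example data from the question.
x <- data.frame(
  x = 1:8,
  a = c(3L, NA, 10L, NA, 6L, NA, 5L, NA),
  b = c(6L, NA, NA, NA, 8L, NA, 13L, NA),
  c = c(NA, 12L, 14L, NA, NA, NA, 9L, NA),
  d = c(NA, NA, NA, NA, NA, 11L, 7L, NA)
)

# Hypothetical helper: like countifs, but matches `value` (default NA).
countifs_value <- function(df, n, m, value = NA) {
  # Logical matrix marking the cells that match the target value.
  hits <- if (is.na(value)) {
    is.na(df)
  } else {
    # Assumption: an NA cell is a non-match, so the sums below stay non-NA.
    !is.na(df) & df == value
  }
  nrows <- nrow(df)
  sapply(seq_len(nrows), function(i) {
    startrow <- max(i - n, 1)
    endrow   <- min(i + m, nrows)
    # drop = FALSE keeps a matrix even when the window is a single row
    y <- colSums(hits[startrow:endrow, , drop = FALSE])
    sum(y == n + m + 1)
  })
}

countifs_value(x, 1, 1)
# [1] 0 1 2 1 1 0 0 0

# Same data with NA recoded as 0, matched against the custom value 0.
x0 <- x
x0[is.na(x0)] <- 0L
countifs_value(x0, 1, 1, value = 0)
# [1] 0 1 2 1 1 0 0 0
```

Computing the logical matrix once, outside sapply, also avoids repeating the comparison for every row.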
Also, you could save a bit of work by using sapply only on rows n + 1 to nrow(df) - m, and setting the first n and last m elements to 0 automatically.
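One way that shortcut might look, as a sketch only: countifs_interior is a hypothetical name. It relies on the fact that a row with fewer than n rows above it or m rows below it can never have a full window of n + m + 1 NAs, so its count is always 0.

```r
# Example data from the question.
x <- data.frame(
  x = 1:8,
  a = c(3L, NA, 10L, NA, 6L, NA, 5L, NA),
  b = c(6L, NA, NA, NA, 8L, NA, 13L, NA),
  c = c(NA, 12L, 14L, NA, NA, NA, 9L, NA),
  d = c(NA, NA, NA, NA, NA, 11L, 7L, NA)
)

# Hypothetical variant that only visits rows with a complete window.
countifs_interior <- function(df, n, m) {
  nrows <- nrow(df)
  out <- integer(nrows)               # first n and last m entries stay 0
  if (nrows < n + m + 1) return(out)  # no row has a complete window
  nas <- is.na(df)                    # compute the NA matrix once
  interior <- (n + 1):(nrows - m)
  out[interior] <- sapply(interior, function(i) {
    y <- colSums(nas[(i - n):(i + m), , drop = FALSE])
    sum(y == n + m + 1)
  })
  out
}

countifs_interior(x, 1, 1)
# [1] 0 1 2 1 1 0 0 0
```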