Replacing zeroes with NA for values preceding non-zero

I'm new to R and have been struggling with the following for a while now so I was hoping someone would be able to help me out.



The sample data represents stock price returns (each row is a monthly period). The real data set is much bigger and is structured like the input below:



Input:



stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- cbind(stock1,stock2,stock3,stock4)

     stock1 stock2 stock3 stock4
[1,]   0.01   0.00   0.00   0.00
[2,]  -0.02   0.00   0.00  -0.02
[3,]   0.01   0.02   0.02   0.01
[4,]   0.05   0.04   0.00   0.00
[5,]   0.04  -0.03  -0.01   0.00
[6,]  -0.02   0.02   0.03  -0.02


Any zeroes that precede the first non-zero value for a given stock represent missing data rather than a return of zero for the period. I would like to set these values to NA, so the output I would like to achieve is the following:



Desired Output:



stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(NA, NA, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(NA, NA, 0.02, 0, -0.01, 0.03)
stock4 <- c(NA, -0.02, 0.01, 0, 0, -0.02)
df <- cbind(stock1,stock2,stock3,stock4)

     stock1 stock2 stock3 stock4
[1,]   0.01     NA     NA     NA
[2,]  -0.02     NA     NA  -0.02
[3,]   0.01   0.02   0.02   0.01
[4,]   0.05   0.04   0.00   0.00
[5,]   0.04  -0.03  -0.01   0.00
[6,]  -0.02   0.02   0.03  -0.02


I've tried a few things but they only seem to work for a single vector as opposed to a data set with multiple columns. I've tried using lapply to get around this but haven't had any luck so far. The closest I've gotten is shown below.



My single vector solution:



stock1[1:min(which(stock1!=0))-1] <- NA


My multiple vector solution which does not work:



lapply(df, function(x) x[1:min(which(x!=0))-1] <- NA)


Would greatly appreciate any guidance! Thanks!







    Should only the leading zeros be changed? That is, if e.g. stock1 <- c(0.01, -0.02, 0.01, 0, 0, -0.02), do you want to keep those as 0 even though there are two consecutive zeros? In your example, you have only a single 0 in other places but not two consecutive ones.
    – Daniel Fischer
    Aug 14 at 6:26















Asked Aug 14 at 5:24 by bubs7; edited Aug 14 at 6:17 by Ronak Shah.




334







3 Answers

















7 votes, accepted










There are three issues. First, writing:



df <- cbind(stock1,stock2,stock3,stock4)


doesn't create a data frame. It creates a matrix. This is an issue when you try to use lapply, which will operate over the columns of a data frame but over the elements of a matrix. Instead, you should write:



df <- data.frame(stock1,stock2,stock3,stock4)
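A quick way to see the difference is to compare how many pieces lapply visits in each case. This is only a minimal sketch using two of the vectors from the question; the names m and d are chosen here for illustration.

```r
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)

m <- cbind(stock1, stock2)       # matrix: 6 rows x 2 columns
d <- data.frame(stock1, stock2)  # data frame: a list of 2 columns

is.matrix(m)      # TRUE
is.data.frame(d)  # TRUE

# lapply visits each of the 12 matrix cells, but each of the 2 data frame columns
n_matrix <- length(lapply(m, identity))  # 12
n_frame  <- length(lapply(d, identity))  # 2
```

For a matrix, apply(m, 2, f) is the column-wise counterpart.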


Second, the function you're using in lapply needs to return the modified vector. Otherwise, the return value will be something unexpected: the value of an assignment expression is the value that was assigned, so each call returns a single NA, and you end up with one NA per column instead of the data frame you want.
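The effect of the missing return value can be seen on a single vector. The following is a small sketch; the function names no_return and with_return are made up for this illustration.

```r
x <- c(0, 0, 0.02, 0.04)

# The last expression is an assignment, so the function returns the assigned value
no_return <- function(x) x[1:2] <- NA

# Ending with x returns the whole modified vector
with_return <- function(x) {
  x[1:2] <- NA
  x
}

r1 <- no_return(x)    # a single NA (returned invisibly)
r2 <- with_return(x)  # NA NA 0.02 0.04
```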



Third, you need to take care with 1:n when n can be zero (i.e., when the first stock quote is non-zero) because 1:0 gives the sequence c(1,0) instead of an empty sequence. (This is arguably one of R's stupidest features.)
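To illustrate the pitfall, here is a short sketch comparing 1:n with seq_len(n) when n is zero. seq_len is a base R function that is not used in the original code; it is shown here only as one possible alternative.

```r
n <- 0

countdown <- 1:n         # 1 0 (counts down, not empty)
empty     <- seq_len(n)  # integer(0), an empty sequence

x <- c(0.01, -0.02, 0.01)

y <- x
y[seq_len(n)] <- NA  # nothing is replaced when n is 0

z <- x
z[1:n] <- NA         # index c(1, 0): the 0 is dropped, so element 1 is overwritten
```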



Therefore, the following will give you what you want:



stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4)

as.data.frame(lapply(df, function(x) {
  n <- min(which(x != 0)) - 1   # number of leading zeros
  if (n > 0)
    x[1:n] <- NA
  x                             # return the modified column
}))


The output is as expected:



  stock1 stock2 stock3 stock4
1   0.01     NA     NA     NA
2  -0.02     NA     NA  -0.02
3   0.01   0.02   0.02   0.01
4   0.05   0.04   0.00   0.00
5   0.04  -0.03  -0.01   0.00
6  -0.02   0.02   0.03  -0.02


Update: As @Daniel_Fischer notes, there's a clever trick to avoid the 1:0 problem. You can instead write:



as.data.frame(lapply(df, function(x) {
  n <- min(which(x != 0)) - 1
  x[0:n] <- NA # use 0:n instead of 1:n
  x
}))


This takes advantage of the fact that R ignores zeros in this type of indexing operation, so:



x[0:0] <- NA # same as x[0] <- NA and does nothing
x[0:1] <- NA # same as x[1] <- NA
x[0:2] <- NA # same as x[1:2] <- NA, etc.
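One case the sample data does not cover is a column with no non-zero value at all: which(x != 0) is then empty, min() returns Inf with a warning, and the index 0:n cannot be built. The sketch below is one possible guard, not part of the original answer. The helper name leading_zeros_to_na, the made-up column stock5, and the choice to treat an all-zero column as entirely missing are all assumptions.

```r
# Hypothetical helper, not from the original answer
leading_zeros_to_na <- function(x) {
  nz <- which(x != 0)
  # Number of leading zeros; assume the whole column is missing if nothing is non-zero
  n <- if (length(nz) == 0) length(x) else nz[1] - 1
  x[seq_len(n)] <- NA
  x
}

df <- data.frame(
  stock1 = c(0.01, -0.02, 0.01),
  stock2 = c(0, 0, 0.02),
  stock5 = c(0, 0, 0)   # made-up all-zero column for illustration
)

# df[] keeps the data frame structure while replacing every column
df[] <- lapply(df, leading_zeros_to_na)
```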





    Oh, I see that @Daniel_Fischer has a nice workaround for the n > 0 issue. If you do x[0:n] <- NA, that will work whether n is zero or non-zero, so you can skip the if statement, too.
    – K. A. Buhr
    Aug 14 at 6:09






    Thanks so much, really appreciate the help and prompt response! I'm still trying to get my head around the subtle but important differences between various "data structures" so thanks for pointing out the data.frame and as.data.frame functions too!
    – bubs7
    Aug 14 at 6:46

















4 votes













This might not be the most elegant way, but I think it works:



changeValues <- function(x) {
  place <- min(which(diff(c(0, cumsum(x == 0))) == 0)) - 1
  x[0:place] <- NA
  x
}


apply(df,2,changeValues)


EDIT: A brief explanation of the function: cumsum(x==0) creates a vector that increases by one at each position where your column contains a zero. diff then shows at which positions this vector does not increase, i.e. where the value is non-zero. Taking the minimum of those positions gives the first non-zero entry, so only the leading zeros before it are set to NA, and zeros further down the column are left unchanged.
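Tracing the intermediate vectors for stock3 from the question may make this easier to follow. This is just a step-by-step sketch; the variable names is_zero, running and steps are introduced here for illustration.

```r
x <- c(0, 0, 0.02, 0, -0.01, 0.03)    # stock3 from the question

is_zero <- x == 0                     # TRUE TRUE FALSE TRUE FALSE FALSE
running <- cumsum(is_zero)            # 1 2 2 3 3 3
steps   <- diff(c(0, running))        # 1 1 0 1 0 0
place   <- min(which(steps == 0)) - 1 # 2, the number of leading zeros

x[0:place] <- NA
x                                     # NA NA 0.02 0.00 -0.01 0.03
```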






  • Okay, it was too early here, so my answer is certainly over-complicating things and min(which(x!=0))-1 is the shorter way to get the place...
    – Daniel Fischer
    Aug 14 at 6:22

















3 votes













stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4) #the following function only works if df is actually a data.frame

df[] <- lapply(df, function(x) ifelse(cumsum(x) == 0 & x == 0, NA, x))

df

  stock1 stock2 stock3 stock4
1   0.01     NA     NA     NA
2  -0.02     NA     NA  -0.02
3   0.01   0.02   0.02   0.01
4   0.05   0.04   0.00   0.00
5   0.04  -0.03  -0.01   0.00
6  -0.02   0.02   0.03  -0.02


Some explanation: for each cell, check whether both the cumulative sum of the column up to that row and the current cell are equal to 0. If so, return NA; otherwise return the original value. The empty brackets after df (df[]) make sure the result of lapply is assigned back into the columns of df, so that df remains a data frame.
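One edge case may be worth keeping in mind with this test: if the returns happen to cancel out exactly, the cumulative sum goes back to 0 and a later, genuine zero would also be turned into NA. The sketch below uses made-up numbers to show this and compares it with a variant that counts non-zero values instead of summing returns. The variant is a suggestion, not part of the original answer.

```r
x <- c(0, 0.02, -0.02, 0, 0.01)  # made-up returns that cancel out after row 3

# Test from the answer: running total is 0 and the value is 0
by_sum <- ifelse(cumsum(x) == 0 & x == 0, NA, x)
# -> NA 0.02 -0.02 NA 0.01 (row 4 is lost)

# Alternative test: no non-zero value has been seen yet
by_count <- ifelse(cumsum(x != 0) == 0, NA, x)
# -> NA 0.02 -0.02 0.00 0.01
```

On the sample data from the question both tests give the same result.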



Also, if you don't really need df to be a dataframe, this works as well:



df <- cbind(stock1,stock2,stock3,stock4)
apply(df, 2, function(x) ifelse(cumsum(x) == 0 & x == 0, NA, x))





Accepted answer: answered Aug 14 at 6:06 by K. A. Buhr, edited Aug 14 at 6:12.







Second answer: answered Aug 14 at 6:06 by Daniel Fischer.











    • Okay, it was too early here, so my answer is certainly over-complicating things and min(which(x!=0))-1 is the shorter way to get the place...
      – Daniel Fischer
      Aug 14 at 6:22


























    up vote
    3
    down vote













    stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
    stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
    stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
    stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
    df <- data.frame(stock1,stock2,stock3,stock4) #the following function only works if df is actually a data.frame

    df[] <- lapply(df, function(x) ifelse(cumsum(x) == 0 & x == 0, NA, x))

    df

      stock1 stock2 stock3 stock4
    1   0.01     NA     NA     NA
    2  -0.02     NA     NA  -0.02
    3   0.01   0.02   0.02   0.01
    4   0.05   0.04   0.00   0.00
    5   0.04  -0.03  -0.01   0.00
    6  -0.02   0.02   0.03  -0.02


    Some explanation: for each cell, check whether both the cumulative sum of the column up to that cell and the cell itself are equal to 0. If so, return NA, otherwise return the original value. The empty brackets after df (df[] <- ...) make sure that the list returned by lapply is assigned back into the columns of df, so df remains a data frame.
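One edge case may be worth checking with real return data: if the returns before a genuine zero happen to cancel out exactly, the cumulative sum is 0 again and that zero would also be turned into NA. A possible variant, sketched here with made-up numbers and not taken from the answer, counts the non-zero values seen so far instead of summing them.

```r
# Made-up returns that cancel out exactly before a genuine zero return
x <- c(0.01, -0.01, 0, 0.02)

# Approach from the answer: cumulative sum of the returns
by_sum <- ifelse(cumsum(x) == 0 & x == 0, NA, x)

# Variant (an assumption, not part of the answer): count non-zero values seen so far
by_count <- ifelse(cumsum(x != 0) == 0, NA, x)

by_sum     # the third entry is flagged as missing
by_count   # nothing is flagged, because the column starts with a non-zero value
```

On the sample data from the question both versions give the same result, so this only matters if such cancelling sequences can occur in the full data set.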



    Also, if you don't really need df to be a dataframe, this works as well:



    df <- cbind(stock1,stock2,stock3,stock4)
    apply(df, 2, function(x) ifelse(cumsum(x) == 0 & x == 0, NA, x))
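To make the difference between the two versions concrete, here is a small sketch comparing what each call returns. The helper name f and the result names (as_list, df_kept, as_matrix) are only for illustration.

```r
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1, stock2, stock3, stock4)

# Same per-column rule as in the answer, stored under an illustrative name
f <- function(x) ifelse(cumsum(x) == 0 & x == 0, NA, x)

as_list <- lapply(df, f)                  # plain lapply: a list of four vectors

df_kept <- df
df_kept[] <- lapply(df, f)                # empty brackets: still a data frame

as_matrix <- apply(as.matrix(df), 2, f)   # apply over columns: a numeric matrix

is.data.frame(as_list)
is.data.frame(df_kept)
is.matrix(as_matrix)
```

Which form is preferable probably depends on what the rest of the analysis expects as input.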













        edited Aug 14 at 6:38

























        answered Aug 14 at 6:15









        Len























             













































































