Split columns in dataframe with NA

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
11
down vote

favorite
2












I have a df like this:



df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

> df
FOO
1 A|B|C
2 A|B
3 B|C
4 A
5 C


And I would like to have an output like this:



> df
X1 X2 X3
1 A B C
2 A B
3 B C
4 A
5 C


So far I tried with this example: Split column at delimiter in data frame but it is not spliting the columns without repeating values, what I get there is:



df <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

> df
X1 X2 X3
1 A B C
2 A B A
3 B C B
4 A A A
5 C C C


And I also get this warning:




Warning message: In rbind(c("A", "B", "C"), c("A", "B"), c("B", "C"),
"A", "C") : number of columns of result is not a multiple of vector
length (arg 2)




What can I do in those cases? Preferably with baseR







share|improve this question






















  • Would the dataframe have values A, B, C... always ? I mean how do we decide which column a particular value should go?
    – Ronak Shah
    Aug 16 at 8:08











  • No, values might change but names remain the same. For example: row1 = A|B|C and row22 = B|D|F.
    – Biocrazy
    Aug 16 at 8:12










  • So if you have only these two rows how many columns would you have? 6 (A, B, C, D, E, F) or 5 (A, B, C, D, F) ?
    – Ronak Shah
    Aug 16 at 8:14










  • At the begining (in df) I have only one column with a changing number of rows. What it is inside the rows (A|B|C) always changes. Sometimes the max length can be 7 (A|B|C|D|E|F|G), others 1 (A), others 2 (A|C), and so on.
    – Biocrazy
    Aug 16 at 8:18






  • 1




    One last thing I want to tell you: When you look at df you see your column is of type factor. That's often a very unwanted thing. So you could add ,stringsAsFactors=F into the data.frame definition.
    – Andre Elrico
    Aug 16 at 8:54















up vote
11
down vote

favorite
2












I have a df like this:



df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

> df
FOO
1 A|B|C
2 A|B
3 B|C
4 A
5 C


And I would like to have an output like this:



> df
X1 X2 X3
1 A B C
2 A B
3 B C
4 A
5 C


So far I tried with this example: Split column at delimiter in data frame but it is not spliting the columns without repeating values, what I get there is:



df <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

> df
X1 X2 X3
1 A B C
2 A B A
3 B C B
4 A A A
5 C C C


And I also get this warning:




Warning message: In rbind(c("A", "B", "C"), c("A", "B"), c("B", "C"),
"A", "C") : number of columns of result is not a multiple of vector
length (arg 2)




What can I do in those cases? Preferably with baseR







share|improve this question






















  • Would the dataframe have values A, B, C... always ? I mean how do we decide which column a particular value should go?
    – Ronak Shah
    Aug 16 at 8:08











  • No, values might change but names remain the same. For example: row1 = A|B|C and row22 = B|D|F.
    – Biocrazy
    Aug 16 at 8:12










  • So if you have only these two rows how many columns would you have? 6 (A, B, C, D, E, F) or 5 (A, B, C, D, F) ?
    – Ronak Shah
    Aug 16 at 8:14










  • At the begining (in df) I have only one column with a changing number of rows. What it is inside the rows (A|B|C) always changes. Sometimes the max length can be 7 (A|B|C|D|E|F|G), others 1 (A), others 2 (A|C), and so on.
    – Biocrazy
    Aug 16 at 8:18






  • 1




    One last thing I want to tell you: When you look at df you see your column is of type factor. That's often a very unwanted thing. So you could add ,stringsAsFactors=F into the data.frame definition.
    – Andre Elrico
    Aug 16 at 8:54













up vote
11
down vote

favorite
2









up vote
11
down vote

favorite
2






2





I have a df like this:



df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

> df
FOO
1 A|B|C
2 A|B
3 B|C
4 A
5 C


And I would like to have an output like this:



> df
X1 X2 X3
1 A B C
2 A B
3 B C
4 A
5 C


So far I tried with this example: Split column at delimiter in data frame but it is not spliting the columns without repeating values, what I get there is:



df <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

> df
X1 X2 X3
1 A B C
2 A B A
3 B C B
4 A A A
5 C C C


And I also get this warning:




Warning message: In rbind(c("A", "B", "C"), c("A", "B"), c("B", "C"),
"A", "C") : number of columns of result is not a multiple of vector
length (arg 2)




What can I do in those cases? Preferably with baseR







share|improve this question














I have a df like this:



df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

> df
FOO
1 A|B|C
2 A|B
3 B|C
4 A
5 C


And I would like to have an output like this:



> df
X1 X2 X3
1 A B C
2 A B
3 B C
4 A
5 C


So far I tried with this example: Split column at delimiter in data frame but it is not spliting the columns without repeating values, what I get there is:



df <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

> df
X1 X2 X3
1 A B C
2 A B A
3 B C B
4 A A A
5 C C C


And I also get this warning:




Warning message: In rbind(c("A", "B", "C"), c("A", "B"), c("B", "C"),
"A", "C") : number of columns of result is not a multiple of vector
length (arg 2)




What can I do in those cases? Preferably with baseR









share|improve this question













share|improve this question




share|improve this question








edited Aug 16 at 8:20

























asked Aug 16 at 7:58









Biocrazy

14312




14312











  • Would the dataframe have values A, B, C... always ? I mean how do we decide which column a particular value should go?
    – Ronak Shah
    Aug 16 at 8:08











  • No, values might change but names remain the same. For example: row1 = A|B|C and row22 = B|D|F.
    – Biocrazy
    Aug 16 at 8:12










  • So if you have only these two rows how many columns would you have? 6 (A, B, C, D, E, F) or 5 (A, B, C, D, F) ?
    – Ronak Shah
    Aug 16 at 8:14










  • At the begining (in df) I have only one column with a changing number of rows. What it is inside the rows (A|B|C) always changes. Sometimes the max length can be 7 (A|B|C|D|E|F|G), others 1 (A), others 2 (A|C), and so on.
    – Biocrazy
    Aug 16 at 8:18






  • 1




    One last thing I want to tell you: When you look at df you see your column is of type factor. That's often a very unwanted thing. So you could add ,stringsAsFactors=F into the data.frame definition.
    – Andre Elrico
    Aug 16 at 8:54

















  • Would the dataframe have values A, B, C... always ? I mean how do we decide which column a particular value should go?
    – Ronak Shah
    Aug 16 at 8:08











  • No, values might change but names remain the same. For example: row1 = A|B|C and row22 = B|D|F.
    – Biocrazy
    Aug 16 at 8:12










  • So if you have only these two rows how many columns would you have? 6 (A, B, C, D, E, F) or 5 (A, B, C, D, F) ?
    – Ronak Shah
    Aug 16 at 8:14










  • At the begining (in df) I have only one column with a changing number of rows. What it is inside the rows (A|B|C) always changes. Sometimes the max length can be 7 (A|B|C|D|E|F|G), others 1 (A), others 2 (A|C), and so on.
    – Biocrazy
    Aug 16 at 8:18






  • 1




    One last thing I want to tell you: When you look at df you see your column is of type factor. That's often a very unwanted thing. So you could add ,stringsAsFactors=F into the data.frame definition.
    – Andre Elrico
    Aug 16 at 8:54
















Would the dataframe have values A, B, C... always ? I mean how do we decide which column a particular value should go?
– Ronak Shah
Aug 16 at 8:08





Would the dataframe have values A, B, C... always ? I mean how do we decide which column a particular value should go?
– Ronak Shah
Aug 16 at 8:08













No, values might change but names remain the same. For example: row1 = A|B|C and row22 = B|D|F.
– Biocrazy
Aug 16 at 8:12




No, values might change but names remain the same. For example: row1 = A|B|C and row22 = B|D|F.
– Biocrazy
Aug 16 at 8:12












So if you have only these two rows how many columns would you have? 6 (A, B, C, D, E, F) or 5 (A, B, C, D, F) ?
– Ronak Shah
Aug 16 at 8:14




So if you have only these two rows how many columns would you have? 6 (A, B, C, D, E, F) or 5 (A, B, C, D, F) ?
– Ronak Shah
Aug 16 at 8:14












At the begining (in df) I have only one column with a changing number of rows. What it is inside the rows (A|B|C) always changes. Sometimes the max length can be 7 (A|B|C|D|E|F|G), others 1 (A), others 2 (A|C), and so on.
– Biocrazy
Aug 16 at 8:18




At the begining (in df) I have only one column with a changing number of rows. What it is inside the rows (A|B|C) always changes. Sometimes the max length can be 7 (A|B|C|D|E|F|G), others 1 (A), others 2 (A|C), and so on.
– Biocrazy
Aug 16 at 8:18




1




1




One last thing I want to tell you: When you look at df you see your column is of type factor. That's often a very unwanted thing. So you could add ,stringsAsFactors=F into the data.frame definition.
– Andre Elrico
Aug 16 at 8:54





One last thing I want to tell you: When you look at df you see your column is of type factor. That's often a very unwanted thing. So you could add ,stringsAsFactors=F into the data.frame definition.
– Andre Elrico
Aug 16 at 8:54













5 Answers
5






active

oldest

votes

















up vote
7
down vote



accepted










Simply do:



splt <- strsplit(as.character(df$FOO),"\|")
all_val <- sort(unique(unlist(splt)))
t(sapply(splt,function(x)all_val[!(all_val %in% x)]<-NA;all_val))


# [,1] [,2] [,3]
#[1,] "A" "B" "C"
#[2,] "A" "B" NA
#[3,] NA "B" "C"
#[4,] "A" NA NA
#[5,] NA NA "C"


data:



df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))


Please note:



My version is base:: (no libraries needed) and general:



It would also work with:



df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))





share|improve this answer


















  • 1




    Short, general and using base::... It works great! Thank you.
    – Biocrazy
    Aug 16 at 8:42







  • 3




    It was a well-posed question and an interesting problem!
    – Andre Elrico
    Aug 16 at 8:44







  • 1




    Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
    – Andre Elrico
    Aug 16 at 8:47

















up vote
5
down vote













Overlooked that OP asked for a base R solution. Please try @AndreElrico's, @r.user.05apr's or @milan's solutions.




This can be done with cSplit_e from the splitstackshape package:



library(splitstackshape)
cSplit_e(
data = df,
split.col = "FOO",
sep = "|",
mode = "value",
type = "character",
fill = " ",
drop = TRUE
)
# FOO_A FOO_B FOO_C
#1 A B C
#2 A B
#3 B C
#4 A
#5 C



Does also work in case of the following df (see OP's comment above).



(df1 <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')))
# FOO
#1 A|B|C
#2 A|B
#3 B|C
#4 A
#5 C
#6 B|D|F

cSplit_e(df1, "FOO", "|", "value", "character", TRUE, fill = " ")
# FOO_A FOO_B FOO_C FOO_D FOO_F
#1 A B C
#2 A B
#3 B C
#4 A
#5 C
#6 B D F





share|improve this answer





























    up vote
    4
    down vote













    In base R:



    df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

    dummy <- strsplit(as.character(df$FOO), "[|]")
    want <- data.frame(values = unlist(dummy),
    ids = rep(1:length(dummy), unlist(lapply(dummy, length))),
    stringsAsFactors = FALSE)

    library(reshape2)
    want <- dcast(want, ids ~ values, value.var = "values", fill = " ")[, -1] # first col removed
    names(want) <- paste0("X", seq_along(unique(unlist(dummy))))
    want
    # X1 X2 X3
    #1 A B C
    #2 A B
    #3 B C
    #4 A
    #5 C





    share|improve this answer






















    • Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
      – Biocrazy
      Aug 16 at 8:28










    • Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
      – Biocrazy
      Aug 16 at 8:30

















    up vote
    2
    down vote













    Use unique and strsplit to find all unique values (A, B and C in this case). Use grep to search for the unique values, and return the values when there's a match or character(0) otherwise. cbind the resulting characters. Use apply and ifelse to replace character(0) with NA.



    vals <- unique(unlist(sapply(a1, function(x) strsplit(x, '|', fixed = T))))

    out <- NULL
    for(i in vals)
    out <- cbind(out, as.character((lapply(df$FOO, function(x) grep(x, i, value=T)))))


    apply(out, 2, function(x) ifelse(x=="character(0)", NA, x))

    [,1] [,2] [,3]
    [1,] "A" "B" "C"
    [2,] "A" "B" NA
    [3,] NA "B" "C"
    [4,] "A" NA NA
    [5,] NA NA "C"





    share|improve this answer






















    • Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
      – Biocrazy
      Aug 16 at 9:01











    • Changed it now so that you can easily apply it to many values.
      – milan
      Aug 16 at 9:06

















    up vote
    0
    down vote













    You can try a tidyverse as well



    library(tidyverse)
    df %>%
    rownames_to_column() %>%
    separate_rows(FOO, sep="[|]") %>%
    mutate(L=factor(FOO, labels = paste0("X",1:length(unique(FOO))))) %>%
    spread(L, FOO) %>%
    select(-1)
    X1 X2 X3
    1 A B C
    2 A B <NA>
    3 <NA> B C
    4 A <NA> <NA>
    5 <NA> <NA> C


    It is also generally working e.g. df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')). In addition you can set the levels e.g. B>C>A by yourself using levels = c("B", "C", "A") in the factor function in the mutate step.






    share|improve this answer






















      Your Answer





      StackExchange.ifUsing("editor", function ()
      StackExchange.using("externalEditor", function ()
      StackExchange.using("snippets", function ()
      StackExchange.snippets.init();
      );
      );
      , "code-snippets");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "1"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      convertImagesToLinks: true,
      noModals: false,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );








       

      draft saved


      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51872267%2fsplit-columns-in-dataframe-with-na%23new-answer', 'question_page');

      );

      Post as a guest






























      5 Answers
      5






      active

      oldest

      votes








      5 Answers
      5






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes








      up vote
      7
      down vote



      accepted










      Simply do:



      splt <- strsplit(as.character(df$FOO),"\|")
      all_val <- sort(unique(unlist(splt)))
      t(sapply(splt,function(x)all_val[!(all_val %in% x)]<-NA;all_val))


      # [,1] [,2] [,3]
      #[1,] "A" "B" "C"
      #[2,] "A" "B" NA
      #[3,] NA "B" "C"
      #[4,] "A" NA NA
      #[5,] NA NA "C"


      data:



      df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))


      Please note:



      My version is base:: (no libraries needed) and general:



      It would also work with:



      df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))





      share|improve this answer


















      • 1




        Short, general and using base::... It works great! Thank you.
        – Biocrazy
        Aug 16 at 8:42







      • 3




        It was a well-posed question and an interesting problem!
        – Andre Elrico
        Aug 16 at 8:44







      • 1




        Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
        – Andre Elrico
        Aug 16 at 8:47














      up vote
      7
      down vote



      accepted










      Simply do:



      splt <- strsplit(as.character(df$FOO),"\|")
      all_val <- sort(unique(unlist(splt)))
      t(sapply(splt,function(x)all_val[!(all_val %in% x)]<-NA;all_val))


      # [,1] [,2] [,3]
      #[1,] "A" "B" "C"
      #[2,] "A" "B" NA
      #[3,] NA "B" "C"
      #[4,] "A" NA NA
      #[5,] NA NA "C"


      data:



      df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))


      Please note:



      My version is base:: (no libraries needed) and general:



      It would also work with:



      df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))





      share|improve this answer


















      • 1




        Short, general and using base::... It works great! Thank you.
        – Biocrazy
        Aug 16 at 8:42







      • 3




        It was a well-posed question and an interesting problem!
        – Andre Elrico
        Aug 16 at 8:44







      • 1




        Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
        – Andre Elrico
        Aug 16 at 8:47












      up vote
      7
      down vote



      accepted







      up vote
      7
      down vote



      accepted






      Simply do:



      splt <- strsplit(as.character(df$FOO),"\|")
      all_val <- sort(unique(unlist(splt)))
      t(sapply(splt,function(x)all_val[!(all_val %in% x)]<-NA;all_val))


      # [,1] [,2] [,3]
      #[1,] "A" "B" "C"
      #[2,] "A" "B" NA
      #[3,] NA "B" "C"
      #[4,] "A" NA NA
      #[5,] NA NA "C"


      data:



      df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))


      Please note:



      My version is base:: (no libraries needed) and general:



      It would also work with:



      df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))





      share|improve this answer














      Simply do:



      splt <- strsplit(as.character(df$FOO),"\|")
      all_val <- sort(unique(unlist(splt)))
      t(sapply(splt,function(x)all_val[!(all_val %in% x)]<-NA;all_val))


      # [,1] [,2] [,3]
      #[1,] "A" "B" "C"
      #[2,] "A" "B" NA
      #[3,] NA "B" "C"
      #[4,] "A" NA NA
      #[5,] NA NA "C"


      data:



      df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))


      Please note:



      My version is base:: (no libraries needed) and general:



      It would also work with:



      df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))






      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited Aug 16 at 8:38

























      answered Aug 16 at 8:32









      Andre Elrico

      2,758723




      2,758723







      • 1




        Short, general and using base::... It works great! Thank you.
        – Biocrazy
        Aug 16 at 8:42







      • 3




        It was a well-posed question and an interesting problem!
        – Andre Elrico
        Aug 16 at 8:44







      • 1




        Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
        – Andre Elrico
        Aug 16 at 8:47












      • 1




        Short, general and using base::... It works great! Thank you.
        – Biocrazy
        Aug 16 at 8:42







      • 3




        It was a well-posed question and an interesting problem!
        – Andre Elrico
        Aug 16 at 8:44







      • 1




        Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
        – Andre Elrico
        Aug 16 at 8:47







      1




      1




      Short, general and using base::... It works great! Thank you.
      – Biocrazy
      Aug 16 at 8:42





      Short, general and using base::... It works great! Thank you.
      – Biocrazy
      Aug 16 at 8:42





      3




      3




      It was a well-posed question and an interesting problem!
      – Andre Elrico
      Aug 16 at 8:44





      It was a well-posed question and an interesting problem!
      – Andre Elrico
      Aug 16 at 8:44





      1




      1




      Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
      – Andre Elrico
      Aug 16 at 8:47




      Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
      – Andre Elrico
      Aug 16 at 8:47












      up vote
      5
      down vote













      Overlooked that OP asked for a base R solution. Please try @AndreElrico's, @r.user.05apr's or @milan's solutions.




      This can be done with cSplit_e from the splitstackshape package:



      library(splitstackshape)
      cSplit_e(
      data = df,
      split.col = "FOO",
      sep = "|",
      mode = "value",
      type = "character",
      fill = " ",
      drop = TRUE
      )
      # FOO_A FOO_B FOO_C
      #1 A B C
      #2 A B
      #3 B C
      #4 A
      #5 C



      Does also work in case of the following df (see OP's comment above).



      (df1 <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')))
      # FOO
      #1 A|B|C
      #2 A|B
      #3 B|C
      #4 A
      #5 C
      #6 B|D|F

      cSplit_e(df1, "FOO", "|", "value", "character", TRUE, fill = " ")
      # FOO_A FOO_B FOO_C FOO_D FOO_F
      #1 A B C
      #2 A B
      #3 B C
      #4 A
      #5 C
      #6 B D F





      share|improve this answer


























        up vote
        5
        down vote













        Overlooked that OP asked for a base R solution. Please try @AndreElrico's, @r.user.05apr's or @milan's solutions.




        This can be done with cSplit_e from the splitstackshape package:



        library(splitstackshape)
        cSplit_e(
        data = df,
        split.col = "FOO",
        sep = "|",
        mode = "value",
        type = "character",
        fill = " ",
        drop = TRUE
        )
        # FOO_A FOO_B FOO_C
        #1 A B C
        #2 A B
        #3 B C
        #4 A
        #5 C



        Does also work in case of the following df (see OP's comment above).



        (df1 <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')))
        # FOO
        #1 A|B|C
        #2 A|B
        #3 B|C
        #4 A
        #5 C
        #6 B|D|F

        cSplit_e(df1, "FOO", "|", "value", "character", TRUE, fill = " ")
        # FOO_A FOO_B FOO_C FOO_D FOO_F
        #1 A B C
        #2 A B
        #3 B C
        #4 A
        #5 C
        #6 B D F





        share|improve this answer
























          up vote
          5
          down vote










          up vote
          5
          down vote









          Overlooked that OP asked for a base R solution. Please try @AndreElrico's, @r.user.05apr's or @milan's solutions.




          This can be done with cSplit_e from the splitstackshape package:



          library(splitstackshape)
          cSplit_e(
          data = df,
          split.col = "FOO",
          sep = "|",
          mode = "value",
          type = "character",
          fill = " ",
          drop = TRUE
          )
          # FOO_A FOO_B FOO_C
          #1 A B C
          #2 A B
          #3 B C
          #4 A
          #5 C



          Does also work in case of the following df (see OP's comment above).



          (df1 <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')))
          # FOO
          #1 A|B|C
          #2 A|B
          #3 B|C
          #4 A
          #5 C
          #6 B|D|F

          cSplit_e(df1, "FOO", "|", "value", "character", TRUE, fill = " ")
          # FOO_A FOO_B FOO_C FOO_D FOO_F
          #1 A B C
          #2 A B
          #3 B C
          #4 A
          #5 C
          #6 B D F





          share|improve this answer














          Overlooked that OP asked for a base R solution. Please try @AndreElrico's, @r.user.05apr's or @milan's solutions.




          This can be done with cSplit_e from the splitstackshape package:



          library(splitstackshape)
          cSplit_e(
          data = df,
          split.col = "FOO",
          sep = "|",
          mode = "value",
          type = "character",
          fill = " ",
          drop = TRUE
          )
          # FOO_A FOO_B FOO_C
          #1 A B C
          #2 A B
          #3 B C
          #4 A
          #5 C



          Does also work in case of the following df (see OP's comment above).



          (df1 <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')))
          # FOO
          #1 A|B|C
          #2 A|B
          #3 B|C
          #4 A
          #5 C
          #6 B|D|F

          cSplit_e(df1, "FOO", "|", "value", "character", TRUE, fill = " ")
          # FOO_A FOO_B FOO_C FOO_D FOO_F
          #1 A B C
          #2 A B
          #3 B C
          #4 A
          #5 C
          #6 B D F






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Aug 16 at 17:46

























          answered Aug 16 at 8:22









          markus

          4,971626




          4,971626




















              up vote
              4
              down vote













              In base R:



              df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

              dummy <- strsplit(as.character(df$FOO), "[|]")
              want <- data.frame(values = unlist(dummy),
              ids = rep(1:length(dummy), unlist(lapply(dummy, length))),
              stringsAsFactors = FALSE)

              library(reshape2)
              want <- dcast(want, ids ~ values, value.var = "values", fill = " ")[, -1] # first col removed
              names(want) <- paste0("X", seq_along(unique(unlist(dummy))))
              want
              # X1 X2 X3
              #1 A B C
              #2 A B
              #3 B C
              #4 A
              #5 C





              share|improve this answer






















              • Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
                – Biocrazy
                Aug 16 at 8:28










              • Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
                – Biocrazy
                Aug 16 at 8:30














              up vote
              4
              down vote













              In base R:



              df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

              dummy <- strsplit(as.character(df$FOO), "[|]")
              want <- data.frame(values = unlist(dummy),
              ids = rep(1:length(dummy), unlist(lapply(dummy, length))),
              stringsAsFactors = FALSE)

              library(reshape2)
              want <- dcast(want, ids ~ values, value.var = "values", fill = " ")[, -1] # first col removed
              names(want) <- paste0("X", seq_along(unique(unlist(dummy))))
              want
              # X1 X2 X3
              #1 A B C
              #2 A B
              #3 B C
              #4 A
              #5 C





              share|improve this answer






















              • Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
                – Biocrazy
                Aug 16 at 8:28










              • Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
                – Biocrazy
                Aug 16 at 8:30












              up vote
              4
              down vote










              up vote
              4
              down vote









              In base R:



              df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

              dummy <- strsplit(as.character(df$FOO), "[|]")
              want <- data.frame(values = unlist(dummy),
              ids = rep(1:length(dummy), unlist(lapply(dummy, length))),
              stringsAsFactors = FALSE)

              library(reshape2)
              want <- dcast(want, ids ~ values, value.var = "values", fill = " ")[, -1] # first col removed
              names(want) <- paste0("X", seq_along(unique(unlist(dummy))))
              want
              # X1 X2 X3
              #1 A B C
              #2 A B
              #3 B C
              #4 A
              #5 C





              share|improve this answer














              In base R:



              df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

              dummy <- strsplit(as.character(df$FOO), "[|]")
              want <- data.frame(values = unlist(dummy),
              ids = rep(1:length(dummy), unlist(lapply(dummy, length))),
              stringsAsFactors = FALSE)

              library(reshape2)
              want <- dcast(want, ids ~ values, value.var = "values", fill = " ")[, -1] # first col removed
              names(want) <- paste0("X", seq_along(unique(unlist(dummy))))
              want
              # X1 X2 X3
              #1 A B C
              #2 A B
              #3 B C
              #4 A
              #5 C






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Aug 16 at 12:34

























              answered Aug 16 at 8:24









              r.user.05apr

              1,5231619




              1,5231619











              • Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
                – Biocrazy
                Aug 16 at 8:28










              • Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
                – Biocrazy
                Aug 16 at 8:30
















              • Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
                – Biocrazy
                Aug 16 at 8:28










              • Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
                – Biocrazy
                Aug 16 at 8:30















              Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
              – Biocrazy
              Aug 16 at 8:28




              Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
              – Biocrazy
              Aug 16 at 8:28












              Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
              – Biocrazy
              Aug 16 at 8:30




              Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
              – Biocrazy
              Aug 16 at 8:30










              up vote
              2
              down vote













              Use unique and strsplit to find all unique values (A, B and C in this case). Use grep to search for the unique values, and return the values when there's a match or character(0) otherwise. cbind the resulting characters. Use apply and ifelse to replace character(0) with NA.



              vals <- unique(unlist(sapply(a1, function(x) strsplit(x, '|', fixed = T))))

              out <- NULL
              for(i in vals)
              out <- cbind(out, as.character((lapply(df$FOO, function(x) grep(x, i, value=T)))))


              apply(out, 2, function(x) ifelse(x=="character(0)", NA, x))

              [,1] [,2] [,3]
              [1,] "A" "B" "C"
              [2,] "A" "B" NA
              [3,] NA "B" "C"
              [4,] "A" NA NA
              [5,] NA NA "C"





              share|improve this answer






















              • Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
                – Biocrazy
                Aug 16 at 9:01











              • Changed it now so that you can easily apply it to many values.
                – milan
                Aug 16 at 9:06














              up vote
              2
              down vote













              Use unique and strsplit to find all unique values (A, B and C in this case). Use grep to search for the unique values, and return the values when there's a match or character(0) otherwise. cbind the resulting characters. Use apply and ifelse to replace character(0) with NA.



              vals <- unique(unlist(sapply(a1, function(x) strsplit(x, '|', fixed = T))))

              out <- NULL
              for(i in vals)
              out <- cbind(out, as.character((lapply(df$FOO, function(x) grep(x, i, value=T)))))


              apply(out, 2, function(x) ifelse(x=="character(0)", NA, x))

              [,1] [,2] [,3]
              [1,] "A" "B" "C"
              [2,] "A" "B" NA
              [3,] NA "B" "C"
              [4,] "A" NA NA
              [5,] NA NA "C"





              share|improve this answer






















              • Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
                – Biocrazy
                Aug 16 at 9:01











              • Changed it now so that you can easily apply it to many values.
                – milan
                Aug 16 at 9:06












              up vote
              2
              down vote










              up vote
              2
              down vote









              Use unique and strsplit to find all unique values (A, B and C in this case). Use grep to search for the unique values, and return the values when there's a match or character(0) otherwise. cbind the resulting characters. Use apply and ifelse to replace character(0) with NA.



              vals <- unique(unlist(sapply(a1, function(x) strsplit(x, '|', fixed = T))))

              out <- NULL
              for(i in vals)
              out <- cbind(out, as.character((lapply(df$FOO, function(x) grep(x, i, value=T)))))


              apply(out, 2, function(x) ifelse(x=="character(0)", NA, x))

              [,1] [,2] [,3]
              [1,] "A" "B" "C"
              [2,] "A" "B" NA
              [3,] NA "B" "C"
              [4,] "A" NA NA
              [5,] NA NA "C"





              share|improve this answer














              Use unique and strsplit to find all unique values (A, B and C in this case). Use grep to search for the unique values, and return the values when there's a match or character(0) otherwise. cbind the resulting characters. Use apply and ifelse to replace character(0) with NA.



              vals <- unique(unlist(sapply(a1, function(x) strsplit(x, '|', fixed = T))))

              out <- NULL
              for(i in vals)
              out <- cbind(out, as.character((lapply(df$FOO, function(x) grep(x, i, value=T)))))


              apply(out, 2, function(x) ifelse(x=="character(0)", NA, x))

              [,1] [,2] [,3]
              [1,] "A" "B" "C"
              [2,] "A" "B" NA
              [3,] NA "B" "C"
              [4,] "A" NA NA
              [5,] NA NA "C"






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Aug 17 at 5:59

























              answered Aug 16 at 8:38









              milan

              2,1441721




              2,1441721











              • Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
                – Biocrazy
                Aug 16 at 9:01











              • Changed it now so that you can easily apply it to many values.
                – milan
                Aug 16 at 9:06
















              • Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
                – Biocrazy
                Aug 16 at 9:01











              • Changed it now so that you can easily apply it to many values.
                – milan
                Aug 16 at 9:06















              Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
              – Biocrazy
              Aug 16 at 9:01





              Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
              – Biocrazy
              Aug 16 at 9:01













              Changed it now so that you can easily apply it to many values.
              – milan
              Aug 16 at 9:06




              Changed it now so that you can easily apply it to many values.
              – milan
              Aug 16 at 9:06










              up vote
              0
              down vote













              You can try a tidyverse as well



              library(tidyverse)
              df %>%
              rownames_to_column() %>%
              separate_rows(FOO, sep="[|]") %>%
              mutate(L=factor(FOO, labels = paste0("X",1:length(unique(FOO))))) %>%
              spread(L, FOO) %>%
              select(-1)
              X1 X2 X3
              1 A B C
              2 A B <NA>
              3 <NA> B C
              4 A <NA> <NA>
              5 <NA> <NA> C


              It is also generally working e.g. df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')). In addition you can set the levels e.g. B>C>A by yourself using levels = c("B", "C", "A") in the factor function in the mutate step.






              share|improve this answer


























                up vote
                0
                down vote













                You can try a tidyverse as well



                library(tidyverse)
                df %>%
                rownames_to_column() %>%
                separate_rows(FOO, sep="[|]") %>%
                mutate(L=factor(FOO, labels = paste0("X",1:length(unique(FOO))))) %>%
                spread(L, FOO) %>%
                select(-1)
                X1 X2 X3
                1 A B C
                2 A B <NA>
                3 <NA> B C
                4 A <NA> <NA>
                5 <NA> <NA> C


                It is also generally working e.g. df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')). In addition you can set the levels e.g. B>C>A by yourself using levels = c("B", "C", "A") in the factor function in the mutate step.






                share|improve this answer
























                  up vote
                  0
                  down vote










                  up vote
                  0
                  down vote









                  You can try a tidyverse as well



                  library(tidyverse)
                  df %>%
                  rownames_to_column() %>%
                  separate_rows(FOO, sep="[|]") %>%
                  mutate(L=factor(FOO, labels = paste0("X",1:length(unique(FOO))))) %>%
                  spread(L, FOO) %>%
                  select(-1)
                  X1 X2 X3
                  1 A B C
                  2 A B <NA>
                  3 <NA> B C
                  4 A <NA> <NA>
                  5 <NA> <NA> C


                  It is also generally working e.g. df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')). In addition you can set the levels e.g. B>C>A by yourself using levels = c("B", "C", "A") in the factor function in the mutate step.






                  share|improve this answer














                  You can try a tidyverse as well



                  library(tidyverse)
                  df %>%
                  rownames_to_column() %>%
                  separate_rows(FOO, sep="[|]") %>%
                  mutate(L=factor(FOO, labels = paste0("X",1:length(unique(FOO))))) %>%
                  spread(L, FOO) %>%
                  select(-1)
                  X1 X2 X3
                  1 A B C
                  2 A B <NA>
                  3 <NA> B C
                  4 A <NA> <NA>
                  5 <NA> <NA> C


                  It is also generally working e.g. df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')). In addition you can set the levels e.g. B>C>A by yourself using levels = c("B", "C", "A") in the factor function in the mutate step.







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Aug 16 at 10:18

























                  answered Aug 16 at 9:59









                  Jimbou

                  8,58311028




                  8,58311028






















                       

                      draft saved


                      draft discarded


























                       


                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function ()
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51872267%2fsplit-columns-in-dataframe-with-na%23new-answer', 'question_page');

                      );

                      Post as a guest













































































                      這個網誌中的熱門文章

                      How to combine Bézier curves to a surface?

                      Mutual Information Always Non-negative

                      Why am i infinitely getting the same tweet with the Twitter Search API?