Split columns in dataframe with NA

up vote
11
down vote

favorite

I have a df like this:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

> df
 FOO
1 A|B|C
2 A|B
3 B|C
4 A
5 C

And I would like to have an output like this:

> df
 X1 X2 X3
1 A B C
2 A B
3 B C
4 A
5 C

So far I tried with this example: Split column at delimiter in data frame but it is not spliting the columns without repeating values, what I get there is:

df <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

> df
 X1 X2 X3
1 A B C
2 A B A
3 B C B
4 A A A
5 C C C

And I also get this warning:

Warning message: In rbind(c("A", "B", "C"), c("A", "B"), c("B", "C"),
"A", "C") : number of columns of result is not a multiple of vector
length (arg 2)

What can I do in those cases? Preferably with baseR

edited Aug 16 at 8:20

asked Aug 16 at 7:58

Biocrazy

14312

Would the dataframe have values A, B, C... always ? I mean how do we decide which column a particular value should go?
â€“Â Ronak Shah
Aug 16 at 8:08

No, values might change but names remain the same. For example: row1 = A|B|C and row22 = B|D|F.
â€“Â Biocrazy
Aug 16 at 8:12

So if you have only these two rows how many columns would you have? 6 (A, B, C, D, E, F) or 5 (A, B, C, D, F) ?
â€“Â Ronak Shah
Aug 16 at 8:14

At the begining (in df) I have only one column with a changing number of rows. What it is inside the rows (A|B|C) always changes. Sometimes the max length can be 7 (A|B|C|D|E|F|G), others 1 (A), others 2 (A|C), and so on.
â€“Â Biocrazy
Aug 16 at 8:18

1

One last thing I want to tell you: When you look at df you see your column is of type factor. That's often a very unwanted thing. So you could add ,stringsAsFactors=F into the data.frame definition.
â€“Â Andre Elrico
Aug 16 at 8:54

Â |Â
show 3 more comments

up vote
11
down vote

favorite

I have a df like this:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

> df
 FOO
1 A|B|C
2 A|B
3 B|C
4 A
5 C

And I would like to have an output like this:

> df
 X1 X2 X3
1 A B C
2 A B
3 B C
4 A
5 C

So far I tried with this example: Split column at delimiter in data frame but it is not spliting the columns without repeating values, what I get there is:

df <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

> df
 X1 X2 X3
1 A B C
2 A B A
3 B C B
4 A A A
5 C C C

And I also get this warning:

Warning message: In rbind(c("A", "B", "C"), c("A", "B"), c("B", "C"),
"A", "C") : number of columns of result is not a multiple of vector
length (arg 2)

What can I do in those cases? Preferably with baseR

edited Aug 16 at 8:20

asked Aug 16 at 7:58

Biocrazy

14312

Would the dataframe have values A, B, C... always ? I mean how do we decide which column a particular value should go?
â€“Â Ronak Shah
Aug 16 at 8:08

No, values might change but names remain the same. For example: row1 = A|B|C and row22 = B|D|F.
â€“Â Biocrazy
Aug 16 at 8:12

So if you have only these two rows how many columns would you have? 6 (A, B, C, D, E, F) or 5 (A, B, C, D, F) ?
â€“Â Ronak Shah
Aug 16 at 8:14

At the begining (in df) I have only one column with a changing number of rows. What it is inside the rows (A|B|C) always changes. Sometimes the max length can be 7 (A|B|C|D|E|F|G), others 1 (A), others 2 (A|C), and so on.
â€“Â Biocrazy
Aug 16 at 8:18

1

One last thing I want to tell you: When you look at df you see your column is of type factor. That's often a very unwanted thing. So you could add ,stringsAsFactors=F into the data.frame definition.
â€“Â Andre Elrico
Aug 16 at 8:54

Â |Â
show 3 more comments

up vote
11
down vote

favorite

I have a df like this:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

> df
 FOO
1 A|B|C
2 A|B
3 B|C
4 A
5 C

And I would like to have an output like this:

> df
 X1 X2 X3
1 A B C
2 A B
3 B C
4 A
5 C

So far I tried with this example: Split column at delimiter in data frame but it is not spliting the columns without repeating values, what I get there is:

df <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

> df
 X1 X2 X3
1 A B C
2 A B A
3 B C B
4 A A A
5 C C C

And I also get this warning:

Warning message: In rbind(c("A", "B", "C"), c("A", "B"), c("B", "C"),
"A", "C") : number of columns of result is not a multiple of vector
length (arg 2)

What can I do in those cases? Preferably with baseR

edited Aug 16 at 8:20

asked Aug 16 at 7:58

Biocrazy

14312

I have a df like this:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

> df
 FOO
1 A|B|C
2 A|B
3 B|C
4 A
5 C

And I would like to have an output like this:

> df
 X1 X2 X3
1 A B C
2 A B
3 B C
4 A
5 C

So far I tried with this example: Split column at delimiter in data frame but it is not spliting the columns without repeating values, what I get there is:

df <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

> df
 X1 X2 X3
1 A B C
2 A B A
3 B C B
4 A A A
5 C C C

And I also get this warning:

Warning message: In rbind(c("A", "B", "C"), c("A", "B"), c("B", "C"),
"A", "C") : number of columns of result is not a multiple of vector
length (arg 2)

What can I do in those cases? Preferably with baseR

edited Aug 16 at 8:20

asked Aug 16 at 7:58

Biocrazy

14312

edited Aug 16 at 8:20

asked Aug 16 at 7:58

Biocrazy

14312

asked Aug 16 at 7:58

Biocrazy

14312

asked Aug 16 at 7:58

Biocrazy

14312

Would the dataframe have values A, B, C... always ? I mean how do we decide which column a particular value should go?
â€“Â Ronak Shah
Aug 16 at 8:08

No, values might change but names remain the same. For example: row1 = A|B|C and row22 = B|D|F.
â€“Â Biocrazy
Aug 16 at 8:12

So if you have only these two rows how many columns would you have? 6 (A, B, C, D, E, F) or 5 (A, B, C, D, F) ?
â€“Â Ronak Shah
Aug 16 at 8:14

At the begining (in df) I have only one column with a changing number of rows. What it is inside the rows (A|B|C) always changes. Sometimes the max length can be 7 (A|B|C|D|E|F|G), others 1 (A), others 2 (A|C), and so on.
â€“Â Biocrazy
Aug 16 at 8:18

1

One last thing I want to tell you: When you look at df you see your column is of type factor. That's often a very unwanted thing. So you could add ,stringsAsFactors=F into the data.frame definition.
â€“Â Andre Elrico
Aug 16 at 8:54

Â |Â
show 3 more comments

Would the dataframe have values A, B, C... always ? I mean how do we decide which column a particular value should go?
â€“Â Ronak Shah
Aug 16 at 8:08

No, values might change but names remain the same. For example: row1 = A|B|C and row22 = B|D|F.
â€“Â Biocrazy
Aug 16 at 8:12

So if you have only these two rows how many columns would you have? 6 (A, B, C, D, E, F) or 5 (A, B, C, D, F) ?
â€“Â Ronak Shah
Aug 16 at 8:14

At the begining (in df) I have only one column with a changing number of rows. What it is inside the rows (A|B|C) always changes. Sometimes the max length can be 7 (A|B|C|D|E|F|G), others 1 (A), others 2 (A|C), and so on.
â€“Â Biocrazy
Aug 16 at 8:18

1

One last thing I want to tell you: When you look at df you see your column is of type factor. That's often a very unwanted thing. So you could add ,stringsAsFactors=F into the data.frame definition.
â€“Â Andre Elrico
Aug 16 at 8:54

Would the dataframe have values A, B, C... always ? I mean how do we decide which column a particular value should go?
â€“Â Ronak Shah
Aug 16 at 8:08

No, values might change but names remain the same. For example: row1 = A|B|C and row22 = B|D|F.
â€“Â Biocrazy
Aug 16 at 8:12

So if you have only these two rows how many columns would you have? 6 (A, B, C, D, E, F) or 5 (A, B, C, D, F) ?
â€“Â Ronak Shah
Aug 16 at 8:14

At the begining (in df) I have only one column with a changing number of rows. What it is inside the rows (A|B|C) always changes. Sometimes the max length can be 7 (A|B|C|D|E|F|G), others 1 (A), others 2 (A|C), and so on.
â€“Â Biocrazy
Aug 16 at 8:18

One last thing I want to tell you: When you look at df you see your column is of type factor. That's often a very unwanted thing. So you could add ,stringsAsFactors=F into the data.frame definition.
â€“Â Andre Elrico
Aug 16 at 8:54

Â |Â
show 3 more comments

5 Answers
5

active

oldest

votes

up vote
7
down vote

accepted

Simply do:

splt <- strsplit(as.character(df$FOO),"\|")
all_val <- sort(unique(unlist(splt)))
t(sapply(splt,function(x)all_val[!(all_val %in% x)]<-NA;all_val))


# [,1] [,2] [,3]
#[1,] "A" "B" "C" 
#[2,] "A" "B" NA 
#[3,] NA "B" "C" 
#[4,] "A" NA NA 
#[5,] NA NA "C"

data:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

Please note:

My version is base:: (no libraries needed) and general:

It would also work with:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))

edited Aug 16 at 8:38

answered Aug 16 at 8:32

Andre Elrico

2,758723

1

Short, general and using base::... It works great! Thank you.
â€“Â Biocrazy
Aug 16 at 8:42

3

It was a well-posed question and an interesting problem!
â€“Â Andre Elrico
Aug 16 at 8:44

1

Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
â€“Â Andre Elrico
Aug 16 at 8:47

add a commentÂ |Â

up vote
5
down vote

Overlooked that OP asked for a base R solution. Please try @AndreElrico's, @r.user.05apr's or @milan's solutions.

This can be done with cSplit_e from the splitstackshape package:

library(splitstackshape)
cSplit_e(
 data = df,
 split.col = "FOO",
 sep = "|",
 mode = "value",
 type = "character",
 fill = " ",
 drop = TRUE
)
# FOO_A FOO_B FOO_C
#1 A B C
#2 A B 
#3 B C
#4 A 
#5 C

Does also work in case of the following df (see OP's comment above).

(df1 <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')))
# FOO
#1 A|B|C
#2 A|B
#3 B|C
#4 A
#5 C
#6 B|D|F

cSplit_e(df1, "FOO", "|", "value", "character", TRUE, fill = " ")
# FOO_A FOO_B FOO_C FOO_D FOO_F
#1 A B C 
#2 A B 
#3 B C 
#4 A 
#5 C 
#6 B D F

edited Aug 16 at 17:46

answered Aug 16 at 8:22

markus

4,971626

add a commentÂ |Â

up vote
4
down vote

In base R:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

dummy <- strsplit(as.character(df$FOO), "[|]")
want <- data.frame(values = unlist(dummy),
 ids = rep(1:length(dummy), unlist(lapply(dummy, length))), 
 stringsAsFactors = FALSE)

library(reshape2)
want <- dcast(want, ids ~ values, value.var = "values", fill = " ")[, -1] # first col removed
names(want) <- paste0("X", seq_along(unique(unlist(dummy)))) 
want
# X1 X2 X3
#1 A B C
#2 A B 
#3 B C
#4 A 
#5 C

edited Aug 16 at 12:34

answered Aug 16 at 8:24

r.user.05apr

1,5231619

Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
â€“Â Biocrazy
Aug 16 at 8:28

Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
â€“Â Biocrazy
Aug 16 at 8:30

add a commentÂ |Â

up vote
2
down vote

Use unique and strsplit to find all unique values (A, B and C in this case). Use grep to search for the unique values, and return the values when there's a match or character(0) otherwise. cbind the resulting characters. Use apply and ifelse to replace character(0) with NA.

vals <- unique(unlist(sapply(a1, function(x) strsplit(x, '|', fixed = T))))

out <- NULL
for(i in vals)
 out <- cbind(out, as.character((lapply(df$FOO, function(x) grep(x, i, value=T)))))


apply(out, 2, function(x) ifelse(x=="character(0)", NA, x))

 [,1] [,2] [,3]
[1,] "A" "B" "C" 
[2,] "A" "B" NA 
[3,] NA "B" "C" 
[4,] "A" NA NA 
[5,] NA NA "C"

edited Aug 17 at 5:59

answered Aug 16 at 8:38

milan

2,1441721

Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
â€“Â Biocrazy
Aug 16 at 9:01

Changed it now so that you can easily apply it to many values.
â€“Â milan
Aug 16 at 9:06

add a commentÂ |Â

up vote
0
down vote

You can try a tidyverse as well

library(tidyverse)
df %>%
 rownames_to_column() %>% 
 separate_rows(FOO, sep="[|]") %>% 
 mutate(L=factor(FOO, labels = paste0("X",1:length(unique(FOO))))) %>% 
 spread(L, FOO) %>% 
 select(-1)
 X1 X2 X3
1 A B C
2 A B <NA>
3 <NA> B C
4 A <NA> <NA>
5 <NA> <NA> C

It is also generally working e.g. df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')). In addition you can set the levels e.g. B>C>A by yourself using levels = c("B", "C", "A") in the factor function in the mutate step.

edited Aug 16 at 10:18

answered Aug 16 at 9:59

Jimbou

8,58311028

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f51872267%2fsplit-columns-in-dataframe-with-na%23new-answer', 'question_page');

);

Post as a guest

Name

5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

up vote
7
down vote

accepted

Simply do:

splt <- strsplit(as.character(df$FOO),"\|")
all_val <- sort(unique(unlist(splt)))
t(sapply(splt,function(x)all_val[!(all_val %in% x)]<-NA;all_val))


# [,1] [,2] [,3]
#[1,] "A" "B" "C" 
#[2,] "A" "B" NA 
#[3,] NA "B" "C" 
#[4,] "A" NA NA 
#[5,] NA NA "C"

data:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

Please note:

My version is base:: (no libraries needed) and general:

It would also work with:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))

edited Aug 16 at 8:38

answered Aug 16 at 8:32

Andre Elrico

2,758723

1

Short, general and using base::... It works great! Thank you.
â€“Â Biocrazy
Aug 16 at 8:42

3

It was a well-posed question and an interesting problem!
â€“Â Andre Elrico
Aug 16 at 8:44

1

Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
â€“Â Andre Elrico
Aug 16 at 8:47

add a commentÂ |Â

up vote
7
down vote

accepted

Simply do:

splt <- strsplit(as.character(df$FOO),"\|")
all_val <- sort(unique(unlist(splt)))
t(sapply(splt,function(x)all_val[!(all_val %in% x)]<-NA;all_val))


# [,1] [,2] [,3]
#[1,] "A" "B" "C" 
#[2,] "A" "B" NA 
#[3,] NA "B" "C" 
#[4,] "A" NA NA 
#[5,] NA NA "C"

data:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

Please note:

My version is base:: (no libraries needed) and general:

It would also work with:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))

edited Aug 16 at 8:38

answered Aug 16 at 8:32

Andre Elrico

2,758723

1

Short, general and using base::... It works great! Thank you.
â€“Â Biocrazy
Aug 16 at 8:42

3

It was a well-posed question and an interesting problem!
â€“Â Andre Elrico
Aug 16 at 8:44

1

Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
â€“Â Andre Elrico
Aug 16 at 8:47

add a commentÂ |Â

up vote
7
down vote

accepted

Simply do:

splt <- strsplit(as.character(df$FOO),"\|")
all_val <- sort(unique(unlist(splt)))
t(sapply(splt,function(x)all_val[!(all_val %in% x)]<-NA;all_val))


# [,1] [,2] [,3]
#[1,] "A" "B" "C" 
#[2,] "A" "B" NA 
#[3,] NA "B" "C" 
#[4,] "A" NA NA 
#[5,] NA NA "C"

data:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

Please note:

My version is base:: (no libraries needed) and general:

It would also work with:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))

edited Aug 16 at 8:38

answered Aug 16 at 8:32

Andre Elrico

2,758723

Simply do:

splt <- strsplit(as.character(df$FOO),"\|")
all_val <- sort(unique(unlist(splt)))
t(sapply(splt,function(x)all_val[!(all_val %in% x)]<-NA;all_val))


# [,1] [,2] [,3]
#[1,] "A" "B" "C" 
#[2,] "A" "B" NA 
#[3,] NA "B" "C" 
#[4,] "A" NA NA 
#[5,] NA NA "C"

data:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

Please note:

My version is base:: (no libraries needed) and general:

It would also work with:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F'))

edited Aug 16 at 8:38

answered Aug 16 at 8:32

Andre Elrico

2,758723

edited Aug 16 at 8:38

answered Aug 16 at 8:32

Andre Elrico

2,758723

answered Aug 16 at 8:32

Andre Elrico

2,758723

answered Aug 16 at 8:32

Andre Elrico

2,758723

1

Short, general and using base::... It works great! Thank you.
â€“Â Biocrazy
Aug 16 at 8:42

3

It was a well-posed question and an interesting problem!
â€“Â Andre Elrico
Aug 16 at 8:44

1

Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
â€“Â Andre Elrico
Aug 16 at 8:47

add a commentÂ |Â

1

Short, general and using base::... It works great! Thank you.
â€“Â Biocrazy
Aug 16 at 8:42

3

It was a well-posed question and an interesting problem!
â€“Â Andre Elrico
Aug 16 at 8:44

1

Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
â€“Â Andre Elrico
Aug 16 at 8:47

Short, general and using base::... It works great! Thank you.
â€“Â Biocrazy
Aug 16 at 8:42

It was a well-posed question and an interesting problem!
â€“Â Andre Elrico
Aug 16 at 8:44

Please note the result is a matrix if that is problematic for you, just wrap as.data.frame() around it.
â€“Â Andre Elrico
Aug 16 at 8:47

add a commentÂ |Â

up vote
5
down vote

Overlooked that OP asked for a base R solution. Please try @AndreElrico's, @r.user.05apr's or @milan's solutions.

This can be done with cSplit_e from the splitstackshape package:

library(splitstackshape)
cSplit_e(
 data = df,
 split.col = "FOO",
 sep = "|",
 mode = "value",
 type = "character",
 fill = " ",
 drop = TRUE
)
# FOO_A FOO_B FOO_C
#1 A B C
#2 A B 
#3 B C
#4 A 
#5 C

Does also work in case of the following df (see OP's comment above).

(df1 <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')))
# FOO
#1 A|B|C
#2 A|B
#3 B|C
#4 A
#5 C
#6 B|D|F

cSplit_e(df1, "FOO", "|", "value", "character", TRUE, fill = " ")
# FOO_A FOO_B FOO_C FOO_D FOO_F
#1 A B C 
#2 A B 
#3 B C 
#4 A 
#5 C 
#6 B D F

edited Aug 16 at 17:46

answered Aug 16 at 8:22

markus

4,971626

add a commentÂ |Â

up vote
5
down vote

Overlooked that OP asked for a base R solution. Please try @AndreElrico's, @r.user.05apr's or @milan's solutions.

This can be done with cSplit_e from the splitstackshape package:

library(splitstackshape)
cSplit_e(
 data = df,
 split.col = "FOO",
 sep = "|",
 mode = "value",
 type = "character",
 fill = " ",
 drop = TRUE
)
# FOO_A FOO_B FOO_C
#1 A B C
#2 A B 
#3 B C
#4 A 
#5 C

Does also work in case of the following df (see OP's comment above).

(df1 <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')))
# FOO
#1 A|B|C
#2 A|B
#3 B|C
#4 A
#5 C
#6 B|D|F

cSplit_e(df1, "FOO", "|", "value", "character", TRUE, fill = " ")
# FOO_A FOO_B FOO_C FOO_D FOO_F
#1 A B C 
#2 A B 
#3 B C 
#4 A 
#5 C 
#6 B D F

edited Aug 16 at 17:46

answered Aug 16 at 8:22

markus

4,971626

add a commentÂ |Â

up vote
5
down vote

Overlooked that OP asked for a base R solution. Please try @AndreElrico's, @r.user.05apr's or @milan's solutions.

This can be done with cSplit_e from the splitstackshape package:

library(splitstackshape)
cSplit_e(
 data = df,
 split.col = "FOO",
 sep = "|",
 mode = "value",
 type = "character",
 fill = " ",
 drop = TRUE
)
# FOO_A FOO_B FOO_C
#1 A B C
#2 A B 
#3 B C
#4 A 
#5 C

Does also work in case of the following df (see OP's comment above).

(df1 <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')))
# FOO
#1 A|B|C
#2 A|B
#3 B|C
#4 A
#5 C
#6 B|D|F

cSplit_e(df1, "FOO", "|", "value", "character", TRUE, fill = " ")
# FOO_A FOO_B FOO_C FOO_D FOO_F
#1 A B C 
#2 A B 
#3 B C 
#4 A 
#5 C 
#6 B D F

edited Aug 16 at 17:46

answered Aug 16 at 8:22

markus

4,971626

Overlooked that OP asked for a base R solution. Please try @AndreElrico's, @r.user.05apr's or @milan's solutions.

This can be done with cSplit_e from the splitstackshape package:

library(splitstackshape)
cSplit_e(
 data = df,
 split.col = "FOO",
 sep = "|",
 mode = "value",
 type = "character",
 fill = " ",
 drop = TRUE
)
# FOO_A FOO_B FOO_C
#1 A B C
#2 A B 
#3 B C
#4 A 
#5 C

Does also work in case of the following df (see OP's comment above).

(df1 <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C', 'B|D|F')))
# FOO
#1 A|B|C
#2 A|B
#3 B|C
#4 A
#5 C
#6 B|D|F

cSplit_e(df1, "FOO", "|", "value", "character", TRUE, fill = " ")
# FOO_A FOO_B FOO_C FOO_D FOO_F
#1 A B C 
#2 A B 
#3 B C 
#4 A 
#5 C 
#6 B D F

edited Aug 16 at 17:46

answered Aug 16 at 8:22

markus

4,971626

edited Aug 16 at 17:46

answered Aug 16 at 8:22

markus

4,971626

answered Aug 16 at 8:22

markus

4,971626

answered Aug 16 at 8:22

markus

4,971626

add a commentÂ |Â

up vote
4
down vote

In base R:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

dummy <- strsplit(as.character(df$FOO), "[|]")
want <- data.frame(values = unlist(dummy),
 ids = rep(1:length(dummy), unlist(lapply(dummy, length))), 
 stringsAsFactors = FALSE)

library(reshape2)
want <- dcast(want, ids ~ values, value.var = "values", fill = " ")[, -1] # first col removed
names(want) <- paste0("X", seq_along(unique(unlist(dummy)))) 
want
# X1 X2 X3
#1 A B C
#2 A B 
#3 B C
#4 A 
#5 C

edited Aug 16 at 12:34

answered Aug 16 at 8:24

r.user.05apr

1,5231619

Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
â€“Â Biocrazy
Aug 16 at 8:28

Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
â€“Â Biocrazy
Aug 16 at 8:30

add a commentÂ |Â

up vote
4
down vote

In base R:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

dummy <- strsplit(as.character(df$FOO), "[|]")
want <- data.frame(values = unlist(dummy),
 ids = rep(1:length(dummy), unlist(lapply(dummy, length))), 
 stringsAsFactors = FALSE)

library(reshape2)
want <- dcast(want, ids ~ values, value.var = "values", fill = " ")[, -1] # first col removed
names(want) <- paste0("X", seq_along(unique(unlist(dummy)))) 
want
# X1 X2 X3
#1 A B C
#2 A B 
#3 B C
#4 A 
#5 C

edited Aug 16 at 12:34

answered Aug 16 at 8:24

r.user.05apr

1,5231619

Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
â€“Â Biocrazy
Aug 16 at 8:28

Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
â€“Â Biocrazy
Aug 16 at 8:30

add a commentÂ |Â

up vote
4
down vote

In base R:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

dummy <- strsplit(as.character(df$FOO), "[|]")
want <- data.frame(values = unlist(dummy),
 ids = rep(1:length(dummy), unlist(lapply(dummy, length))), 
 stringsAsFactors = FALSE)

library(reshape2)
want <- dcast(want, ids ~ values, value.var = "values", fill = " ")[, -1] # first col removed
names(want) <- paste0("X", seq_along(unique(unlist(dummy)))) 
want
# X1 X2 X3
#1 A B C
#2 A B 
#3 B C
#4 A 
#5 C

edited Aug 16 at 12:34

answered Aug 16 at 8:24

r.user.05apr

1,5231619

In base R:

df <- data.frame(FOO = c('A|B|C', 'A|B', 'B|C', 'A', 'C'))

dummy <- strsplit(as.character(df$FOO), "[|]")
want <- data.frame(values = unlist(dummy),
 ids = rep(1:length(dummy), unlist(lapply(dummy, length))), 
 stringsAsFactors = FALSE)

library(reshape2)
want <- dcast(want, ids ~ values, value.var = "values", fill = " ")[, -1] # first col removed
names(want) <- paste0("X", seq_along(unique(unlist(dummy)))) 
want
# X1 X2 X3
#1 A B C
#2 A B 
#3 B C
#4 A 
#5 C

edited Aug 16 at 12:34

answered Aug 16 at 8:24

r.user.05apr

1,5231619

edited Aug 16 at 12:34

answered Aug 16 at 8:24

r.user.05apr

1,5231619

answered Aug 16 at 8:24

r.user.05apr

1,5231619

answered Aug 16 at 8:24

r.user.05apr

1,5231619

Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
â€“Â Biocrazy
Aug 16 at 8:28

Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
â€“Â Biocrazy
Aug 16 at 8:30

add a commentÂ |Â

Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
â€“Â Biocrazy
Aug 16 at 8:28

Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
â€“Â Biocrazy
Aug 16 at 8:30

Thanks for your help, but why does this retrieves the values as rows? Is there a way to keep the columns of the df?
â€“Â Biocrazy
Aug 16 at 8:28

Sorry! I just saw you used a library to "reshape" the columns. I thought they were different examples :-P
â€“Â Biocrazy
Aug 16 at 8:30

add a commentÂ |Â

up vote
2
down vote

vals <- unique(unlist(sapply(a1, function(x) strsplit(x, '|', fixed = T))))

out <- NULL
for(i in vals)
 out <- cbind(out, as.character((lapply(df$FOO, function(x) grep(x, i, value=T)))))


apply(out, 2, function(x) ifelse(x=="character(0)", NA, x))

 [,1] [,2] [,3]
[1,] "A" "B" "C" 
[2,] "A" "B" NA 
[3,] NA "B" "C" 
[4,] "A" NA NA 
[5,] NA NA "C"

edited Aug 17 at 5:59

answered Aug 16 at 8:38

milan

2,1441721

Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
â€“Â Biocrazy
Aug 16 at 9:01

Changed it now so that you can easily apply it to many values.
â€“Â milan
Aug 16 at 9:06

add a commentÂ |Â

up vote
2
down vote

vals <- unique(unlist(sapply(a1, function(x) strsplit(x, '|', fixed = T))))

out <- NULL
for(i in vals)
 out <- cbind(out, as.character((lapply(df$FOO, function(x) grep(x, i, value=T)))))


apply(out, 2, function(x) ifelse(x=="character(0)", NA, x))

 [,1] [,2] [,3]
[1,] "A" "B" "C" 
[2,] "A" "B" NA 
[3,] NA "B" "C" 
[4,] "A" NA NA 
[5,] NA NA "C"

edited Aug 17 at 5:59

answered Aug 16 at 8:38

milan

2,1441721

Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
â€“Â Biocrazy
Aug 16 at 9:01

Changed it now so that you can easily apply it to many values.
â€“Â milan
Aug 16 at 9:06

add a commentÂ |Â

up vote
2
down vote

vals <- unique(unlist(sapply(a1, function(x) strsplit(x, '|', fixed = T))))

out <- NULL
for(i in vals)
 out <- cbind(out, as.character((lapply(df$FOO, function(x) grep(x, i, value=T)))))


apply(out, 2, function(x) ifelse(x=="character(0)", NA, x))

 [,1] [,2] [,3]
[1,] "A" "B" "C" 
[2,] "A" "B" NA 
[3,] NA "B" "C" 
[4,] "A" NA NA 
[5,] NA NA "C"

edited Aug 17 at 5:59

answered Aug 16 at 8:38

milan

2,1441721

vals <- unique(unlist(sapply(a1, function(x) strsplit(x, '|', fixed = T))))

out <- NULL
for(i in vals)
 out <- cbind(out, as.character((lapply(df$FOO, function(x) grep(x, i, value=T)))))


apply(out, 2, function(x) ifelse(x=="character(0)", NA, x))

 [,1] [,2] [,3]
[1,] "A" "B" "C" 
[2,] "A" "B" NA 
[3,] NA "B" "C" 
[4,] "A" NA NA 
[5,] NA NA "C"

edited Aug 17 at 5:59

answered Aug 16 at 8:38

milan

2,1441721

edited Aug 17 at 5:59

answered Aug 16 at 8:38

milan

2,1441721

answered Aug 16 at 8:38

milan

2,1441721

answered Aug 16 at 8:38

milan

2,1441721

Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
â€“Â Biocrazy
Aug 16 at 9:01

Changed it now so that you can easily apply it to many values.
â€“Â milan
Aug 16 at 9:06

add a commentÂ |Â

Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
â€“Â Biocrazy
Aug 16 at 9:01

Changed it now so that you can easily apply it to many values.
â€“Â milan
Aug 16 at 9:06

Let's say I have 80 letters (A, B, C, D...) then do I have to use grep for each one of the cases?
â€“Â Biocrazy
Aug 16 at 9:01

Changed it now so that you can easily apply it to many values.
â€“Â milan
Aug 16 at 9:06

add a commentÂ |Â

up vote
0
down vote

You can try a tidyverse as well

library(tidyverse)
df %>%
 rownames_to_column() %>% 
 separate_rows(FOO, sep="[|]") %>% 
 mutate(L=factor(FOO, labels = paste0("X",1:length(unique(FOO))))) %>% 
 spread(L, FOO) %>% 
 select(-1)
 X1 X2 X3
1 A B C
2 A B <NA>
3 <NA> B C
4 A <NA> <NA>
5 <NA> <NA> C

edited Aug 16 at 10:18

answered Aug 16 at 9:59

Jimbou

8,58311028

add a commentÂ |Â

up vote
0
down vote

You can try a tidyverse as well

library(tidyverse)
df %>%
 rownames_to_column() %>% 
 separate_rows(FOO, sep="[|]") %>% 
 mutate(L=factor(FOO, labels = paste0("X",1:length(unique(FOO))))) %>% 
 spread(L, FOO) %>% 
 select(-1)
 X1 X2 X3
1 A B C
2 A B <NA>
3 <NA> B C
4 A <NA> <NA>
5 <NA> <NA> C

edited Aug 16 at 10:18

answered Aug 16 at 9:59

Jimbou

8,58311028

add a commentÂ |Â

up vote
0
down vote

You can try a tidyverse as well

library(tidyverse)
df %>%
 rownames_to_column() %>% 
 separate_rows(FOO, sep="[|]") %>% 
 mutate(L=factor(FOO, labels = paste0("X",1:length(unique(FOO))))) %>% 
 spread(L, FOO) %>% 
 select(-1)
 X1 X2 X3
1 A B C
2 A B <NA>
3 <NA> B C
4 A <NA> <NA>
5 <NA> <NA> C

edited Aug 16 at 10:18

answered Aug 16 at 9:59

Jimbou

8,58311028

You can try a tidyverse as well

library(tidyverse)
df %>%
 rownames_to_column() %>% 
 separate_rows(FOO, sep="[|]") %>% 
 mutate(L=factor(FOO, labels = paste0("X",1:length(unique(FOO))))) %>% 
 spread(L, FOO) %>% 
 select(-1)
 X1 X2 X3
1 A B C
2 A B <NA>
3 <NA> B C
4 A <NA> <NA>
5 <NA> <NA> C

edited Aug 16 at 10:18

answered Aug 16 at 9:59

Jimbou

8,58311028

edited Aug 16 at 10:18

answered Aug 16 at 9:59

Jimbou

8,58311028

answered Aug 16 at 9:59

Jimbou

8,58311028

answered Aug 16 at 9:59

Jimbou

8,58311028

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

Vtyjkyuk