R create recursive variable by group in data.table
I have a data.table like this (except I have many more observations):
name id time start rate payment
Anna 100 2000-01-01 100 4 15
Anna 100 2000-02-01 100 4 20
Anna 100 2000-03-01 100 4 25
Jenny 250 2008-01-01 200 5 10
Jenny 250 2008-02-01 200 5 20
Jenny 250 2008-03-01 200 5 30
Jenny 250 2008-04-01 200 5 35
I would like to create a new variable, called for example new_var, by group (name, id). It would equal the start variable for the first observation in each (name, id) group, and thereafter its own previous value multiplied by (1 + rate) minus payment. That is, for name = Anna and id = 100, new_var[1] = 100, new_var[2] = 100*(1+4)-20 = 480 and new_var[3] = 480*(1+4)-25 = 2375, where 480 is the value of new_var[2]. The whole data.table with this new variable would therefore look like this:
name id time start rate payment new_var
Anna 100 2000-01-01 100 4 15 100
Anna 100 2000-02-01 100 4 20 480
Anna 100 2000-03-01 100 4 25 2375
Jenny 250 2008-01-01 200 5 10 200
Jenny 250 2008-02-01 200 5 20 1180
Jenny 250 2008-03-01 200 5 30 7050
Jenny 250 2008-04-01 200 5 35 42265
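For reference, the rule described above can be written as a recurrence within each (name, id) group. The symbol v_t (standing for new_var at row t of a group with n rows) is notation introduced here for illustration, not something from the original post:

```latex
\begin{aligned}
v_1 &= \text{start}_1, \\
v_t &= v_{t-1}\,(1 + \text{rate}_t) - \text{payment}_t, \qquad t = 2, \dots, n .
\end{aligned}
```

Because each value depends on the one before it, the column cannot be computed by a single element-wise vector operation; some form of sequential accumulation is needed.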
Is it possible to achieve this somehow, preferably without a loop?
r group-by data.table
asked 19 hours ago by doremi (edited 19 hours ago by RLave)
Is name == id for all obs? If so, you can group by just one of the two. – RLave, 19 hours ago
2 Answers
I don't know how to avoid a loop, but you can use one inside data.table, and I think it will be efficient anyway:
library(data.table)
### DT re-created with the following code
DT <- data.table(
  name = c("Anna","Anna","Anna","Jenny","Jenny","Jenny","Jenny"),
  id = c(100L,100L,100L,250L,250L,250L,250L),
  time = as.Date(c("2000-01-01","2000-02-01","2000-03-01","2008-01-01","2008-02-01",
                   "2008-03-01","2008-04-01")),
  start = c(100,100,100,200,200,200,200),
  rate = c(4,4,4,5,5,5,5),
  payment = c(15,20,25,10,20,30,35))
###
# For one group: v[1] = start[1], then v[i] = v[i-1] * (1 + rate[i]) - payment[i]
computeNewVar <- function(subDT) {
  v <- subDT$start
  if (nrow(subDT) > 1) {
    for (i in 2:nrow(subDT)) {
      v[i] <- v[i-1] * (1 + subDT$rate[i]) - subDT$payment[i]
    }
  }
  v
}
DT[, new_var := computeNewVar(.SD), by = .(name, id)]
Result:
> DT
name id time start rate payment new_var
1: Anna 100 2000-01-01 100 4 15 100
2: Anna 100 2000-02-01 100 4 20 480
3: Anna 100 2000-03-01 100 4 25 2375
4: Jenny 250 2008-01-01 200 5 10 200
5: Jenny 250 2008-02-01 200 5 20 1180
6: Jenny 250 2008-03-01 200 5 30 7050
7: Jenny 250 2008-04-01 200 5 35 42265
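Since the question asks for a form without an explicit loop, one possible alternative is base R's Reduce() with accumulate = TRUE. The following is only a sketch: the helper name recur_reduce is hypothetical (it does not come from this answer), and Reduce() still iterates internally, so this is mostly a matter of style rather than speed.

```r
# Hypothetical helper (not from the original answer): the same recurrence
# expressed with Reduce() instead of an explicit for loop.
recur_reduce <- function(start, rate, payment) {
  n <- length(start)
  if (n < 2L) return(start)
  # One step of the recurrence: previous value and index of the current row.
  step <- function(prev, i) prev * (1 + rate[i]) - payment[i]
  # init supplies the first value; accumulate = TRUE keeps every intermediate.
  Reduce(step, seq_len(n)[-1L], accumulate = TRUE, init = start[1L])
}

# Anna's group from the question.
anna <- recur_reduce(c(100, 100, 100), c(4, 4, 4), c(15, 20, 25))
print(anna)  # 100 480 2375
```

Assuming the DT defined above, it could be applied per group as DT[, new_var := recur_reduce(start, rate, payment), by = .(name, id)].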
answered 19 hours ago (edited 19 hours ago) by digEmAll
I'm a bit rusty with numerical approaches, but for some variety, here is an iterative one:
> aTbl[, start := as.numeric(start)]
> aTbl[, end := start]
> aTbl[, rowid := rowid(name, id)]
> aTbl
name id time start rate payment end rowid
1: Anna 100 2000-01-01 100 4 15 100 1
2: Anna 100 2000-02-01 100 4 20 100 2
3: Anna 100 2000-03-01 100 4 25 100 3
4: Jenny 250 2008-01-01 200 5 10 200 1
5: Jenny 250 2008-02-01 200 5 20 200 2
6: Jenny 250 2008-03-01 200 5 30 200 3
7: Jenny 250 2008-04-01 200 5 35 200 4
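> for (i in c(1:250)) {
    # previous row's current estimate; NA on the first row of each group
    aTbl[, endPrev := shift(end)]
    aTbl[rowid == 1, endPrev := NA]
    aTbl[, endNew := endPrev * (1 + rate) - payment]
    # move end 10% of the way toward the value implied by the previous row
    aTbl[, end := end + .1 * (endNew - end)]
    # the first row of each group has no previous value, so reset it to start
    aTbl[is.na(end), end := start]
    aTbl
  }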
> aTbl[, endNew := NULL]
> aTbl[, endPrev := NULL]
> setnames(aTbl, 'end', 'new_var')
> aTbl[, rowid := NULL]
> aTbl
name id time start rate payment new_var
1: Anna 100 2000-01-01 100 4 15 100
2: Anna 100 2000-02-01 100 4 20 480
3: Anna 100 2000-03-01 100 4 25 2375
4: Jenny 250 2008-01-01 200 5 10 200
5: Jenny 250 2008-02-01 200 5 20 1180
6: Jenny 250 2008-03-01 200 5 30 7050
7: Jenny 250 2008-04-01 200 5 35 42265
>
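To make the idea behind this answer easier to inspect, here is a minimal base R sketch of the same damped update on a single group, compared with the direct recurrence. The function names direct and relax, and the arguments n_iter and step, are hypothetical names introduced here; the values 250 and 0.1 mirror the loop above.

```r
# Reference: the direct recurrence for one group.
direct <- function(start, rate, payment) {
  v <- start
  for (i in seq_along(v)[-1L]) v[i] <- v[i - 1L] * (1 + rate[i]) - payment[i]
  v
}

# Damped iteration: repeatedly nudge every row toward the value implied by
# the current estimate of the row before it.
relax <- function(start, rate, payment, n_iter = 250L, step = 0.1) {
  end <- start
  for (k in seq_len(n_iter)) {
    prev <- c(NA_real_, end[-length(end)])  # previous row's current estimate
    target <- prev * (1 + rate) - payment   # value the recurrence implies
    end <- end + step * (target - end)      # move part of the way there
    end[1L] <- start[1L]                    # first row stays pinned to start
  }
  end
}

# Jenny's group from the question.
rate <- rep(5, 4)
payment <- c(10, 20, 30, 35)
exact <- direct(rep(200, 4), rate, payment)
approx <- relax(rep(200, 4), rate, payment)
print(all.equal(approx, exact, tolerance = 1e-6))
```

Note that the result is an approximation that converges toward the exact values rather than reproducing them bit for bit, and longer groups would likely need more iterations, because a correction has to propagate one row further down the group for each additional row.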
answered 7 hours ago (edited 2 hours ago) by Clayton Stanley