Wednesday, 14 August 2013

select-from-where procedure using R

I have two data frames. Here's the first:
df <- data.frame(p=letters[1:3],y1=c(2,4,3))
df
  p y1
1 a  2
2 b  4
3 c  3
and the second one:
df2 <- data.frame(p=rep(letters[1:3],c(3,2,4)),y2=c(3,1,1,4,3,4,3,3,1),d=rep(1,length=9))
df2
  p y2 d
1 a  3 1
2 a  1 1
3 a  1 1
4 b  4 1
5 b  3 1
6 c  4 1
7 c  3 1
8 c  3 1
9 c  1 1
What I want to do is get those rows in df2 where, for each value of
p (a, b, c, etc.) with d=1 (which in this case is all rows), y2 is greater
than the y1 listed for the same p in df.
Because this explanation probably doesn't make sense, here are the two rows
that need to be kicked: row 1 in df2, because for a, y2=3 is greater than
y1=2 in df, and row 6, because for c in df2, y2 has value 4, but the value
of y1 for c in df is 3.
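To pin down which rows are meant, here is a minimal base-R sketch of the condition. It assumes each value of p appears exactly once in df; the names y1_for_row and flagged are made up for this illustration only.

```r
df  <- data.frame(p = letters[1:3], y1 = c(2, 4, 3))
df2 <- data.frame(p = rep(letters[1:3], c(3, 2, 4)),
                  y2 = c(3, 1, 1, 4, 3, 4, 3, 3, 1),
                  d = rep(1, 9))

# Look up the y1 of each row's group by matching p against df
# (assumes p is unique in df).
y1_for_row <- df$y1[match(df2$p, df$p)]

# TRUE where d is 1 and y2 exceeds the group's y1.
flagged <- df2$d == 1 & df2$y2 > y1_for_row

df2[flagged, ]   # the rows in question (rows 1 and 6 here)
```

If "kicked" means the rows should be dropped rather than selected, df2[!flagged, ] would give the complement.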
Since I'm working with data.tables, a "data.table-solution" would be nice,
maybe something like:
setkey(df2,d)
df2[1,y>??,by="p"]
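One possible data.table approach, offered as a sketch rather than the definitive answer, is to key both tables on p, join them so that every row of the second table carries its group's y1, and then filter. The names dt, dt2, joined and result are assumptions introduced here, and the sketch assumes p is unique in the first table.

```r
library(data.table)

dt  <- data.table(p = letters[1:3], y1 = c(2, 4, 3))
dt2 <- data.table(p = rep(letters[1:3], c(3, 2, 4)),
                  y2 = c(3, 1, 1, 4, 3, 4, 3, 3, 1),
                  d = rep(1, 9))

# Key both tables on the grouping column so they can be joined on it.
setkey(dt, p)
setkey(dt2, p)

# For every row of dt2, look up the matching row of dt by p.
# The result has the columns p, y1, y2 and d.
joined <- dt[dt2]

# Keep the rows where d is 1 and y2 exceeds the group's y1.
result <- joined[d == 1 & y2 > y1]
result
```

Keying on p instead of d makes the join column explicit; the d == 1 condition then becomes an ordinary filter. Negating the filter, joined[!(d == 1 & y2 > y1)], would return the remaining rows instead.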
