How to get the top n% and bottom n% of a data frame in R [duplicate]

Question

Here is my data:

dat <- read.table(text = "id    val1    val2    vt
1   14  12  19
2   13  13  12
3   12  12  13
4   12  13  13
5   12  14  22
6   12  12  14
7   12  13  14
8   12  14  12
9   13  13  14
10  13  14  14
11  14  14  14
12  13  14  17
13  13  14  31
14  13  13  14
15  13  14  13
16  13  14  23
                
", header = TRUE)

I want to get the top 25% and the bottom 45% of rows according to vt.

Here is the expected output for the top 25%:

id  val1    val2    vt
13  13  14  31
16  13  14  23
5   12  14  22
1   14  12  19

and the expected output for the bottom 45% is:

id  val1    val2    vt
7   12  13  14
9   13  13  14
10  13  14  14
11  14  14  14
14  13  13  14
3   12  12  13
4   12  13  13
15  13  14  13
2   13  13  12
8   12  14  12

I have tried subset() with quantile(), but it does not seem to work for the bottom n%. Is it possible to do this with dplyr? I have checked the other linked questions, and none of them cover the bottom n%. In addition, I do not want to compute this by group.

Please check the slice_max and slice_min functions from the tidyverse. — deschen, Commented Jan 22, 2021 at 14:25
I have edited the question. Please reopen it if at all possible. — user330, Commented Jan 22, 2021 at 15:16

Dunois · Accepted Answer · 2021-01-22 14:24:47Z

1

Use dplyr::slice_min() and dplyr::slice_max().

library(dplyr)
library(magrittr)

df <- read.table(text = "id    val1    val2    vt
1   14  12  19
2   13  13  12
3   12  12  13
4   12  13  13
5   12  14  22
6   12  12  14
7   12  13  14
8   12  14  12
9   13  13  14
10  13  14  14
11  14  14  14
12  13  14  17
13  13  14  31
14  13  13  14
15  13  14  13
16  13  14  23
                
", header = TRUE)

# Top 25% by vt: prop is rounded down to a whole number of rows (floor(0.25 * 16) = 4).
df %>% slice_max(order_by = vt, prop = 0.25)
#   id val1 val2 vt
# 1 13   13   14 31
# 2 16   13   14 23
# 3  5   12   14 22
# 4  1   14   12 19

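# Bottom 45% by vt: floor(0.45 * 16) = 7 rows, but with_ties = TRUE (the default)
# also keeps every row tied with the 7th smallest value (vt == 14), giving 11 rows.
df %>% slice_min(order_by = vt, prop = 0.45)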
#    id val1 val2 vt
# 1   2   13   13 12
# 2   8   12   14 12
# 3   3   12   12 13
# 4   4   12   13 13
# 5  15   13   14 13
# 6   6   12   12 14
# 7   7   12   13 14
# 8   9   13   13 14
# 9  10   13   14 14
# 10 11   14   14 14
# 11 14   13   13 14


7 Comments

user330 Over a year ago

If you have identical values, it does not properly slice the top and bottom n%.

Dunois Over a year ago

@user330 Could you elaborate?

user330 Over a year ago

For example, if 31 appears 5 times, you get 5 rows instead of 4 rows.

Dunois Over a year ago

So you don't want identical vt values to be treated as distinct (values)?

user330 Over a year ago

The identical values are not an issue. I want the code to behave correctly. For example, if we have 16 values, it needs to slice 4 at the top and 4 at the bottom.
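
The extra rows described in these comments come from the default with_ties = TRUE in slice_max() and slice_min(). Below is a minimal sketch, assuming dplyr 1.0.0 or later, with the question's data rebuilt through data.frame() and illustrative names (top25, bottom25, bottom45) that do not appear in the thread. Setting with_ties = FALSE caps each result at floor(prop * nrow) rows. Which of the tied rows survive then appears to depend on row order, so this may or may not be the desired behaviour.

```r
library(dplyr)

# Same 16 rows as in the question, rebuilt with data.frame() for brevity.
dat <- data.frame(
  id   = 1:16,
  val1 = c(14, 13, 12, 12, 12, 12, 12, 12, 13, 13, 14, 13, 13, 13, 13, 13),
  val2 = c(12, 13, 12, 13, 14, 12, 13, 14, 13, 14, 14, 14, 14, 13, 14, 14),
  vt   = c(19, 12, 13, 13, 22, 14, 14, 12, 14, 14, 14, 17, 31, 14, 13, 23)
)

# with_ties = FALSE drops the "keep everything tied with the last row" rule,
# so each call returns at most floor(prop * nrow(dat)) rows.
top25    <- dat %>% slice_max(order_by = vt, prop = 0.25, with_ties = FALSE)
bottom25 <- dat %>% slice_min(order_by = vt, prop = 0.25, with_ties = FALSE)
bottom45 <- dat %>% slice_min(order_by = vt, prop = 0.45, with_ties = FALSE)

nrow(top25)     # floor(0.25 * 16) rows
nrow(bottom25)  # floor(0.25 * 16) rows
nrow(bottom45)  # floor(0.45 * 16) rows
```

With with_ties = FALSE the bottom 45% is cut inside the run of vt == 14 rows, so some rows with that value are kept and others are not.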


ThomasIsCoding · Accepted Answer · 2021-01-22 14:48:07Z

Perhaps you can try findInterval() with quantile(): split the rows at the 45% and 75% quantiles of vt, then take the first and last groups, as below

res <- with(dat, split(dat, findInterval(vt, quantile(vt, c(.45, .75)), left.open = TRUE)))
res_45bottom <- head(res, 1)[[1]]
res_25top <- tail(res, 1)[[1]]

such that

> res_45bottom
   id val1 val2 vt
2   2   13   13 12
3   3   12   12 13
4   4   12   13 13
6   6   12   12 14
7   7   12   13 14
8   8   12   14 12
9   9   13   13 14
10 10   13   14 14
11 11   14   14 14
14 14   13   13 14
15 15   13   14 13

> res_25top
   id val1 val2 vt
1   1   14   12 19
5   5   12   14 22
13 13   13   14 31
16 16   13   14 23
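
To see why this split gives 11 and 4 rows, it can help to look at the cut points themselves. This is a minimal sketch, assuming the default quantile() type (type 7) and the question's data rebuilt with data.frame(); the names cuts and grp are illustrative and not part of the answer.

```r
# Same 16 rows as in the question, rebuilt with data.frame() for brevity.
dat <- data.frame(
  id   = 1:16,
  val1 = c(14, 13, 12, 12, 12, 12, 12, 12, 13, 13, 14, 13, 13, 13, 13, 13),
  val2 = c(12, 13, 12, 13, 14, 12, 13, 14, 13, 14, 14, 14, 14, 13, 14, 14),
  vt   = c(19, 12, 13, 13, 22, 14, 14, 12, 14, 14, 14, 17, 31, 14, 13, 23)
)

# Cut points at the 45% and 75% quantiles of vt.
cuts <- quantile(dat$vt, c(0.45, 0.75))

# left.open = TRUE makes the intervals (-Inf, cut1], (cut1, cut2], (cut2, Inf),
# labelled 0, 1 and 2, so rows equal to the lower cut fall in the bottom group.
grp <- findInterval(dat$vt, cuts, left.open = TRUE)

cuts
table(grp)
```

Because this approach thresholds on the value of vt rather than counting rows, all rows tied at the lower cut point land in the bottom group together, and the row with vt == 17 falls in the middle group that neither result uses.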
