Viewed 607 times
Part of R Language Collective
0

Here is my data:

dat <- read.table(text = "id    val1    val2    vt
1   14  12  19
2   13  13  12
3   12  12  13
4   12  13  13
5   12  14  22
6   12  12  14
7   12  13  14
8   12  14  12
9   13  13  14
10  13  14  14
11  14  14  14
12  13  14  17
13  13  14  31
14  13  13  14
15  13  14  13
16  13  14  23
                
", header = TRUE)

I want to get the top 25% and the bottom 45% of the rows according to vt.

Here is the expected output for the top 25%:

id  val1    val2    vt
13  13  14  31
16  13  14  23
5   12  14  22
1   14  12  19

and the bottom 45% is:

id  val1    val2    vt
7   12  13  14
9   13  13  14
10  13  14  14
11  14  14  14
14  13  13  14
3   12  12  13
4   12  13  13
15  13  14  13
2   13  13  12
8   12  14  12

I have tried subset() with quantile(), but it does not seem to work for the bottom n%. Is it possible to do this with dplyr? The other questions I have checked do not cover the bottom n%. Also, I do not want to compute this by group.

2 Comments
  • Please check the slice_max() and slice_min() functions from the tidyverse.
    –  deschen, commented Jan 22, 2021 at 14:25
  • I have edited the question. Please reopen it if at all possible.
    –  user330, commented Jan 22, 2021 at 15:16

2 Answers

1

Use dplyr::slice_min() and dplyr::slice_max().

library(dplyr)  # dplyr re-exports the %>% pipe, so magrittr need not be attached

df <- read.table(text = "id    val1    val2    vt
1   14  12  19
2   13  13  12
3   12  12  13
4   12  13  13
5   12  14  22
6   12  12  14
7   12  13  14
8   12  14  12
9   13  13  14
10  13  14  14
11  14  14  14
12  13  14  17
13  13  14  31
14  13  13  14
15  13  14  13
16  13  14  23
                
", header = TRUE)

df %>% slice_max(order_by = vt, prop = 0.25)
#   id val1 val2 vt
# 1 13   13   14 31
# 2 16   13   14 23
# 3  5   12   14 22
# 4  1   14   12 19

df %>% slice_min(order_by = vt, prop = 0.45)
#    id val1 val2 vt
# 1   2   13   13 12
# 2   8   12   14 12
# 3   3   12   12 13
# 4   4   12   13 13
# 5  15   13   14 13
# 6   6   12   12 14
# 7   7   12   13 14
# 8   9   13   13 14
# 9  10   13   14 14
# 10 11   14   14 14
# 11 14   13   13 14

7 Comments

If there are tied values, it does not slice the top and bottom n% properly.
@user330 Could you elaborate?
For example, if 31 appears 5 times, you get 5 rows instead of 4.
So you don't want identical vt values to be treated as distinct values?
The identical values are not the issue. I want the code to return the right number of rows. For example, with 16 values and n = 25%, it should slice 4 rows at the top and 4 at the bottom.
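The row-count concern raised in these comments can be sketched with the with_ties argument of slice_min() and slice_max(). This is a minimal sketch, assuming dplyr 1.0.0 or later and the data from the question; the names top25, bottom25 and bottom45 are only illustrative. With with_ties = FALSE, the number of rows is prop times the row count, rounded down, and rows tied at the cut-off are dropped according to their order in the data.

```r
library(dplyr)

# Same data as in the question
dat <- read.table(text = "id val1 val2 vt
1 14 12 19
2 13 13 12
3 12 12 13
4 12 13 13
5 12 14 22
6 12 12 14
7 12 13 14
8 12 14 12
9 13 13 14
10 13 14 14
11 14 14 14
12 13 14 17
13 13 14 31
14 13 13 14
15 13 14 13
16 13 14 23
", header = TRUE)

# with_ties = FALSE caps the result at floor(prop * nrow(dat)) rows,
# even when several rows share the cut-off value of vt
top25    <- dat %>% slice_max(order_by = vt, prop = 0.25, with_ties = FALSE)
bottom25 <- dat %>% slice_min(order_by = vt, prop = 0.25, with_ties = FALSE)
bottom45 <- dat %>% slice_min(order_by = vt, prop = 0.45, with_ties = FALSE)

nrow(top25)     # 4 rows (16 * 0.25)
nrow(bottom25)  # 4 rows
nrow(bottom45)  # 7 rows (16 * 0.45 = 7.2, rounded down)
```

Note the trade-off: with the default with_ties = TRUE, the bottom 45% has 11 rows because six rows share vt = 14 at the cut-off; with with_ties = FALSE, only some of those tied rows are kept.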
0

Perhaps you can try findInterval() + quantile(), as below:

res <- with(dat, split(dat, findInterval(vt, quantile(vt, c(.45, .75)), left.open = TRUE)))
res_45bottom <- head(res, 1)[[1]]
res_25top <- tail(res, 1)[[1]]

which gives

> res_45bottom
   id val1 val2 vt
2   2   13   13 12
3   3   12   12 13
4   4   12   13 13
6   6   12   12 14
7   7   12   13 14
8   8   12   14 12
9   9   13   13 14
10 10   13   14 14
11 11   14   14 14
14 14   13   13 14
15 15   13   14 13

> res_25top
   id val1 val2 vt
1   1   14   12 19
5   5   12   14 22
13 13   13   14 31
16 16   13   14 23
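To see why the split lands where it does, it may help to look at the intermediate values. The following is a small sketch in base R, assuming the default quantile type and the data from the question; the names cuts and bin are illustrative and not part of the answer's code.

```r
# Same data as in the question
dat <- read.table(text = "id val1 val2 vt
1 14 12 19
2 13 13 12
3 12 12 13
4 12 13 13
5 12 14 22
6 12 12 14
7 12 13 14
8 12 14 12
9 13 13 14
10 13 14 14
11 14 14 14
12 13 14 17
13 13 14 31
14 13 13 14
15 13 14 13
16 13 14 23
", header = TRUE)

# Cut-points at the 45th and 75th percentiles of vt
cuts <- quantile(dat$vt, c(.45, .75))
cuts  # 45% is 14, 75% is 17.5

# left.open = TRUE makes the intervals (-Inf, 14], (14, 17.5], (17.5, Inf),
# coded 0, 1 and 2, so rows with vt equal to 14 fall in the bottom group
bin <- findInterval(dat$vt, cuts, left.open = TRUE)
table(bin)  # 11 rows in bin 0, 1 row in bin 1, 4 rows in bin 2
```

Because the groups are defined by values of vt rather than by row counts, tied rows always stay together, which is why the bottom group has 11 rows here rather than 7.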
