The performance of SQL query is too poor when db size is large #1457

Oct 27, 2022

HeavenNash
Oct 27, 2022

What would you like to be added or enhanced

I create one table which contains 22 fields, and create indexes on 5 columns. The type of indexed column is integer. Then I insert about 180,000 records the immudb. After that, when I execute SQL query "select * from tablename where column1='value' ", column1 is indexed column. Actually, the size of query result is 50, it takes more than 3 seconds.
In addition, the query speed of "select count(*) from tablename where column1 = 'value' " is also very slow. It also takes more than 3 seconds.
The performance is too poor. It can't satisfy the requirement of application development.
Is there any methods to solve this problem?

Oct 27, 2022

byo
Oct 27, 2022

Hi @HeavenNash , the SELECT COUNT(*) will currently be a full-table scan, optimization for that is on the roadmap but currently we're focused on different immudb areas.

Regarding the select * based on index - this definitely looks like immudb did not use a correct index for that query. Are you able to share the create table / create index and the select statements?

0 replies

Oct 28, 2022

HeavenNash
Oct 28, 2022
Author

@byo , Thank you for your reply. The statement of create table is as following:

//the number of total fields is 22
CREATE TABLE IF NOT EXISTS myTable (
    id           INTEGER AUTO_INCREMENT,
    task_id      INTEGER,
    project_id    INTEGER,
    sceneid,        INTEGER,
    other_id1,     INTEGER,
    other_id2,     varchar[256],
    ......
    create_time  TIMESTAMP,
    PRIMARY KEY (id)
);

the statement of create index is as following:

CREATE INDEX ON myTable (task_id);
CREATE INDEX ON myTable (project_id);
CREATE INDEX ON myTable (sceneid);
CREATE INDEX ON myTable (other_id1);
CREATE INDEX ON myTable (other_id2);

Then I use the insert sql statement like "insert into myTable(task_id,project_id,sceneid,other_id1,other_id2, ... )". For each task_id, I insert 50 records. Maybe the first 100 records in the immudb are

"(id:1,  task_id:1, ... ... ... )"
"(id:2,  task_id:2, ... ... ... )"
"(id:3,  task_id:3, ... ... ... )"
... ...
... ...
"(id:50,  task_id:50, ... ... ... )"

After I insert about 18,000 records, task_id ranges from 1 to 3,600. Then I query the table
'select * from myTable where task_id = 3600 (or other number) ', the query speed is very slow.
Can you help me to analyze the reason ?

0 replies

Oct 28, 2022

byo
Oct 28, 2022

@HeavenNash one thing that comes to my mind - in order to force using given index you can use ORDER BY clause on that column - can you check if this would work efficiently: select * from myTable where task_id = 3600 ORDER BY task_id ?

0 replies

Oct 28, 2022

jeroiraz
Oct 28, 2022

select * from myTable where task_id = 3600

SELECT * FROM myTable USE INDEX ON (task_id) WHERE task_id = 3600 should also work

0 replies

Oct 31, 2022

HeavenNash
Oct 31, 2022
Author

@byo @jeroiraz Thank you very much! The two methods both could work. Can you tell me why do we must use 'order by' clause to force using given index ? Will immudb support the implicit index in future , such as traditional database mysql ?
Besides, when I use two columns as primary key('task_id' and 'time_stamp'), searching speed by 'task_id' can be very fast.

0 replies

Oct 31, 2022

jeroiraz
Oct 31, 2022

@byo @jeroiraz Thank you very much! The two methods both could work. Can you tell me why do we must use 'order by' clause to force using given index ? Will immudb support the implicit index in future , such as traditional database mysql ? Besides, when I use two columns as primary key('task_id' and 'time_stamp'), searching speed by 'task_id' can be very fast.

Hi @HeavenNash, it's a very good question. All queries are resolved using an index, by default the primary index is used but order by clause is currently forcing the use of a compatible index with the column used for the ordering, because immudb is not currently sorting in-memory. Either of the solutions we mentioned will instruct immudb to resolve the query using a different index. The reason this is needed is because we haven't yet implemented an automated index selection based on the query and index information, but it will eventually be implemented. Please feel free to open a feature request for that.

Multi-column indexing is already supported, as you mention having a condition on the first indexed column will speed up, or equality on the first column and then a condition in the second e.g. task_id = 200 and time_stamp > @ts

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

The performance of SQL query is too poor when db size is large #1457

Uh oh!

{{title}}

Uh oh!

Replies: 6 comments

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Search code, repositories, users, issues, pull requests...

The performance of SQL query is too poor when db size is large #1457

Uh oh!

HeavenNash Oct 27, 2022

Replies: 6 comments

Uh oh!

byo Oct 27, 2022

Uh oh!

HeavenNash Oct 28, 2022 Author

Uh oh!

byo Oct 28, 2022

Uh oh!

jeroiraz Oct 28, 2022

Uh oh!

HeavenNash Oct 31, 2022 Author

Uh oh!

jeroiraz Oct 31, 2022

HeavenNash
Oct 27, 2022

byo
Oct 27, 2022

HeavenNash
Oct 28, 2022
Author

byo
Oct 28, 2022

jeroiraz
Oct 28, 2022

HeavenNash
Oct 31, 2022
Author

jeroiraz
Oct 31, 2022