[Fix](merge-on-write) Should update pending delete bitmap KVs in MS when no need to calc delete bitmaps in publish phase #46039

@bobhan1 bobhan1 commented Dec 26, 2024

What problem does this PR solve?

Consider the following situation:

  1. Txn A acquires the lock, obtains version X to publish, calculates the delete bitmap, and writes the pending delete bitmap KVs to the MS, but fails for some reason before committing the transaction in the MS.
  2. Txn B acquires the lock, obtains version X to publish, cleans up the pending delete bitmap KVs written by Txn A, calculates the delete bitmap, and writes its own pending delete bitmap KVs to the MS, but also fails for some reason before committing the transaction in the MS.
  3. Txn A then reacquires the lock, obtains version X to publish, and notices that neither the version nor the compaction counts have changed. It skips calculating the delete bitmap and writing the pending delete bitmap KVs to the MS (see #39018, "[Fix](merge-on-write) Fix FE may use the staled response to wrongly commit txn"), and eventually succeeds in committing the transaction in the MS.

In this case, Txn A saves the wrong delete bitmaps (generated by Txn B) in the MS, which causes a correctness problem.

To solve the problem, we should still update the delete bitmap KVs in the MS when we skip the calculation of the delete bitmap on the BE in the publish phase.

Also add a defensive check: record the lock_id when writing pending delete bitmap keys, and check that the lock_id is correct when committing the txn in the MS.
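The three-step interleaving above can be replayed with a toy model. This is only a sketch for intuition, not the Doris code: `ToyMetaService`, `publish`, `run_scenario`, and the flags `reupload_when_skipping` and `check_lock_id` are hypothetical names, the MS is reduced to an in-memory dict, a delete bitmap is reduced to a set of row ids, and the lock_id is assumed to be a per-txn integer.

```python
from dataclasses import dataclass, field


@dataclass
class PendingBitmap:
    lock_id: int        # which txn wrote this pending KV (assumed per-txn id)
    bitmap: frozenset   # stand-in for the real delete bitmap


@dataclass
class ToyMetaService:
    pending: dict = field(default_factory=dict)    # tablet_id -> PendingBitmap
    committed: dict = field(default_factory=dict)  # tablet_id -> frozenset

    def update_delete_bitmap(self, tablet_id, lock_id, bitmap):
        # Overwrites whatever pending KV an earlier txn left behind.
        self.pending[tablet_id] = PendingBitmap(lock_id, frozenset(bitmap))

    def commit_txn(self, tablet_id, lock_id, check_lock_id):
        entry = self.pending[tablet_id]
        if check_lock_id and entry.lock_id != lock_id:
            # Defensive check: the pending KV belongs to another txn.
            raise RuntimeError(
                f"pending bitmap owned by {entry.lock_id}, not {lock_id}")
        self.committed[tablet_id] = entry.bitmap
        del self.pending[tablet_id]


def publish(ms, tablet_id, lock_id, bitmap, skip_calc, reupload_when_skipping):
    # When the calculation is skipped, the old behavior also skipped the
    # upload; the fix still uploads the previously calculated bitmap.
    if not skip_calc or reupload_when_skipping:
        ms.update_delete_bitmap(tablet_id, lock_id, bitmap)


def run_scenario(reupload_when_skipping, check_lock_id=False):
    ms = ToyMetaService()
    tablet, txn_a, txn_b = 1, 100, 200
    bitmap_a, bitmap_b = {1, 2}, {7, 8, 9}
    # Step 1: Txn A writes its pending KV, then fails before commit.
    publish(ms, tablet, txn_a, bitmap_a, False, reupload_when_skipping)
    # Step 2: Txn B replaces A's pending KV, then fails before commit.
    publish(ms, tablet, txn_b, bitmap_b, False, reupload_when_skipping)
    # Step 3: Txn A retries, sees nothing changed, and skips the calculation.
    publish(ms, tablet, txn_a, bitmap_a, True, reupload_when_skipping)
    ms.commit_txn(tablet, txn_a, check_lock_id)
    return ms.committed[tablet]
```

In this model, `run_scenario(False)` commits Txn B's bitmap under Txn A, `run_scenario(True)` commits Txn A's own bitmap, and `run_scenario(False, check_lock_id=True)` raises instead of committing the wrong data.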

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Dec 26, 2024

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1
Contributor Author

bobhan1 commented Dec 26, 2024

run buildall

@zhannngchen
Contributor

We need to add an additional field to identify the owner of the pending delete bitmap.

@doris-robot

TPC-H: Total hot run time: 32447 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 85ac8050efc77e8566ca6b8c6c6ef6db500b4d4f, data reload: false

------ Round 1 ----------------------------------
q1	17580	6151	6033	6033
q2	2041	311	165	165
q3	10416	1261	717	717
q4	10198	879	429	429
q5	7498	2138	1973	1973
q6	204	179	144	144
q7	902	748	598	598
q8	9239	1346	1139	1139
q9	5290	5005	4954	4954
q10	6768	2327	1848	1848
q11	480	275	261	261
q12	342	355	214	214
q13	17779	3603	3008	3008
q14	242	239	225	225
q15	566	499	522	499
q16	642	608	589	589
q17	570	837	320	320
q18	7147	6694	6300	6300
q19	1225	966	547	547
q20	312	322	187	187
q21	2830	2162	1980	1980
q22	367	334	317	317
Total cold run time: 102638 ms
Total hot run time: 32447 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6288	6263	6298	6263
q2	236	324	236	236
q3	2228	2588	2335	2335
q4	1382	1834	1356	1356
q5	4342	4764	4778	4764
q6	187	177	141	141
q7	2123	1978	1887	1887
q8	2566	2808	2629	2629
q9	7288	7179	7305	7179
q10	3090	3358	2812	2812
q11	606	518	500	500
q12	683	736	579	579
q13	3387	3737	3116	3116
q14	284	292	282	282
q15	571	526	510	510
q16	660	692	634	634
q17	1221	1754	1247	1247
q18	7788	7423	7013	7013
q19	782	913	1111	913
q20	1938	1969	1790	1790
q21	5387	5048	4883	4883
q22	621	613	577	577
Total cold run time: 53658 ms
Total hot run time: 51646 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.83% (10104/26022)
Line Coverage: 29.83% (85339/286081)
Region Coverage: 28.97% (43613/150543)
Branch Coverage: 25.51% (22239/87192)
Coverage Report: http://coverage.selectdb-in.cc/coverage/85ac8050efc77e8566ca6b8c6c6ef6db500b4d4f_85ac8050efc77e8566ca6b8c6c6ef6db500b4d4f/report/index.html

@doris-robot

TPC-DS: Total hot run time: 190524 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 85ac8050efc77e8566ca6b8c6c6ef6db500b4d4f, data reload: false

query1	961	387	391	387
query2	6520	2391	2413	2391
query3	6712	216	211	211
query4	34094	23636	23371	23371
query5	4406	645	460	460
query6	312	215	188	188
query7	4628	500	300	300
query8	315	247	246	246
query9	9544	2764	2744	2744
query10	482	321	268	268
query11	18049	15347	15241	15241
query12	157	107	107	107
query13	1678	558	404	404
query14	10501	7419	7832	7419
query15	280	192	183	183
query16	8141	584	454	454
query17	1583	742	546	546
query18	2075	393	284	284
query19	216	184	147	147
query20	117	118	108	108
query21	212	121	101	101
query22	4151	4381	4115	4115
query23	34237	33786	33340	33340
query24	6544	2255	2250	2250
query25	479	443	372	372
query26	1186	269	154	154
query27	2016	448	329	329
query28	5190	2431	2436	2431
query29	720	535	416	416
query30	224	182	151	151
query31	1046	891	799	799
query32	79	60	59	59
query33	505	353	294	294
query34	767	835	515	515
query35	808	827	770	770
query36	1013	1060	924	924
query37	118	102	78	78
query38	4062	4208	4203	4203
query39	1503	1425	1430	1425
query40	209	115	105	105
query41	54	57	45	45
query42	123	101	106	101
query43	539	549	525	525
query44	1316	802	790	790
query45	175	172	167	167
query46	864	1043	647	647
query47	1891	1916	1860	1860
query48	377	403	332	332
query49	764	500	383	383
query50	618	661	382	382
query51	7264	7105	7162	7105
query52	107	104	95	95
query53	224	255	207	207
query54	483	502	414	414
query55	79	73	81	73
query56	244	290	239	239
query57	1187	1212	1101	1101
query58	237	231	231	231
query59	3196	3378	3034	3034
query60	285	258	248	248
query61	112	105	105	105
query62	862	810	752	752
query63	232	196	195	195
query64	4387	1054	680	680
query65	3326	3204	3196	3196
query66	1018	426	309	309
query67	15835	15840	15495	15495
query68	8604	757	511	511
query69	471	311	259	259
query70	1249	1145	1180	1145
query71	431	279	255	255
query72	5903	3875	3875	3875
query73	668	770	356	356
query74	9787	8999	8867	8867
query75	4640	3158	2638	2638
query76	4518	1225	795	795
query77	806	367	282	282
query78	10112	10084	9380	9380
query79	3096	893	580	580
query80	703	534	462	462
query81	484	278	233	233
query82	682	152	121	121
query83	162	173	149	149
query84	238	90	72	72
query85	788	362	311	311
query86	365	323	295	295
query87	4660	4582	4438	4438
query88	4727	2207	2205	2205
query89	414	347	300	300
query90	1894	260	200	200
query91	138	135	107	107
query92	64	56	55	55
query93	1494	898	531	531
query94	689	378	270	270
query95	327	259	248	248
query96	512	612	288	288
query97	2716	2824	2664	2664
query98	232	195	193	193
query99	1748	1546	1438	1438
Total cold run time: 294961 ms
Total hot run time: 190524 ms

@doris-robot

ClickBench: Total hot run time: 31.56 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 85ac8050efc77e8566ca6b8c6c6ef6db500b4d4f, data reload: false

query1	0.03	0.03	0.03
query2	0.08	0.03	0.03
query3	0.23	0.07	0.07
query4	1.62	0.11	0.11
query5	0.44	0.42	0.40
query6	1.17	0.65	0.64
query7	0.03	0.01	0.01
query8	0.04	0.03	0.03
query9	0.59	0.50	0.49
query10	0.55	0.56	0.55
query11	0.15	0.12	0.11
query12	0.14	0.10	0.11
query13	0.61	0.61	0.60
query14	2.84	2.78	2.78
query15	0.89	0.82	0.82
query16	0.37	0.38	0.38
query17	1.07	0.98	1.06
query18	0.23	0.20	0.21
query19	2.01	1.89	2.00
query20	0.01	0.01	0.01
query21	15.36	0.93	0.59
query22	0.75	0.84	0.72
query23	15.26	1.43	0.57
query24	2.70	1.10	1.69
query25	0.15	0.21	0.10
query26	0.28	0.14	0.13
query27	0.05	0.05	0.05
query28	14.01	1.52	1.06
query29	12.60	3.97	3.21
query30	0.25	0.09	0.06
query31	2.81	0.58	0.39
query32	3.23	0.55	0.47
query33	3.09	3.08	3.03
query34	16.78	5.09	4.56
query35	4.50	4.51	4.53
query36	0.65	0.48	0.48
query37	0.10	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.03
query40	0.17	0.13	0.13
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 106.07 s
Total hot run time: 31.56 s

@bobhan1 bobhan1 force-pushed the fix-pending-delete-bitmaps-removed-by-other-txn branch from 2bb6847 to c55511c Compare December 26, 2024 13:00
@bobhan1
Contributor Author

bobhan1 commented Dec 26, 2024

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.82% (10103/26022)
Line Coverage: 29.85% (85382/286081)
Region Coverage: 28.97% (43618/150543)
Branch Coverage: 25.51% (22243/87192)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c55511cff77fdbb1066f1a10a087e49e3abb3904_c55511cff77fdbb1066f1a10a087e49e3abb3904/report/index.html

@doris-robot

TPC-H: Total hot run time: 32709 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c55511cff77fdbb1066f1a10a087e49e3abb3904, data reload: false

------ Round 1 ----------------------------------
q1	17592	6229	6072	6072
q2	2052	296	161	161
q3	10433	1339	714	714
q4	10210	873	433	433
q5	7556	2243	2025	2025
q6	203	181	145	145
q7	912	745	616	616
q8	9249	1394	1183	1183
q9	5320	5007	5019	5007
q10	6781	2327	1863	1863
q11	460	276	252	252
q12	344	368	231	231
q13	17777	3605	2919	2919
q14	234	251	212	212
q15	555	508	512	508
q16	646	616	581	581
q17	581	859	326	326
q18	7363	6446	6499	6446
q19	2199	971	569	569
q20	305	316	184	184
q21	2831	2224	1948	1948
q22	377	335	314	314
Total cold run time: 103980 ms
Total hot run time: 32709 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6313	6310	6284	6284
q2	236	335	229	229
q3	2219	2645	2323	2323
q4	1418	1833	1337	1337
q5	4371	4827	4951	4827
q6	196	180	140	140
q7	2121	2013	1797	1797
q8	2704	2755	2647	2647
q9	7397	7240	7310	7240
q10	3048	3330	2774	2774
q11	578	512	500	500
q12	653	754	605	605
q13	3427	3750	3135	3135
q14	297	317	302	302
q15	566	511	495	495
q16	632	686	623	623
q17	1238	1752	1261	1261
q18	7694	7653	7027	7027
q19	824	996	1086	996
q20	1887	1955	1930	1930
q21	5492	5196	4727	4727
q22	644	631	580	580
Total cold run time: 53955 ms
Total hot run time: 51779 ms

@doris-robot

TPC-DS: Total hot run time: 190186 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c55511cff77fdbb1066f1a10a087e49e3abb3904, data reload: false

query1	996	369	383	369
query2	6545	2500	2380	2380
query3	6714	217	212	212
query4	34268	23720	23481	23481
query5	4317	621	450	450
query6	294	222	193	193
query7	4633	477	294	294
query8	293	238	230	230
query9	9281	2716	2714	2714
query10	455	304	253	253
query11	18175	15260	15162	15162
query12	160	103	107	103
query13	1679	520	416	416
query14	10440	7052	6931	6931
query15	222	204	181	181
query16	8018	601	463	463
query17	1582	751	582	582
query18	1927	401	301	301
query19	226	183	153	153
query20	124	114	110	110
query21	207	125	106	106
query22	4229	4500	4272	4272
query23	35044	33720	33400	33400
query24	6422	2236	2328	2236
query25	475	447	376	376
query26	1161	267	152	152
query27	1981	470	331	331
query28	5166	2438	2410	2410
query29	564	543	464	464
query30	230	178	145	145
query31	994	914	816	816
query32	72	63	58	58
query33	502	370	297	297
query34	772	838	506	506
query35	770	836	744	744
query36	977	1032	982	982
query37	123	109	80	80
query38	4409	4211	4076	4076
query39	1506	1437	1421	1421
query40	207	109	102	102
query41	45	48	43	43
query42	117	101	102	101
query43	522	547	508	508
query44	1276	782	791	782
query45	185	173	165	165
query46	877	1037	645	645
query47	1915	1932	1856	1856
query48	391	406	327	327
query49	784	463	380	380
query50	609	647	380	380
query51	7117	7322	6936	6936
query52	100	100	87	87
query53	219	248	181	181
query54	473	507	405	405
query55	79	78	77	77
query56	255	244	265	244
query57	1228	1214	1122	1122
query58	228	231	225	225
query59	3219	3219	3120	3120
query60	275	256	238	238
query61	114	105	113	105
query62	889	803	721	721
query63	221	189	189	189
query64	4112	993	688	688
query65	3283	3222	3211	3211
query66	1060	423	311	311
query67	15923	15849	15621	15621
query68	7390	743	499	499
query69	452	305	246	246
query70	1263	1105	1058	1058
query71	425	293	248	248
query72	5979	3854	3904	3854
query73	1485	751	363	363
query74	10256	9098	8929	8929
query75	4385	3157	2628	2628
query76	5065	1165	764	764
query77	937	361	266	266
query78	10668	10448	9659	9659
query79	1692	865	584	584
query80	700	519	420	420
query81	472	280	223	223
query82	200	156	121	121
query83	203	172	149	149
query84	284	88	69	69
query85	741	367	313	313
query86	347	303	306	303
query87	4561	4591	4341	4341
query88	3594	2249	2193	2193
query89	397	326	293	293
query90	1999	186	184	184
query91	132	134	102	102
query92	65	62	54	54
query93	966	765	526	526
query94	663	383	286	286
query95	323	256	247	247
query96	493	605	277	277
query97	2675	2792	2724	2724
query98	217	201	211	201
query99	1632	1630	1442	1442
Total cold run time: 292337 ms
Total hot run time: 190186 ms

@doris-robot

ClickBench: Total hot run time: 31.58 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c55511cff77fdbb1066f1a10a087e49e3abb3904, data reload: false

query1	0.03	0.03	0.05
query2	0.07	0.04	0.03
query3	0.24	0.07	0.07
query4	1.60	0.11	0.10
query5	0.42	0.43	0.41
query6	1.14	0.64	0.66
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.50	0.52
query10	0.55	0.59	0.55
query11	0.15	0.10	0.09
query12	0.14	0.11	0.11
query13	0.62	0.60	0.60
query14	2.72	2.75	2.74
query15	0.90	0.82	0.83
query16	0.39	0.38	0.40
query17	1.03	1.00	1.09
query18	0.23	0.21	0.21
query19	1.88	1.89	2.01
query20	0.02	0.02	0.01
query21	15.37	0.86	0.58
query22	0.76	0.80	0.68
query23	15.30	1.45	0.59
query24	2.54	1.48	1.01
query25	0.24	0.16	0.17
query26	0.30	0.14	0.15
query27	0.05	0.05	0.04
query28	14.14	1.53	1.04
query29	12.58	3.92	3.29
query30	0.25	0.08	0.06
query31	2.81	0.57	0.38
query32	3.22	0.54	0.47
query33	3.20	3.09	3.13
query34	16.69	5.12	4.56
query35	4.58	4.47	4.52
query36	0.66	0.48	0.49
query37	0.10	0.06	0.05
query38	0.04	0.04	0.04
query39	0.04	0.03	0.02
query40	0.17	0.15	0.12
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 105.95 s
Total hot run time: 31.58 s

@bobhan1 bobhan1 requested a review from zhannngchen December 27, 2024 03:03
@bobhan1
Contributor Author

bobhan1 commented Dec 27, 2024

run buildall

Contributor

@zhannngchen zhannngchen left a comment

LGTM

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Dec 27, 2024
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 32722 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit be16edd78c014fc075342f0b58d4de0a7f0f2b3e, data reload: false

------ Round 1 ----------------------------------
q1	17632	6183	6091	6091
q2	2054	318	174	174
q3	10471	1260	723	723
q4	10212	863	431	431
q5	7575	2236	1985	1985
q6	213	184	147	147
q7	918	775	618	618
q8	9233	1415	1152	1152
q9	5245	4975	4998	4975
q10	6762	2344	1876	1876
q11	486	283	263	263
q12	358	356	219	219
q13	17778	3590	2908	2908
q14	228	248	212	212
q15	568	506	506	506
q16	648	614	589	589
q17	585	859	330	330
q18	7201	6595	6478	6478
q19	2094	964	548	548
q20	312	310	193	193
q21	2899	2238	1979	1979
q22	368	332	325	325
Total cold run time: 103840 ms
Total hot run time: 32722 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6363	6250	6229	6229
q2	247	329	240	240
q3	2266	2616	2332	2332
q4	1402	1797	1350	1350
q5	4405	4757	4887	4757
q6	193	180	161	161
q7	2140	1983	1791	1791
q8	2580	2783	2734	2734
q9	7365	7232	7295	7232
q10	3072	3320	2823	2823
q11	589	503	489	489
q12	673	778	622	622
q13	3361	3742	3133	3133
q14	300	304	265	265
q15	565	519	507	507
q16	666	703	657	657
q17	1228	1722	1262	1262
q18	7588	7462	7218	7218
q19	776	1122	1056	1056
q20	1960	1989	1802	1802
q21	5346	4999	5028	4999
q22	628	608	558	558
Total cold run time: 53713 ms
Total hot run time: 52217 ms

@doris-robot

TPC-DS: Total hot run time: 190734 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit be16edd78c014fc075342f0b58d4de0a7f0f2b3e, data reload: false

query1	966	406	394	394
query2	6557	2362	2369	2362
query3	6701	211	216	211
query4	33716	24016	23565	23565
query5	4370	645	477	477
query6	283	196	189	189
query7	4663	483	294	294
query8	289	235	235	235
query9	9762	2718	2716	2716
query10	457	305	256	256
query11	18414	15676	15123	15123
query12	171	113	104	104
query13	1685	566	411	411
query14	10922	7576	7170	7170
query15	239	195	186	186
query16	8156	614	440	440
query17	1589	747	578	578
query18	2109	408	300	300
query19	216	187	155	155
query20	131	114	115	114
query21	209	125	105	105
query22	4457	4625	4278	4278
query23	34698	33612	33480	33480
query24	6433	2228	2259	2228
query25	487	445	403	403
query26	1198	263	149	149
query27	2026	452	329	329
query28	5400	2459	2432	2432
query29	726	529	406	406
query30	227	184	154	154
query31	985	920	840	840
query32	89	60	60	60
query33	500	357	287	287
query34	755	841	504	504
query35	803	822	724	724
query36	1059	1065	974	974
query37	124	98	74	74
query38	4178	4242	4119	4119
query39	1504	1467	1425	1425
query40	208	118	101	101
query41	49	44	47	44
query42	117	104	103	103
query43	528	526	508	508
query44	1341	805	792	792
query45	176	172	164	164
query46	856	1043	631	631
query47	1917	1933	1857	1857
query48	383	406	327	327
query49	767	470	389	389
query50	616	642	387	387
query51	7291	7282	6925	6925
query52	100	100	91	91
query53	223	247	190	190
query54	486	469	389	389
query55	79	75	75	75
query56	260	265	242	242
query57	1196	1171	1141	1141
query58	260	230	236	230
query59	3225	3311	3099	3099
query60	271	267	241	241
query61	113	111	111	111
query62	883	803	730	730
query63	222	193	189	189
query64	4464	987	633	633
query65	3322	3216	3265	3216
query66	1058	415	311	311
query67	15921	15923	15607	15607
query68	9397	742	497	497
query69	477	289	254	254
query70	1185	1147	1156	1147
query71	438	285	251	251
query72	5806	3801	3765	3765
query73	669	750	356	356
query74	10722	9416	9162	9162
query75	4485	3153	2650	2650
query76	5591	1171	764	764
query77	1019	364	269	269
query78	10131	10213	9419	9419
query79	2837	903	596	596
query80	707	510	427	427
query81	465	273	230	230
query82	342	148	119	119
query83	198	207	141	141
query84	288	88	70	70
query85	741	348	298	298
query86	354	315	301	301
query87	4534	4473	4499	4473
query88	3340	2243	2197	2197
query89	413	337	303	303
query90	2080	190	188	188
query91	135	133	104	104
query92	66	64	59	59
query93	1613	884	523	523
query94	672	370	304	304
query95	337	262	268	262
query96	491	606	281	281
query97	2763	2840	2692	2692
query98	220	204	206	204
query99	1640	1563	1454	1454
Total cold run time: 297742 ms
Total hot run time: 190734 ms

@doris-robot

ClickBench: Total hot run time: 31.31 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit be16edd78c014fc075342f0b58d4de0a7f0f2b3e, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.24	0.06	0.07
query4	1.61	0.10	0.11
query5	0.41	0.40	0.42
query6	1.14	0.66	0.64
query7	0.02	0.02	0.02
query8	0.05	0.04	0.03
query9	0.58	0.49	0.50
query10	0.55	0.57	0.55
query11	0.14	0.10	0.10
query12	0.13	0.12	0.12
query13	0.61	0.61	0.59
query14	2.74	2.89	2.77
query15	0.90	0.83	0.83
query16	0.38	0.38	0.37
query17	1.06	1.05	0.97
query18	0.22	0.22	0.21
query19	2.00	1.92	2.02
query20	0.01	0.01	0.01
query21	15.37	0.94	0.58
query22	0.76	0.77	0.64
query23	15.38	1.45	0.51
query24	3.38	0.88	1.49
query25	0.24	0.06	0.11
query26	0.30	0.14	0.14
query27	0.07	0.07	0.06
query28	13.27	1.48	1.04
query29	12.56	4.05	3.34
query30	0.24	0.09	0.07
query31	2.83	0.58	0.38
query32	3.24	0.57	0.46
query33	3.08	3.09	3.06
query34	16.90	5.12	4.55
query35	4.54	4.50	4.53
query36	0.65	0.52	0.48
query37	0.10	0.07	0.06
query38	0.04	0.04	0.04
query39	0.04	0.02	0.02
query40	0.17	0.14	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.19 s
Total hot run time: 31.31 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.89% (10121/26023)
Line Coverage: 29.88% (85485/286120)
Region Coverage: 29.02% (43704/150576)
Branch Coverage: 25.57% (22297/87216)
Coverage Report: http://coverage.selectdb-in.cc/coverage/be16edd78c014fc075342f0b58d4de0a7f0f2b3e_be16edd78c014fc075342f0b58d4de0a7f0f2b3e/report/index.html

Contributor

@dataroaring dataroaring left a comment

LGTM

@zhannngchen zhannngchen merged commit 2240053 into apache:master Dec 27, 2024
23 of 25 checks passed
bobhan1 added a commit to bobhan1/doris that referenced this pull request Dec 30, 2024
…hen no need to calc delete bitmaps in publish phase (apache#46039)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Dec 30, 2024
…hen no need to calc delete bitmaps in publish phase (apache#46039)

zhannngchen pushed a commit that referenced this pull request Dec 31, 2024
… KVs in MS when no need to calc delete bitmaps in publish phase #46039 (#46139)

pick #46039
zhannngchen pushed a commit that referenced this pull request Jan 13, 2025
…it txn in MS (#46841)

### What problem does this PR solve?

Related PR: #46039

Problem Summary:

#46039 added a defensive check to `commit_txn` in the MS that verifies whether the `lock_id` of the pending delete bitmaps on the tablets involved in the txn is the current txn's `lock_id`. However, this check may fail spuriously in the following circumstance:

1. A heavy schema change begins and adds a shadow index to the table.
2. Txn A loads data into the base index and the shadow index.
3. Txn A writes its pending delete bitmaps to the MS. This includes tablets of both the base index and the shadow index.
4. Txn A fails to remove its pending delete bitmaps for some reason (e.g. `commit_txn()` failed due to a too-large value).
5. Txn B loads data into the base index and the shadow index.
6. The schema change fails for some reason and **removes the shadow index from the table.**
7. Txn B sends the delete bitmap calculation task to the BE. **Note that this does not involve tablets under the shadow index because these tablets have been dropped.** **So these tablets' pending delete bitmaps are still txn A's**.
8. Txn B commits the txn on the MS and finds that the `lock_id` of the pending delete bitmaps on tablets under the shadow index does not match, so txn B fails.

We can see that the checks on these dropped tablets are useless, so we remove the mandatory check to avoid this spurious failure and print a warning log instead to help locate problems.
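
The change from a mandatory check to a warning can be sketched as follows. This is a hedged illustration rather than the MS implementation: `check_pending_lock_ids`, its `mandatory` flag, the logger name, and the tablet and lock ids are all made up for the example.

```python
import logging

logger = logging.getLogger("toy_ms")  # hypothetical logger name


def check_pending_lock_ids(pending_lock_ids, txn_lock_id, mandatory):
    """Return the tablets whose pending delete bitmap has a foreign lock_id.

    pending_lock_ids maps tablet_id -> lock_id recorded with the pending KV.
    With mandatory=True a mismatch aborts the commit (the original check);
    with mandatory=False it is only logged (the relaxed behavior).
    """
    mismatched = sorted(
        tablet_id
        for tablet_id, lock_id in pending_lock_ids.items()
        if lock_id != txn_lock_id
    )
    if mismatched and mandatory:
        raise RuntimeError(f"lock_id mismatch on tablets {mismatched}")
    for tablet_id in mismatched:
        logger.warning(
            "pending delete bitmap on tablet %s has lock_id %s, expected %s",
            tablet_id, pending_lock_ids[tablet_id], txn_lock_id,
        )
    return mismatched


# Tablet 10 is under the base index and was rewritten by txn B (lock_id 200).
# Tablet 20 was under the dropped shadow index and still holds txn A's KV (100).
pending = {10: 200, 20: 100}
stale = check_pending_lock_ids(pending, txn_lock_id=200, mandatory=False)
```

With `mandatory=False`, txn B proceeds and only tablet 20 is reported; with `mandatory=True`, the same input raises and txn B would fail as in step 8.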
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jan 13, 2025
…it txn in MS (apache#46841)

dataroaring pushed a commit that referenced this pull request Jan 21, 2025
…_txn()` (#47136)

### What problem does this PR solve?

Related PR: #46039

Problem Summary:

#46039 introduced a defensive check in `commit_txn()`, but this check may affect the commit process. This PR removes the check entirely to eliminate that overhead.
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jan 21, 2025
…_txn()` (apache#47136)

bobhan1 added a commit to bobhan1/doris that referenced this pull request Jan 22, 2025
…_txn()` (apache#47136)

BiteTheDDDDt pushed a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Feb 7, 2025
lzyy2024 pushed a commit to lzyy2024/doris that referenced this pull request Feb 21, 2025
…_txn()` (apache#47136)

Labels
approved, dev/3.0.4-merged, p0_w, reviewed
5 participants