[stdlib][docs] Clarify the Cost of Raising #5349

owenhilyard · Sep 22, 2025

Given that we've seen a lot of confusion around the cost of raising in Mojo, add clarification to the Mojo manual that raises is not the same as C++ exceptions and include some reasoning as to why it's better than Rust's Result.

Given that we've seen a lot of confusion around the cost of raising in Mojo, add clarification to the Mojo manual that `raises` is not the same as C++ exceptions and include some reasoning as to why it's better than Rust's `Result`. Signed-off-by: Owen Hilyard <hilyard.owen@gmail.com>

lattner · Sep 22, 2025

@arthurevans fyi

NathanSWard · Sep 23, 2025

mojo/docs/manual/errors.mdx

+you raise and returning a boolean, meaning that it has similar overhead to errors as
+values (C and Rust), but still enables "placement new" and other optimizations that 


It may be worth adding a short code snippet here on how the compiler would transform a raises function into this. Visual/code example are often a bit easier to understand than walls of text.

Right now my local Mojo install and compiler explorer are having a fairly strong disagreement about this.

Here's what compiler explorer says Mojo emits, which, while not great (see the Rust version), is a reasonable lowering given the constraints of raises.

But when I do it locally using this:

from compile import compile_info from utils import StaticTuple @export(ABI="C") fn bounds_check(read l: List[UInt], var index: UInt) raises: """ Checks that an index is in-bounds for a list, raising if it is not. """ if index >= len(l): raise "Index too large" def main(): print(compile_info[bounds_check, emission_kind="asm"]())

and run it I get the following:

bounds_check: .Lbounds_check$local: .type .Lbounds_check$local,@function .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 pushq %r15 .cfi_def_cfa_offset 24 pushq %r14 .cfi_def_cfa_offset 32 pushq %r13 .cfi_def_cfa_offset 40 pushq %r12 .cfi_def_cfa_offset 48 pushq %rbx .cfi_def_cfa_offset 56 subq $24, %rsp .cfi_def_cfa_offset 80 .cfi_offset %rbx, -56 .cfi_offset %r12, -48 .cfi_offset %r13, -40 .cfi_offset %r14, -32 .cfi_offset %r15, -24 .cfi_offset %rbp, -16 movq 8(%rdi), %r14 movq %r8, %rbx movq %rdx, %r15 movq %rsi, %r12 cmpq %r14, %rsi jae .LBB0_2 jmp .LBB0_19 .LBB0_2: movq %rcx, 16(%rsp) movq $0, (%rsp) movq %rsp, %rdi xorl %esi, %esi callq KGEN_CompilerRT_GetStackTrace@PLT testq %rax, %rax je .LBB0_3 movl $1, %edi movq %rax, %rsi movq %rax, %rbp movq %rbx, 8(%rsp) callq KGEN_CompilerRT_AlignedAlloc@PLT movl $8, %edi movl $16, %esi movq %rax, %rbx callq KGEN_CompilerRT_AlignedAlloc@PLT movq (%rsp), %rdi movq $1, (%rax) movq %rax, %r13 movq %rbx, 8(%rax) cmpq $4, %rbp jg .LBB0_7 movzbl (%rdi), %eax movb %al, (%rbx) movzbl -1(%rdi,%rbp), %eax movb %al, -1(%rbx,%rbp) cmpq $3, %rbp jl .LBB0_17 movzbl 1(%rdi), %eax movb %al, 1(%rbx) movzbl -2(%rdi,%rbp), %eax movb %al, -2(%rbx,%rbp) jmp .LBB0_17 .LBB0_3: movl $8, %edi movl $16, %esi callq KGEN_CompilerRT_AlignedAlloc@PLT movq $1, (%rax) movq %rax, %r13 movq $0, 8(%rax) jmp .LBB0_18 .LBB0_7: cmpq $16, %rbp jg .LBB0_11 cmpq $8, %rbp jl .LBB0_10 movq (%rdi), %rax movq %rax, (%rbx) movq -8(%rdi,%rbp), %rax movq %rax, -8(%rbx,%rbp) jmp .LBB0_17 .LBB0_11: movabsq $9223372036854775776, %rax andq %rbp, %rax je .LBB0_14 xorl %ecx, %ecx .p2align 4 .LBB0_13: vmovups (%rdi,%rcx), %ymm0 vmovups %ymm0, (%rbx,%rcx) addq $32, %rcx cmpq %rax, %rcx jb .LBB0_13 .LBB0_14: testb $31, %bpl je .LBB0_17 subq %rax, %rbp .p2align 4 .LBB0_16: movzbl (%rdi,%rax), %ecx decq %rbp movb %cl, (%rbx,%rax) incq %rax testq %rbp, %rbp jg .LBB0_16 jmp .LBB0_17 .LBB0_10: movl (%rdi), %eax movl %eax, (%rbx) movl -4(%rdi,%rbp), %eax movl %eax, -4(%rbx,%rbp) .LBB0_17: vzeroupper callq free@PLT movq 8(%rsp), %rbx .LBB0_18: movq 16(%rsp), %rcx movl $15, %eax leaq static_string_d0af4fb5fae76f45(%rip), %rdi .LBB0_19: vmovdqa64 .LCPI0_0(%rip), %zmm1 vpbroadcastq %r13, %zmm0 vpbroadcastq %rax, %zmm2 cmpq %r14, %r12 sbbl %edx, %edx xorl %esi, %esi cmpq %r14, %r12 kmovd %edx, %k0 cmovaeq %rdi, %rsi knotw %k0, %k1 movq %rsi, (%r15) vpsrlvq %zmm1, %zmm0, %zmm0 vpsrlvq %zmm1, %zmm2, %zmm1 vpmovqb %zmm1, %xmm1 vpmovqb %zmm0, %xmm0 vpunpcklqdq %xmm0, %xmm1, %xmm0 vmovdqu8 %xmm0, %xmm0 {%k1} {z} vmovdqu %xmm0, (%rcx) setb (%rbx) addq $24, %rsp .cfi_def_cfa_offset 56 popq %rbx .cfi_def_cfa_offset 48 popq %r12 .cfi_def_cfa_offset 40 popq %r13 .cfi_def_cfa_offset 32 popq %r14 .cfi_def_cfa_offset 24 popq %r15 .cfi_def_cfa_offset 16 popq %rbp .cfi_def_cfa_offset 8 vzeroupper retq

Is there some kind of work happening around raises that would cause a massive performance regression like this since the last time compiler explorer updated their mojo nightly? I don't have the ability to look at how raises is actually lowered so I was planning to work upwards from assembly and/or llvm IR.

I'm not sure if having code here is a great idea given this issue. As-is, what I've said in the docs is apparently untrue and raises is actually quite expensive. Compiler explorer seems to have an out of date version of Mojo where this was cheaper.

I agree about not having code like this here. I don't think it would be helpful in the context of a basic section explaining how to raise and handle errors.

If we do end up including a note like this, it might be more helpful to include pseudocode showing the rough equivalent (in Mojo syntax).

For now, it seems like there's an open question about the performance of raising in Mojo and whether there's been a recent regression, which we should follow up on before updating the docs. @owenhilyard did you open a ticket about this?

it might be more helpful to include pseudocode showing the rough equivalent (in Mojo syntax).

Yep, this is what I was referring to. Not actual assembly or llvm codegen. But mojo pseudo code

@arthurevans Chris seemed to be well aware of the issue so I had assumed it was already tracked internally. I can open an issue if you aren't aware of one which already exists.

I was planning to have some roughly equivalent Mojo code, but wanted to verify it had a vaguely equivalent cost, which is why I was looking at assembly.

Is there some kind of work happening around raises that would cause a massive performance regression like this since the last time compiler explorer updated their mojo nightly?

@owenhilyard do you have a benchmark to share so that I can measure performance regression you see?

Details: The change in asm comes from ~2m old commit 6d87e52 that added ability to get a stack trace of the raised Error. What I don't understand is why nightly in compiler explorer doesn't show this.
That functionality, however, is disabled by default due to massive perf drop it can cause: KGEN_CompilerRT_GetStackTrace internally checks if MOJO_ENABLE_STACK_TRACE_ON_ERROR env var is set. If it's not set, nothing is collected and no extra logic later to do memcpy is done.

The benchmark and data I used:

from time import perf_counter_ns @no_inline def throw_exception(): raise Error("dummy") def main(): alias iters = 1000 var total_time: Int = 0 var start_time = perf_counter_ns() for _ in range(iters): try: throw_exception() except e: _ = e pass total_time += perf_counter_ns() - start_time print(total_time)

mojo build test.mojo -o test.out

Time reported by hyperfine

Before 6d87e52 6d87e52 + no MOJO_ENABLE_STACK_TRACE_ON_ERROR 6d87e52 + MOJO_ENABLE_STACK_TRACE_ON_ERROR

20.6 ms +- 1.0 ms 20.4 ms + 0.8 ms 47.02 s +- 0.1 ms

That is, I don't see perf change before and after 6d87e52 commit with no MOJO_ENABLE_STACK_TRACE_ON_ERROR env var set. However, if that variable IS set, perf drops by more than 2000x times.

The regression I was talking about was in the generated code. That being stack trace handling code makes sense, but that's a lot of code to have inlined and gets rid of the "it's syntactic sugar for returning a boolean and the actual return value" argument I was trying to make since it now does quite a bit more. It looks like compiler explorer has since updated and I forgot to permalink, so the outputs are gone, but previously it was a about a dozen instructions. Even in the new generated code, there's still a bunch of vector stuff in what should be a fully scalar happy path (notice the single retq in the function being at the end with no labels to jump past the vector code. Vector ops on zen 4 aren't as bad as skylake X, but AVX512 still isn't cheap. This means that even assuming the branch predictor works, you still have a fairly substantial body of vector ops which makes using this in kernel code not a great idea since you can't raise for "free" while OoO takes care of other vector ops.

What would probably work as a quick fix is to have a way to disable stack trace gathering statically. Once we have parametric raises, that can be opt-in behavior based on your error type.

owenhilyard · Oct 2, 2025

Given that raises is actually quite expensive right now, we should probably pause this until parametric raises makes raises cheap.

lattner · Oct 11, 2025

I'm sorry I am just getting caught up on this now. Indeed, the introduction of stack trace stuff was a major regression, unnecessarily so. This is deeply unfortunately, I'd love help improving this. I filed some notes here: #5423

dgurchenkov · Oct 13, 2025

Let's put this PR on hold for a moment. The raising code indeed has some low hanging fruit improvements, we should do those first, and then take another loop at the generated code and decide how to explain the raising in the doc.

owenhilyard · Oct 14, 2025

I agree @dgurchenkov. I want it to actually be a zero-overhead solution before we claim it is one. If that means waiting for parametric raises so be it.

owenhilyard requested a review from a team as a code owner September 22, 2025 14:49

JoeLoser assigned arthurevans Sep 23, 2025

NathanSWard reviewed Sep 23, 2025

View reviewed changes

owenhilyard mentioned this pull request Oct 15, 2025

[Feature Request] [stdlib] Raise on most ffi errors until we have parametric raises #5455

Open

1 task

		you raise and returning a boolean, meaning that it has similar overhead to errors as
		values (C and Rust), but still enables "placement new" and other optimizations that

Search code, repositories, users, issues, pull requests...

[stdlib][docs] Clarify the Cost of Raising #5349

Are you sure you want to change the base?

[stdlib][docs] Clarify the Cost of Raising #5349

Uh oh!

Conversation

owenhilyard commented Sep 22, 2025

Uh oh!

lattner commented Sep 22, 2025

Uh oh!

NathanSWard Sep 23, 2025

Choose a reason for hiding this comment

Uh oh!

owenhilyard Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

owenhilyard Sep 27, 2025

Choose a reason for hiding this comment

Uh oh!

arthurevans Oct 6, 2025

Choose a reason for hiding this comment

Uh oh!

NathanSWard Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

owenhilyard Oct 6, 2025

Choose a reason for hiding this comment

Uh oh!

npanchen Oct 8, 2025

Choose a reason for hiding this comment

Uh oh!

owenhilyard Oct 9, 2025

Choose a reason for hiding this comment

Uh oh!

owenhilyard commented Oct 2, 2025

Uh oh!

lattner commented Oct 11, 2025

Uh oh!

dgurchenkov commented Oct 13, 2025

Uh oh!

owenhilyard commented Oct 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

NathanSWard Oct 6, 2025 •

edited

Loading