Conversation

owenhilyard
Contributor

Given that we've seen a lot of confusion around the cost of raising in Mojo, add clarification to the Mojo manual that `raises` is not the same as C++ exceptions, and include some reasoning as to why it's better than Rust's `Result`.

Signed-off-by: Owen Hilyard <hilyard.owen@gmail.com>
@owenhilyard requested a review from a team as a code owner on September 22, 2025 at 14:49
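For context, a minimal example of the error-handling syntax under discussion, adapted from the snippets later in this thread (the function, arguments, and message here are made up for illustration):

fn might_fail(index: UInt, size: UInt) raises:
    # A `raises` function can raise an Error instead of returning normally.
    if index >= size:
        raise Error("Index too large")

def main():
    try:
        might_fail(3, 2)
    except e:
        # Callers handle the error with try/except.
        print("caught:", e)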
@lattner
Collaborator

lattner commented Sep 22, 2025

@arthurevans fyi

Comment on lines +57 to +58
you raise and returning a boolean, meaning that it has similar overhead to errors as
values (C and Rust), but still enables "placement new" and other optimizations that
Contributor

It may be worth adding a short code snippet here showing how the compiler would transform a `raises` function into this. Visual/code examples are often a bit easier to understand than walls of text.
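For illustration only, a minimal sketch of the "success flag plus error out-parameter" model the quoted manual text describes, in Mojo-flavored pseudocode. This is not the compiler's actual lowering; the lowered function's name, signature, and conventions are invented.

# Hypothetical sketch only; not the Mojo compiler's real output.

# What the programmer writes:
fn bounds_check(read l: List[UInt], index: UInt) raises:
    if index >= len(l):
        raise Error("Index too large")

# Roughly what the "boolean plus error value" model implies gets generated:
fn bounds_check_lowered(read l: List[UInt], index: UInt, mut err: Error) -> Bool:
    if index >= len(l):
        err = Error("Index too large")
        return True      # an error was "raised"
    return False         # success

# A call site inside another raising function would then branch on the returned
# flag and, if it is True, fill its own error slot and return True to its caller.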

Contributor Author

Right now my local Mojo install and Compiler Explorer are having a fairly strong disagreement about this.

Here's what Compiler Explorer says Mojo emits, which, while not great (see the Rust version), is a reasonable lowering given the constraints of `raises`.

But when I do it locally using this:

from compile import compile_info
from utils import StaticTuple

@export(ABI="C")
fn bounds_check(read l: List[UInt], var index: UInt) raises:
    """
    Checks that an index is in-bounds for a list, raising if it is not.
    """
    if index >= len(l):
        raise "Index too large"

def main():
    print(compile_info[bounds_check, emission_kind="asm"]())

and run it, I get the following:

bounds_check:
.Lbounds_check$local:
        .type   .Lbounds_check$local,@function
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        pushq   %r15
        .cfi_def_cfa_offset 24
        pushq   %r14
        .cfi_def_cfa_offset 32
        pushq   %r13
        .cfi_def_cfa_offset 40
        pushq   %r12
        .cfi_def_cfa_offset 48
        pushq   %rbx
        .cfi_def_cfa_offset 56
        subq    $24, %rsp
        .cfi_def_cfa_offset 80
        .cfi_offset %rbx, -56
        .cfi_offset %r12, -48
        .cfi_offset %r13, -40
        .cfi_offset %r14, -32
        .cfi_offset %r15, -24
        .cfi_offset %rbp, -16
        movq    8(%rdi), %r14
        movq    %r8, %rbx
        movq    %rdx, %r15
        movq    %rsi, %r12
        cmpq    %r14, %rsi
        jae     .LBB0_2
        jmp     .LBB0_19
.LBB0_2:
        movq    %rcx, 16(%rsp)
        movq    $0, (%rsp)
        movq    %rsp, %rdi
        xorl    %esi, %esi
        callq   KGEN_CompilerRT_GetStackTrace@PLT
        testq   %rax, %rax
        je      .LBB0_3
        movl    $1, %edi
        movq    %rax, %rsi
        movq    %rax, %rbp
        movq    %rbx, 8(%rsp)
        callq   KGEN_CompilerRT_AlignedAlloc@PLT
        movl    $8, %edi
        movl    $16, %esi
        movq    %rax, %rbx
        callq   KGEN_CompilerRT_AlignedAlloc@PLT
        movq    (%rsp), %rdi
        movq    $1, (%rax)
        movq    %rax, %r13
        movq    %rbx, 8(%rax)
        cmpq    $4, %rbp
        jg      .LBB0_7
        movzbl  (%rdi), %eax
        movb    %al, (%rbx)
        movzbl  -1(%rdi,%rbp), %eax
        movb    %al, -1(%rbx,%rbp)
        cmpq    $3, %rbp
        jl      .LBB0_17
        movzbl  1(%rdi), %eax
        movb    %al, 1(%rbx)
        movzbl  -2(%rdi,%rbp), %eax
        movb    %al, -2(%rbx,%rbp)
        jmp     .LBB0_17
.LBB0_3:
        movl    $8, %edi
        movl    $16, %esi
        callq   KGEN_CompilerRT_AlignedAlloc@PLT
        movq    $1, (%rax)
        movq    %rax, %r13
        movq    $0, 8(%rax)
        jmp     .LBB0_18
.LBB0_7:
        cmpq    $16, %rbp
        jg      .LBB0_11
        cmpq    $8, %rbp
        jl      .LBB0_10
        movq    (%rdi), %rax
        movq    %rax, (%rbx)
        movq    -8(%rdi,%rbp), %rax
        movq    %rax, -8(%rbx,%rbp)
        jmp     .LBB0_17
.LBB0_11:
        movabsq $9223372036854775776, %rax
        andq    %rbp, %rax
        je      .LBB0_14
        xorl    %ecx, %ecx
        .p2align        4
.LBB0_13:
        vmovups (%rdi,%rcx), %ymm0
        vmovups %ymm0, (%rbx,%rcx)
        addq    $32, %rcx
        cmpq    %rax, %rcx
        jb      .LBB0_13
.LBB0_14:
        testb   $31, %bpl
        je      .LBB0_17
        subq    %rax, %rbp
        .p2align        4
.LBB0_16:
        movzbl  (%rdi,%rax), %ecx
        decq    %rbp
        movb    %cl, (%rbx,%rax)
        incq    %rax
        testq   %rbp, %rbp
        jg      .LBB0_16
        jmp     .LBB0_17
.LBB0_10:
        movl    (%rdi), %eax
        movl    %eax, (%rbx)
        movl    -4(%rdi,%rbp), %eax
        movl    %eax, -4(%rbx,%rbp)
.LBB0_17:
        vzeroupper
        callq   free@PLT
        movq    8(%rsp), %rbx
.LBB0_18:
        movq    16(%rsp), %rcx
        movl    $15, %eax
        leaq    static_string_d0af4fb5fae76f45(%rip), %rdi
.LBB0_19:
        vmovdqa64       .LCPI0_0(%rip), %zmm1
        vpbroadcastq    %r13, %zmm0
        vpbroadcastq    %rax, %zmm2
        cmpq    %r14, %r12
        sbbl    %edx, %edx
        xorl    %esi, %esi
        cmpq    %r14, %r12
        kmovd   %edx, %k0
        cmovaeq %rdi, %rsi
        knotw   %k0, %k1
        movq    %rsi, (%r15)
        vpsrlvq %zmm1, %zmm0, %zmm0
        vpsrlvq %zmm1, %zmm2, %zmm1
        vpmovqb %zmm1, %xmm1
        vpmovqb %zmm0, %xmm0
        vpunpcklqdq     %xmm0, %xmm1, %xmm0
        vmovdqu8        %xmm0, %xmm0 {%k1} {z}
        vmovdqu %xmm0, (%rcx)
        setb    (%rbx)
        addq    $24, %rsp
        .cfi_def_cfa_offset 56
        popq    %rbx
        .cfi_def_cfa_offset 48
        popq    %r12
        .cfi_def_cfa_offset 40
        popq    %r13
        .cfi_def_cfa_offset 32
        popq    %r14
        .cfi_def_cfa_offset 24
        popq    %r15
        .cfi_def_cfa_offset 16
        popq    %rbp
        .cfi_def_cfa_offset 8
        vzeroupper
        retq

Is there some kind of work happening around `raises` that would cause a massive performance regression like this since the last time Compiler Explorer updated their Mojo nightly? I don't have the ability to look at how `raises` is actually lowered, so I was planning to work upwards from the assembly and/or LLVM IR.

Contributor Author

I'm not sure if having code here is a great idea given this issue. As-is, what I've said in the docs is apparently untrue, and `raises` is actually quite expensive. Compiler Explorer seems to have an out-of-date version of Mojo where this was cheaper.

Collaborator

I agree about not having code like this here. I don't think it would be helpful in the context of a basic section explaining how to raise and handle errors.

If we do end up including a note like this, it might be more helpful to include pseudocode showing the rough equivalent (in Mojo syntax).

For now, it seems like there's an open question about the performance of raising in Mojo and whether there's been a recent regression, which we should follow up on before updating the docs. @owenhilyard did you open a ticket about this?

Contributor

@NathanSWard commented Oct 6, 2025

it might be more helpful to include pseudocode showing the rough equivalent (in Mojo syntax).

Yep, this is what I was referring to: not actual assembly or LLVM codegen, but Mojo pseudocode.

Contributor Author

@arthurevans Chris seemed to be well aware of the issue so I had assumed it was already tracked internally. I can open an issue if you aren't aware of one which already exists.

I was planning to have some roughly equivalent Mojo code, but wanted to verify it had a vaguely equivalent cost, which is why I was looking at assembly.

Contributor

Is there some kind of work happening around `raises` that would cause a massive performance regression like this since the last time Compiler Explorer updated their Mojo nightly?

@owenhilyard do you have a benchmark to share so that I can measure the performance regression you see?

Details: the change in asm comes from the roughly two-month-old commit 6d87e52, which added the ability to get a stack trace from a raised Error. What I don't understand is why the nightly on Compiler Explorer doesn't show this.
That functionality, however, is disabled by default due to the massive perf drop it can cause: KGEN_CompilerRT_GetStackTrace internally checks whether the MOJO_ENABLE_STACK_TRACE_ON_ERROR env var is set. If it's not set, nothing is collected and none of the later memcpy work is done.
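As a rough sketch of that gating (this is not the actual KGEN_CompilerRT_GetStackTrace implementation, whose source isn't shown here; it just restates the check described above, assuming `os.getenv` returns an empty string for an unset variable):

from os import getenv

fn stack_trace_collection_enabled() -> Bool:
    # Collection is gated on an environment variable, so by default the raise
    # path does no trace gathering at all.
    return getenv("MOJO_ENABLE_STACK_TRACE_ON_ERROR") != ""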

The benchmark and data I used:

from time import perf_counter_ns

@no_inline
def throw_exception():
    raise Error("dummy")

def main():
    alias iters = 1000
    var total_time: Int = 0
    var start_time = perf_counter_ns()
    for _ in range(iters):
        try:
            throw_exception()
        except e:
            _ = e
            pass
    total_time += perf_counter_ns() - start_time
    print(total_time)
Built with:

mojo build test.mojo -o test.out

Time reported by hyperfine:

  Before 6d87e52:                                   20.6 ms +- 1.0 ms
  6d87e52 + no MOJO_ENABLE_STACK_TRACE_ON_ERROR:    20.4 ms +- 0.8 ms
  6d87e52 + MOJO_ENABLE_STACK_TRACE_ON_ERROR:       47.02 s +- 0.1 ms

That is, I don't see a perf change before and after the 6d87e52 commit with no MOJO_ENABLE_STACK_TRACE_ON_ERROR env var set. However, if that variable IS set, perf drops by more than 2000x.

Contributor Author

The regression I was talking about was in the generated code. That it's stack-trace-handling code makes sense, but that's a lot of code to have inlined, and it undermines the "it's syntactic sugar for returning a boolean alongside the actual return value" argument I was trying to make, since it now does quite a bit more. It looks like Compiler Explorer has since updated and I forgot to permalink, so the old outputs are gone, but previously it was about a dozen instructions.

Even in the new generated code, there's still a bunch of vector work in what should be a fully scalar happy path (notice that the single retq in the function is at the end, with no labels to jump past the vector code). Vector ops on Zen 4 aren't as bad as on Skylake-X, but AVX-512 still isn't cheap. This means that even assuming the branch predictor works, you still have a fairly substantial body of vector ops, which makes using this in kernel code not a great idea, since you can't raise for "free" while out-of-order execution takes care of other vector ops.

What would probably work as a quick fix is to have a way to disable stack trace gathering statically. Once we have parametric raises, that can be opt-in behavior based on your error type.
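Purely as a hypothetical illustration (this is not an existing compiler flag, nor the planned parametric-raises design), the "statically opt-in" idea could be sketched with Mojo's existing compile-time parameters:

# Hypothetical only: a compile-time parameter decides at compile time whether the
# raise path includes trace-gathering work; the default keeps it minimal.
fn risky_op[collect_trace: Bool = False](index: UInt, size: UInt) raises:
    if index >= size:
        @parameter
        if collect_trace:
            # Only compiled in when requested, e.g. risky_op[collect_trace=True](i, n).
            # Imagine the trace-capturing error construction living here.
            raise Error("Index too large (trace requested)")
        else:
            raise Error("Index too large")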

@owenhilyard
Contributor Author

Given that `raises` is actually quite expensive right now, we should probably pause this until parametric raises makes raising cheap.

@lattner
Collaborator

lattner commented Oct 11, 2025

I'm sorry, I am just getting caught up on this now. Indeed, the introduction of the stack trace stuff was a major regression, and unnecessarily so. This is deeply unfortunate; I'd love help improving this. I filed some notes here: #5423

@dgurchenkov
Contributor

Let's put this PR on hold for a moment. The raising code indeed has some low-hanging-fruit improvements; we should do those first, then take another look at the generated code and decide how to explain raising in the doc.

@owenhilyard
Contributor Author

I agree, @dgurchenkov. I want it to actually be a zero-overhead solution before we claim it is one. If that means waiting for parametric raises, so be it.
