Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Add MessagePackPrimitives class #1986

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Nov 7, 2024
Merged

Add MessagePackPrimitives class #1986

merged 9 commits into from
Nov 7, 2024

Conversation

AArnott
Copy link
Collaborator

@AArnott AArnott commented Oct 7, 2024

Closes #1985

And use it from MessagePackWriter and MessagePackReader.

  • Measure perf impact

Trivia:

16 files changed, 3301 insertions(+), 1139 deletions(-)

Perf comparisons

All of these are on .NET 8

Deserialize

Before

Method Mean Error Ratio Gen0 Gen1 Allocated Alloc Ratio
IntKey 108.3 ns NA 1.00 0.0067 - 56 B 1.00
IntKey_NonGeneric 114.7 ns NA 1.06 0.0067 - 56 B 1.00
StringKey 228.2 ns NA 2.11 0.0067 - 56 B 1.00
Typeless_IntKey 302.7 ns NA 2.79 0.0067 - 56 B 1.00
Typeless_StringKey 446.1 ns NA 4.12 0.0067 - 56 B 1.00
MsgPackCliMap 705.0 ns NA 6.51 0.0696 - 584 B 10.43
MsgPackCliArray 236.5 ns NA 2.18 0.0181 - 152 B 2.71
ProtobufNet 208.8 ns NA 1.93 0.0172 - 144 B 2.57
Hyperion 309.7 ns NA 2.86 0.0553 - 464 B 8.29
JsonNetString 1,665.6 ns NA 15.38 0.3147 - 2640 B 47.14
JsonNetStreamReader 2,016.4 ns NA 18.62 0.6828 0.0076 5728 B 102.29
JilString 426.1 ns NA 3.93 0.0067 - 56 B 1.00
JilStreamReader 737.9 ns NA 6.81 0.4063 0.0038 3400 B 60.71

After

Method Mean Error Ratio Gen0 Gen1 Allocated Alloc Ratio
IntKey 119.7 ns NA 1.00 0.0067 - 56 B 1.00
IntKey_NonGeneric 135.2 ns NA 1.13 0.0067 - 56 B 1.00
StringKey 299.1 ns NA 2.50 0.0067 - 56 B 1.00
Typeless_IntKey 315.7 ns NA 2.64 0.0067 - 56 B 1.00
Typeless_StringKey 489.7 ns NA 4.09 0.0067 - 56 B 1.00
MsgPackCliMap 689.5 ns NA 5.76 0.0696 - 584 B 10.43
MsgPackCliArray 245.7 ns NA 2.05 0.0181 - 152 B 2.71
ProtobufNet 206.8 ns NA 1.73 0.0172 - 144 B 2.57
Hyperion 306.5 ns NA 2.56 0.0553 - 464 B 8.29
JsonNetString 1,765.8 ns NA 14.75 0.3147 - 2640 B 47.14
JsonNetStreamReader 1,917.2 ns NA 16.01 0.6828 0.0076 5728 B 102.29
JilString 425.3 ns NA 3.55 0.0067 - 56 B 1.00
JilStreamReader 763.7 ns NA 6.38 0.4063 0.0038 3400 B 60.71

Serialize

Before

Method Mean Error Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
IntKey 97.95 ns NA 1.00 0.00 0.0048 - 40 B 1.00
IntKey_NonGeneric 102.41 ns NA 1.05 0.00 0.0048 - 40 B 1.00
StringKey 134.65 ns NA 1.37 0.00 0.0172 - 144 B 3.60
Typeless_IntKey 286.87 ns NA 2.93 0.00 0.0191 - 160 B 4.00
Typeless_StringKey 326.14 ns NA 3.33 0.00 0.0324 - 272 B 6.80
MsgPackCliMap 532.29 ns NA 5.43 0.00 0.0629 - 528 B 13.20
MsgPackCliArray 191.66 ns NA 1.96 0.00 0.0477 - 400 B 10.00
ProtobufNet 154.07 ns NA 1.57 0.00 0.0076 - 64 B 1.60
Hyperion 164.94 ns NA 1.68 0.00 0.0782 - 656 B 16.40
ZeroFormatter NA NA ? ? NA NA NA ?
JsonNetString 779.63 ns NA 7.96 0.00 0.2260 - 1896 B 47.40
JsonNetStreamWriter 879.99 ns NA 8.98 0.00 0.3929 0.0019 3288 B 82.20
JilString 289.99 ns NA 2.96 0.00 0.1740 - 1456 B 36.40
JilStreamWriter 355.89 ns NA 3.63 0.00 0.3395 0.0029 2840 B 71.00

After

Method Mean Error Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
IntKey 147.5 ns NA 1.00 0.00 0.0048 - 40 B 1.00
IntKey_NonGeneric 153.9 ns NA 1.04 0.00 0.0048 - 40 B 1.00
StringKey 173.3 ns NA 1.17 0.00 0.0172 - 144 B 3.60
Typeless_IntKey 331.0 ns NA 2.24 0.00 0.0191 - 160 B 4.00
Typeless_StringKey 364.9 ns NA 2.47 0.00 0.0324 - 272 B 6.80
MsgPackCliMap 532.7 ns NA 3.61 0.00 0.0629 - 528 B 13.20
MsgPackCliArray 178.8 ns NA 1.21 0.00 0.0477 - 400 B 10.00
ProtobufNet 152.1 ns NA 1.03 0.00 0.0076 - 64 B 1.60
Hyperion 191.4 ns NA 1.30 0.00 0.0782 - 656 B 16.40
ZeroFormatter NA NA ? ? NA NA NA ?
JsonNetString 790.5 ns NA 5.36 0.00 0.2260 - 1896 B 47.40
JsonNetStreamWriter 956.0 ns NA 6.48 0.00 0.3929 0.0019 3288 B 82.20
JilString 295.2 ns NA 2.00 0.00 0.1740 - 1456 B 36.40
JilStreamWriter 366.1 ns NA 2.48 0.00 0.3395 0.0029 2840 B 71.00

MessagePackReader

Before

Method Categories Mean Error StdDev
ReadByte10 1.0 1.085 ns 0.0093 ns 0.0087 ns
ReadByte20 2.0 3.396 ns 0.0212 ns 0.0198 ns

After

Method Categories Mean Error StdDev Ratio RatioSD
ReadByte10 1.0 1.085 ns 0.0062 ns 0.0058 ns ? ?
ReadBytePrimitiveInt32 2.0 1.262 ns 0.0088 ns 0.0082 ns 0.49 0.00
ReadByte20 2.0 2.586 ns 0.0211 ns 0.0197 ns 1.00 0.01

MessagePackWriter

Before

Method Categories Mean Error StdDev Median
WriteByte 1.x 1.469 ns 0.0117 ns 0.0104 ns 1.467 ns
WriteInt32 1.x 1.701 ns 0.0090 ns 0.0080 ns 1.700 ns
WriteString 1.x 12.795 ns 0.1105 ns 0.1034 ns 12.823 ns
Write_Byte 2.0 1.712 ns 0.0158 ns 0.0140 ns 1.714 ns
Write_UInt32 2.0 1.932 ns 0.0172 ns 0.0153 ns 1.927 ns
Write_Int32 2.0 1.940 ns 0.0145 ns 0.0161 ns 1.943 ns
Write_String 2.0 21.114 ns 0.9287 ns 2.7384 ns 22.902 ns

After

Method Categories Mean Error StdDev Median
WriteByte 1.x 1.488 ns 0.0117 ns 0.0104 ns 1.488 ns
WriteInt32 1.x 1.705 ns 0.0109 ns 0.0102 ns 1.705 ns
WriteString 1.x 12.735 ns 0.0723 ns 0.0604 ns 12.722 ns
Write_Byte 2.0 1.944 ns 0.0153 ns 0.0119 ns 1.943 ns
Write_UInt32 2.0 2.421 ns 0.0194 ns 0.0181 ns 2.427 ns
Write_Int32 2.0 2.674 ns 0.0197 ns 0.0175 ns 2.678 ns
Write_String 2.0 21.789 ns 1.0808 ns 3.1867 ns 23.365 ns

@AArnott AArnott added this to the v3.0 milestone Oct 7, 2024
@AArnott AArnott requested a review from neuecc October 7, 2024 19:20
@AArnott AArnott force-pushed the MessagePackPrimitives branch from 8a63bce to 97a95c3 Compare October 7, 2024 19:59
@AArnott
Copy link
Collaborator Author

AArnott commented Oct 7, 2024

The perf is significantly slower after this change. I haven't studied the ETL trace, but my guess is we won't find an efficient way to call from the reader/writer structs into the MessagePackPrimitives class. If we bring back this class from v1, it may be without any use case within the library for it, which means we'll have to add tests specifically for it.

@AArnott
Copy link
Collaborator Author

AArnott commented Oct 7, 2024

@neuecc Do you have any time and interest in mitigating the perf regressions here? Should we abandon the PR?

@AArnott
Copy link
Collaborator Author

AArnott commented Oct 7, 2024

The MessagePackWriter shouldn't be fundamentally any slower after this change. The fact that it is slower suggests something we can probably fix (e.g. bring back AggressiveInline attributes that I may have dropped).

It's the MessagePackReader side that I'm more concerned about inescapable perf hits. But I'm not sure.

@AArnott AArnott mentioned this pull request Oct 7, 2024
@neuecc
Copy link
Member

neuecc commented Oct 8, 2024

Thanks, I'll check now.

@neuecc
Copy link
Member

neuecc commented Oct 8, 2024

I added ReadBytePrimitiveInt32 and got this result:

[Benchmark(OperationsPerInvoke = BufferLength)]
[BenchmarkCategory("2.0")]
public void ReadBytePrimitiveInt32()
{
    int offset = 0;
    for (int i = 0; i < this.buffer.Length; i++)
    {
        newmsgpack::MessagePack.MessagePackPrimitives.TryReadInt32(this.buffer.AsSpan(offset), out int value, out var readSize);
        offset += readSize;
    }
}

image

I think the overhead simply doubled.
After trying a few things, it seemed that the size of the method itself was having an impact.
Just by moving the FixedInt code to the true branch of an if statement and placing/not placing a switch statement on the else side, the benchmark results changed significantly.

The contents of v1 pushed the branching into array and virtual method calls, making the method body very small.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int ReadInt32(byte[] bytes, int offset, out int readSize)
{
    return int32Decoders[bytes[offset]].Read(bytes, offset, out readSize);
}

I tried various approaches, but it seems difficult to achieve performance equivalent to this...

Note that for the benchmark measurements, I was using WarmupCount 1 and IterationCount 1 to speed up the execution iterations.

class BenchmarkConfig : ManualConfig
{
    public BenchmarkConfig()
    {
        AddJob(Job.ShortRun.WithWarmupCount(1).WithIterationCount(1));
    }
}

[Config(typeof(BenchmarkConfig))]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
[CategoriesColumn]
public class MessagePackReaderBenchmark

By the way, the excessive overloading specification was difficult to use.
I don't think it's a good design to separate methods based on the type of the out parameter.
It should be done like TryReadInt32 so that it can be received with var.
This would also maintain symmetry with MessagePackReader.

@AArnott
Copy link
Collaborator Author

AArnott commented Nov 3, 2024

@neuecc Excellent perf hints. Thank you. I have a prototype with these numbers:

Method Categories Mean Error StdDev Ratio RatioSD
ReadByte10 1.0 1.094 ns 0.0062 ns 0.0058 ns ? ?
ReadBytePrimitiveInt32 2.0 1.193 ns 0.0069 ns 0.0061 ns 0.50 0.00
ReadByte20 2.0 2.395 ns 0.0126 ns 0.0111 ns 1.00 0.00

That's better than v2's original perf before my change. Yay.

| Method                 | Categories | Mean     | Error     | StdDev    | Ratio | RatioSD |
|----------------------- |----------- |---------:|----------:|----------:|------:|--------:|
| ReadByte10             | 1.0        | 1.085 ns | 0.0053 ns | 0.0047 ns |     ? |       ? |
|                        |            |          |           |           |       |         |
| ReadBytePrimitiveInt32 | 2.0        | 1.243 ns | 0.0025 ns | 0.0019 ns |  0.48 |    0.00 |
| ReadByte20             | 2.0        | 2.582 ns | 0.0072 ns | 0.0060 ns |  1.00 |    0.00 |
@AArnott
Copy link
Collaborator Author

AArnott commented Nov 3, 2024

I've updated the benchmark tables in the PR description. Read speed is awesome. Write speed is regressed. Based on what I've learned in speeding up read speeds, I think I know why.

Copy link
Collaborator Author

@AArnott AArnott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to go at this point. Some of the higher level perf tables suggest regressions in 'after', but you'll note that the whole table is regressed including irrelevant things to the changes such as Hyperion, so if you take that into account and adjust, it looks about right. And the microbenchmarks look either better or close to original.

@AArnott AArnott modified the milestones: v3.1, v3.0 Nov 3, 2024
This avoids a name conflict with the result of `PipeReader.ReadAsync`
Copy link
Member

@neuecc neuecc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!
I was thinking of not doing this rewrite since it would be a hassle, so I'm happy to see it implemented.
If this is going in, then there's probably no need to rush to include #2054 which has some incomplete behavior (although Span support should be added eventually)

@AArnott AArnott merged commit 44f79c1 into develop Nov 7, 2024
6 checks passed
@AArnott AArnott deleted the MessagePackPrimitives branch November 7, 2024 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.