
This is not really an issue per se, but I'm trying to gain some insight into why the ArrayBase serialization format is the way it is.

With this example 2-D array:

```rust
array![[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]]
```

the serialization output to JSON looks like this:

```json
{"v":1,"dim":[3,3],"data":[0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0]}
```

What I wonder is why the more human-readable (and, depending on the data, smaller) format generated by the alternate ser/de functions defined in serde_ndim is not used instead:

```json
[[0.0,1.0,2.0],[3.0,4.0,5.0],[6.0,7.0,8.0]]
```

In fact, it's exactly how a user would write the array syntactically in code.

I can speculate about a few reasons, but I would like insight from the ndarray contributors. Is it purely a performance consideration? I understand that arrays are all 1-D in memory, and that decoding the shape from nested sequences must take some extra processing time. Are there other considerations that I'm missing?

I'm also looking for some insight into why the version field `v` exists.

Thanks! :)


Replies: 1 comment


I think I can answer most of this, although I didn't write the serialization code.

On the JSON format: I think this is because our serialization implementation is generic over the Serializer, as most serde implementations are. We don't provide code specific to JSON; we just tell serde how any Serializer should interpret our data: as a flat sequence, plus some "metadata" about the shape (and a version number). serde_json is then responsible for turning that into a JSON representation specifically.
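A rough illustration of that idea follows; this is not ndarray's actual code, just a hedged sketch of a format-agnostic Serialize impl for a hypothetical flat 2-D container:

```rust
use serde::ser::{Serialize, SerializeStruct, Serializer};

// Hypothetical flat 2-D container: shape plus row-major elements.
struct Flat2D {
    dim: (usize, usize),
    data: Vec<f64>,
}

impl Serialize for Flat2D {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // Describe the data to serde in format-agnostic terms; serde_json,
        // bincode, etc. each decide how to render the resulting struct.
        let mut s = serializer.serialize_struct("Flat2D", 3)?;
        s.serialize_field("v", &1u8)?;
        s.serialize_field("dim", &[self.dim.0, self.dim.1])?;
        s.serialize_field("data", &self.data)?;
        s.end()
    }
}
```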

Yes, I think the processing time of "discovering" the shape on deserialization could be expensive. In particular, think of it this way: if we know the shape of the data up front, we can grab a block of uninitialized memory of exactly the size we need and then iterate once through the data itself to fill it in. With nested sequences, the shape only becomes apparent as we walk the structure, so we can't size the allocation ahead of time.
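As a toy illustration of that point (my own sketch, not ndarray's deserializer), knowing `dim` up front lets you reserve the whole buffer once and fill it in a single pass:

```rust
// Sketch: with the shape known, allocate exactly once and copy in one pass.
fn fill_known_shape(dim: (usize, usize), elements: impl Iterator<Item = f64>) -> Vec<f64> {
    let mut buf = Vec::with_capacity(dim.0 * dim.1);
    buf.extend(elements.take(dim.0 * dim.1));
    buf
}
```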

I think the version field is a sort of humble admission that we're not going to write "the one serializer to rule them all" on the first try. You can imagine a whole slew of optimizations and enhancements: recording whether the data is (approximately) C- or F-order, packing arrays of bool into a bit-per-element representation, etc. Each of these changes may need a different "version" of the serialization format.
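To make that concrete, here is a hypothetical sketch (again, not ndarray's code; version 2 and its layout are invented) of how a deserializer could dispatch on `v` to keep accepting older payloads after the format evolves:

```rust
// Hypothetical version dispatch on the "v" field of a decoded payload.
fn decode_data(v: u32, data: serde_json::Value) -> Result<Vec<f64>, String> {
    match v {
        1 => serde_json::from_value(data).map_err(|e| e.to_string()), // flat list of f64
        // 2 => unpack a bit-packed bool payload, honour an F-order flag, ...
        other => Err(format!("unsupported serialization version {other}")),
    }
}
```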

Lmk if that answers your questions!


This discussion was converted from issue #1518 on November 07, 2025 02:31.
