
This is not really an issue per se, but I'm trying to gain some insight into why the ArrayBase serialization format is the way it is.

With this example 2-D array:

```rust
array![[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]]
```

the serialization output to JSON looks like this:

```json
{"v":1,"dim":[3,3],"data":[0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0]}
```

What I wonder is why the more human-readable (and, depending on the data, smaller) format generated by the alternate ser/de functions defined in serde_ndim is not used instead:

```json
[[0.0,1.0,2.0],[3.0,4.0,5.0],[6.0,7.0,8.0]]
```

In fact, it's exactly how a user would write the array syntactically in code.

I can speculate about a few reasons, but I would like insight from the ndarray contributors. Is it purely a performance consideration? I understand that arrays are all 1-D in memory, and that decoding the shape from nested sequences must take some extra processing time. Are there other considerations that I'm missing?

I'm also looking for some insight into why the version field `v` exists.

Thanks! :)


Replies: 1 comment


I think I can answer most of this, although I didn't write the serialization code.

On the JSON format: I think this is because our serialization implementation is generic over the Serializer, as most serde implementations are. We don't provide code specific to JSON; we just tell serde how any Serializer should interpret our data: as a flat sequence, plus some "metadata" about the shape (and a version number). serde_json is then responsible for turning that into a JSON representation specifically.
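A rough illustration of that idea follows; this is not ndarray's actual code, just a hedged sketch of a format-agnostic Serialize impl for a hypothetical flat 2-D container:

```rust
use serde::ser::{Serialize, SerializeStruct, Serializer};

// Hypothetical flat 2-D container: shape plus row-major elements.
struct Flat2D {
    dim: (usize, usize),
    data: Vec<f64>,
}

impl Serialize for Flat2D {
    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        // Describe the data to serde in format-agnostic terms; serde_json,
        // bincode, etc. each decide how to render the resulting struct.
        let mut s = serializer.serialize_struct("Flat2D", 3)?;
        s.serialize_field("v", &1u8)?;
        s.serialize_field("dim", &[self.dim.0, self.dim.1])?;
        s.serialize_field("data", &self.data)?;
        s.end()
    }
}
```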

Yes, I think the processing time of "discovering" the shape on deserialization could be expensive. In particular, think of it this way: if we know the shape of the data up front, we can grab a block of uninitialized memory of exactly the size we need and then iterate once through the data itself to fill it in. With nested sequences, the shape only becomes apparent as we walk the structure, so we can't size the allocation ahead of time.
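As a toy illustration of that point (my own sketch, not ndarray's deserializer), knowing `dim` up front lets you reserve the whole buffer once and fill it in a single pass:

```rust
// Sketch: with the shape known, allocate exactly once and copy in one pass.
fn fill_known_shape(dim: (usize, usize), elements: impl Iterator<Item = f64>) -> Vec<f64> {
    let mut buf = Vec::with_capacity(dim.0 * dim.1);
    buf.extend(elements.take(dim.0 * dim.1));
    buf
}
```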

I think the version field is a sort of humble admission that we're not going to write "the one serializer to rule them all" on the first try. You can imagine a whole slew of optimizations and enhancements: recording whether the data is (approximately) C- or F-order, packing arrays of bool into a bit-per-element representation, etc. Each of these changes may need a different "version" of the serialization format.
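To make that concrete, here is a hypothetical sketch (again, not ndarray's code; version 2 and its layout are invented) of how a deserializer could dispatch on `v` to keep accepting older payloads after the format evolves:

```rust
// Hypothetical version dispatch on the "v" field of a decoded payload.
fn decode_data(v: u32, data: serde_json::Value) -> Result<Vec<f64>, String> {
    match v {
        1 => serde_json::from_value(data).map_err(|e| e.to_string()), // flat list of f64
        // 2 => unpack a bit-packed bool payload, honour an F-order flag, ...
        other => Err(format!("unsupported serialization version {other}")),
    }
}
```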

Lmk if that answers your questions!


This discussion was converted from issue #1518 on November 07, 2025 02:31.
