Is something like sax/event-based parser planned? #2279

Answered by lemire
MBkkt asked this question in Q&A

For example, when converting a JSON string to something like binary JSON, an event-based parser is really better than On-Demand.

I expect other use cases exist as well, although for most use cases DOM or On-Demand is the better alternative.

An example:
https://github.com/ydb-platform/ydb/blob/74df9273b222dc253487508e6f4602237f4a7c11/ydb/library/binary_json/write.cpp#L553

> it's really better than On-Demand.

We designed On-Demand specifically as a replacement for event-based parsing.

We had, for a time, an event-based API but we removed it. It is not faster and I feel that for most people, it is harder to use.

If you find On-Demand difficult to use for some use cases, please raise specific issues and we will try to address them.

@MBkkt

With On-Demand, such a use case needs to be recursive :(

Is there any way to use On-Demand without recursion?

@lemire

@MBkkt

> With On-Demand, such a use case needs to be recursive :(

This code looks very good to me: https://github.com/ydb-platform/ydb/blob/74df9273b222dc253487508e6f4602237f4a7c11/ydb/library/binary_json/write.cpp#L553

❤️

We get pretty decent results with recursion and the code is relatively elegant. The one annoying caveat is that GCC and LLVM differ in how they handle recursion in practice, so a bit of care is needed if one wants to get good performance. But that's relatively minor.

Please see...

https://github.com/simdjson/simdjson/blob/master/benchmark/json2msgpack/simdjson_ondemand.h

Depending on your compiler, you might get better results when passing by value instead of by reference.

> Is there any way to use On-Demand without recursion?

You can manage the stack yourself if you prefer. Each stack entry is either an object and its iterator, or an array and its iterator.

But that's going to be annoying to implement.
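
To make the explicit-stack idea concrete, here is a minimal sketch that does not use simdjson at all. It walks a hypothetical in-memory `Node` tree (a stand-in for parsed JSON, invented for this example) with a `std::vector` of frames. Each frame pairs a container with its position, loosely mirroring "an object and its iterator, or an array and its iterator". The names `Node`, `make_number`, `make_array`, `make_object`, and `flatten` are all assumptions made for illustration; with On-Demand the frames would presumably hold the object or array and its iterator instead, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed JSON value; NOT a simdjson type.
struct Node {
  enum class Kind { Number, Array, Object };
  Kind kind = Kind::Number;
  long number = 0;                // used when kind == Number
  std::vector<std::string> keys;  // used when kind == Object, parallel to children
  std::vector<Node> children;     // used when kind == Array or Object
};

Node make_number(long v) {
  Node n;
  n.number = v;
  return n;
}

Node make_array(std::vector<Node> items) {
  Node n;
  n.kind = Node::Kind::Array;
  n.children = std::move(items);
  return n;
}

Node make_object(std::vector<std::string> keys, std::vector<Node> values) {
  Node n;
  n.kind = Node::Kind::Object;
  n.keys = std::move(keys);
  n.children = std::move(values);
  return n;
}

// Serializes the tree to compact text without recursion.
// Keys are written as-is (no escaping), which is enough for a sketch.
std::string flatten(const Node &root) {
  // One frame per open container: the container and the next child index.
  struct Frame {
    const Node *container;
    std::size_t next;
  };
  std::string out;
  std::vector<Frame> stack;

  // Emits a scalar directly, or opens a container and pushes a frame for it.
  auto visit = [&](const Node &n) {
    switch (n.kind) {
      case Node::Kind::Number: out += std::to_string(n.number); break;
      case Node::Kind::Array:  out += '['; stack.push_back({&n, 0}); break;
      case Node::Kind::Object: out += '{'; stack.push_back({&n, 0}); break;
    }
  };

  visit(root);
  while (!stack.empty()) {
    Frame &top = stack.back();
    const Node &c = *top.container;
    if (top.next == c.children.size()) {
      // Container exhausted: close it and return to the parent frame.
      out += (c.kind == Node::Kind::Array) ? ']' : '}';
      stack.pop_back();
      continue;
    }
    if (top.next > 0) { out += ','; }
    const std::size_t i = top.next++;
    if (c.kind == Node::Kind::Object) {
      assert(c.keys.size() == c.children.size());
      out += '"' + c.keys[i] + "\":";
    }
    visit(c.children[i]);  // may push a frame, so `top` must not be used after this
  }
  return out;
}
```

The stack depth equals the nesting depth of the document, so the memory cost is comparable to recursion, but the depth is under the caller's control and the frames could carry extra payload for the output format.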

However, in simdjson, we already have a JSON to binary conversion routine that runs very fast. It is our DOM builder. And it is done by an event-like system. Please see...

https://github.com/simdjson/simdjson/blob/master/src/generic/stage2/tape_builder.h

For example, we have...

simdjson_warn_unused simdjson_inline error_code tape_builder::visit_true_atom(json_iterator &iter, const uint8_t *value) noexcept {
  iter.log_value("true");
  if (!atomparsing::is_valid_true_atom(value)) { return T_ATOM_ERROR; }
  tape.append(0, internal::tape_type::TRUE_VALUE);
  return SUCCESS;
}

The key ingredient is the 'tape' which receives the values. Obviously, this would work with any "tape"...
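
As a rough illustration of "any tape", the sketch below parameterizes an event-style visitor on the tape type. It is not simdjson's internal API: `AtomVisitor`, `PairTape`, `ByteTape`, the `Error` enum, and the tag characters are all hypothetical names chosen for this example, and the atom check is a plain string comparison rather than simdjson's validation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

enum class Error { Success, TAtomError, FAtomError, NAtomError };

// Hypothetical tape: keeps (payload, tag) pairs, loosely like a DOM tape.
struct PairTape {
  std::vector<std::pair<std::uint64_t, char>> entries;
  void append(std::uint64_t payload, char tag) { entries.emplace_back(payload, tag); }
};

// Hypothetical tape: emits one byte per atom, as a toy binary format might.
struct ByteTape {
  std::string bytes;
  void append(std::uint64_t /*payload*/, char tag) { bytes.push_back(tag); }
};

// Event-style visitor that is generic over the tape it writes to.
// The only requirement on Tape is an append(payload, tag) member.
template <typename Tape>
class AtomVisitor {
public:
  explicit AtomVisitor(Tape &tape) : tape_(tape) {}

  Error visit_true_atom(std::string_view token) {
    return emit(token, "true", 't', Error::TAtomError);
  }
  Error visit_false_atom(std::string_view token) {
    return emit(token, "false", 'f', Error::FAtomError);
  }
  Error visit_null_atom(std::string_view token) {
    return emit(token, "null", 'n', Error::NAtomError);
  }

private:
  // Validates the token, then appends its tag to the tape on success.
  Error emit(std::string_view token, std::string_view expected, char tag, Error on_mismatch) {
    assert(!token.empty());  // the event source should never hand over an empty token
    if (token != expected) { return on_mismatch; }
    tape_.append(0, tag);
    return Error::Success;
  }

  Tape &tape_;
};
```

Because the tape is a template parameter, the same visitor can feed a DOM-like structure or a custom binary writer, and the compiler can inline the `append` calls.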

We would need to dig through the git history, but this used to be exposed in some way (although we never documented it).

I'd be concerned about opening this up to everyone because very few people should program this way. It is error prone.

However, if you are willing to help, we could make it usable by projects like ydb. It is not very hard because we already use it internally. It is mostly a matter of plumbing.

Let me know if this is something you'd like to help with!!!

Answer selected by MBkkt
@MBkkt

The tape builder looks really interesting. I will try to play with this code a little and benchmark it against the On-Demand approach, thanks.

I also realized that we already have a stack in the binary JSON builder, and it could carry an additional payload so that we use it instead of recursion.

@lemire

@MBkkt To be honest, I'd be interested in us having a routine that is suitable for writing very fast conversions from JSON to something else. If we focus on this one use case, I think we can write something very efficient and very nice.

@MBkkt

Sounds awesome. Maybe later I will try to make a type_builder.

@MBkkt

Of course, if someone does it before me, I will be happy :)
