Welcome to python-semver Discussions! #329

Dec 9, 2020

tomschr
Dec 9, 2020
Maintainer

👋 Welcome!

We’re using Discussions as a place to connect with other members of our community. We hope that you:

Ask questions you’re wondering about.
Share ideas.
Engage with other community members.
Welcome others and are open-minded. Remember that this is a community we build together 💪.

To get started, comment below with an introduction of yourself and tell us about what you do with this community.

tomschr · Nov 1, 2023

chtaylo3
Nov 1, 2023

Hi @tomschr , I was wondering if you'd be willing to accept/review a PR for an "extract" class method that takes a string as input and returns a generator (or list) for semver objects? We have some package and vulnerability metadata that we need to parse, so that's the use case. I've been experimenting with antlr4 and this grammar as starting point - I think I can put something together that's worth looking at. Provided I can get this up to standard, this could also lay a foundation to help address #241. Please let me know what you think.

7 replies

tomschr Nov 1, 2023
Maintainer Author

Thanks you for your detailed answer and helping me to understand it better! Much appreciated. 👍

I think I get slowly an idea about this topic. Maybe it's due to my cold from which I haven't yet fully recovered, but I'm not completely sure of your suggested change. Am I right that you would like to add this extract class method to the Version class and the result from the Antlr grammar output (the parser)?

That would be an interesting idea. Maybe you can answer my concerns:

Simplicity & dependencies
The semver library (and specifically the Version class) is quite simple. It doesn't have any dependencies and I would like to keep it that way. Maybe I got it wrong, but adding some dependencies of antlr (even if it's generated code by that) sounds a bit... too much to me?
Code generations
If I'm not mistaken, as antlr is a parser generator and create Python code from your grammar. That sounds quite awesome. But for such a small project it sounds a bit of an overkill and the code you get is probably not very pythonic. Sure, nobody wants to write a parser manually, but I fear this code would be bigger than the whole class itself.
How much Python code lines do we talk about?
Semver specification
Apart from the previous concerns semver itself doesn't specifies any tags (be it "alpha", "snapshots" etc.) as literals. If you add them to your grammar, you give them a special meaning that the semver specification itself doesn't define. For semver it's just a "prerelease" string. Adding those to the class sounds a bit of an extension to semver.

I'm a big fan of making the project or the class as flexible and adaptable as possible to different use cases. One way would be, for example, to enable others to derive the class and to implement something that the base class cannot provide. This is how Pydantic does it and it is described in the documentation.

The other way could be to improve issue #241: I've played around a bit and created PR #367 some time ago. It's certainly not perfect, but it tries to implement the tilde and caret operators. It's not all fleshed out, as there are currently some design issues to be solve. But it's a start. What the PR currently lacks is support for prereleases. Maybe the base class could provide some way to parse the prerelease tags to support more fancy comparisions? Would that help your use case if that issue would be supported?

I'm not sure if this helps or answers your questions. Perhaps this current post touches more the philosophical aspect than the mere technical one. 🙂

Would like to get to know your point of view and how we can solve it together.

chtaylo3 Nov 2, 2023

Hope you feel better soon!

I agree with your statement about the philosophical aspects, but I think there may be a way to address both.

Q: Am I right that you would like to add this extract class method to the Version class and the result from the Antlr grammar output (the parser)?
A: Yes. It would also be possible to relegate this extract capability to an optional component of the package so that users still retain the ability to download just the smaller core capability.

Q: Concerns regarding Simplicity & dependencies
A: Yes, these are all accurate and valid concerns. There is a runtime dependency as well as the code generated by antlr. Making this an optional component of the package should address these concerns.

Q: Concerns regarding code generation
A: Yes, antlr or lex/yacc or whatever are parser generators that build a parser based on the grammar provided. In addition to the code generated on our behalf, there is an additional runtime dependency. One of the reasons why I gravitated towards antlr4 is this rewrite was done purposefully to make it easier for people that want to parse things other than a full grammar for a compiler (see the listener/visitor pattern) and the output is significantly more pythonic than v3 was. The lexer is about ~150-200 loc, the parser is about ~1 KLOC (it grows or shrinks depending on the number of rules in the parser grammar). The listener that we inherit from is about ~100 LOC of just methods that 'pass' so they're not functional lines of code. The inherited listener (where we write our code) will have ~2 or 3 decorators of about 50-200 LOC each which then need to be applied on the various methods. There is a dependency on the antlrv4 python runtime. I don't know how many LOC that is, but I can find out if you need. In terms of runtime, I would assume it's around the same number of LOC executed as a regex engine although the re module is not installed with the package because it's in stdlib. In summary, yes - it would noticeably increase the size, but I think we can mitigate that by having this functionality provided in some sort of optional install for the package. This might mean it doesn't go directly into the version class as a class method, but I think the tradeoff would be worth it.

Q: Concerns regarding semver specification.
A: I'm more than happy to revert that back to the spec and remove these additional tags. There's always a tradeoff between the formal spec and how people use it.

Q: Maybe the base class could provide some way to parse the prerelease tags to support more fancy comparisions? Would that help your use case if that issue would be supported?
A: Partially. We're getting comma separated semvers in a string which tells people what they should upgrade to. We could parse that by hand if needed but I thought a generalized extract method might be more useful to others. We're also getting names of libraries that we need to parse the version out from as well. For example, "my_lib_compat_1.2.3_4.5.6-rc5.7-3.8.23.jar", where 4.5.6-rc5.7-3.8.23 is the semver. My only concern is that this can get complicated very quickly. I'd be happy to think on that and see if I think I can submit a PR for this without using a parser generator.

My concern is that if you try and fully implement #241, you're going to run into problems trying to do this by hand compared to using an existing parser generator. That's why I was thinking about a parser for an extract capability because it could also lay the foundation to address both problems easily.

tomschr Nov 2, 2023
Maintainer Author

Thanks for your answers! 👍

Hope you feel better soon!

Thanks, still dealing with some coughing and I'm slow to respond. But mostly I'm fine. 🙂

I've tried to compile your examples with antlr4, but I wasn't successful:

$ antlr4  SemanticVersionParser.g4
warning(125): SemanticVersionParser.g4:5:5: implicit definition of token ALPHA in parser
warning(125): SemanticVersionParser.g4:6:5: implicit definition of token BETA in parser
warning(125): SemanticVersionParser.g4:7:5: implicit definition of token RC in parser

... similar warning messages pruned ...

There was no Python code created. Obviously, I did something wrong here. 😆

In summary, yes - it would noticeably increase the size, but I think we can mitigate that by having this functionality provided in some sort of optional install for the package.

Even if we make it "optional", we still have to build, test, document, and maintain it. If it's there, people will use it.

[... Spec...] I'm more than happy to revert that back to the spec and remove these additional tags. There's always a tradeoff between the formal spec and how people use it.

Well, specifications are there for a reason. This project tries to follow the semver specification as best as it can.

You can try to convince the semver people. Perhaps they will add it in one of their next version. However, I had the impression that they wanted to keep their specification lean and easy.

[...] but I thought a generalized extract method might be more useful to others.

Yes, maybe. I'm always driven by this simple question: What should be supported/integrated by semver (or more specifically, the Version class) and what not?

I think, the obvious core purpose is to do one task well: managing semver versions. This involves converting to and from strings, tuples, dicts, raising versions, and comparing versions to each other.

What is outside of those core tasks is nothing that semver will do. For example, reading versions from a file or connecting to a database to extract versions from there. This is something the user has to take care of.

What I'm trying to say is, if it follows its core purpose and is beneficial to a broader audience, then it's very likely it will get incorporated. If it's something that is only beneficial to parts of our users, it could be probably be integrated into the documentation.

The question is, would such an extract method be useful for others? How often does it occur and is needed? For the ordinary users who are only interested in comparing versions is probably not very interesting. You have a special use case and I assume, only a fraction of semver's users would need it. But I could be wrong.

My concern is that if you try and fully implement #241, you're going to run into problems trying to do this by hand compared to using an existing parser generator.

Oh, quite likely! I wrote so many wrong lines that I can't count them. 😆

So yes, that can happen. But for this we have the test suite to ensure code is heavily tested and works as advertised. Of course, it's not a 100% guarantee. But you don't have a guarantee either that your generated parser with ~1200 LOC doesn't have bugs. It's a heavy piece. 😉

semver's library has got the same ~1200 LOC (wheres the core with version.py is appr. half the size). Adding another ~1200 LOC just for the parsing the same that already exists sounds a bit, uhmn, overkill to me and a bad trade-off.

At the moment, the Version class uses a regex for its "parsing". It's simple, effective, and fast. I don't see yet the need to add an additional dedicated parser that is a) quite heavy, b) implements what's already there, and c) adds additional dependencies.

Sooo... sorry, ATM I'm not really convinced this would be something useful for this project. But I could be wrong, so let me think and digest it for some more days.

But if you have ideas how to improve #241 I'm more than happy!

chtaylo3 Nov 3, 2023

You need to do the lexer first so the parser has something to work from. you also probably want to pass along the target language and the version of the runtime you're compiling for. So for example

poetry add antlr4-python-runtime:4.13.1
poetry shell
antlr4 -v 4.13.1 -Dlanguage=Python3 SemanticVersionLexer.g4
antlr4 -v 4.13.1 -Dlanguage=Python3 SemanticVersionParser.g4
//whatever code editor

Sooo... sorry, ATM I'm not really convinced this would be something useful for this project.

No worries at all! One of the things we were hoping to get from this is a tokenized/parsed pre-release for comparison purposes so that we can later do some analysis on the release velocity and such.

But if you have ideas how to improve #241 I'm more than happy!

Unfortunately, I would use antlr4 to generate a parser from the formal grammar so that we don't have to reinvent things. I'm a bit lazy like that ;)

tomschr Nov 3, 2023
Maintainer Author

Thanks for your understanding. ❤️

I'm still curious about antlr and would try it out a bit, but for some reason this doesn't work as expected. I've tried it in a subdirectory (without pyproject.toml, so I had to use poetry init first). I'm stuck at the first step:

$ poetry add antlr4-python-runtime:4.13.1

Could not find a matching version of package antlr4-python-runtime

$ poetry add antlr4-python-runtime==4.13.1

Could not find a matching version of package antlr4-python-runtime

I normally work with pyenv and I've already installed the above lib (although with pip). If I skip the poetry step, I see these two .g4 files, but when I try to run the first one, I get this:

$ antlr4 -v 4.13.1 -Dlanguage=Python3 SemanticVersionLexer.g4
error(2):  unknown command-line option -v
error(7):  cannot find or open file: 4.13.1
error(50): SemanticVersionLexer.g4:35:9: syntax error: mismatched input ',' expecting COLON while matching a lexer rule
error(50): SemanticVersionLexer.g4:35:65: syntax error: mismatched input ')' expecting COLON while matching a lexer rule

Removing -v 4.13.1 doesn't help either. Any idea?

ideas how to improve #241
Unfortunately, I would use antlr4 to generate a parser from the formal grammar so that we don't have to reinvent things. I'm a bit lazy like that ;)

That's totally fine. I was more referring to the overall design than adding code. 🙂

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Welcome to python-semver Discussions! #329

{{title}}

Replies: 1 comment · 7 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Search code, repositories, users, issues, pull requests...

Welcome to python-semver Discussions! #329

tomschr Dec 9, 2020 Maintainer

👋 Welcome!

Replies: 1 comment · 7 replies

chtaylo3 Nov 1, 2023

tomschr Nov 1, 2023 Maintainer Author

chtaylo3 Nov 2, 2023

tomschr Nov 2, 2023 Maintainer Author

chtaylo3 Nov 3, 2023

tomschr Nov 3, 2023 Maintainer Author

tomschr
Dec 9, 2020
Maintainer

chtaylo3
Nov 1, 2023

tomschr Nov 1, 2023
Maintainer Author

tomschr Nov 2, 2023
Maintainer Author

tomschr Nov 3, 2023
Maintainer Author