Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings
Discussion options

Hi,

I recently integrated rapidfuzz-cpp with one of the projects I am working on. We enabled many compiler warning flags. When compiled with rapidfuzz-cpp, I could see many warnings originated from the internals of rapidfuzz-cpp. Do you have any plans to make the project warning-free with some commonly used compiler warning flags enabled? If so, I'd love to help.

You must be logged in to vote

Replies: 7 comments · 24 replies

Comment options

Yes I would love to have the project warning free. It would be good to:

  1. fix these warnings
  2. enable some sort of Werror behavior for the CI tests, so code with warnings causes the CI to fail.
You must be logged in to vote
1 reply
@yaoyuan0553
Comment options

Sure thing. I'll go ahead and get started.

Comment options

I'm using Jason Turner's C++ best practices (https://github.com/cpp-best-practices/project_options) to enable as many warning flags as possible. This also spares us the trouble to hand-crank cross-compiler warning flags. I'm fixing warnings in this order:

  1. MSVC
  2. Clang
  3. GCC

What do you think?

You must be logged in to vote
4 replies
@maxbachmann
Comment options

Sounds good to me 👍

@yaoyuan0553
Comment options

A lot of the indices seems to be stored as int64_t regardless of whether being compiled against 32-bit or 64-bit programs. They cause warnings when used as indices for vectors and arrays. There are two options:

  1. The easy one - simply cast all of them to std::size_t when they are used to index.
  2. The hard one - change all indices from either uint64_t or int64_t to std::size_t, which will influence function parameter types, local variables and class member variables (from what I have seen so far).

The more difficult option, however, will be compliant with the C++ STL and common design principles, where 32-bit programs, being limited to 4 GB of virtual memory space (even less to be used) will probably never run into integer overflows using int32_t and uint32_t as indexing types, unless special needs arise, which can be generally resolved using underlaying OS features rather than language level types, such as the Address Windowing Extensions.

Which one do you prefer? Or perhaps you have other ideas.

@maxbachmann
Comment options

I am not 100% sure on the type that should be used for these indices. As you noted the C++ standard library uses unsigned types (size_t). However they recommend the usage of signed types as long as the modulo arithmetic is not required: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#es102-use-signed-types-for-arithmetic and stated on multiple occasions, that it was a mistake to use size_t for indices in the standard library.

I lean towards:

using index = std::ptrdiff_t;

which would be signed and 32/64 bit depending on the platform.

@yaoyuan0553
Comment options

There are cases where pointer/iterator arithmetic is performed and I used std::ptrdiff_t, and then STL containers require std::size_t for indexing. I thought about keeping both, but the non-uniformity really puts some mental burden when writing code and may also cause unexpected bugs as stated in the CppCoreGuidelines as well as this StackOverflow answer. I think you are right, I'll go with std::ptrdiff_t and see where that takes us.

Comment options

It seems we have to use unsigned integers for STL container indexing after all. Both clang and gcc warn about the potential sign change even though in most cases it is simply impossible.

You must be logged in to vote
7 replies
@yaoyuan0553
Comment options

Gotcha, the number of casts needed to get rid of the warnings is getting out of hand. I am seeking other more sane options, hence the question. Do you have other ideas?

@maxbachmann
Comment options

As far as I am aware the implementation mostly uses std::vector and std::array. We could get away with writing a wrapper for std::vector/std::array, which uses std::ptrdiff_t and performs the cast inside the wrapper implementation. This way we do not need these casts inside the implementation. Since those classes only have a limited set of functions which use std::size_t (many use iterators which already use std::ptrdiff_t), this shouldn't require a ton of effort.

@yaoyuan0553
Comment options

Hmm... I am not sure if writing a fully compatible wrapper for std::vector and std::array would be suitable in our scenario. These wrappers would require composing extensive tests to prevent introducing new bugs.

In seeking precedence on resolving this issue, I went through some of the widely used open source libraries. It seems C++ developers are okay with the idea of using std::size_t or unsigned integers for indexing STL containers. For example,

  1. Boost.Asio

https://github.com/chriskohlhoff/asio/blob/fc901d79c11c72a0ba81764820d3c6a788c0f075/asio/include/asio/ip/basic_resolver_iterator.hpp#L178

  1. dlib

https://github.com/davisking/dlib/blob/8d4df7c0b3fa7c4c1e4175951161b01ccf4541b5/examples/bayes_net_gui_ex.cpp#L788

  1. json

https://github.com/nlohmann/json/blob/develop/include/nlohmann/json.hpp#L2046

The list goes on. I also searched for std::ptrdiff_t, which apparently appeared much less frequently than its std::size_t counterpart.
My point is, I suggest that we take a step back and just use std::size_t whenever it's sensible, instead of always enforcing the use of std::ptrdiff_t and inducing more casts than intended.

@maxbachmann
Comment options

These wrappers would require composing extensive tests to prevent introducing new bugs.

90% of the wrapper would be the same as the Editops / Opcodes class.

My point is, I suggest that we take a step back and just use std::size_t whenever it's sensible, instead of always enforcing the use of std::ptrdiff_t and inducing more casts than intended.

I am completely fine with using std::size_t when it's more sensible.

@yaoyuan0553
Comment options

I am completely fine with using std::size_t when it's more sensible.

I see. I will start making some changes and see if it is still necessary to make wrappers.

Comment options

Am I correct to assume that when handling iterators provided from users of the library, std::distance(first, last) should always yield positive results? Do we invoke "undefined behavior" like the STL if the relationship between these iterators does not meet the requirements?

You must be logged in to vote
1 reply
@maxbachmann
Comment options

Yes std::distance(first, last) should always return positive results.

Comment options

Now that this builds on all compilers, I merged the changes into main. Thanks for the work you put into this 👍

You must be logged in to vote
1 reply
@yaoyuan0553
Comment options

Glad I could help :D

Comment options

I added the g++ CI to reflect our changes in cmake.yml. However, GCC spit out several conversion warnings as errors during compilation. Apparently, GitHub Actions uses GNU 9.4.0, which does warn about the conversion from int to uint8_t.

https://github.com/yaoyuan0553/rapidfuzz-cpp/runs/6159911017?check_suite_focus=true

I tested with GCC 10 and GCC 11, they do not warn about this conversion, neither does Clang. What's your strategy on the support of GCC version? Should we care about it?

You must be logged in to vote
9 replies
@maxbachmann
Comment options

Ah looking at this, this does not appear to be a bug in gcc. It is a legitimate concern, that std::distance might return a value below 0. The code currently does not support iterators where this is the case. However gcc can not know this and therefore warns. We can however make it aware of this assumption, which fixes the warning: https://godbolt.org/z/E1ovqqf8d

@yaoyuan0553
Comment options

Thanks. The compiler extensions of providing preconditions remind me of the aborted C++ 20 feature, Contracts.

I see what you are saying here. By casting the signed result returned from std::distance to an unsigned integer, we already guaranteed that cache_size is almost always greater than 0, except for the case where len1 == std::numeric_limits<std::size_t>::max() and len1 + 1 overflows making cache_size 0. And this case can never happen unless std::distance returns -1. This inference, however, cannot be realized by GCC 11.0 to 11.3 (strangely, other versions work fine), as shown here: https://godbolt.org/z/MYvcjz3nG. So I recommend that we be more blunt about this, by promising to the compiler that cache_size != 0.

@yaoyuan0553
Comment options

Would you like me to incorporate your code in the next pull request along with the GCC CI change?

@maxbachmann
Comment options

This inference, however, cannot be realized by GCC 11.0 to 11.3 (strangely, other versions work fine)

Good catch. Indeed it appears this works correctly in Trunk again. Guess this is really a bug then :)

So I recommend that we be more blunt about this, by promising to the compiler that cache_size != 0.

Yes making the inference simpler appears to work.

Would you like me to incorporate your code in the next pull request along with the GCC CI change?

I did already incorporate this change myself.

@yaoyuan0553
Comment options

I didn't realize it until now. The GCC warning suppression change I made actually introduced a bug when compiled with non-GCC compilers. the line cache[0] = 0 would have been omitted. Fortunately it was fixed by your latest patch to add preconditions.

Comment options

GCC doesn't support fuzzer as a sanitizer. Compiling with -fsanitizer=fuzzer fails. For the lack of deeper understanding of the fuzzer tests, I have an oversimplified solution, which is to simply disable fuzzer -DRAPIDFUZZ_BUILD_FUZZERS=0 for the GCC. What do you think about it?

You must be logged in to vote
1 reply
@maxbachmann
Comment options

This is expected. The fuzzing currently only works with clang. By default they are disabled, so you can just leave out the argument similar to the Windows target.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
💡
Ideas
Labels
None yet
2 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.