Resolving a subproject of a Python package via symlink #674

Unanswered
WillAyd asked this question in Q&A

I am experimenting with moving the arrow-nanoarrow project over from setuptools to Meson. I have a generally working configuration for local development, which you can see here:

apache/arrow-nanoarrow#644

The layout of the project is:

arrow-nanoarrow/
  meson.build
  src/
    nanoarrow/
  python/
    meson.build
    src/
      nanoarrow/

In the Python directory, I am symlinking the subprojects/arrow-nanoarrow directory back to the project root, so that I can use the dependencies it declares in the Python configuration:

nanoarrow_proj = subproject('arrow-nanoarrow')
nanoarrow_dep = nanoarrow_proj.get_variable('nanoarrow_dep')
nanoarrow_ipc_dep = nanoarrow_proj.get_variable('nanoarrow_ipc_dep')
nanoarrow_device_dep = nanoarrow_proj.get_variable('nanoarrow_device_dep')

This works when installing from source in the repository, but things break when I create a source distribution, because that symlink is not preserved in the sdist.

Is there a better way to handle this situation? Should I be using the wrap system as a fallback when the symlink cannot be found? Or is my approach totally off?


Replies: 3 comments · 18 replies

Comment

This is the first time I've seen this configuration, so I'm not sure this is the right answer, but here are a few thoughts:

  • This would work if the symlink were materialized as a full copy in the sdist.
    • However, since you're symlinking to a parent level, that would include the python/ sources twice in the sdist, which is a bit odd at the least.
  • An alternative is to put the pyproject.toml one level higher up, since you do actually need the targets in the top-level meson.build file to be built. You'd have to change a few other things around, e.g. install: true for the library behind nanoarrow_dep would become install: not is_wheel_build, but I think that would work in principle (see the sketch after this list).
    • The reason not to do this would be if you consider the nanoarrow C++ library a separate thing and the Python package could depend on a system-installed nanoarrow instead of an exactly matching version from the repo/sdist. But your code isn't doing this; it's conceptually a single project, so I think the answer is that the Python build should start at the top level of the repo, with a build option to indicate that you want the Python package built.
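
A minimal sketch of what that could look like, with a boolean build option gating the bindings (the option name, variable names, and target names here are illustrative assumptions, not taken from the actual nanoarrow build files):

# meson.options -- hypothetical option enabled when building a wheel
option('wheel_build', type: 'boolean', value: false,
       description: 'Build as part of a Python wheel')

# top-level meson.build -- sketch; assumes nanoarrow_sources is defined earlier
is_wheel_build = get_option('wheel_build')

nanoarrow_lib = library(
  'nanoarrow',
  nanoarrow_sources,
  install: not is_wheel_build,   # do not install the bare C library into wheels
)
nanoarrow_dep = declare_dependency(link_with: nanoarrow_lib)

if is_wheel_build
  subdir('python')   # the Python bindings consume nanoarrow_dep directly
endif

meson-python could then enable the option for wheel builds, e.g. by passing -Dwheel_build=true through the setup args in [tool.meson-python.args] in pyproject.toml.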
3 replies
@WillAyd

Awesome, thanks as always for the input @rgommers!

Working through this some more, I think I actually got to a good place with the following layout:

arrow-nanoarrow/
  meson.build
  python/
    meson.build
    subprojects/
      arrow-nanoarrow-symlink/  -- symlink to the arrow-nanoarrow project root
      arrow-nanoarrow.wrap      -- in theory only used when the symlink is not present

Ideally, I was hoping the subprojects structure could have just been:

subprojects/
    arrow-nanoarrow/
    arrow-nanoarrow.wrap

And the wrap file would only come into play when the symlink is not resolvable (i.e. when we package an sdist, that symlink will not be included). However, there appears to be an upstream "bug" in Meson around this, so the naming distinction between the symlink and the wrap file seems to be working for the time being:

mesonbuild/meson#13746
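
For reference, a fallback wrap file in that layout could be a plain wrap-git pointing back at the upstream repository; the revision and depth values below are illustrative, not what the project actually pins:

# subprojects/arrow-nanoarrow.wrap -- illustrative values
[wrap-git]
url = https://github.com/apache/arrow-nanoarrow.git
revision = main
depth = 1

The intent is that Meson uses the checked-out (symlinked) directory when it exists and only falls back to downloading via the wrap when it does not.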

@rgommers

Checking back in on this, are things working for you with your wrap file solution @WillAyd?

Regarding my earlier suggestion that "an alternative is to put the pyproject.toml one level higher up": I believe this should be the recommended approach for projects where that's an option (the discussion thread in the next answer below doesn't change that so far). It should avoid any problems with subprojects, sdist generation, etc.

If for policy reasons in a very large repo the pyproject.toml isn't allowed to live one level up from the python/ bindings sub-directory, then the symlink + wrap file approach seems like a decent alternative.

@WillAyd

Haven't finished it, but yes, so far the symlink + wrap approach is workable. Although in our case we've replaced the wrap file with a dist hook script, as you suggested, which works just as well.
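
Registering such a dist hook in Meson could look roughly like this; the script name is hypothetical, and the comments describe one way the script might materialize the C sources into the sdist:

# python/meson.build -- sketch; copy_c_sources.py is a hypothetical helper script.
# Dist scripts run when the sdist is generated (via meson dist) and get
# MESON_SOURCE_ROOT and MESON_DIST_ROOT in their environment, so the script can
# copy the real C sources into the dist tree in place of the unresolved symlink.
meson.add_dist_script('copy_c_sources.py')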

Comment

I think the way you are trying to solve this is the wrong one: having a project use a part of itself as a subproject seems more a source of trouble than a solution to any problem. Which problem does using the library as a subproject solve? Why can't you simply use the build targets defined in other parts of the build definition directly in the build definition for the Python wrapper?

If the library and the Python bindings are separate projects, have two separate repositories, with the Python wrappers possibly including the library as a subproject via a git submodule. If you want to keep them in the same git repository, I don't see the reason to make the library a fictitious submodule.

14 replies
@paleolimbot

Many projects choose to have a python/ subdirectory in the same git repository for Python bindings that are tightly coupled to the underlying library (e.g., arrow/pyarrow, duckdb, protobuf). For Apache projects there is also a complication that a "release" requires quite a lot of overhead + a vote, and creating an extra repository requires intervention from the ASF (it does happen though, for example apache/datafusion-python). This is particularly true when starting Python bindings (i.e., at the beginning many coupled changes are required, but as the underlying library matures it becomes possible to release the bindings independently). We are definitely still in the "starting Python bindings" phase in nanoarrow 🙂 .

@dnicolodi

I'm even more confused about what you are trying to achieve. From my understanding of the explanations provided so far, you have two conflicting goals: decoupling the Python bindings from the underlying library (the only reason I can see for distributing them as separate source release archives), while also keeping them tightly coupled (because of the code inter-dependencies and unsettled API).

My understanding is that building the Python wrappers requires a local build of the underlying library. If this is the case, I don't understand why you want to distribute the Python wrapper source code and the library source code as separate source archives. The only reason I can envision for doing so would be to allow building against a locally installed nanoarrow library. However, this does not seem to be supported, nor really possible, at this stage of development. Is that the case?

@dnicolodi

"Particularly for the nanoarrow library, where some changes in Python may depend on changes being made in the C libraries as well, wouldn't splitting it into two separate repositories make things harder? I'm thinking through a scenario where you have to change the C structure, commit it, then go to another repository for Python, make changes there, only to find that you missed something back in the C repository, etc..."

I don't think so. You just work on the checkout of nanoarrow that happens to be in the subprojects folder of the Python bindings project. You can include nanoarrow in the subprojects directory of the Python bindings as a git submodule. However, I understand that separating the repositories is not something you want to do.

If separating the repositories is not something you want to do, or cannot do, I don't think the subproject model adapts very well. I would reorganize things so that the Python bindings are built as part of the build system of the nanoarrow library. If your build system of choice is CMake, you can build Python wheels with scikit-build.

@paleolimbot

I am not sure I've done a good job communicating the motivation here, although I am not personally offended if this is not a use case meson-python would like to handle (setuptools is not currently causing any problems). Some of the projects I mentioned choose to distribute a self-contained sdist (e.g., duckdb) and some of them do not. I believe this is to accommodate users that need to be able to install in an airgapped environment.

@dnicolodi

meson-python has nothing to do with the difficulties you are encountering: meson-python is just a way to invoke Meson and package the built artifacts. You are trying to bend the Meson subprojects system to do things it is not designed for. I'm not saying that it cannot be made to work, but bending it in this way seems like much more work than using the build system as intended.

Comment

nanoarrow_proj = subproject('arrow-nanoarrow')
nanoarrow_dep = nanoarrow_proj.get_variable('nanoarrow_dep')
nanoarrow_ipc_dep = nanoarrow_proj.get_variable('nanoarrow_ipc_dep')
nanoarrow_device_dep = nanoarrow_proj.get_variable('nanoarrow_device_dep')

How is this different from having

subdir('python')

in the top-level meson.build and directly using nanoarrow_dep, nanoarrow_ipc_dep and nanoarrow_device_dep in python/meson.build?
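
For illustration, python/meson.build could then consume those dependencies directly, without any subproject() call; the extension module and source file names below are placeholders, not the actual nanoarrow targets:

# python/meson.build -- sketch; nanoarrow_dep and friends are inherited from the
# top-level meson.build through subdir('python')
py = import('python').find_installation()

py.extension_module(
  '_nanoarrow_example',                    # placeholder module name
  'src/nanoarrow/_example_module.c',       # placeholder source file
  dependencies: [nanoarrow_dep, nanoarrow_ipc_dep, nanoarrow_device_dep],
  install: true,
)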

1 reply
@rgommers

This is also what I meant in my first comment by "put the pyproject.toml one level higher up".
