This PyPI package contains the Python APIs for using Paimon.
pypaimon requires Python 3.6+.
The core dependencies are listed in dev/requirements.txt.
The development dependencies are listed in dev/requirements-dev.txt.
You can build the source package by executing the following command:
python3 setup.py sdist
The package is written to dist/. You can then install it by executing the following command:
pip3 install dist/*.tar.gz
The command will install the package and core dependencies to your local Python environment.
pypaimon supports HDFS through a pure-protocol client based on
hdfs-native (Rust + PyO3).
Use it when you want HDFS access without installing Hadoop, a JDK,
or libhdfs, and without wrestling with CLASSPATH / LD_LIBRARY_PATH.
Install with the optional extra:
pip install 'pypaimon[hdfs]'
The native backend requires Python 3.10+ and is unavailable on Windows.
On older interpreters the extra is skipped, so pypaimon still installs;
there, keep using the legacy pyarrow (libhdfs/JVM) backend by setting
hdfs.client.impl=pyarrow.
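The interpreter and platform rule above can be checked before building catalog options. The sketch below uses a hypothetical helper, hdfs_client_impl_for (not part of pypaimon), that picks a value for hdfs.client.impl from the Python version and operating system, following only the "3.10+ and not Windows" rule stated here.

```python
import platform
import sys
from typing import Optional, Tuple


def hdfs_client_impl_for(version: Optional[Tuple[int, ...]] = None,
                         system: Optional[str] = None) -> str:
    """Hypothetical helper: choose a value for the hdfs.client.impl option.

    Mirrors the documented rule: the native backend needs Python 3.10+
    and is not available on Windows; otherwise use "pyarrow".
    """
    version = tuple(version) if version is not None else tuple(sys.version_info[:2])
    system = system if system is not None else platform.system()
    if version[:2] >= (3, 10) and system != "Windows":
        return "native"
    return "pyarrow"


# Catalog options for the current interpreter (values are plain strings).
options = {
    "warehouse": "hdfs://ns1/warehouse",
    "hdfs.client.impl": hdfs_client_impl_for(),
}
```

Because the native backend is already the default for hdfs:// and viewfs:// URIs, setting the option explicitly mainly matters where it resolves to "pyarrow".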
For hdfs:// and viewfs:// URIs the native backend is now the default.
Switch back to the legacy libhdfs (JNI) path with:
from pypaimon import CatalogFactory

catalog = CatalogFactory.create({
    "warehouse": "hdfs://ns1/warehouse",
    "hdfs.client.impl": "pyarrow",  # default: "native"
})

The client still needs to know the NameNode addresses, HA failover
groups, and viewfs mount tables. There are three ways to provide them:
- Local xml: set HADOOP_CONF_DIR (or the hdfs.conf-dir option) to a directory containing core-site.xml / hdfs-site.xml. Only the xml is required; no Hadoop binaries or JDK.
- Catalog options (REST-friendly): pass the original Hadoop key/values directly in catalog options. Keys with the prefixes dfs., fs., hadoop., ipc., and io. are forwarded as-is. A REST catalog can deliver these in its response, giving a fully zero-file client experience:

      CatalogFactory.create({
          "warehouse": "viewfs://cluster/warehouse",
          "dfs.nameservices": "ns1",
          "dfs.ha.namenodes.ns1": "nn1,nn2",
          "dfs.namenode.rpc-address.ns1.nn1": "host-1:8020",
          "dfs.namenode.rpc-address.ns1.nn2": "host-2:8020",
          "fs.viewfs.mounttable.cluster.link./prod": "hdfs://ns1/prod",
      })

- Namespaced overrides: use hdfs.config.<key> to forward any other Hadoop key not covered by the prefix whitelist.
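When converting an existing set of Hadoop properties into catalog options, the prefix rule above can be applied mechanically. This is a minimal sketch: to_catalog_options is a hypothetical helper, not a pypaimon API, and it assumes that hdfs.config.<key> forwards <key> unchanged, as described for namespaced overrides.

```python
from typing import Dict

# Prefixes forwarded as-is, per the whitelist described above.
FORWARDED_PREFIXES = ("dfs.", "fs.", "hadoop.", "ipc.", "io.")


def to_catalog_options(hadoop_conf: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: map raw Hadoop keys onto catalog option keys.

    Whitelisted keys keep their name; any other key is wrapped in the
    hdfs.config.<key> namespace so that it is still forwarded.
    """
    options = {}
    for key, value in hadoop_conf.items():
        if key.startswith(FORWARDED_PREFIXES):  # str.startswith accepts a tuple
            options[key] = value
        else:
            options["hdfs.config." + key] = value
    return options


options = to_catalog_options({
    "dfs.nameservices": "ns1",
    "fs.viewfs.mounttable.cluster.link./prod": "hdfs://ns1/prod",
    "ha.zookeeper.quorum": "zk-1:2181",  # not whitelisted -> hdfs.config.*
})
# Add catalog-level keys after conversion so they are not namespaced.
options["warehouse"] = "viewfs://cluster/warehouse"
```

Catalog-level keys such as warehouse are added after the conversion on purpose; passing them through the helper would wrap them in hdfs.config.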
The three sources can be combined; catalog options take precedence over xml.
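The precedence rule can be illustrated with a stand-alone sketch that reads core-site.xml and hdfs-site.xml from a Hadoop conf directory and then overlays Hadoop keys passed as catalog options. load_hadoop_xml and effective_hadoop_conf are hypothetical helpers, not pypaimon internals, and reading hdfs-site.xml after core-site.xml (so its values win) is an assumption made for the example.

```python
import os
import xml.etree.ElementTree as ET
from typing import Dict, Mapping


def load_hadoop_xml(conf_dir: str) -> Dict[str, str]:
    """Collect <property><name/><value/></property> pairs from the site files."""
    conf = {}
    # Assumption: hdfs-site.xml is read after core-site.xml, so it wins on clashes.
    for name in ("core-site.xml", "hdfs-site.xml"):
        path = os.path.join(conf_dir, name)
        if not os.path.isfile(path):
            continue  # a missing file simply contributes nothing
        for prop in ET.parse(path).getroot().iter("property"):
            key = prop.findtext("name")
            if key:
                conf[key.strip()] = (prop.findtext("value") or "").strip()
    return conf


def effective_hadoop_conf(conf_dir: str,
                          option_overrides: Mapping[str, str]) -> Dict[str, str]:
    """Overlay Hadoop keys from catalog options on the xml: options take precedence."""
    merged = load_hadoop_xml(conf_dir)
    merged.update(option_overrides)
    return merged
```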
A Kerberos-secured cluster still needs the GSSAPI system library
(libgssapi-krb5-2 on Debian/Ubuntu, krb5 via Homebrew on macOS,
krb5-libs on RHEL) plus a krb5.conf. Provide credentials by either:
- Running kinit yourself and pointing KRB5CCNAME at the cache, or
- Setting security.kerberos.login.principal and security.kerberos.login.keytab in catalog options; pypaimon will run kinit for you.
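The two credential paths can be made explicit before the catalog is created. kerberos_catalog_options is a hypothetical helper, not part of pypaimon: it only assembles the two documented security.kerberos.login.* options, or checks that KRB5CCNAME is set. Treating an unset KRB5CCNAME as an error is an assumption of this sketch; some systems also have a usable default ticket cache.

```python
import os
from typing import Dict, Mapping, Optional


def kerberos_catalog_options(principal: Optional[str] = None,
                             keytab: Optional[str] = None,
                             env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: pick one of the two documented credential paths.

    - principal + keytab: return the security.kerberos.login.* options so
      that pypaimon runs kinit itself.
    - neither: rely on a ticket cache from a manual kinit via KRB5CCNAME.
    """
    env = os.environ if env is None else env
    if principal or keytab:
        if not (principal and keytab):
            raise ValueError("principal and keytab must be given together")
        if not os.path.isfile(keytab):
            raise FileNotFoundError(keytab)
        return {
            "security.kerberos.login.principal": principal,
            "security.kerberos.login.keytab": keytab,
        }
    # Assumption: require an explicit cache location, as the text recommends.
    if not env.get("KRB5CCNAME"):
        raise RuntimeError("run kinit and set KRB5CCNAME, or pass principal and keytab")
    return {}  # no extra options: the ticket cache is found via KRB5CCNAME
```

The returned dict can be merged into the other catalog options (for example with options.update(...)) before calling CatalogFactory.create.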
If the native backend fails to initialise (for example, because the
native wheel is missing on an unsupported platform such as Windows),
pypaimon automatically falls back to the pyarrow (libhdfs/JVM) path and
logs a warning. Set hdfs.client.fallback-to-pyarrow=false to disable
the fallback if you want hard failures instead.
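A strict setup can also check for the native bindings before the catalog is created, so a missing wheel surfaces immediately. This is a sketch under assumptions: native_hdfs_installed and strict_native_options are hypothetical helpers, the hdfs-native wheel is assumed to be importable as hdfs_native, and option values are passed as strings like the other catalog options in this document.

```python
import importlib.util
from typing import Dict


def native_hdfs_installed() -> bool:
    """True if the hdfs-native bindings (assumed import name: hdfs_native) are importable."""
    return importlib.util.find_spec("hdfs_native") is not None


def strict_native_options(warehouse: str) -> Dict[str, str]:
    """Hypothetical helper: catalog options that refuse the pyarrow fallback."""
    if not native_hdfs_installed():
        raise RuntimeError("hdfs-native is not installed; try: pip install 'pypaimon[hdfs]'")
    return {
        "warehouse": warehouse,
        "hdfs.client.impl": "native",
        # With the fallback disabled, other init failures also raise instead of warning.
        "hdfs.client.fallback-to-pyarrow": "false",
    }
```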
