[DataLayout] Introduce DataLayout::getAddressSize(AS) #139347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

arichardson
Member

This function can be used to retrieve the number of bits that can be used
for address arithmetic in a given address space (i.e. the range of the
address space). For most in-tree targets this should not make any
difference, but distinguishing between the size of a pointer in bits and
its address range is essential for CHERI-enabled targets, for example,
where pointers carry additional metadata such as bounds and permissions
and only a subset of the pointer bits is used as the address.

The address size is defined to be the same as the index size.
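
As a minimal sketch of that definition, the following C++ models a per-address-space pointer specification with a representation width and an index width, and derives the address width from the index width. The field names mirror `BitWidth`/`IndexBitWidth` from the DataLayout.h diff, but `PointerProps`, `addressSizeInBits` and `wrapToAddressWidth` are hypothetical and not LLVM's classes or functions.

```cpp
#include <cstdint>

// Hypothetical stand-in for a per-address-space pointer specification.
// Not LLVM's class; field names only mirror the DataLayout.h diff.
struct PointerProps {
  unsigned BitWidth;      // size of the full pointer representation
  unsigned IndexBitWidth; // width used for getelementptr indexing
};

constexpr unsigned pointerSizeInBits(const PointerProps &P) { return P.BitWidth; }
constexpr unsigned indexSizeInBits(const PointerProps &P) { return P.IndexBitWidth; }

// The address size is defined to be the same as the index size.
constexpr unsigned addressSizeInBits(const PointerProps &P) {
  return indexSizeInBits(P);
}

// Reduce an integer address modulo 2^addressSizeInBits, i.e. keep only the
// bits that take part in address arithmetic (assumes widths of at most 64).
constexpr std::uint64_t wrapToAddressWidth(const PointerProps &P,
                                           std::uint64_t Addr) {
  const unsigned W = addressSizeInBits(P);
  return W >= 64 ? Addr : Addr & ((std::uint64_t{1} << W) - 1);
}
```

For a 64-bit CHERI-style layout, `PointerProps{128, 64}`, `pointerSizeInBits` yields 128 while `addressSizeInBits` yields 64; for a 32-bit one, `PointerProps{64, 32}`, `wrapToAddressWidth` keeps only the low 32 bits of an integer address.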

We considered adding a separate property, since targets exist where indexing
and address range actually use different sizes (AMDGPU fat pointers with a
160-bit representation, 48-bit address and 32-bit index), but for the
purposes of LLVM semantics, differentiating them does not add much value
and introduces a lot of complexity in ensuring that the correct bits are
used. See the reasoning by @nikic at
https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/38
and https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/49
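
To make the trade-off concrete, here is a hedged sketch that reads the pointer component of a datalayout string, `p[n]:<size>:<abi>[:<pref>[:<idx>]]`, where the index size defaults to the pointer size and must not exceed it. The parser is hypothetical and far simpler than LLVM's (it does not validate alignments or the address-space range). With an AMDGPU-style fat-pointer spec such as `p7:160:256:256:32`, used here purely as an illustrative input, the reported address size is the 32-bit index width rather than the 48-bit hardware address width.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result of parsing one "p..." datalayout component.
struct ParsedPointerSpec {
  unsigned AddrSpace;
  unsigned BitWidth;
  unsigned IndexBitWidth;
};

// Parse an unsigned decimal number; nullopt for empty or non-digit input.
constexpr std::optional<unsigned> parseUnsigned(std::string_view S) {
  if (S.empty()) return std::nullopt;
  unsigned V = 0;
  for (char C : S) {
    if (C < '0' || C > '9') return std::nullopt;
    V = V * 10 + static_cast<unsigned>(C - '0');
  }
  return V;
}

constexpr std::optional<ParsedPointerSpec> parsePointerSpec(std::string_view Spec) {
  if (Spec.empty() || Spec.front() != 'p') return std::nullopt;
  Spec.remove_prefix(1);
  // Split into at most 5 fields: [n], size, abi, [pref], [idx].
  std::string_view Fields[5] = {};
  std::size_t N = 0;
  while (true) {
    if (N == 5) return std::nullopt; // too many fields
    std::size_t Colon = Spec.find(':');
    Fields[N++] = Spec.substr(0, Colon);
    if (Colon == std::string_view::npos) break;
    Spec.remove_prefix(Colon + 1);
  }
  if (N < 3) return std::nullopt; // size and abi are mandatory
  unsigned AS = 0;
  if (!Fields[0].empty()) {
    auto V = parseUnsigned(Fields[0]);
    if (!V) return std::nullopt;
    AS = *V;
  }
  auto Size = parseUnsigned(Fields[1]);
  if (!Size || !parseUnsigned(Fields[2])) return std::nullopt;
  if (N >= 4 && !parseUnsigned(Fields[3])) return std::nullopt; // pref
  unsigned Idx = *Size; // index size defaults to the pointer size
  if (N == 5) {
    auto I = parseUnsigned(Fields[4]);
    if (!I) return std::nullopt;
    Idx = *I;
  }
  if (Idx > *Size) return std::nullopt; // index must not exceed pointer size
  return ParsedPointerSpec{AS, *Size, Idx};
}

// Address size is defined to equal the index size.
constexpr unsigned addressSizeInBits(const ParsedPointerSpec &P) {
  return P.IndexBitWidth;
}
```

Because address size is tied to index size, this model has no field in which a 48-bit address width could be recorded; that is exactly the simplification the PR accepts.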

Originally uploaded as https://reviews.llvm.org/D135158

Created using spr 1.3.6-beta.1
@llvmbot
Member

llvmbot commented May 10, 2025

@llvm/pr-subscribers-llvm-ir

Author: Alexander Richardson (arichardson)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/139347.diff

2 Files Affected:

  • (modified) llvm/docs/LangRef.rst (+25-8)
  • (modified) llvm/include/llvm/IR/DataLayout.h (+54-8)
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 7296bb84b7d95..07e58e5b7a338 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -3147,14 +3147,21 @@ as follows:
 ``A<address space>``
     Specifies the address space of objects created by '``alloca``'.
     Defaults to the default address space of 0.
-``p[n]:<size>:<abi>[:<pref>][:<idx>]``
-    This specifies the *size* of a pointer and its ``<abi>`` and
-    ``<pref>``\erred alignments for address space ``n``.
-    The fourth parameter ``<idx>`` is the size of the
-    index that used for address calculation, which must be less than or equal
-    to the pointer size. If not
-    specified, the default index size is equal to the pointer size. All sizes
-    are in bits. The address space, ``n``, is optional, and if not specified,
+``p[n]:<size>:<abi>[:<pref>[:<idx>[:<addr>]]]``
+    This specifies the properties of a pointer in address space ``n``.
+    The ``<size>`` parameter specifies the size of the bitwise representation.
+    For :ref:`non-integral pointers <nointptrtype>` the representation size may
+    be larger than the address width of the underlying address space (e.g. to
+    accommodate additional metadata).
+    The alignment requirements are specified via the ``<abi>`` and
+    ``<pref>``\erred alignments parameters.
+    The fourth parameter ``<idx>`` is the size of the index that used for
+    address calculations such as :ref:`getelementptr <i_getelementptr>`.
+    It must be less than or equal to the pointer size. If not specified, the
+    default index size is equal to the pointer size.
+    The index size also specifies the width of addresses in this address space.
+    All sizes are in bits.
+    The address space, ``n``, is optional, and if not specified,
     denotes the default address space 0. The value of ``n`` must be
     in the range [1,2^24).
 ``i<size>:<abi>[:<pref>]``
@@ -4266,6 +4273,16 @@ address spaces defined in the :ref:`datalayout string<langref_datalayout>`.
 the default globals address space and ``addrspace("P")`` the program address
 space.
 
+The representation of pointers can be different for each address space and does
+not necessarily need to be a plain integer address (e.g. for
+:ref:`non-integral pointers <nointptrtype>`). In addition to a representation
+bits size, pointers in each address space also have an index size which defines
+the bitwidth of indexing operations as well as the size of `integer addresses`
+in this address space. For example, CHERI capabilities are twice the size of the
+underlying addresses to accommodate for additional metadata such as bounds and
+permissions: on a 32-bit system the bitwidth of the pointer representation size
+is 64, but the underlying address width remains 32 bits.
+
 The default address space is number zero.
 
 The semantics of non-zero address spaces are target-specific. Memory
diff --git a/llvm/include/llvm/IR/DataLayout.h b/llvm/include/llvm/IR/DataLayout.h
index 2ad080e6d0cd2..09ba6b54cf721 100644
--- a/llvm/include/llvm/IR/DataLayout.h
+++ b/llvm/include/llvm/IR/DataLayout.h
@@ -324,16 +324,38 @@ class DataLayout {
   /// the backends/clients are updated.
   Align getPointerPrefAlignment(unsigned AS = 0) const;
 
-  /// Layout pointer size in bytes, rounded up to a whole
-  /// number of bytes.
+  /// The pointer representation size in bytes, rounded up to a whole number of
+  /// bytes. The difference between this function and getPointerAddressSize() is
+  /// this one returns the size of the entire pointer type (this includes
+  /// metadata bits for fat pointers) and the latter only returns the number of
+  /// address bits.
+  /// \sa DataLayout::getPointerAddressSizeInBits
   /// FIXME: The defaults need to be removed once all of
   /// the backends/clients are updated.
   unsigned getPointerSize(unsigned AS = 0) const;
 
-  // Index size in bytes used for address calculation,
-  /// rounded up to a whole number of bytes.
+  /// The index size in bytes used for address calculation, rounded up to a
+  /// whole number of bytes. This not only defines the size used in
+  /// getelementptr operations, but also the size of addresses in this \p AS.
+  /// For example, a 64-bit CHERI-enabled target has 128-bit pointers of which
+  /// only 64 are used to represent the address and the remaining ones are used
+  /// for metadata such as bounds and access permissions. In this case
+  /// getPointerSize() returns 16, but getIndexSize() returns 8.
+  /// To help with code understanding, the alias getPointerAddressSize() can be
+  /// used instead of getIndexSize() to clarify that an address width is needed.
   unsigned getIndexSize(unsigned AS) const;
 
+  /// The integral size of a pointer in a given address space in bytes, which
+  /// is defined to be the same as getIndexSize(). This exists as a separate
+  /// function to make it clearer when reading code that the size of an address
+  /// is being requested. While targets exist where index size and the
+  /// underlying address width are not identical (e.g. AMDGPU fat pointers with
+  /// 48-bit addresses and 32-bit offsets indexing), there is currently no need
+  /// to differentiate these properties in LLVM.
+  /// \sa DataLayout::getIndexSize
+  /// \sa DataLayout::getPointerAddressSizeInBits
+  unsigned getPointerAddressSize(unsigned AS) const { return getIndexSize(AS); }
+
   /// Return the address spaces containing non-integral pointers.  Pointers in
   /// this address space don't have a well-defined bitwise representation.
   SmallVector<unsigned, 8> getNonIntegralAddressSpaces() const {
@@ -358,29 +380,53 @@ class DataLayout {
     return PTy && isNonIntegralPointerType(PTy);
   }
 
-  /// Layout pointer size, in bits
+  /// The size in bits of the pointer representation in a given address space.
+  /// This is not necessarily the same as the integer address of a pointer (e.g.
+  /// for fat pointers).
+  /// \sa DataLayout::getPointerAddressSizeInBits()
   /// FIXME: The defaults need to be removed once all of
   /// the backends/clients are updated.
   unsigned getPointerSizeInBits(unsigned AS = 0) const {
     return getPointerSpec(AS).BitWidth;
   }
 
-  /// Size in bits of index used for address calculation in getelementptr.
+  /// The size in bits of indices used for address calculation in getelementptr
+  /// and for addresses in the given AS. See getIndexSize() for more
+  /// information.
+  /// \sa DataLayout::getPointerAddressSizeInBits()
   unsigned getIndexSizeInBits(unsigned AS) const {
     return getPointerSpec(AS).IndexBitWidth;
   }
 
-  /// Layout pointer size, in bits, based on the type.  If this function is
+  /// The size in bits of an address in for the given AS. This is defined to
+  /// return the same value as getIndexSizeInBits() since there is currently no
+  /// target that requires these two properties to have different values. See
+  /// getIndexSize() for more information.
+  /// \sa DataLayout::getIndexSizeInBits()
+  unsigned getPointerAddressSizeInBits(unsigned AS) const {
+    return getIndexSizeInBits(AS);
+  }
+
+  /// The pointer representation size in bits for this type. If this function is
   /// called with a pointer type, then the type size of the pointer is returned.
   /// If this function is called with a vector of pointers, then the type size
   /// of the pointer is returned.  This should only be called with a pointer or
   /// vector of pointers.
   unsigned getPointerTypeSizeInBits(Type *) const;
 
-  /// Layout size of the index used in GEP calculation.
+  /// The size in bits of the index used in GEP calculation for this type.
   /// The function should be called with pointer or vector of pointers type.
+  /// This is defined to return the same value as getPointerAddressSizeInBits(),
+  /// but separate functions exist for code clarity.
   unsigned getIndexTypeSizeInBits(Type *Ty) const;
 
+  /// The size in bits of an address for this type.
+  /// This is defined to return the same value as getIndexTypeSizeInBits(),
+  /// but separate functions exist for code clarity.
+  unsigned getPointerAddressSizeInBits(Type *Ty) const {
+    return getIndexTypeSizeInBits(Ty);
+  }
+
   unsigned getPointerTypeSize(Type *Ty) const {
     return getPointerTypeSizeInBits(Ty) / 8;
   }
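
The byte-based accessors in the header describe their results as bit widths rounded up to a whole number of bytes. A minimal constexpr sketch of that arithmetic follows; `bitsToBytesRoundedUp`, `PointerByteSizes` and `byteSizesFor` are hypothetical names, not part of LLVM.

```cpp
// Hypothetical helper mirroring the "rounded up to a whole number of bytes"
// wording of getPointerSize()/getIndexSize(); not LLVM's implementation.
constexpr unsigned bitsToBytesRoundedUp(unsigned Bits) { return (Bits + 7) / 8; }

// Byte sizes for one address space, given its pointer and index widths.
struct PointerByteSizes {
  unsigned Pointer; // whole representation, including any metadata bits
  unsigned Index;   // getelementptr index width
  unsigned Address; // defined to equal Index
};

constexpr PointerByteSizes byteSizesFor(unsigned PointerBits, unsigned IndexBits) {
  return PointerByteSizes{bitsToBytesRoundedUp(PointerBits),
                          bitsToBytesRoundedUp(IndexBits),
                          bitsToBytesRoundedUp(IndexBits)};
}
```

`byteSizesFor(128, 64)` gives 16, 8 and 8, matching the 64-bit CHERI example in the `getIndexSize()` comment. Note that `getPointerTypeSize()` in the diff divides by 8 without rounding up, so the two computations agree only when the bit width is a multiple of 8.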

@davidchisnall
Contributor

This looks like a good approach. The function names convey intent, and will allow us to differentiate address and index size later if necessary. The proposed use case for differing address and index sizes that I've seen was in the context of a RV128 ABI that would want a 64-bit size_t but a 128-bit intptr_t (single allocations can't be larger than 2^64 bytes, but can be anywhere in the address space).
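
If address and index sizes were ever split, as in the RV128 scenario just described, one hypothetical shape is an optional address width that falls back to the index width, so callers of an address-size query keep today's behavior by default. Nothing like this exists in the patch; `SplitPointerSpec` and `AddressBitWidth` are invented for illustration, and the 128/64 widths below simply restate the RV128 example.

```cpp
#include <optional>

// Hypothetical pointer spec with a separately specifiable address width.
struct SplitPointerSpec {
  unsigned BitWidth;                       // pointer representation size
  unsigned IndexBitWidth;                  // e.g. the width of size_t
  std::optional<unsigned> AddressBitWidth; // unset means "same as index"
};

constexpr unsigned indexSizeInBits(const SplitPointerSpec &S) {
  return S.IndexBitWidth;
}

// Falls back to the index width, matching the current definition.
constexpr unsigned addressSizeInBits(const SplitPointerSpec &S) {
  return S.AddressBitWidth.value_or(S.IndexBitWidth);
}
```

For example, `SplitPointerSpec{128, 64, 128}` would report a 64-bit index and a 128-bit address, while `SplitPointerSpec{64, 64, std::nullopt}` behaves exactly like the single-property model.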

@arichardson changed the title from "[DataLayout] Introduce DataLayout::getPointerAddressSize(AS)" to "[DataLayout] Introduce DataLayout::getAddressSize(AS)" on May 12, 2025
Contributor

@krzysz00 krzysz00 left a comment

Looks good to me

@arichardson arichardson merged commit aec685e into main May 16, 2025
12 checks passed
@arichardson arichardson deleted the users/arichardson/spr/datalayout-introduce-datalayoutgetpointeraddresssizeas-1 branch May 16, 2025 17:04
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request May 16, 2025
Reviewed By: davidchisnall, krzysz00

Pull Request: llvm/llvm-project#139347