Add pointer field protection feature. #133538
base: users/pcc/spr/main.add-pointer-field-protection-feature
Conversation
Created using spr 1.3.6-beta.1
@llvm/pr-subscribers-libcxxabi @llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang
Author: Peter Collingbourne (pcc)
Changes
Pointer field protection is a use-after-free vulnerability mitigation.
TODO:
Patch is 82.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133538.diff 52 Files Affected:
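For orientation, a minimal usage sketch (the flag and attribute spellings come from the diff below; the example struct and the commented behavior are assumptions about this patch, not documented output):

// Hypothetical build line using the new driver flag:
//   clang++ -fexperimental-pointer-field-protection=tagged main.cpp
struct List {
  virtual ~List();   // non-standard-layout, so PFP applies to this class
  int *head;         // pointer field: assumed to be stored encoded in memory
};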
diff --git a/clang/include/clang/AST/ASTContext.h b/clang/include/clang/AST/ASTContext.h
index af8c49e99a7ce..abba83e1ff9c4 100644
--- a/clang/include/clang/AST/ASTContext.h
+++ b/clang/include/clang/AST/ASTContext.h
@@ -183,6 +183,12 @@ struct TypeInfoChars {
}
};
+struct PFPField {
+ CharUnits structOffset;
+ CharUnits offset;
+ FieldDecl *field;
+};
+
/// Holds long-lived AST nodes (such as types and decls) that can be
/// referred to throughout the semantic analysis of a file.
class ASTContext : public RefCountedBase<ASTContext> {
@@ -3618,6 +3624,22 @@ OPT_LIST(V)
StringRef getCUIDHash() const;
+ bool isPFPStruct(const RecordDecl *rec) const;
+ void findPFPFields(QualType Ty, CharUnits Offset,
+ std::vector<PFPField> &Fields, bool IncludeVBases) const;
+ bool hasPFPFields(QualType ty) const;
+ bool isPFPField(const FieldDecl *field) const;
+
+ /// Returns whether this record's PFP fields (if any) are trivially
+ /// relocatable (i.e. may be memcpy'd). This may also return true if the
+ /// record does not have any PFP fields, so it may be necessary for the caller
+ /// to check for PFP fields, e.g. by calling hasPFPFields().
+ bool arePFPFieldsTriviallyRelocatable(const RecordDecl *RD) const;
+
+ llvm::SetVector<const FieldDecl *> PFPFieldsWithEvaluatedOffset;
+ void recordMemberDataPointerEvaluation(const ValueDecl *VD);
+ void recordOffsetOfEvaluation(const OffsetOfExpr *E);
+
private:
/// All OMPTraitInfo objects live in this collection, one per
/// `pragma omp [begin] declare variant` directive.
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 0999d8065e9f5..3d26c2f001812 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -2460,6 +2460,12 @@ def CountedByOrNull : DeclOrTypeAttr {
let LangOpts = [COnly];
}
+def NoPointerFieldProtection : DeclOrTypeAttr {
+ let Spellings = [Clang<"no_field_protection">];
+ let Subjects = SubjectList<[Field], ErrorDiag>;
+ let Documentation = [Undocumented];
+}
+
def SizedBy : DeclOrTypeAttr {
let Spellings = [Clang<"sized_by">];
let Subjects = SubjectList<[Field], ErrorDiag>;
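A sketch of the new attribute applied to a field (the spelling follows the Attr.td change above; the surrounding class is hypothetical):

class Node {
  virtual void touch();   // non-standard-layout, so PFP would otherwise cover both fields

public:
  Node *next;                                   // protected pointer field
  [[clang::no_field_protection]] Node *prev;    // opted out of protection
};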
diff --git a/clang/include/clang/Basic/Features.def b/clang/include/clang/Basic/Features.def
index 05ce214935fad..d4ded24c5a87e 100644
--- a/clang/include/clang/Basic/Features.def
+++ b/clang/include/clang/Basic/Features.def
@@ -262,6 +262,9 @@ FEATURE(shadow_call_stack,
FEATURE(tls, PP.getTargetInfo().isTLSSupported())
FEATURE(underlying_type, LangOpts.CPlusPlus)
FEATURE(experimental_library, LangOpts.ExperimentalLibrary)
+FEATURE(pointer_field_protection,
+ LangOpts.getPointerFieldProtection() !=
+ LangOptions::PointerFieldProtectionKind::None)
// C11 features supported by other languages as extensions.
EXTENSION(c_alignas, true)
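Code can then detect the mitigation with __has_feature; a small sketch following the Features.def entry above (the branch bodies are illustrative):

#if defined(__has_feature) && __has_feature(pointer_field_protection)
// PFP is enabled (untagged or tagged); pointer fields are encoded in memory.
#else
// PFP is disabled; pointer fields use the plain representation.
#endif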
diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def
index 3879cc7942877..8eacb6e066007 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -501,6 +501,9 @@ LANGOPT(RelativeCXXABIVTables, 1, 0,
LANGOPT(OmitVTableRTTI, 1, 0,
"Use an ABI-incompatible v-table layout that omits the RTTI component")
+ENUM_LANGOPT(PointerFieldProtection, PointerFieldProtectionKind, 2, PointerFieldProtectionKind::None,
+ "Encode struct pointer fields to protect against UAF vulnerabilities")
+
LANGOPT(VScaleMin, 32, 0, "Minimum vscale value")
LANGOPT(VScaleMax, 32, 0, "Maximum vscale value")
diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h
index e925e0f3b5d85..797a14038ba4b 100644
--- a/clang/include/clang/Basic/LangOptions.h
+++ b/clang/include/clang/Basic/LangOptions.h
@@ -365,6 +365,17 @@ class LangOptionsBase {
BKey
};
+ enum class PointerFieldProtectionKind {
+ /// Pointer field protection disabled
+ None,
+ /// Pointer field protection enabled, allocator does not tag heap
+ /// allocations.
+ Untagged,
+ /// Pointer field protection enabled, allocator is expected to tag heap
+ /// allocations.
+ Tagged,
+ };
+
enum class ThreadModelKind {
/// POSIX Threads.
POSIX,
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 1bf9f43f80986..142e13b3ee784 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -542,6 +542,7 @@ TYPE_TRAIT_2(__is_pointer_interconvertible_base_of, IsPointerInterconvertibleBas
// Clang-only C++ Type Traits
TYPE_TRAIT_1(__is_trivially_relocatable, IsTriviallyRelocatable, KEYCXX)
+TYPE_TRAIT_1(__has_non_relocatable_fields, HasNonRelocatableFields, KEYCXX)
TYPE_TRAIT_1(__is_trivially_equality_comparable, IsTriviallyEqualityComparable, KEYCXX)
TYPE_TRAIT_1(__is_bounded_array, IsBoundedArray, KEYCXX)
TYPE_TRAIT_1(__is_unbounded_array, IsUnboundedArray, KEYCXX)
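The visible part of the diff does not document the new __has_non_relocatable_fields trait; judging by its name and the libc++ files in the patch, it presumably reports whether a type contains fields (such as PFP fields) that block memcpy-style relocation. A speculative sketch:

struct Plain { int x; };                    // no PFP fields
class WithPFP { int *p_; public: int n; };  // non-standard-layout: p_ is a PFP field

bool a = __has_non_relocatable_fields(Plain);    // expected: false (assumption)
bool b = __has_non_relocatable_fields(WithPFP);  // expected: true in tagged mode (assumption)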
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a7fcb160d3867..c903e0554319e 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2957,6 +2957,12 @@ defm experimental_omit_vtable_rtti : BoolFOption<"experimental-omit-vtable-rtti"
NegFlag<SetFalse, [], [CC1Option], "Do not omit">,
BothFlags<[], [CC1Option], " the RTTI component from virtual tables">>;
+def experimental_pointer_field_protection_EQ : Joined<["-"], "fexperimental-pointer-field-protection=">, Group<f_Group>,
+ Visibility<[ClangOption, CC1Option]>,
+ Values<"none,untagged,tagged">, NormalizedValuesScope<"LangOptions::PointerFieldProtectionKind">,
+ NormalizedValues<["None", "Untagged", "Tagged"]>,
+ MarshallingInfoEnum<LangOpts<"PointerFieldProtection">, "None">;
+
def fcxx_abi_EQ : Joined<["-"], "fc++-abi=">,
Group<f_clang_Group>, Visibility<[ClangOption, CC1Option]>,
HelpText<"C++ ABI to use. This will override the target C++ ABI.">;
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index c9d1bea4c623a..c3cbfec2c93d3 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -79,6 +79,7 @@
#include "llvm/ADT/StringRef.h"
#include "llvm/Frontend/OpenMP/OMPIRBuilder.h"
#include "llvm/Support/Capacity.h"
+#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MD5.h"
@@ -14948,3 +14949,97 @@ bool ASTContext::useAbbreviatedThunkName(GlobalDecl VirtualMethodDecl,
ThunksToBeAbbreviated[VirtualMethodDecl] = std::move(SimplifiedThunkNames);
return Result;
}
+
+bool ASTContext::arePFPFieldsTriviallyRelocatable(const RecordDecl *RD) const {
+ if (getLangOpts().getPointerFieldProtection() ==
+ LangOptions::PointerFieldProtectionKind::Tagged)
+ return !isa<CXXRecordDecl>(RD) ||
+ cast<CXXRecordDecl>(RD)->hasTrivialDestructor();
+ return true;
+}
+
+bool ASTContext::isPFPStruct(const RecordDecl *rec) const {
+ if (getLangOpts().getPointerFieldProtection() !=
+ LangOptions::PointerFieldProtectionKind::None)
+ if (auto *cxxRec = dyn_cast<CXXRecordDecl>(rec))
+ return !cxxRec->isStandardLayout();
+ return false;
+}
+
+void ASTContext::findPFPFields(QualType Ty, CharUnits Offset,
+ std::vector<PFPField> &Fields,
+ bool IncludeVBases) const {
+ if (auto *AT = getAsConstantArrayType(Ty)) {
+ if (auto *ElemDecl = AT->getElementType()->getAsCXXRecordDecl()) {
+ const ASTRecordLayout &ElemRL = getASTRecordLayout(ElemDecl);
+ for (unsigned i = 0; i != AT->getSize(); ++i) {
+ findPFPFields(AT->getElementType(), Offset + i * ElemRL.getSize(),
+ Fields, true);
+ }
+ }
+ }
+ auto *Decl = Ty->getAsCXXRecordDecl();
+ if (!Decl)
+ return;
+ const ASTRecordLayout &RL = getASTRecordLayout(Decl);
+ for (FieldDecl *field : Decl->fields()) {
+ CharUnits fieldOffset =
+ Offset + toCharUnitsFromBits(RL.getFieldOffset(field->getFieldIndex()));
+ if (isPFPField(field))
+ Fields.push_back({Offset, fieldOffset, field});
+ findPFPFields(field->getType(), fieldOffset, Fields, true);
+ }
+ for (auto &Base : Decl->bases()) {
+ if (Base.isVirtual())
+ continue;
+ CharUnits BaseOffset =
+ Offset + RL.getBaseClassOffset(Base.getType()->getAsCXXRecordDecl());
+ findPFPFields(Base.getType(), BaseOffset, Fields, false);
+ }
+ if (IncludeVBases) {
+ for (auto &Base : Decl->vbases()) {
+ CharUnits BaseOffset =
+ Offset + RL.getVBaseClassOffset(Base.getType()->getAsCXXRecordDecl());
+ findPFPFields(Base.getType(), BaseOffset, Fields, false);
+ }
+ }
+}
+
+bool ASTContext::hasPFPFields(QualType ty) const {
+ std::vector<PFPField> pfpFields;
+ findPFPFields(ty, CharUnits::Zero(), pfpFields, true);
+ return !pfpFields.empty();
+}
+
+bool ASTContext::isPFPField(const FieldDecl *field) const {
+ if (!isPFPStruct(field->getParent()))
+ return false;
+ return field->getType()->isPointerType() &&
+ !field->hasAttr<NoPointerFieldProtectionAttr>();
+}
+
+void ASTContext::recordMemberDataPointerEvaluation(const ValueDecl *VD) {
+ if (getLangOpts().getPointerFieldProtection() ==
+ LangOptions::PointerFieldProtectionKind::None)
+ return;
+ auto *FD = dyn_cast<FieldDecl>(VD);
+ if (!FD)
+ FD = cast<FieldDecl>(cast<IndirectFieldDecl>(VD)->chain().back());
+ if (!isPFPField(FD))
+ return;
+ PFPFieldsWithEvaluatedOffset.insert(FD);
+}
+
+void ASTContext::recordOffsetOfEvaluation(const OffsetOfExpr *E) {
+ if (getLangOpts().getPointerFieldProtection() ==
+ LangOptions::PointerFieldProtectionKind::None ||
+ E->getNumComponents() == 0)
+ return;
+ OffsetOfNode Comp = E->getComponent(E->getNumComponents() - 1);
+ if (Comp.getKind() != OffsetOfNode::Field)
+ return;
+ FieldDecl *FD = Comp.getField();
+ if (!isPFPField(FD))
+ return;
+ PFPFieldsWithEvaluatedOffset.insert(FD);
+}
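Per isPFPStruct above, only non-standard-layout C++ classes are PFP candidates, which leaves standard-layout (C-compatible) structs ABI-stable. A sketch of the split with illustrative types:

struct CCompat { int *p; };   // standard-layout: isPFPStruct is false, p stays raw
class Managed {               // mixed access control => non-standard-layout
  int *internal_;             // pointer field without opt-out: found by findPFPFields
public:
  int id;
};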
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 95da7b067b459..51f319fd26f20 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -14932,6 +14932,7 @@ bool IntExprEvaluator::VisitUnaryExprOrTypeTraitExpr(
}
bool IntExprEvaluator::VisitOffsetOfExpr(const OffsetOfExpr *OOE) {
+ Info.Ctx.recordOffsetOfEvaluation(OOE);
CharUnits Result;
unsigned n = OOE->getNumComponents();
if (n == 0)
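With the hook above, constant-evaluating an offsetof whose last component names a PFP field records that field in PFPFieldsWithEvaluatedOffset, presumably so later phases can cope with fields whose offsets escape as raw integers (that purpose is an assumption). A sketch:

#include <cstddef>
class S {
  virtual void f();   // non-standard-layout, so PFP applies
public:
  char *data;         // PFP field
};
// Conditionally-supported on non-standard-layout types (clang warns under
// -Winvalid-offsetof); evaluating it triggers recordOffsetOfEvaluation:
size_t off = offsetof(S, data);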
diff --git a/clang/lib/AST/Type.cpp b/clang/lib/AST/Type.cpp
index 08798219c0b83..ca7ccc7f2bb5a 100644
--- a/clang/lib/AST/Type.cpp
+++ b/clang/lib/AST/Type.cpp
@@ -2852,7 +2852,9 @@ bool QualType::isTriviallyRelocatableType(const ASTContext &Context) const {
} else if (!BaseElementType->isObjectType()) {
return false;
} else if (const auto *RD = BaseElementType->getAsRecordDecl()) {
- return RD->canPassInRegisters();
+ return RD->canPassInRegisters() &&
+ (Context.arePFPFieldsTriviallyRelocatable(RD) ||
+ !Context.hasPFPFields(BaseElementType));
} else if (BaseElementType.isTriviallyCopyableType(Context)) {
return true;
} else {
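The practical effect of this Type.cpp change, as the logic reads (expected behavior inferred from the code, not verified against the patch's tests):

class [[clang::trivial_abi]] Owner {
  int *res_;   // private pointer field
public:
  int n;       // mixed access control => non-standard-layout => PFP applies
  ~Owner();    // non-trivial destructor; trivial_abi keeps canPassInRegisters() true
};
// Without PFP (or with =untagged) Owner remains trivially relocatable; under
// =tagged, arePFPFieldsTriviallyRelocatable is false for types with
// non-trivial destructors, so this is expected to become false:
bool r = __is_trivially_relocatable(Owner);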
diff --git a/clang/lib/AST/TypePrinter.cpp b/clang/lib/AST/TypePrinter.cpp
index 3982ca3b50604..1382b4a9edfd4 100644
--- a/clang/lib/AST/TypePrinter.cpp
+++ b/clang/lib/AST/TypePrinter.cpp
@@ -2096,6 +2096,9 @@ void TypePrinter::printAttributedAfter(const AttributedType *T,
case attr::ExtVectorType:
OS << "ext_vector_type";
break;
+ case attr::NoPointerFieldProtection:
+ OS << "no_field_protection";
+ break;
}
OS << "))";
}
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 7aa77e55dbfcc..9d824231d02cb 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -1298,7 +1298,8 @@ static llvm::Value *CoerceIntOrPtrToIntOrPtr(llvm::Value *Val,
/// This safely handles the case when the src type is smaller than the
/// destination type; in this situation the values of bits which not
/// present in the src are undefined.
-static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
+static llvm::Value *CreateCoercedLoad(Address Src, QualType SrcFETy,
+ llvm::Type *Ty,
CodeGenFunction &CGF) {
llvm::Type *SrcTy = Src.getElementType();
@@ -1306,6 +1307,57 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
if (SrcTy == Ty)
return CGF.Builder.CreateLoad(Src);
+ // Coercion directly through memory does not work if the structure has pointer
+ // field protection because the struct in registers has a different bit
+ // pattern to the struct in memory, so we must read the elements one by one
+ // and use them to form the coerced structure.
+ std::vector<PFPField> PFPFields;
+ CGF.getContext().findPFPFields(SrcFETy, CharUnits::Zero(), PFPFields, true);
+ if (!PFPFields.empty()) {
+ auto LoadCoercedField = [&](CharUnits Offset,
+ llvm::Type *FieldType) -> llvm::Value * {
+ if (!PFPFields.empty() && PFPFields[0].offset == Offset) {
+ auto fieldAddr = CGF.EmitAddressOfPFPField(Src, PFPFields[0]);
+ llvm::Value *FieldVal = CGF.Builder.CreateLoad(fieldAddr);
+ if (isa<llvm::IntegerType>(FieldType))
+ FieldVal = CGF.Builder.CreatePtrToInt(FieldVal, FieldType);
+ PFPFields.erase(PFPFields.begin());
+ return FieldVal;
+ }
+ auto FieldAddr = CGF.Builder
+ .CreateConstInBoundsByteGEP(
+ Src.withElementType(CGF.Int8Ty), Offset)
+ .withElementType(FieldType);
+ return CGF.Builder.CreateLoad(FieldAddr);
+ };
+ if (isa<llvm::IntegerType>(Ty) || isa<llvm::PointerType>(Ty)) {
+ auto Addr = CGF.EmitAddressOfPFPField(Src, PFPFields[0]);
+ llvm::Value *Val = CGF.Builder.CreateLoad(Addr);
+ if (isa<llvm::IntegerType>(Ty))
+ Val = CGF.Builder.CreatePtrToInt(Val, Ty);
+ return Val;
+ }
+ if (auto *AT = dyn_cast<llvm::ArrayType>(Ty)) {
+ auto *ET = AT->getElementType();
+ CharUnits wordSize = CGF.getContext().toCharUnitsFromBits(
+ CGF.CGM.getDataLayout().getTypeSizeInBits(ET));
+ CharUnits Offset = CharUnits::Zero();
+ llvm::Value *Val = llvm::UndefValue::get(AT);
+ for (unsigned i = 0; i != AT->getNumElements(); ++i, Offset += wordSize)
+ Val = CGF.Builder.CreateInsertValue(Val, LoadCoercedField(Offset, ET), i);
+ return Val;
+ }
+ auto *ST = cast<llvm::StructType>(Ty);
+ llvm::Value *Val = llvm::UndefValue::get(ST);
+ auto *SL = CGF.CGM.getDataLayout().getStructLayout(ST);
+ for (unsigned i = 0; i != ST->getNumElements(); ++i) {
+ CharUnits Offset = CharUnits::fromQuantity(SL->getElementOffset(i));
+ Val = CGF.Builder.CreateInsertValue(
+ Val, LoadCoercedField(Offset, ST->getElementType(i)), i);
+ }
+ return Val;
+ }
+
llvm::TypeSize DstSize = CGF.CGM.getDataLayout().getTypeAllocSize(Ty);
if (llvm::StructType *SrcSTy = dyn_cast<llvm::StructType>(SrcTy)) {
@@ -1374,7 +1426,9 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
return CGF.Builder.CreateLoad(Tmp);
}
-void CodeGenFunction::CreateCoercedStore(llvm::Value *Src, Address Dst,
+void CodeGenFunction::CreateCoercedStore(llvm::Value *Src,
+ QualType SrcFETy,
+ Address Dst,
llvm::TypeSize DstSize,
bool DstIsVolatile) {
if (!DstSize)
@@ -1395,6 +1449,52 @@ void CodeGenFunction::CreateCoercedStore(llvm::Value *Src, Address Dst,
}
}
+ // Coercion directly through memory does not work if the structure has pointer
+ // field protection because the struct passed by value has a different bit
+ // pattern to the struct in memory, so we must read the elements one by one
+ // and use them to form the coerced structure.
+ std::vector<PFPField> PFPFields;
+ getContext().findPFPFields(SrcFETy, CharUnits::Zero(), PFPFields, true);
+ if (!PFPFields.empty()) {
+ auto StoreCoercedField = [&](CharUnits Offset, llvm::Value *FieldVal) {
+ if (!PFPFields.empty() && PFPFields[0].offset == Offset) {
+ auto fieldAddr = EmitAddressOfPFPField(Dst, PFPFields[0]);
+ if (isa<llvm::IntegerType>(FieldVal->getType()))
+ FieldVal = Builder.CreateIntToPtr(FieldVal, VoidPtrTy);
+ Builder.CreateStore(FieldVal, fieldAddr);
+ PFPFields.erase(PFPFields.begin());
+ } else {
+ auto fieldAddr =
+ Builder
+ .CreateConstInBoundsByteGEP(Dst.withElementType(Int8Ty), Offset)
+ .withElementType(FieldVal->getType());
+ Builder.CreateStore(FieldVal, fieldAddr);
+ }
+ };
+
+ if (isa<llvm::IntegerType>(SrcTy) || isa<llvm::PointerType>(SrcTy)) {
+ if (isa<llvm::IntegerType>(SrcTy))
+ Src = Builder.CreateIntToPtr(Src, VoidPtrTy);
+ auto Addr = EmitAddressOfPFPField(Dst, PFPFields[0]);
+ Builder.CreateStore(Src, Addr);
+ } else if (auto *at = dyn_cast<llvm::ArrayType>(SrcTy)) {
+ auto *et = at->getElementType();
+ CharUnits wordSize = getContext().toCharUnitsFromBits(
+ CGM.getDataLayout().getTypeSizeInBits(et));
+ CharUnits Offset = CharUnits::Zero();
+ for (unsigned i = 0; i != at->getNumElements(); ++i, Offset += wordSize)
+ StoreCoercedField(Offset, Builder.CreateExtractValue(Src, i));
+ } else {
+ auto *ST = cast<llvm::StructType>(SrcTy);
+ auto *SL = CGM.getDataLayout().getStructLayout(ST);
+ for (unsigned i = 0; i != ST->getNumElements(); ++i) {
+ CharUnits Offset = CharUnits::fromQuantity(SL->getElementOffset(i));
+ StoreCoercedField(Offset, Builder.CreateExtractValue(Src, i));
+ }
+ }
+ return;
+ }
+
if (SrcSize.isScalable() || SrcSize <= DstSize) {
if (SrcTy->isIntegerTy() && Dst.getElementType()->isPointerTy() &&
SrcSize == CGM.getDataLayout().getTypeAllocSize(Dst.getElementType())) {
@@ -3347,7 +3447,7 @@ void CodeGenFunction::EmitFunctionProlog(const CGFunctionInfo &FI,
auto AI = Fn->getArg(FirstIRArg);
AI->setName(Arg->getName() + ".coerce");
CreateCoercedStore(
- AI, Ptr,
+ AI, Ty, Ptr,
llvm::TypeSize::getFixed(
getContext().getTypeSizeInChars(Ty).getQuantity() -
ArgI.getDirectOffset()),
@@ -3969,7 +4069,7 @@ void CodeGenFunction::EmitFunctionEpilog(const CGFunctionInfo &FI,
// If the value is offset in memory, apply the offset now.
Address V = emitAddressAtOffset(*this, ReturnValue, RetAI);
- RV = CreateCoercedLoad(V, RetAI.getCoerceToType(), *this);
+ RV = CreateCoercedLoad(V, RetTy, RetAI.getCoerceToType(), *this);
}
// In ARC, end functions that return a retainable type with a call
@@ -4020,6 +4120,7 @@ void CodeGenFunction::EmitFunctionEpilog(const CGFunctionInfo &FI,
auto eltAddr = Builder.CreateStructGEP(addr, i);
llvm::Value *elt = CreateCoercedLoad(
eltAddr,
+ RetTy,
unpaddedStruct ? unpaddedStruct->getElementType(unpaddedIndex++)
: unpaddedCoercionType,
*this);
@@ -5550,7 +5651,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
// In the simple case, just pass the coerced loaded value.
assert(NumIRArgs == 1);
llvm::Value *Load =
- CreateCoercedLoad(Src, ArgInfo.getCoerceToType(), *this);
+ CreateCoercedLoad(Src, I->Ty, ArgInfo.getCoerceToType(), *this);
if (CallInfo.isCmseNSCall()) {
// For certain parameter types, clear padding bits, as they may reveal
@@ -5611,6 +5712,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
Address eltAddr = Builder.CreateStructGEP(addr, i);
llvm::Value *elt = CreateCoercedLoad(
eltAddr,
+ I->Ty,
unpaddedStruct ? unpaddedStruct->getElementType(unpaddedIndex++)
: unpaddedCoercionType,
*this);
@@ -6105,7 +6207,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo...
[truncated]
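The comments in the CGCall.cpp hunks above describe the core constraint: with PFP, a struct's in-memory image differs from its by-value (register) image, so argument and return coercion cannot simply round-trip bytes through memory. An illustrative reading (the encode/decode framing is an assumption about how PFP fields are materialized):

struct S {
  virtual ~S();
  int *p;            // PFP field: memory holds an encoded form of p,
};                   //   the register image holds p itself
int *get(S s);       // s may arrive coerced into registers
// A byte-wise coercion would move the encoded form into the register image
// (or the raw pointer into memory) unchanged, so CreateCoercedLoad and
// CreateCoercedStore instead walk the fields and route each PFP field
// through EmitAddressOfPFPField, letting each side keep its representation.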
@llvm/pr-subscribers-clang Author: Peter Collingbourne (pcc) ChangesPointer field protection is a use-after-free vulnerability TODO:
Patch is 82.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133538.diff 52 Files Affected:
diff --git a/clang/include/clang/AST/ASTContext.h b/clang/include/clang/AST/ASTContext.h
index af8c49e99a7ce..abba83e1ff9c4 100644
--- a/clang/include/clang/AST/ASTContext.h
+++ b/clang/include/clang/AST/ASTContext.h
@@ -183,6 +183,12 @@ struct TypeInfoChars {
}
};
+struct PFPField {
+ CharUnits structOffset;
+ CharUnits offset;
+ FieldDecl *field;
+};
+
/// Holds long-lived AST nodes (such as types and decls) that can be
/// referred to throughout the semantic analysis of a file.
class ASTContext : public RefCountedBase<ASTContext> {
@@ -3618,6 +3624,22 @@ OPT_LIST(V)
StringRef getCUIDHash() const;
+ bool isPFPStruct(const RecordDecl *rec) const;
+ void findPFPFields(QualType Ty, CharUnits Offset,
+ std::vector<PFPField> &Fields, bool IncludeVBases) const;
+ bool hasPFPFields(QualType ty) const;
+ bool isPFPField(const FieldDecl *field) const;
+
+ /// Returns whether this record's PFP fields (if any) are trivially
+ /// relocatable (i.e. may be memcpy'd). This may also return true if the
+ /// record does not have any PFP fields, so it may be necessary for the caller
+ /// to check for PFP fields, e.g. by calling hasPFPFields().
+ bool arePFPFieldsTriviallyRelocatable(const RecordDecl *RD) const;
+
+ llvm::SetVector<const FieldDecl *> PFPFieldsWithEvaluatedOffset;
+ void recordMemberDataPointerEvaluation(const ValueDecl *VD);
+ void recordOffsetOfEvaluation(const OffsetOfExpr *E);
+
private:
/// All OMPTraitInfo objects live in this collection, one per
/// `pragma omp [begin] declare variant` directive.
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 0999d8065e9f5..3d26c2f001812 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -2460,6 +2460,12 @@ def CountedByOrNull : DeclOrTypeAttr {
let LangOpts = [COnly];
}
+def NoPointerFieldProtection : DeclOrTypeAttr {
+ let Spellings = [Clang<"no_field_protection">];
+ let Subjects = SubjectList<[Field], ErrorDiag>;
+ let Documentation = [Undocumented];
+}
+
def SizedBy : DeclOrTypeAttr {
let Spellings = [Clang<"sized_by">];
let Subjects = SubjectList<[Field], ErrorDiag>;
diff --git a/clang/include/clang/Basic/Features.def b/clang/include/clang/Basic/Features.def
index 05ce214935fad..d4ded24c5a87e 100644
--- a/clang/include/clang/Basic/Features.def
+++ b/clang/include/clang/Basic/Features.def
@@ -262,6 +262,9 @@ FEATURE(shadow_call_stack,
FEATURE(tls, PP.getTargetInfo().isTLSSupported())
FEATURE(underlying_type, LangOpts.CPlusPlus)
FEATURE(experimental_library, LangOpts.ExperimentalLibrary)
+FEATURE(pointer_field_protection,
+ LangOpts.getPointerFieldProtection() !=
+ LangOptions::PointerFieldProtectionKind::None)
// C11 features supported by other languages as extensions.
EXTENSION(c_alignas, true)
diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def
index 3879cc7942877..8eacb6e066007 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -501,6 +501,9 @@ LANGOPT(RelativeCXXABIVTables, 1, 0,
LANGOPT(OmitVTableRTTI, 1, 0,
"Use an ABI-incompatible v-table layout that omits the RTTI component")
+ENUM_LANGOPT(PointerFieldProtection, PointerFieldProtectionKind, 2, PointerFieldProtectionKind::None,
+ "Encode struct pointer fields to protect against UAF vulnerabilities")
+
LANGOPT(VScaleMin, 32, 0, "Minimum vscale value")
LANGOPT(VScaleMax, 32, 0, "Maximum vscale value")
diff --git a/clang/include/clang/Basic/LangOptions.h b/clang/include/clang/Basic/LangOptions.h
index e925e0f3b5d85..797a14038ba4b 100644
--- a/clang/include/clang/Basic/LangOptions.h
+++ b/clang/include/clang/Basic/LangOptions.h
@@ -365,6 +365,17 @@ class LangOptionsBase {
BKey
};
+ enum class PointerFieldProtectionKind {
+ /// Pointer field protection disabled
+ None,
+ /// Pointer field protection enabled, allocator does not tag heap
+ /// allocations.
+ Untagged,
+ /// Pointer field protection enabled, allocator is expected to tag heap
+ /// allocations.
+ Tagged,
+ };
+
enum class ThreadModelKind {
/// POSIX Threads.
POSIX,
diff --git a/clang/include/clang/Basic/TokenKinds.def b/clang/include/clang/Basic/TokenKinds.def
index 1bf9f43f80986..142e13b3ee784 100644
--- a/clang/include/clang/Basic/TokenKinds.def
+++ b/clang/include/clang/Basic/TokenKinds.def
@@ -542,6 +542,7 @@ TYPE_TRAIT_2(__is_pointer_interconvertible_base_of, IsPointerInterconvertibleBas
// Clang-only C++ Type Traits
TYPE_TRAIT_1(__is_trivially_relocatable, IsTriviallyRelocatable, KEYCXX)
+TYPE_TRAIT_1(__has_non_relocatable_fields, HasNonRelocatableFields, KEYCXX)
TYPE_TRAIT_1(__is_trivially_equality_comparable, IsTriviallyEqualityComparable, KEYCXX)
TYPE_TRAIT_1(__is_bounded_array, IsBoundedArray, KEYCXX)
TYPE_TRAIT_1(__is_unbounded_array, IsUnboundedArray, KEYCXX)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index a7fcb160d3867..c903e0554319e 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -2957,6 +2957,12 @@ defm experimental_omit_vtable_rtti : BoolFOption<"experimental-omit-vtable-rtti"
NegFlag<SetFalse, [], [CC1Option], "Do not omit">,
BothFlags<[], [CC1Option], " the RTTI component from virtual tables">>;
+def experimental_pointer_field_protection_EQ : Joined<["-"], "fexperimental-pointer-field-protection=">, Group<f_Group>,
+ Visibility<[ClangOption, CC1Option]>,
+ Values<"none,untagged,tagged">, NormalizedValuesScope<"LangOptions::PointerFieldProtectionKind">,
+ NormalizedValues<["None", "Untagged", "Tagged"]>,
+ MarshallingInfoEnum<LangOpts<"PointerFieldProtection">, "None">;
+
def fcxx_abi_EQ : Joined<["-"], "fc++-abi=">,
Group<f_clang_Group>, Visibility<[ClangOption, CC1Option]>,
HelpText<"C++ ABI to use. This will override the target C++ ABI.">;
diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index c9d1bea4c623a..c3cbfec2c93d3 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -79,6 +79,7 @@
#include "llvm/ADT/StringRef.h"
#include "llvm/Frontend/OpenMP/OMPIRBuilder.h"
#include "llvm/Support/Capacity.h"
+#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MD5.h"
@@ -14948,3 +14949,97 @@ bool ASTContext::useAbbreviatedThunkName(GlobalDecl VirtualMethodDecl,
ThunksToBeAbbreviated[VirtualMethodDecl] = std::move(SimplifiedThunkNames);
return Result;
}
+
+bool ASTContext::arePFPFieldsTriviallyRelocatable(const RecordDecl *RD) const {
+ if (getLangOpts().getPointerFieldProtection() ==
+ LangOptions::PointerFieldProtectionKind::Tagged)
+ return !isa<CXXRecordDecl>(RD) ||
+ cast<CXXRecordDecl>(RD)->hasTrivialDestructor();
+ return true;
+}
+
+bool ASTContext::isPFPStruct(const RecordDecl *rec) const {
+ if (getLangOpts().getPointerFieldProtection() !=
+ LangOptions::PointerFieldProtectionKind::None)
+ if (auto *cxxRec = dyn_cast<CXXRecordDecl>(rec))
+ return !cxxRec->isStandardLayout();
+ return false;
+}
+
+void ASTContext::findPFPFields(QualType Ty, CharUnits Offset,
+ std::vector<PFPField> &Fields,
+ bool IncludeVBases) const {
+ if (auto *AT = getAsConstantArrayType(Ty)) {
+ if (auto *ElemDecl = AT->getElementType()->getAsCXXRecordDecl()) {
+ const ASTRecordLayout &ElemRL = getASTRecordLayout(ElemDecl);
+ for (unsigned i = 0; i != AT->getSize(); ++i) {
+ findPFPFields(AT->getElementType(), Offset + i * ElemRL.getSize(),
+ Fields, true);
+ }
+ }
+ }
+ auto *Decl = Ty->getAsCXXRecordDecl();
+ if (!Decl)
+ return;
+ const ASTRecordLayout &RL = getASTRecordLayout(Decl);
+ for (FieldDecl *field : Decl->fields()) {
+ CharUnits fieldOffset =
+ Offset + toCharUnitsFromBits(RL.getFieldOffset(field->getFieldIndex()));
+ if (isPFPField(field))
+ Fields.push_back({Offset, fieldOffset, field});
+ findPFPFields(field->getType(), fieldOffset, Fields, true);
+ }
+ for (auto &Base : Decl->bases()) {
+ if (Base.isVirtual())
+ continue;
+ CharUnits BaseOffset =
+ Offset + RL.getBaseClassOffset(Base.getType()->getAsCXXRecordDecl());
+ findPFPFields(Base.getType(), BaseOffset, Fields, false);
+ }
+ if (IncludeVBases) {
+ for (auto &Base : Decl->vbases()) {
+ CharUnits BaseOffset =
+ Offset + RL.getVBaseClassOffset(Base.getType()->getAsCXXRecordDecl());
+ findPFPFields(Base.getType(), BaseOffset, Fields, false);
+ }
+ }
+}
+
+bool ASTContext::hasPFPFields(QualType ty) const {
+ std::vector<PFPField> pfpFields;
+ findPFPFields(ty, CharUnits::Zero(), pfpFields, true);
+ return !pfpFields.empty();
+}
+
+bool ASTContext::isPFPField(const FieldDecl *field) const {
+ if (!isPFPStruct(field->getParent()))
+ return false;
+ return field->getType()->isPointerType() &&
+ !field->hasAttr<NoPointerFieldProtectionAttr>();
+}
+
+void ASTContext::recordMemberDataPointerEvaluation(const ValueDecl *VD) {
+ if (getLangOpts().getPointerFieldProtection() ==
+ LangOptions::PointerFieldProtectionKind::None)
+ return;
+ auto *FD = dyn_cast<FieldDecl>(VD);
+ if (!FD)
+ FD = cast<FieldDecl>(cast<IndirectFieldDecl>(VD)->chain().back());
+ if (!isPFPField(FD))
+ return;
+ PFPFieldsWithEvaluatedOffset.insert(FD);
+}
+
+void ASTContext::recordOffsetOfEvaluation(const OffsetOfExpr *E) {
+ if (getLangOpts().getPointerFieldProtection() ==
+ LangOptions::PointerFieldProtectionKind::None ||
+ E->getNumComponents() == 0)
+ return;
+ OffsetOfNode Comp = E->getComponent(E->getNumComponents() - 1);
+ if (Comp.getKind() != OffsetOfNode::Field)
+ return;
+ FieldDecl *FD = Comp.getField();
+ if (!isPFPField(FD))
+ return;
+ PFPFieldsWithEvaluatedOffset.insert(FD);
+}
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 95da7b067b459..51f319fd26f20 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -14932,6 +14932,7 @@ bool IntExprEvaluator::VisitUnaryExprOrTypeTraitExpr(
}
bool IntExprEvaluator::VisitOffsetOfExpr(const OffsetOfExpr *OOE) {
+ Info.Ctx.recordOffsetOfEvaluation(OOE);
CharUnits Result;
unsigned n = OOE->getNumComponents();
if (n == 0)
diff --git a/clang/lib/AST/Type.cpp b/clang/lib/AST/Type.cpp
index 08798219c0b83..ca7ccc7f2bb5a 100644
--- a/clang/lib/AST/Type.cpp
+++ b/clang/lib/AST/Type.cpp
@@ -2852,7 +2852,9 @@ bool QualType::isTriviallyRelocatableType(const ASTContext &Context) const {
} else if (!BaseElementType->isObjectType()) {
return false;
} else if (const auto *RD = BaseElementType->getAsRecordDecl()) {
- return RD->canPassInRegisters();
+ return RD->canPassInRegisters() &&
+ (Context.arePFPFieldsTriviallyRelocatable(RD) ||
+ !Context.hasPFPFields(BaseElementType));
} else if (BaseElementType.isTriviallyCopyableType(Context)) {
return true;
} else {
diff --git a/clang/lib/AST/TypePrinter.cpp b/clang/lib/AST/TypePrinter.cpp
index 3982ca3b50604..1382b4a9edfd4 100644
--- a/clang/lib/AST/TypePrinter.cpp
+++ b/clang/lib/AST/TypePrinter.cpp
@@ -2096,6 +2096,9 @@ void TypePrinter::printAttributedAfter(const AttributedType *T,
case attr::ExtVectorType:
OS << "ext_vector_type";
break;
+ case attr::NoPointerFieldProtection:
+ OS << "no_field_protection";
+ break;
}
OS << "))";
}
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 7aa77e55dbfcc..9d824231d02cb 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -1298,7 +1298,8 @@ static llvm::Value *CoerceIntOrPtrToIntOrPtr(llvm::Value *Val,
/// This safely handles the case when the src type is smaller than the
/// destination type; in this situation the values of bits which not
/// present in the src are undefined.
-static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
+static llvm::Value *CreateCoercedLoad(Address Src, QualType SrcFETy,
+ llvm::Type *Ty,
CodeGenFunction &CGF) {
llvm::Type *SrcTy = Src.getElementType();
@@ -1306,6 +1307,57 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
if (SrcTy == Ty)
return CGF.Builder.CreateLoad(Src);
+ // Coercion directly through memory does not work if the structure has pointer
+ // field protection because the struct in registers has a different bit
+ // pattern to the struct in memory, so we must read the elements one by one
+ // and use them to form the coerced structure.
+ std::vector<PFPField> PFPFields;
+ CGF.getContext().findPFPFields(SrcFETy, CharUnits::Zero(), PFPFields, true);
+ if (!PFPFields.empty()) {
+ auto LoadCoercedField = [&](CharUnits Offset,
+ llvm::Type *FieldType) -> llvm::Value * {
+ if (!PFPFields.empty() && PFPFields[0].offset == Offset) {
+ auto fieldAddr = CGF.EmitAddressOfPFPField(Src, PFPFields[0]);
+ llvm::Value *FieldVal = CGF.Builder.CreateLoad(fieldAddr);
+ if (isa<llvm::IntegerType>(FieldType))
+ FieldVal = CGF.Builder.CreatePtrToInt(FieldVal, FieldType);
+ PFPFields.erase(PFPFields.begin());
+ return FieldVal;
+ }
+ auto FieldAddr = CGF.Builder
+ .CreateConstInBoundsByteGEP(
+ Src.withElementType(CGF.Int8Ty), Offset)
+ .withElementType(FieldType);
+ return CGF.Builder.CreateLoad(FieldAddr);
+ };
+ if (isa<llvm::IntegerType>(Ty) || isa<llvm::PointerType>(Ty)) {
+ auto Addr = CGF.EmitAddressOfPFPField(Src, PFPFields[0]);
+ llvm::Value *Val = CGF.Builder.CreateLoad(Addr);
+ if (isa<llvm::IntegerType>(Ty))
+ Val = CGF.Builder.CreatePtrToInt(Val, Ty);
+ return Val;
+ }
+ if (auto *AT = dyn_cast<llvm::ArrayType>(Ty)) {
+ auto *ET = AT->getElementType();
+ CharUnits wordSize = CGF.getContext().toCharUnitsFromBits(
+ CGF.CGM.getDataLayout().getTypeSizeInBits(ET));
+ CharUnits Offset = CharUnits::Zero();
+ llvm::Value *Val = llvm::UndefValue::get(AT);
+ for (unsigned i = 0; i != AT->getNumElements(); ++i, Offset += wordSize)
+ Val = CGF.Builder.CreateInsertValue(Val, LoadCoercedField(Offset, ET), i);
+ return Val;
+ }
+ auto *ST = cast<llvm::StructType>(Ty);
+ llvm::Value *Val = llvm::UndefValue::get(ST);
+ auto *SL = CGF.CGM.getDataLayout().getStructLayout(ST);
+ for (unsigned i = 0; i != ST->getNumElements(); ++i) {
+ CharUnits Offset = CharUnits::fromQuantity(SL->getElementOffset(i));
+ Val = CGF.Builder.CreateInsertValue(
+ Val, LoadCoercedField(Offset, ST->getElementType(i)), i);
+ }
+ return Val;
+ }
+
llvm::TypeSize DstSize = CGF.CGM.getDataLayout().getTypeAllocSize(Ty);
if (llvm::StructType *SrcSTy = dyn_cast<llvm::StructType>(SrcTy)) {
@@ -1374,7 +1426,9 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
return CGF.Builder.CreateLoad(Tmp);
}
-void CodeGenFunction::CreateCoercedStore(llvm::Value *Src, Address Dst,
+void CodeGenFunction::CreateCoercedStore(llvm::Value *Src,
+ QualType SrcFETy,
+ Address Dst,
llvm::TypeSize DstSize,
bool DstIsVolatile) {
if (!DstSize)
@@ -1395,6 +1449,52 @@ void CodeGenFunction::CreateCoercedStore(llvm::Value *Src, Address Dst,
}
}
+ // Coercion directly through memory does not work if the structure has pointer
+ // field protection because the struct passed by value has a different bit
+ // pattern to the struct in memory, so we must read the elements one by one
+ // and use them to form the coerced structure.
+ std::vector<PFPField> PFPFields;
+ getContext().findPFPFields(SrcFETy, CharUnits::Zero(), PFPFields, true);
+ if (!PFPFields.empty()) {
+ auto StoreCoercedField = [&](CharUnits Offset, llvm::Value *FieldVal) {
+ if (!PFPFields.empty() && PFPFields[0].offset == Offset) {
+ auto fieldAddr = EmitAddressOfPFPField(Dst, PFPFields[0]);
+ if (isa<llvm::IntegerType>(FieldVal->getType()))
+ FieldVal = Builder.CreateIntToPtr(FieldVal, VoidPtrTy);
+ Builder.CreateStore(FieldVal, fieldAddr);
+ PFPFields.erase(PFPFields.begin());
+ } else {
+ auto fieldAddr =
+ Builder
+ .CreateConstInBoundsByteGEP(Dst.withElementType(Int8Ty), Offset)
+ .withElementType(FieldVal->getType());
+ Builder.CreateStore(FieldVal, fieldAddr);
+ }
+ };
+
+ if (isa<llvm::IntegerType>(SrcTy) || isa<llvm::PointerType>(SrcTy)) {
+ if (isa<llvm::IntegerType>(SrcTy))
+ Src = Builder.CreateIntToPtr(Src, VoidPtrTy);
+ auto Addr = EmitAddressOfPFPField(Dst, PFPFields[0]);
+ Builder.CreateStore(Src, Addr);
+ } else if (auto *at = dyn_cast<llvm::ArrayType>(SrcTy)) {
+ auto *et = at->getElementType();
+ CharUnits wordSize = getContext().toCharUnitsFromBits(
+ CGM.getDataLayout().getTypeSizeInBits(et));
+ CharUnits Offset = CharUnits::Zero();
+ for (unsigned i = 0; i != at->getNumElements(); ++i, Offset += wordSize)
+ StoreCoercedField(Offset, Builder.CreateExtractValue(Src, i));
+ } else {
+ auto *ST = cast<llvm::StructType>(SrcTy);
+ auto *SL = CGM.getDataLayout().getStructLayout(ST);
+ for (unsigned i = 0; i != ST->getNumElements(); ++i) {
+ CharUnits Offset = CharUnits::fromQuantity(SL->getElementOffset(i));
+ StoreCoercedField(Offset, Builder.CreateExtractValue(Src, i));
+ }
+ }
+ return;
+ }
+
if (SrcSize.isScalable() || SrcSize <= DstSize) {
if (SrcTy->isIntegerTy() && Dst.getElementType()->isPointerTy() &&
SrcSize == CGM.getDataLayout().getTypeAllocSize(Dst.getElementType())) {
@@ -3347,7 +3447,7 @@ void CodeGenFunction::EmitFunctionProlog(const CGFunctionInfo &FI,
auto AI = Fn->getArg(FirstIRArg);
AI->setName(Arg->getName() + ".coerce");
CreateCoercedStore(
- AI, Ptr,
+ AI, Ty, Ptr,
llvm::TypeSize::getFixed(
getContext().getTypeSizeInChars(Ty).getQuantity() -
ArgI.getDirectOffset()),
@@ -3969,7 +4069,7 @@ void CodeGenFunction::EmitFunctionEpilog(const CGFunctionInfo &FI,
// If the value is offset in memory, apply the offset now.
Address V = emitAddressAtOffset(*this, ReturnValue, RetAI);
- RV = CreateCoercedLoad(V, RetAI.getCoerceToType(), *this);
+ RV = CreateCoercedLoad(V, RetTy, RetAI.getCoerceToType(), *this);
}
// In ARC, end functions that return a retainable type with a call
@@ -4020,6 +4120,7 @@ void CodeGenFunction::EmitFunctionEpilog(const CGFunctionInfo &FI,
auto eltAddr = Builder.CreateStructGEP(addr, i);
llvm::Value *elt = CreateCoercedLoad(
eltAddr,
+ RetTy,
unpaddedStruct ? unpaddedStruct->getElementType(unpaddedIndex++)
: unpaddedCoercionType,
*this);
@@ -5550,7 +5651,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
// In the simple case, just pass the coerced loaded value.
assert(NumIRArgs == 1);
llvm::Value *Load =
- CreateCoercedLoad(Src, ArgInfo.getCoerceToType(), *this);
+ CreateCoercedLoad(Src, I->Ty, ArgInfo.getCoerceToType(), *this);
if (CallInfo.isCmseNSCall()) {
// For certain parameter types, clear padding bits, as they may reveal
@@ -5611,6 +5712,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
Address eltAddr = Builder.CreateStructGEP(addr, i);
llvm::Value *elt = CreateCoercedLoad(
eltAddr,
+ I->Ty,
unpaddedStruct ? unpaddedStruct->getElementType(unpaddedIndex++)
: unpaddedCoercionType,
*this);
@@ -6105,7 +6207,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo...
[truncated]
You can test this locally with the following command:

git-clang-format --diff HEAD~1 HEAD --extensions ,h,cpp -- clang/test/CodeGen/pfp-attribute-disable.cpp clang/test/CodeGen/pfp-load-store.cpp clang/test/CodeGen/pfp-memcpy.cpp clang/test/CodeGen/pfp-null-init.cpp clang/test/CodeGen/pfp-struct-gep.cpp libcxx/include/__force_nonstandard_layout clang/include/clang/AST/ASTContext.h clang/include/clang/Basic/LangOptions.h clang/lib/AST/ASTContext.cpp clang/lib/AST/ExprConstant.cpp clang/lib/AST/TypePrinter.cpp clang/lib/CodeGen/CGBuiltin.cpp clang/lib/CodeGen/CGCall.cpp clang/lib/CodeGen/CGClass.cpp clang/lib/CodeGen/CGExpr.cpp clang/lib/CodeGen/CGExprAgg.cpp clang/lib/CodeGen/CGExprCXX.cpp clang/lib/CodeGen/CGExprConstant.cpp clang/lib/CodeGen/CodeGenFunction.cpp clang/lib/CodeGen/CodeGenFunction.h clang/lib/CodeGen/CodeGenModule.cpp clang/lib/CodeGen/CodeGenModule.h clang/lib/CodeGen/ItaniumCXXABI.cpp clang/lib/CodeGen/MicrosoftCXXABI.cpp clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Sema/SemaDeclAttr.cpp clang/lib/Sema/SemaTypeTraits.cpp clang/test/CodeGenCXX/trivial_abi.cpp libcxx/include/__config libcxx/include/__functional/function.h libcxx/include/__memory/shared_ptr.h libcxx/include/__memory/unique_ptr.h libcxx/include/__tree libcxx/include/__type_traits/is_trivially_relocatable.h libcxx/include/__vector/vector.h libcxx/include/typeinfo libcxx/test/libcxx/gdb/gdb_pretty_printer_test.sh.cpp libcxxabi/include/__cxxabi_config.h libcxxabi/src/private_typeinfo.h llvm/include/llvm/Analysis/PtrUseVisitor.h llvm/include/llvm/Transforms/Utils/Local.h llvm/lib/Analysis/PtrUseVisitor.cpp llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp llvm/lib/Transforms/Scalar/SROA.cpp llvm/lib/Transforms/Utils/Local.cpp llvm/lib/Transforms/Utils/SimplifyCFG.cpp

View the diff from clang-format here.

diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 980c11876..68c6cbd98 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -1320,8 +1320,7 @@ static llvm::Value *CoerceIntOrPtrToIntOrPtr(llvm::Value *Val, llvm::Type *Ty,
/// destination type; in this situation the values of bits which not
/// present in the src are undefined.
static llvm::Value *CreateCoercedLoad(Address Src, QualType SrcFETy,
- llvm::Type *Ty,
- CodeGenFunction &CGF) {
+ llvm::Type *Ty, CodeGenFunction &CGF) {
llvm::Type *SrcTy = Src.getElementType();
// If SrcTy and Ty are the same, just do a load.
@@ -1365,7 +1364,8 @@ static llvm::Value *CreateCoercedLoad(Address Src, QualType SrcFETy,
CharUnits Offset = CharUnits::Zero();
llvm::Value *Val = llvm::PoisonValue::get(AT);
for (unsigned i = 0; i != AT->getNumElements(); ++i, Offset += wordSize)
- Val = CGF.Builder.CreateInsertValue(Val, LoadCoercedField(Offset, ET), i);
+ Val =
+ CGF.Builder.CreateInsertValue(Val, LoadCoercedField(Offset, ET), i);
return Val;
}
auto *ST = cast<llvm::StructType>(Ty);
@@ -1450,10 +1450,8 @@ static llvm::Value *CreateCoercedLoad(Address Src, QualType SrcFETy,
return CGF.Builder.CreateLoad(Tmp);
}
-void CodeGenFunction::CreateCoercedStore(llvm::Value *Src,
- QualType SrcFETy,
- Address Dst,
- llvm::TypeSize DstSize,
+void CodeGenFunction::CreateCoercedStore(llvm::Value *Src, QualType SrcFETy,
+ Address Dst, llvm::TypeSize DstSize,
bool DstIsVolatile) {
if (!DstSize)
return;
@@ -4172,8 +4170,7 @@ void CodeGenFunction::EmitFunctionEpilog(const CGFunctionInfo &FI,
auto eltAddr = Builder.CreateStructGEP(addr, i);
llvm::Value *elt = CreateCoercedLoad(
- eltAddr,
- RetTy,
+ eltAddr, RetTy,
unpaddedStruct ? unpaddedStruct->getElementType(unpaddedIndex++)
: unpaddedCoercionType,
*this);
@@ -5744,8 +5741,7 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
continue;
Address eltAddr = Builder.CreateStructGEP(addr, i);
llvm::Value *elt = CreateCoercedLoad(
- eltAddr,
- I->Ty,
+ eltAddr, I->Ty,
unpaddedStruct ? unpaddedStruct->getElementType(unpaddedIndex++)
: unpaddedCoercionType,
*this);
diff --git a/clang/lib/CodeGen/CGClass.cpp b/clang/lib/CodeGen/CGClass.cpp
index 618f2f070..3687d9dda 100644
--- a/clang/lib/CodeGen/CGClass.cpp
+++ b/clang/lib/CodeGen/CGClass.cpp
@@ -669,7 +669,8 @@ static void EmitMemberInitializer(CodeGenFunction &CGF,
QualType BaseElementTy = CGF.getContext().getBaseElementType(Array);
CXXConstructExpr *CE = dyn_cast<CXXConstructExpr>(MemberInit->getInit());
if (BaseElementTy.isPODType(CGF.getContext()) ||
- (CE && isMemcpyEquivalentSpecialMember(CGF.CGM, CE->getConstructor()))) {
+ (CE &&
+ isMemcpyEquivalentSpecialMember(CGF.CGM, CE->getConstructor()))) {
unsigned SrcArgIndex =
CGF.CGM.getCXXABI().getSrcArgforCopyCtor(Constructor, Args);
llvm::Value *SrcPtr
diff --git a/libcxx/include/__type_traits/is_trivially_relocatable.h b/libcxx/include/__type_traits/is_trivially_relocatable.h
index df77af200..dfcb6dd3b 100644
--- a/libcxx/include/__type_traits/is_trivially_relocatable.h
+++ b/libcxx/include/__type_traits/is_trivially_relocatable.h
@@ -37,10 +37,9 @@ struct __libcpp_is_trivially_relocatable : is_trivially_copyable<_Tp> {};
// __trivially_relocatable on libc++'s builtin types does not currently return the right answer with PFP.
#if !__has_feature(pointer_field_protection)
template <class _Tp>
-struct __libcpp_is_trivially_relocatable<_Tp,
- __enable_if_t<is_same<_Tp, typename _Tp::__trivially_relocatable>::value
- __enable_if_t<is_same<_Tp, typename _Tp::__trivially_relocatable>::value> >
- : true_type {};
+ struct __libcpp_is_trivially_relocatable < _Tp,
+ __enable_if_t<is_same<_Tp, typename _Tp::__trivially_relocatable>::value
+ __enable_if_t<is_same<_Tp, typename _Tp::__trivially_relocatable>::value> > : true_type {};
#endif
_LIBCPP_END_NAMESPACE_STD
diff --git a/libcxxabi/src/private_typeinfo.h b/libcxxabi/src/private_typeinfo.h
index a3bc0bffd..c472bf076 100644
--- a/libcxxabi/src/private_typeinfo.h
+++ b/libcxxabi/src/private_typeinfo.h
@@ -145,7 +145,7 @@ public:
// Has one non-virtual public base class at offset zero
class _LIBCXXABI_TYPE_VIS __si_class_type_info : public __class_type_info {
public:
- _LIBCXXABI_NO_PFP const __class_type_info *__base_type;
+ _LIBCXXABI_NO_PFP const __class_type_info* __base_type;
_LIBCXXABI_HIDDEN virtual ~__si_class_type_info();
@@ -204,7 +204,7 @@ public:
class _LIBCXXABI_TYPE_VIS __pbase_type_info : public __shim_type_info {
public:
unsigned int __flags;
- _LIBCXXABI_NO_PFP const __shim_type_info *__pointee;
+ _LIBCXXABI_NO_PFP const __shim_type_info* __pointee;
enum __masks {
__const_mask = 0x1,
@@ -245,7 +245,7 @@ public:
class _LIBCXXABI_TYPE_VIS __pointer_to_member_type_info
: public __pbase_type_info {
public:
- _LIBCXXABI_NO_PFP const __class_type_info *__context;
+ _LIBCXXABI_NO_PFP const __class_type_info* __context;
_LIBCXXABI_HIDDEN virtual ~__pointer_to_member_type_info();
_LIBCXXABI_HIDDEN virtual bool can_catch(const __shim_type_info *,
diff --git a/llvm/lib/Analysis/PtrUseVisitor.cpp b/llvm/lib/Analysis/PtrUseVisitor.cpp
index b34f380b7..d9e9e8d20 100644
--- a/llvm/lib/Analysis/PtrUseVisitor.cpp
+++ b/llvm/lib/Analysis/PtrUseVisitor.cpp
@@ -21,9 +21,9 @@ void detail::PtrUseVisitorBase::enqueueUsers(Value &I) {
for (Use &U : I.uses()) {
if (VisitedUses.insert(&U).second) {
UseToVisit NewU = {
- UseToVisit::UseAndIsOffsetKnownPair(&U, IsOffsetKnown),
- Offset,
- ProtectedField,
+ UseToVisit::UseAndIsOffsetKnownPair(&U, IsOffsetKnown),
+ Offset,
+ ProtectedField,
};
Worklist.push_back(std::move(NewU));
}
diff --git a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
index 0961494d3..52e607722 100644
--- a/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
+++ b/llvm/lib/CodeGen/PreISelIntrinsicLowering.cpp
@@ -471,7 +471,7 @@ bool expandProtectedFieldPtr(Function &Intr) {
FunctionCallee EmuAuthIntr = M.getOrInsertFunction("__emupac_autda", EmuFnTy);
auto CreateSign = [&](IRBuilder<> &B, Value *Val, Value *Disc,
- OperandBundleDef DSBundle) {
+ OperandBundleDef DSBundle) {
Function *F = B.GetInsertBlock()->getParent();
Attribute FSAttr = F->getFnAttribute("target-features");
if (FSAttr.isValid() && FSAttr.getValueAsString().contains("+pauth"))
@@ -480,7 +480,7 @@ bool expandProtectedFieldPtr(Function &Intr) {
};
auto CreateAuth = [&](IRBuilder<> &B, Value *Val, Value *Disc,
- OperandBundleDef DSBundle) {
+ OperandBundleDef DSBundle) {
Function *F = B.GetInsertBlock()->getParent();
Attribute FSAttr = F->getFnAttribute("target-features");
if (FSAttr.isValid() && FSAttr.getValueAsString().contains("+pauth"))
@@ -622,8 +622,7 @@ bool expandProtectedFieldPtr(Function &Intr) {
} else if (auto *SI = dyn_cast<StoreInst>(I)) {
IRBuilder<> B(SI);
auto *FieldAddr = SI->getPointerOperand();
- auto *SIValInt =
- B.CreatePtrToInt(SI->getValueOperand(), B.getInt64Ty());
+ auto *SIValInt = B.CreatePtrToInt(SI->getValueOperand(), B.getInt64Ty());
Value *Sign;
switch (Encoding) {
case PointerEncoding::Rotate:
@@ -688,7 +687,7 @@ bool expandProtectedFieldPtr(Function &Intr) {
return true;
}
-}
+} // namespace
bool PreISelIntrinsicLowering::lowerIntrinsics(Module &M) const {
bool Changed = false;
diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index b69b511cc..d5704695b 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -2278,7 +2278,8 @@ static void emitAddress(MCStreamer &Streamer, MCRegister Reg,
.addImm(0),
STI);
} else {
- auto *SymRef = MCSymbolRefExpr::create(Val.getAddSym(), Streamer.getContext());
+ auto *SymRef =
+ MCSymbolRefExpr::create(Val.getAddSym(), Streamer.getContext());
Streamer.emitInstruction(
MCInstBuilder(AArch64::ADRP)
.addReg(Reg)
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index d42ba00ca..ff8d54d39 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -534,7 +534,8 @@ class Slice {
public:
Slice() = default;
- Slice(uint64_t BeginOffset, uint64_t EndOffset, Use *U, bool IsSplittable, Value *ProtectedField)
+ Slice(uint64_t BeginOffset, uint64_t EndOffset, Use *U, bool IsSplittable,
+ Value *ProtectedField)
: BeginOffset(BeginOffset), EndOffset(EndOffset),
UseAndIsSplittable(U, IsSplittable), ProtectedField(ProtectedField) {}
@@ -1077,7 +1078,8 @@ private:
EndOffset = AllocSize;
}
- AS.Slices.push_back(Slice(BeginOffset, EndOffset, U, IsSplittable, ProtectedField));
+ AS.Slices.push_back(
+ Slice(BeginOffset, EndOffset, U, IsSplittable, ProtectedField));
}
void visitBitCastInst(BitCastInst &BC) {
|
✅ With the latest revision this PR passed the undef deprecator.
Pointer field protection is a use-after-free vulnerability mitigation that works by changing how data structures' pointer fields are stored in memory. For more information, see the RFC: https://discourse.llvm.org/t/rfc-structure-protection-a-family-of-uaf-mitigation-techniques/85555 TODO: - Fix test failure. - Add more tests. - Add documentation. Pull Request: llvm#133538
Some comments from libc++'s perspective:
- As you mentioned, you need more tests.
- I think the current header-only libc++ change will not work due to ABI changes. See the review comment for std::vector.
- In libc++ we have a pre-commit CI, so this feature should be tested in our CI; if you need help with that, best reach out to us on Discord. However, I think we first need to look at how we can properly integrate this into libc++.
I'm not fond of this patch since it mixes LLVM, Clang, and libc++(abi) code. This makes it hard to determine when all reviewer groups are satisfied. Is there a reason why the libc++ and the libc++abi part can't be in two separate patches?
(I don't know whether it makes sense to keep the LLVM & Clang part in one patch.)
@@ -1215,6 +1215,29 @@ typedef __char32_t char32_t;
# define _LIBCPP_HAS_EXPLICIT_THIS_PARAMETER 0
# endif

# if __has_feature(pointer_field_protection)
Since this feature is an ABI break, it will cause ODR violations in these structs. I think we should ABI-tag this in _LIBCPP_ODR_SIGNATURE.
Makes sense, will do.
libcxx/include/__config (Outdated)
_LIBCPP_BEGIN_NAMESPACE_STD
_LIBCPP_DIAGNOSTIC_PUSH
_LIBCPP_CLANG_DIAGNOSTIC_IGNORED("-Winaccessible-base")
class __force_nonstandard_layout_base1 {};
class __force_nonstandard_layout_base2 : __force_nonstandard_layout_base1 {};
class __force_nonstandard_layout : __force_nonstandard_layout_base1, __force_nonstandard_layout_base2 {};
_LIBCPP_DIAGNOSTIC_POP
_LIBCPP_END_NAMESPACE_STD
This should be done (conditionally?) in a different header. If we add _LIBCPP_BEGIN_NAMESPACE_STD to this header, it suddenly breaks this.
Correct, I later found that adding this caused some check-libcxx test failures with my feature enabled, so I ended up splitting this into a separate header.
libcxx/include/__vector/vector.h (Outdated)
@@ -84,7 +84,7 @@ _LIBCPP_PUSH_MACROS
_LIBCPP_BEGIN_NAMESPACE_STD

template <class _Tp, class _Allocator /* = allocator<_Tp> */>
-class _LIBCPP_TEMPLATE_VIS vector {
+class _LIBCPP_TEMPLATE_VIS vector _LIBCPP_MAYBE_FORCE_NONSTANDARD_LAYOUT {
std::vector is used in the libc++ dylib across the ABI boundary. So this means you can only use pointer_field_protection when the dylib is built with pointer_field_protection.
The intent would be that the user builds libc++.a with pointer_field_protection and statically links it into their binary. Because of the changes to pointer representation there are numerous other ABI incompatibilities. This feature is incompatible with dynamic linking, so we may want to skip building the dynamic library if the feature is enabled.
in libc++ we have a pre-commit CI so this feature should be tested in our CI, if you need help with that, best reach out to us on Discord. However I think we first need to look at how we can properly integrate this in libc++.
Sure, I will get in touch once this is closer to landing (everything else in Structure Protection needs to land first). So far I've been running the libc++ test suite manually with some hand written cfg files but it would be good to get everything hooked up so that the tests can run in CI as well.
I'm not fond of this patch since it mixes LLVM, Clang, and libc++(abi) code. This makes it hard to determine when all reviewer groups are satisfied. Is there a reason why the libc++ and the libc++abi part can't be in two separate patches?
No problem, I'll split up the libc++ and libc++abi parts into separate patches the next time I upload.
(I don't know whether it makes sense to keep the LLVM & Clang part in one patch.)
That would make sense, the changes that add the new LLVM intrinsic can be made a prerequisite of the Clang changes.
SGTM! Looking at your replies I think we indeed need to add a new configuration for this feature, much like building without RTTI or exceptions.
Thanks!
Thoughts:

- This should be opt-in on a field or struct granularity, not just a global behavior.
- In the RFC I think you mentioned not applying PFP to C types, but I'm unsure how you're deciding what is a C type?
- There are a lot of special cases being added in places that should not need to be aware of PFP - the core type queries should be returning correct values for types containing PFP fields.
- A lot of the "this read/write/copy/init special work" exactly aligns with the constraints of pointer auth, and I think the overall implementation could be made better by introducing the concept of a HardenedPointerQualifier or similar, that could be either PointerAuthQualifier or PFPQualifier. In principle this could even just be treated as a different PointerAuth key set.
That approach would also help with some of the problems you have with offsetof and pointers to PFP fields - the thing that causes the problems with the pointer to a PFP field is that the PFP schema for a given field is in reality part of the type, so given
struct A {
void *field1;
void *field2;
};
It looks like you are trying to maintain the idea that A::field1 and A::field2 have the same type, which makes this code valid according to the type system:
void f(void** obj);
void g(A *a) {
f(&a->field1);
f(&a->field2);
}
but the real types are not the same. The semantic equivalent under pointer auth is something akin to
struct A {
void * __ptrauth(1, 1, hash("A::field1")) field1;
void * __ptrauth(1, 1, hash("A::field2")) field2;
};
Which makes it very clear that the types are different, and I think trying to pretend they are not is part of what is causing problems.
The more I think about it, the more I feel that this could be implemented almost entirely on top of the existing pointer auth work.
Currently for pointer auth there's a set of ARM keys specified, and I think you just need to create a different set, PFP_Keys or similar, and set up the appropriate schemas (basically null schemas for all the existing cases), and add a new schema: DefaultFieldSchema or similar, and apply that schema to every pointer field in an object unless there's an explicit qualifier applied already.
Then it would in principle just be a matter of adding the appropriate logic to the ptrauth intrinsics in llvm.
Now there are some moderately large caveats to that idea
- the existing ptrauth semantics simply say that any struct containing address discriminated values is no longer a POD type and so gets copied through the stack, which is something you're trying to avoid, so you'd still need to keep that logic (though as I said, that should be done by generating and then calling the functions to do so rather than regenerating the entire copy repeatedly).
- the way that pointer auth is currently implemented assumes that there is only a single pointer auth ABI active, so you wouldn't be able to trivially have both pointer auth and PFP live at the same time. That's what makes me think having a SecuredPointer abstraction might be a better approach, but it might also not be too difficult to update the PointerAuthQualifier to include the ABI being used -- I'm really not sure.
@@ -2513,6 +2513,12 @@ def CountedByOrNull : DeclOrTypeAttr {
  let LangOpts = [COnly];
}

def NoPointerFieldProtection : DeclOrTypeAttr {
This is an ABI break so I don't think it can reasonably be on by default for all structs - we can already see annotations in libc++, and they would be needed on every single struct field.
We can imagine a new platform where this would be the platform ABI, but for every other case it is functionally unusable.
I would recommend attributes to permit opt in and opt out on a per-struct basis, and CLI flags to select the default behavior.
There are numerous circumstances where the C++ ABI is distinct from the platform ABI and is allowed to change. For example, most Chromium-based web browsers ship with a statically linked libc++, and the same is true for many Android apps and server-side software at Google. In the intended usage model, all of libc++ would be built with a -fexperimental-pointer-field-protection= flag consistent with the rest of the program.
The libc++abi opt-outs that I added are only for opting out pointer fields of RTTI structs that are generated by the compiler. In principle, we could teach the compiler to sign those pointer fields which would let us remove the opt-outs, but there is a lower ROI for subjecting those fields to PFP because the RTTI structs are not typically dynamically allocated.
We may consider adding an attribute to allow opting in in the case where the flag is not passed. But I think we should do that in a followup. We would need to carefully consider how that would interact with the other aspects of the opt-in solution, such as the pointer qualifiers.
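For reference, a sketch of how the per-field opt-out added in this patch's Attr.td hunk would look in source, assuming the Clang spelling no_field_protection (the struct itself is hypothetical, and whether a given struct is eligible for PFP at all is the subject of the standard-layout discussion later in this thread):

struct List {
  List *next;  // signed in memory when PFP is enabled for this struct
  [[clang::no_field_protection]] List *raw_next;  // opted out: stored as a plain pointer
};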
@@ -362,6 +362,17 @@ class LangOptionsBase {
  BKey
};

enum class PointerFieldProtectionKind {
I'm not sure I like this being solely a global decision - it makes custom allocators much harder, and it makes it hard for allocators that have different policies, e.g.:
struct ImportantStruct {
void *operator new(size_t) { /*tagged allocator*/ }
};
struct BoringStruct {
...
};
struct SomeOtherStruct {
ImportantStruct *important; // should be tagged pointer
BoringStruct *lessImportant; // should be a non-tagged pointer
};
This would again require a struct attribute to indicate the pointer policy
I think that allowing this level of customization should be implemented as part of the separate opt-in solution (e.g. it may be a property of the qualifier). It is generally a reasonable assumption that operator new is the same for all types. In the unlikely case that it isn't, we aren't really doing anything wrong by using the malloc-derived decision here. It just means that we lose some entropy from the type or the pointer tag.
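To illustrate, the kind of per-field spelling such an opt-in qualifier might provide (the __pfp spelling and its argument are entirely hypothetical, by analogy with __ptrauth):

struct SomeOtherStruct {
  ImportantStruct *__pfp(tagged) important;  // hypothetical: request the tagged-allocator policy
  BoringStruct *lessImportant;               // default policy from the global flag
};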
@@ -544,6 +544,7 @@ TYPE_TRAIT_2(__is_pointer_interconvertible_base_of, IsPointerInterconvertibleBas
#include "clang/Basic/TransformTypeTraits.def"

// Clang-only C++ Type Traits
TYPE_TRAIT_1(__has_non_relocatable_fields, HasNonRelocatableFields, KEYCXX)
I do not like this -- there are existing functions for handling whether an object is trivially copyable, relocatable, etc. that should just be updated to correctly report the traits of impacted types. See the various Sema functions querying isAddressDiscriminated(). That function currently only cares about pointer auth, but it would not be unreasonable to have it consider other properties, like PFP.

In fact a lot of the behavior you're wanting in this feature is essentially covered by the PointerAuthQualifier. It would perhaps be reasonable to generalize that to something akin to SecurePointerQualifier that could be either a PAQ or PFP qualifier. Then most of the current semantic queries made to support pointer auth could just be a single shared implementation.
I don't like it either. As mentioned in the main comment, I hope to get rid of this when the P2786 implementation is finalized.
Sharing the qualifiers with PAuth ABI would be a good idea for the separate opt-in solution.
I think you misunderstand my intent (I tried to suggest this in one of the comments or the global comment):

What I'm suggesting is that the qualifier would be implicitly added at the point of declaration; the overarching behavior you're after would be achieved that way. From your other comment about how you determine "C" strictness (just keying off standard layout), that might be challenging, as it means you can't really finalize the types of fields until the struct definition is completed, and I don't know if there's anything else in C++ that behaves that way (@cor3ntin might have an idea).
However I'm also not sure how reasonable it is to exclude standard layout types from this protection - what exactly is the rationale for that? (if the reason is a belief that substituting C++ libraries is fine but C libraries is not, I suspect that's not actually very sound, and silently ignoring some structs is similarly a hazard - it can result in "trivial" refactoring invisibly reducing security guarantees).
Regarding P2786: the exact same issues apply to pointer authentication today - trying to solve them separately and outside of the existing triviality queries and operations is not the correct approach.
I'm ok with the logic being distinct from pointer auth, though I would regret that as I believe the shared semantics warrant a shared implementation, but there is no reason for them to be distinct from any of the type queries.
If you want the types to be trivially relocatable - which is not a requirement, it's a performance optimization - then ensure the existing queries and operations handle that correctly. Otherwise you run the risk of other code in clang using the existing trivial behavior queries, and expecting them to be correct, when they are not.
I think you misunderstand my intent (I tried to suggest in one the comments or the global comment):
What I'm suggesting is that the qualifier would be implicitly added at the point of declaration, the overarching behavior you're after would be achieved that way.
No, I understood that part. Please refer to the main comment where I explained why that does not work.
However I'm also not sure how reasonable it is to exclude standard layout types from this protection - what exactly is the rationale for that?
Please refer to the main reply.
silently ignoring some structs is similarly a hazard - it can result in "trivial" refactoring invisibly reducing security guarantees
While it is true that a refactoring has the potential to remove protection from a pointer, the same is also true for adding protection to a pointer (usually code is modified by adding things to objects, and the rules of the language are such that adding something to an object can usually only make it non-standard-layout or non-trivially-copyable). See also "statistical perspective" from the main reply.
Regarding P2786: the exact same issues apply to pointer authentication today - trying to solve them separately and outside of the existing triviality queries and operations is not the correct approach.
The results of the triviality queries are unchanged because PFP does not change the triviality from a language rule perspective. If a type was trivially copyable without PFP, it remains trivially copyable with PFP (i.e. the program is still allowed to memcpy the object, because the pointers are not address discriminated). If a type was trivially relocatable without PFP, it remains trivially relocatable with PFP (i.e. std::trivially_relocate still works because of changes that I will make (not uploaded yet) to __builtin_trivially_relocate in CGBuiltin.cpp). Divergence from the standard C++ triviality rules creates an incompatible dialect of C++, which I think we should avoid.
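As a small illustration of that claim (a sketch; the type is hypothetical and made non-standard-layout via mixed access specifiers so that it would be PFP-eligible under the layout rules discussed in this thread):

#include <type_traits>

struct Node {
  void *ptr;  // stored signed in memory under PFP
private:
  int key;    // mixed access control makes Node non-standard-layout
};

// The language-level queries are unchanged: PFP pointers are not address
// discriminated, so memcpy of a live object still yields a valid object.
static_assert(std::is_trivially_copyable<Node>::value,
              "trivially copyable with or without PFP");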
The copying inhibitions I added in clang/lib/CodeGen are for cases such as those where the object is not trivially copyable according to the language rules but Clang figures out that it can memcpy the object anyway without causing an observable violation of the rules. In other words, Clang is already going outside of the existing triviality queries for optimization purposes, and I am adjusting the conditions under which it does so to account for PFP.
If you want the types to be trivially relocatable - which is not a requirement, it's a performance optimization - then ensure the existing queries and operations handle that correctly. Otherwise you run the risk of other code in clang using the existing trivial behavior queries, and expecting them to be correct, when they are not.
On trivial relocatability, see the main reply.
Code in Clang that uses the standards-defined triviality queries will continue to work. Non-standard queries such as isMemcpyEquivalentSpecialMember and isMemcpyableField need to be adjusted, as previously mentioned. It's possible that someone could add a new non-standard query that breaks PFP (and could also break PAuth ABI, because it also uses address discriminated pointers), but that's a bug like any other, for which our traditional solution is to rely on testing to flush out any regressions. The patch series I posted for review has been tested as described in the RFC, and the remaining test failures have been assessed to be likely to be bugs in the programs being compiled, so we have a reasonable level of confidence in the implementation.
@@ -3011,6 +3011,12 @@ defm experimental_omit_vtable_rtti : BoolFOption<"experimental-omit-vtable-rtti",
  NegFlag<SetFalse, [], [CC1Option], "Do not omit">,
  BothFlags<[], [CC1Option], " the RTTI component from virtual tables">>;

def experimental_pointer_field_protection_EQ : Joined<["-"], "fexperimental-pointer-field-protection=">, Group<f_Group>,
As above, I'm not super sure that I like this being a global compile-time setting, but maybe it makes sense to be.
(See other reply.)
// pattern to the struct in memory, so we must read the elements one by one
// and use them to form the coerced structure.
std::vector<PFPField> PFPFields;
CGF.getContext().findPFPFields(SrcFETy, CharUnits::Zero(), PFPFields, true);
This should be done by generating a copy thunk and calling that, rather than performing this logic repeatedly, and more importantly regenerating it repeatedly. If the update function is small enough it will be inlined, and if it's not you probably do not want a large number of copies of the code.
Your intent appears to be to maintain the register transfer ABI for POD types, despite the objects logically no longer being POD types - which isn't unreasonable given that the current design means that all structs containing pointers would become non-POD, which is probably suboptimal - but if you are doing so, because a large enough struct can spill, I'd recommend you adopt a model where the PFP fields in a return or parameter copy of the object are tagged specifically as transfer fields, something akin to hash(argument position || PFP offset || some_this_is_a_transfer_constant).
This code is used to pass and return small structs by value in registers (aka direct pass/return). In this case, we typically have 1-2 fields (the exact conditions for passing a struct in registers are up to the platform ABI), so it is unlikely to be useful to outline it. Outlining would be counterproductive from an optimization standpoint because it would inhibit the SROA pass until the function is inlined. We would also need to create a separate ABI for passing and returning the struct to the copy thunk function, because CreateCoercedLoad is how we would normally pass the struct to the function.
When a larger struct is passed or returned by value, the ABI typically requires a pointer to be passed (aka indirect pass/return). In that case, this code is not executed, and we pass/return using a pointer pointing to a struct whose fields are signed in the same way as any other struct. I don't think we should sign the fields in an indirectly passed/returned struct differently, as that would prevent direct use of the struct in the caller or callee.
It still doesn't make things worse than generating (and caching) a function; if the function is small, the inliner handles it - though here we're getting into: you should just be generating the correct copy, move, etc. functions for the type, and all the other behaviors fall out of that automatically.
I still don't understand why you are insisting on this. From your use of the words "caching" and "repeatedly", maybe it's because you think that it would be too expensive to execute this code every time we generate code to pass or return a struct by value? I think that it would be a premature optimization (i.e. optimization of time spent compiling) to outline the function, and that would also substantially increase complexity. It's also unclear that it would even be an overall optimization at all, because the inliner would now need to inline (copy) the code into every call site, SROA would now need to do more analysis and move the passed values into registers, and so on.
It still doesn't make things worse
It would be highly unlikely to make the generated code better. In order to make the generated code better, I think you would need at least all of the following:
- compiling at -O0 (which means that you don't care about performance anyway)
- constant discriminators that aren't already being used by other code in the function
- both fields are of pointer type
- SROA is unable to eliminate all sign/auth instructions entirely
And even then, maybe you would be able to reduce the number of instructions in the caller by I think 1. These are all unusual circumstances and we shouldn't add such complexity to maybe reduce the number of instructions by 1 in such an unusual case where the user isn't even asking for optimization. There are more effective ways of reducing instruction counts that apply to more places, may be applied at higher optimization levels and don't involve adding this much complexity. Instead, we should do the simpler thing that I implemented that generates optimal code in almost every case (and every case that matters).
you should just be generating the correct copy, move, etc functions for the type, and all the other behaviors fall out of that automatically.
The direct pass/return ABI is generally only used if the copy/move constructors are trivial (trivial_abi is an exception to this), which generally means that no actual constructor functions are generated. This is in conformance with the rules of the language, which prevent an object's address from changing without being observed by a copy constructor.
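To make the two paths discussed above concrete, a sketch for a typical 64-bit ABI where aggregates of up to 16 bytes are passed in registers (the threshold and the structs are illustrative only):

struct Two { void *a; void *b; };             // 16 bytes: typically passed direct, so the
                                              // patched CreateCoercedLoad/Store must strip
                                              // and re-apply the field signatures
struct Three { void *a; void *b; void *c; };  // 24 bytes: typically passed indirect, so the
                                              // signed in-memory layout is used through the
                                              // pointer with no PFP-specific coercion

Two roundTripTwo(Two x) { return x; }
Three roundTripThree(Three x) { return x; }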
// constructor will need to read and authenticate any pointer fields in order
// to copy the object to a new address, which will fail if the pointers are
// uninitialized.
if (!getContext().arePFPFieldsTriviallyRelocatable(D->getParent())) {
Correctly initializing the object is the job of the constructor, not the caller. In cases where a constructor is being called, you should be ensuring that the constructor does the correct thing.
That's fair. This code was added while testing in our internal codebase at a time when we had a more expansive view of which fields would be subject to PFP. I will re-evaluate whether this is still needed.
@@ -2976,7 +3006,15 @@ void CodeGenFunction::EmitForwardingCallToLambda(
  QualType resultType = FPT->getReturnType();
  ReturnValueSlot returnSlot;
  if (!resultType->isVoidType() &&
-      calleeFnInfo->getReturnInfo().getKind() == ABIArgInfo::Indirect &&
+      (calleeFnInfo->getReturnInfo().getKind() == ABIArgInfo::Indirect ||
+       // With pointer field protection, we need to set up the return slot when
This should also not need to be modified. Something is up with how you're setting up the record decls if you need to have so many insertions of PFP specific behavior.
The return value slot for forwarding lambdas is set up in an unusual way compared to other functions. I think I first tried simplifying this to work like normal functions but that led to other problems. I agree that ideally we wouldn't need to modify this.
Just looking through the code again, I think I might be able to eliminate the change here by extracting the implementation of __builtin_trivially_relocate and having CodeGenFunction::EmitAggregateCopy use it.
@@ -2268,13 +2293,22 @@ CodeGenFunction::EmitNullInitialization(Address DestPtr, QualType Ty) {

  // Get and call the appropriate llvm.memcpy overload.
  Builder.CreateMemCpy(DestPtr, SrcPtr, SizeVal, false);
  return;
} else {
  // Otherwise, just memset the whole thing to zero. This is legal
This means types that require constructors will be doing zero initialization, then reinitializing fields. From a codegen PoV this can lead to codegen along the lines of
zero the memory; // compiler is set to initialize all memory
zero the memory; // this branch means objects that are not zero initializable get initialized again
initialize the memory; // the struct is not zero initializable the constructor/initializer will run
Ideally the compiler will optimize this down, but it's both extra codegen and extra optimization work.
Given that your model assumes that null is a safe value for PFP fields, the isZeroInitializable() call should return true. For non-zero initializable objects, the initialization code for the object is responsible for initializing the PFP fields.
A PFP field containing a null pointer has a non-null bit pattern (it is a signed null). This is to avoid a conditional branch at every location when loading/storing a pointer and the nullness of the pointer cannot be statically determined. That's why we need to store nulls here.
The EmitNullInitialization function is not used in case of a constructor call. It is used in cases like
struct S {
void *p;
};
void f() {
S s{};
}
I am not adding a call to memset here. The change here is a bit confusing because I needed to change some of the indentation to get rid of an early return. The memset previously on line 2277, which was moved to line 2301, was and still is executed if isZeroInitializable. The only code being added is the null stores.
I'll re-read this section once you deal with the misery that is indentation rules :-/
I'm sorry, I don't get what action you think I need to take here. This is the code before this patch:
if (!isZeroInitializable) {
[...]
Builder.CreateMemCpy;
return;
}
Builder.CreateMemSet;
This is the intermediate state (no functional change):
if (!isZeroInitializable) {
[...]
Builder.CreateMemCpy;
} else {
Builder.CreateMemSet;
}
And this is the code after the patch:
if (!isZeroInitializable) {
[...]
Builder.CreateMemCpy;
} else {
Builder.CreateMemSet;
}
std::vector<PFPField> PFPFields;
[...]
@@ -7756,6 +7756,10 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
  Args.addOptInFlag(CmdArgs, options::OPT_funique_source_file_names,
                    options::OPT_fno_unique_source_file_names);

  if (!IsCudaDevice)
Why would a GPU not want this? :D :D
More seriously, there are a number of GPU backends here. I think the correct thing here is to assume that if someone is turning this on they know what they're doing.
If a future platform were to adopt this as the platform ABI, then that platform would set the flag as part of the automatic target features.
e.g. just make this unconditional - if the flag has been set, propagate it.
With CUDA the driver creates two compilation jobs, one for the host and one for the device. Since it is an ABI requirement that all C++ translation units are built with the same -fexperimental-pointer-field-protection flags, the flag must be passed when compiling the CUDA translation units as well, so that the host side is built correctly. And since GPUs don't support PFP, we need to exclude the flag from the CUDA device job. It looks like the same ends up happening for PAuth ABI because the flags are copied in Clang::AddAArch64TargetArgs, which means that they are ignored for the CUDA device which will have a different architecture.
If someone did end up implementing PFP for GPUs, I think we would create a separate flag to enable PFP for the device.
I was joking - what I was trying to say is that the if (!IsCudaDevice) check should not be needed - if an environment is targeting CUDA it should not be setting this flag (unless it has added support in CUDA).
PAuth ABI because the flags are copied in Clang::AddAArch64TargetArgs, which means that they are ignored for the CUDA device which will have a different architecture.
Yes, the flags are set on the basis of the target - my point with saying !Cuda is that the exact same logic applies to OpenCL, HLSL, etc. and every other GPU or specialized coprocessor, which is why it's not the job of this code to exclude CUDA, but the job of the initializing instance to ensure that the settings are valid for the target.
I think I know a thing or two about CUDA. CUDA has a different compilation model to OpenCL or HLSL where some of the code in the translation unit is compiled for the host and some is compiled for the device, and all that happens in a single compile command. That's why we don't need something equivalent for OpenCL and HLSL: it's because in those languages, the device is the only target. I explained why passing the flag is necessary for the host side to be compiled correctly. You can see how the driver builds for two targets below.
I didn't add this condition gratuitously, I added it because I tried building a binary in our internal codebase that included a CUDA translation unit and encountered this problem.
We can see the variable being used similarly in other places, such as here for profile generation. The condition is presumably present because there's no support for collecting profile data from GPUs, but we still want to profile the host-side CUDA code.
That being said, you'll note that that code is also checking IsHIPDevice. Other places are also checking IsSYCLDevice. It's been a while since I did any GPU programming, but as far as I'm aware, HIP and SYCL have similar compilation models to CUDA. Since the other usage of these variables seems to be inconsistent and possibly missing for some of the languages, it may be a good idea to unify all of them under an IsGPUDevice variable and then we can use that here as well as in other places.
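A sketch of what that unification might look like in Clang::ConstructJob (the IsGPUDevice variable is the hypothetical part; the option name follows the Options.td hunk above):

// Hypothetical consolidation; subsumes the separate IsCudaDevice /
// IsHIPDevice / IsSYCLDevice checks scattered through the driver.
bool IsGPUDevice = IsCudaDevice || IsHIPDevice || IsSYCLDevice;
if (!IsGPUDevice)
  Args.AddLastArg(CmdArgs, options::OPT_experimental_pointer_field_protection_EQ);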
$ touch foo.cu
$ clang foo.cu -nocudainc -nocudalib -v -S
clang version 19.1.7 (Fedora 19.1.7-1.fc41)
Target: aarch64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Configuration file: /etc/clang/aarch64-redhat-linux-gnu-clang.cfg
System configuration file directory: /etc/clang/
Found candidate GCC installation: /usr/bin/../lib/gcc/aarch64-redhat-linux/14
Selected GCC installation: /usr/bin/../lib/gcc/aarch64-redhat-linux/14
Candidate multilib: .;@m64
Selected multilib: .;@m64
"/usr/bin/clang-19" -cc1 -triple aarch64-redhat-linux-gnu -aux-triple nvptx64-nvidia-cuda -S -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name foo.cu -mrelocation-model static -mframe-pointer=non-leaf -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu generic -target-feature +v8a -target-feature +fp-armv8 -target-feature +neon -target-abi aapcs -debugger-tuning=gdb -fdebug-compilation-dir=/home/pcc/2/l -v -fcoverage-compilation-dir=/home/pcc/2/l -resource-dir /usr/bin/../lib/clang/19 -internal-isystem /usr/bin/../lib/clang/19/include/cuda_wrappers -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14 -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14 -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward -internal-isystem /usr/bin/../lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/bin/../lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdeprecated-macro -ferror-limit 19 -nogpulib -fno-signed-char -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -fcolor-diagnostics -cuid=45b32799ded63598 -target-feature +outline-atomics -target-feature -fmv -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o foo.s -x cuda foo.cu
clang -cc1 version 19.1.7 based upon LLVM 19.1.7 default target aarch64-redhat-linux-gnu
ignoring nonexistent directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include"
ignoring nonexistent directory "/include"
ignoring nonexistent directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward"
ignoring duplicate directory "/usr/bin/../lib/clang/19/include"
ignoring duplicate directory "/usr/local/include"
ignoring duplicate directory "/usr/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/bin/../lib/clang/19/include/cuda_wrappers
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward
/usr/bin/../lib/clang/19/include
/usr/local/include
/usr/include
End of search list.
"/usr/bin/clang-19" -cc1 -triple nvptx64-nvidia-cuda -aux-triple aarch64-redhat-linux-gnu -S -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name foo.cu -mrelocation-model static -mframe-pointer=all -fno-rounding-math -no-integrated-as -aux-target-cpu generic -aux-target-feature +v8a -aux-target-feature +fp-armv8 -aux-target-feature +neon -fcuda-is-device -mllvm -enable-memcpyopt-without-libcalls -target-cpu sm_52 -target-feature +ptx42 -debugger-tuning=gdb -fno-dwarf-directory-asm -fdebug-compilation-dir=/home/pcc/2/l -v -resource-dir /usr/bin/../lib/clang/19 -internal-isystem /usr/bin/../lib/clang/19/include/cuda_wrappers -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14 -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14 -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward -internal-isystem /usr/bin/../lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/bin/../lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdeprecated-macro -fno-autolink -ferror-limit 19 -nogpulib -fno-signed-char -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -fcolor-diagnostics -cuid=45b32799ded63598 -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o foo-cuda-nvptx64-nvidia-cuda-sm_52.s -x cuda foo.cu
clang -cc1 version 19.1.7 based upon LLVM 19.1.7 default target aarch64-redhat-linux-gnu
ignoring nonexistent directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include"
ignoring nonexistent directory "/include"
ignoring nonexistent directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward"
ignoring duplicate directory "/usr/bin/../lib/clang/19/include"
ignoring duplicate directory "/usr/local/include"
ignoring duplicate directory "/usr/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/bin/../lib/clang/19/include/cuda_wrappers
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward
/usr/bin/../lib/clang/19/include
/usr/local/include
/usr/include
End of search list.
clang/lib/Sema/SemaDeclCXX.cpp (Outdated)
// FIXME: PFP should not affect trivial relocatability, instead it should
// affect the implementation of std::trivially_relocate. See:
// https://discourse.llvm.org/t/rfc-structure-protection-a-family-of-uaf-mitigation-techniques/85555/16?u=pcc
if (!SemaRef.Context.arePFPFieldsTriviallyRelocatable(D) &&
If PFP fields are present, and we haven't yet implemented support for them in trivially_relocate, IsCXXTriviallyRelocatableType should be returning false. This is something we have to address for address-discriminated pointer auth as well.
Yes, that does look like the right place for now until we have std::trivially_relocate fully implemented. I see there is already a call to BaseElementType.hasAddressDiscriminatedPointerAuth() in that function.
I think it would be reasonable to make a general ASTContext::HasAddressDiscriminatedData(...) (or similar) method that can check for pointer auth and PFP, and an associated update of the existing hasAddressDiscriminatedPointerAuth calls to use that function for semantic checks related to "does this contain values with storage derived representation", at least. Don't quote me on the naming though, because I am terrible at that.
Sounds good, I think we could also call it from isMemcpyableField.
Looking at the code again: IsCXXTriviallyRelocatableType(QualType Type) is already returning false as a result of this change because it calls IsCXXTriviallyRelocatableType(Sema &S, const CXXRecordDecl *RD) if the base type is determined to be a record type. That function will call CheckCXX2CRelocatableAndReplaceable if the result is not cached. CheckCXX2CRelocatableAndReplaceable will use the code that I am modifying to set CXXRecordDeclRelocationInfo::IsRelocatable, which will become the return value of IsCXXTriviallyRelocatableType(Sema &S, const CXXRecordDecl *RD). So I think this is the right place to make the change, and PAuth ABI should likely move to the same place if it wants to cache the result.
Unfortunately I really don't know enough about actual backend codegen to comment on the actual codegen implementation here, but I think there are some design issues with having the backend determine the discriminators rather than having those selected in the front end.
for (User *U : Intr.users()) {
  auto *Call = cast<CallInst>(U);
  auto *FieldName = cast<Metadata>(
I'm not really a backend person, but I feel that it's incorrect for the discriminator selection/computation to be occurring in LLVM. I think Clang should be passing the discriminator as a parameter to the intrinsics - this would also permit the PFP system to allow users to override the discriminator for fields, which is likely to end up being necessary as time changes the exact types or naming of fields. (Side note: because I'm a muppet, I only just realized that the current model derives the discriminator from the name of the field, which means the names of fields actually become ABI - it might be better to use something like "name of struct + offset in struct + type", though obviously that reduces the variation in discriminators. OTOH it's not uncommon for different projects to declare their own versions of POD structs with slightly different naming, so maybe it helps there.)
Yeah, the exact semantics of the intrinsic are a bit inelegant at the moment. For example, Clang shares knowledge with this pass about how deactivation symbols are named, as well as how to compute hashes (it uses std::hash because I didn't get around to switching it to SipHash yet, my bad). It may be best to move all of that knowledge into Clang and then the intrinsic would take a deactivation symbol as well as a discriminator hash.
that would definitely be my preference, and I think it should end up being simpler.
Hi Oliver, thanks for your comments! I'll address them below.
Thoughts:
This should be opt-in on a field or struct granularity, not just a global behavior.
This would certainly be easier if it were an opt-in behavior, as it would allow avoiding a substantial amount of complexity that you point out below. The problem is that this would not make the solution very effective as a UAF mitigation, as it would only help with code that was opted in, which is likely a tiny fraction of an entire codebase. That fraction is unlikely to be the fraction with UAF bugs, not only because of the size of the fraction but also because if developers aren't thinking about memory safety issues, they certainly won't be manually opting into this. With PFP, the problem that I set out to solve was to find a way to automatically protect as many pointers stored in memory as possible in standards-conforming code.
In the RFC I think you mentioned not applying PFP to C types, but I'm unsure how you're deciding what is a C type?
This is based on whether the type has standard layout. C does not permit declaring a type that is non-standard layout.
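To make that concrete, here is a minimal sketch (type names are mine, purely illustrative) of the line that the standard-layout rules draw:

#include <type_traits>

// Declarable in C: standard layout, so PFP leaves its representation alone.
struct CLike {
  void *p;
  int n;
};

// A virtual function (like an access-control mix, or fields split across a
// base and a derived class) makes the type non-standard layout; PFP applies.
struct Covered {
  virtual ~Covered();
  void *p;
};

static_assert(std::is_standard_layout_v<CLike>);
static_assert(!std::is_standard_layout_v<Covered>);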
There are a lot of special cases being added in places that should not need to be aware of PFP - the core type queries should be returning correct values for types containing PFP fields.
Agreed on type queries. The current adjustment to how trivially-relocatable is defined is a workaround that will be removed once P2786 is fully implemented in libc++, and at that point we shouldn't need __has_non_relocatable_fields either. The intent is that turning on PFP should be invisible to the developer (in other words, the modification to the storage format of pointer fields in certain structs is permitted by the as-if rule). As a result, most of the implementation is in CodeGen and below.
A lot of the "this read/write/copy/init special work" exactly aligns with the constraints of pointer auth, and I think the overall implementation could be made better by introducing the concept of a HardenedPointerQualifier or similar, that could be either PointerAuthQualifier or PFPQualifier. In principle this could even just be treated as a different PointerAuth key set.

That approach would also help with some of the problems you have with offsetof and pointers to PFP fields - the thing that causes the problems with the pointer to a PFP field is that the PFP schema for a given field is in reality part of the type, so given

struct A {
  void *field1;
  void *field2;
};

It looks like you are trying to maintain the idea that A::field1 and A::field2 have the same type, which makes this code valid according to the type system:

void f(void** obj);
void g(A *a) {
  f(&a->field1);
  f(&a->field2);
}

but the real types are not the same. The semantic equivalent under pointer auth is something akin to

struct A {
  void * __ptrauth(1, 1, hash("A::field1")) field1;
  void * __ptrauth(1, 1, hash("A::field2")) field2;
};

Which makes it very clear that the types are different, and I think trying to pretend they are not is part of what is causing problems.
In the early stages of the project, I wanted to make it so that the PFP fields would implicitly receive qualifiers, similar to what you proposed, and I was trying to come up with an approach that would make it work, but I couldn't see a way because of the inter-convertibility of pointers from different structs (and pointers outside of structs entirely), and disallowing conversions between differently qualified pointer types would render the compiler unable to compile standard C++. The only way that I could see to make it work was to utilize the as-if rule and invisibly modify the in-memory representation.
For pointers in standard-layout types (i.e. pointers that explicitly opt in) I agree with you that the right approach would be to use a qualifier, but I think that should be done as a separate change, possibly as a followup. In this change, I would like to implement the solution for implicitly protected fields.
The more I think about it, the more I feel that this could be implemented almost entirely on top of the existing pointer auth work.
Currently for pointer auth there's a set of ARM keys specified, and I think you just need to create a different set, PFP_Keys or similar, and set up the appropriate schemas (basically null schemas for all the existing cases), and add a new schema: DefaultFieldSchema or similar, and apply that schema to every pointer field in an object unless there's an explicit qualifier applied already.
Then it would in principle just be a matter of adding the appropriate logic to the ptrauth intrinsics in llvm.
Now there are some moderately large caveats to that idea
- the existing ptrauth semantics simply say that any struct containing address discriminated values is no longer a POD type and so gets copied through the stack, which is something you're trying to avoid, so you'd still need to keep that logic (though as I said, that should be done by generating and then calling the functions to do so rather than regenerating the entire copy repeatedly).
- the way that pointer auth is currently implemented assumes that there is only a single pointer auth ABI active, so you wouldn't be able to trivially have both pointer auth and PFP live at the same time. That's what makes me think having a SecuredPointer abstraction might be a better approach, but it might also not be too difficult to update the PointerAuthQualifier to include the ABI being used -- I'm really not sure.
Yes, something like that is how I would imagine implementing this for opt-in fields.
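For illustration, a rough sketch of what such an opt-in spelling might look like if PFP were layered onto the existing __ptrauth qualifier syntax; the key number and discriminator below are invented placeholders, not anything defined by this patch:

// Hypothetical: a dedicated PFP key applied through the __ptrauth qualifier.
#define PFP_DEFAULT_FIELD_KEY 4  // assumed key allocation, not a real one

struct Opted {
  // Address-discriminated, with a per-field extra discriminator.
  void *__ptrauth(PFP_DEFAULT_FIELD_KEY, 1, 0x1234) p;
};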
@@ -2513,6 +2513,12 @@ def CountedByOrNull : DeclOrTypeAttr {
  let LangOpts = [COnly];
}

def NoPointerFieldProtection : DeclOrTypeAttr {
There are numerous circumstances where the C++ ABI is distinct from the platform ABI and is allowed to change. For example, most Chromium-based web browsers ship with a statically linked libc++, and the same is true for many Android apps and server-side software at Google. In the intended usage model, all of libc++ would be built with a -fexperimental-pointer-field-protection= flag consistent with the rest of the program.
The libc++abi opt-outs that I added are only for opting out pointer fields of RTTI structs that are generated by the compiler. In principle, we could teach the compiler to sign those pointer fields which would let us remove the opt-outs, but there is a lower ROI for subjecting those fields to PFP because the RTTI structs are not typically dynamically allocated.
We may consider adding an attribute to allow opting in in the case where the flag is not passed. But I think we should do that in a followup. We would need to carefully consider how that would interact with the other aspects of the opt-in solution, such as the pointer qualifiers.
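As a sketch of the field-level opt-out this patch defines (assuming the clang::no_field_protection spelling from the attribute definition above; the exact semantics may still change in review):

struct Node {
  virtual ~Node();                              // non-standard layout, so PFP applies
  Node *next;                                   // protected field
  [[clang::no_field_protection]] void *cookie;  // opted out: stored as a raw pointer
};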
@@ -362,6 +362,17 @@ class LangOptionsBase {
    BKey
  };

  enum class PointerFieldProtectionKind {
I think that allowing this level of customization should be implemented as part of the separate opt-in solution (e.g. it may be a property of the qualifier). It is generally a reasonable assumption that operator new is the same for all types. In the unlikely case that it isn't, we aren't really doing anything wrong by using the malloc-derived decision here. It just means that we lose some entropy from the type or the pointer tag.
@@ -544,6 +544,7 @@ TYPE_TRAIT_2(__is_pointer_interconvertible_base_of, IsPointerInterconvertibleBas
#include "clang/Basic/TransformTypeTraits.def"

// Clang-only C++ Type Traits
TYPE_TRAIT_1(__has_non_relocatable_fields, HasNonRelocatableFields, KEYCXX)
I don't like it either. As mentioned in the main comment, I hope to get rid of this when the P2786 implementation is finalized.
Sharing the qualifiers with PAuth ABI would be a good idea for the separate opt-in solution.
// pattern to the struct in memory, so we must read the elements one by one
// and use them to form the coerced structure.
std::vector<PFPField> PFPFields;
CGF.getContext().findPFPFields(SrcFETy, CharUnits::Zero(), PFPFields, true);
This code is used to pass and return small structs by value in registers (aka direct pass/return). In this case, we typically have 1-2 fields (the exact conditions for passing a struct in registers is up to the platform ABI), so it is unlikely to be useful to outline it. Outlining would be counterproductive from an optimization standpoint because it would inhibit the SROA pass until the function is inlined. We would also need to create a separate ABI for passing and returning the struct to the copy thunk function, because CreateCoercedLoad is how we would normally pass the struct to the function.
When a larger struct is passed or returned by value, the ABI typically requires a pointer to be passed (aka indirect pass/return). In that case, this code is not executed, and we pass/return using a pointer pointing to a struct whose fields are signed in the same way as any other struct. I don't think we should sign the fields in an indirectly passed/returned struct differently, as that would prevent direct use of the struct in the caller or callee.
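A rough illustration of the two cases (sizes and the direct-pass threshold are platform-ABI specific; type names are mine):

struct Base { void *a; };

// Fields in both a base and a derived class: non-standard layout, so both
// pointers are PFP fields, yet the type remains trivially copyable.
struct Small : Base {
  void *b;  // sizeof(Small) == 16 on LP64: typically passed/returned in
};          // registers, so each field is authenticated into the coerced value

struct Large : Base {
  void *c[4];  // ~40 bytes: passed/returned indirectly via a pointer, so the
};             // fields stay signed in memory and this code path never runs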
// pattern to the struct in memory, so we must read the elements one by one
// and use them to form the coerced structure.
std::vector<PFPField> PFPFields;
getContext().findPFPFields(SrcFETy, CharUnits::Zero(), PFPFields, true);
See my other comment.
@@ -2268,13 +2293,22 @@ CodeGenFunction::EmitNullInitialization(Address DestPtr, QualType Ty) {

     // Get and call the appropriate llvm.memcpy overload.
     Builder.CreateMemCpy(DestPtr, SrcPtr, SizeVal, false);
-    return;
+  } else {
     // Otherwise, just memset the whole thing to zero.  This is legal
A PFP field containing a null pointer has a non-null bit pattern (it is a signed null). This is to avoid a conditional branch at every location when loading/storing a pointer and the nullness of the pointer cannot be statically determined. That's why we need to store nulls here.
The EmitNullInitialization function is not used in the case of a constructor call. It is used in cases like
struct S {
void *p;
};
void f() {
S s{};
}
I am not adding a call to memset here. The change here is a bit confusing because I needed to change some of the indentation to get rid of an early return. The memset previously on line 2277, which was moved to line 2301, was executed before this patch and is still executed now, if isZeroInitializable. The only code being added is the null stores.
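Conceptually, the emitted initialization now looks like this sketch (pseudo-C++; store_signed_null is a made-up stand-in for the signed-null store that the patch emits through the PFP intrinsics):

#include <cstring>

struct T { virtual ~T(); void *p; };   // `p` is a PFP field

void store_signed_null(void **field);  // hypothetical helper, not a real API

// Rough equivalent of what EmitNullInitialization produces for `T t{};`
// (sketch only, not real user code):
void zero_init(T *t) {
  std::memset(t, 0, sizeof(T));  // the pre-existing memset path
  store_signed_null(&t->p);      // plus one signed-null store per PFP field,
                                 // since a null PFP field is not all-zero bits
}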
@@ -7756,6 +7756,10 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
  Args.addOptInFlag(CmdArgs, options::OPT_funique_source_file_names,
                    options::OPT_fno_unique_source_file_names);

  if (!IsCudaDevice)
With CUDA the driver creates two compilation jobs, one for the host and one for the device. Since it is an ABI requirement that all C++ translation units are built with the same -fexperimental-pointer-field-protection flags, the flag must be passed when compiling the CUDA translation units as well, so that the host side is built correctly. And since GPUs don't support PFP, we need to exclude the flag from the CUDA device job. It looks like the same ends up happening for PAuth ABI because the flags are copied in Clang::AddAArch64TargetArgs, which means that they are ignored for the CUDA device, which will have a different architecture.
If someone did end up implementing PFP for GPUs, I think we would create a separate flag to enable PFP for the device.
@@ -3011,6 +3011,12 @@ defm experimental_omit_vtable_rtti : BoolFOption<"experimental-omit-vtable-rtti"
               NegFlag<SetFalse, [], [CC1Option], "Do not omit">,
               BothFlags<[], [CC1Option], " the RTTI component from virtual tables">>;

def experimental_pointer_field_protection_EQ : Joined<["-"], "fexperimental-pointer-field-protection=">, Group<f_Group>,
(See other reply.)
There are a significant number of standard layout types that contain meaningfully powerful data if they are compromised, which leads to saying the standard layout types should also be covered. That said, there's a significant cost to applying pointer protection universally. The cost of existing mitigations relies on them not being global properties, as they can saturate the required compute resources.

I would be interested in a performance comparison of enumerating struct fields as you increase the number of fields being accessed, just to see if there's a point that perf starts to collapse - I suspect there would be, but if the number of concurrent (from the CPU PoV, not threading) field accesses to get to that point is high enough it may not matter. Similar tests for increasing levels of dependent loads are also probably worthwhile. The intent is to make this behavior at least partially universal (all pointers in objects that are not standard layout), which means that extrapolating from individual loads, or loads from the same addresses, is not necessarily representative.

In the RFC you talk about the time and/or cycles required to perform operations, but that's the best case, where the required units are not saturated. This gate on performance isn't unique to pointer protections: every operation consumes some amount of CPU resources, and once those resources are saturated the pipeline stalls, so it's necessary to compare performance when many of these operations are happening in succession.

Now there's some caching (saying the tag remains constant, etc.) you might be able to do, but that's dependent on being willing to say "this is a software mitigation, so we're ok if we don't catch everything" - these are things that a CPU can mitigate in hardware via direct knowledge of what is happening on a given core, and peeking on the memory bus. (There are also performance costs when you're accessing the same memory on different cores, but that goes so bad so quickly even in normal cases that I'm not concerned about it yet.)
The issue here is that there are many cases where standard layout structs contain important data, which is why I'm not 100% sold on excluding C types from this.
My point was that this should not have a separate attempt to make trivial relocation work; it should make sure the type queries report the correct information, and that the implementation of trivially_relocate agrees with that definition. Also, trivial relocation isn't meaningfully supported or used anywhere yet - it's a new feature - so I don't think you should be prioritizing supporting that yet: just make sure that the (internal) trait queries return correct/consistent results for what can be handled, and worry about optimizing that later.
Can you provide a code example of the conversion case you're describing? I'm not sure I understand what case you're concerned about. If the qualifiers you placed implicitly on a field disagree, then the alternative of computing a schema later would also disagree. The semantics of implicitly adding a qualifier or attribute to carry the schema vs computing the schema upon use should not impact what schema you end up with, because both cases are describing the same field.
The problem you get there is similar to what pointer auth gets with function pointers: the authentication is present in some cases but not in others, so we end up with a function on ASTContext that looks first for an explicit policy, and then enumerates the various ways an implicit schema might apply. It's not great, and can result in errors where the correct schema is not used.
What I mean is that in an environment where you enable the behavior globally, you would simply apply the qualifier (or attribute) implicitly to all relevant pointer fields, computing the exact same schema you currently compute, only you do it at the point of declaration. In both cases all fields for which PFP applies will have the correct schema attached, and all that changes is the ratio of "has PFP" to "does not have PFP". Rather than going back and forth in review, I'm also happy to chat in discord :D
There are a significant number of standard layout types that contain meaningfully powerful data if they are compromised, which leads to saying the standard layout types should also be covered.
That said, there's a significant cost to applying pointer protection universally. The cost of existing mitigations relies on them not being global properties, as they can saturate the required compute resources.
Yes, there's a cost in terms of CPU cycles. The other costs that we need to think about are the ecosystem cost of creating an incompatible language dialect as well as the pragmatic need to remain compatible with the platform C ABI.
Please refer to this section of the RFC which explains the standard layout restriction. For point 2, please refer to the rule of pointer-interconvertibility to understand how standard-layout types may cause problems for an as-if rule based approach.
I would be interested in a performance comparison of enumerating struct fields as you increase the number of fields being accessed, just to see if there's a point that perf starts to collapse - I suspect there would be, but if the number of concurrent (from the CPU PoV, not threading) field accesses to get to that point is high enough it may not matter. Similar tests for increasing levels of dependent loads are also probably worthwhile. The intent is to make this behavior at least partially universal (all pointers in objects that are not standard layout), which means that extrapolating from individual loads, or loads from the same addresses, is not necessarily representative.
This is something that we'd like to look at in future work; you'll note the link to this proposal in the RFC. Already we can see that the overhead of applying this to everything is low in the realistic internal benchmark that we used (and there are early indications that it could be even lower; the 1.5-2% number was collected almost a year ago when the implementation was in a different state), but for hyperscalers, tiny performance differences can translate to big $$$, so even something like a 0.5% overhead can be too much.
In the RFC you talk about the time and/or cycles required to perform operations, but that's the best case, where the required units are not saturated. This gate on performance isn't unique to pointer protections: every operation consumes some amount of CPU resources, and once those resources are saturated the pipeline stalls, so it's necessary to compare performance when many of these operations are happening in succession.
To a degree. What ultimately matters is how much more expensive it becomes to run real workloads with the feature turned on, and the closest way that we have to measure that right now is the internal benchmark that I mentioned. (The gold standard involves an A/B test in production, but we won't be able to run that until the feature lands in LLVM.)
In a realistic workload where a large amount of data is processed, the cache utilization is high so the probability of a cache hit is lower (affecting relative slowdown of pointer loads), which also leads to the frequency of pointer loads being less than for microbenchmarks (affecting execution unit utilization). So one must be mindful that a typical microbenchmark may over-represent the cost. I think these factors explain the low overhead numbers that we're seeing.
The issue here is that there are many cases where standard layout structs contain important data, which is why I'm not 100% sold on excluding C types from this.
The overall goal is to achieve protection from a statistical perspective: we want to maximize the probability that a particular pointer will be protected as much as we can, while staying within the language rules, so as to maximize the probability that a particular UAF bug will be mitigated.
The standard layout issue is something that we continue to investigate internally and was one of the issues highlighted by an internal effectiveness analysis. I think it may be possible to increase the coverage in certain cases, such as when a standard layout object is a field of a non-standard layout struct, but the matter will require more investigation so we can find a way that doesn't create a dialect. As things are we achieve a fairly high level of coverage.
My point was that this should not have a separate attempt to make trivial relocation work; it should make sure the type queries report the correct information, and that the implementation of trivially_relocate agrees with that definition. Also, trivial relocation isn't meaningfully supported or used anywhere yet - it's a new feature - so I don't think you should be prioritizing supporting that yet: just make sure that the (internal) trait queries return correct/consistent results for what can be handled, and worry about optimizing that later.
My goal is to align the queries with the implementation and share as much as possible with PAuth ABI. I'm already sharing the ptrauth intrinsics and the constant initializers.
The reason for the trivial relocatability changes is that libc++ already contains a partial (and not aligned with C++26) implementation of trivial relocatability where it assumes that types for which __libcpp_is_trivially_relocatable is true may be memcpy'd. That assumption is invalid under existing language rules and breaks under PFP. Hence there was a need to do something about that. The temporary implementation that uses __has_non_relocatable_fields is intended to roughly emulate what will be implemented for C++26 and was necessary for benchmarking, as we would like to avoid measuring an unrealistic slowdown that would be caused by disabling the interim trivially relocatable implementation in libc++. For upstream I would likely be happy to remove __has_non_relocatable_fields and make __libcpp_is_trivially_relocatable always return false when PFP is enabled until the C++26 based implementation is ready.
Can you provide a code example of the conversion case you're describing? I'm not sure I understand what case you're concerned about.
The case is as in your reply:
struct A {
virtual ~A(); // for non-standard layout
void *field1;
void *field2;
};
void f(void** obj);
void g(A *a) {
f(&a->field1);
f(&a->field2);
}
This is standard C++, so we need to make it work somehow. With implicit qualifiers, the conversion of void *__ptrauth(1, 1, hash("A::field1")) * to void ** would be disallowed. Moreover, the type differences are detectable in numerous other ways, such as via std::is_same, so even if we somehow made them convertible, the non-standard behavior would be visible to the program. By creating a dialect of C++ in which these types are different, we fragment the C++ community, some of whom may refuse (or just not have bandwidth) to support the dialect, which in turn makes it harder for PFP users to import and use third-party code. This situation is distinguishable from things like the -fbounds-safety annotations whose absence does not cause standards-conforming programs to fail to compile.
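To spell out the observability point (reusing A from above; is_same is just one of many ways a program could notice):

#include <type_traits>

void h(void **obj);

void i(A *a) {
  h(&a->field1);  // standard C++: &a->field1 is a plain void**
}

// This must hold in conforming C++. With implicit per-field qualifiers the
// assertion would fail and the call above would be ill-formed, i.e. the
// dialect would be visible to conforming programs.
static_assert(std::is_same_v<decltype(A::field1), decltype(A::field2)>);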
If the qualifiers you placed implicitly on a field disagree, then the alternative of computing a schema later would also disagree. The semantics of implicitly adding a qualifier or attribute to carry the schema vs computing the schema upon use should not impact what schema you end up with, because both cases are describing the same field.
By effectively computing a schema at link time, we may take advantage of whole program information to only enable pointer encoding for fields that are not address-taken anywhere in the program, and thereby avoid breaking compatibility with standard C++. This is what deactivation symbols achieve.
In the example, compiling f(&a->field1); (assuming that f is not inlined) would trigger the emission of a deactivation symbol for A::field1 which would disable the PFP instructions for direct accesses to field1. If f is inlined and does not escape the address of field1, we can usually retain protection for field1.
You may rightly point out that this can result in silent disablement of protection of fields, similar to the standard layout situation. However, we found that it is uncommon for field addresses to escape like this, so in practice it does not have much of an impact on effectiveness.
There is a similar situation with LLVM's software CFI implementation (and to some extent PAuth ABI) where taking the address of a function can weaken the overall protection by enabling its use in JOP attacks, but this doesn't mean that CFI isn't useful.
What I mean is that in an environment where you enable the behavior globally, you would simply apply the qualifier (or attribute) implicitly to all relevant pointer fields, computing the exact same schema you currently compute, only you do it at the point of declaration. In both cases all fields for which PFP applies will have the correct schema attached, and all that changes is the ratio of "has PFP" to "does not have PFP".
I understand your suggestion, and I think it would be great if that actually worked out, not least because it would let me avoid lengthy internet arguments. :-) Unfortunately, careful consideration has shown that it would not be a practical solution, as detailed above.
Rather than going back and forth in review, I'm also happy to chat in discord :D
I may take you up on that. These replies take me some effort to write because I value preciseness. I hope I've managed to convey my perspective to you and let's get in touch on discord if you have any questions.
@@ -544,6 +544,7 @@ TYPE_TRAIT_2(__is_pointer_interconvertible_base_of, IsPointerInterconvertibleBas
#include "clang/Basic/TransformTypeTraits.def"

// Clang-only C++ Type Traits
TYPE_TRAIT_1(__has_non_relocatable_fields, HasNonRelocatableFields, KEYCXX)
I think you misunderstand my intent (I tried to suggest this in one of the comments or the global comment):
What I'm suggesting is that the qualifier would be implicitly added at the point of declaration, the overarching behavior you're after would be achieved that way.
No, I understood that part. Please refer to the main comment where I explained why that does not work.
However I'm also not sure how reasonable it is to exclude standard layout types from this protection - what exactly is the rationale for that?
Please refer to the main reply.
silently ignoring some structs is similarly a hazard - it can result in "trivial" refactoring invisibly reducing security guarantees
While it is true that a refactoring has the potential to remove protection from a pointer, the same is also true for adding protection to a pointer (usually code is modified by adding things to objects, and the rules of the language are such that adding something to an object can usually only make it non-standard-layout or non-trivially-copyable). See also "statistical perspective" from the main reply.
Regarding P2786: the exact same issues apply to pointer authentication today - trying to solve them separately and outside of the existing triviality queries and operations is not the correct approach.
The results of the triviality queries are unchanged because PFP does not change the triviality from a language rule perspective. If a type was trivially copyable without PFP, it remains trivially copyable with PFP (i.e. the program is still allowed to memcpy the object, because the pointers are not address discriminated). If a type was trivially relocatable without PFP, it remains trivially relocatable with PFP (i.e. std::trivially_relocate still works because of changes that I will make (not uploaded yet) to __builtin_trivially_relocate in CGBuiltin.cpp). Divergence from the standard C++ triviality rules creates an incompatible dialect of C++, which I think we should avoid.
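A minimal sketch of the invariant (illustrative types; the asserts hold identically with PFP on or off):

#include <type_traits>

struct Inner { void *a; };
struct Outer : Inner { void *b; };  // non-standard layout, so PFP applies

// PFP changes the stored representation of the fields, not the language-level
// type properties, so both of these hold with and without the feature.
static_assert(std::is_trivially_copyable_v<Outer>);
static_assert(!std::is_standard_layout_v<Outer>);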
The copying inhibitions I added in clang/lib/CodeGen are for cases such as those where the object is not trivially copyable according to the language rules but Clang figures out that it can memcpy the object anyway without causing an observable violation of the rules. In other words, Clang is already going outside of the existing triviality queries for optimization purposes, and I am adjusting the conditions under which it does so to account for PFP.
If you want the types to be trivially relocatable - which is not a requirement, it's a performance optimization - then ensure the existing queries and operations handle that correctly. Otherwise you run the risk of other code in clang using the existing trivial behavior queries, and expecting them to be correct, when they are not.
On trivial relocatability, see the main reply.
Code in Clang that uses the standards-defined triviality queries will continue to work. Non-standard-defined queries such as isMemcpyEquivalentSpecialMember and isMemcpyableField need to be adjusted, as previously mentioned. It's possible that someone could add a new non-standard query that breaks PFP (and could also break PAuth ABI because it also uses address discriminated pointers), but that's a bug like any other, for which our traditional solution is to rely on testing to flush out any regressions. The patch series I posted for review has been tested as described in the RFC and the remaining test failures have been assessed to be likely to be bugs in the programs being compiled, so we have a reasonable level of confidence in the implementation.
// pattern to the struct in memory, so we must read the elements one by one
// and use them to form the coerced structure.
std::vector<PFPField> PFPFields;
CGF.getContext().findPFPFields(SrcFETy, CharUnits::Zero(), PFPFields, true);
I still don't understand why you are insisting on this. From your use of the words "caching" and "repeatedly", maybe it's because you think that it would be too expensive to execute this code every time we generate code to pass or return a struct by value? I think that it would be a premature optimization (i.e. optimization of time spent compiling) to outline the function, and that would also substantially increase complexity. It's also unclear that it would even be an overall optimization at all, because the inliner would now need to inline (copy) the code into every call site, SROA would now need to do more analysis and move the passed values into registers, and so on.
It still doesn't make things worse
It would be highly unlikely to make the generated code better. In order to make the generated code better, I think you would need at least all of the following:
- compiling at -O0 (which means that you don't care about performance anyway)
- constant discriminators that aren't already being used by other code in the function
- both fields are of pointer type
- SROA is unable to eliminate all sign/auth instructions entirely
And even then, maybe you would be able to reduce the number of instructions in the caller by, I think, 1. These are all unusual circumstances and we shouldn't add such complexity to maybe reduce the number of instructions by 1 in such an unusual case where the user isn't even asking for optimization. There are more effective ways of reducing instruction counts that apply to more places, may be applied at higher optimization levels and don't involve adding this much complexity. Instead, we should do the simpler thing that I implemented, which generates optimal code in almost every case (and every case that matters).
you should just be generating the correct copy, move, etc functions for the type, and all the other behaviors fall out of that automatically.
The direct pass/return ABI is generally only used if the copy/move constructors are trivial (trivial_abi is an exception to this), which generally means that no actual constructor functions are generated. This is in conformance with the rules of the language, which prevent an object's address from changing without being observed by a copy constructor.
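For reference, a small sketch of the trivial_abi exception mentioned above (type names are mine):

struct Buf { void *a; };

// [[clang::trivial_abi]] keeps the direct (in-register) convention even though
// the destructor is non-trivial, so the coerced load/store path above still
// has to handle the PFP fields of such types.
struct [[clang::trivial_abi]] Handle : Buf {
  ~Handle();       // would normally force indirect pass/return
  void *resource;  // non-standard layout => PFP field
};

Handle make();           // returned directly despite ~Handle()
void consume(Handle h);  // passed directly as well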
@@ -2268,13 +2293,22 @@ CodeGenFunction::EmitNullInitialization(Address DestPtr, QualType Ty) {

     // Get and call the appropriate llvm.memcpy overload.
     Builder.CreateMemCpy(DestPtr, SrcPtr, SizeVal, false);
-    return;
+  } else {
     // Otherwise, just memset the whole thing to zero.  This is legal
I'm sorry, I don't get what action you think I need to take here. This is the code before this patch:
if (!isZeroInitializable) {
[...]
Builder.CreateMemCpy;
return;
}
Builder.CreateMemSet;
This is the intermediate state (no functional change):
if (!isZeroInitializable) {
[...]
Builder.CreateMemCpy;
} else {
Builder.CreateMemSet;
}
And this is the code after the patch:
if (!isZeroInitializable) {
[...]
Builder.CreateMemCpy;
} else {
Builder.CreateMemSet;
}
std::vector<PFPField> PFPFields;
[...]
@@ -7756,6 +7756,10 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
  Args.addOptInFlag(CmdArgs, options::OPT_funique_source_file_names,
                    options::OPT_fno_unique_source_file_names);

  if (!IsCudaDevice)
I think I know a thing or two about CUDA. CUDA has a different compilation model to OpenCL or HLSL where some of the code in the translation unit is compiled for the host and some is compiled for the device, and all that happens in a single compile command. That's why we don't need something equivalent for OpenCL and HLSL: it's because in those languages, the device is the only target. I explained why passing the flag is necessary for the host side to be compiled correctly. You can see how the driver builds for two targets below.
I didn't add this condition gratuitously, I added it because I tried building a binary in our internal codebase that included a CUDA translation unit and encountered this problem.
We can see the variable being used similarly in other places, such as here for profile generation. The condition is presumably present because there's no support for collecting profile data from GPUs, but we still want to profile the host-side CUDA code.
That being said, you'll note that that code is also checking IsHIPDevice. Other places are also checking IsSYCLDevice. It's been a while since I did any GPU programming, but as far as I'm aware, HIP and SYCL have similar compilation models to CUDA. Since the other usage of these variables seems to be inconsistent and possibly missing for some of the languages, it may be a good idea to unify all of them under an IsGPUDevice variable and then we can use that here as well as in other places.
$ touch foo.cu
$ clang foo.cu -nocudainc -nocudalib -v -S
clang version 19.1.7 (Fedora 19.1.7-1.fc41)
Target: aarch64-redhat-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Configuration file: /etc/clang/aarch64-redhat-linux-gnu-clang.cfg
System configuration file directory: /etc/clang/
Found candidate GCC installation: /usr/bin/../lib/gcc/aarch64-redhat-linux/14
Selected GCC installation: /usr/bin/../lib/gcc/aarch64-redhat-linux/14
Candidate multilib: .;@m64
Selected multilib: .;@m64
"/usr/bin/clang-19" -cc1 -triple aarch64-redhat-linux-gnu -aux-triple nvptx64-nvidia-cuda -S -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name foo.cu -mrelocation-model static -mframe-pointer=non-leaf -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu generic -target-feature +v8a -target-feature +fp-armv8 -target-feature +neon -target-abi aapcs -debugger-tuning=gdb -fdebug-compilation-dir=/home/pcc/2/l -v -fcoverage-compilation-dir=/home/pcc/2/l -resource-dir /usr/bin/../lib/clang/19 -internal-isystem /usr/bin/../lib/clang/19/include/cuda_wrappers -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14 -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14 -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward -internal-isystem /usr/bin/../lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/bin/../lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdeprecated-macro -ferror-limit 19 -nogpulib -fno-signed-char -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -fcolor-diagnostics -cuid=45b32799ded63598 -target-feature +outline-atomics -target-feature -fmv -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o foo.s -x cuda foo.cu
clang -cc1 version 19.1.7 based upon LLVM 19.1.7 default target aarch64-redhat-linux-gnu
ignoring nonexistent directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include"
ignoring nonexistent directory "/include"
ignoring nonexistent directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward"
ignoring duplicate directory "/usr/bin/../lib/clang/19/include"
ignoring duplicate directory "/usr/local/include"
ignoring duplicate directory "/usr/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/bin/../lib/clang/19/include/cuda_wrappers
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward
/usr/bin/../lib/clang/19/include
/usr/local/include
/usr/include
End of search list.
"/usr/bin/clang-19" -cc1 -triple nvptx64-nvidia-cuda -aux-triple aarch64-redhat-linux-gnu -S -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name foo.cu -mrelocation-model static -mframe-pointer=all -fno-rounding-math -no-integrated-as -aux-target-cpu generic -aux-target-feature +v8a -aux-target-feature +fp-armv8 -aux-target-feature +neon -fcuda-is-device -mllvm -enable-memcpyopt-without-libcalls -target-cpu sm_52 -target-feature +ptx42 -debugger-tuning=gdb -fno-dwarf-directory-asm -fdebug-compilation-dir=/home/pcc/2/l -v -resource-dir /usr/bin/../lib/clang/19 -internal-isystem /usr/bin/../lib/clang/19/include/cuda_wrappers -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14 -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14 -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward -internal-isystem /usr/bin/../lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/bin/../lib/clang/19/include -internal-isystem /usr/local/include -internal-isystem /usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdeprecated-macro -fno-autolink -ferror-limit 19 -nogpulib -fno-signed-char -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -fcolor-diagnostics -cuid=45b32799ded63598 -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o foo-cuda-nvptx64-nvidia-cuda-sm_52.s -x cuda foo.cu
clang -cc1 version 19.1.7 based upon LLVM 19.1.7 default target aarch64-redhat-linux-gnu
ignoring nonexistent directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include"
ignoring nonexistent directory "/include"
ignoring nonexistent directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../aarch64-redhat-linux/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward"
ignoring duplicate directory "/usr/bin/../lib/clang/19/include"
ignoring duplicate directory "/usr/local/include"
ignoring duplicate directory "/usr/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/bin/../lib/clang/19/include/cuda_wrappers
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/aarch64-redhat-linux
/usr/bin/../lib/gcc/aarch64-redhat-linux/14/../../../../include/c++/14/backward
/usr/bin/../lib/clang/19/include
/usr/local/include
/usr/include
End of search list.
@@ -260,6 +260,16 @@ RValue CodeGenFunction::EmitCXXMemberOrOperatorMemberCallExpr(
      (MD->isCopyAssignmentOperator() || MD->isMoveAssignmentOperator()) &&
      !MD->getParent()->mayInsertExtraPadding();

  // Assignment operators for objects with non-trivially-relocatable PFP fields
The purpose of this code is to avoid a call to CodeGenFunction::EmitAggregateCopy, so I think the fix I mentioned above should let me remove this part as well. I think there might also be a problem with the PAuth ABI that would be fixed by such a change.
@@ -2976,7 +3006,15 @@ void CodeGenFunction::EmitForwardingCallToLambda(
   QualType resultType = FPT->getReturnType();
   ReturnValueSlot returnSlot;
   if (!resultType->isVoidType() &&
-      calleeFnInfo->getReturnInfo().getKind() == ABIArgInfo::Indirect &&
+      (calleeFnInfo->getReturnInfo().getKind() == ABIArgInfo::Indirect ||
+       // With pointer field protection, we need to set up the return slot when
Just looking through the code again, I think I might be able to eliminate the change here by extracting the implementation of __builtin_trivially_relocate and having CodeGenFunction::EmitAggregateCopy use it.
@pcc and I have been discussing this.
…ction Created using spr 1.3.6-beta.1
Pointer field protection is a use-after-free vulnerability mitigation that works by changing how data structures' pointer fields are stored in memory. For more information, see the RFC: https://discourse.llvm.org/t/rfc-structure-protection-a-family-of-uaf-mitigation-techniques/85555 TODO: - Fix test failure. - Add more tests. - Add documentation. Pull Request: llvm#133538
Pointer field protection is a use-after-free vulnerability
mitigation that works by changing how data structures' pointer
fields are stored in memory. For more information, see the RFC:
https://discourse.llvm.org/t/rfc-structure-protection-a-family-of-uaf-mitigation-techniques/85555
TODO: