Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Equation Indices and BC Types - Issue (#828) #860

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: master
Choose a base branch
Loading
from

Conversation

mohdsaid497566
Copy link
Collaborator

Description

making derived types and addressing issue #828

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this:

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensure the new features performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

@mohdsaid497566 mohdsaid497566 requested review from a team and sbryngelson as code owners June 5, 2025 23:08
@@ -16,8 +16,8 @@
"python.defaultInterpreterPath": "${workspaceFolder}/build/venv/bin/python3",

"files.associations": {
"*.f90": "FortranFreeForm",
"*.fpp": "FortranFreeForm",
"*.f90": "fortran-modern",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this help anything?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really

@@ -22,7 +22,7 @@ module m_boundary_common
type(scalar_field), dimension(:, :), allocatable :: bc_buffers
!$acc declare create(bc_buffers)

real(wp) :: bcxb, bcxe, bcyb, bcye, bczb, bcze
type(boundary_flags) :: bc_flag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't flags but rather bounds. bc_x_b = boundarycondition_xdirection_beginning.

@@ -345,6 +345,11 @@ module m_derived_types
type(vec3_dt), allocatable, dimension(:) :: var
end type mpi_io_airfoil_ib_var

!> Derived type for boundary flags
type boundary_flags
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above comment about the word 'flags'

call s_convert_species_to_mixture_variables_acc(rho, gamma, pi_inf, qv_K, alpha_IP, &
alpha_rho_IP, Re_K, j, k, l, G_K, Gs)
call s_convert_species_to_mixture_variables_acc(rho, gamma, pi_inf, qv_K, &
gp%ip%alpha, gp%ip%alpha_rho, Re_K, j, k, l, G_K, Gs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one advantage of this strategy of using more derived types is that we can just pass in gp%ip (for ibm) or state%L (or something like this) for riemann solvers. basically gathering all the data that could exist at a given point and lumping it into a derived type, and then the argument list becomes very short. for example, you just pass in

call s_routine(state)

where

subroutine s_routine(mystate)
	type(state) :: mystate
	
	[...]
	
	mystate%rho = [......]
end subroutine

@@ -743,22 +757,15 @@ contains

!> Function that uses the interpolation coefficients and the current state
!! at the cell centers in order to estimate the state at the image point
pure subroutine s_interpolate_image_point(q_prim_vf, gp, alpha_rho_IP, alpha_IP, pres_IP, vel_IP, c_IP, r_IP, v_IP, pb_IP, mv_IP, nmom_IP, pb, mv, presb_IP, massv_IP)
pure subroutine s_interpolate_image_point(q_prim_vf, gp, pb, mv)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an good example of what i was talking about above... for removing so many arguments.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup yup I see that


if (allocated(ghost_points)) then
do i = 1, size(ghost_points)
if (allocated(ghost_points(i)%ip%alpha_rho)) deallocate (ghost_points(i)%ip%alpha_rho)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI from perplexity:

Perfect! Given that your point_data type uses allocatable components (not pointers), your manual deallocation loop is completely unnecessary.

The Clean Solution

Since all the components you're manually deallocating (alpha_rho, alpha, r, v, pb, mv, nmom, presb, massv) are declared as allocatable, Fortran will automatically deallocate them when the parent object is deallocated.

You can replace your entire verbose loop with just:

if (allocated(ghost_points)) deallocate(ghost_points)

That's it! When ghost_points is deallocated, Fortran automatically:

  1. Deallocates all allocatable components within each ghost_points(i)%ip
  2. Deallocates the ghost_points array itself

Why This Works

The Fortran standard guarantees that when a derived type object with allocatable components is deallocated (or goes out of scope), all of its allocatable components are automatically deallocated first. This is called automatic deallocation and is one of the key advantages of using allocatable over pointer.

Additional Benefits

This approach is not only cleaner but also:

  • Safer: No risk of forgetting to deallocate a component
  • More maintainable: Adding new allocatable components to point_data requires no changes to deallocation code
  • Better performance: No unnecessary allocation status checks in loops

Your original manual approach was doing work that Fortran handles automatically for allocatable components. The verbose loop is only necessary when dealing with pointer components, which require explicit deallocation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat will remove the loop

@@ -95,6 +95,35 @@ contains
@:ALLOCATE(ghost_points(1:num_gps))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure how this ever worked. i would ask on the MFC Slack and tag @abbotts . this is an array of derived types. it seems like you would need to use something like this macro for AMD GPUs:

#:def ACC_SETUP_SFs(*args)
#ifdef _CRAYFTN
block
@:LOG({'@:ACC_SETUP_SFs(${', '.join(args)}$)'})
#:for arg in args
!$acc enter data copyin(${arg}$)
if (associated(${arg}$%sf)) then
!$acc enter data create(${arg}$%sf)
end if
#:endfor
end block
#endif
#:enddef

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason is perhaps that sf is another derived type. when you have a derived type that has another allocatable derived type within it, you need to use a macro like this (except instead of sf it would be something else)

end if

if (qbmm) then
allocate (ghost_points(i)%ip%nmom(nb*nmom))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use the @:ALLOCATE macro in almost all cases i think, since it declares the variable on the device.

Copy link
Member

@sbryngelson sbryngelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good PR but isn't quite right. the issues are subtle though, and are somewhat MFC specific. ask about the things i brought up on the slack channel (or ask @wilfonba @abbotts @anandrdbz etc.)

Copy link

codecov bot commented Jun 6, 2025

Codecov Report

Attention: Patch coverage is 25.00000% with 72 lines in your changes missing coverage. Please review.

Project coverage is 43.39%. Comparing base (32c0347) to head (e689a6b).

Files with missing lines Patch % Lines
src/simulation/m_ibm.fpp 5.79% 65 Missing ⚠️
src/common/m_boundary_common.fpp 33.33% 6 Missing ⚠️
src/simulation/m_cbc.fpp 92.85% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #860      +/-   ##
==========================================
- Coverage   43.42%   43.39%   -0.04%     
==========================================
  Files          68       68              
  Lines       19630    19655      +25     
  Branches     2353     2355       +2     
==========================================
+ Hits         8525     8529       +4     
- Misses       9676     9697      +21     
  Partials     1429     1429              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sbryngelson
Copy link
Member

This CCE failure suggests something is wrong with allocation, deallocation, or memory access. It happens on Cray CPU cases.

+ srun --nodes 1 --ntasks-per-node 1 --cpus-per-task 7 /lustre/orion/cfd154/scratch/sbryngelson/runner/actions-runner/_work/MFC/MFC/build/install/877bac7bd7/bin/simulation
corrupted size vs. prev_size in fastbins
srun: error: frontier10353: task 0: Aborted (core dumped)
srun: Terminating StepId=3499970.414

and chatgpt says

Thanks — since your simulation is written in Fortran, the corrupted size vs. prev_size in fastbins error still points to heap corruption, but it's likely caused by one of the following Fortran-specific issues:


🔍 Common Causes in Fortran

  1. Out-of-bounds array access

    • Fortran does not check array bounds unless explicitly compiled with bounds checking.

    • For example:

      real, allocatable :: arr(:)
      allocate(arr(10))
      arr(11) = 1.0  ! <- invalid access, no error unless bounds checking is on
  2. Use of uninitialized pointers or allocatables

    • Deallocating something not allocated, or reusing pointers improperly, can corrupt memory.

      real, pointer :: p(:)
      deallocate(p)  ! <- invalid if never allocated
  3. Double deallocate()

    • Fortran does not always protect against this unless you're careful.
  4. Mixing C interop with Fortran allocatables

    • If you’re using ISO_C_BINDING and passing Fortran arrays to C (e.g., through OpenACC, CUDA Fortran, or MPI buffers), and there's a mismatch in memory handling, heap corruption can result.

🛠️ Recommended Actions

✅ Compile with bounds and memory checks:

Add flags to enable runtime checking:

  • For gfortran:

    -g -O0 -fcheck=all -fbacktrace -Wall
  • For ifort (Intel Fortran):

    -g -check all -traceback -warn all

✅ Use valgrind with MPI wrapper or standalone:

valgrind ./simulation

Or with MPI:

mpirun -n 1 valgrind ./simulation

✅ Reproduce with small input

Try to isolate the problem with a minimal input case. Heap corruption is often non-deterministic, so smaller runs can help localize it.

✅ Consider AddressSanitizer (if using gfortran):

Modern gfortran supports it:

-fsanitize=address -fsanitize=undefined -g

⚠️ This slows down execution but will immediately point out invalid memory use.

@mohdsaid497566
Copy link
Collaborator Author

So I made the following derived type, but the build always corrupts mainly in the post-process because sys_size clashes with all processes while being in the form eqn_idx%sys_size for some reason. So for now, I will undo all changes pertaining to sys_size. Other than that, all ought to be sound and functional I hope so.

    !> @name Annotations of the structure of the state and flux vectors in terms of the
    !! size and the configuration of the system of equations to which they belong
    !> @{
    type system_of_equations
        integer :: sys_size                            !< Number of unknowns in system of eqns.
        type(int_bounds_info) :: cont                  !< Indexes of first & last continuity eqns.
        type(int_bounds_info) :: mom                   !< Indexes of first & last momentum eqns.
        integer :: E                                   !< Index of energy equation
        integer :: n                                   !< Index of number density
        type(int_bounds_info) :: adv                   !< Indexes of first & last advection eqns.
        type(int_bounds_info) :: internalEnergies      !< Indexes of first & last internal energy eqns.
        type(bub_bounds_info) :: bub                   !< Indexes of first & last bubble variable eqns.
        integer :: alf                                 !< Index of void fraction
        integer :: gamma                               !< Index of specific heat ratio func. eqn.
        integer :: pi_inf                              !< Index of liquid stiffness func. eqn.
        type(int_bounds_info) :: B                     !< Indexes of first and last magnetic field eqns.
        type(int_bounds_info) :: stress                !< Indexes of first and last shear stress eqns.
        type(int_bounds_info) :: xi                    !< Indexes of first and last reference map eqns.
        integer :: b_size                              !< Number of elements in the symmetric b tensor, plus one
        integer :: tensor_size                         !< Number of elements in the full tensor plus one
        type(int_bounds_info) :: species               !< Indexes of first & last concentration eqns.
        integer :: c                                   !< Index of color function
        integer :: damage                              !< Index of damage state variable (D) for continuum damage model
        integer, dimension(3) :: dir
        real(wp), dimension(3) :: dir_flg
        integer, dimension(3) :: dir_tau !!used for hypoelasticity=true
        integer, dimension(2) :: Re_size
        integer, allocatable, dimension(:, :) :: Re
    end type system_of_equations

if (allocated(ghost_points(i)%ip%massv)) deallocate (ghost_points(i)%ip%massv)
end do
end if

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abbotts

Do you mind checking out this line?
I suspect it to be causing all IBM cases to fail.

real(wp), dimension(3) :: dir_flg
integer, dimension(3) :: dir_tau !!used for hypoelasticity=true
integer, dimension(2) :: Re_size
integer, allocatable, dimension(:, :) :: Re
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may need to be careful with this on GPU cases (especially Frontier/Cray). i forget exactly how we deal with cases where we have an allocatable in a derived type but i would look for other examples of the code that do this. also, isn't Re a real and not an integer? and why is its size unknown at compile time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda omitted _idx from all variables listed. It was Re_idx and I just copied the declaration into under system_of_equations type.

@sbryngelson
Copy link
Member

So I made the following derived type, but the build always corrupts mainly in the post-process because sys_size clashes with all processes while being in the form eqn_idx%sys_size for some reason. So for now, I will undo all changes pertaining to sys_size. Other than that, all ought to be sound and functional I hope so.

    !> @name Annotations of the structure of the state and flux vectors in terms of the
    !! size and the configuration of the system of equations to which they belong
    !> @{
    type system_of_equations
        integer :: sys_size                            !< Number of unknowns in system of eqns.
        type(int_bounds_info) :: cont                  !< Indexes of first & last continuity eqns.
        type(int_bounds_info) :: mom                   !< Indexes of first & last momentum eqns.
        integer :: E                                   !< Index of energy equation
        integer :: n                                   !< Index of number density
        type(int_bounds_info) :: adv                   !< Indexes of first & last advection eqns.
        type(int_bounds_info) :: internalEnergies      !< Indexes of first & last internal energy eqns.
        type(bub_bounds_info) :: bub                   !< Indexes of first & last bubble variable eqns.
        integer :: alf                                 !< Index of void fraction
        integer :: gamma                               !< Index of specific heat ratio func. eqn.
        integer :: pi_inf                              !< Index of liquid stiffness func. eqn.
        type(int_bounds_info) :: B                     !< Indexes of first and last magnetic field eqns.
        type(int_bounds_info) :: stress                !< Indexes of first and last shear stress eqns.
        type(int_bounds_info) :: xi                    !< Indexes of first and last reference map eqns.
        integer :: b_size                              !< Number of elements in the symmetric b tensor, plus one
        integer :: tensor_size                         !< Number of elements in the full tensor plus one
        type(int_bounds_info) :: species               !< Indexes of first & last concentration eqns.
        integer :: c                                   !< Index of color function
        integer :: damage                              !< Index of damage state variable (D) for continuum damage model
        integer, dimension(3) :: dir
        real(wp), dimension(3) :: dir_flg
        integer, dimension(3) :: dir_tau !!used for hypoelasticity=true
        integer, dimension(2) :: Re_size
        integer, allocatable, dimension(:, :) :: Re
    end type system_of_equations

sys_size is used in so many places in the code that it's possible you missed something somewhere. also, did you make sure to !$acc update device(eqn_idx%sys_size) after it is set in m_global_parameters? i'm thinking about this line here

!$acc update device(momxb, momxe, advxb, advxe, contxb, contxe, bubxb, bubxe, intxb, intxe, sys_size, buff_size, E_idx, alf_idx, n_idx, adv_n, adap_dt, pi_fac, strxb, strxe, chemxb, chemxe, c_idx)

@sbryngelson
Copy link
Member

nitpick but probably can change the naming pattern of bub_bounds_info to bub_bounds and similiar. these types are used in the code a lot so it makes sense to keep the names of types short-ish.

@sbryngelson
Copy link
Member

FYI i moved pressure relaxation from RHS to its own module, which is why the merge conflict there looks so crazy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.