MPI Refactor #831
base: master
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #831      +/-   ##
==========================================
+ Coverage   42.95%   45.62%   +2.67%
==========================================
  Files          69       68       -1
  Lines       19504    18656     -848
  Branches     2366     2250     -116
==========================================
+ Hits         8377     8511     +134
+ Misses       9704     8785     -919
+ Partials     1423     1360      -63
I'm going to do a code review, but I think I only have two lingering questions for verification:
Mostly asking because the above two things aren't covered in CI (though they probably should be).
The scaling plots I included show RDMA MPI on Frontier, so that's covered. I'm not sure which NVIDIA machines we have that support RDMA MPI. I'll try Phoenix, since I know that's the machine Max has been using it on. I'll have to go back and look at my changes to the serial output files to remember what they even were, but I can check on that as well.
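For context, "RDMA MPI" here refers to GPU-aware MPI, where device buffers are handed directly to MPI calls and the RDMA-capable interconnect moves the data without staging through the host. Below is a minimal sketch of that pattern, assuming an MPI library built with CUDA (or ROCm) support; the buffer names and sizes are illustrative and are not taken from MFC.

```c
/* Sketch of a GPU-aware ("RDMA") MPI exchange: device pointers are passed
 * straight to MPI_Sendrecv with no host staging buffer. Requires a
 * CUDA-aware MPI build; buffer contents are left uninitialized here. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 1 << 20;            /* illustrative halo size */
    double *d_send, *d_recv;          /* device buffers */
    cudaMalloc((void **)&d_send, n * sizeof(double));
    cudaMalloc((void **)&d_recv, n * sizeof(double));

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    /* With GPU-aware MPI, device pointers are legal arguments here;
       the MPI library moves the data directly between GPUs. */
    MPI_Sendrecv(d_send, n, MPI_DOUBLE, right, 0,
                 d_recv, n, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_send);
    cudaFree(d_recv);
    MPI_Finalize();
    return 0;
}
```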
Description
This PR refactors much of the MPI code to reduce duplication and shorten the MPI-related code in the codebase. Significant testing is needed to verify the correctness of the changes, but I'm opening this as a draft now so that people know what's being changed and can start reviews and make suggestions early.
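To illustrate the kind of deduplication such a refactor targets (a sketch only, with made-up names; it does not reproduce the actual MFC routines), near-identical send/receive blocks repeated for each coordinate direction and each side can be collapsed into one parameterized halo-exchange helper:

```c
/* Sketch: collapse per-direction, per-side MPI halo code into one helper.
 * Names (halo_exchange, nbr_lo/nbr_hi) are illustrative, not MFC's. */
#include <mpi.h>

/* Exchange packed boundary slabs with the two neighbors along one direction.
 * nbr_lo/nbr_hi may be MPI_PROC_NULL at physical domain edges, in which case
 * the corresponding communication is a no-op. */
static void halo_exchange(double *send_lo, double *recv_lo, int nbr_lo,
                          double *send_hi, double *recv_hi, int nbr_hi,
                          int count, MPI_Comm comm)
{
    MPI_Request req[4];
    MPI_Irecv(recv_lo, count, MPI_DOUBLE, nbr_lo, 0, comm, &req[0]);
    MPI_Irecv(recv_hi, count, MPI_DOUBLE, nbr_hi, 1, comm, &req[1]);
    MPI_Isend(send_hi, count, MPI_DOUBLE, nbr_hi, 0, comm, &req[2]);
    MPI_Isend(send_lo, count, MPI_DOUBLE, nbr_lo, 1, comm, &req[3]);
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
}

/* Callers then loop over directions instead of duplicating the calls:
 *   for (int dir = 0; dir < 3; ++dir)
 *       halo_exchange(sbuf_lo[dir], rbuf_lo[dir], nbr[dir][0],
 *                     sbuf_hi[dir], rbuf_hi[dir], nbr[dir][1],
 *                     count[dir], comm);
 */
```

In the actual codebase the buffers also have to be packed and unpacked and may live on the GPU, but the idea is the same: one helper called in a loop over directions instead of copies of the communication code.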
Type of change
Scope
How Has This Been Tested?
Black lines in all videos show where processor boundaries and ghost cell regions are.
examples/2D_advection case file. It is run on 1 and 4 ranks with A100 GPUs. The video shows the advection of the volume fraction through MPI boundaries. (test.mp4)
examples/2D_advection case file. It is run on 1 and 8 ranks with A100 GPUs. The video shows the advection of the volume fraction contour through MPI boundaries: the advecting sphere is shown for the 1- and 8-rank cases, with the half of the sphere from the one-rank simulation in red and the half from the eight-rank simulation in blue. (test.mp4)
examples/3D_recovering_sphere case (modified to remove the use of symmetry, so that there is a meaningful halo exchange of the color function, and to move the square off center). It is run with 1 and 8 ranks on A100 GPUs. This video shows slices of the color function in all three dimensions with 1 and 4 ranks. (test.mp4)
examples/3D_recovering_sphere case in 2D with the square off center. It is run on 1 and 4 ranks with A100 GPUs. The video shows the volume fraction and color function with 1 and 4 ranks. (test.mp4)
/examples/1D_qbmm. A high-pressure region is placed off-center in the middle of the bubble cloud to break symmetry. The video shows nV003 along three slices across the domain for the one- and eight-rank cases on A100 GPUs. (test.mp4)
/examples/1D_qbmm. Two high-pressure regions are added to create blast waves and break symmetry. The video shows pressure and nV001 for the one- and four-rank cases on A100 GPUs. (test.mp4)
/examples/3D_lagrange_bubblescreen case. It is run on 1 and 8 ranks on A100 GPUs. The video shows the void fraction in the bubble cloud through three slices; the left column is one rank, and the right is eight ranks. (test.mp4)
Checklist
Ran ./mfc.sh format before committing my code.
If your code changes any code source files (anything in src/simulation), then to make sure the code is performing as expected on GPU devices, I have:
Checked that the code compiles using NVHPC compilers
Checked that the code compiles using CRAY compilers
Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
Ran the code on MI200+ GPUs and ensured the new features performed as expected (the GPU results match the CPU results)
Ran an Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys and attached the output file (.nsys-rep) and plain text results to this PR: MPIRefactor.txt
https://drive.google.com/file/d/1pmM3s8q2UbqNmLsumdCs12u-6p3Tm_8C/view?usp=sharing
Ran an Omniperf profile using ./mfc.sh run XXXX --gpu -t simulation --omniperf and attached the output file and plain text results to this PR. The trace results are gathered for a 200^3 instance of examples/3D_performance_test: master.csv, pr.csv
Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to runs without the new code/feature. Strong scaling is performed using a 300^3 instance of examples/3D_performance_test. Weak scaling is done with a 300^3 instance of examples/3D_performance_test on each processor.
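For reference, the two studies differ only in how the grid size is chosen: strong scaling fixes the global 300^3 problem and divides it across ranks, while weak scaling fixes 300^3 per rank so the global problem grows with the rank count. A small sketch of that sizing (the print format and rank counts are illustrative):

```c
/* Sketch of problem sizing for the two scaling studies.
 * Strong scaling: global 300^3 grid fixed, split across ranks.
 * Weak scaling: 300^3 grid per rank, global problem grows with ranks. */
#include <stdio.h>

int main(void) {
    const long base = 300L * 300L * 300L;      /* 300^3 cells */
    for (int ranks = 1; ranks <= 64; ranks *= 2) {
        long strong_per_rank = base / ranks;   /* global size fixed */
        long weak_global     = base * ranks;   /* per-rank size fixed */
        printf("%2d ranks: strong ~%ld cells/rank, weak %ld cells total\n",
               ranks, strong_per_rank, weak_global);
    }
    return 0;
}
```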