Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Conversation

@amametjanov
Copy link
Contributor

@amametjanov amametjanov commented May 29, 2020

This merge addresses several sources of pool and field deallocation errors
that arise when MPAS is built with the XL 16 compilers.

Additional discussion is at #367 (review).

Fixes #481

[bit-for-bit]

@mgduda mgduda self-requested a review June 2, 2020 21:19
@mark-petersen mark-petersen self-requested a review June 8, 2020 15:03
@mark-petersen mark-petersen self-assigned this Jun 8, 2020
Copy link
Contributor

@mark-petersen mark-petersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amametjanov thanks for your work on this. I merged this in locally, and it passes the ocean regression suite for both intel and gnu, opt and debug. I did not test the longer simulations that originally had a memory leak, but I would assume a nulify removes the memory the same way a deallocate does.

@mark-petersen
Copy link
Contributor

@mgduda do you have any comments on this one?

@mgduda
Copy link
Contributor

mgduda commented Jun 12, 2020

Let me take another look in the next couple of days. I had been working independently to address Issue #481 back in April. It looks like what I had done is a bit different from the approach taken in this PR, and I want to convince myself that the approaches are equivalent before approving (or to determine why what I was doing was wrong).

@mark-petersen
Copy link
Contributor

@mgduda if we could merge this in the next few days, that would be helpful.

Copy link
Contributor

@mgduda mgduda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes in this PR introduce a new memory leak. To demonstrate this, define a new field in a core's Registry.xml file:

<var name="foo" type="real" dimensions="vertexDegree maxEdges nEdges"/>

Then use code like the following to allocate a duplicate of the foo field before calling mpas_deallocate_field:

      type (field3dReal), pointer :: foo, foo_copy

      call mpas_pool_get_field(mesh, 'foo', foo)
      call mpas_duplicate_field(foo, foo_copy)
      call mpas_deallocate_field(foo_copy)

@mgduda
Copy link
Contributor

mgduda commented Jun 16, 2020

@amametjanov @mark-petersen I have some changes in progress that should fix the issue with the XL compilers while avoiding the memory leak I mentioned in my review. The next few days are somewhat busy, but I could probably finish the changes by 18 June. I'm not sure whether it would be easier to push additional changes to this PR, or to just create a new PR. I'm open to suggestions.

@mark-petersen
Copy link
Contributor

@mgduda thanks for looking into this. We are hoping to merge the develop branch to all our cores on Monday. If you have time to update this before then, we will stick with this PR. Otherwise, we can merge this one and then make a correction on a new PR.

@mgduda
Copy link
Contributor

mgduda commented Jun 17, 2020

@amametjanov Do you have any objection to my pushing of additional commits to your framework/nullify-field-pointers branch?

@amametjanov
Copy link
Contributor Author

Not at all, please go ahead.

@mgduda
Copy link
Contributor

mgduda commented Jun 20, 2020

@amametjanov @mark-petersen I've taken a slightly different approach to fixing the deallocation errors when using the XL compilers. I need to clean up the commit messages and do more testing, but if either of you have a few minutes, would you be able to give the framework/fix_xl_deallocate_errors branch from my fork a try on Summit? If that branch works for you, I can try to formulate it as an addition to the changes in this PR. Since we've spent so much time on this issue, I do think having a detailed description of the changes in the commit history will be helpful to anyone looking through the code in future.

@mgduda
Copy link
Contributor

mgduda commented Jun 23, 2020

@mark-petersen @amametjanov Since you mentioned merging the develop branch to all core-specific branches on Monday (6/22, I assume), I just wanted to check whether you had been able to test the framework/fix_xl_deallocate_errors branch in my fork, which I think should fix the issues seen on Summit without introducing any memory leaks.

@amametjanov
Copy link
Contributor Author

I ran SMS_P12x2.ne4_oQU240.A_WCYCL1850 on Summit+IBM compiler and it built and ran without errors with the new 6 commits from the branch (on top of e3sm-master).

@mgduda please go ahead with the updates. Thanks.

mgduda added 6 commits June 23, 2020 17:19
Prior to this commit, MPAS models would fail with the error message

   "mpas_field_routines.F", line 1981: 1525-109 Error encountered while attempting to deallocate a data object.  The program will stop.

when using the XL compilers. The issue appears to be that the XL compiler does
not allow for deallocation of dummy arguments that have the 'target' attribute.

To address these runtime deallocation errors, the mpas_deallocate_field_target
routine never deallocates the head pointer for the field; accordingly,
the keep_head_ptr optional argument has been removed.

** Note ** : The changes in this commit introduce memory leaks, since
  the mpas_deallocate_field routine calls mpas_deallocate_field_target, and only
  nullifies the head pointer of the field after these calls.
This commit removes the unused function write_field_pointers from the registry
code-generation tool.
This commit removes the unused function generate_struct_links from the registry
code-generation tool.
This commit addresses an error in deallocating fields in
the mpas_pool_destroy_pool routine for fields with only one time level:

   "mpas_pool_routines.F", line 259: 1525-109 Error encountered while attempting to deallocate a data object.  The program will stop.

Fields with just one time level are allocated as an array of fields with size
one, and the first (and only) element of this array is added to a pool. During
deallocation of the fields, the above error is a result of the attempt to
deallocate an element of an array (rather than the entire array).

To address this problem, the registry-generated code to build pools allocates
fields with a single time level using a scalar field pointer, rather than
as an array of fields of size one.
Previously, the head pointer of a field was just nullified in
mpas_deallocate_field after the call to mpas_deallocate_field_target.
Now, the head pointers for all blocks of a field are deallocated.

Also, the mpas_pool_destroy_pool routine can now call mpas_deallocate_field
rather than calling mpas_deallocate_field_target followed by a deallocation
of the field head pointer.
This commit first removes the source of a potential deallocation error:

   "mpas_pool_routines.F", line 358: 1525-109 Error encountered while attempting to deallocate a data object.  The program will stop.

when using the XL compilers; this error is a result of the fact that
the mpas_pool_add_subpool routine declares the subpool dummy argument with
the target attribute, and deallocation can only happen for variables that
are allocatable or have the pointer attribute.

With the change to the mpas_pool_add_subpool routine, all "stat" optional
arguments can be removed from calls to deallocate in the mpas_pool_destroy_pool
routine.
@mgduda mgduda force-pushed the framework/nullify-field-pointers branch from c02d215 to a9d2aea Compare June 23, 2020 23:41
@mgduda
Copy link
Contributor

mgduda commented Jun 23, 2020

@amametjanov It got a bit messy trying to rebase my changes onto c02d215, and doing so also invalidated some of the line numbers that I referenced in error messages in my commit messages, so the easiest approach seem to be to simply push my branch to framework/nullify-field-pointers on your fork. I do appreciate that this wasn't a very polite thing to do, so if you have any objections at all (and please don't hesitate to raise those objections!), I can restore commit c02d215 and try again with a rebase. In case @mark-petersen is eager to merge this PR, though, this seemed like the less risky approach in the end.

@amametjanov
Copy link
Contributor Author

That's okay. I keep running into this error with multiple repo clones: sooner=better.
I think github stores the Jan 28 commit, in case it needs to be looked up.

@mgduda mgduda self-requested a review June 24, 2020 00:56
@mark-petersen
Copy link
Contributor

@mgduda thanks for your timely assistance. I tested these last commits with our tests that exercise this part of the code, and I get bit-for-bit match with the previous.

@mgduda if you could just approve the PR now, I'll go ahead and merge.

@mark-petersen
Copy link
Contributor

Never mind - I refreshed my page and it shows approved now. Thanks!

@mark-petersen mark-petersen merged commit 6525f33 into MPAS-Dev:develop Jun 25, 2020
@amametjanov amametjanov deleted the framework/nullify-field-pointers branch June 26, 2020 14:24
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Jul 27, 2020
#3737)

Update MPAS framework

This PR brings in a new mpas-source submodule with updates to the mpas framework
itself, plus changes to the cores supporting the framework update. Some changes
are made to the atmosphere core, even though it is not used by E3SM, in order to
maintain consistency with the framework. This update includes the following
individual branches and PRs, many of which are additions to the makefiles for
summit, or optional libraries:
* az/azamat/mpas-cmake/mpas-tool-dir  (MPAS-Dev/MPAS-Model#629)
* init_atmosphere/kd_tree_ties  (MPAS-Dev/MPAS-Model#630)
* mark-petersen/framework/add_FillValue  (MPAS-Dev/MPAS-Model#616)
* mark-petersen/framework/add_lapack_option  (MPAS-Dev/MPAS-Model#613)
* amametjanov/framework/nullify-field-pointers  (MPAS-Dev/MPAS-Model#578)
* framework/makefile_e3sm  (MPAS-Dev/MPAS-Model#603)
* xylar/framework/make_shell_bash  (MPAS-Dev/MPAS-Model#594)
* registry/missing_value  (MPAS-Dev/MPAS-Model#562)
* pwolfram/updates_make_intel_stack  (MPAS-Dev/MPAS-Model#592)
* azamat/framework/pgi-cpr-omp-workaround  (MPAS-Dev/MPAS-Model#449)
* az/framework/e3sm-cmake-qnosmp-typo  (MPAS-Dev/MPAS-Model#579)
* xylar/compass/add_prerequisite_tag
* atmosphere/fix_timekeeping_imports (MPAS-Dev/MPAS-Model#582)
* xylar/fix_docs_ci  (MPAS-Dev/MPAS-Model#575)
* xylar/add_compass_docs  (MPAS-Dev/MPAS-Model#472)
* philipwjones/framework/summitmake  (MPAS-Dev/MPAS-Model#536)
* atmosphere/atm_core_cleanup  (MPAS-Dev/MPAS-Model#548)
* init_atmosphere/parse_geoindex  (MPAS-Dev/MPAS-Model#459)
* xylar/compass/add_copy_file  (MPAS-Dev/MPAS-Model#467)
* init_atmosphere/kd_tree  (MPAS-Dev/MPAS-Model#438)
* init_atmosphere/mpas_stack  (MPAS-Dev/MPAS-Model#426)
* operators/add_mpas_in_cell  (MPAS-Dev/MPAS-Model#400)

Fixes #3396
Fixes #3236

[BFB]
jonbob added a commit to E3SM-Project/E3SM that referenced this pull request Jul 29, 2020
Update MPAS framework

This PR brings in a new mpas-source submodule with updates to the mpas framework
itself, plus changes to the cores supporting the framework update. Some changes
are made to the atmosphere core, even though it is not used by E3SM, in order to
maintain consistency with the framework. This update includes the following
individual branches and PRs, many of which are additions to the makefiles for
summit, or optional libraries:
* az/azamat/mpas-cmake/mpas-tool-dir  (MPAS-Dev/MPAS-Model#629)
* init_atmosphere/kd_tree_ties  (MPAS-Dev/MPAS-Model#630)
* mark-petersen/framework/add_FillValue  (MPAS-Dev/MPAS-Model#616)
* mark-petersen/framework/add_lapack_option  (MPAS-Dev/MPAS-Model#613)
* amametjanov/framework/nullify-field-pointers  (MPAS-Dev/MPAS-Model#578)
* framework/makefile_e3sm  (MPAS-Dev/MPAS-Model#603)
* xylar/framework/make_shell_bash  (MPAS-Dev/MPAS-Model#594)
* registry/missing_value  (MPAS-Dev/MPAS-Model#562)
* pwolfram/updates_make_intel_stack  (MPAS-Dev/MPAS-Model#592)
* azamat/framework/pgi-cpr-omp-workaround  (MPAS-Dev/MPAS-Model#449)
* az/framework/e3sm-cmake-qnosmp-typo  (MPAS-Dev/MPAS-Model#579)
* xylar/compass/add_prerequisite_tag
* atmosphere/fix_timekeeping_imports (MPAS-Dev/MPAS-Model#582)
* xylar/fix_docs_ci  (MPAS-Dev/MPAS-Model#575)
* xylar/add_compass_docs  (MPAS-Dev/MPAS-Model#472)
* philipwjones/framework/summitmake  (MPAS-Dev/MPAS-Model#536)
* atmosphere/atm_core_cleanup  (MPAS-Dev/MPAS-Model#548)
* init_atmosphere/parse_geoindex  (MPAS-Dev/MPAS-Model#459)
* xylar/compass/add_copy_file  (MPAS-Dev/MPAS-Model#467)
* init_atmosphere/kd_tree  (MPAS-Dev/MPAS-Model#438)
* init_atmosphere/mpas_stack  (MPAS-Dev/MPAS-Model#426)
* operators/add_mpas_in_cell  (MPAS-Dev/MPAS-Model#400)

Fixes #3396
Fixes #3236

[BFB]
mark-petersen added a commit to gcapodag/MPAS-Model that referenced this pull request Jan 13, 2021
I found exactly the same problem as we are having here:
MPAS-Dev#367 (review).
And I said it was later fixed here:
MPAS-Dev#367 (comment).

The problem was introduced since then, in this PR:
MPAS-Dev#578
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.