convert inductor codecache to use getArtifactLogger #153766


Closed
wants to merge 2 commits

Conversation

bdhirsh
Contributor

@bdhirsh bdhirsh commented May 16, 2025

I'm not entirely sure of the background on why the inductor codecache code uses default Python logging instead of the new TORCH_LOGS-based artifact logging, but switching it over to artifact logging makes it easier to use nice testing utils in the next PR (a rough before/after sketch is included below).

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov
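For context, a minimal sketch of the kind of switch described above. This is an illustration under assumptions rather than the actual diff; the "output_code" artifact name is used only because it is a known registered artifact (it appears in the logs later in this thread), and the PR presumably registers its own artifact name for the codecache messages.

import logging

import torch._logging

# Before: a plain module-level logger, controlled only by standard Python
# logging configuration.
log = logging.getLogger(__name__)

# After: a TORCH_LOGS-based artifact logger. The second argument must be a
# registered artifact name; "output_code" is used here purely as a stand-in.
log = torch._logging.getArtifactLogger(__name__, "output_code")

# Messages are now gated by the artifact, e.g. TORCH_LOGS=output_code on the
# command line or torch._logging.set_logs(output_code=True) in a test.
log.debug("cache event: %s", "fxgraph_cache_miss")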


pytorch-bot bot commented May 16, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153766

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 2 Unrelated Failures

As of commit 88ef48c with merge base 76f182f:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@albanD albanD removed their request for review May 21, 2025 00:31
@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #153672

pytorchmergebot pushed a commit that referenced this pull request May 30, 2025
…53672)

Fixes pytorch/torchtitan#1185

It looks like inductor's logic for including its configs in the cache key skips any config with a leading underscore by default. This came up in torchtitan: there is an asyncTP pipelining pass in inductor gated by a private config, and because we were not caching on that config, we were attempting to use asyncTP when we shouldn't have been (a rough sketch of this filtering is included after this commit message).

I'm not sure how worried we should be about the blast radius of this change.

On the one hand, (1) it technically fixes any silent correctness issues in the cache around other private inductor configs (it looks like there are a few).

On the other hand, (2) there is some risk that there are "harmless" configs that we are now including in the key, which may increase false negatives. I do see that there is an explicit list of configs we want to ignore for caching (`_save_config_ignore`), so my hope is that all harmless configs are already captured there.

Pull Request resolved: #153672
Approved by: https://github.com/oulgen
ghstack dependencies: #153766
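A rough sketch of the filtering behavior described in the commit message above. This is illustrative only; the config names and the contents of _save_config_ignore here are stand-ins, not the actual inductor caching code.

# Hypothetical stand-ins for inductor config state, for illustration only.
_save_config_ignore = {"trace.enabled"}   # configs explicitly excluded from caching
configs = {
    "max_autotune": True,
    "_micro_pipeline_tp": True,           # private config gating the asyncTP pass
}

# Old behavior (as described above): any leading-underscore config was
# silently dropped from the cache key.
old_key_configs = {
    k: v for k, v in configs.items()
    if not k.startswith("_") and k not in _save_config_ignore
}

# New behavior: private configs participate in the cache key; only the
# explicit _save_config_ignore list is excluded.
new_key_configs = {
    k: v for k, v in configs.items()
    if k not in _save_config_ignore
}

assert "_micro_pipeline_tp" not in old_key_configs
assert "_micro_pipeline_tp" in new_key_configs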
pytorchmergebot pushed a commit that referenced this pull request May 30, 2025
@malfet
Contributor

malfet commented May 30, 2025

@pytorchbot revert -m "I want to revert this change as I'm 90+% certain it somehow broke testing" -c weird

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@malfet
Contributor

malfet commented May 30, 2025

If I install the wheel generated for the top of the stack, the test fails:

% python3 workspace/test/inductor/test_compile_subprocess.py -v -k test_AllenaiLongformerBase_repro_cpu 
Test results will be stored in test-reports/python-unittest/workspace.test.inductor.test_compile_subprocess

Running tests...
----------------------------------------------------------------------
  test_AllenaiLongformerBase_repro_cpu (__main__.CpuTests.test_AllenaiLongformerBase_repro_cpu) ... fatal: not a git repository (or any of the parent directories): .git
cannot compute commit date of ba51f4876d88bccf104da613ee2687201e230b47
frames [('total', 1), ('ok', 1)]
stats [('calls_captured', 11), ('unique_graphs', 1)]
inductor [('pattern_matcher_nodes', 5), ('pattern_matcher_count', 4), ('fxgraph_cache_miss', 1)]
aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
ERROR (12.926s)

======================================================================
ERROR [12.926s]: test_AllenaiLongformerBase_repro_cpu (__main__.CpuTests.test_AllenaiLongformerBase_repro_cpu)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.13/lib/python3.13/site-packages/torch/testing/_internal/common_utils.py", line 3141, in wrapper
    method(*args, **kwargs)
    ~~~~~~^^^^^^^^^^^^^^^^^
  File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 13474, in new_test
    return value(self)
  File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 928, in wrapper
    return fn(self, *args, **kwargs)
  File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 11345, in test_AllenaiLongformerBase_repro
    ).run(code)
      ~~~^^^^^^
RuntimeError: Expected to find "static_cast<int64_t>(256)" but did not find it
Searched string:
From CHECK-COUNT-2: static_cast<int64_t>(256)


To execute this test, run the following from the base repo dir:
    python test/inductor/test_compile_subprocess.py CpuTests.test_AllenaiLongformerBase_repro_cpu

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

----------------------------------------------------------------------
Ran 1 test in 12.926s

FAILED (errors=1)

but if I re-run the same test with TORCH_LOGS=output_code, it passes.
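For reference, TORCH_LOGS=output_code enables the output_code logging artifact; a rough programmatic equivalent, assuming the torch._logging API, is the sketch below, and the full re-run output follows it.

import torch._logging

# Roughly the same effect as running with TORCH_LOGS=output_code:
# enable the "output_code" artifact so inductor logs the generated code.
torch._logging.set_logs(output_code=True)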

$ TORCH_LOGS=output_code python3 workspace/test/inductor/test_compile_subprocess.py -v -k test_AllenaiLongformerBase_repro_cpu 
Test results will be stored in test-reports/python-unittest/workspace.test.inductor.test_compile_subprocess

Running tests...
----------------------------------------------------------------------
  test_AllenaiLongformerBase_repro_cpu (__main__.CpuTests.test_AllenaiLongformerBase_repro_cpu) ... V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] Output code: 
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] # AOT ID: ['0_inference']
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from ctypes import c_void_p, c_long, c_int
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] import torch
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] import math
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] import random
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] import os
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] import tempfile
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from math import inf, nan
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from cmath import nanj
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.hooks import run_intermediate_hooks
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.utils import maybe_profile
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.codegen.memory_planning import _align as align
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch import device, empty_strided
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.async_compile import AsyncCompile
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.select_algorithm import extern_kernels
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.codegen.multi_kernel import MultiKernelCall
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] aten = torch.ops.aten
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] inductor_ops = torch.ops.inductor
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] _quantized = torch.ops._quantized
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] assert_size_stride = torch._C._dynamo.guards.assert_size_stride
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] assert_alignment = torch._C._dynamo.guards.assert_alignment
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] empty_strided_cpu = torch._C._dynamo.guards._empty_strided_cpu
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] empty_strided_cuda = torch._C._dynamo.guards._empty_strided_cuda
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] empty_strided_xpu = torch._C._dynamo.guards._empty_strided_xpu
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] reinterpret_tensor = torch._C._dynamo.guards._reinterpret_tensor
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] alloc_from_pool = torch.ops.inductor._alloc_from_pool
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] async_compile = AsyncCompile()
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] empty_strided_p2p = torch._C._distributed_c10d._SymmetricMemory.empty_strided_p2p
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] cpp_fused_copy_full_like_0 = async_compile.cpp_pybinding(['const float*', 'float*', 'float*'], '''
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] #include <torch/csrc/inductor/cpp_prefix.h>
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] extern "C"  void kernel(const float* in_ptr0,
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                        float* out_ptr0,
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                        float* out_ptr1)
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     #pragma omp parallel num_threads(8)
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]         int tid = omp_get_thread_num();
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]         {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             #pragma omp for collapse(2)
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             for(int64_t x0=static_cast<int64_t>(0L); x0<static_cast<int64_t>(4L); x0+=static_cast<int64_t>(1L))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 for(int64_t x1=static_cast<int64_t>(0L); x1<static_cast<int64_t>(12L); x1+=static_cast<int64_t>(1L))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     #pragma GCC ivdep
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     for(int64_t x2=static_cast<int64_t>(0L); x2<static_cast<int64_t>(1024L); x2+=static_cast<int64_t>(1L))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         for(int64_t x3=static_cast<int64_t>(0L); x3<static_cast<int64_t>(513L); x3+=static_cast<int64_t>(16L))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 if(C10_LIKELY(x3 >= static_cast<int64_t>(0) && x3 < static_cast<int64_t>(512L)))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp0 = x2;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp1 = c10::convert<int64_t>(tmp0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp2 = static_cast<int64_t>(256);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp3 = tmp1 < tmp2;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp4 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp5 = x3;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp6 = c10::convert<int64_t>(tmp5);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp7 = at::vec::VectorizedN<int64_t,2>::arange(tmp6, 1);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp8 = static_cast<int64_t>(257);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp9 = at::vec::VectorizedN<int64_t,2>(tmp8);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp10 = at::vec::VecMask<int64_t,2>(tmp7 < tmp9);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp12 = at::vec::VecMask<float,1>::from(tmp3);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp13 = tmp10 & tmp12;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp11 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp14 = -std::numeric_limits<float>::infinity();
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             return tmp14;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp17 =
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             if (tmp13.all_zero())
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             else
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp15 = at::vec::Vectorized<float>(tmp11());
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp16 = at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return decltype(tmp15)::blendv(tmp16, tmp15, tmp13.template cast<float,1>());
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ()
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp18 = c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp19 = c10::convert<int64_t>(tmp18);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp20 = static_cast<int64_t>(3);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp21 = tmp19 < tmp20;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp23 = tmp21 & tmp3;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp22 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp24 = at::vec::VectorizedN<int64_t,2>(tmp2);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp25 = at::vec::VecMask<int64_t,2>(tmp7 >= tmp24);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp27 = at::vec::VecMask<float,1>::from(tmp23);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp28 = tmp25 & tmp27;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp26 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp29 = tmp28.template cast<float,1>().template loadu<float,1>(in_ptr0 + static_cast<int64_t>((-256L) + x3 + 513L*((static_cast<int64_t>(x2) % static_cast<int64_t>(256L))) + 262656L*(c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L))) + 787968L*x1 + 9455616L*x0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return tmp29;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp32 =
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 if (tmp28.all_zero())
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     return at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 else
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     auto tmp30 = tmp26();
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     auto tmp31 = at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     return decltype(tmp30)::blendv(tmp31, tmp30, tmp28.template cast<float,1>());
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ()
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp33 = static_cast<float>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp34 = at::vec::Vectorized<float>(tmp33);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp35 = decltype(tmp32)::blendv(tmp34, tmp32, tmp25.template cast<float,1>());
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             return tmp35;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp36 = tmp21 ? tmp22() : at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp37 = static_cast<float>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp38 = at::vec::VecMask<float,1>::from(tmp21);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp39 = at::vec::Vectorized<float>(tmp37);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp40 = decltype(tmp36)::blendv(tmp39, tmp36, tmp38.template cast<float,1>());
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp41 = decltype(tmp17)::blendv(tmp40, tmp17, tmp10.template cast<float,1>());
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         return tmp41;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp42 = tmp3 ? tmp4() : at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp43 = c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp44 = c10::convert<int64_t>(tmp43);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp45 = static_cast<int64_t>(3);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp46 = tmp44 < tmp45;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp47 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp48 = x3;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp49 = c10::convert<int64_t>(tmp48);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp50 = at::vec::VectorizedN<int64_t,2>::arange(tmp49, 1);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp51 = at::vec::VectorizedN<int64_t,2>(tmp2);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp52 = at::vec::VecMask<int64_t,2>(tmp50 >= tmp51);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp54 = at::vec::VecMask<float,1>::from(tmp46);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp55 = tmp52 & tmp54;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp53 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp56 = tmp55.template cast<float,1>().template loadu<float,1>(in_ptr0 + static_cast<int64_t>((-256L) + x3 + 513L*((static_cast<int64_t>(x2) % static_cast<int64_t>(256L))) + 262656L*(c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L))) + 787968L*x1 + 9455616L*x0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             return tmp56;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp59 =
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             if (tmp55.all_zero())
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             else
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp57 = tmp53();
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp58 = at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return decltype(tmp57)::blendv(tmp58, tmp57, tmp55.template cast<float,1>());
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ()
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp60 = static_cast<float>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp61 = at::vec::Vectorized<float>(tmp60);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp62 = decltype(tmp59)::blendv(tmp61, tmp59, tmp52.template cast<float,1>());
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         return tmp62;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp63 = tmp46 ? tmp47() : at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp64 = static_cast<float>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp65 = at::vec::VecMask<float,1>::from(tmp46);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp66 = at::vec::Vectorized<float>(tmp64);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp67 = decltype(tmp63)::blendv(tmp66, tmp63, tmp65.template cast<float,1>());
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp68 = at::vec::VecMask<float,1>::from(tmp3);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp69 = decltype(tmp42)::blendv(tmp67, tmp42, tmp68.template cast<float,1>());
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     tmp69.store(out_ptr0 + static_cast<int64_t>(x3 + 513L*x1 + 6156L*x2 + 6303744L*x0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 if(C10_UNLIKELY(x3 >= static_cast<int64_t>(512L) && x3 < static_cast<int64_t>(513L)))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     for (int64_t x3_tail = static_cast<int64_t>(512L);x3_tail < static_cast<int64_t>(513L); x3_tail++)
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp0 = x2;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp1 = c10::convert<int64_t>(tmp0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp2 = static_cast<int64_t>(256);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp3 = tmp1 < tmp2;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp4 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp5 = x3_tail;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp6 = c10::convert<int64_t>(tmp5);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp7 = static_cast<int64_t>(257);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp8 = tmp6 < tmp7;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp9 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp10 = -std::numeric_limits<float>::infinity();
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return tmp10;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp11 = tmp8 ? tmp9() : static_cast<decltype(tmp9())>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp12 = c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp13 = c10::convert<int64_t>(tmp12);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp14 = static_cast<int64_t>(3);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp15 = tmp13 < tmp14;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp16 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp17 = tmp6 >= tmp2;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp18 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     auto tmp19 = in_ptr0[static_cast<int64_t>((-256L) + x3_tail + 513L*((static_cast<int64_t>(x2) % static_cast<int64_t>(256L))) + 262656L*(c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L))) + 787968L*x1 + 9455616L*x0)];
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     return tmp19;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp20 = tmp17 ? tmp18() : static_cast<decltype(tmp18())>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp21 = static_cast<float>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp22 = tmp17 ? tmp20 : tmp21;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return tmp22;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp23 = tmp15 ? tmp16() : static_cast<decltype(tmp16())>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp24 = static_cast<float>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp25 = tmp15 ? tmp23 : tmp24;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp26 = tmp8 ? tmp11 : tmp25;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             return tmp26;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp27 = tmp3 ? tmp4() : static_cast<decltype(tmp4())>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp28 = c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp29 = c10::convert<int64_t>(tmp28);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp30 = static_cast<int64_t>(3);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp31 = tmp29 < tmp30;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp32 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp33 = x3_tail;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp34 = c10::convert<int64_t>(tmp33);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp35 = tmp34 >= tmp2;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp36 = [&]
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp37 = in_ptr0[static_cast<int64_t>((-256L) + x3_tail + 513L*((static_cast<int64_t>(x2) % static_cast<int64_t>(256L))) + 262656L*(c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L))) + 787968L*x1 + 9455616L*x0)];
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return tmp37;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp38 = tmp35 ? tmp36() : static_cast<decltype(tmp36())>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp39 = static_cast<float>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp40 = tmp35 ? tmp38 : tmp39;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             return tmp40;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp41 = tmp31 ? tmp32() : static_cast<decltype(tmp32())>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp42 = static_cast<float>(0.0);
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp43 = tmp31 ? tmp41 : tmp42;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp44 = tmp3 ? tmp27 : tmp43;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         out_ptr0[static_cast<int64_t>(x3_tail + 513L*x1 + 6156L*x2 + 6303744L*x0)] = tmp44;
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]         }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]         {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             #pragma omp for collapse(2)
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             for(int64_t x0=static_cast<int64_t>(0L); x0<static_cast<int64_t>(4L); x0+=static_cast<int64_t>(1L))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 for(int64_t x1=static_cast<int64_t>(0L); x1<static_cast<int64_t>(1024L); x1+=static_cast<int64_t>(1L))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     #pragma GCC ivdep
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     for(int64_t x2=static_cast<int64_t>(0L); x2<static_cast<int64_t>(12L); x2+=static_cast<int64_t>(1L))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         for(int64_t x3=static_cast<int64_t>(0L); x3<static_cast<int64_t>(513L); x3+=static_cast<int64_t>(16L))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                             {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 if(C10_LIKELY(x3 >= static_cast<int64_t>(0) && x3 < static_cast<int64_t>(512L)))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + static_cast<int64_t>(x3 + 513L*x2 + 6156L*x1 + 6303744L*x0), static_cast<int64_t>(16));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     tmp0.store(out_ptr1 + static_cast<int64_t>(x3 + 513L*x1 + 525312L*x2 + 6303744L*x0));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 if(C10_UNLIKELY(x3 >= static_cast<int64_t>(512L) && x3 < static_cast<int64_t>(513L)))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 {
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + static_cast<int64_t>(x3 + 513L*x2 + 6156L*x1 + 6303744L*x0), static_cast<int64_t>(1L));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     tmp0.store(out_ptr1 + static_cast<int64_t>(x3 + 513L*x1 + 525312L*x2 + 6303744L*x0), static_cast<int64_t>(1L));
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]         }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] }
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] ''')
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] async_compile.wait(globals())
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] del async_compile
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] def call(args):
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     arg0_1, = args
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     args.clear()
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     assert_size_stride(arg0_1, (48, 3, 512, 513), (787968, 262656, 513, 1))
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     buf0 = empty_strided_cpu((4, 1024, 12, 513), (6303744, 6156, 513, 1), torch.float32)
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     buf1 = empty_strided_cpu((4, 1024, 12, 513), (6303744, 513, 525312, 1), torch.float32)
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     cpp_fused_copy_full_like_0(arg0_1, buf0, buf1)
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     del arg0_1
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     return (buf1, )
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] def benchmark_compiled_module(times=10, repeat=10):
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     from torch._dynamo.testing import rand_strided
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     from torch._inductor.utils import print_performance
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     arg0_1 = rand_strided((48, 3, 512, 513), (787968, 262656, 513, 1), device='cpu', dtype=torch.float32)
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     fn = lambda: call([arg0_1])
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     return print_performance(fn, times=times, repeat=repeat)
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] if __name__ == "__main__":
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     from torch._inductor.wrapper_benchmark import compiled_module_main
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     compiled_module_main('None', benchmark_compiled_module)
V0530 22:17:40.617000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:40.618000 6880 site-packages/torch/_inductor/graph.py:2344] [0/0] [__output_code] Output code written to: /tmp/tmpzap5ss2p/pz/cpzx2olbvuz73ds2wvagbpgr567zum7y2ek2oe3bzuojvhqdeiqe.py
I0530 22:17:41.477000 6880 site-packages/torch/_inductor/graph.py:2309] [0/0] [__output_code] Output code written to: /tmp/tmpzap5ss2p/pz/cpzx2olbvuz73ds2wvagbpgr567zum7y2ek2oe3bzuojvhqdeiqe.py
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] Output code: 
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] # AOT ID: ['1_inference']
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from ctypes import c_void_p, c_long, c_int
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] import torch
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] import math
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] import random
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] import os
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] import tempfile
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from math import inf, nan
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from cmath import nanj
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.hooks import run_intermediate_hooks
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.utils import maybe_profile
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.codegen.memory_planning import _align as align
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch import device, empty_strided
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.async_compile import AsyncCompile
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.select_algorithm import extern_kernels
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] from torch._inductor.codegen.multi_kernel import MultiKernelCall
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] aten = torch.ops.aten
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] inductor_ops = torch.ops.inductor
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] _quantized = torch.ops._quantized
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] assert_size_stride = torch._C._dynamo.guards.assert_size_stride
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] assert_alignment = torch._C._dynamo.guards.assert_alignment
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] empty_strided_cpu = torch._C._dynamo.guards._empty_strided_cpu
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] empty_strided_cuda = torch._C._dynamo.guards._empty_strided_cuda
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] empty_strided_xpu = torch._C._dynamo.guards._empty_strided_xpu
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] reinterpret_tensor = torch._C._dynamo.guards._reinterpret_tensor
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] alloc_from_pool = torch.ops.inductor._alloc_from_pool
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] async_compile = AsyncCompile()
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] empty_strided_p2p = torch._C._distributed_c10d._SymmetricMemory.empty_strided_p2p
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] cpp_fused_copy_full_like_0 = async_compile.cpp_pybinding(['const float*', 'float*', 'float*'], '''
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] #include <torch/csrc/inductor/cpp_prefix.h>
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] extern "C"  void kernel(const float* in_ptr0,
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                        float* out_ptr0,
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                        float* out_ptr1)
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     #pragma omp parallel num_threads(8)
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]         int tid = omp_get_thread_num();
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]         {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             #pragma omp for collapse(2)
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             for(int64_t x0=static_cast<int64_t>(0L); x0<static_cast<int64_t>(4L); x0+=static_cast<int64_t>(1L))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 for(int64_t x1=static_cast<int64_t>(0L); x1<static_cast<int64_t>(12L); x1+=static_cast<int64_t>(1L))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     #pragma GCC ivdep
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     for(int64_t x2=static_cast<int64_t>(0L); x2<static_cast<int64_t>(1024L); x2+=static_cast<int64_t>(1L))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         for(int64_t x3=static_cast<int64_t>(0L); x3<static_cast<int64_t>(513L); x3+=static_cast<int64_t>(16L))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 if(C10_LIKELY(x3 >= static_cast<int64_t>(0) && x3 < static_cast<int64_t>(512L)))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp0 = x2;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp1 = c10::convert<int64_t>(tmp0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp2 = static_cast<int64_t>(256);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp3 = tmp1 < tmp2;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp4 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp5 = x3;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp6 = c10::convert<int64_t>(tmp5);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp7 = at::vec::VectorizedN<int64_t,2>::arange(tmp6, 1);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp8 = static_cast<int64_t>(257);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp9 = at::vec::VectorizedN<int64_t,2>(tmp8);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp10 = at::vec::VecMask<int64_t,2>(tmp7 < tmp9);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp12 = at::vec::VecMask<float,1>::from(tmp3);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp13 = tmp10 & tmp12;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp11 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp14 = -std::numeric_limits<float>::infinity();
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             return tmp14;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp17 =
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             if (tmp13.all_zero())
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             else
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp15 = at::vec::Vectorized<float>(tmp11());
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp16 = at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return decltype(tmp15)::blendv(tmp16, tmp15, tmp13.template cast<float,1>());
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ()
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp18 = c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp19 = c10::convert<int64_t>(tmp18);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp20 = static_cast<int64_t>(3);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp21 = tmp19 < tmp20;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp23 = tmp21 & tmp3;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp22 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp24 = at::vec::VectorizedN<int64_t,2>(tmp2);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp25 = at::vec::VecMask<int64_t,2>(tmp7 >= tmp24);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp27 = at::vec::VecMask<float,1>::from(tmp23);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp28 = tmp25 & tmp27;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp26 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp29 = tmp28.template cast<float,1>().template loadu<float,1>(in_ptr0 + static_cast<int64_t>((-256L) + x3 + 513L*((static_cast<int64_t>(x2) % static_cast<int64_t>(256L))) + 262656L*(c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L))) + 787968L*x1 + 9455616L*x0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return tmp29;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp32 =
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 if (tmp28.all_zero())
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     return at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 else
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     auto tmp30 = tmp26();
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     auto tmp31 = at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     return decltype(tmp30)::blendv(tmp31, tmp30, tmp28.template cast<float,1>());
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ()
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp33 = static_cast<float>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp34 = at::vec::Vectorized<float>(tmp33);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp35 = decltype(tmp32)::blendv(tmp34, tmp32, tmp25.template cast<float,1>());
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             return tmp35;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp36 = tmp21 ? tmp22() : at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp37 = static_cast<float>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp38 = at::vec::VecMask<float,1>::from(tmp21);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp39 = at::vec::Vectorized<float>(tmp37);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp40 = decltype(tmp36)::blendv(tmp39, tmp36, tmp38.template cast<float,1>());
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp41 = decltype(tmp17)::blendv(tmp40, tmp17, tmp10.template cast<float,1>());
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         return tmp41;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp42 = tmp3 ? tmp4() : at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp43 = c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp44 = c10::convert<int64_t>(tmp43);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp45 = static_cast<int64_t>(3);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp46 = tmp44 < tmp45;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp47 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp48 = x3;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp49 = c10::convert<int64_t>(tmp48);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp50 = at::vec::VectorizedN<int64_t,2>::arange(tmp49, 1);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp51 = at::vec::VectorizedN<int64_t,2>(tmp2);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp52 = at::vec::VecMask<int64_t,2>(tmp50 >= tmp51);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp54 = at::vec::VecMask<float,1>::from(tmp46);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp55 = tmp52 & tmp54;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp53 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp56 = tmp55.template cast<float,1>().template loadu<float,1>(in_ptr0 + static_cast<int64_t>((-256L) + x3 + 513L*((static_cast<int64_t>(x2) % static_cast<int64_t>(256L))) + 262656L*(c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L))) + 787968L*x1 + 9455616L*x0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             return tmp56;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp59 =
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             if (tmp55.all_zero())
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             else
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp57 = tmp53();
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp58 = at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return decltype(tmp57)::blendv(tmp58, tmp57, tmp55.template cast<float,1>());
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ()
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp60 = static_cast<float>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp61 = at::vec::Vectorized<float>(tmp60);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp62 = decltype(tmp59)::blendv(tmp61, tmp59, tmp52.template cast<float,1>());
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         return tmp62;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp63 = tmp46 ? tmp47() : at::vec::Vectorized<float>(static_cast<float>(0.0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp64 = static_cast<float>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp65 = at::vec::VecMask<float,1>::from(tmp46);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp66 = at::vec::Vectorized<float>(tmp64);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp67 = decltype(tmp63)::blendv(tmp66, tmp63, tmp65.template cast<float,1>());
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp68 = at::vec::VecMask<float,1>::from(tmp3);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp69 = decltype(tmp42)::blendv(tmp67, tmp42, tmp68.template cast<float,1>());
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     tmp69.store(out_ptr0 + static_cast<int64_t>(x3 + 513L*x1 + 6156L*x2 + 6303744L*x0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 if(C10_UNLIKELY(x3 >= static_cast<int64_t>(512L) && x3 < static_cast<int64_t>(513L)))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     for (int64_t x3_tail = static_cast<int64_t>(512L);x3_tail < static_cast<int64_t>(513L); x3_tail++)
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp0 = x2;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp1 = c10::convert<int64_t>(tmp0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp2 = static_cast<int64_t>(256);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp3 = tmp1 < tmp2;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp4 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp5 = x3_tail;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp6 = c10::convert<int64_t>(tmp5);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp7 = static_cast<int64_t>(257);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp8 = tmp6 < tmp7;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp9 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp10 = -std::numeric_limits<float>::infinity();
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return tmp10;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp11 = tmp8 ? tmp9() : static_cast<decltype(tmp9())>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp12 = c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp13 = c10::convert<int64_t>(tmp12);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp14 = static_cast<int64_t>(3);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp15 = tmp13 < tmp14;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp16 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp17 = tmp6 >= tmp2;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp18 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     auto tmp19 = in_ptr0[static_cast<int64_t>((-256L) + x3_tail + 513L*((static_cast<int64_t>(x2) % static_cast<int64_t>(256L))) + 262656L*(c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L))) + 787968L*x1 + 9455616L*x0)];
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                     return tmp19;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp20 = tmp17 ? tmp18() : static_cast<decltype(tmp18())>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp21 = static_cast<float>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp22 = tmp17 ? tmp20 : tmp21;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return tmp22;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp23 = tmp15 ? tmp16() : static_cast<decltype(tmp16())>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp24 = static_cast<float>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp25 = tmp15 ? tmp23 : tmp24;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp26 = tmp8 ? tmp11 : tmp25;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             return tmp26;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp27 = tmp3 ? tmp4() : static_cast<decltype(tmp4())>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp28 = c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp29 = c10::convert<int64_t>(tmp28);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp30 = static_cast<int64_t>(3);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp31 = tmp29 < tmp30;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp32 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp33 = x3_tail;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp34 = c10::convert<int64_t>(tmp33);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp35 = tmp34 >= tmp2;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp36 = [&]
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 auto tmp37 = in_ptr0[static_cast<int64_t>((-256L) + x3_tail + 513L*((static_cast<int64_t>(x2) % static_cast<int64_t>(256L))) + 262656L*(c10::div_floor_integer(static_cast<int64_t>(x2), static_cast<int64_t>(256L))) + 787968L*x1 + 9455616L*x0)];
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                                 return tmp37;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp38 = tmp35 ? tmp36() : static_cast<decltype(tmp36())>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp39 = static_cast<float>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             auto tmp40 = tmp35 ? tmp38 : tmp39;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                             return tmp40;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         ;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp41 = tmp31 ? tmp32() : static_cast<decltype(tmp32())>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp42 = static_cast<float>(0.0);
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp43 = tmp31 ? tmp41 : tmp42;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         auto tmp44 = tmp3 ? tmp27 : tmp43;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                         out_ptr0[static_cast<int64_t>(x3_tail + 513L*x1 + 6156L*x2 + 6303744L*x0)] = tmp44;
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]         }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]         {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             #pragma omp for collapse(2)
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             for(int64_t x0=static_cast<int64_t>(0L); x0<static_cast<int64_t>(4L); x0+=static_cast<int64_t>(1L))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 for(int64_t x1=static_cast<int64_t>(0L); x1<static_cast<int64_t>(1024L); x1+=static_cast<int64_t>(1L))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     #pragma GCC ivdep
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     for(int64_t x2=static_cast<int64_t>(0L); x2<static_cast<int64_t>(12L); x2+=static_cast<int64_t>(1L))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         for(int64_t x3=static_cast<int64_t>(0L); x3<static_cast<int64_t>(513L); x3+=static_cast<int64_t>(16L))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                             {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 if(C10_LIKELY(x3 >= static_cast<int64_t>(0) && x3 < static_cast<int64_t>(512L)))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + static_cast<int64_t>(x3 + 513L*x2 + 6156L*x1 + 6303744L*x0), static_cast<int64_t>(16));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     tmp0.store(out_ptr1 + static_cast<int64_t>(x3 + 513L*x1 + 525312L*x2 + 6303744L*x0));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 if(C10_UNLIKELY(x3 >= static_cast<int64_t>(512L) && x3 < static_cast<int64_t>(513L)))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 {
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + static_cast<int64_t>(x3 + 513L*x2 + 6156L*x1 + 6303744L*x0), static_cast<int64_t>(1L));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                     tmp0.store(out_ptr1 + static_cast<int64_t>(x3 + 513L*x1 + 525312L*x2 + 6303744L*x0), static_cast<int64_t>(1L));
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                                 }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                         }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                     }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]                 }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]             }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]         }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] }
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] ''')
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] async_compile.wait(globals())
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] del async_compile
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] def call(args):
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     arg0_1, = args
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     args.clear()
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     assert_size_stride(arg0_1, (48, 3, 512, 513), (787968, 262656, 513, 1))
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     buf0 = empty_strided_cpu((4, 1024, 12, 513), (6303744, 6156, 513, 1), torch.float32)
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     buf1 = empty_strided_cpu((4, 1024, 12, 513), (6303744, 513, 525312, 1), torch.float32)
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     cpp_fused_copy_full_like_0(arg0_1, buf0, buf1)
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     del arg0_1
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     return (buf1, )
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] def benchmark_compiled_module(times=10, repeat=10):
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     from torch._dynamo.testing import rand_strided
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     from torch._inductor.utils import print_performance
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     arg0_1 = rand_strided((48, 3, 512, 513), (787968, 262656, 513, 1), device='cpu', dtype=torch.float32)
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     fn = lambda: call([arg0_1])
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     return print_performance(fn, times=times, repeat=repeat)
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] if __name__ == "__main__":
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     from torch._inductor.wrapper_benchmark import compiled_module_main
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code]     compiled_module_main('None', benchmark_compiled_module)
V0530 22:17:42.067000 6880 site-packages/torch/_inductor/graph.py:2333] [0/0] [__output_code] 
V0530 22:17:42.068000 6880 site-packages/torch/_inductor/graph.py:2344] [0/0] [__output_code] Output code written to: /tmp/tmpzap5ss2p/vr/cvrqig4lzef5nmgrwtkw3se7cg6hlzx6e44qh2a767jvdvfqcwkm.py
I0530 22:17:42.071000 6880 site-packages/torch/_inductor/graph.py:2309] [0/0] [__output_code] Output code written to: /tmp/tmpzap5ss2p/vr/cvrqig4lzef5nmgrwtkw3se7cg6hlzx6e44qh2a767jvdvfqcwkm.py
frames [('total', 1), ('ok', 1)]
stats [('calls_captured', 22), ('unique_graphs', 2)]
inductor [('pattern_matcher_nodes', 10), ('pattern_matcher_count', 8), ('fxgraph_cache_miss', 2)]
aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)]
inline_call []
ok (13.617s)

----------------------------------------------------------------------
Ran 1 test in 13.618s

OK

Generating XML reports...
Generated XML report: test-reports/python-unittest/workspace.test.inductor.test_compile_subprocess/TEST-CpuTests-20250530221728.xml

pytorchmergebot added a commit that referenced this pull request May 30, 2025
This reverts commit 5b6fd27.

Reverted #153766 on behalf of https://github.com/malfet due to I want to revert this change as I'm 90+% certain it somehow broke testing ([comment](#153766 (comment)))
@pytorchmergebot
Copy link
Collaborator

@bdhirsh your PR has been successfully reverted.

pytorchmergebot added the Reverted and ci-no-td (Do not run TD on this PR) labels May 30, 2025
nWEIdia pushed a commit to nWEIdia/pytorch that referenced this pull request Jun 2, 2025
qingyi-yan pushed a commit to qingyi-yan/pytorch that referenced this pull request Jun 3, 2025
iupaikov-amd pushed a commit to ROCm/pytorch that referenced this pull request Jun 4, 2025
@bdhirsh
Copy link
Contributor Author

bdhirsh commented Jun 9, 2025

Looking into the error some more, it looks like inductor's handling for piping logging output between subprocesses isn't aware of the new TORCH_LOGS-based artifact logging at all.

I think I'm going to give up on this yak shave, and just land the PR above this one (the main reason I tried to move inductor's codecache logs to use artifact logging was to save ~20 lines of boilerplate on a test, which I'm going to add back).
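For anyone reading this later, here is a rough sketch of the two logging styles being discussed, in the spirit of the conversion this PR attempted. This is not the actual diff: the module path, log messages, and the `output_code` artifact name are illustrative (the artifact name is borrowed from the `[__output_code]` lines in the test log above), and the real codecache change may use a different artifact.

```python
import logging

import torch  # importing torch sets up its TORCH_LOGS-based logging
from torch._logging import getArtifactLogger

# Plain python logging: visibility is governed by the standard logging module
# configuration; TORCH_LOGS has no effect on it.
plain_log = logging.getLogger("torch._inductor.codecache")
plain_log.debug("fx graph cache miss for key %s", "some_key")

# TORCH_LOGS-based artifact logging: the logger is registered under a named
# artifact ("output_code" here), so it can be toggled with
# TORCH_LOGS="output_code" and picked up by artifact-aware tooling.
artifact_log = getArtifactLogger("torch._inductor.codecache", "output_code")
artifact_log.debug("fx graph cache miss for key %s", "some_key")
```

Running with `TORCH_LOGS="output_code"` should route the second message through torch's handlers (the same machinery that produced the verbose lines in the log above), while the first message only appears if you configure the `logging` module yourself.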

@bdhirsh bdhirsh closed this Jun 9, 2025