[TRTLLM-11471][fix] Eliminate redundant serialization and MPI collectives in safe_allgather/safe_gather #13089

Merged

pcastonguay merged 3 commits into NVIDIA/TensorRT-LLM:main from chienchunhung:fix/safe-mpi-comm-perf-regression on Apr 30, 2026

Conversation

@chienchunhung (Collaborator) commented Apr 15, 2026

Summary by CodeRabbit

  • Performance

    • Optimized distributed communication operations with efficient data serialization, intelligent payload size handling, and a fast path for smaller data transfers to reduce overhead in multi-GPU scenarios.
  • Bug Fixes

    • Enhanced error handling in distributed operations with improved exception reporting and failure detection during collective communications.

Description

Summary

  • Fixes a performance regression in safe_allgather/safe_gather introduced by PR #12174 ([TRTLLM-11471][feat] Add safe version of allgather with chunking): every call performed 4 MPI collectives and 3 serializations instead of the original 2 and 1.
  • Rewrites both functions to serialize once, exchange lengths via a buffer-based MPI_Allgather (1 collective), then transfer the raw bytes via MPI_Allgatherv/MPI_Gatherv (1 collective), matching what mpi4py's comm.allgather(obj) does internally.
  • Preserves the >2GB chunking safety for payloads exceeding the int32 displacement limit.
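The fast-path decision implied by the chunking safety can be sketched as follows. This is an illustrative model of the described behavior, not the actual helpers in communicator.py; `num_chunk_rounds` and `INT32_MAX` are hypothetical names.

```python
import math

INT32_MAX = 2**31 - 1  # MPI counts/displacements are C int (int32)

def num_chunk_rounds(lengths, chunk_size=INT32_MAX):
    """Rounds of Allgatherv needed so each rank sends at most
    chunk_size bytes per round; a single round is the fast path."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk_size = min(chunk_size, INT32_MAX)  # auto-cap oversized values
    return max(1, math.ceil(max(lengths) / chunk_size))
```

Under this model, typical payloads yield one round and take the single-Allgatherv path; only multi-GB payloads fall back to multiple chunked rounds.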

Before PR #12174 (original mpi4py)

comm.allgather(obj):
  pickle.dumps(obj)                     # serialization #1
  MPI_Allgather(sizes)                  # internal MPI op #1
  MPI_Allgatherv(pickled_bytes)         # internal MPI op #2
  pickle.loads(...)                     # deserialization

Total: 2 MPI collectives, 1 serialization
No protection against >2GB payloads (int32 overflow → silent corruption)
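The silent-corruption risk comes from C-level int32 counts and displacements wrapping negative on overflow. A minimal illustration; `wrap_int32` is a hypothetical helper simulating two's-complement truncation, not part of the codebase:

```python
INT32_MAX = 2**31 - 1

def wrap_int32(value):
    # Simulate two's-complement int32 truncation, i.e. what a C int
    # count or displacement would hold after overflow.
    return (value + 2**31) % 2**32 - 2**31

# Two ranks each contributing ~1.5 GB of pickled bytes: the aggregate
# receive count no longer fits in int32 and would wrap negative.
total = 2 * 1_500_000_000
assert total > INT32_MAX
assert wrap_int32(total) < 0  # a corrupted count, not an MPI error
```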

PR #12174 (introduced regression)

_prepare_chunked_transfer:
  pickle.dumps(obj)                     # serialization #1
  comm.allgather(len(payload))          # 2 internal MPI ops (Allgather + Allgatherv)
fast path (num_rounds <= 1):
  comm.allgather(obj)                   # serialization #2 + #3, 2 more internal MPI ops

Total: 4 MPI collectives, 3 serializations
Added >2GB chunking safety, but 2x collective overhead on the hot path

This PR (fix)

_serialize_and_exchange_lengths:
  pickle.dumps(obj)                     # serialization #1
  comm.Allgather([local_len, INT64])    # 1 buffer-based MPI op (no pickle)
safe_allgather / safe_gather:
  comm.Allgatherv([sendbuf, BYTE], ...) # 1 buffer-based MPI op (pre-serialized bytes)

Total: 2 MPI collectives, 1 serialization
Preserves >2GB chunking safety, matches original mpi4py performance
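The two-step exchange above can be modeled in plain Python. This sketch assumes the layout described in the PR; `compute_counts_displs` is an illustrative name, and the bytearray copy stands in for the actual buffer-based MPI_Allgatherv call:

```python
import itertools
import pickle

def compute_counts_displs(lengths):
    """Counts and displacements for Allgatherv, derived locally by
    every rank from the lengths exchanged in step 1."""
    counts = list(lengths)
    displs = [0, *itertools.accumulate(counts[:-1])]
    return counts, displs

# Per-rank pickled payloads (serialization #1, done once per rank).
payloads = [pickle.dumps(obj) for obj in ("a", [1, 2, 3], {"k": 4})]
lengths = [len(p) for p in payloads]
counts, displs = compute_counts_displs(lengths)

# Stand-in for MPI_Allgatherv: place each rank's bytes at its displacement.
recvbuf = bytearray(sum(counts))
for payload, displ in zip(payloads, displs):
    recvbuf[displ:displ + len(payload)] = payload

# Each rank then unpickles one object per received slot.
objs = [pickle.loads(bytes(recvbuf[d:d + c])) for c, d in zip(counts, displs)]
assert objs == ["a", [1, 2, 3], {"k": 4}]
```

Because every rank sees the same lengths after step 1, all ranks derive identical counts/displacements without any extra communication.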

Additional improvements

  • Preserve exception chain on serialization failures (from local_ser_error)
  • Log when entering the chunked transfer path (>2GB aggregate payloads)
  • Remove dead size > 0 guards (MPI guarantees size >= 1)
  • Fix test docstring referencing wrong filename
  • Add CommSpy-based tests verifying exact collective and serialization counts
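The CommSpy idea can be sketched as a thin wrapper that counts calls by method name. This is an illustrative reimplementation based on the PR description, not the test file's exact code:

```python
from collections import Counter

class CommSpy:
    """Wrap a communicator-like object and count every method call, so
    tests can assert exact collective counts (e.g. one Allgather and
    one Allgatherv, zero pickle-based lowercase collectives)."""

    def __init__(self, comm):
        self._comm = comm
        self.calls = Counter()

    def __getattr__(self, name):
        attr = getattr(self._comm, name)
        if not callable(attr):
            return attr

        def counted(*args, **kwargs):
            self.calls[name] += 1
            return attr(*args, **kwargs)

        return counted

# Usage with a stand-in comm; a real test would wrap an MPI communicator.
class FakeComm:
    def Allgather(self, sendbuf, recvbuf):
        pass

    def Allgatherv(self, sendbuf, recvbuf):
        pass

spy = CommSpy(FakeComm())
spy.Allgather(None, None)
spy.Allgatherv(None, None)
assert spy.calls["Allgather"] == 1 and spy.calls["Allgatherv"] == 1
assert spy.calls["allgather"] == 0  # no Python-level collectives
```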

Test coverage

| Scenario | Test | Status |
|---|---|---|
| Small object fast path (2a) | Most TestSafeAllgather/TestSafeGather tests + CommSpy tests | Covered |
| Large object chunked path (2b) | test_*_large_object, test_*_multi_round_chunking | Covered |
| Asymmetric payload sizes | test_*_displacement_correctness_asymmetric | Covered |
| chunk_size validation | test_*_invalid_chunk_size | Covered |
| chunk_size auto-capping | test_allgather_chunk_size_auto_capped | Covered |
| Exact collective count (2 buffer-based, 0 Python-level) | test_*_uses_exactly_two_collectives | New |
| Exact serialization count (1 pickle.dumps) | test_*_serializes_once | New |
| None/empty payloads | test_allgather_none, test_allgather_empty_collections | Covered |
| Non-zero root for gather | test_gather_non_zero_root | Covered |
| Cross-rank consistency | test_allgather_cross_rank_consistency | Covered |
| MPIDist TP/PP/CP wiring | TestMPIDistAllgather, TestMPIDistGather | Covered |
| Total > int32 (real >2GB payloads) | N/A (requires multi-GB allocations per rank) | Not covered (by design) |

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

chienchunhung changed the title from [TRTLLM-11471][perf] to [TRTLLM-11471][fix] Eliminate redundant serialization and MPI collectives in safe_allgather/safe_gather Apr 15, 2026
chienchunhung marked this pull request as ready for review April 15, 2026 19:43
chienchunhung requested a review from a team as a code owner April 15, 2026 19:43
chienchunhung requested a review from HuiGao-NV April 15, 2026 19:43
@coderabbitai (Contributor bot) commented Apr 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 729740b6-c51f-4f83-9c8f-6ab56dd39244

📥 Commits

Reviewing files that changed from the base of the PR and between 51f7956 and 56a5dfb.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/distributed/communicator.py
  • tests/unittest/_torch/distributed/test_safe_mpi_comm.py

📝 Walkthrough

This change refactors the distributed communicator to optimize collective operations by separating serialization and length exchange from data transfer. A new _serialize_and_exchange_lengths() helper replaces the previous chunking preparation. Both safe_gather() and safe_allgather() now include int32-optimized fast paths, validation logic, and improved error handling. Tests verify the new behavior through call-tracking and serialization assertions.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core Communicator Implementation: tensorrt_llm/_torch/distributed/communicator.py | Replaced _prepare_chunked_transfer() with _serialize_and_exchange_lengths() for length exchange. Updated safe_gather() and safe_allgather() to add chunk validation, int32-fitting fast paths using buffer-based collectives, and fallback to chunked transfers. Enhanced error handling for serialization failures. |
| Distributed Communication Tests: tests/unittest/_torch/distributed/test_safe_mpi_comm.py | Added CommSpy wrapper class to instrument and count MPI collective calls. Extended test suites with assertions verifying safe_allgather() and safe_gather() perform exactly two collectives and serialize exactly once. |

Sequence Diagram(s)

sequenceDiagram
    actor Rank0 as Rank 0
    actor Rank1 as Rank 1
    participant MPI
    
    Rank0->>Rank0: Pickle serialize obj
    Rank1->>Rank1: Pickle serialize obj
    
    Note over Rank0,Rank1: Step 1: Exchange Lengths
    Rank0->>MPI: Allgather(sendbuf=[len0])
    Rank1->>MPI: Allgather(sendbuf=[len1])
    MPI-->>Rank0: recvbuf=[len0, len1]
    MPI-->>Rank1: recvbuf=[len0, len1]
    
    Note over Rank0,Rank1: Step 2: Conditional Data Transfer
    alt Fits in int32
        Rank0->>MPI: Gatherv/Allgatherv (int32 counts/displs)
        Rank1->>MPI: Gatherv/Allgatherv (int32 counts/displs)
    else Exceeds int32
        loop For each chunk
            Rank0->>MPI: Gatherv/Allgatherv (chunked)
            Rank1->>MPI: Gatherv/Allgatherv (chunked)
        end
    end
    
    Rank0->>Rank0: Unpickle deserialize
    Rank1->>Rank1: Unpickle deserialize

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 59.09%, below the required 80.00% threshold. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main fix: eliminating redundant MPI collectives and serializations in safe_allgather/safe_gather, matching the core objective of resolving the performance regression. |
| Description check | ✅ Passed | The PR description comprehensively covers the issue (performance regression), the solution approach (reducing from 4 collectives/3 serializations to 2/1), detailed before/after comparisons, test coverage, and a completed checklist. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@chienchunhung (Collaborator, Author)

/bot help

@github-actions

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@chienchunhung (Collaborator, Author)

/bot run --disable-fail-fast --add-multi-gpu-test

chienchunhung requested a review from yuxianq April 16, 2026 18:02
chienchunhung force-pushed the fix/safe-mpi-comm-perf-regression branch from 56a5dfb to fc936fd on April 16, 2026 18:16

@chienchunhung (Collaborator, Author)

/bot run --disable-fail-fast --add-multi-gpu-test

@tensorrt-cicd (Collaborator)

PR_Github #43839 [ run ] triggered by Bot. Commit: fc936fd Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #43839 [ run ] completed with state FAILURE. Commit: fc936fd
/LLM/main/L0_MergeRequest_PR pipeline #34303 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@pcastonguay (Collaborator) left a comment

lgtm

chienchunhung force-pushed the fix/safe-mpi-comm-perf-regression branch from fc936fd to 223cc98 on April 20, 2026 19:10

@chienchunhung (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #44506 [ run ] triggered by Bot. Commit: 223cc98 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #44506 [ run ] completed with state SUCCESS. Commit: 223cc98
/LLM/main/L0_MergeRequest_PR pipeline #34906 completed with status: 'FAILURE'

CI Report


Link to invocation

chienchunhung force-pushed the fix/safe-mpi-comm-perf-regression branch from 223cc98 to 28bf48a on April 22, 2026 18:30

@chienchunhung (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #45009 [ run ] triggered by Bot. Commit: 28bf48a Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #45009 [ run ] completed with state FAILURE. Commit: 28bf48a

Link to invocation

@chienchunhung (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #45025 [ run ] triggered by Bot. Commit: 28bf48a Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #45025 [ run ] completed with state SUCCESS. Commit: 28bf48a
/LLM/main/L0_MergeRequest_PR pipeline #35335 completed with status: 'FAILURE'

CI Report


Link to invocation

chienchunhung force-pushed the fix/safe-mpi-comm-perf-regression branch from f062b6b to 8211f2b on April 24, 2026 03:08

@chienchunhung (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #45317 [ run ] triggered by Bot. Commit: 8211f2b Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #45317 [ run ] completed with state ABORTED. Commit: 8211f2b

Link to invocation

chienchunhung force-pushed the fix/safe-mpi-comm-perf-regression branch from 8211f2b to 11370d2 on April 25, 2026 03:19

@chienchunhung (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #45475 [ run ] triggered by Bot. Commit: 11370d2 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #45475 [ run ] completed with state SUCCESS. Commit: 11370d2
/LLM/main/L0_MergeRequest_PR pipeline #35706 completed with status: 'FAILURE'

CI Report


Link to invocation

chienchunhung force-pushed the fix/safe-mpi-comm-perf-regression branch from 11370d2 to e2e72ca on April 27, 2026 05:00

@chienchunhung (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #45651 [ run ] triggered by Bot. Commit: e2e72ca Link to invocation

chienchunhung force-pushed the fix/safe-mpi-comm-perf-regression branch from e2e72ca to e3dfa70 on April 27, 2026 18:19

@chienchunhung (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #45769 [ run ] triggered by Bot. Commit: e3dfa70 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #45769 [ run ] completed with state ABORTED. Commit: e3dfa70

Link to invocation

@chienchunhung (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #45989 [ run ] triggered by Bot. Commit: e3dfa70 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #45989 [ run ] completed with state ABORTED. Commit: e3dfa70

Link to invocation

… safe_allgather/safe_gather

The original implementation serialized objects and exchanged lengths
via Python-level comm.allgather (2 internal MPI collectives + 1
serialization), then called comm.allgather/gather again for the data
transfer (2 more MPI collectives + 1 more serialization) — totaling
4 MPI collectives and 3 serializations per call.

This rewrites the functions to:
1. Serialize once with pickle.dumps
2. Exchange lengths via buffer-based MPI_Allgather (1 collective)
3. Transfer raw bytes via MPI_Allgatherv/Gatherv (1 collective)

This matches the collective count that mpi4py's comm.allgather(obj)
uses internally (2 collectives, 1 serialization) while preserving
the >2GB chunking safety for payloads exceeding the int32
displacement limit.

Additional fixes:
- Preserve exception chain on serialization failures (from exc)
- Log when entering the chunked transfer path (>2GB payloads)
- Remove dead size>0 guards (MPI guarantees size>=1)
- Fix test docstring referencing wrong filename
- Add CommSpy-based tests verifying exact collective and
  serialization counts

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
…ad of backslash continuation

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>
chienchunhung force-pushed the fix/safe-mpi-comm-perf-regression branch from e3dfa70 to ee36d04 on April 29, 2026 22:26

@chienchunhung (Collaborator, Author)

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #46219 [ run ] triggered by Bot. Commit: ee36d04 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #46219 [ run ] completed with state SUCCESS. Commit: ee36d04
/LLM/main/L0_MergeRequest_PR pipeline #36330 completed with status: 'FAILURE'

CI Report


CI Agent Failure Analysis

Link to invocation

@pcastonguay (Collaborator)

/bot run --disable-fail-fast

pcastonguay enabled auto-merge (squash) April 30, 2026 17:57
@tensorrt-cicd (Collaborator)

PR_Github #46423 [ run ] triggered by Bot. Commit: ee36d04 Link to invocation

@tensorrt-cicd (Collaborator)

PR_Github #46423 [ run ] completed with state SUCCESS. Commit: ee36d04
/LLM/main/L0_MergeRequest_PR pipeline #36495 completed with status: 'SUCCESS'

CI Report

Link to invocation

pcastonguay merged commit 4000e48 into NVIDIA:main Apr 30, 2026
7 checks passed