jiaau/kernels

Kernels

Focus areas

  • reduce

    • CUDA Warp-Level Primitives
    • Parallel reduction
  • transpose

    • Memory Coalescing
    • Shared Memory
    • Bank Conflict
    • Swizzling
    • CuTe
  • sgemm

    • Tile Size Tuning
    • Shared Memory
    • Bank Conflict
    • Double Buffer
    • Warp Divergence
    • Vectorized memory access
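
The Bank Conflict and Swizzling items above can be illustrated with a small host-side model. This is only a sketch, not code from the repository: it assumes shared memory has 32 banks of 4-byte words, and the 32x32 tile, the `Layout` enum, and both function names are hypothetical. It counts how many threads of a warp land in the same bank when thread `t` reads element `(t, col)` of the tile, which is the column access pattern that appears in a shared-memory transpose.

```cpp
#include <array>
#include <cassert>

// Host-side model only. Assumption: 32 banks of 4-byte words, so a float at
// word offset w lives in bank w % 32.
constexpr int kBanks = 32;
constexpr int kTile = 32;  // hypothetical 32x32 float tile

enum class Layout { Plain, Padded, Swizzled };

// Word offset of logical element (row, col) under each layout.
int word_offset(Layout layout, int row, int col) {
    assert(row >= 0 && row < kTile && col >= 0 && col < kTile);
    switch (layout) {
        case Layout::Plain:    return row * kTile + col;          // tile[32][32]
        case Layout::Padded:   return row * (kTile + 1) + col;    // tile[32][33]
        case Layout::Swizzled: return row * kTile + (col ^ row);  // XOR swizzle
    }
    return -1;  // unreachable for valid Layout values
}

// Largest number of threads that hit one bank when thread t reads (t, col).
int max_conflict_on_column_read(Layout layout, int col) {
    std::array<int, kBanks> hits{};  // zero-initialized per-bank counters
    int worst = 0;
    for (int t = 0; t < kTile; ++t) {
        int bank = word_offset(layout, t, col) % kBanks;
        hits[bank] += 1;
        if (hits[bank] > worst) worst = hits[bank];
    }
    return worst;
}
```

Under these assumptions the plain layout puts all 32 threads in one bank, while padding each row by one word or XOR-ing the column index with the row index spreads the same column read across 32 distinct banks. Swizzling achieves this without the extra shared memory that padding costs.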

Build and run

Build the project

make build
make install <kernel_name>

Run the tests

make run <kernel_name>

Profile with NVIDIA Nsight Compute (ncu)

make ncu <kernel_name>

Clean build files

make clean

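Command-line options

The SGEMM test supports the following options:

  • --bench: enable benchmark mode
  • --times N: set the number of benchmark iterations (default: 3)
  • --help: show the help message

For example: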

make run <kernel_name> -- --bench --times 10
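
How the test binary reads these flags is not shown in this README. The following is a minimal sketch of one way to handle them, assuming the arguments after `--` reach the program unchanged. The `Options` struct and the `parse_options` function are hypothetical names introduced for illustration. Only the three flags and the default of 3 iterations come from the list above.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical options holder; field names are illustrative only.
struct Options {
    bool bench = false;  // --bench
    int times = 3;       // --times N (default from the README)
    bool help = false;   // --help
    bool ok = true;      // false on an unknown flag or a bad --times value
};

// Parse argv[1..argc-1]; argv[0] is the program name and is skipped.
Options parse_options(int argc, const char* const* argv) {
    assert(argv != nullptr);
    Options opt;
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        if (arg == "--bench") {
            opt.bench = true;
        } else if (arg == "--help") {
            opt.help = true;
        } else if (arg == "--times") {
            if (i + 1 >= argc) { opt.ok = false; break; }  // value missing
            const char* text = argv[++i];
            char* end = nullptr;
            const long n = std::strtol(text, &end, 10);
            // Reject empty input, trailing junk, and non-positive counts.
            if (end == text || *end != '\0' || n <= 0) { opt.ok = false; break; }
            opt.times = static_cast<int>(n);
        } else {
            opt.ok = false;  // unknown flag
            break;
        }
    }
    return opt;
}
```

With the example command above, such a parser would see `--bench --times 10` and return `bench == true` and `times == 10`. When `ok` is false, a caller would typically print the help text and exit.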

Acknowledgments

About

This repository showcases common optimization techniques for kernels.
