
I have noticed that DBNet inference on the GPU is slower than on the CPU (6633 ms vs. 510 ms for a single call in the runs below), and I am not sure what the underlying cause is.

Environment:

  • Operating System: Windows 11
  • Memory: 32 GB
  • GPU: RTX 3060 with 12 GB Memory
  1. GPU
    I built MMDeploy from source on Windows following the documentation. The steps were:
    PPLCV
git clone https://github.com/openppl-public/ppl.cv.git  
cd ppl.cv  
git checkout tags/v0.7.0 -b v0.7.0  
$env:PPLCV_DIR = "$pwd"  
mkdir pplcv-build  
cd pplcv-build  
cmake .. -G "Visual Studio 16 2019" -T v142 -A x64 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install -DPPLCV_USE_CUDA=ON -DPPLCV_USE_MSVC_STATIC_RUNTIME=OFF  
cmake --install . --config Release  
cd ../..  

CUDA + TensorRT

cd $env:MMDEPLOY_DIR  
mkdir build -ErrorAction SilentlyContinue  
cd build  
cmake .. -G "Visual Studio 16 2019" -A x64 -T v142 `  
  -DMMDEPLOY_BUILD_SDK=ON `  
  -DMMDEPLOY_BUILD_EXAMPLES=ON `  
  -DMMDEPLOY_BUILD_SDK_PYTHON_API=ON `  
  -DMMDEPLOY_TARGET_DEVICES="cuda" `  
  -DMMDEPLOY_TARGET_BACKENDS="ort;trt" `  
  -Dpplcv_DIR="$env:PPLCV_DIR/pplcv-build/install/lib/cmake/ppl" `  
  -DTENSORRT_DIR="$env:TENSORRT_DIR" `  
  -DCUDNN_DIR="$env:CUDNN_DIR"  

cmake --build . --config Release -- /m  
cmake --install . --config Release  

During this process I changed -DMMDEPLOY_TARGET_BACKENDS="trt" from the official documentation to -DMMDEPLOY_TARGET_BACKENDS="ort;trt", because with "trt" alone, running the detector failed with the following error:

[2024-07-28 20:31:48.312] [mmdeploy] [info] [model.cpp:35] [DirectoryModel] Load model: "d:/my_progarm/mmdeploy/mmdeploy_models/mmocr/dbnet/ort"
[2024-07-28 20:31:48.394] [mmdeploy] [error] [net_module.cpp:47] Net backend not found: onnxruntime, available backends: [("tensorrt", 0)]
[2024-07-28 20:31:48.397] [mmdeploy] [error] [task.cpp:99] error parsing config: {
  "context": {
    "device": "<any>",
    "model": "<any>",
    "stream": "<any>"
  },
  "input": [
    "prep_output"
  ],
  "input_map": {
    "img": "input"
  },
  "is_batched": true,
  "module": "Net",
  "name": "dbnet",
  "output": [
    "infer_output"
  ],
  "output_map": {},
  "type": "Task"
}
DBNet model load time: 85 ms

Code

Below is the program I modified to run DBNet text detection on its own (the text recognition parts are commented out).

  • ocr_test

#include <chrono>
#include <cstdio>
#include <iostream>
#include <string>

#include "opencv2/imgcodecs.hpp"

#include "mmdeploy/text_detector.hpp"
#include "mmdeploy/text_recognizer.hpp"
#include "argparse.h"
#include "mediaio.h"
#include "visualize.h"

DEFINE_ARG_string(det_model, "Text detection model path");
// DEFINE_ARG_string(reg_model, "Text recognition model path");
DEFINE_ARG_string(image, "Input image path");
DEFINE_string(device, "cpu", R"(Device name, e.g. "cpu", "cuda")");
DEFINE_string(output, "text_ocr_output.jpg", "Output image path");

using mmdeploy::TextDetector;
// using mmdeploy::TextRecognizer;

int main(int argc, char* argv[]) {
    if (!utils::ParseArguments(argc, argv)) {
        return -1;
    }

    cv::Mat img = cv::imread(ARGS_image);
    if (img.empty()) {
        fprintf(stderr, "failed to load image: %s\n", ARGS_image.c_str());
        return -1;
    }

    mmdeploy::Device device(FLAGS_device);
    // time model loading
    auto begin = std::chrono::steady_clock::now();
    TextDetector detector{ mmdeploy::Model(ARGS_det_model), device };
    auto end = std::chrono::steady_clock::now();
    std::cout << "DBNet model load time: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() << " ms" << std::endl;
    // TextRecognizer recognizer{ mmdeploy::Model(ARGS_reg_model), device };

    // apply the detector, the result is an array-like class holding references to
    // `mmdeploy_text_detection_t`, will be released automatically on destruction
    begin = std::chrono::steady_clock::now();
    // run inference (a single timed call)
    TextDetector::Result bboxes = detector.Apply(img);
    end = std::chrono::steady_clock::now();
    std::cout << "Inference time: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - begin).count() << " ms" << std::endl;

    // apply recognizer, if no bboxes are provided, full image will be used; the result is an
    // array-like class holding references to `mmdeploy_text_recognition_t`, will be released
    // automatically on destruction
    // TextRecognizer::Result texts = recognizer.Apply(img, { bboxes.begin(), bboxes.size() });

    // visualize results
    utils::Visualize v;
    auto sess = v.get_session(img);
    for (size_t i = 0; i < bboxes.size(); ++i) {
        mmdeploy_text_detection_t& bbox = bboxes[i];
        // mmdeploy_text_recognition_t& text = texts[i];
        sess.add_text_dbnet(bbox.bbox, bbox.score, i);
    }

    if (!FLAGS_output.empty()) {
        cv::imwrite(FLAGS_output, sess.get());
    }

    return 0;
}

Debug command arguments: d:/my_progarm/mmdeploy/mmdeploy_models/mmocr/dbnet/ort c:/Users/12994/source/repos/TextDetection/x64/Release/demo_text_det.jpg --device cuda
Output:

[2024-07-28 20:46:29.791] [mmdeploy] [info] [model.cpp:35] [DirectoryModel] Load model: "d:/my_progarm/mmdeploy/mmdeploy_models/mmocr/dbnet/ort"
DBNet model load time: 353 ms
Inference time: 6633 ms
[ INFO:0@7.057] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\parallel\registry_parallel.impl.hpp (96) cv::parallel::ParallelBackendRegistry::ParallelBackendRegistry core(parallel): Enabled backends(3, sorted by priority): ONETBB(1000); TBB(990); OPENMP(980)
[ INFO:0@7.058] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load D:\nvidia\opencv4.5.5\opencv\build\x64\vc15\bin\opencv_core_parallel_onetbb455_64d.dll => FAILED
[ INFO:0@7.060] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load opencv_core_parallel_onetbb455_64d.dll => FAILED
[ INFO:0@7.060] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load D:\nvidia\opencv4.5.5\opencv\build\x64\vc15\bin\opencv_core_parallel_tbb455_64d.dll => FAILED
[ INFO:0@7.062] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load opencv_core_parallel_tbb455_64d.dll => FAILED
[ INFO:0@7.063] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load D:\nvidia\opencv4.5.5\opencv\build\x64\vc15\bin\opencv_core_parallel_openmp455_64d.dll => FAILED
[ INFO:0@7.065] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load opencv_core_parallel_openmp455_64d.dll => FAILED
bbox[0]: (192.00, 38.00), (238.00, 32.00), (240.00, 44.00), (193.00, 50.00), 0.84
bbox[1]: (240.00, 45.00), (240.00, 35.00), (253.00, 35.00), (253.00, 45.00), 0.75
bbox[2]: (251.00, 44.00), (258.00, 33.00), (309.00, 64.00), (301.00, 76.00), 0.82
bbox[3]: (152.00, 67.00), (186.00, 41.00), (193.00, 51.00), (159.00, 76.00), 0.83
bbox[4]: (164.00, 98.00), (164.00, 79.00), (247.00, 79.00), (247.00, 98.00), 0.91
bbox[5]: (251.00, 76.00), (296.00, 76.00), (297.00, 98.00), (251.00, 98.00), 0.90
bbox[6]: (164.00, 121.00), (164.00, 104.00), (215.00, 105.00), (215.00, 122.00), 0.90
bbox[7]: (219.00, 121.00), (219.00, 105.00), (293.00, 105.00), (293.00, 121.00), 0.90
bbox[8]: (260.00, 141.00), (260.00, 127.00), (310.00, 127.00), (310.00, 141.00), 0.86
bbox[9]: (230.00, 140.00), (230.00, 126.00), (256.00, 126.00), (256.00, 140.00), 0.86
bbox[10]: (145.00, 141.00), (145.00, 126.00), (228.00, 126.00), (228.00, 141.00), 0.89
bbox[11]: (199.00, 156.00), (199.00, 143.00), (225.00, 143.00), (225.00, 156.00), 0.84
bbox[12]: (226.00, 157.00), (226.00, 144.00), (287.00, 146.00), (287.00, 159.00), 0.83
bbox[13]: (167.00, 156.00), (167.00, 142.00), (196.00, 142.00), (196.00, 156.00), 0.87
bbox[14]: (227.00, 188.00), (227.00, 175.00), (275.00, 175.00), (275.00, 188.00), 0.88
bbox[15]: (180.00, 174.00), (180.00, 158.00), (208.00, 158.00), (208.00, 174.00), 0.85
bbox[16]: (210.00, 160.00), (279.00, 159.00), (279.00, 173.00), (210.00, 173.00), 0.85
bbox[17]: (181.00, 189.00), (181.00, 175.00), (223.00, 175.00), (223.00, 190.00), 0.85
bbox[18]: (199.00, 206.00), (199.00, 190.00), (260.00, 190.00), (260.00, 206.00), 0.87
bbox[19]: (172.00, 212.00), (177.00, 197.00), (211.00, 210.00), (205.00, 225.00), 0.87
bbox[20]: (241.00, 214.00), (277.00, 199.00), (283.00, 212.00), (246.00, 227.00), 0.86
bbox[21]: (206.00, 227.00), (206.00, 211.00), (244.00, 212.00), (244.00, 228.00), 0.82
  2. CPU
    I built the CPU version with ONNX Runtime, following the documentation:
cd $env:MMDEPLOY_DIR
mkdir build -ErrorAction SilentlyContinue
cd build
cmake .. -G "Visual Studio 16 2019" -A x64 -T v142 `
    -DMMDEPLOY_BUILD_SDK=ON `
    -DMMDEPLOY_BUILD_EXAMPLES=ON `
    -DMMDEPLOY_BUILD_SDK_PYTHON_API=ON `
    -DMMDEPLOY_TARGET_DEVICES="cpu" `
    -DMMDEPLOY_TARGET_BACKENDS="ort" `
    -DONNXRUNTIME_DIR="$env:ONNXRUNTIME_DIR"

cmake --build . --config Release -- /m
cmake --install . --config Release

Output:

[2024-07-28 21:16:54.240] [mmdeploy] [info] [model.cpp:35] [DirectoryModel] Load model: "d:/my_progarm/mmdeploy/mmdeploy_models/mmocr/dbnet/ort"
DBNet model load time: 201 ms
Inference time: 510 ms
[ INFO:0@0.786] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\parallel\registry_parallel.impl.hpp (96) cv::parallel::ParallelBackendRegistry::ParallelBackendRegistry core(parallel): Enabled backends(3, sorted by priority): ONETBB(1000); TBB(990); OPENMP(980)
[ INFO:0@0.786] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load D:\nvidia\opencv4.5.5\opencv\build\x64\vc15\bin\opencv_core_parallel_onetbb455_64d.dll => FAILED
[ INFO:0@0.789] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load opencv_core_parallel_onetbb455_64d.dll => FAILED
[ INFO:0@0.789] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load D:\nvidia\opencv4.5.5\opencv\build\x64\vc15\bin\opencv_core_parallel_tbb455_64d.dll => FAILED
[ INFO:0@0.792] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load opencv_core_parallel_tbb455_64d.dll => FAILED
[ INFO:0@0.792] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load D:\nvidia\opencv4.5.5\opencv\build\x64\vc15\bin\opencv_core_parallel_openmp455_64d.dll => FAILED
[ INFO:0@0.795] global c:\build\master_winpack-build-win64-vc15\opencv\modules\core\src\utils\plugin_loader.impl.hpp (67) cv::plugin::impl::DynamicLib::libraryLoad load opencv_core_parallel_openmp455_64d.dll => FAILED
bbox[0]: (208.00, 226.00), (208.00, 213.00), (242.00, 213.00), (242.00, 226.00), 0.82
bbox[1]: (241.00, 214.00), (277.00, 200.00), (282.00, 212.00), (246.00, 226.00), 0.86
bbox[2]: (173.00, 212.00), (178.00, 198.00), (210.00, 210.00), (205.00, 224.00), 0.86
bbox[3]: (200.00, 204.00), (200.00, 189.00), (260.00, 191.00), (259.00, 206.00), 0.86
bbox[4]: (181.00, 189.00), (182.00, 175.00), (223.00, 175.00), (223.00, 189.00), 0.86
bbox[5]: (228.00, 187.00), (228.00, 176.00), (274.00, 176.00), (274.00, 187.00), 0.93
bbox[6]: (210.00, 173.00), (210.00, 160.00), (279.00, 160.00), (279.00, 173.00), 0.79
bbox[7]: (180.00, 174.00), (180.00, 158.00), (207.00, 158.00), (207.00, 174.00), 0.84
bbox[8]: (227.00, 157.00), (227.00, 145.00), (287.00, 146.00), (286.00, 158.00), 0.87
bbox[9]: (199.00, 156.00), (199.00, 143.00), (225.00, 144.00), (224.00, 156.00), 0.83
bbox[10]: (167.00, 142.00), (195.00, 141.00), (196.00, 155.00), (167.00, 156.00), 0.89
bbox[11]: (260.00, 127.00), (310.00, 126.00), (310.00, 141.00), (260.00, 141.00), 0.83
bbox[12]: (145.00, 140.00), (145.00, 126.00), (227.00, 126.00), (227.00, 140.00), 0.87
bbox[13]: (230.00, 140.00), (230.00, 126.00), (256.00, 126.00), (256.00, 140.00), 0.84
bbox[14]: (220.00, 119.00), (220.00, 106.00), (292.00, 106.00), (292.00, 119.00), 0.88
bbox[15]: (164.00, 120.00), (164.00, 104.00), (215.00, 105.00), (214.00, 121.00), 0.89
bbox[16]: (165.00, 81.00), (246.00, 80.00), (246.00, 97.00), (165.00, 98.00), 0.92
bbox[17]: (251.00, 77.00), (296.00, 77.00), (296.00, 97.00), (252.00, 98.00), 0.92
bbox[18]: (153.00, 67.00), (186.00, 42.00), (192.00, 51.00), (159.00, 75.00), 0.85
bbox[19]: (253.00, 44.00), (259.00, 35.00), (307.00, 64.00), (301.00, 74.00), 0.83
bbox[20]: (241.00, 45.00), (241.00, 35.00), (253.00, 35.00), (253.00, 45.00), 0.75
bbox[21]: (192.00, 38.00), (238.00, 33.00), (239.00, 43.00), (193.00, 49.00), 0.81

