Commit 2a8d4b8

Merge pull request opencv#27000 from GenshinImpactStarts:cart_to_polar
[HAL RVV] reuse atan | impl cart_to_polar | add perf test opencv#27000

Implemented through the existing `cv_hal_cartToPolar32f` and `cv_hal_cartToPolar64f` interfaces.

Added `cartToPolar` performance tests.

`cv_hal_rvv::fast_atan` is modified to make it more reusable, because it is needed in `cartToPolar`.

**UPDATE**: UI (Universal Intrinsics) enabled. Since RVV vector types can't be stored in a struct, the UI implementation of `v_atan_f32` is modified. Both `fastAtan` and `cartToPolar` are affected, so the test results for `atan` are also appended. I have tested the modified UI on RVV and AVX2, and no regressions appear.

Perf tests were run on MUSE-PI. AVX2 tests were run on an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.

```sh
$ opencv_test_core --gtest_filter="*CartToPolar*:*Core_CartPolar_reverse*:*Phase*"
$ opencv_perf_core --gtest_filter="*CartToPolar*:*phase*" --perf_min_samples=300 --perf_force_samples=300
```

Test result between enabled UI and HAL:

```
Name of Test                                             ui     rvv  rvv vs ui (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)      0.106   0.059  1.80
CartToPolar::CartToPolarFixture::(127x61, 64FC1)      0.155   0.070  2.20
CartToPolar::CartToPolarFixture::(640x480, 32FC1)     4.188   2.317  1.81
CartToPolar::CartToPolarFixture::(640x480, 64FC1)     6.593   2.889  2.28
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)   12.600   7.057  1.79
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)   19.860   8.797  2.26
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1)  28.295  15.809  1.79
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1)  44.573  19.398  2.30
phase32f::VectorLength::128                           0.002   0.002  1.20
phase32f::VectorLength::1000                          0.008   0.006  1.32
phase32f::VectorLength::131072                        1.061   0.731  1.45
phase32f::VectorLength::524288                        3.997   2.976  1.34
phase32f::VectorLength::1048576                       8.001   5.959  1.34
phase64f::VectorLength::128                           0.002   0.002  1.33
phase64f::VectorLength::1000                          0.012   0.008  1.58
phase64f::VectorLength::131072                        1.648   0.931  1.77
phase64f::VectorLength::524288                        6.836   3.837  1.78
phase64f::VectorLength::1048576                      14.060   7.540  1.86
```

Test result before and after enabling UI on RVV:

```
Name of Test                                         perf ui orig  perf ui pr  perf ui pr vs perf ui orig (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)            0.141       0.106  1.33
CartToPolar::CartToPolarFixture::(127x61, 64FC1)            0.187       0.155  1.20
CartToPolar::CartToPolarFixture::(640x480, 32FC1)           5.990       4.188  1.43
CartToPolar::CartToPolarFixture::(640x480, 64FC1)           8.370       6.593  1.27
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)         18.214      12.600  1.45
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)         25.365      19.860  1.28
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1)        40.437      28.295  1.43
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1)        56.699      44.573  1.27
phase32f::VectorLength::128                                 0.003       0.002  1.54
phase32f::VectorLength::1000                                0.016       0.008  1.90
phase32f::VectorLength::131072                              2.048       1.061  1.93
phase32f::VectorLength::524288                              8.219       3.997  2.06
phase32f::VectorLength::1048576                            16.426       8.001  2.05
phase64f::VectorLength::128                                 0.003       0.002  1.44
phase64f::VectorLength::1000                                0.020       0.012  1.60
phase64f::VectorLength::131072                              2.621       1.648  1.59
phase64f::VectorLength::524288                             10.780       6.836  1.58
phase64f::VectorLength::1048576                            22.723      14.060  1.62
```

Test result before and after modifying UI on AVX2:

```
Name of Test                                         perf avx2 orig  perf avx2 pr  perf avx2 pr vs perf avx2 orig (x-factor)
CartToPolar::CartToPolarFixture::(127x61, 32FC1)              0.006         0.005  1.14
CartToPolar::CartToPolarFixture::(127x61, 64FC1)              0.010         0.009  1.08
CartToPolar::CartToPolarFixture::(640x480, 32FC1)             0.273         0.264  1.03
CartToPolar::CartToPolarFixture::(640x480, 64FC1)             0.511         0.487  1.05
CartToPolar::CartToPolarFixture::(1280x720, 32FC1)            0.760         0.723  1.05
CartToPolar::CartToPolarFixture::(1280x720, 64FC1)            2.009         1.937  1.04
CartToPolar::CartToPolarFixture::(1920x1080, 32FC1)           1.996         1.923  1.04
CartToPolar::CartToPolarFixture::(1920x1080, 64FC1)           5.721         5.509  1.04
phase32f::VectorLength::128                                   0.000         0.000  0.98
phase32f::VectorLength::1000                                  0.001         0.001  0.97
phase32f::VectorLength::131072                                0.105         0.111  0.95
phase32f::VectorLength::524288                                0.402         0.402  1.00
phase32f::VectorLength::1048576                               0.775         0.767  1.01
phase64f::VectorLength::128                                   0.000         0.000  1.00
phase64f::VectorLength::1000                                  0.001         0.001  1.01
phase64f::VectorLength::131072                                0.163         0.162  1.01
phase64f::VectorLength::524288                                0.669         0.653  1.02
phase64f::VectorLength::1048576                               1.660         1.634  1.02
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

1 parent b129abf commit 2a8d4b8
5 files changed, +160 -113 lines changed
‎3rdparty/hal_rvv/hal_rvv.hpp

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@
 #include "hal_rvv_1p0/atan.hpp" // core
 #include "hal_rvv_1p0/split.hpp" // core
 #include "hal_rvv_1p0/magnitude.hpp" // core
+#include "hal_rvv_1p0/cart_to_polar.hpp" // core
 #include "hal_rvv_1p0/flip.hpp" // core
 #include "hal_rvv_1p0/lut.hpp" // core
 #include "hal_rvv_1p0/exp.hpp" // core

‎3rdparty/hal_rvv/hal_rvv_1p0/atan.hpp

Lines changed: 59 additions & 65 deletions
@@ -13,67 +13,76 @@
 
 #include <cfloat>
 
-namespace cv::cv_hal_rvv {
+namespace cv { namespace cv_hal_rvv {
 
 namespace detail {
 // ref: mathfuncs_core.simd.hpp
 static constexpr float pi = CV_PI;
-static constexpr float atan2_p1 = 0.9997878412794807F * (180 / pi);
-static constexpr float atan2_p3 = -0.3258083974640975F * (180 / pi);
-static constexpr float atan2_p5 = 0.1555786518463281F * (180 / pi);
-static constexpr float atan2_p7 = -0.04432655554792128F * (180 / pi);
-
-__attribute__((always_inline)) inline vfloat32m4_t
-rvv_atan_f32(vfloat32m4_t vy, vfloat32m4_t vx, size_t vl, float p7,
-             vfloat32m4_t vp5, vfloat32m4_t vp3, vfloat32m4_t vp1,
-             float angle_90_deg) {
+
+struct AtanParams
+{
+    float p1, p3, p5, p7, angle_90;
+};
+
+static constexpr AtanParams atan_params_rad {
+    0.9997878412794807F,
+    -0.3258083974640975F,
+    0.1555786518463281F,
+    -0.04432655554792128F,
+    90.F * (pi / 180.F)};
+static constexpr AtanParams atan_params_deg {
+    atan_params_rad.p1 * (180 / pi),
+    atan_params_rad.p3 * (180 / pi),
+    atan_params_rad.p5 * (180 / pi),
+    atan_params_rad.p7 * (180 / pi),
+    90.F};
+
+template <typename VEC_T>
+__attribute__((always_inline)) inline VEC_T
+rvv_atan(VEC_T vy, VEC_T vx, size_t vl, const AtanParams& params)
+{
     const auto ax = __riscv_vfabs(vx, vl);
     const auto ay = __riscv_vfabs(vy, vl);
-    const auto c = __riscv_vfdiv(
-        __riscv_vfmin(ax, ay, vl),
-        __riscv_vfadd(__riscv_vfmax(ax, ay, vl), FLT_EPSILON, vl), vl);
+    // Reciprocal Estimate (vfrec7) is not accurate enough to pass the test of cartToPolar.
+    const auto c = __riscv_vfdiv(__riscv_vfmin(ax, ay, vl),
+                                 __riscv_vfadd(__riscv_vfmax(ax, ay, vl), FLT_EPSILON, vl),
+                                 vl);
     const auto c2 = __riscv_vfmul(c, c, vl);
 
-    auto a = __riscv_vfmadd(c2, p7, vp5, vl);
-    a = __riscv_vfmadd(a, c2, vp3, vl);
-    a = __riscv_vfmadd(a, c2, vp1, vl);
+    // Using vfmadd only results in about a 2% performance improvement, but it occupies 3 additional
+    // M4 registers. (Performance test on phase32f::VectorLength::1048576: time decreased
+    // from 5.952ms to 5.805ms on Muse Pi)
+    // Additionally, when registers are nearly fully utilized (though not yet exhausted), the
+    // compiler is likely to fail to optimize and may introduce slower memory access (e.g., in
+    // cv::cv_hal_rvv::fast_atan_64).
+    // Saving registers can also make this function more reusable in other contexts.
+    // Therefore, vfmadd is not used here.
+    auto a = __riscv_vfadd(__riscv_vfmul(c2, params.p7, vl), params.p5, vl);
+    a = __riscv_vfadd(__riscv_vfmul(c2, a, vl), params.p3, vl);
+    a = __riscv_vfadd(__riscv_vfmul(c2, a, vl), params.p1, vl);
     a = __riscv_vfmul(a, c, vl);
 
-    const auto mask = __riscv_vmflt(ax, ay, vl);
-    a = __riscv_vfrsub_mu(mask, a, a, angle_90_deg, vl);
-
-    a = __riscv_vfrsub_mu(__riscv_vmflt(vx, 0.F, vl), a, a, angle_90_deg * 2,
-                          vl);
-    a = __riscv_vfrsub_mu(__riscv_vmflt(vy, 0.F, vl), a, a, angle_90_deg * 4,
-                          vl);
+    a = __riscv_vfrsub_mu(__riscv_vmflt(ax, ay, vl), a, a, params.angle_90, vl);
+    a = __riscv_vfrsub_mu(__riscv_vmflt(vx, 0.F, vl), a, a, params.angle_90 * 2, vl);
+    a = __riscv_vfrsub_mu(__riscv_vmflt(vy, 0.F, vl), a, a, params.angle_90 * 4, vl);
 
     return a;
 }
 
-} // namespace detail
-
-inline int fast_atan_32(const float *y, const float *x, float *dst, size_t n,
-                        bool angle_in_deg) {
-    const float scale = angle_in_deg ? 1.f : CV_PI / 180.f;
-    const float p1 = detail::atan2_p1 * scale;
-    const float p3 = detail::atan2_p3 * scale;
-    const float p5 = detail::atan2_p5 * scale;
-    const float p7 = detail::atan2_p7 * scale;
-    const float angle_90_deg = 90.F * scale;
+} // namespace detail
 
-    static size_t vlmax = __riscv_vsetvlmax_e32m4();
-    auto vp1 = __riscv_vfmv_v_f_f32m4(p1, vlmax);
-    auto vp3 = __riscv_vfmv_v_f_f32m4(p3, vlmax);
-    auto vp5 = __riscv_vfmv_v_f_f32m4(p5, vlmax);
+inline int fast_atan_32(const float* y, const float* x, float* dst, size_t n, bool angle_in_deg)
+{
+    auto atan_params = angle_in_deg ? detail::atan_params_deg : detail::atan_params_rad;
 
-    for (size_t vl{}; n > 0; n -= vl) {
+    for (size_t vl = 0; n > 0; n -= vl)
+    {
         vl = __riscv_vsetvl_e32m4(n);
 
         auto vy = __riscv_vle32_v_f32m4(y, vl);
         auto vx = __riscv_vle32_v_f32m4(x, vl);
 
-        auto a =
-            detail::rvv_atan_f32(vy, vx, vl, p7, vp5, vp3, vp1, angle_90_deg);
+        auto a = detail::rvv_atan(vy, vx, vl, atan_params);
 
         __riscv_vse32(dst, a, vl);
 
@@ -85,37 +94,22 @@ inline int fast_atan_32(const float *y, const float *x, float *dst, size_t n,
     return CV_HAL_ERROR_OK;
 }
 
-inline int fast_atan_64(const double *y, const double *x, double *dst, size_t n,
-                        bool angle_in_deg) {
+inline int fast_atan_64(const double* y, const double* x, double* dst, size_t n, bool angle_in_deg)
+{
     // this also uses float32 version, ref: mathfuncs_core.simd.hpp
 
-    const float scale = angle_in_deg ? 1.f : CV_PI / 180.f;
-    const float p1 = detail::atan2_p1 * scale;
-    const float p3 = detail::atan2_p3 * scale;
-    const float p5 = detail::atan2_p5 * scale;
-    const float p7 = detail::atan2_p7 * scale;
-    const float angle_90_deg = 90.F * scale;
+    auto atan_params = angle_in_deg ? detail::atan_params_deg : detail::atan_params_rad;
 
-    static size_t vlmax = __riscv_vsetvlmax_e32m4();
-    auto vp1 = __riscv_vfmv_v_f_f32m4(p1, vlmax);
-    auto vp3 = __riscv_vfmv_v_f_f32m4(p3, vlmax);
-    auto vp5 = __riscv_vfmv_v_f_f32m4(p5, vlmax);
-
-    for (size_t vl{}; n > 0; n -= vl) {
+    for (size_t vl = 0; n > 0; n -= vl)
+    {
         vl = __riscv_vsetvl_e64m8(n);
 
-        auto wy = __riscv_vle64_v_f64m8(y, vl);
-        auto wx = __riscv_vle64_v_f64m8(x, vl);
-
-        auto vy = __riscv_vfncvt_f_f_w_f32m4(wy, vl);
-        auto vx = __riscv_vfncvt_f_f_w_f32m4(wx, vl);
-
-        auto a =
-            detail::rvv_atan_f32(vy, vx, vl, p7, vp5, vp3, vp1, angle_90_deg);
+        auto vy = __riscv_vfncvt_f(__riscv_vle64_v_f64m8(y, vl), vl);
+        auto vx = __riscv_vfncvt_f(__riscv_vle64_v_f64m8(x, vl), vl);
 
-        auto wa = __riscv_vfwcvt_f_f_v_f64m8(a, vl);
+        auto a = detail::rvv_atan(vy, vx, vl, atan_params);
 
-        __riscv_vse64(dst, wa, vl);
+        __riscv_vse64(dst, __riscv_vfwcvt_f(a, vl), vl);
 
         x += vl;
         y += vl;
@@ -125,4 +119,4 @@ inline int fast_atan_64(const double *y, const double *x, double *dst, size_t n,
     return CV_HAL_ERROR_OK;
 }
 
-} // namespace cv::cv_hal_rvv
+}} // namespace cv::cv_hal_rvv
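
As a reading aid for the `rvv_atan` change in this file, here is a minimal scalar sketch of the same polynomial and quadrant fix-ups, assuming plain C++ with no RVV intrinsics. The names `atan_scalar`, `kParamsRad`, `kParamsDeg`, and `kPi` are hypothetical and do not appear in the patch; the coefficients and the `FLT_EPSILON` guard are copied from the diff.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical scalar mirror of the AtanParams layout shown in the diff.
struct AtanParams {
    float p1, p3, p5, p7, angle_90;
};

constexpr float kPi = 3.14159265358979323846f;

// Coefficients copied from the patch; the radian set is the base set.
constexpr AtanParams kParamsRad{0.9997878412794807f, -0.3258083974640975f,
                                0.1555786518463281f, -0.04432655554792128f,
                                kPi / 2.0f};
// The degree set is the radian set scaled by 180/pi, with 90 as the right angle.
constexpr AtanParams kParamsDeg{kParamsRad.p1 * (180.0f / kPi),
                                kParamsRad.p3 * (180.0f / kPi),
                                kParamsRad.p5 * (180.0f / kPi),
                                kParamsRad.p7 * (180.0f / kPi),
                                90.0f};

// One-lane version of the vector routine: returns an angle in [0, 4 * angle_90].
inline float atan_scalar(float y, float x, const AtanParams& p) {
    assert(p.angle_90 > 0.0f);
    const float ax = std::fabs(x);
    const float ay = std::fabs(y);
    // Ratio in [0, 1]; the epsilon keeps the division defined at (0, 0).
    const float c = std::fmin(ax, ay) / (std::fmax(ax, ay) + FLT_EPSILON);
    const float c2 = c * c;

    // Horner evaluation with separate multiply and add, as in the patch.
    float a = c2 * p.p7 + p.p5;
    a = c2 * a + p.p3;
    a = c2 * a + p.p1;
    a *= c;

    // Quadrant fix-ups, one per masked reverse-subtract in the vector code.
    if (ax < ay)  a = p.angle_90 - a;
    if (x < 0.0f) a = p.angle_90 * 2 - a;
    if (y < 0.0f) a = p.angle_90 * 4 - a;
    return a;
}
```

With these definitions, `atan_scalar(1.f, 1.f, kParamsDeg)` should land within a few hundredths of a degree of 45; the gap reflects the approximation error of the degree-7 polynomial rather than a rounding bug.
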
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+
+// Copyright (C) 2025, Institute of Software, Chinese Academy of Sciences.
+
+#ifndef OPENCV_HAL_RVV_CART_TO_POLAR_HPP_INCLUDED
+#define OPENCV_HAL_RVV_CART_TO_POLAR_HPP_INCLUDED
+
+#include <riscv_vector.h>
+
+#include "hal_rvv_1p0/atan.hpp"
+#include "hal_rvv_1p0/sqrt.hpp"
+#include "hal_rvv_1p0/types.hpp"
+
+namespace cv { namespace cv_hal_rvv {
+
+#undef cv_hal_cartToPolar32f
+#define cv_hal_cartToPolar32f cv::cv_hal_rvv::cartToPolar<cv::cv_hal_rvv::RVV_F32M4>
+#undef cv_hal_cartToPolar64f
+#define cv_hal_cartToPolar64f cv::cv_hal_rvv::cartToPolar<cv::cv_hal_rvv::RVV_F64M8>
+
+template <typename RVV_T, typename T = typename RVV_T::ElemType>
+inline int cartToPolar(const T* x, const T* y, T* mag, T* angle, int len, bool angleInDegrees)
+{
+    using CalType = RVV_SameLen<float, RVV_T>;
+    auto atan_params = angleInDegrees ? detail::atan_params_deg : detail::atan_params_rad;
+    size_t vl;
+    for (; len > 0; len -= (int)vl, x += vl, y += vl, mag += vl, angle += vl)
+    {
+        vl = RVV_T::setvl(len);
+
+        auto vx = CalType::cast(RVV_T::vload(x, vl), vl);
+        auto vy = CalType::cast(RVV_T::vload(y, vl), vl);
+
+        auto vmag = detail::sqrt<2>(__riscv_vfmadd(vx, vx, __riscv_vfmul(vy, vy, vl), vl), vl);
+        RVV_T::vstore(mag, RVV_T::cast(vmag, vl), vl);
+
+        auto vangle = detail::rvv_atan(vy, vx, vl, atan_params);
+        RVV_T::vstore(angle, RVV_T::cast(vangle, vl), vl);
+    }
+
+    return CV_HAL_ERROR_OK;
+}
+
+}} // namespace cv::cv_hal_rvv
+
+#endif // OPENCV_HAL_RVV_CART_TO_POLAR_HPP_INCLUDED
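
The loop structure of `cartToPolar` in this file can be sketched in portable C++ as follows. This is only an illustration under stated assumptions: `cart_to_polar_sketch`, `set_vl`, and the `vlmax` parameter are hypothetical names, the inner `for` loop stands in for one vector operation over `vl` lanes, and `std::atan2` and `std::sqrt` are swapped in for `detail::rvv_atan` and `detail::sqrt<2>`. What it keeps from the patch is the strip-mined outer loop and the choice to compute in `float` even when `T` is `double`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for RVV_T::setvl: number of elements handled this pass.
inline std::size_t set_vl(std::size_t remaining, std::size_t vlmax) {
    return std::min(remaining, vlmax);
}

template <typename T>
int cart_to_polar_sketch(const T* x, const T* y, T* mag, T* angle, int len,
                         bool angle_in_degrees, std::size_t vlmax = 8) {
    assert(vlmax > 0);
    const float pi = 3.14159265358979323846f;
    const float scale = angle_in_degrees ? 180.0f / pi : 1.0f;
    const float full_turn = angle_in_degrees ? 360.0f : 2.0f * pi;

    std::size_t vl = 0;
    // Strip-mined loop: advance all four pointers by vl after each pass.
    for (; len > 0; len -= (int)vl, x += vl, y += vl, mag += vl, angle += vl) {
        vl = set_vl((std::size_t)len, vlmax);
        for (std::size_t i = 0; i < vl; ++i) {  // models one vl-lane vector op
            // Narrow to float first, mirroring CalType::cast in the patch.
            const float fx = static_cast<float>(x[i]);
            const float fy = static_cast<float>(y[i]);
            mag[i] = static_cast<T>(std::sqrt(fx * fx + fy * fy));
            // std::atan2 replaces the polynomial; shift into [0, full_turn).
            float a = std::atan2(fy, fx) * scale;
            if (a < 0.0f) a += full_turn;
            angle[i] = static_cast<T>(a);
        }
    }
    return 0;  // stands in for CV_HAL_ERROR_OK
}
```

Because the last pass simply gets a smaller `vl`, no separate scalar tail loop is needed, which appears to be the reason the patched function has a single loop body.
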

‎modules/core/perf/perf_math.cpp

Lines changed: 22 additions & 0 deletions
@@ -57,6 +57,28 @@ PERF_TEST_P(MagnitudeFixture, Magnitude,
     SANITY_CHECK_NOTHING();
 }
 
+///////////// Cart to Polar /////////////
+
+typedef Size_MatType CartToPolarFixture;
+
+PERF_TEST_P(CartToPolarFixture, CartToPolar,
+    testing::Combine(testing::Values(TYPICAL_MAT_SIZES), testing::Values(CV_32F, CV_64F)))
+{
+    cv::Size size = std::get<0>(GetParam());
+    int type = std::get<1>(GetParam());
+
+    cv::Mat x(size, type);
+    cv::Mat y(size, type);
+    cv::Mat magnitude(size, type);
+    cv::Mat angle(size, type);
+
+    declare.in(x, y, WARMUP_RNG).out(magnitude, angle);
+
+    TEST_CYCLE() cv::cartToPolar(x, y, magnitude, angle);
+
+    SANITY_CHECK_NOTHING();
+}
+
 // generates random vectors, performs Gram-Schmidt orthogonalization on them
 Mat randomOrtho(int rows, int ftype, RNG& rng)
 {

‎modules/core/src/mathfuncs_core.simd.hpp

Lines changed: 30 additions & 48 deletions
@@ -73,48 +73,30 @@ static inline float atan_f32(float y, float x)
 }
 #endif
 
-#if CV_SIMD
+#if (CV_SIMD || CV_SIMD_SCALABLE)
 
-struct v_atan_f32
+v_float32 v_atan_f32(const v_float32& y, const v_float32& x)
 {
-    explicit v_atan_f32(const float& scale)
-    {
-        eps = vx_setall_f32((float)DBL_EPSILON);
-        z = vx_setzero_f32();
-        p7 = vx_setall_f32(atan2_p7);
-        p5 = vx_setall_f32(atan2_p5);
-        p3 = vx_setall_f32(atan2_p3);
-        p1 = vx_setall_f32(atan2_p1);
-        val90 = vx_setall_f32(90.f);
-        val180 = vx_setall_f32(180.f);
-        val360 = vx_setall_f32(360.f);
-        s = vx_setall_f32(scale);
-    }
-
-    v_float32 compute(const v_float32& y, const v_float32& x)
-    {
-        v_float32 ax = v_abs(x);
-        v_float32 ay = v_abs(y);
-        v_float32 c = v_div(v_min(ax, ay), v_add(v_max(ax, ay), this->eps));
-        v_float32 cc = v_mul(c, c);
-        v_float32 a = v_mul(v_fma(v_fma(v_fma(cc, this->p7, this->p5), cc, this->p3), cc, this->p1), c);
-        a = v_select(v_ge(ax, ay), a, v_sub(this->val90, a));
-        a = v_select(v_lt(x, this->z), v_sub(this->val180, a), a);
-        a = v_select(v_lt(y, this->z), v_sub(this->val360, a), a);
-        return v_mul(a, this->s);
-    }
-
-    v_float32 eps;
-    v_float32 z;
-    v_float32 p7;
-    v_float32 p5;
-    v_float32 p3;
-    v_float32 p1;
-    v_float32 val90;
-    v_float32 val180;
-    v_float32 val360;
-    v_float32 s;
-};
+    v_float32 eps = vx_setall_f32((float)DBL_EPSILON);
+    v_float32 z = vx_setzero_f32();
+    v_float32 p7 = vx_setall_f32(atan2_p7);
+    v_float32 p5 = vx_setall_f32(atan2_p5);
+    v_float32 p3 = vx_setall_f32(atan2_p3);
+    v_float32 p1 = vx_setall_f32(atan2_p1);
+    v_float32 val90 = vx_setall_f32(90.f);
+    v_float32 val180 = vx_setall_f32(180.f);
+    v_float32 val360 = vx_setall_f32(360.f);
+
+    v_float32 ax = v_abs(x);
+    v_float32 ay = v_abs(y);
+    v_float32 c = v_div(v_min(ax, ay), v_add(v_max(ax, ay), eps));
+    v_float32 cc = v_mul(c, c);
+    v_float32 a = v_mul(v_fma(v_fma(v_fma(cc, p7, p5), cc, p3), cc, p1), c);
+    a = v_select(v_ge(ax, ay), a, v_sub(val90, a));
+    a = v_select(v_lt(x, z), v_sub(val180, a), a);
+    a = v_select(v_lt(y, z), v_sub(val360, a), a);
+    return a;
+}
 
 #endif
 
@@ -124,9 +106,9 @@ static void cartToPolar32f_(const float *X, const float *Y, float *mag, float *a
 {
     float scale = angleInDegrees ? 1.f : (float)(CV_PI/180);
     int i = 0;
-#if CV_SIMD
+#if (CV_SIMD || CV_SIMD_SCALABLE)
     const int VECSZ = VTraits<v_float32>::vlanes();
-    v_atan_f32 v(scale);
+    v_float32 s = vx_setall_f32(scale);
 
     for( ; i < len; i += VECSZ*2 )
     {
@@ -148,8 +130,8 @@ static void cartToPolar32f_(const float *X, const float *Y, float *mag, float *a
         v_float32 m0 = v_sqrt(v_muladd(x0, x0, v_mul(y0, y0)));
         v_float32 m1 = v_sqrt(v_muladd(x1, x1, v_mul(y1, y1)));
 
-        v_float32 r0 = v.compute(y0, x0);
-        v_float32 r1 = v.compute(y1, x1);
+        v_float32 r0 = v_mul(v_atan_f32(y0, x0), s);
+        v_float32 r1 = v_mul(v_atan_f32(y1, x1), s);
 
         v_store(mag + i, m0);
         v_store(mag + i + VECSZ, m1);
@@ -200,9 +182,9 @@ static void fastAtan32f_(const float *Y, const float *X, float *angle, int len,
 {
     float scale = angleInDegrees ? 1.f : (float)(CV_PI/180);
     int i = 0;
-#if CV_SIMD
+#if (CV_SIMD || CV_SIMD_SCALABLE)
     const int VECSZ = VTraits<v_float32>::vlanes();
-    v_atan_f32 v(scale);
+    v_float32 s = vx_setall_f32(scale);
 
     for( ; i < len; i += VECSZ*2 )
     {
@@ -221,8 +203,8 @@ static void fastAtan32f_(const float *Y, const float *X, float *angle, int len,
         v_float32 y1 = vx_load(Y + i + VECSZ);
         v_float32 x1 = vx_load(X + i + VECSZ);
 
-        v_float32 r0 = v.compute(y0, x0);
-        v_float32 r1 = v.compute(y1, x1);
+        v_float32 r0 = v_mul(v_atan_f32(y0, x0), s);
+        v_float32 r1 = v_mul(v_atan_f32(y1, x1), s);
 
         v_store(angle + i, r0);
         v_store(angle + i + VECSZ, r1);
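
The universal-intrinsics path in this file computes the angle in degrees and then multiplies by `scale`, whereas the RVV HAL path in `atan.hpp` selects a pre-scaled parameter set (`atan_params_deg` or `atan_params_rad`). The short derivation below suggests why the two conventions should agree up to floating-point rounding. The symbols $s$, $P$, and $A_{90}$ are notation introduced here for illustration and are not names from the source; $\varepsilon$ is the small guard constant added to the denominator.

```latex
\begin{align*}
c &= \frac{\min(|x|,|y|)}{\max(|x|,|y|) + \varepsilon},
\qquad
P(c) = c\,\bigl(p_1 + p_3 c^2 + p_5 c^4 + p_7 c^6\bigr), \\[4pt]
a_1 &= \begin{cases} A_{90} - P(c) & |x| < |y| \\ P(c) & \text{otherwise} \end{cases}
\qquad
a_2 = \begin{cases} 2A_{90} - a_1 & x < 0 \\ a_1 & \text{otherwise} \end{cases}
\qquad
a_3 = \begin{cases} 4A_{90} - a_2 & y < 0 \\ a_2 & \text{otherwise} \end{cases} \\[4pt]
\text{With } s &= \tfrac{\pi}{180}: \quad
s\,P^{\deg}(c) = c\,\bigl(s p_1^{\deg} + s p_3^{\deg} c^2 + s p_5^{\deg} c^4 + s p_7^{\deg} c^6\bigr) = P^{\mathrm{rad}}(c), \\
s\,\bigl(k A_{90}^{\deg} - a\bigr) &= k\,\bigl(s A_{90}^{\deg}\bigr) - s\,a
 = k A_{90}^{\mathrm{rad}} - s\,a, \qquad k \in \{1, 2, 4\}, \\[4pt]
\Rightarrow\quad s\,a_3^{\deg} &= a_3^{\mathrm{rad}}.
\end{align*}
```

The ratio $c$ and the three branch conditions depend only on $x$ and $y$, so scaling the coefficients and the right-angle constant before the computation is equivalent, in exact arithmetic, to scaling the final degree result.
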
