Describe the workflow you want to enable
Some kNN regression workflows need to combine the neighbors' target values with something other than the arithmetic mean at the end of prediction. In some cases a geometric mean or an arithmetic-geometric mean is needed instead of the regular average.
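To show how much the choice matters, this stdlib-only sketch aggregates one set of neighbor targets in three ways. The target values are made up for illustration. `agm` is a hypothetical helper written for this example, not part of any library. `statistics.fmean` and `statistics.geometric_mean` need Python 3.8+, and the geometric mean needs positive inputs.

```python
import math
from statistics import fmean, geometric_mean


def agm(a, g, rel_tol=1e-12):
    """Arithmetic-geometric mean of two positive numbers (hypothetical helper)."""
    # Iterate until the arithmetic and geometric sequences agree to rel_tol
    while not math.isclose(a, g, rel_tol=rel_tol):
        a, g = (a + g) / 2.0, math.sqrt(a * g)
    return a


# Illustrative targets of the k = 3 nearest neighbors of one query point
neighbor_targets = [1.0, 4.0, 16.0]

arith = fmean(neighbor_targets)           # what a plain mean would predict
geo = geometric_mean(neighbor_targets)    # cube root of 1 * 4 * 16
agm_value = agm(arith, geo)               # AGM of the two, as in the proposal below

print(f"mean={arith:.4f} geometric={geo:.4f} agm={agm_value:.4f}")
```

The three aggregates produce noticeably different predictions from identical neighbors. That is why a single hard-coded mean is limiting.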
Describe your proposed solution
Add a parameter to the KNeighborsRegressor class that controls how the neighbors' targets are averaged at the end of predict(), for example:
import math
from statistics import mean

from sklearn.neighbors import KNeighborsRegressor

# Proposed: a new `average` parameter whose default string value 'mean' keeps today's behavior
KNeighborsRegressor(n_neighbors=5, metric='manhattan', weights='uniform', average='mean')

# The proposed `average` parameter should also accept a callable, e.g. a geometric mean (positive targets only)
def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))

KNeighborsRegressor(n_neighbors=5, metric='manhattan', weights='uniform', average=geometric_mean)

# ...or the arithmetic-geometric mean of the arithmetic and geometric means
def arithmetic_geometric_mean(values, rel_tol=1e-10):
    a0 = mean(values)
    g0 = geometric_mean(values)
    an, gn = (a0 + g0) / 2.0, math.sqrt(a0 * g0)
    # Iterate until the two sequences agree to a relative tolerance
    # (a fixed absolute threshold such as 1e-10 may never be reached for large targets)
    while not math.isclose(an, gn, rel_tol=rel_tol):
        an, gn = (an + gn) / 2.0, math.sqrt(an * gn)
    return an

KNeighborsRegressor(n_neighbors=5, metric='manhattan', weights='uniform', average=arithmetic_geometric_mean)
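With `weights='uniform'`, the final step of prediction reduces each query's neighbor targets to their mean. The proposed parameter would therefore only need to replace that one reduction. The following is a hypothetical, stdlib-only sketch of how such an option could be resolved. `resolve_average` and `aggregate_neighbors` are invented names, not scikit-learn internals. The sketch assumes the neighbor targets have already been gathered per query.

```python
from statistics import fmean, geometric_mean

# Hypothetical registry of string options for the proposed `average` parameter;
# only 'mean' is named in the proposal, everything else is passed as a callable
_AVERAGES = {"mean": fmean}


def resolve_average(average="mean"):
    """Map the proposed `average` option to a callable (hypothetical helper)."""
    if callable(average):
        return average
    try:
        return _AVERAGES[average]
    except KeyError:
        raise ValueError(
            f"average must be one of {sorted(_AVERAGES)} or a callable, got {average!r}"
        ) from None


def aggregate_neighbors(neighbor_targets, average="mean"):
    """Reduce each query's list of neighbor targets to a single prediction."""
    func = resolve_average(average)
    return [func(row) for row in neighbor_targets]


# Two queries, each with the targets of its k = 3 nearest neighbors
rows = [[1.0, 4.0, 16.0], [2.0, 2.0, 2.0]]
print(aggregate_neighbors(rows))                          # default: arithmetic mean
print(aggregate_neighbors(rows, average=geometric_mean))  # callable, as proposed
print(aggregate_neighbors(rows, average=max))             # any callable is accepted
```

Callables pass through unchanged, which mirrors how the proposal treats `average`. This sketch leaves open how the option should combine with `weights='distance'`.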
Describe alternatives you've considered, if relevant
Not using scikit-learn's KNeighborsRegressor at all and writing my own regressor instead:
import math
from statistics import mean

#
# Here avg_func is the callable used to compute the average of the k nearest targets.
# statistics.mean and math.dist stand in for the original my_mean / euclidean_distance helpers.
#
def kNN_regressor(q, R, truth, k, avg_func=mean, distance_func=math.dist, **kwargs):
    # Pass an optional Minkowski exponent through to the distance function
    extra = (kwargs['minkowski_param'],) if 'minkowski_param' in kwargs else ()
    # Seed the candidate set with the first k points of R
    idx = list(range(k))
    dist = [distance_func(q, R[i], *extra) for i in range(k)]
    max_idx = dist.index(max(dist))
    # Scan the remaining points, replacing the farthest candidate whenever a closer point appears
    for i in range(k, len(R)):
        d = distance_func(q, R[i], *extra)
        if d < dist[max_idx]:
            dist[max_idx] = d
            idx[max_idx] = i
            max_idx = dist.index(max(dist))
    # Combine the neighbors' target values with the chosen average
    return avg_func([truth[i] for i in idx])
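After each replacement, the search loop above rescans the candidate list to find the current maximum distance. A more compact sketch of the same brute-force idea uses `heapq.nsmallest` from the standard library. `knn_regress` is a hypothetical name, and `math.dist` (Python 3.8+) stands in for a Euclidean distance function.

```python
import heapq
import math
from statistics import fmean


def knn_regress(q, R, truth, k, avg_func=fmean, distance_func=math.dist):
    """Brute-force kNN regression with a pluggable average (hypothetical sketch)."""
    # Pick the indices of the k points of R closest to the query q
    nearest = heapq.nsmallest(k, range(len(R)), key=lambda i: distance_func(q, R[i]))
    # Combine their target values with whichever average the caller supplies
    return avg_func([truth[i] for i in nearest])


# Illustrative 1-D training data: points on a line and their targets
R = [(0.0,), (1.0,), (2.0,), (10.0,)]
truth = [1.0, 4.0, 16.0, 100.0]
print(knn_regress((1.0,), R, truth, k=3))
print(knn_regress((1.0,), R, truth, k=3, avg_func=max))
```

`heapq.nsmallest` keeps the candidate selection to a single call. The averaging step remains an ordinary argument, just as `avg_func` is in the function above.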