Description
Describe the workflow you want to enable
I see that KNeighborsClassifier breaks voting ties by taking the mode of the tied classes. What about using the sum of the distances as the tiebreaker instead of the mode?
I feel that picking the tied class with the smallest sum of distances is a better-educated guess, and it requires only minimal changes to the code.
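For illustration, using only the standard library (not scipy): a 2-2 vote among four neighbors has no unique mode, so some deterministic rule has to choose between the tied classes, and today that rule ignores the distances entirely.

```python
from statistics import multimode

# Four nearest neighbors vote 2-2 between class 0 and class 1
votes = [0, 0, 1, 1]

# multimode returns every class tied for the highest count,
# so the vote alone cannot decide between class 0 and class 1
print(multimode(votes))  # [0, 1]
```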
Describe your proposed solution
Create two dictionaries: one for the vote count per Class and one for the sum of distances per Class.
If there is no tie return the Class value that won the vote (as normal).
If there is a tie, return the Class that has the minimum sum of distances (instead of what scipy.stats.mode yields, which I believe is always the smallest Class value among the tied Classes).
def predict(neighbors, distances, num_neighbors):
    dict_count = {}
    dict_dist = {}
    for i in range(num_neighbors):
        # Get the class from the i-th neighbor - here my Class value is at offset 13
        c = neighbors[i][13]
        # Keep a dictionary of Class vote counts
        dict_count[c] = dict_count.get(c, 0) + 1
        # Keep a dictionary of total Class distances
        dict_dist[c] = dict_dist.get(c, 0.0) + distances[i]
    # Find the Class (or Classes) with the largest vote count
    max_count = max(dict_count.values())
    best_keys = [key for key, value in dict_count.items() if value == max_count]
    if len(best_keys) == 1:
        prediction = best_keys[0]
    else:  # There was a tie: pick the tied Class with the smallest distance sum
        prediction = min(best_keys, key=dict_dist.get)
    return prediction
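As a self-contained sketch of the same idea (the function name and the example data are illustrative, not sklearn's actual implementation):

```python
from collections import Counter

def predict_with_distance_tiebreak(classes, distances):
    """Majority vote; ties broken by the smallest sum of distances.

    classes[i] and distances[i] describe the i-th nearest neighbor.
    """
    counts = Counter(classes)
    top = max(counts.values())
    tied = [c for c, n in counts.items() if n == top]
    if len(tied) == 1:
        return tied[0]
    # Sum the distances per tied class and pick the closest overall
    dist_sum = {c: 0.0 for c in tied}
    for c, d in zip(classes, distances):
        if c in dist_sum:
            dist_sum[c] += d
    return min(tied, key=dist_sum.get)

# 2-2 tie between classes 0 and 1; class 1 has the smaller total distance
print(predict_with_distance_tiebreak([0, 1, 0, 1], [0.9, 0.2, 0.8, 0.3]))  # 1
```

With no tie the distance sums never come into play, so the change only affects the tie case.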
Describe alternatives you've considered, if relevant
Writing my own k-NN algorithm 🙁
Additional context
No response