-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
FIX : fix SVC pickle with callable kernel #789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Looks good :) |
I would put the self.kernel = kernel down further with the rest of the instance variables. |
Also using callable(self.kernel) is preferable to checking for call. |
which line? can you comment on the line? |
I agree with the callable but it's for compat with old python. I found it elsewhere in the code base |
I tried your code, it works. One issue with this is that previously pickled estimators will break, so we may need to add a fallback that we check both self.kernel and self.kernel_function. In the future this could be deprecated but it would be a bit harsh to break them now. |
Move self.kernel = kernel between "impl" and "degree" below so that it follows the style of the other variables. |
There is a problem |
Before all of the libsvm commands just make a local variable called kernel and have that be "precomputed" if there is a user defined function. Then call predict/decision_function/etc with that. |
In general, if you want your object to pickle with a callable kernel, you need to do this: class MyKernel(object):
def __call__(self, X, Y):
return np.dot(X, Y.T) |
I still need to fix the sparse case .... |
This pull-request made me realize that callable kernels are not terribly useful in scikit-learn since they are handled internally like a precomputed kernel. Normally, you want to store only the support vectors in the model but the precomputed kernel approach forces us to store the entire dataset... |
@@ -661,6 +661,15 @@ def test_linearsvc_verbose(): | ||
os.dup2(stdout, 1) # restore original stdout | ||
|
||
|
||
def test_svc_pickle_with_callable_kernel(): | ||
a = svm.SVC(kernel=lambda x, y: np.dot(x, y.T), probability=True) | ||
b = base.clone(a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, cloning is currently not implemented with pickle. Would be better rename this test to avoid any confusion. A test that does real pickling would be nice too but I am afraid that lambdas are not picklable hence the callable kernel would have to be defined in some python class wrapper instead.
works now with sparse data also I see C=None by default in SVC which means that default C produces a crash. Shall I put it to 1 by default? Finally we've experimented on the scale_C last week with @jaquesgrobler and @GaelVaroquaux . We'll summarize on the ML next week. |
I agree but at least the callable argument allows to use grid search |
This looks good. I am rebasing and merging. The comments still open are absolutely relevant (including the fact that the kernel architecture in the scikit is a hack) but call for further work in a separate pull request. Thanks @bwhite for raising the problem, and @agramfort for fixing it. |
# More detailed explanatory text, if necessary. Wrap it to about 72 # characters or so. In some contexts, the first line is treated as the # subject of the commit and the rest of the text as the body. The # blank line separating the summary from the body is critical (unless # you omit the body entirely); various tools like `log`, `shortlog` # and `rebase` can get confused if you run the two together. # Explain the problem that this commit is solving. Focus on why you # are making this change as opposed to how (the code explains that). # Are there side effects or other unintuitive consequences of this # change? Here's the place to explain them. # Further paragraphs come after blank lines. # - Bullet points are okay, too # - Typically a hyphen or asterisk is used for the bullet, preceded # by a single space, with blank lines in between, but conventions # vary here # If you use an issue tracker, put references to them at the bottom, # like this: # Resolves: scikit-learn#123 # See also: scikit-learn#456, scikit-learn#789
# More detailed explanatory text, if necessary. Wrap it to about 72 # characters or so. In some contexts, the first line is treated as the # subject of the commit and the rest of the text as the body. The # blank line separating the summary from the body is critical (unless # you omit the body entirely); various tools like `log`, `shortlog` # and `rebase` can get confused if you run the two together. # Explain the problem that this commit is solving. Focus on why you # are making this change as opposed to how (the code explains that). # Are there side effects or other unintuitive consequences of this # change? Here's the place to explain them. # Further paragraphs come after blank lines. # - Bullet points are okay, too # - Typically a hyphen or asterisk is used for the bullet, preceded # by a single space, with blank lines in between, but conventions # vary here # If you use an issue tracker, put references to them at the bottom, # like this: # Resolves: scikit-learn#123 # See also: scikit-learn#456, scikit-learn#789
# More detailed explanatory text, if necessary. Wrap it to about 72 # characters or so. In some contexts, the first line is treated as the # subject of the commit and the rest of the text as the body. The # blank line separating the summary from the body is critical (unless # you omit the body entirely); various tools like `log`, `shortlog` # and `rebase` can get confused if you run the two together. # Explain the problem that this commit is solving. Focus on why you # are making this change as opposed to how (the code explains that). # Are there side effects or other unintuitive consequences of this # change? Here's the place to explain them. # Further paragraphs come after blank lines. # - Bullet points are okay, too # - Typically a hyphen or asterisk is used for the bullet, preceded # by a single space, with blank lines in between, but conventions # vary here # If you use an issue tracker, put references to them at the bottom, # like this: # Resolves: scikit-learn#123 # See also: scikit-learn#456, scikit-learn#789
# More detailed explanatory text, if necessary. Wrap it to about 72 # characters or so. In some contexts, the first line is treated as the # subject of the commit and the rest of the text as the body. The # blank line separating the summary from the body is critical (unless # you omit the body entirely); various tools like `log`, `shortlog` # and `rebase` can get confused if you run the two together. # Explain the problem that this commit is solving. Focus on why you # are making this change as opposed to how (the code explains that). # Are there side effects or other unintuitive consequences of this # change? Here's the place to explain them. # Further paragraphs come after blank lines. # - Bullet points are okay, too # - Typically a hyphen or asterisk is used for the bullet, preceded # by a single space, with blank lines in between, but conventions # vary here # If you use an issue tracker, put references to them at the bottom, # like this: # Resolves: scikit-learn#123 # See also: scikit-learn#456, scikit-learn#789
# More detailed explanatory text, if necessary. Wrap it to about 72 # characters or so. In some contexts, the first line is treated as the # subject of the commit and the rest of the text as the body. The # blank line separating the summary from the body is critical (unless # you omit the body entirely); various tools like `log`, `shortlog` # and `rebase` can get confused if you run the two together. # Explain the problem that this commit is solving. Focus on why you # are making this change as opposed to how (the code explains that). # Are there side effects or other unintuitive consequences of this # change? Here's the place to explain them. # Further paragraphs come after blank lines. # - Bullet points are okay, too # - Typically a hyphen or asterisk is used for the bullet, preceded # by a single space, with blank lines in between, but conventions # vary here # If you use an issue tracker, put references to them at the bottom, # like this: # Resolves: scikit-learn#123 # See also: scikit-learn#456, scikit-learn#789
for #787
@bwhite tell me if it fixes your problem