Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

FIX : fix SVC pickle with callable kernel #789

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 22, 2012

Conversation

agramfort
Copy link
Member

for #787

@bwhite tell me if it fixes your problem

@amueller
Copy link
Member

Looks good :)

@bwhite
Copy link

bwhite commented Apr 20, 2012

I would put the self.kernel = kernel down further with the rest of the instance variables.

@bwhite
Copy link

bwhite commented Apr 20, 2012

Also using callable(self.kernel) is preferable to checking for call.

@agramfort
Copy link
Member Author

which line? can you comment on the line?

@agramfort
Copy link
Member Author

I agree with the callable but it's for compat with old python. I found it elsewhere in the code base

@bwhite
Copy link

bwhite commented Apr 20, 2012

I tried your code, it works. One issue with this is that previously pickled estimators will break, so we may need to add a fallback that we check both self.kernel and self.kernel_function. In the future this could be deprecated but it would be a bit harsh to break them now.

@bwhite
Copy link

bwhite commented Apr 20, 2012

https://github.com/agramfort/scikit-learn/blob/43571bc254f53164e4e7d5e3293615516be11def/sklearn/svm/base.py#L83

Move self.kernel = kernel between "impl" and "degree" below so that it follows the style of the other variables.

@bwhite
Copy link

bwhite commented Apr 20, 2012

There is a problem
File "libsvm.pyx", line 392, in sklearn.svm.libsvm.decision_function (sklearn/svm/libsvm.c:4608)
TypeError: Argument 'kernel' has incorrect type (expected str, got builtin_function_or_method)

@bwhite
Copy link

bwhite commented Apr 20, 2012

Before all of the libsvm commands just make a local variable called kernel and have that be "precomputed" if there is a user defined function. Then call predict/decision_function/etc with that.

@mblondel
Copy link
Member

In general, if you want your object to pickle with a callable kernel, you need to do this:

class MyKernel(object):
    def __call__(self, X, Y):
        return np.dot(X, Y.T)

@agramfort
Copy link
Member Author

@bwhite fixed ... I think

@bwhite regarding your issue of previously pickled estimators, we cannot support backward compat on pickled objects. I suggest you use the hack you suggested in the first place on your side and the problem will be fixed for next time.

@agramfort
Copy link
Member Author

I still need to fix the sparse case ....

@mblondel
Copy link
Member

This pull-request made me realize that callable kernels are not terribly useful in scikit-learn since they are handled internally like a precomputed kernel. Normally, you want to store only the support vectors in the model but the precomputed kernel approach forces us to store the entire dataset...

@@ -661,6 +661,15 @@ def test_linearsvc_verbose():
os.dup2(stdout, 1) # restore original stdout


def test_svc_pickle_with_callable_kernel():
a = svm.SVC(kernel=lambda x, y: np.dot(x, y.T), probability=True)
b = base.clone(a)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, cloning is currently not implemented with pickle. Would be better rename this test to avoid any confusion. A test that does real pickling would be nice too but I am afraid that lambdas are not picklable hence the callable kernel would have to be defined in some python class wrapper instead.

@agramfort
Copy link
Member Author

works now with sparse data

also I see C=None by default in SVC which means that default C produces a crash.

Shall I put it to 1 by default?

Finally we've experimented on the scale_C last week with @jaquesgrobler and @GaelVaroquaux . We'll summarize on the ML next week.

@agramfort
Copy link
Member Author

This pull-request made me realize that callable kernels are not terribly useful in scikit-learn since they are handled internally like a precomputed kernel. Normally, you want to store only the support vectors in the model but the precomputed kernel approach forces us to store the entire dataset...

I agree but at least the callable argument allows to use grid search
and cv objects as we pass a real X. What I've done in the past is to
cache the kernel function to avoid some recomputations

@GaelVaroquaux
Copy link
Member

This looks good. I am rebasing and merging. The comments still open are absolutely relevant (including the fact that the kernel architecture in the scikit is a hack) but call for further work in a separate pull request.

Thanks @bwhite for raising the problem, and @agramfort for fixing it.

@GaelVaroquaux GaelVaroquaux merged commit 89d73c7 into scikit-learn:master Apr 22, 2012
gbolmier added a commit to gbolmier/scikit-learn that referenced this pull request May 29, 2020
# More detailed explanatory text, if necessary. Wrap it to about 72
# characters or so. In some contexts, the first line is treated as the
# subject of the commit and the rest of the text as the body. The
# blank line separating the summary from the body is critical (unless
# you omit the body entirely); various tools like `log`, `shortlog`
# and `rebase` can get confused if you run the two together.

# Explain the problem that this commit is solving. Focus on why you
# are making this change as opposed to how (the code explains that).
# Are there side effects or other unintuitive consequences of this
# change? Here's the place to explain them.

# Further paragraphs come after blank lines.

#  - Bullet points are okay, too

#  - Typically a hyphen or asterisk is used for the bullet, preceded
#    by a single space, with blank lines in between, but conventions
#    vary here

# If you use an issue tracker, put references to them at the bottom,
# like this:

# Resolves: scikit-learn#123
# See also: scikit-learn#456, scikit-learn#789
gbolmier added a commit to gbolmier/scikit-learn that referenced this pull request May 30, 2020
# More detailed explanatory text, if necessary. Wrap it to about 72
# characters or so. In some contexts, the first line is treated as the
# subject of the commit and the rest of the text as the body. The
# blank line separating the summary from the body is critical (unless
# you omit the body entirely); various tools like `log`, `shortlog`
# and `rebase` can get confused if you run the two together.

# Explain the problem that this commit is solving. Focus on why you
# are making this change as opposed to how (the code explains that).
# Are there side effects or other unintuitive consequences of this
# change? Here's the place to explain them.

# Further paragraphs come after blank lines.

#  - Bullet points are okay, too

#  - Typically a hyphen or asterisk is used for the bullet, preceded
#    by a single space, with blank lines in between, but conventions
#    vary here

# If you use an issue tracker, put references to them at the bottom,
# like this:

# Resolves: scikit-learn#123
# See also: scikit-learn#456, scikit-learn#789
gbolmier added a commit to gbolmier/scikit-learn that referenced this pull request Aug 9, 2020
# More detailed explanatory text, if necessary. Wrap it to about 72
# characters or so. In some contexts, the first line is treated as the
# subject of the commit and the rest of the text as the body. The
# blank line separating the summary from the body is critical (unless
# you omit the body entirely); various tools like `log`, `shortlog`
# and `rebase` can get confused if you run the two together.

# Explain the problem that this commit is solving. Focus on why you
# are making this change as opposed to how (the code explains that).
# Are there side effects or other unintuitive consequences of this
# change? Here's the place to explain them.

# Further paragraphs come after blank lines.

#  - Bullet points are okay, too

#  - Typically a hyphen or asterisk is used for the bullet, preceded
#    by a single space, with blank lines in between, but conventions
#    vary here

# If you use an issue tracker, put references to them at the bottom,
# like this:

# Resolves: scikit-learn#123
# See also: scikit-learn#456, scikit-learn#789
gbolmier added a commit to gbolmier/scikit-learn that referenced this pull request Aug 9, 2020
# More detailed explanatory text, if necessary. Wrap it to about 72
# characters or so. In some contexts, the first line is treated as the
# subject of the commit and the rest of the text as the body. The
# blank line separating the summary from the body is critical (unless
# you omit the body entirely); various tools like `log`, `shortlog`
# and `rebase` can get confused if you run the two together.

# Explain the problem that this commit is solving. Focus on why you
# are making this change as opposed to how (the code explains that).
# Are there side effects or other unintuitive consequences of this
# change? Here's the place to explain them.

# Further paragraphs come after blank lines.

#  - Bullet points are okay, too

#  - Typically a hyphen or asterisk is used for the bullet, preceded
#    by a single space, with blank lines in between, but conventions
#    vary here

# If you use an issue tracker, put references to them at the bottom,
# like this:

# Resolves: scikit-learn#123
# See also: scikit-learn#456, scikit-learn#789
gbolmier added a commit to gbolmier/scikit-learn that referenced this pull request Aug 9, 2020
# More detailed explanatory text, if necessary. Wrap it to about 72
# characters or so. In some contexts, the first line is treated as the
# subject of the commit and the rest of the text as the body. The
# blank line separating the summary from the body is critical (unless
# you omit the body entirely); various tools like `log`, `shortlog`
# and `rebase` can get confused if you run the two together.

# Explain the problem that this commit is solving. Focus on why you
# are making this change as opposed to how (the code explains that).
# Are there side effects or other unintuitive consequences of this
# change? Here's the place to explain them.

# Further paragraphs come after blank lines.

#  - Bullet points are okay, too

#  - Typically a hyphen or asterisk is used for the bullet, preceded
#    by a single space, with blank lines in between, but conventions
#    vary here

# If you use an issue tracker, put references to them at the bottom,
# like this:

# Resolves: scikit-learn#123
# See also: scikit-learn#456, scikit-learn#789
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.