-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
WIP: Sparse coder #456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Sparse coder #456
Conversation
@@ -708,6 +708,77 @@ def transform(self, X, y=None): | ||
return code | ||
|
||
|
||
class SparseCoder(BaseDictionaryLearning): | ||
""" Sparse coding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please:
"""Sparse coding
PEP257 :)
With regards to precomputing wavelets dictionnaries, I don't not think that this is a good idea, because a Wavelet transform can be implemented much faster than a dot product. In addition, it is a orthogonal basis, thus sparse coding should preferably be done using soft thresholding. Finally, this would be fairly image or signal specific, and I don't like the idea of such application-specific code creeping in a general-purpose object. |
@@ -708,8 +708,79 @@ def transform(self, X, y=None): | ||
return code | ||
|
||
|
||
class SparseCoder(BaseDictionaryLearning): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This object should be imported in the init of decomposition.
It should also be added in docs/modules/classes.rst
My biggest comment is that it is missing a narrative documentation. In Gael |
@@ -129,7 +129,8 @@ def sparse_encode(X, Y, gram=None, cov=None, algorithm='lasso_lars', | ||
max_iter=1000) | ||
for k in xrange(n_features): | ||
# A huge amount of time is spent in this loop. It needs to be | ||
# tight. | ||
# tight | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pep8 whitespace ;)
I think it would be good if the SparseCoder would be referenced in the DictionaryLearning "see also" section. |
|
||
if algorithm == 'lasso_lars': | ||
if alpha is None: | ||
alpha = 1. | ||
alpha /= n_features # account for scaling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that before, dict_learning and dict_learning_online would perform very different on exactly the same data and for the same alpha. This was indeed caused by sometimes forgetting to divide by n_features and this fixes it.
…into sparse-coder Conflicts: sklearn/linear_model/least_angle.py
merged, thanks |
# More detailed explanatory text, if necessary. Wrap it to about 72 # characters or so. In some contexts, the first line is treated as the # subject of the commit and the rest of the text as the body. The # blank line separating the summary from the body is critical (unless # you omit the body entirely); various tools like `log`, `shortlog` # and `rebase` can get confused if you run the two together. # Explain the problem that this commit is solving. Focus on why you # are making this change as opposed to how (the code explains that). # Are there side effects or other unintuitive consequences of this # change? Here's the place to explain them. # Further paragraphs come after blank lines. # - Bullet points are okay, too # - Typically a hyphen or asterisk is used for the bullet, preceded # by a single space, with blank lines in between, but conventions # vary here # If you use an issue tracker, put references to them at the bottom, # like this: # Resolves: scikit-learn#123 # See also: scikit-learn#456, scikit-learn#789
# More detailed explanatory text, if necessary. Wrap it to about 72 # characters or so. In some contexts, the first line is treated as the # subject of the commit and the rest of the text as the body. The # blank line separating the summary from the body is critical (unless # you omit the body entirely); various tools like `log`, `shortlog` # and `rebase` can get confused if you run the two together. # Explain the problem that this commit is solving. Focus on why you # are making this change as opposed to how (the code explains that). # Are there side effects or other unintuitive consequences of this # change? Here's the place to explain them. # Further paragraphs come after blank lines. # - Bullet points are okay, too # - Typically a hyphen or asterisk is used for the bullet, preceded # by a single space, with blank lines in between, but conventions # vary here # If you use an issue tracker, put references to them at the bottom, # like this: # Resolves: scikit-learn#123 # See also: scikit-learn#456, scikit-learn#789
# More detailed explanatory text, if necessary. Wrap it to about 72 # characters or so. In some contexts, the first line is treated as the # subject of the commit and the rest of the text as the body. The # blank line separating the summary from the body is critical (unless # you omit the body entirely); various tools like `log`, `shortlog` # and `rebase` can get confused if you run the two together. # Explain the problem that this commit is solving. Focus on why you # are making this change as opposed to how (the code explains that). # Are there side effects or other unintuitive consequences of this # change? Here's the place to explain them. # Further paragraphs come after blank lines. # - Bullet points are okay, too # - Typically a hyphen or asterisk is used for the bullet, preceded # by a single space, with blank lines in between, but conventions # vary here # If you use an issue tracker, put references to them at the bottom, # like this: # Resolves: scikit-learn#123 # See also: scikit-learn#456, scikit-learn#789
# More detailed explanatory text, if necessary. Wrap it to about 72 # characters or so. In some contexts, the first line is treated as the # subject of the commit and the rest of the text as the body. The # blank line separating the summary from the body is critical (unless # you omit the body entirely); various tools like `log`, `shortlog` # and `rebase` can get confused if you run the two together. # Explain the problem that this commit is solving. Focus on why you # are making this change as opposed to how (the code explains that). # Are there side effects or other unintuitive consequences of this # change? Here's the place to explain them. # Further paragraphs come after blank lines. # - Bullet points are okay, too # - Typically a hyphen or asterisk is used for the bullet, preceded # by a single space, with blank lines in between, but conventions # vary here # If you use an issue tracker, put references to them at the bottom, # like this: # Resolves: scikit-learn#123 # See also: scikit-learn#456, scikit-learn#789
# More detailed explanatory text, if necessary. Wrap it to about 72 # characters or so. In some contexts, the first line is treated as the # subject of the commit and the rest of the text as the body. The # blank line separating the summary from the body is critical (unless # you omit the body entirely); various tools like `log`, `shortlog` # and `rebase` can get confused if you run the two together. # Explain the problem that this commit is solving. Focus on why you # are making this change as opposed to how (the code explains that). # Are there side effects or other unintuitive consequences of this # change? Here's the place to explain them. # Further paragraphs come after blank lines. # - Bullet points are okay, too # - Typically a hyphen or asterisk is used for the bullet, preceded # by a single space, with blank lines in between, but conventions # vary here # If you use an issue tracker, put references to them at the bottom, # like this: # Resolves: scikit-learn#123 # See also: scikit-learn#456, scikit-learn#789
This quick and dirty pull request aims to add an estimator (transformer) object that implements sparse coding against a fixed dictionary, in the form of the SparseCoder object.
At the same time this will address the small inconsistencies and missing info in docs that came up.