Feature/balanced random forest #2

potash · Apr 11, 2017

Reference Issue

Failing flake8 tests.

What does this implement/fix? Explain your changes.

Breaks long lines to satisfy flake8 and uses range instead of xrange for Python 3 compatibility.
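As a rough illustration of the two kinds of change described above, here is a minimal sketch written for Python 3. The helper `bootstrap_indices` and its parameters are hypothetical and do not come from this pull request; they only stand in for code that previously looped with `xrange`. The 79-column limit assumed here is flake8's default for the E501 check.

```python
import builtins


# Hypothetical helper, not taken from the pull request.
def bootstrap_indices(n_samples, n_draws):
    """Return n_draws indices that cycle over range(n_samples)."""
    # range() exists in both Python 2 and Python 3, whereas xrange()
    # exists only in Python 2, so range() is the portable spelling.
    return [i % n_samples for i in range(n_draws)]


# On Python 3 the builtins module has no xrange attribute, so code that
# still calls xrange() fails with a NameError.
xrange_available = hasattr(builtins, "xrange")

# A call broken across lines with implicit continuation inside the
# parentheses, one common way to stay within the assumed 79 columns.
indices = bootstrap_indices(
    n_samples=3,
    n_draws=7,
)
print(indices)
```

In Python 3, `range` is already lazy, so replacing `xrange` with it does not build a full list there; under Python 2 the same call does build a list, which is usually acceptable for small loops.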

Any other comments?

* resurrect quantile scaler
* move the code in the pre-processing module
* first draft
* Add tests.
* Fix bug in QuantileNormalizer.
* Add quantile_normalizer.
* Implement pickling
* create a specific function for dense transform
* Create a fit function for the dense case
* Create a toy examples
* First draft with sparse matrices
* remove useless functions and non-negative sparse compatibility
* fix slice call
* Fix tests of QuantileNormalizer.
* Fix estimator compatibility
* List of functions became tuple of functions
* Check X consistency at transform and inverse transform time
* fix doc
* Add negative ValueError tests for QuantileNormalizer.
* Fix cosmetics
* Fix compatibility numpy <= 1.8
* Add n_features tests and correct ValueError.
* PEP8
* fix fill_value for early scipy compatibility
* simplify sampling
* Fix tests.
* removing last pring
* Change choice for permutation
* cosmetics
* fix remove remaining choice
* DOC
* Fix inconsistencies
* pep8
* Add checker for init parameters.
* hack bounds and make a test
* FIX/TST bounds are provided by the fitting and not X at transform
* PEP8
* FIX/TST axis should be <= 1
* PEP8
* ENH Add parameter ignore_implicit_zeros
* ENH match output distribution
* ENH clip the data to avoid infinity due to output PDF
* FIX ENH restraint to uniform and norm
* [MRG] ENH Add example comparing the distribution of all scaling preprocessor (#2)
* ENH Add example comparing the distribution of all scaling preprocessor
* Remove Jupyter notebook convert
* FIX/ENH Select feat before not after; Plot interquantile data range for all
* Add heatmap legend
* Remove comment maybe?
* Move doc from robust_scaling to plot_all_scaling; Need to update doc
* Update the doc
* Better aesthetics; Better spacing and plot colormap only at end
* Shameless author re-ordering ;P
* Use env python for she-bang
* TST Validity of output_pdf
* EXA Use OrderedDict; Make it easier to add more transformations
* FIX PEP8 and replace scipy.stats by str in example
* FIX remove useless import
* COSMET change variable names
* FIX change output_pdf occurence to output_distribution
* FIX partial fixies from comments
* COMIT change class name and code structure
* COSMIT change direction to inverse
* FIX factorize transform in _transform_col
* PEP8
* FIX change the magic 10
* FIX add interp1d to fixes
* FIX/TST allow negative entries when ignore_implicit_zeros is True
* FIX use np.interp instead of sp.interpolate.interp1d
* FIX/TST fix tests
* DOC start checking doc
* TST add test to check the behaviour of interp numpy
* TST/EHN Add the possibility to add noise to compute quantile
* FIX factorize quantile computation
* FIX fixes issues
* PEP8
* FIX/DOC correct doc
* TST/DOC improve doc and add random state
* EXA add examples to illustrate the use of smoothing_noise
* FIX/DOC fix some grammar
* DOC fix example
* DOC/EXA make plot titles more succint
* EXA improve explanation
* EXA improve the docstring
* DOC add a bit more documentation
* FIX advance review
* TST add subsampling test
* DOC/TST better example for the docstring
* DOC add ellipsis to docstring
* FIX address olivier comments
* FIX remove random_state in sparse.rand
* FIX spelling doc
* FIX cite example in user guide and docstring
* FIX olivier comments
* EHN improve the example comparing all the pre-processing methods
* FIX/DOC remove title
* FIX change the scaling of the figure
* FIX plotting layout
* FIX ratio w/h
* Reorder and reword the plot_all_scaling example
* Fix aspect ratio and better explanations in the plot_all_scaling.py example
* Fix broken link and remove useless sentence
* FIX fix couples of spelling
* FIX comments joel
* FIX/DOC address documentation comments
* FIX address comments joel
* FIX inline sparse and dense transform
* PEP8
* TST/DOC temporary skipping test
* FIX raise an error if n_quantiles > subsample
* FIX wording in smoothing_noise example
* EXA Denis comments
* FIX rephrasing
* FIX make smoothing_noise to be a boolearn and change doc
* FIX address comments
* FIX verbose the doc slightly more
* PEP8/DOC
* ENH: 2-ways interpolation to avoid smoothing_noise Simplifies also the code, examples, and documentation

initial PR commit seq_dataset.pyx generated from template seq_dataset.pyx generated from template #2 rename variables fused types consistency test for seq_dataset a sklearn/utils/tests/test_seq_dataset.py new if statement add doc sklearn/utils/seq_dataset.pyx.tp minor changes minor changes typo fix check numeric accuracy only up 5th decimal Address oliver's request for changing test name add test for make_dataset and rename a variable in test_seq_dataset

* add test for _preprocess_data and make it consistent
* fix pep8
* add doc, cast systematically y in X.dtype and update test_coordinate_descent.py
* test if input values don't change with copy=True
* test if input values don't change with copy=True #2
* fix doc
* fix doc #2
* fix doc #3

…y calculation (scikit-learn#11464)
* Fix to allow M
* Updated MAE test to consider sample_weights in calculation
* Removed comment
* Fixed: E501 line too long (82 > 79 characters)
* syntax correction
* Added fix details
* Changed to use consistent datatypes during calculaions
* Corrected formatting
* Requested Changes
* removed explicit casts
* Removed unnecessary explicits
* Removed unnecessary explicit casts
* added additional test
* updated comments
* Requested changes incl additional unit test
* fix mistake
* formatting
* removed whitespace
* added test notes
* formatting
* Requested changes
* Trailing space fix attempt
* Trailing whitespace fix attempt #2
* Remove trailing whitespace

* Add averaging option to AMI and NMI Leave current behavior unchanged
* Flake8 fixes
* Incorporate tests of means for AMI and NMI
* Add note about `average_method` in NMI
* Update docs from AMI, NMI changes (#1)
* Correct the NMI and AMI descriptions in docs
* Update docstrings due to averaging changes
  - V-measure
  - Homogeneity
  - Completeness
  - NMI
  - AMI
* Update documentation and remove nose tests (#2)
* Update v0.20.rst
* Update test_supervised.py
* Update clustering.rst
* Fix multiple spaces after operator
* Rename all arguments
* No more arbitrary values!
* Improve handling of floating-point imprecision
* Clearly state when the change occurs
* Update AMI/NMI docs
* Update v0.20.rst
* Catch FutureWarnings in AMI and NMI

…13243)
* Remove unused code
* Squash all the PR 9040 commits initial PR commit seq_dataset.pyx generated from template seq_dataset.pyx generated from template #2 rename variables fused types consistency test for seq_dataset a sklearn/utils/tests/test_seq_dataset.py new if statement add doc sklearn/utils/seq_dataset.pyx.tp minor changes minor changes typo fix check numeric accuracy only up 5th decimal Address oliver's request for changing test name add test for make_dataset and rename a variable in test_seq_dataset
* FIX tests
* TST more numerically stable test_sgd.test_tol_parameter
* Added benchmarks to compare SAGA 32b and 64b
* Fixing gael's comments
* fix
* solve some issues
* PEP8
* Address lesteve comments
* fix merging
* avoid using assert_equal
* use all_close
* use explicit ArrayDataset64 and CSRDataset64
* fix: remove unused import
* Use parametrized to cover ArrayDaset-CSRDataset-32-64 matrix
* for consistency use 32 first then 64 + add 64 suffix to variables
* it would be cool if this worked !!!
* more verbose version
* revert SGD changes as much as possible.
* Add solvers back to bench_saga
* make 64 explicit in the naming
* remove checking native python type + add comparison between 32 64
* Add whatsnew with everyone with commits
* simplify a bit the testing
* simplify the parametrize
* update whatsnew
* fix pep8

* initial commit
* used random class
* fixed failing testcases, reverted __init__.py
* fixed failing testcases #2
  - passed rng as parameter to ParameterSampler class
  - changed seed from 0 to 42 (as original)
* fixed failing testcases #2
  - passed rng as parameter to SparseRandomProjection class
* fixed failing testcases #4
  - passed rng as parameter to GaussianRandomProjection class
* fixed failing test case because of flake 8

potash added 3 commits April 11, 2017 18:04

fix flake8

e01667e

xrange -> range

9dd2907

flake8

59f7c85

potash mentioned this pull request Apr 11, 2017

[WIP] Balanced Random Forest scikit-learn/scikit-learn#8728

Closed

massich merged commit 9894bd5 into massich:is/8607 Apr 12, 2017

massich pushed a commit that referenced this pull request Jun 10, 2017

seq_dataset.pyx generated from template #2

1e9d3d0
