DecisionTreeClassifier having unexpected behaviour with 'min_weight_fraction_leaf=0.5' #30917

Open
@snath-xoc

Description

Describe the bug

When fitting DecisionTreeClassifier on a duplicated sample set (i.e. each sample repeated twice), the result is not the same as when fitting on the original sample set. This only happens for 'min_weight_fraction_leaf' specified as <0.5. This also affects ExtraTreesClassifier and ExtraTreeClassifier.
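
For context, here is a rough sketch of the arithmetic behind the parameter. It assumes, as the parameter's documentation describes, that the leaf threshold is the given fraction of the total sample weight and that unweighted samples each count as weight 1. The helper name `feasible_left_sizes` is hypothetical and is not part of scikit-learn.

```python
def feasible_left_sizes(n_samples, fraction, weight_per_sample=1.0):
    """List the left-child sizes for which both children meet the weight threshold."""
    total_weight = n_samples * weight_per_sample
    # Assumed rule: minimum leaf weight = fraction * total weight.
    threshold = fraction * total_weight
    sizes = []
    for k in range(1, n_samples):
        left_weight = k * weight_per_sample
        right_weight = total_weight - left_weight
        if left_weight >= threshold and right_weight >= threshold:
            sizes.append(k)
    return sizes


print(feasible_left_sizes(20, 0.5))  # [10]
print(feasible_left_sizes(40, 0.5))  # [20]
```

Under this assumption, only an exactly balanced split is admissible at 0.5 for both the original and the duplicated data, so in exact arithmetic duplication should not change which splits are allowed. The sketch does not model how scikit-learn actually compares these floating-point quantities internally.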

Steps/Code to Reproduce

from sklearn.tree import DecisionTreeClassifier
from scipy.stats import kstest
import numpy as np

rng = np.random.RandomState(0)
    
n_samples = 20
X = rng.rand(n_samples, n_samples * 2)
y = rng.randint(0, 3, size=n_samples)

X_repeated = np.repeat(X,2,axis=0)
y_repeated = np.repeat(y,2)

predictions = []
predictions_dup = []

# Fit one estimator on the original data and one on the duplicated data
for seed in range(100):
    est = DecisionTreeClassifier(random_state=seed, max_features=0.5, min_weight_fraction_leaf=0.5).fit(X, y)
    est_dup = DecisionTreeClassifier(random_state=seed, max_features=0.5, min_weight_fraction_leaf=0.5).fit(X_repeated, y_repeated)

    # Get predictions on the original samples, dropping the last class column
    predictions.append(est.predict_proba(X)[:, :-1])
    predictions_dup.append(est_dup.predict_proba(X)[:, :-1])

predictions = np.vstack(predictions)
predictions_dup = np.vstack(predictions_dup)

# Compare the two prediction distributions class by class;
# zip pairs each column of predictions with the matching column of predictions_dup
for pred, pred_dup in zip(predictions.T, predictions_dup.T):
    print(kstest(pred, pred_dup).pvalue)
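
One possible way to narrow this down, sketched under assumptions that are not part of the report: compare the size of the fitted trees directly, and add a third fit that passes a constant `sample_weight` of 2 instead of repeating rows. The helper `node_counts` is hypothetical, and treating weight 2 as equivalent to duplication is an assumption to be checked rather than a given.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def node_counts(seed, min_weight_fraction_leaf=0.5, n_samples=20):
    """Fit on original, duplicated, and weight-2 data; return the tree sizes."""
    rng = np.random.RandomState(0)
    X = rng.rand(n_samples, n_samples * 2)
    y = rng.randint(0, 3, size=n_samples)

    params = dict(
        random_state=seed,
        max_features=0.5,
        min_weight_fraction_leaf=min_weight_fraction_leaf,
    )
    original = DecisionTreeClassifier(**params).fit(X, y)
    duplicated = DecisionTreeClassifier(**params).fit(
        np.repeat(X, 2, axis=0), np.repeat(y, 2)
    )
    # Assumption: a weight of 2 per sample is meant to mimic duplicating each sample.
    weighted = DecisionTreeClassifier(**params).fit(
        X, y, sample_weight=np.full(n_samples, 2.0)
    )
    return {
        "original": original.tree_.node_count,
        "duplicated": duplicated.tree_.node_count,
        "weighted": weighted.tree_.node_count,
    }


for seed in range(3):
    print(seed, node_counts(seed))
```

Whether the three counts agree for a given seed is what this check is meant to reveal; a mismatch between "original" and "duplicated" would point at the split admissibility logic rather than at the statistical test.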

Expected Results

p-values are greater than ~0.05

Actual Results

p-values = 2.0064970441275627e-69

Versions

System:
    python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:13:44) [Clang 16.0.6 ]
executable: /Users/shrutinath/micromamba/envs/scikit-learn/bin/python
   machine: macOS-14.3-arm64-arm-64bit

Python dependencies:
      sklearn: 1.7.dev0
          pip: 24.0
   setuptools: 75.8.0
        numpy: 2.0.0
        scipy: 1.14.0
       Cython: 3.0.10
       pandas: 2.2.2
   matplotlib: 3.9.0
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
...
    num_threads: 8
         prefix: libomp
       filepath: /Users/shrutinath/micromamba/envs/scikit-learn/lib/libomp.dylib
        version: None
