lux.samplers.UncertainSMOTE

class lux.samplers.UncertainSMOTE(*, predict_proba, process_input=None, sampling_strategy='all', random_state=None, k_neighbors=5, n_jobs=None, sigma=1, m_neighbors=10, min_samples=0.1, instance_to_explain=None, kind='borderline-1')

An implementation of the Synthetic Minority Over-sampling Technique (SMOTE) with handling of uncertain samples.

Parameters:
  • sampling_strategy (float, str, dict, or callable, default='all') – The sampling strategy to use. Can be a float giving the desired ratio of minority-class samples to majority-class samples after resampling, or one of {'all', 'not minority', 'minority'}. Alternatively, it can be a dictionary whose keys are class labels and whose values are the desired number of samples for each class, or a callable returning such a dictionary.

  • random_state (int, RandomState instance or None, default=None) – Controls the randomness of the algorithm.

  • k_neighbors (int, default=5) – Number of nearest neighbors used to construct synthetic samples.

  • n_jobs (int or None, default=None) – Number of CPU cores used during the computation.

  • sigma (float, default=1) – Parameter controlling the thresholding of confidence intervals for identifying uncertain samples.

  • m_neighbors (int, default=10) – Number of nearest neighbors to consider when estimating whether a sample is in danger.

  • min_samples (float, default=0.1) – Fraction of the maximum class samples to be added as additional synthetic samples.

  • instance_to_explain (array-like of shape (n_features,) or None, default=None) – An instance around which samples are generated.

  • kind ({'borderline-1', 'borderline-2'}, default='borderline-1') – The kind of borderline samples to detect. If 'borderline-1', it identifies samples that are borderline to a single class. If 'borderline-2', it identifies samples that are borderline to multiple classes.
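The roles of k_neighbors and m_neighbors can be illustrated with a minimal sketch of the classic Borderline-SMOTE steps. It is not this class's implementation: the helper names are hypothetical, the "in danger" rule (at least half, but not all, of the m nearest neighbors carry a different label) is the textbook rule and only assumed to apply here, and the handling of uncertain samples via sigma is not modelled.

```python
import math
import random


def nearest_neighbors(X, index, k):
    """Return the indices of the k points closest to X[index] (Euclidean)."""
    distances = [
        (math.dist(X[index], other), j)
        for j, other in enumerate(X)
        if j != index
    ]
    return [j for _, j in sorted(distances)[:k]]


def is_in_danger(X, y, index, m_neighbors):
    """Textbook Borderline-SMOTE rule (assumed, not taken from this class):
    at least half, but not all, of the m nearest neighbors have another label."""
    neighbors = nearest_neighbors(X, index, m_neighbors)
    n_other = sum(1 for j in neighbors if y[j] != y[index])
    return len(neighbors) / 2 <= n_other < len(neighbors)


def interpolate(X, index, neighbor, rng):
    """Create one synthetic point on the segment between two samples."""
    gap = rng.random()  # random position along the segment, in [0, 1)
    return [a + gap * (b - a) for a, b in zip(X[index], X[neighbor])]


# Toy data: sample 0 sits between the two classes, samples 5-7 are far away.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0],
     [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
y = [1, 1, 0, 0, 1, 1, 1, 1]

rng = random.Random(0)
synthetic = interpolate(X, 0, 1, rng)  # lies between X[0] and X[1]
```

In this toy data, sample 0 has two of its four nearest neighbors in the other class, so the rule flags it as in danger, while sample 5 is surrounded by its own class and is not.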

Attributes:
  • sampling_strategy_ (dict) – A dictionary containing the actual number of samples for each class after resampling.
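To make the accepted forms of sampling_strategy concrete, the sketch below turns each form into a dictionary of per-class sample counts after resampling. The function target_counts is hypothetical and not part of lux. The meaning given to the string options (each targeted class is raised to the size of the largest class) follows the usual over-sampling convention and is an assumption here.

```python
from collections import Counter


def target_counts(y, sampling_strategy="all"):
    """Map a sampling_strategy value to {class label: count after resampling}."""
    counts = Counter(y)
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)
    n_majority = counts[majority]

    # A callable is expected to return a dictionary, handled just below.
    if callable(sampling_strategy):
        sampling_strategy = sampling_strategy(y)

    # dict: desired number of samples for the listed classes.
    if isinstance(sampling_strategy, dict):
        return {**counts, **sampling_strategy}

    # float: minority / majority ratio after resampling (two classes only).
    if isinstance(sampling_strategy, float):
        if len(counts) != 2:
            raise ValueError("a float ratio is only defined for two classes")
        return {majority: n_majority,
                minority: int(sampling_strategy * n_majority)}

    # str: choose which classes get raised to the majority size (assumed).
    if sampling_strategy == "all":
        targeted = set(counts)
    elif sampling_strategy == "not minority":
        targeted = set(counts) - {minority}
    elif sampling_strategy == "minority":
        targeted = {minority}
    else:
        raise ValueError(f"unknown sampling_strategy: {sampling_strategy!r}")
    return {label: n_majority if label in targeted else n
            for label, n in counts.items()}


labels = ["a"] * 10 + ["b"] * 4 + ["c"] * 2
print(target_counts(labels, "minority"))
print(target_counts(labels, {"c": 6}))
```

The sketch does not guard against a float ratio that would shrink the minority class; a real sampler would typically reject that case.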

Methods

__init__(*, predict_proba[, process_input, ...])

Initialize the sampler. The parameters are those listed in the class description above.

fit(X, y, **params)

Check inputs and statistics of the sampler.

fit_resample(X, y, **params)

Resample the dataset.

get_feature_names_out([input_features])

Get output feature names for transformation.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

set_params(**params)

Set the parameters of this estimator.

fit(X, y, **params)

Check inputs and statistics of the sampler.

You should use fit_resample in all cases.

Parameters

X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)

Data array.

y : array-like of shape (n_samples,)

Target array.

Returns

self : object

Return the instance itself.

fit_resample(X, y, **params)

Resample the dataset.

Parameters

X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)

Matrix containing the data to be resampled.

y : array-like of shape (n_samples,)

Corresponding label for each sample in X.

Returns

X_resampled : {array-like, dataframe, sparse matrix} of shape (n_samples_new, n_features)

The array containing the resampled data.

y_resampled : array-like of shape (n_samples_new,)

The corresponding label of X_resampled.
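A minimal usage sketch of fit_resample follows. The import path and keyword-only constructor come from the signature on this page. What predict_proba must be is not documented here, so the sketch assumes it is a callable that returns class probabilities, such as the predict_proba method of a fitted classifier. The wrapper name oversample is hypothetical.

```python
def oversample(X, y, predict_proba, **sampler_kwargs):
    """Resample (X, y) with UncertainSMOTE and return the new arrays."""
    # Imported lazily so this helper can be defined without lux installed.
    from lux.samplers import UncertainSMOTE

    # Assumption: predict_proba is a callable mapping samples to class
    # probabilities, e.g. the predict_proba method of a fitted classifier.
    sampler = UncertainSMOTE(predict_proba=predict_proba, **sampler_kwargs)
    X_resampled, y_resampled = sampler.fit_resample(X, y)
    return X_resampled, y_resampled
```

With a fitted classifier clf (hypothetical), a call might look like oversample(X_train, y_train, clf.predict_proba, sigma=1, k_neighbors=5). Any other keyword from the class signature can be passed the same way.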

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters

input_features : array-like of str or None, default=None

Input features.

  • If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].

  • If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns

feature_names_out : ndarray of str objects

Same as input features.
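The two rules for input_features can be sketched as a small standalone function. This is an illustration of the documented behaviour, not the library's code: resolve_feature_names is a hypothetical name, and it returns a plain list instead of an ndarray.

```python
def resolve_feature_names(n_features_in, feature_names_in=None,
                          input_features=None):
    """Return output feature names following the rules described above."""
    if input_features is None:
        # Prefer the names seen during fit, if there are any.
        if feature_names_in is not None:
            return list(feature_names_in)
        # Otherwise generate x0, x1, ..., x(n_features_in - 1).
        return [f"x{i}" for i in range(n_features_in)]
    # Explicit names must agree with the names seen during fit.
    if (feature_names_in is not None
            and list(input_features) != list(feature_names_in)):
        raise ValueError("input_features must match feature_names_in_")
    return list(input_features)


generated = resolve_feature_names(3)
named = resolve_feature_names(2, feature_names_in=["age", "income"])
```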

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns

routing : MetadataRequest

A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params : dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params : dict

Estimator parameters.

Returns

self : estimator instance

Estimator instance.
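The <component>__<parameter> naming can be illustrated by splitting each key on its first double underscore. The helper below is a hypothetical sketch of that routing idea, not the estimator's actual set_params code, and the component names in the example are made up.

```python
def group_nested_params(params):
    """Split params into the object's own ones and per-component ones."""
    own, nested = {}, {}
    for key, value in params.items():
        # "sampler__k_neighbors" -> ("sampler", "__", "k_neighbors")
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = group_nested_params({
    "sigma": 2,
    "sampler__k_neighbors": 3,
    "sampler__kind": "borderline-2",
    "clf__max_depth": 4,
})
```

Here sigma would be set on the object itself, while the grouped entries would be forwarded to the components named sampler and clf.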