lux.samplers.UncertainSMOTE

class lux.samplers.UncertainSMOTE(*, predict_proba, process_input=None, sampling_strategy='all', random_state=None, k_neighbors=5, n_jobs=None, sigma=1, m_neighbors=10, min_samples=0.1, instance_to_explain=None, kind='borderline-1')

An implementation of the Synthetic Minority Over-sampling Technique (SMOTE) that handles uncertain samples.

Parameters:

sampling_strategy (float, str, dict, or callable, default='all') – The sampling strategy to use. Can be a float giving the desired ratio of minority-class samples to majority-class samples after resampling, or one of {'all', 'not minority', 'minority'}. Alternatively, it can be a dictionary whose keys are class labels and whose values are the desired number of samples for each class, or a callable returning such a dictionary.
random_state (int, RandomState instance or None, default=None) – Controls the randomness of the algorithm.
k_neighbors (int, default=5) – Number of nearest neighbors used to construct synthetic samples.
n_jobs (int or None, default=None) – Number of CPU cores used during the computation.
sigma (float, default=1) – Parameter controlling the thresholding of confidence intervals for identifying uncertain samples.
m_neighbors (int, default=10) – Number of nearest neighbors to consider when estimating whether a sample is in danger.
min_samples (float, default=0.1) – Fraction of the maximum class samples to be added as additional synthetic samples.
instance_to_explain (array-like of shape (n_features,) or None, default=None) – An instance around which samples are generated.
kind ({'borderline-1', 'borderline-2'}, default='borderline-1') – The kind of borderline samples to detect. If 'borderline-1', it identifies samples that are borderline to a single class. If 'borderline-2', it identifies samples that are borderline to multiple classes.
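The roles of k_neighbors and m_neighbors can be illustrated with a minimal, standard-library sketch of the usual Borderline-SMOTE idea: a sample is "in danger" when at least half, but not all, of its m nearest neighbors belong to another class, and synthetic points are interpolated between such a sample and one of its nearest same-class neighbors. This is an illustration under those assumptions, not the code used by UncertainSMOTE; the helper names (nearest_neighbors, in_danger, interpolate) and the toy data are hypothetical, and the uncertainty handling controlled by sigma is not modelled.

```python
import math
import random


def nearest_neighbors(X, index, k):
    """Return the indices of the k points closest to X[index], excluding itself."""
    others = [i for i in range(len(X)) if i != index]
    others.sort(key=lambda i: math.dist(X[index], X[i]))
    return others[:k]


def in_danger(X, y, index, m_neighbors):
    """Borderline test: at least half, but not all, of the m nearest
    neighbors have a label different from the sample's own label."""
    neighbors = nearest_neighbors(X, index, m_neighbors)
    n_other = sum(1 for i in neighbors if y[i] != y[index])
    return len(neighbors) / 2 <= n_other < len(neighbors)


def interpolate(x, neighbor, rng):
    """Create one synthetic point on the segment between x and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]


# Toy data (hypothetical): point 4 has label 1 but lies close to class 0.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Class-1 samples that pass the borderline test with m_neighbors=5.
danger = [i for i in range(len(X)) if y[i] == 1 and in_danger(X, y, i, m_neighbors=5)]

rng = random.Random(0)
synthetic = []
for i in danger:
    # Same-class points ordered by distance; keep the 2 nearest (k_neighbors=2 here).
    same_class = [j for j in nearest_neighbors(X, i, len(X) - 1) if y[j] == y[i]]
    neighbor = X[rng.choice(same_class[:2])]
    synthetic.append(interpolate(X[i], neighbor, rng))
```

In this toy data only the point at (2, 2) is flagged, so a single synthetic point is generated on the segment between it and one of its two nearest class-1 neighbors.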

Attributes:

sampling_strategy_ (dict) – A dictionary containing the actual number of samples for each class after resampling.
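How a sampling_strategy value might translate into per-class sample counts can be sketched as follows. The function target_counts is hypothetical and follows the conventions commonly used for over-samplers (classes are only ever grown, and a float ratio applies to binary problems); whether UncertainSMOTE resolves every option in exactly this way is an assumption.

```python
from collections import Counter


def target_counts(y, sampling_strategy="all"):
    """Return the desired number of samples per class after over-sampling."""
    counts = Counter(y)
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)
    n_majority = counts[majority]

    # A callable receives the labels and returns a dict of desired counts.
    if callable(sampling_strategy):
        sampling_strategy = sampling_strategy(y)

    if isinstance(sampling_strategy, dict):
        # Classes not mentioned in the dict keep their current size.
        targets = dict(counts)
        targets.update(sampling_strategy)
        return targets

    if isinstance(sampling_strategy, float):
        # ratio = n_minority_after / n_majority, meaningful for two classes only.
        if len(counts) != 2:
            raise ValueError("A float sampling_strategy requires a binary target.")
        targets = dict(counts)
        targets[minority] = max(counts[minority], int(sampling_strategy * n_majority))
        return targets

    if sampling_strategy == "all":
        return {label: n_majority for label in counts}
    if sampling_strategy == "minority":
        return {label: (n_majority if label == minority else n)
                for label, n in counts.items()}
    if sampling_strategy == "not minority":
        return {label: (n if label == minority else n_majority)
                for label, n in counts.items()}
    raise ValueError(f"Unknown sampling_strategy: {sampling_strategy!r}")


# Hypothetical label vectors used only for illustration.
y_binary = ["a"] * 8 + ["b"] * 2
y_multi = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
```

For example, with eight samples of class "a" and two of class "b", a ratio of 0.5 asks for four "b" samples, while 'all' on the three-class labels brings every class up to the majority size.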

Methods

`__init__`(*, predict_proba[, process_input, ...])	An implementation of the Synthetic Minority Over-sampling Technique (SMOTE) that handles uncertain samples.
`fit`(X, y, **params)	Check inputs and statistics of the sampler.
`fit_resample`(X, y, **params)	Resample the dataset.
`get_feature_names_out`([input_features])	Get output feature names for transformation.
`get_metadata_routing`()	Get metadata routing of this object.
`get_params`([deep])	Get parameters for this estimator.
`set_params`(**params)	Set the parameters of this estimator.

fit(X, y, **params)

Check inputs and statistics of the sampler.

You should use fit_resample in all cases.

Parameters

X ({array-like, dataframe, sparse matrix} of shape (n_samples, n_features)) – Data array.
y (array-like of shape (n_samples,)) – Target array.

Returns

self (object) – The instance itself.

fit_resample(X, y, **params)

Resample the dataset.

Parameters

X ({array-like, dataframe, sparse matrix} of shape (n_samples, n_features)) – Matrix containing the data to be sampled.
y (array-like of shape (n_samples,)) – Corresponding label for each sample in X.

Returns

X_resampled ({array-like, dataframe, sparse matrix} of shape (n_samples_new, n_features)) – The array containing the resampled data.
y_resampled (array-like of shape (n_samples_new,)) – The corresponding labels of X_resampled.
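A typical call might look like the sketch below. It assumes that predict_proba accepts a callable returning class probabilities, such as the predict_proba method of a fitted scikit-learn classifier; this page does not document that parameter, so treat it as an assumption. The wrapper function oversample_with_uncertain_smote and the choice of RandomForestClassifier are illustrative only.

```python
def oversample_with_uncertain_smote(X, y, random_state=0):
    """Fit a classifier on (X, y), then over-sample the data with UncertainSMOTE."""
    # Imported lazily so this sketch can be loaded without the packages installed.
    from sklearn.ensemble import RandomForestClassifier
    from lux.samplers import UncertainSMOTE

    clf = RandomForestClassifier(random_state=random_state).fit(X, y)

    sampler = UncertainSMOTE(
        # Assumption: a callable mapping samples to class probabilities.
        predict_proba=clf.predict_proba,
        sampling_strategy="all",
        k_neighbors=5,
        m_neighbors=10,
        random_state=random_state,
    )
    # fit_resample returns the resampled data and the matching labels.
    X_resampled, y_resampled = sampler.fit_resample(X, y)
    return X_resampled, y_resampled
```

The remaining constructor arguments (sigma, min_samples, instance_to_explain, kind) are left at their defaults here.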

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters

input_features (array-like of str or None, default=None) – Input features.

If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns

feature_names_out (ndarray of str objects) – Same as input features.
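The name-resolution rules above can be summarised in a few lines. The function resolve_feature_names is a hypothetical stand-in written for illustration; it mirrors the documented rules but is not the library's implementation and returns a plain list rather than an ndarray.

```python
def resolve_feature_names(n_features_in, feature_names_in=None, input_features=None):
    """Return output feature names following the documented rules."""
    if input_features is None:
        if feature_names_in is not None:
            # Names seen during fit take precedence.
            return list(feature_names_in)
        # Otherwise generate x0, x1, ..., x(n_features_in - 1).
        return [f"x{i}" for i in range(n_features_in)]

    input_features = list(input_features)
    if feature_names_in is not None and input_features != list(feature_names_in):
        raise ValueError("input_features does not match feature_names_in_")
    return input_features
```

For instance, calling it with three features and no names yields "x0", "x1", "x2", while passing names that disagree with those seen during fit raises an error.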

get_metadata_routing()

Get metadata routing of this object.

See the User Guide for details on how the routing mechanism works.

Returns

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict) – Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self (estimator instance) – Estimator instance.
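The <component>__<parameter> convention can be illustrated with a small stand-in class. Component below is hypothetical and only mimics the naming rule: the text before the first double underscore selects a sub-object, and the remainder is forwarded to that sub-object's own set_params. It is a sketch of the idea, not the estimator base class used by the library.

```python
class Component:
    """Minimal object with scikit-learn-style nested parameter updates."""

    def __init__(self, **params):
        for name, value in params.items():
            setattr(self, name, value)

    def set_params(self, **params):
        for key, value in params.items():
            # "sampler__k_neighbors" -> name="sampler", sub_key="k_neighbors"
            name, delimiter, sub_key = key.partition("__")
            if not hasattr(self, name):
                raise ValueError(f"Invalid parameter {name!r}")
            if delimiter:
                # Forward the remainder to the nested component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


# Hypothetical nested object: a pipeline-like container with two components.
pipeline = Component(
    sampler=Component(k_neighbors=5, m_neighbors=10),
    clf=Component(C=1.0),
)
result = pipeline.set_params(sampler__k_neighbors=3, clf__C=10.0)
```

Returning self, as the Returns section describes, is what allows calls such as set_params(...) to be chained with other methods.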