lux.samplers.UncertainSMOTE
- class lux.samplers.UncertainSMOTE(*, predict_proba, process_input=None, sampling_strategy='all', random_state=None, k_neighbors=5, n_jobs=None, sigma=1, m_neighbors=10, min_samples=0.1, instance_to_explain=None, kind='borderline-1')
An implementation of the Synthetic Minority Over-sampling Technique (SMOTE) with handling of uncertain samples.
- Parameters:
sampling_strategy (float, str, dict, or callable) – default='all'. The sampling strategy to use. Can be a float giving the desired ratio of minority-class samples to majority-class samples after resampling, or one of {'all', 'not minority', 'minority'}. Alternatively, it can be a dictionary whose keys are class labels and whose values are the desired number of samples for each class, or a callable returning such a dictionary.
random_state (int, RandomState instance or None) – default=None. Controls the randomness of the algorithm.
k_neighbors (int) – default=5. Number of nearest neighbors used to construct synthetic samples.
n_jobs (int or None) – default=None. Number of CPU cores used during the computation.
sigma (float) – default=1. Parameter controlling the thresholding of confidence intervals for identifying uncertain samples.
m_neighbors (int) – default=10. Number of nearest neighbors to consider when estimating whether a sample is in danger.
min_samples (float) – default=0.1. Fraction of the maximum class samples to be added as additional synthetic samples.
instance_to_explain (array-like of shape (n_features,) or None) – default=None. An instance around which samples are generated.
kind ({'borderline-1', 'borderline-2'}) – default='borderline-1'. The kind of borderline samples to detect. If 'borderline-1', it identifies samples that are borderline to a single class. If 'borderline-2', it identifies samples that are borderline to multiple classes.
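The roles of m_neighbors and k_neighbors can be illustrated with the classic Borderline-SMOTE rule. The sketch below is not taken from the lux source: it assumes the textbook definition, in which a sample is "in danger" when at least half, but not all, of its m nearest neighbors belong to another class, and a synthetic sample is a random point on the segment between a danger sample and one of its k nearest same-class neighbors. The helper names (`nearest`, `in_danger`, `synthesize`) are hypothetical, and the sigma-based uncertainty handling is not modeled.

```python
import math
import random


def nearest(idx, X, m):
    """Indices of the m nearest neighbors of X[idx] (Euclidean), excluding idx."""
    others = [j for j in range(len(X)) if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:m]


def in_danger(idx, X, y, m_neighbors):
    """Textbook Borderline-SMOTE test (assumed, not read from lux):
    at least half, but not all, of the m nearest neighbors have another label."""
    n_other = sum(1 for j in nearest(idx, X, m_neighbors) if y[j] != y[idx])
    return m_neighbors / 2 <= n_other < m_neighbors


def synthesize(idx, X, y, k_neighbors, rng):
    """Classic SMOTE step: interpolate between X[idx] and a random partner
    chosen among its k nearest neighbors of the same class."""
    same = [j for j in range(len(X)) if j != idx and y[j] == y[idx]]
    same.sort(key=lambda j: math.dist(X[idx], X[j]))
    partner = X[rng.choice(same[:k_neighbors])]
    lam = rng.random()  # position along the segment, in [0, 1)
    return [a + lam * (b - a) for a, b in zip(X[idx], partner)]


# Toy data: five minority points (label 1) followed by four majority points.
X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (8.0, 0.0), (9.0, 0.0),
     (10.0, 0.0), (11.0, 0.0), (12.0, 0.0), (13.0, 0.0)]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0]

danger = [i for i in range(len(X)) if y[i] == 1 and in_danger(i, X, y, m_neighbors=3)]
new_point = synthesize(danger[0], X, y, k_neighbors=2, rng=random.Random(0))
```

Only the two minority points next to the majority cluster are flagged here; the three points far from the boundary have purely same-class neighborhoods and are left alone.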
Attributes:
- sampling_strategy_ : dict
A dictionary containing the actual number of samples for each class after resampling.
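As a rough illustration of how the different sampling_strategy forms could map to per-class target counts, here is a minimal sketch. It assumes the conventions used by imbalanced-learn over-samplers (string options raise the selected classes to the majority count; a float is a minority-to-majority ratio for two-class problems); whether UncertainSMOTE resolves the values in exactly this way is an assumption, and `target_counts` is a hypothetical helper.

```python
from collections import Counter


def target_counts(y, sampling_strategy="all"):
    """Return {class_label: desired number of samples after resampling}.

    Hypothetical helper mirroring the parameter description; not library code.
    """
    counts = dict(Counter(y))
    n_max = max(counts.values())
    minority = min(counts, key=counts.get)

    if isinstance(sampling_strategy, dict):
        # Explicit targets override the current counts for the listed classes.
        return {**counts, **sampling_strategy}
    if callable(sampling_strategy):
        # A callable receives y and returns a dict of targets.
        return {**counts, **sampling_strategy(y)}
    if isinstance(sampling_strategy, float):
        if len(counts) != 2:
            raise ValueError("a float ratio is only meaningful for two classes")
        # Desired minority size = ratio * majority size.
        return {**counts, minority: int(sampling_strategy * n_max)}
    if sampling_strategy == "minority":
        return {**counts, minority: n_max}
    if sampling_strategy == "not minority":
        return {c: (n if c == minority else n_max) for c, n in counts.items()}
    if sampling_strategy == "all":
        return {c: n_max for c in counts}
    raise ValueError(f"unknown sampling_strategy: {sampling_strategy!r}")


y = ["a"] * 10 + ["b"] * 4
half_ratio = target_counts(y, 0.5)
balanced = target_counts(y, "all")
explicit = target_counts(y, {"b": 8})
```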
- __init__(*, predict_proba, process_input=None, sampling_strategy='all', random_state=None, k_neighbors=5, n_jobs=None, sigma=1, m_neighbors=10, min_samples=0.1, instance_to_explain=None, kind='borderline-1')
Initialize the sampler. The parameters and attributes are those documented for the class above.
Methods
__init__(*, predict_proba[, process_input, ...]) An implementation of the Synthetic Minority Over-sampling Technique (SMOTE) with handling of uncertain samples.
fit(X, y, **params) Check inputs and statistics of the sampler.
fit_resample(X, y, **params) Resample the dataset.
get_feature_names_out([input_features]) Get output feature names for transformation.
get_metadata_routing() Get metadata routing of this object.
get_params([deep]) Get parameters for this estimator.
set_params(**params) Set the parameters of this estimator.
- fit(X, y, **params)
Check inputs and statistics of the sampler.
You should use fit_resample in all cases.
Parameters
- X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)
Data array.
- y : array-like of shape (n_samples,)
Target array.
Returns
- self : object
Return the instance itself.
- fit_resample(X, y, **params)
Resample the dataset.
Parameters
- X : {array-like, dataframe, sparse matrix} of shape (n_samples, n_features)
Matrix containing the data to be sampled.
- y : array-like of shape (n_samples,)
Corresponding label for each sample in X.
Returns
- X_resampled : {array-like, dataframe, sparse matrix} of shape (n_samples_new, n_features)
The array containing the resampled data.
- y_resampled : array-like of shape (n_samples_new,)
The corresponding label of X_resampled.
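The input/output contract of fit_resample can be shown with a deliberately simple stand-in. The class below is not UncertainSMOTE and does not generate synthetic points; it is a hypothetical `DuplicatingOversampler` that balances classes by repeating existing rows, used only to show that the call returns a pair `(X_resampled, y_resampled)` with `n_samples_new` rows each.

```python
import random
from collections import Counter


class DuplicatingOversampler:
    """Toy stand-in (not UncertainSMOTE): balances classes by duplicating rows."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit_resample(self, X, y):
        rng = random.Random(self.random_state)
        counts = Counter(y)
        n_max = max(counts.values())
        # Start from the original data, then append duplicates per class.
        X_res, y_res = list(X), list(y)
        for label, n in counts.items():
            rows = [x for x, t in zip(X, y) if t == label]
            for _ in range(n_max - n):
                X_res.append(rng.choice(rows))
                y_res.append(label)
        return X_res, y_res


X = [[0.0], [0.1], [0.2], [1.0]]
y = [0, 0, 0, 1]
X_resampled, y_resampled = DuplicatingOversampler(random_state=0).fit_resample(X, y)
```

The original rows are kept and new rows are appended, so `n_samples_new` is at least `n_samples` for an over-sampler of this kind.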
- get_feature_names_out(input_features=None)
Get output feature names for transformation.
Parameters
- input_features : array-like of str or None, default=None
Input features.
If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then the following input feature names are generated: ["x0", "x1", ..., "x(n_features_in_ - 1)"].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
Returns
- feature_names_out : ndarray of str objects
Same as input features.
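The naming rules above can be sketched as a small standalone function. This is an illustration of the documented behavior rather than the library's code; `feature_names_out` and its keyword arguments are hypothetical stand-ins for the method and the fitted attributes feature_names_in_ and n_features_in_.

```python
def feature_names_out(input_features=None, feature_names_in=None, n_features_in=None):
    """Resolve output feature names following the rules described above."""
    if input_features is None:
        if feature_names_in is not None:
            # Names seen during fit take precedence.
            return list(feature_names_in)
        # Fall back to generated names x0, x1, ..., x(n_features_in - 1).
        return [f"x{i}" for i in range(n_features_in)]
    if feature_names_in is not None and list(input_features) != list(feature_names_in):
        raise ValueError("input_features does not match feature_names_in_")
    return list(input_features)


generated = feature_names_out(n_features_in=3)
from_fit = feature_names_out(feature_names_in=["age", "income"])
passed_in = feature_names_out(input_features=["age", "income"],
                              feature_names_in=["age", "income"])
```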
- get_metadata_routing()
Get metadata routing of this object.
See the User Guide for how the routing mechanism works.
Returns
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : dict
Parameter names mapped to their values.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters
- **params : dict
Estimator parameters.
Returns
- self : estimator instance
Estimator instance.
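The `<component>__<parameter>` convention, together with the `deep` flag of get_params, can be sketched with a toy class. `ToyEstimator` is hypothetical and far simpler than the real estimator base class; it only shows how a double-underscore key is split at its first `__` and forwarded to the named component.

```python
class ToyEstimator:
    """Hypothetical minimal estimator whose attributes double as parameters."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def get_params(self, deep=True):
        out = dict(vars(self))
        if deep:
            # Expose parameters of nested estimators as <component>__<parameter>.
            for name, value in vars(self).items():
                if hasattr(value, "get_params"):
                    for sub, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub}"] = sub_value
        return out

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep:
                # Nested key: delegate the remainder to the component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


outer = ToyEstimator(sampler=ToyEstimator(k_neighbors=5), verbose=False)
returned = outer.set_params(sampler__k_neighbors=3, verbose=True)
deep_params = outer.get_params(deep=True)
shallow_params = outer.get_params(deep=False)
```

Returning `self` from set_params is what allows calls to be chained.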