lux.lux.LUX
- class lux.lux.LUX(predict_proba, classifier=None, neighborhood_size=0.1, max_depth=None, node_size_limit=1, grow_confidence_threshold=0, min_impurity_decrease=0, min_samples=5, min_generate_samples=0.02, uncertainty_sigma=2, oversampling_strategy='both')
This class implements the generation of local, rule-based, model-agnostic explanations.
Initialize the LUX explainer model.
- Parameters:
predict_proba (callable) – The predict_proba function of the black-box classifier.
classifier (object, optional) – The underlying classifier. If it is provided, SHAP-based sampling can be used.
neighborhood_size (float, optional) – The neighborhood size for generating explanations. Default is 0.1.
max_depth (int, optional) – The maximum depth of the decision tree. Default is None, meaning no limit.
node_size_limit (int, optional) – The minimum number of samples required to split an internal node. Default is 1.
grow_confidence_threshold (float, optional) – The threshold for growing decision tree nodes. Default is 0.
min_impurity_decrease (float, optional) – A node will be split if the split induces a decrease of the impurity greater than or equal to this value. Default is 0.
min_samples (int, optional) – The minimum number of samples required to be at a leaf node. Default is 5.
min_generate_samples (float, optional) – The minimum proportion of the dataset size to generate perturbed instances. This is used by the UncertainSMOTE algorithm. Default is 0.02.
uncertainty_sigma (float, optional) – The uncertainty parameter sigma used when filtering uncertain samples. Every sample that is 2*uncertainty_sigma away from the mean will be removed. Default is 2.
oversampling_strategy (str, optional) – The strategy for oversampling. It can be ‘smote’, ‘importance’, or ‘both’. Default is ‘both’.
- __init__(predict_proba, classifier=None, neighborhood_size=0.1, max_depth=None, node_size_limit=1, grow_confidence_threshold=0, min_impurity_decrease=0, min_samples=5, min_generate_samples=0.02, uncertainty_sigma=2, oversampling_strategy='both')
Initialize the LUX explainer model.
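A minimal sketch of how the constructor might be combined with fit and justify, assuming the lux package is installed and predict_proba comes from an already trained classifier. The helper explain_instance, the LUX_DEFAULTS dictionary and the lux_cls argument are hypothetical names introduced for this illustration; the keyword values only repeat the defaults from the signature above.

```python
# Defaults copied from the documented LUX signature (not tuned values).
LUX_DEFAULTS = {
    "neighborhood_size": 0.1,
    "max_depth": None,
    "node_size_limit": 1,
    "grow_confidence_threshold": 0,
    "min_impurity_decrease": 0,
    "min_samples": 5,
    "min_generate_samples": 0.02,
    "uncertainty_sigma": 2,
    "oversampling_strategy": "both",
}


def explain_instance(predict_proba, X_train, y_train, instance,
                     class_names=None, lux_cls=None, **overrides):
    """Fit a LUX explainer around `instance` and return its justification.

    `lux_cls` is a hypothetical hook for passing a stand-in class;
    when it is omitted, the real LUX class is imported.
    """
    if lux_cls is None:
        from lux.lux import LUX as lux_cls  # requires the lux package

    # Start from the documented defaults and apply caller overrides.
    params = {**LUX_DEFAULTS, **overrides}
    explainer = lux_cls(predict_proba=predict_proba, **params)

    # X_train is expected to be a pandas.DataFrame (see the fit parameters).
    explainer.fit(X_train, y_train, instance_to_explain=instance,
                  class_names=class_names)
    return explainer.justify(instance)
```

With a fitted classifier clf that exposes predict_proba, a call could look like explain_instance(clf.predict_proba, X_train, y_train, instance, max_depth=2), where clf, X_train, y_train and instance are placeholders for the user's own objects.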
Methods
__init__(predict_proba[, classifier, ...]) – Initialize the LUX explainer model.
counterfactual(instance_to_explain, background) – Generates a counterfactual for a given instance and background data.
create_sample_bb(X, y, bounding_box_points) – Create a sample for the LUX explainer to be fitted to, based on the provided data.
fit(X, y, instance_to_explain[, ...]) – Fit the LUX explainer model.
fit_bounding_boxes(X, y, bounding_box_points) – Fit LUX explainer model for the neighbourhood data defined by the bounding box constructed of several points.
get_metadata_routing() – Get metadata routing of this object.
get_params([deep]) – Get parameters for this estimator.
justify(X[, to_dict, reduce]) – Traverse down the path for given x.
predict(X[, y]) – Predicts the outcome with an explainable model previously fitted.
process_and_predict_proba(X) – Process the input data and predict the probabilities.
process_input(X) – The main goal is to change the type of categorical values, so they fit algorithms that require categories as integers.
set_fit_request(*[, X_importances, beta, ...]) – Configure whether metadata should be requested to be passed to the fit method.
set_params(**params) – Set the parameters of this estimator.
to_HMR() – Exports to HMR format that can be executed by the HeaRTDroid rule-engine.
to_dot([filename, fmt])
visualize(data[, target_column_name, fmt, ...])
Attributes
- CF_REPRESENTATIVE_MEDOID = 'medoid'
CF_REPRESENTATIVE_MEDOID (str): A constant representing medoid as the counterfactual representative strategy.
- CF_REPRESENTATIVE_NEAREST = 'nearest'
CF_REPRESENTATIVE_NEAREST (str): A constant representing nearest as the counterfactual representative strategy.
- OS_STRATEGY_BOTH = 'both'
OS_STRATEGY_BOTH (str): A constant representing both SMOTE and importance sampling as the oversampling strategy.
- OS_STRATEGY_IMPORTANCE = 'importance'
OS_STRATEGY_IMPORTANCE (str): A constant representing importance sampling as the oversampling strategy.
- OS_STRATEGY_SMOTE = 'smote'
OS_STRATEGY_SMOTE (str): A constant representing SMOTE as the oversampling strategy.
- REPRESENTATIVE_CENTROID = 'centroid'
REPRESENTATIVE_CENTROID (str): A constant representing centroid as the representative strategy.
- REPRESENTATIVE_NEAREST = 'nearest'
REPRESENTATIVE_NEAREST (str): A constant representing nearest as the representative strategy.
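The strategy names used by these constants correspond to standard geometric notions. The sketch below illustrates them on plain tuples using Euclidean distance; it is not LUX's implementation, and how LUX selects its representatives internally is not described in this reference. The functions centroid, medoid and nearest are hypothetical helpers written for this illustration.

```python
import math


def centroid(points):
    """Coordinate-wise mean; usually not one of the input points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points):
    """The input point with the smallest total distance to all others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


def nearest(points, query):
    """The input point closest to `query`."""
    return min(points, key=lambda p: math.dist(p, query))


candidates = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
print(medoid(candidates))               # (1.0, 0.0)
print(nearest(candidates, (9.0, 0.0)))  # (10.0, 0.0)
```

Unlike the centroid, the medoid and the nearest point are always actual members of the candidate set.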
- counterfactual(instance_to_explain, background, counterfactual_representative='medoid', reduce=True, topn=None, n_jobs=None)
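Generates a counterfactual for a given instance and background data.
- Parameters:
instance_to_explain – The instance for which the counterfactual is generated.
background – The background data used to generate the counterfactual.
counterfactual_representative – The counterfactual representative strategy, ‘medoid’ (CF_REPRESENTATIVE_MEDOID) or ‘nearest’ (CF_REPRESENTATIVE_NEAREST). Default is ‘medoid’.
reduce – Default is True.
topn – Default is None.
n_jobs – Number of jobs to run in parallel. Default is None.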
- Returns:
- create_sample_bb(X, y, bounding_box_points, X_importances=None, exclude_neighbourhood=False, use_parity=True, parity_strategy='global', inverse_sampling=False, class_names=None, representative='centroid', density_sampling=False, radius_sampling=False, radius=None, oversampling=False, categorical=None, n_jobs=None)
Create a sample for the LUX explainer to be fitted to, based on the provided data.
- Parameters:
X (array-like or sparse matrix of shape (n_samples, n_features)) – Input features.
y – Target values.
bounding_box_points (array-like of shape (n_points, n_dimensions)) – Points defining the bounding box.
X_importances – Importance matrix for features. Default is None.
exclude_neighbourhood – Whether to exclude neighborhood points. Default is False.
use_parity – Whether to use parity. Default is True.
parity_strategy – Strategy for parity. Default is ‘global’.
inverse_sampling – Whether to use inverse sampling. Default is False.
class_names – Names of classes. Default is None.
representative – Representative strategy. Default is ‘centroid’.
density_sampling – Whether to use density sampling. Default is False.
radius_sampling – Whether to use radius sampling. Default is False.
radius – Radius for radius sampling. Default is None.
oversampling – Whether to use oversampling. Default is False.
categorical – Categorical information. Default is None.
n_jobs – Number of jobs to run in parallel. Default is None.
- Returns:
X_train_sample: Sampled input features.
- Return type:
array-like or sparse matrix of shape (n_samples, n_features)
- Returns:
X_train_sample_importances: Sampled importance matrix for features.
- Return type:
pd.DataFrame or None
- fit(X, y, instance_to_explain, X_importances=None, exclude_neighbourhood=False, use_parity=True, parity_strategy='global', inverse_sampling=True, class_names=None, discount_importance=False, uncertain_entropy_evaluator=<lux.pyuid3.entropy_evaluator.UncertainEntropyEvaluator object>, beta=1, representative='centroid', density_sampling=False, radius_sampling=False, oversampling=True, categorical=None, prune=True, oblique=True, tree_with_shap=True, n_jobs=None)
Fit the LUX explainer model.
- Parameters:
X (pandas.DataFrame) – The input data used to train the model.
y (array-like) – The target values corresponding to the input data.
instance_to_explain (array-like or list) – The instance(s) to explain. Can be a single instance or a list/array of instances. The instances are not explained one after another; instead, the neighbourhood is created for the whole set of instances, so together they form a so-called bounding box for the explanation.
X_importances (array-like or pandas.DataFrame or None, optional) – The importances of features in the input data. If provided as a DataFrame, column names should match feature names.
exclude_neighbourhood (bool, optional) – Whether to exclude the neighborhood of the instance(s) being explained. Default is False.
use_parity (bool, optional) – Whether to use parity constraints in explanation generation. Default is True.
parity_strategy (str, optional) – The strategy for applying parity constraints. It can be ‘global’ or ‘local’. Default is ‘global’.
inverse_sampling (bool, optional) – Whether to use inverse sampling for feature selection. Default is True.
class_names (array-like or None, optional) – The names of the classes. If not provided, inferred from the target values.
discount_importance (bool, optional) – Whether to discount feature importance. Default is False.
uncertain_entropy_evaluator (object, optional) – The evaluator for uncertain entropy. Default is UncertainEntropyEvaluator().
beta (float, optional) – The beta parameter for the F-beta score used in uncertain entropy computation. Default is 1.
representative (str, optional) – The representative method for selecting representative instances. It can be ‘centroid’ or ‘prototype’. Default is ‘centroid’.
density_sampling (bool, optional) – Whether to use density-based sampling for instance selection. Default is False.
radius_sampling (bool, optional) – Whether to use radius-based sampling for instance selection. Default is False.
oversampling (bool, optional) – Whether to perform oversampling of instances. Default is True.
categorical (array-like or None, optional) – A list indicating whether each feature is categorical or not.
prune (bool, optional) – Whether to prune branches of the decision tree whose splits do not change the classification result. Default is True.
oblique (bool, optional) – Whether to use oblique decision rules. Default is True.
tree_with_shap (bool, optional) – Whether to build the decision tree using SHAP-guided splits. Default is True.
n_jobs (int or None, optional) – The number of parallel jobs to run. Default is None.
- Returns:
The trained LUX explainer model.
- Return type:
LUX
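The beta parameter of fit refers to the F-beta score. As a reminder of what beta controls, the sketch below computes the standard F-beta score from precision and recall. It shows the textbook formula only; how LUX uses the score inside the uncertain entropy computation is not covered here, and f_beta is a hypothetical helper name.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score.

    beta > 1 weights recall more heavily, beta < 1 weights precision
    more heavily, and beta = 1 gives the harmonic mean (F1).
    """
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(f_beta(0.5, 1.0))          # F1 for precision 0.5, recall 1.0
print(f_beta(0.5, 1.0, beta=2))  # same inputs, recall weighted more
```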
- fit_bounding_boxes(X, y, bounding_box_points, X_importances=None, exclude_neighbourhood=False, use_parity=True, parity_strategy='global', inverse_sampling=False, class_names=None, discount_importance=False, uncertain_entropy_evaluator=<lux.pyuid3.entropy_evaluator.UncertainEntropyEvaluator object>, beta=1, representative='centroid', density_sampling=False, radius_sampling=False, oversampling=False, categorical=None, prune=False, oblique=False, tree_with_shap=True, n_jobs=None)
Fit LUX explainer model for the neighbourhood data defined by the bounding box constructed of several points. Usually only one point is provided.
- Parameters:
X (array-like or sparse matrix of shape (n_samples, n_features)) – Input features.
y – Target values.
bounding_box_points (array-like of shape (n_points, n_dimensions)) – Points defining the bounding box.
X_importances – Importance matrix for features. Default is None.
exclude_neighbourhood – Whether to exclude neighborhood points. Default is False.
use_parity – Whether to use parity. Default is True.
parity_strategy – Strategy for parity. Default is ‘global’.
inverse_sampling – Whether to use inverse sampling. Default is False.
class_names – Names of classes. Default is None.
discount_importance – Whether to discount importance. Default is False.
uncertain_entropy_evaluator – Evaluator for uncertain entropy. Default is UncertainEntropyEvaluator().
beta – Beta value for fitting. Default is 1.
representative – Representative strategy. Default is ‘centroid’.
density_sampling – Whether to use density sampling. Default is False.
radius_sampling – Whether to use radius sampling. Default is False.
oversampling – Whether to use oversampling. Default is False.
categorical – Categorical information. Default is None.
prune – Whether to prune. Default is False.
oblique – Whether to use oblique splits. Default is False.
tree_with_shap – Whether to build the decision tree using SHAP-guided splits. Default is True.
n_jobs – Number of jobs to run in parallel. Default is None.
- Raises:
ValueError – If the length of class_names does not match the number of classes in y, or if bounding_box_points is not 2D.
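The two documented error conditions can be checked before calling fit_bounding_boxes. The sketch below is not LUX's own validation code; check_bounding_box_inputs is a hypothetical helper that mirrors the documented conditions using plain Python lists.

```python
def check_bounding_box_inputs(y, bounding_box_points, class_names=None):
    """Raise ValueError for the two conditions documented for fit_bounding_boxes."""
    # 2D means: a non-empty sequence of points, each a non-empty
    # sequence of numbers, i.e. shape (n_points, n_dimensions).
    is_2d = len(bounding_box_points) > 0 and all(
        isinstance(point, (list, tuple))
        and len(point) > 0
        and all(isinstance(value, (int, float)) for value in point)
        for point in bounding_box_points
    )
    if not is_2d:
        raise ValueError("bounding_box_points must be 2D: (n_points, n_dimensions)")

    n_classes = len(set(y))
    if class_names is not None and len(class_names) != n_classes:
        raise ValueError(
            f"class_names has {len(class_names)} entries but y has {n_classes} classes"
        )


# A single point still has to be wrapped in an outer list to be 2D.
check_bounding_box_inputs([0, 1, 1], [[5.1, 3.5, 1.4, 0.2]], class_names=["a", "b"])
```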
- get_metadata_routing()
Get metadata routing of this object.
Please check User Guide on how the routing mechanism works.
Returns
- routing : MetadataRequest
A MetadataRequest encapsulating routing information.
- get_params(deep=True)
Get parameters for this estimator.
Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns
- params : dict
Parameter names mapped to their values.
- justify(X, to_dict=False, reduce=True)
Traverse down the path for given x.
- Parameters:
X –
to_dict –
reduce –
- Returns:
- predict(X, y=None)
Predicts the outcome with an explainable model previously fitted
- Parameters:
X –
y –
- Returns:
- process_and_predict_proba(X)
Process the input data and predict the probabilities.
- Parameters:
X – data that will be passed to predict_proba function after preprocessing.
- Returns:
probabilities, in the same format as predict_proba
- process_input(X)
The main goal is to change the type of categorical values, so they fit algorithms that require categories as integers.
- Parameters:
X – data that will be passed to predict_proba function after preprocessing.
- Returns:
preprocessed data with the same dimensions as the input one, but with categorical columns changed to integers
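To illustrate the kind of transformation process_input describes, the sketch below replaces the values of categorical columns with integer codes while keeping the shape of the data. It is a plain-Python illustration under the assumption that categorical columns are flagged by a list of booleans, as in the categorical argument of fit; encode_categoricals is a hypothetical helper and LUX's actual encoding may differ.

```python
def encode_categoricals(rows, categorical):
    """Replace values in categorical columns with integer codes.

    rows: list of equally long lists; categorical: one bool per column.
    Codes follow the sorted order of the distinct values in each column.
    """
    encoded = [list(row) for row in rows]  # copy, so the input is untouched
    for col, is_categorical in enumerate(categorical):
        if not is_categorical:
            continue
        distinct = sorted({row[col] for row in rows})
        codes = {value: code for code, value in enumerate(distinct)}
        for row in encoded:
            row[col] = codes[row[col]]
    return encoded


data = [[5.1, "red"], [4.9, "blue"], [6.0, "red"]]
print(encode_categoricals(data, categorical=[False, True]))
# [[5.1, 1], [4.9, 0], [6.0, 1]]
```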
- set_fit_request(*, X_importances: bool | None | str = '$UNCHANGED$', beta: bool | None | str = '$UNCHANGED$', categorical: bool | None | str = '$UNCHANGED$', class_names: bool | None | str = '$UNCHANGED$', density_sampling: bool | None | str = '$UNCHANGED$', discount_importance: bool | None | str = '$UNCHANGED$', exclude_neighbourhood: bool | None | str = '$UNCHANGED$', instance_to_explain: bool | None | str = '$UNCHANGED$', inverse_sampling: bool | None | str = '$UNCHANGED$', n_jobs: bool | None | str = '$UNCHANGED$', oblique: bool | None | str = '$UNCHANGED$', oversampling: bool | None | str = '$UNCHANGED$', parity_strategy: bool | None | str = '$UNCHANGED$', prune: bool | None | str = '$UNCHANGED$', radius_sampling: bool | None | str = '$UNCHANGED$', representative: bool | None | str = '$UNCHANGED$', tree_with_shap: bool | None | str = '$UNCHANGED$', uncertain_entropy_evaluator: bool | None | str = '$UNCHANGED$', use_parity: bool | None | str = '$UNCHANGED$') LUX
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Parameters
- X_importances : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for X_importances parameter in fit.
- beta : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for beta parameter in fit.
- categorical : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for categorical parameter in fit.
- class_names : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for class_names parameter in fit.
- density_sampling : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for density_sampling parameter in fit.
- discount_importance : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for discount_importance parameter in fit.
- exclude_neighbourhood : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for exclude_neighbourhood parameter in fit.
- instance_to_explain : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for instance_to_explain parameter in fit.
- inverse_sampling : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for inverse_sampling parameter in fit.
- n_jobs : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for n_jobs parameter in fit.
- oblique : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for oblique parameter in fit.
- oversampling : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for oversampling parameter in fit.
- parity_strategy : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for parity_strategy parameter in fit.
- prune : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for prune parameter in fit.
- radius_sampling : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for radius_sampling parameter in fit.
- representative : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for representative parameter in fit.
- tree_with_shap : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for tree_with_shap parameter in fit.
- uncertain_entropy_evaluator : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for uncertain_entropy_evaluator parameter in fit.
- use_parity : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for use_parity parameter in fit.
Returns
- self : object
The updated object.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Parameters
- **params : dict
Estimator parameters.
Returns
- self : estimator instance
Estimator instance.
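The <component>__<parameter> naming convention can be illustrated with a small string-splitting sketch. It only demonstrates how such a name decomposes at the first double underscore; split_nested_param is a hypothetical helper, and the component names clf and pipe are made up for the example.

```python
def split_nested_param(name):
    """Split 'component__parameter' into (component, parameter).

    A plain name without '__' is returned as (None, name).
    Only the first '__' is split; deeper nesting stays in the remainder.
    """
    component, separator, parameter = name.partition("__")
    if not separator:
        return None, name
    return component, parameter


print(split_nested_param("max_depth"))       # (None, 'max_depth')
print(split_nested_param("clf__max_depth"))  # ('clf', 'max_depth')
print(split_nested_param("pipe__clf__C"))    # ('pipe', 'clf__C')
```

For a standalone LUX explainer the plain form applies, so set_params(max_depth=3) addresses the max_depth constructor parameter directly.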
- to_HMR()
Exports to HMR format that can be executed by the HeaRTDroid rule-engine
- Returns: