lux.lux.LUX

class lux.lux.LUX(predict_proba, classifier=None, neighborhood_size=0.1, max_depth=None, node_size_limit=1, grow_confidence_threshold=0, min_impurity_decrease=0, min_samples=5, min_generate_samples=0.02, uncertainty_sigma=2, oversampling_strategy='both')

This class implements the generation of local, rule-based, model-agnostic explanations.

Initialize the LUX explainer model.

Parameters:
  • predict_proba (callable) – The predict_proba function of the black-box classifier.

  • classifier (object, optional) – The underlying classifier. If it is provided, SHAP-based sampling can be used.

  • neighborhood_size (float, optional) – The neighborhood size for generating explanations. Default is 0.1.

  • max_depth (int, optional) – The maximum depth of the decision tree. Default is None, meaning no limit.

  • node_size_limit (int, optional) – The minimum number of samples required to split an internal node. Default is 1.

  • grow_confidence_threshold (float, optional) – The threshold for growing decision tree nodes. Default is 0.

  • min_impurity_decrease (float, optional) – A node will be split if the split induces a decrease of the impurity greater than or equal to this value. Default is 0.

  • min_samples (int, optional) – The minimum number of samples required to be at a leaf node. Default is 5.

  • min_generate_samples (float, optional) – The minimum proportion of the dataset size to generate perturbed instances. This is used by the UncertainSMOTE algorithm. Default is 0.02.

  • uncertainty_sigma (float, optional) – The uncertainty parameter sigma used when filtering uncertain samples. Every sample that is 2*uncertainty_sigma away from the mean will be removed. Default is 2.

  • oversampling_strategy (str, optional) – The strategy for oversampling. It can be ‘smote’, ‘importance’, or ‘both’. Default is ‘both’.
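The uncertainty_sigma description can be read as a sigma-based outlier filter. The following is a minimal sketch of that general idea, not the library's implementation: the helper name filter_by_sigma, the per-sample uncertainty scores, and the exact threshold (k population standard deviations from the mean) are assumptions made for illustration.

```python
from statistics import mean, pstdev


def filter_by_sigma(scores, k=2.0):
    """Keep scores lying within k population standard deviations of the mean.

    Hypothetical helper: it only illustrates the general idea of a
    sigma-based filter and is not taken from the LUX source code.
    """
    if not scores:
        return []
    centre = mean(scores)
    spread = pstdev(scores)
    # With zero spread every score equals the mean, so nothing is removed.
    return [s for s in scores if abs(s - centre) <= k * spread]


uncertainty = [0.10, 0.12, 0.11, 0.13, 0.90]
kept = filter_by_sigma(uncertainty, k=1.0)
print(kept)
```

A smaller k removes more samples; how LUX maps uncertainty_sigma onto such a threshold is not specified on this page.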


Methods

__init__(predict_proba[, classifier, ...])

Initialize the LUX explainer model.

counterfactual(instance_to_explain, background)

Generates a counterfactual for a given instance and background data

create_sample_bb(X, y, bounding_box_points)

Create a sample for the LUX explainer to be fitted to, based on the provided data.

fit(X, y, instance_to_explain[, ...])

Fit the LUX explainer model.

fit_bounding_boxes(X, y, bounding_box_points)

Fit LUX explainer model for the neighbourhood data defined by the bounding box constructed of several points.

get_metadata_routing()

Get metadata routing of this object.

get_params([deep])

Get parameters for this estimator.

justify(X[, to_dict, reduce])

Traverse down the path for given x.

predict(X[, y])

Predicts the outcome with an explainable model previously fitted

process_and_predict_proba(X)

Process the input data and predict the probabilities.

process_input(X)

Changes the type of categorical values so that they fit algorithms that require categories as integers.

set_fit_request(*[, X_importances, beta, ...])

Configure whether metadata should be requested to be passed to the fit method.

set_params(**params)

Set the parameters of this estimator.

to_HMR()

Exports to HMR format that can be executed by the HeaRTDroid rule-engine

to_dot([filename, fmt])

visualize(data[, target_column_name, fmt, ...])

Attributes

CF_REPRESENTATIVE_MEDOID

CF_REPRESENTATIVE_MEDOID (str): A constant representing medoid as the counterfactual representative strategy.

CF_REPRESENTATIVE_NEAREST

CF_REPRESENTATIVE_NEAREST (str): A constant representing nearest as the counterfactual representative strategy.

OS_STRATEGY_BOTH

OS_STRATEGY_BOTH (str): A constant representing both SMOTE and importance sampling as the oversampling strategy.

OS_STRATEGY_IMPORTANCE

OS_STRATEGY_IMPORTANCE (str): A constant representing importance sampling as the oversampling strategy.

OS_STRATEGY_SMOTE

OS_STRATEGY_SMOTE (str): A constant representing SMOTE as the oversampling strategy.

REPRESENTATIVE_CENTROID

REPRESENTATIVE_CENTROID (str): A constant representing centroid as the representative strategy.

REPRESENTATIVE_NEAREST

REPRESENTATIVE_NEAREST (str): A constant representing nearest as the representative strategy.

CF_REPRESENTATIVE_MEDOID = 'medoid'

CF_REPRESENTATIVE_MEDOID (str): A constant representing medoid as the counterfactual representative strategy.

CF_REPRESENTATIVE_NEAREST = 'nearest'

CF_REPRESENTATIVE_NEAREST (str): A constant representing nearest as the counterfactual representative strategy.

OS_STRATEGY_BOTH = 'both'

OS_STRATEGY_BOTH (str): A constant representing both SMOTE and importance sampling as the oversampling strategy.

OS_STRATEGY_IMPORTANCE = 'importance'

OS_STRATEGY_IMPORTANCE (str): A constant representing importance sampling as the oversampling strategy.

OS_STRATEGY_SMOTE = 'smote'

OS_STRATEGY_SMOTE (str): A constant representing SMOTE as the oversampling strategy.

REPRESENTATIVE_CENTROID = 'centroid'

REPRESENTATIVE_CENTROID (str): A constant representing centroid as the representative strategy.

REPRESENTATIVE_NEAREST = 'nearest'

REPRESENTATIVE_NEAREST (str): A constant representing nearest as the representative strategy.
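The representative constants refer to three standard notions: centroid, medoid, and nearest point. The sketch below illustrates the textbook definitions in plain Python; it is not the library's code, and the function names and the Euclidean distance are assumptions chosen for the illustration.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean; need not coincide with any actual data point."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def medoid(points):
    """Data point with the smallest total distance to all other points."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


def nearest(points, reference):
    """Data point closest to a given reference point."""
    return min(points, key=lambda p: dist(p, reference))


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(centroid(points))             # mean of the coordinates
print(medoid(points))               # always a member of `points`
print(nearest(points, (9.0, 0.0)))  # member of `points` closest to the reference
```

The practical difference is that a medoid or nearest point is always a real sample, while a centroid may be a point that does not occur in the data.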

counterfactual(instance_to_explain, background, counterfactual_representative='medoid', reduce=True, topn=None, n_jobs=None)

Generates a counterfactual for a given instance and background data

Parameters:
  • instance_to_explain

  • background

  • counterfactual_representative

  • reduce

  • topn

  • n_jobs

Returns:

create_sample_bb(X, y, bounding_box_points, X_importances=None, exclude_neighbourhood=False, use_parity=True, parity_strategy='global', inverse_sampling=False, class_names=None, representative='centroid', density_sampling=False, radius_sampling=False, radius=None, oversampling=False, categorical=None, n_jobs=None)

Create a sample for the LUX explainer to be fitted to, based on the provided data.

Parameters:
  • X (array-like or sparse matrix of shape (n_samples, n_features)) – Input features.

  • y – Target values.

  • bounding_box_points (array-like of shape (n_points, n_dimensions)) – Points defining the bounding box.

  • X_importances – Importance matrix for features. Default is None.

  • exclude_neighbourhood – Whether to exclude neighborhood points. Default is False.

  • use_parity – Whether to use parity. Default is True.

  • parity_strategy – Strategy for parity. Default is ‘global’.

  • inverse_sampling – Whether to use inverse sampling. Default is False.

  • class_names – Names of classes. Default is None.

  • representative – Representative strategy. Default is ‘centroid’.

  • density_sampling – Whether to use density sampling. Default is False.

  • radius_sampling – Whether to use radius sampling. Default is False.

  • radius – Radius for radius sampling. Default is None.

  • oversampling – Whether to use oversampling. Default is False.

  • categorical – Categorical information. Default is None.

  • n_jobs – Number of jobs to run in parallel. Default is None.

Returns:
  • X_train_sample (array-like or sparse matrix of shape (n_samples, n_features)) – Sampled input features.

  • X_train_sample_importances (pd.DataFrame or None) – Sampled importance matrix for features.

fit(X, y, instance_to_explain, X_importances=None, exclude_neighbourhood=False, use_parity=True, parity_strategy='global', inverse_sampling=True, class_names=None, discount_importance=False, uncertain_entropy_evaluator=<lux.pyuid3.entropy_evaluator.UncertainEntropyEvaluator object>, beta=1, representative='centroid', density_sampling=False, radius_sampling=False, oversampling=True, categorical=None, prune=True, oblique=True, tree_with_shap=True, n_jobs=None)

Fit the LUX explainer model.

Parameters:
  • X (pandas.DataFrame) – The input data used to train the model.

  • y (array-like) – The target values corresponding to the input data.

  • instance_to_explain (array-like or list) – The instance(s) to explain. Can be a single instance or a list/array of instances. The instances are not explained one after another; instead, a single neighbourhood is created for the whole set of instances, which together form a so-called bounding box for the explanation.

  • X_importances (array-like or pandas.DataFrame or None, optional) – The importances of features in the input data. If provided as a DataFrame, column names should match feature names.

  • exclude_neighbourhood (bool, optional) – Whether to exclude the neighborhood of the instance(s) being explained. Default is False.

  • use_parity (bool, optional) – Whether to use parity constraints in explanation generation. Default is True.

  • parity_strategy (str, optional) – The strategy for applying parity constraints. It can be ‘global’ or ‘local’. Default is ‘global’.

  • inverse_sampling (bool, optional) – Whether to use inverse sampling for feature selection. Default is True.

  • class_names (array-like or None, optional) – The names of the classes. If not provided, inferred from the target values.

  • discount_importance (bool, optional) – Whether to discount feature importance. Default is False.

  • uncertain_entropy_evaluator (object, optional) – The evaluator for uncertain entropy. Default is UncertainEntropyEvaluator().

  • beta (float, optional) – The beta parameter for the F-beta score used in uncertain entropy computation. Default is 1.

  • representative (str, optional) – The method for selecting representative instances. It can be ‘centroid’ or ‘nearest’ (see REPRESENTATIVE_CENTROID and REPRESENTATIVE_NEAREST). Default is ‘centroid’.

  • density_sampling (bool, optional) – Whether to use density-based sampling for instance selection. Default is False.

  • radius_sampling (bool, optional) – Whether to use radius-based sampling for instance selection. Default is False.

  • oversampling (bool, optional) – Whether to perform oversampling of instances. Default is True.

  • categorical (array-like or None, optional) – A list indicating whether each feature is categorical or not.

  • prune (bool, optional) – Whether to prune branches of the decision tree whose splits do not change the classification result. Default is True.

  • oblique (bool, optional) – Whether to use oblique decision rules. Default is True.

  • tree_with_shap – Whether to build the decision tree using SHAP-guided splits. Default is True.

  • n_jobs (int or None, optional) – The number of parallel jobs to run. Default is None.

Returns:

The trained LUX explainer model.

Return type:

lux.lux.LUX
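The constructor, fit, and justify signatures documented on this page can be combined roughly as follows. This is a hedged sketch rather than an official example: the iris dataset, the SVC classifier, the column names, and the argument values are illustrative choices, and passing an integer count as neighborhood_size is an assumption that this page does not confirm. It requires numpy, pandas, scikit-learn, and the lux package.

```python
def explain_iris_instance():
    """Train a black-box classifier on iris and explain one instance with LUX."""
    # Imports are local so that defining the function does not require the
    # third-party packages to be installed.
    import numpy as np
    import pandas as pd
    from sklearn import datasets, svm
    from sklearn.model_selection import train_test_split
    from lux.lux import LUX

    features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    target = "class"

    iris = datasets.load_iris()
    df = pd.DataFrame(iris.data, columns=features)
    df[target] = iris.target

    train, _test = train_test_split(df, random_state=0)
    clf = svm.SVC(probability=True)
    clf.fit(train[features], train[target])

    # A single instance, shaped (1, n_features).
    iris_instance = train[features].sample(1, random_state=0).values

    # Constructor and fit arguments are the ones documented on this page;
    # the concrete values are arbitrary choices for the illustration.
    # Assumption: neighborhood_size also accepts an integer sample count.
    lux = LUX(predict_proba=clf.predict_proba, neighborhood_size=20, max_depth=2)
    lux.fit(train[features], train[target],
            instance_to_explain=iris_instance, class_names=[0, 1, 2])

    return lux.justify(np.array(iris_instance))
```

Calling explain_iris_instance() returns whatever justify produces for the chosen instance; its exact format is not described on this page.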

fit_bounding_boxes(X, y, bounding_box_points, X_importances=None, exclude_neighbourhood=False, use_parity=True, parity_strategy='global', inverse_sampling=False, class_names=None, discount_importance=False, uncertain_entropy_evaluator=<lux.pyuid3.entropy_evaluator.UncertainEntropyEvaluator object>, beta=1, representative='centroid', density_sampling=False, radius_sampling=False, oversampling=False, categorical=None, prune=False, oblique=False, tree_with_shap=True, n_jobs=None)

Fit LUX explainer model for the neighbourhood data defined by the bounding box constructed of several points. Usually only one point is provided.

Parameters:
  • X (array-like or sparse matrix of shape (n_samples, n_features)) – Input features.

  • y – Target values.

  • bounding_box_points (array-like of shape (n_points, n_dimensions)) – Points defining the bounding box.

  • X_importances – Importance matrix for features. Default is None.

  • exclude_neighbourhood – Whether to exclude neighborhood points. Default is False.

  • use_parity – Whether to use parity. Default is True.

  • parity_strategy – Strategy for parity. Default is ‘global’.

  • inverse_sampling – Whether to use inverse sampling. Default is False.

  • class_names – Names of classes. Default is None.

  • discount_importance – Whether to discount importance. Default is False.

  • uncertain_entropy_evaluator – Evaluator for uncertain entropy. Default is UncertainEntropyEvaluator().

  • beta – Beta value for fitting. Default is 1.

  • representative – Representative strategy. Default is ‘centroid’.

  • density_sampling – Whether to use density sampling. Default is False.

  • radius_sampling – Whether to use radius sampling. Default is False.

  • oversampling – Whether to use oversampling. Default is False.

  • categorical – Categorical information. Default is None.

  • prune – Whether to prune. Default is False.

  • oblique – Whether to use oblique splits. Default is False.

  • tree_with_shap – Whether to build the decision tree using SHAP-guided splits. Default is True.

  • n_jobs – Number of jobs to run in parallel. Default is None.

Raises:

ValueError – If the length of class_names does not match the number of classes in y, or if bounding_box_points is not 2D.
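One natural reading of a bounding box constructed of several points is an axis-aligned box spanned by the per-feature minima and maxima. The sketch below shows that reading in plain Python; the helper names are hypothetical, and whether LUX builds its neighbourhood in exactly this way is not stated on this page.

```python
def bounding_box(points):
    """Per-feature (min, max) over a 2D collection of points."""
    if not points or any(len(p) != len(points[0]) for p in points):
        raise ValueError("points must be a non-empty 2D collection")
    # zip(*points) iterates over features (columns) instead of samples (rows).
    return [(min(col), max(col)) for col in zip(*points)]


def inside(box, point):
    """True if every coordinate of `point` lies within the box limits."""
    return all(lo <= x <= hi for (lo, hi), x in zip(box, point))


instances = [(5.1, 3.5), (4.9, 3.0), (6.2, 2.8)]
box = bounding_box(instances)
print(box)
print(inside(box, (5.0, 3.0)), inside(box, (7.0, 3.0)))
```

With a single point, which the description says is the usual case, the box degenerates to that point.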

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest) – A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:
  • deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

justify(X, to_dict=False, reduce=True)

Traverse down the path for the given X.

Parameters:
  • X

  • to_dict

  • reduce

Returns:

predict(X, y=None)

Predicts the outcome with an explainable model previously fitted

Parameters:
  • X

  • y

Returns:

process_and_predict_proba(X)

Process the input data and predict the probabilities.

Parameters:

X – data that will be passed to predict_proba function after preprocessing.

Returns:

probabilities, in the same format as predict_proba

process_input(X)

Changes the type of categorical values so that they fit algorithms that require categories as integers.

Parameters:

X – data that will be passed to predict_proba function after preprocessing.

Returns:

preprocessed data with the same dimensions as the input, but with categorical columns converted to integers
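The idea of converting categorical columns to integers while keeping the input dimensions can be sketched as follows. This is an illustration in plain Python, not the library's implementation: the helper name encode_categorical and the rule that codes follow the sorted order of distinct values are assumptions.

```python
def encode_categorical(rows, categorical):
    """Replace values in categorical columns with integer codes.

    rows        -- list of equal-length lists (one per sample)
    categorical -- list of booleans, True where a column is categorical
    """
    encoded = [list(row) for row in rows]  # copy, same dimensions as the input
    for col, is_cat in enumerate(categorical):
        if not is_cat:
            continue
        # Codes follow the sorted order of the distinct values in the column.
        codes = {v: i for i, v in enumerate(sorted({row[col] for row in rows}))}
        for row in encoded:
            row[col] = codes[row[col]]
    return encoded


rows = [[5.1, "red"], [4.9, "blue"], [6.2, "red"]]
print(encode_categorical(rows, [False, True]))
```

Numeric columns pass through unchanged, so the result can be handed to a function that expects purely numeric input.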

set_fit_request(*, X_importances: bool | None | str = '$UNCHANGED$', beta: bool | None | str = '$UNCHANGED$', categorical: bool | None | str = '$UNCHANGED$', class_names: bool | None | str = '$UNCHANGED$', density_sampling: bool | None | str = '$UNCHANGED$', discount_importance: bool | None | str = '$UNCHANGED$', exclude_neighbourhood: bool | None | str = '$UNCHANGED$', instance_to_explain: bool | None | str = '$UNCHANGED$', inverse_sampling: bool | None | str = '$UNCHANGED$', n_jobs: bool | None | str = '$UNCHANGED$', oblique: bool | None | str = '$UNCHANGED$', oversampling: bool | None | str = '$UNCHANGED$', parity_strategy: bool | None | str = '$UNCHANGED$', prune: bool | None | str = '$UNCHANGED$', radius_sampling: bool | None | str = '$UNCHANGED$', representative: bool | None | str = '$UNCHANGED$', tree_with_shap: bool | None | str = '$UNCHANGED$', uncertain_entropy_evaluator: bool | None | str = '$UNCHANGED$', use_parity: bool | None | str = '$UNCHANGED$') → LUX

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
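The four request options can be modelled with a small toy function. This is only a sketch of the behaviour described in the list above, not scikit-learn's routing implementation; the function name resolve_request and the use of a plain ValueError are assumptions.

```python
def resolve_request(name, request, provided):
    """Return the keyword arguments forwarded to fit for one metadata item.

    name     -- parameter name expected by fit, e.g. "beta"
    request  -- True, False, None, or a string alias
    provided -- dict of metadata the user passed to the meta-estimator
    """
    if request is True:
        # Requested: forwarded when present, silently skipped otherwise.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested, and supplying it is treated as an error.
        if name in provided:
            raise ValueError(f"{name!r} was provided but not requested")
        return {}
    if isinstance(request, str):
        # Alias: the user supplies the value under `request`,
        # and fit receives it under `name`.
        return {name: provided[request]} if request in provided else {}
    raise TypeError("request must be True, False, None, or a str")


print(resolve_request("beta", True, {"beta": 2}))
print(resolve_request("beta", "my_beta", {"my_beta": 0.5}))
```

The alias case is the one that differs most from the others: the caller's keyword and the keyword received by fit are no longer the same.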

New in version 1.3.

Parameters:
  • X_importances (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for X_importances parameter in fit.

  • beta (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for beta parameter in fit.

  • categorical (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for categorical parameter in fit.

  • class_names (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for class_names parameter in fit.

  • density_sampling (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for density_sampling parameter in fit.

  • discount_importance (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for discount_importance parameter in fit.

  • exclude_neighbourhood (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for exclude_neighbourhood parameter in fit.

  • instance_to_explain (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for instance_to_explain parameter in fit.

  • inverse_sampling (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for inverse_sampling parameter in fit.

  • n_jobs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for n_jobs parameter in fit.

  • oblique (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for oblique parameter in fit.

  • oversampling (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for oversampling parameter in fit.

  • parity_strategy (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for parity_strategy parameter in fit.

  • prune (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for prune parameter in fit.

  • radius_sampling (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for radius_sampling parameter in fit.

  • representative (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for representative parameter in fit.

  • tree_with_shap (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for tree_with_shap parameter in fit.

  • uncertain_entropy_evaluator (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for uncertain_entropy_evaluator parameter in fit.

  • use_parity (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for use_parity parameter in fit.

Returns:

self (object) – The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:
  • **params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.
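The <component>__<parameter> naming scheme described for set_params can be illustrated with a toy model that uses nested dictionaries in place of estimators. This is a sketch of the naming convention only; split_param and set_nested are hypothetical helpers, not scikit-learn functions.

```python
def split_param(name):
    """Split 'component__parameter' at the first double underscore."""
    component, sep, rest = name.partition("__")
    return (component, rest) if sep else (None, name)


def set_nested(config, **params):
    """Apply params to a nested dict, descending one level per '__'."""
    for name, value in params.items():
        component, rest = split_param(name)
        if component is None:
            # Plain name: set it on the current level.
            config[name] = value
        else:
            # Prefixed name: hand the remainder to the named component.
            set_nested(config[component], **{rest: value})
    return config


config = {"max_depth": None, "clf": {"C": 1.0, "kernel": "rbf"}}
set_nested(config, max_depth=2, clf__C=10.0)
print(config)
```

Splitting at the first double underscore means deeper nesting works the same way: the remainder of the name is passed on to the component, which splits it again.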

to_HMR()

Exports to HMR format that can be executed by the HeaRTDroid rule-engine

Returns: