lux.pyuid3.data.Data

class lux.pyuid3.data.Data(name: str = None, attributes: List[Dict] = None, instances: List[Dict] = None)

Initialize a Data object.

Parameters:

name (str, optional): The name of the dataset.

attributes (List[Dict], optional): List of attribute Dicts defining the dataset’s attributes.

instances (List[Dict], optional): List of instance Dicts containing the dataset’s instances.

Methods

__init__([name, attributes, instances])

Initialize a Data object.

calculate_statistics(att)

Calculate statistics for a specific attribute in the dataset.

filter_nominal_attribute_value(at, value[, copy])

Filter the dataset based on the given nominal attribute value.

filter_numeric_attribute_value(at, value[, copy])

Filter the dataset based on the given numeric attribute value.

filter_numeric_attribute_value_expr(at, expr[, copy])

Filter the dataset based on the given expression involving a numeric attribute value.

get_attribute_of_name(att_name)

Get the attribute Dict corresponding to the given attribute name.

get_attributes()

get_class_attribute()

get_instances()

get_name()

parse_dataframe(df[, df_imps, name, categorical])

Parse a pandas DataFrame into a Data object.

parse_ucsv(filename)

Parse a DataFrame read from a CSV file into a Data object.

reduce_importance_for_attribute(att, discount_factor[, for_class])

Reduce the importance of a specific attribute by a given discount factor.

set_importances(importances, expected_values)

Set importances for each attribute based on the provided DataFrame of importances and expected values.

to_dataframe([most_probable])

Convert the dataset to a pandas DataFrame.

to_dataframe_importances([average_absolute])

Convert the dataset's importances to a pandas DataFrame.

update_attribute_domains()

Set attribute domains for numerical values.

calculate_statistics(att: Dict) -> AttStats

Calculate statistics for a specific attribute in the dataset.

Parameters:

att (Dict): The attribute Dict for which statistics are to be calculated.

Returns:

AttStats: An object containing statistics for the specified attribute.

filter_nominal_attribute_value(at: Dict, value: str, copy: bool = False) -> Data

Filter the dataset based on the given nominal attribute value.

Parameters:

at (Dict): The attribute Dict to filter on.

value (str): The value to filter on.

copy (bool, optional): Whether to create a copy of the filtered dataset. Defaults to False.

Returns:

Data: The filtered dataset.
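As a rough illustration of the filtering idea and of the `copy` flag, the sketch below works on plain Python dicts rather than `Data` objects. The row layout, the helper name `filter_nominal`, and the reading that matching instances are the ones kept are assumptions made for illustration; this is not the library's implementation.

```python
import copy as copy_module
from typing import Any, Dict, List

Row = Dict[str, Any]


def filter_nominal(rows: List[Row], att_name: str, value: str,
                   copy: bool = False) -> List[Row]:
    """Hypothetical stand-in: keep rows whose `att_name` equals `value`."""
    kept = [row for row in rows if row.get(att_name) == value]
    # With copy=True the result holds independent row objects, so later
    # mutations of the result do not leak back into the source rows.
    return copy_module.deepcopy(kept) if copy else kept


rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
]

shared = filter_nominal(rows, "outlook", "sunny")
independent = filter_nominal(rows, "outlook", "sunny", copy=True)
```

In this sketch `shared` reuses the original row objects, while `independent` can be modified freely without touching `rows`.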

filter_numeric_attribute_value(at: Dict, value: str, copy: bool = False) -> Tuple[Data, Data]

Filter the dataset based on the given numeric attribute value.

Parameters:

at (Dict): The attribute Dict to filter on.

value (str): The value to compare against.

copy (bool, optional): Whether to create a copy of the filtered dataset. Defaults to False.

Returns:

Tuple[Data, Data]: A tuple containing two filtered datasets:

- The first dataset contains instances where the attribute value is less than the given value.
- The second dataset contains instances where the attribute value is greater than or equal to the given value.
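The two-way split described above can be sketched with plain Python lists. The helper name `split_numeric` and the dict-based row layout are hypothetical stand-ins for `Data`. The documented `value` parameter is typed as `str`, whereas this sketch takes a float threshold for simplicity.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_numeric(rows: List[Row], att_name: str,
                  threshold: float) -> Tuple[List[Row], List[Row]]:
    """Hypothetical stand-in: split rows into (< threshold, >= threshold)."""
    below = [row for row in rows if float(row[att_name]) < threshold]
    at_or_above = [row for row in rows if float(row[att_name]) >= threshold]
    return below, at_or_above


rows = [{"temp": 18.0}, {"temp": 21.5}, {"temp": 25.0}]
below, at_or_above = split_numeric(rows, "temp", 21.5)
```

Note that a row whose value equals the threshold lands in the second group, matching the "greater than or equal to" wording of the return description.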

filter_numeric_attribute_value_expr(at: Dict, expr: str, copy: bool = False) -> Tuple[Data, Data]

Filter the dataset based on the given expression involving a numeric attribute value.

Parameters:

at (Dict): The attribute Dict to filter on.

expr (str): The expression to evaluate. It can involve comparisons and arithmetic operations with the attribute value.

copy (bool, optional): Whether to create a copy of the filtered dataset. Defaults to False.

Returns:

Tuple[Data, Data]: A tuple containing two filtered datasets:

- The first dataset contains instances where the attribute value satisfies the expression.
- The second dataset contains instances where the attribute value does not satisfy the expression.
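The satisfied / not-satisfied partition can be sketched as follows. Because the expression syntax accepted by `expr` is not specified here, the sketch substitutes a Python callable for the expression string. The helper name `partition_by` and the dict-based rows are likewise assumptions for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def partition_by(rows: List[Row], att_name: str,
                 predicate: Callable[[float], bool]) -> Tuple[List[Row], List[Row]]:
    """Hypothetical stand-in: split rows by whether the predicate holds."""
    satisfied: List[Row] = []
    not_satisfied: List[Row] = []
    for row in rows:
        target = satisfied if predicate(row[att_name]) else not_satisfied
        target.append(row)
    return satisfied, not_satisfied


rows = [{"x": 1.0}, {"x": 4.0}, {"x": 5.0}]
# A condition mixing arithmetic and a comparison on the attribute value.
inside, outside = partition_by(rows, "x", lambda v: 2 * v + 1 < 10)
```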

get_attribute_of_name(att_name: str) -> Dict

Get the attribute Dict corresponding to the given attribute name.

Parameters:

att_name (str): The name of the attribute to retrieve.

Returns:

Dict: The attribute Dict corresponding to the given attribute name, or None if the attribute name is not found in the dataset.
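Since the lookup returns None for an unknown name, callers generally need to check the result before using it. The sketch below shows that pattern on a plain list of dicts. The `"name"` key and the helper `find_attribute` are hypothetical, as the internal layout of an attribute Dict is not described here.

```python
from typing import Any, Dict, List, Optional

Attribute = Dict[str, Any]


def find_attribute(attributes: List[Attribute],
                   att_name: str) -> Optional[Attribute]:
    """Hypothetical stand-in: first attribute whose 'name' matches, else None."""
    for att in attributes:
        if att.get("name") == att_name:
            return att
    return None


attributes = [{"name": "outlook"}, {"name": "temp"}]

found = find_attribute(attributes, "temp")
missing = find_attribute(attributes, "wind")

# Guard against None before passing the result on to other calls.
label = found["name"] if found is not None else "<unknown>"
```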

static parse_dataframe(df: DataFrame, df_imps=None, name='uarff_data', categorical: List[bool] = None) -> Data

Parse a pandas DataFrame into a Data object.

static parse_ucsv(filename: str) -> Data

Parse a DataFrame read from a CSV file into a Data object.

reduce_importance_for_attribute(att: Dict, discount_factor: float, for_class: str = None) -> Data

Reduce the importance of a specific attribute by a given discount factor.

Parameters:

att (Dict): The attribute Dict for which importance needs to be reduced.

discount_factor (float): The discount factor by which to reduce the importance.

for_class (str, optional): If provided, reduce the importance only for the specified class. Defaults to None.

Returns:

Data: A new Data object with reduced importance for the specified attribute.

set_importances(importances: DataFrame, expected_values: Dict) -> Data

Set importances for each attribute based on the provided DataFrame of importances and expected values.

Parameters:

importances (pd.DataFrame): DataFrame containing importances for each attribute.

expected_values (Dict): Dictionary containing expected values.

Returns:

Data: A new Data object with updated importances.

to_dataframe(most_probable=True) -> DataFrame

Convert the dataset to a pandas DataFrame.

Parameters:

most_probable (bool, optional): Whether to use the most probable values for each attribute. Defaults to True. In the current version, True is the only supported option.

Returns:

pd.DataFrame: A pandas DataFrame representing the dataset.

to_dataframe_importances(average_absolute=False)

Convert the dataset’s importances to a pandas DataFrame.

Parameters:

average_absolute (bool, optional): Whether to calculate the average absolute importances. Defaults to False.

Returns:

pd.DataFrame: A pandas DataFrame representing the importances of each attribute.
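One plausible reading of "average absolute importances" is the mean of the absolute per-instance importances for each attribute. The sketch below illustrates that reading with plain Python. The helper names and the dict-of-lists layout are assumptions, and the library may aggregate differently.

```python
from statistics import fmean
from typing import Dict, List


def mean_importances(importances: Dict[str, List[float]]) -> Dict[str, float]:
    """Signed mean of the importances of each attribute."""
    return {name: fmean(values) for name, values in importances.items()}


def mean_absolute_importances(importances: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean of absolute importances, so opposite signs do not cancel out."""
    return {name: fmean(abs(v) for v in values)
            for name, values in importances.items()}


# Hypothetical per-instance importances for two attributes.
importances = {"temp": [0.5, -1.5, 1.0], "humidity": [-0.25, 0.25]}

signed = mean_importances(importances)
absolute = mean_absolute_importances(importances)
```

With these made-up numbers the signed means are zero for both attributes, while the absolute means still show that `temp` carries larger contributions than `humidity`.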

update_attribute_domains()

Set attribute domains for numerical values.