lux.pyuid3.data.Data

class lux.pyuid3.data.Data(name: str = None, attributes: List[Dict] = None, instances: List[Dict] = None)

Initialize a Data object.

Parameters:

name (str, optional): The name of the dataset.
attributes (List[Dict], optional): List of attribute Dicts defining the dataset’s attributes.
instances (List[Dict], optional): List of instance Dicts containing the dataset’s instances.

Methods

`__init__`([name, attributes, instances])	Initialize a Data object.
`calculate_statistics`(att)	Calculate statistics for a specific attribute in the dataset.
`filter_nominal_attribute_value`(at, value[, copy])	Filter the dataset based on the given nominal attribute value.
`filter_numeric_attribute_value`(at, value[, copy])	Filter the dataset based on the given numeric attribute value.
`filter_numeric_attribute_value_expr`(at, expr)	Filter the dataset based on the given expression involving a numeric attribute value.
`get_attribute_of_name`(att_name)	Get the attribute Dict corresponding to the given attribute name.
`get_attributes`()
`get_class_attribute`()
`get_instances`()
`get_name`()
`parse_dataframe`(df[, df_imps, name, categorical])	Parse a pandas DataFrame into a Data object.
`parse_ucsv`(filename)	Parse a DataFrame from a CSV file into a Data object.
`reduce_importance_for_attribute`(att, ...[, ...])	Reduce the importance of a specific attribute by a given discount factor.
`set_importances`(importances, expected_values)	Set importances for each attribute based on the provided DataFrame of importances and expected values.
`to_dataframe`([most_probable])	Convert the dataset to a pandas DataFrame.
`to_dataframe_importances`([average_absolute])	Convert the dataset's importances to a pandas DataFrame.
`update_attribute_domains`()	Set attribute domains for numerical values

calculate_statistics(att: Dict) → AttStats

Calculate statistics for a specific attribute in the dataset.

Parameters:

att (Dict): The attribute Dict for which statistics are to be calculated.

Returns:

AttStats: An object containing statistics for the specified attribute.

filter_nominal_attribute_value(at: Dict, value: str, copy: bool = False) → Data

Filter the dataset based on the given nominal attribute value.

Parameters:

at (Dict): The attribute Dict to filter on.
value (str): The value to filter by.
copy (bool, optional): Whether to create a copy of the filtered dataset. Defaults to False.

Returns:

Data: The filtered dataset.

filter_numeric_attribute_value(at: Dict, value: str, copy: bool = False) → Tuple[Data, Data]

Filter the dataset based on the given numeric attribute value.

Parameters:

at (Dict): The attribute Dict to filter on.
value (str): The value to filter by.
copy (bool, optional): Whether to create a copy of the filtered dataset. Defaults to False.

Returns:

Tuple[Data, Data]: A tuple containing two filtered datasets:

- The first dataset contains instances where the attribute value is less than the given value.
- The second dataset contains instances where the attribute value is greater than or equal to the given value.
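
The split contract described above can be illustrated with a minimal plain-Python sketch. It does not call the library: the `Row` alias, the `split_numeric` helper, and the representation of instances as plain dicts are all hypothetical stand-ins, and the `float(value)` conversion is an assumption made only because the signature annotates `value` as `str`.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for one instance: attribute name -> numeric value.
Row = Dict[str, float]


def split_numeric(rows: List[Row], att_name: str, value: str) -> Tuple[List[Row], List[Row]]:
    """Partition rows the way the documented return value describes.

    First list: rows whose att_name value is less than the threshold.
    Second list: rows whose att_name value is greater than or equal to it.
    """
    threshold = float(value)  # assumption: the string holds a number
    below = [row for row in rows if row[att_name] < threshold]
    at_or_above = [row for row in rows if row[att_name] >= threshold]
    return below, at_or_above


rows = [{"age": 18.0}, {"age": 30.0}, {"age": 42.0}]
below, at_or_above = split_numeric(rows, "age", "30")
```

Note that a row equal to the threshold lands in the second list, so the two parts never overlap and together cover every row.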

filter_numeric_attribute_value_expr(at: Dict, expr: str, copy: bool = False) → Tuple[Data, Data]

Filter the dataset based on the given expression involving a numeric attribute value.

Parameters:

at (Dict): The attribute Dict to filter on.
expr (str): The expression to evaluate. It can involve comparisons and arithmetic operations with the attribute value.
copy (bool, optional): Whether to create a copy of the filtered dataset. Defaults to False.

Returns:

Tuple[Data, Data]: A tuple containing two filtered datasets:

- The first dataset contains instances where the attribute value satisfies the expression.
- The second dataset contains instances where the attribute value does not satisfy the expression.
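
As a rough illustration of the satisfied / not-satisfied split, the sketch below uses a Python callable in place of the string expression, because the expression syntax is not specified here. The `split_by_predicate` helper and the dict-based rows are hypothetical and are not part of the library.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for one instance: attribute name -> numeric value.
Row = Dict[str, float]


def split_by_predicate(
    rows: List[Row], att_name: str, predicate: Callable[[float], bool]
) -> Tuple[List[Row], List[Row]]:
    """Return (rows satisfying the predicate, rows not satisfying it)."""
    satisfied: List[Row] = []
    not_satisfied: List[Row] = []
    for row in rows:
        if predicate(row[att_name]):
            satisfied.append(row)
        else:
            not_satisfied.append(row)
    return satisfied, not_satisfied


rows = [{"age": 18.0}, {"age": 30.0}, {"age": 42.0}]
# The predicate combines arithmetic and a comparison on the attribute value.
matching, rest = split_by_predicate(rows, "age", lambda v: v * 2 > 50)
```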

get_attribute_of_name(att_name: str) → Dict

Get the attribute Dict corresponding to the given attribute name.

Parameters:

att_name (str): The name of the attribute to retrieve.

Returns:

Dict: The attribute Dict corresponding to the given attribute name, or None if the attribute name is not found in the dataset.
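
Because a missing name yields None rather than an exception, callers need to check the result before using it. The sketch below mirrors that lookup contract in plain Python; the `find_attribute` helper and the `"name"` and `"type"` keys of the attribute Dict are assumptions made for illustration, not the library's documented layout.

```python
from typing import Dict, List, Optional


def find_attribute(attributes: List[Dict], att_name: str) -> Optional[Dict]:
    """Return the first attribute Dict whose (hypothetical) "name" key matches, else None."""
    for att in attributes:
        if att.get("name") == att_name:
            return att
    return None


attributes = [
    {"name": "age", "type": "numeric"},
    {"name": "outlook", "type": "nominal"},
]

found = find_attribute(attributes, "outlook")
missing = find_attribute(attributes, "humidity")

# Guard against None before touching the result.
missing_type = missing["type"] if missing is not None else "unknown"
```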

static parse_dataframe(df: DataFrame, df_imps=None, name='uarff_data', categorical: List[bool] = None) → Data

Parse a pandas DataFrame into a Data object.

static parse_ucsv(filename: str) → Data

Parse a DataFrame from a CSV file into a Data object.

reduce_importance_for_attribute(att: Dict, discount_factor: float, for_class: str = None) → Data

Reduce the importance of a specific attribute by a given discount factor.

Parameters:

att (Dict): The attribute Dict whose importance is to be reduced.
discount_factor (float): The discount factor by which to reduce the importance.
for_class (str, optional, default=None): If provided, reduce the importance only for the specified class.

Returns:

Data: A new Data object with reduced importance for the specified attribute.

set_importances(importances: DataFrame, expected_values: Dict) → Data

Set importances for each attribute based on the provided DataFrame of importances and expected values.

Parameters:

importances (pd.DataFrame): DataFrame containing importances for each attribute.
expected_values (Dict): Dictionary containing expected values.

Returns:

Data: A new Data object with updated importances.

to_dataframe(most_probable=True) → DataFrame

Convert the dataset to a pandas DataFrame.

Parameters:

most_probable (bool, optional, default=True): Whether to use the most probable value for each attribute. In the current version, True is the only supported option.

Returns:

pd.DataFrame: A pandas DataFrame representing the dataset.

to_dataframe_importances(average_absolute=False)

Convert the dataset’s importances to a pandas DataFrame.

Parameters:

average_absolute (bool, optional, default=False): Whether to calculate the average absolute importances.

Returns:

pd.DataFrame: A pandas DataFrame representing the importances of each attribute.

update_attribute_domains(): Set attribute domains for numerical values