lux.pyuid3.data.Data
- class lux.pyuid3.data.Data(name: str = None, attributes: List[Dict] = None, instances: List[Dict] = None)
Initialize a Data object.
Parameters:
- param name:
str, optional The name of the dataset.
- param attributes:
List[Dict], optional List of attribute Dict defining the dataset’s attributes.
- param instances:
List[Dict], optional List of instance Dict containing the dataset’s instances.
Methods
__init__([name, attributes, instances]) Initialize a Data object.
calculate_statistics(att) Calculate statistics for a specific attribute in the dataset.
filter_nominal_attribute_value(at, value[, copy]) Filter the dataset based on the given nominal attribute value.
filter_numeric_attribute_value(at, value[, copy]) Filter the dataset based on the given numeric attribute value.
filter_numeric_attribute_value_expr(at, expr) Filter the dataset based on the given expression involving a numeric attribute value.
get_attribute_of_name(att_name) Get the attribute Dict corresponding to the given attribute name.
get_attributes()
get_class_attribute()
get_instances()
get_name()
parse_dataframe(df[, df_imps, name, categorical]) Parse pd.DataFrame to Data object
parse_ucsv(filename) Parse DataFrame from csv to Data object
reduce_importance_for_attribute(att, ...[, ...]) Reduce the importance of a specific attribute by a given discount factor.
set_importances(importances, expected_values) Set importances for each attribute based on the provided DataFrame of importances and expected values.
to_dataframe([most_probable]) Convert the dataset to a pandas DataFrame.
to_dataframe_importances([average_absolute]) Convert the dataset's importances to a pandas DataFrame.
update_attribute_domains() Set attribute domains for numerical values
- calculate_statistics(att: Dict) -> AttStats
Calculate statistics for a specific attribute in the dataset.
Parameters:
- param att:
Dict The attribute Dict for which statistics are to be calculated.
Returns:
- return:
AttStats An object containing statistics for the specified attribute.
- filter_nominal_attribute_value(at: Dict, value: str, copy: bool = False) -> Data
Filter the dataset based on the given nominal attribute value.
Parameters:
- param at:
Dict The attribute Dict to filter.
- param value:
str The value to filter.
- param copy:
bool, optional Whether to create a copy of the filtered dataset. Defaults to False.
Returns:
- return:
Data The filtered dataset.
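The filtering behaviour described above can be pictured with a small stand-alone sketch. It works on plain lists of dicts rather than on a `Data` object; the helper name `filter_nominal` and the row layout are assumptions made for illustration, not the library's implementation.

```python
from copy import deepcopy
from typing import Dict, List


def filter_nominal(instances: List[Dict], att_name: str, value: str,
                   copy: bool = False) -> List[Dict]:
    """Keep only the rows whose `att_name` field equals `value` (hypothetical helper)."""
    kept = [row for row in instances if row[att_name] == value]
    # With copy=True the result is detached from the input rows.
    return deepcopy(kept) if copy else kept


rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
]
sunny = filter_nominal(rows, "outlook", "sunny", copy=True)
sunny[0]["play"] = "changed"  # leaves `rows` untouched because of the deep copy
```

In this sketch, `copy=False` returns references to the original rows, so edits to the result would also show up in the input. Whether `Data` behaves the same way is not stated in this reference.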
- filter_numeric_attribute_value(at: Dict, value: str, copy: bool = False) -> Tuple[Data, Data]
Filter the dataset based on the given numeric attribute value.
Parameters:
- param at:
Dict The attribute Dict to filter.
- param value:
str The value to filter.
- param copy:
bool, optional Whether to create a copy of the filtered dataset. Defaults to False.
Returns:
- return:
Tuple[Data, Data] A tuple containing two filtered datasets:
  - The first dataset contains instances where the attribute value is less than the given value.
  - The second dataset contains instances where the attribute value is greater than or equal to the given value.
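The split semantics (strictly less than on one side, greater than or equal on the other) can be sketched on plain lists of dicts. The helper `split_by_threshold` is hypothetical. Because `value` is annotated as `str` in the signature, the sketch converts it to `float` before comparing; that conversion is an assumption.

```python
from typing import Dict, List, Tuple


def split_by_threshold(instances: List[Dict], att_name: str,
                       threshold: str) -> Tuple[List[Dict], List[Dict]]:
    """Partition rows into (< threshold, >= threshold) for one numeric field."""
    cut = float(threshold)  # assumption: the string holds a numeric literal
    below = [row for row in instances if float(row[att_name]) < cut]
    at_or_above = [row for row in instances if float(row[att_name]) >= cut]
    return below, at_or_above


rows = [{"temp": 18.0}, {"temp": 21.5}, {"temp": 25.0}]
below, at_or_above = split_by_threshold(rows, "temp", "21.5")
```

A row whose value equals the threshold lands in the second group, matching the "greater than or equal to" wording of the return description.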
- filter_numeric_attribute_value_expr(at: Dict, expr: str, copy: bool = False) -> Tuple[Data, Data]
Filter the dataset based on the given expression involving a numeric attribute value.
Parameters:
- param at:
Dict The attribute Dict to filter.
- param expr:
str The expression to evaluate. It can involve comparisons and arithmetic operations with the attribute value.
- param copy:
bool, optional Whether to create a copy of the filtered dataset. Defaults to False.
Returns:
- return:
Tuple[Data, Data] A tuple containing two filtered datasets:
  - The first dataset contains instances where the attribute value satisfies the expression.
  - The second dataset contains instances where the attribute value does not satisfy the expression.
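The satisfied / not-satisfied partition can be illustrated as follows. This reference does not specify the syntax of the `expr` string, so the sketch substitutes a Python callable for it; both that substitution and the helper name `split_by_predicate` are assumptions, and the code is not the library's implementation.

```python
from typing import Callable, Dict, List, Tuple


def split_by_predicate(instances: List[Dict], att_name: str,
                       predicate: Callable[[float], bool]
                       ) -> Tuple[List[Dict], List[Dict]]:
    """Partition rows into (predicate holds, predicate fails) for one field."""
    satisfied: List[Dict] = []
    not_satisfied: List[Dict] = []
    for row in instances:
        target = satisfied if predicate(row[att_name]) else not_satisfied
        target.append(row)
    return satisfied, not_satisfied


rows = [{"x": 1.0}, {"x": 4.0}, {"x": 9.0}]
# A comparison combined with arithmetic on the attribute value.
inside, outside = split_by_predicate(rows, "x", lambda v: 2 * v + 1 > 8)
```

Every row ends up in exactly one of the two groups, so the pair always covers the full input.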
- get_attribute_of_name(att_name: str) -> Dict
Get the attribute Dict corresponding to the given attribute name.
Parameters:
- param att_name:
str The name of the attribute to retrieve.
Returns:
- return:
Dict The attribute Dict corresponding to the given attribute name. Returns None if the attribute name is not found in the dataset.
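Because a missing name yields None rather than an exception, callers need to check the result before using it. A minimal sketch of that lookup pattern on a plain list of dicts follows. The `"name"` key and the helper `find_attribute` are assumptions for illustration; the actual layout of an attribute Dict is not described in this reference.

```python
from typing import Dict, List, Optional


def find_attribute(attributes: List[Dict], att_name: str) -> Optional[Dict]:
    """Return the first attribute dict with a matching name, or None."""
    return next((att for att in attributes if att.get("name") == att_name), None)


attributes = [{"name": "outlook"}, {"name": "temp"}]
found = find_attribute(attributes, "temp")
missing = find_attribute(attributes, "humidity")

if missing is None:
    # Handle the unknown attribute explicitly instead of indexing into None.
    fallback = {"name": "humidity"}
```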
- static parse_dataframe(df: DataFrame, df_imps=None, name='uarff_data', categorical: List[bool] = None) -> Data
Parse pd.DataFrame to Data object
- reduce_importance_for_attribute(att: Dict, discount_factor: float, for_class: str = None) -> Data
Reduce the importance of a specific attribute by a given discount factor.
Parameters:
- param att:
Dict The attribute Dict for which importance needs to be reduced.
- param discount_factor:
float The discount factor by which to reduce the importance.
- param for_class:
str, optional (default=None) If provided, reduce the importance only for the specified class.
Returns:
- return:
Data A new Data object with reduced importance for the specified attribute.
- set_importances(importances: DataFrame, expected_values: Dict) -> Data
Set importances for each attribute based on the provided DataFrame of importances and expected values.
Parameters:
- param importances:
pd.DataFrame DataFrame containing importances for each attribute.
- param expected_values:
Dict Dictionary containing expected values.
Returns:
- return:
Data A new Data object with updated importances.
- to_dataframe(most_probable=True) -> DataFrame
Convert the dataset to a pandas DataFrame.
Parameters:
- param most_probable:
bool, optional (default=True) Whether to use the most probable value for each attribute. In the current version, True is the only supported option.
Returns:
- return:
pd.DataFrame A pandas DataFrame representing the dataset.
- to_dataframe_importances(average_absolute=False)
Convert the dataset’s importances to a pandas DataFrame.
Parameters:
- param average_absolute:
bool, optional (default=False) Whether to calculate the average absolute importances.
Returns:
- return:
pd.DataFrame A pandas DataFrame representing the importances of each attribute.
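One plausible reading of "average absolute importances" is the mean of the absolute per-instance importance for each attribute. The sketch below computes that on plain dicts. The input layout and the helper `mean_absolute_importances` are assumptions, and the library may aggregate differently.

```python
from statistics import fmean
from typing import Dict, List


def mean_absolute_importances(importances: List[Dict[str, float]]) -> Dict[str, float]:
    """Average |importance| per attribute across all instances."""
    names = importances[0].keys()
    return {name: fmean(abs(row[name]) for row in importances) for name in names}


# Hypothetical per-instance importances for two attributes.
imps = [
    {"temp": 0.5, "humidity": -0.25},
    {"temp": -0.5, "humidity": 0.75},
]
summary = mean_absolute_importances(imps)
```

Taking absolute values first keeps importances of opposite sign from cancelling: a signed mean of the `temp` column here would be 0.0, which would hide an attribute that matters in both rows.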
- update_attribute_domains()
Set attribute domains for numerical values