pycat.data

Data Management

Data Module for PyCAT

This module contains classes and functions for managing and processing data for biological image analysis in napari. Its primary components are BaseDataClass, which provides basic data handling and management, and AnalysisDataClass, which extends BaseDataClass for puncta and cell data analysis.

This module is designed to be integrated with napari viewers to facilitate real-time data manipulation and analysis, enhancing the workflow in biological research settings.

Author

Christian Neureuter, GitHub: cneureuter

Date

4-20-2024

class BaseDataClass(base_data_repository=None)

Bases: object

A base class for managing data related to image analysis or similar scientific data processing applications. It encapsulates operations for storing, retrieving, updating, and resetting data, especially focusing on handling pandas DataFrames for analysis results alongside other types of metadata and parameters.

data_repository

A dictionary acting as a central repository for all data managed by instances of this class. This includes pandas DataFrames for storing analysis results, numeric parameters for analysis, and metadata.

Type:

dict

set_data(self, key, data):

Stores or updates data in the data repository under the specified key.

get_data(self, key, default_value=None):

Retrieves data stored in the data repository under the specified key, or returns the default value if the key is absent.

append_to_df(self, key, data):

Appends a new row of data to a DataFrame stored under the specified key in the data repository.

update_df(self, key, index, column, value):

Updates a specific value in a DataFrame stored under the specified key in the data repository.

add_column_to_df(self, key, column_name, default_value=None):

Adds a new column to a DataFrame stored under the specified key in the data repository, initializing all rows in this column to a default value.

calculate_length(self, viewer):

Calculates and updates the size parameters for objects and cells based on annotations made in the napari viewer.

get_dataframes(self):

Retrieves all pandas DataFrames stored in the data repository.

get_all_variables(self):

Returns a list of all keys currently stored in the data repository.

reset_values(self, df_names_to_reset=None, clear_all=False):

Resets specified DataFrames or all data within the class to their default initialization values.

__init__(base_data_repository=None)

Initializes the BaseDataClass with a default data repository containing empty pandas DataFrames for storing analysis results, default analysis parameters, and an empty metadata dictionary. It can optionally be initialized with existing repository data.

add_column_to_df(key, column_name, default_value=None)

Adds a new column to the DataFrame identified by the given key, initializing it with the specified default value for all rows.

Parameters:
  • key (str) – The key identifying the DataFrame to which the new column should be added.

  • column_name (str) – The name of the new column to be added.

  • default_value (object, optional) – The default value to be assigned to all rows in the new column. If not provided, the default value is None.
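As a rough illustration of the behaviour described above, the sketch below adds a column to a DataFrame held in a plain dictionary. It is not PyCAT's implementation: the standalone function, the `repo` dictionary, and the `cell_df` key are assumptions made for this example.

```python
import pandas as pd

# Hypothetical stand-in for the data repository; the key name is illustrative.
repo = {"cell_df": pd.DataFrame({"cell_id": [1, 2, 3]})}


def add_column_to_df(repo, key, column_name, default_value=None):
    """Add a column filled with default_value to the DataFrame under key."""
    df = repo[key]
    # Assigning a scalar broadcasts it to every existing row.
    df[column_name] = default_value
    return df


add_column_to_df(repo, "cell_df", "is_valid", default_value=True)
add_column_to_df(repo, "cell_df", "notes")  # every row receives None
```

Because the value is broadcast, rows appended later do not automatically inherit the default.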

append_to_df(key, data)

Appends a new row of data to the DataFrame identified by the given key. The new row of data should be provided as a dictionary where keys correspond to column names.

Parameters:
  • key (str) – The key identifying the DataFrame to which the new row of data should be appended.

  • data (dict) – A dictionary where keys represent column names and values represent the data to be appended.
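One way such a row append could work is sketched below with `pandas.concat`. This is an assumption about the general approach, not the library's actual code; the `repo` dictionary, the `puncta_df` key, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical repository holding one empty results table.
repo = {"puncta_df": pd.DataFrame(columns=["label", "area"])}


def append_to_df(repo, key, data):
    """Append one row, given as a {column: value} dict, to the DataFrame under key."""
    new_row = pd.DataFrame([data])
    if repo[key].empty:
        # Starting from an empty table: the new row becomes the table.
        repo[key] = new_row
    else:
        # ignore_index=True renumbers rows 0..n-1 after the append.
        repo[key] = pd.concat([repo[key], new_row], ignore_index=True)


append_to_df(repo, "puncta_df", {"label": 1, "area": 12.5})
append_to_df(repo, "puncta_df", {"label": 2, "area": 8.0})
```

In this sketch, a dictionary key that does not match an existing column creates a new column, with missing values in the earlier rows.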

calculate_length(viewer)

Utilizes annotations made in a napari viewer to calculate and update size parameters for objects and cells, such as diameters and radii. This method assumes specific naming conventions for annotation layers within the viewer.

Parameters:

viewer (napari.Viewer) – A napari viewer instance containing annotations for calculating object and cell sizes.

Notes

This method is designed to work with specific annotation layers named ‘Cell Diameter’ and ‘Object Diameter’ in the viewer, which are assumed to contain line annotations representing the diameters of cells and objects, respectively. The calculated sizes are stored in the data repository under the keys ‘cell_diameter’, ‘object_size’, and ‘ball_radius’.
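The core measurement can be illustrated without napari: a line annotation is a pair of vertices, and its length is the Euclidean distance between them. The sketch below assumes that representation and uses made-up coordinates. How 'object_size' and 'ball_radius' are derived from the measured lengths is not documented here, so the example stops at the diameters.

```python
import math


def line_length(line):
    """Return the Euclidean length of a line given as two (row, col) vertices."""
    start, end = line
    return math.dist(start, end)


# Hypothetical vertices, standing in for the data of one line drawn in each
# annotation layer ('Cell Diameter' and 'Object Diameter').
cell_line = [(10.0, 10.0), (10.0, 70.0)]
object_line = [(5.0, 5.0), (8.0, 9.0)]

cell_diameter = line_length(cell_line)      # 60.0 pixels
object_diameter = line_length(object_line)  # 5.0 pixels (a 3-4-5 triangle)

# Only 'cell_diameter' is a documented repository key in this sketch.
repo = {"cell_diameter": cell_diameter}
```

The lengths are in pixel units unless the layer coordinates have been scaled.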

get_all_variables()

Returns a list of all keys representing the data currently stored in the data repository.

get_data(key, default_value=None)

Retrieves the data stored under the specified key from the data repository. If the key does not exist, default_value is returned instead, which is None unless the caller supplies another value.

Parameters:
  • key (str) – The unique identifier for the data to be retrieved.

  • default_value (object, optional) – The default value to return if the key does not exist in the data repository. The default is None.

Returns:

The data stored in the data repository under the specified key, or default_value (None unless specified) if the key does not exist.

Return type:

object
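The lookup semantics described here match those of `dict.get`. The sketch below is a minimal stand-in class, not BaseDataClass itself; the class name and the stored keys are hypothetical.

```python
class RepositorySketch:
    """Minimal stand-in mirroring the documented set_data/get_data behaviour."""

    def __init__(self):
        self.data_repository = {}

    def set_data(self, key, data):
        # Stores new data or overwrites whatever was under the key.
        self.data_repository[key] = data

    def get_data(self, key, default_value=None):
        # dict.get returns default_value when the key is missing.
        return self.data_repository.get(key, default_value)


store = RepositorySketch()
store.set_data("ball_radius", 50)

radius = store.get_data("ball_radius")          # stored value
missing = store.get_data("not_stored")          # None, no default supplied
fallback = store.get_data("not_stored", 0.25)   # caller-supplied default
```

Since None is also the default return value, a caller cannot distinguish a missing key from a key that stores None without passing a different default.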

get_dataframes()

Retrieves and returns a dictionary of all pandas DataFrames currently stored in the data repository.
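A plausible way to separate DataFrames from other stored values is a type check over the repository, sketched below. The filtering approach and the example keys are assumptions, not taken from the PyCAT source.

```python
import pandas as pd

# Hypothetical repository mixing tables, a numeric parameter, and metadata.
repo = {
    "cell_df": pd.DataFrame({"cell_id": [1, 2]}),
    "puncta_df": pd.DataFrame(),
    "ball_radius": 50,
    "metadata": {},
}


def get_dataframes(repo):
    """Return only the entries whose value is a pandas DataFrame."""
    return {key: value for key, value in repo.items()
            if isinstance(value, pd.DataFrame)}


dataframes = get_dataframes(repo)  # keeps 'cell_df' and 'puncta_df'
all_keys = list(repo.keys())       # analogous to get_all_variables()
```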

reset_values(df_names_to_reset=None, clear_all=False)

Resets specific DataFrames to empty DataFrames or clears all data within the class back to default initialization values. This method can target specific DataFrames for resetting, or reinitialize the class data repository entirely based on the parameters provided.

Parameters:
  • df_names_to_reset (list of str, optional) – A list of string keys identifying which DataFrames within the data repository should be reset to their default empty state. This parameter is ignored if clear_all is True.

  • clear_all (bool, optional) – A flag indicating whether to reset all data within the class to their default values. If True, it overrides df_names_to_reset and reinitializes the entire data repository to default values specified in the class constructor.

Note

This method selectively resets data based on provided parameters, allowing for flexible data management within the class instance.
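The interaction between the two parameters can be sketched as follows. The default keys and values in `default_repository` are invented for illustration, and the function is a standalone approximation of the documented behaviour rather than the method itself.

```python
import pandas as pd


def default_repository():
    # Hypothetical defaults; the real default keys and values may differ.
    return {
        "cell_df": pd.DataFrame(),
        "puncta_df": pd.DataFrame(),
        "ball_radius": 50,
        "metadata": {},
    }


def reset_values(repo, df_names_to_reset=None, clear_all=False):
    """Reset selected DataFrames, or everything when clear_all is True."""
    if clear_all:
        # clear_all overrides df_names_to_reset entirely.
        repo.clear()
        repo.update(default_repository())
        return
    for name in df_names_to_reset or []:
        if name in repo:
            repo[name] = pd.DataFrame()


repo = default_repository()
repo["cell_df"] = pd.DataFrame({"cell_id": [1, 2]})
repo["ball_radius"] = 75

reset_values(repo, df_names_to_reset=["cell_df"])
radius_after_partial_reset = repo["ball_radius"]  # still 75

reset_values(repo, clear_all=True)
radius_after_full_reset = repo["ball_radius"]     # back to the default, 50
```

A partial reset leaves numeric parameters and metadata untouched, which is the main reason to prefer it between analyses of the same image.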

set_data(key, data)

Stores or updates the specified data under the given key within the data repository. This method is flexible and can be used to store various types of data, from numeric values to complex objects.

Parameters:
  • key (str) – A unique identifier for the data being stored.

  • data (object) – The data to be stored in the data repository under the specified key.

Notes

This method can be used to store a wide range of data types, including pandas DataFrames, numpy arrays, dictionaries, and other objects. The key should be a string that uniquely identifies the data being stored.

update_df(key, index, column, value)

Updates the value at a specified index and column in the DataFrame identified by the given key.

Parameters:
  • key (str) – The key identifying the DataFrame to be updated.

  • index (int) – The index of the row to be updated.

  • column (str) – The column name where the value should be updated.

  • value (object) – The new value to be set at the specified index and column.
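A single-cell update of this kind is commonly written with the pandas `.at` accessor, as in the sketch below. Whether the method uses `.at` internally is an assumption; the `repo` dictionary, key, and column names are hypothetical.

```python
import pandas as pd

# Hypothetical repository with one small results table.
repo = {"cell_df": pd.DataFrame({"cell_id": [1, 2], "area": [100.0, 150.0]})}


def update_df(repo, key, index, column, value):
    """Set one cell of the DataFrame under key, addressed by row label and column."""
    # .at is label-based; with the default RangeIndex the label equals the row position.
    repo[key].at[index, column] = value


update_df(repo, "cell_df", 1, "area", 175.5)
updated_area = repo["cell_df"].loc[1, "area"]  # 175.5
```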

update_metadata(image)

Updates the stored metadata from the given image, leaving the rest of the data repository unchanged.

Parameters:

image (AICSImage) – Image object containing the metadata to extract.