bca_survival.analyzer module

BCA Survival Analyzer Module.

This module provides a class for performing survival analysis on body composition assessment (BCA) data. It combines clinical/demographic data with body measurement data and provides methods for univariate and multivariate Cox regression analysis, as well as Kaplan-Meier survival curves.

Requires: pandas, numpy, and custom preprocessing and models modules

class bca_survival.analyzer.BCASurvivalAnalyzer(df_main, df_measurements, main_id_col, measurement_id_col, start_date_col, event_date_col, event_col, standardize=False)[source]

Bases: object

A class for analyzing the relationship between body composition measurements and survival outcomes.

This class combines clinical/demographic data with body composition measurements, preprocesses the data for survival analysis, and provides methods for performing Cox regression and Kaplan-Meier survival analysis.

df

The merged and preprocessed dataframe.

Type:

pd.DataFrame

df_negative_days

Dataframe containing records with negative or NaN ‘days’ values for evaluation.

Type:

pd.DataFrame

start_date_col

Name of the column containing start dates.

Type:

str

event_date_col

Name of the column containing event dates.

Type:

str

event_col

Name of the column containing event indicators.

Type:

str

standardize

Whether to standardize variables for analysis.

Type:

bool

Initializes the BCASurvivalAnalyzer with clinical and measurement data.

Parameters:
  • df_main (pd.DataFrame) – Dataframe containing clinical/demographic data.

  • df_measurements (pd.DataFrame) – Dataframe containing body composition measurements.

  • main_id_col (str) – Name of the PID column in df_main.

  • measurement_id_col (str) – Name of the PID column in df_measurements.

  • start_date_col (str) – Name of the column containing start dates.

  • event_date_col (str) – Name of the column containing event dates.

  • event_col (str) – Name of the column containing event indicators.

  • standardize (bool, optional) – Whether to standardize variables for analysis. Defaults to False.

Note

The constructor renames the ID columns to ‘PID’ for consistency, replaces ‘nd’ with NaN, and handles infinite values. It also checks for missing measurements and warns about them.
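The cleaning steps in the note above can be sketched roughly as follows. This is an illustration, not the class's actual code: the helper name merge_and_clean, the left join, and the wording of the warning are assumptions, as are the column names in the toy data.

```python
import warnings

import numpy as np
import pandas as pd


def merge_and_clean(df_main, df_measurements, main_id_col, measurement_id_col):
    """Hypothetical helper mirroring the cleaning steps described in the note."""
    # Rename both ID columns to a shared 'PID' key.
    main = df_main.rename(columns={main_id_col: "PID"})
    meas = df_measurements.rename(columns={measurement_id_col: "PID"})

    # Warn about patients that have no measurement row.
    missing = set(main["PID"]) - set(meas["PID"])
    if missing:
        warnings.warn(f"{len(missing)} patient(s) without measurements: {sorted(missing)}")

    # Assumption: a left join keeps every clinical record.
    merged = main.merge(meas, on="PID", how="left")

    # Treat the 'nd' placeholder and infinite values as missing.
    return merged.replace("nd", np.nan).replace([np.inf, -np.inf], np.nan)


# Toy data: patient 3 has no measurements, patient 1 has 'nd' and inf entries.
df_main = pd.DataFrame({"patient": [1, 2, 3], "age": [61, 70, 55]})
df_meas = pd.DataFrame({"pid": [1, 2], "muscle": ["nd", 120.5], "fat": [np.inf, 80.0]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cleaned = merge_and_clean(df_main, df_meas, "patient", "pid")
```

Here cleaned holds one row per patient, with the ‘nd’ and infinite entries turned into NaN, and caught contains the warning about the patient without measurements.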

preprocess_data()[source]

Preprocesses the data for survival analysis.

This method calculates time-to-event days and removes records with negative days.

Returns:

The preprocessed dataframe.

Return type:

pd.DataFrame

Note

This method is called automatically during initialization but can be called again if the underlying data changes.
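A minimal sketch of the time-to-event step, assuming the duration is the difference between the event date and the start date in whole days. The helper name compute_days is hypothetical, and removing NaN rows together with negative ones is an assumption based on the description of df_negative_days.

```python
import pandas as pd


def compute_days(df, start_date_col, event_date_col):
    """Hypothetical sketch: derive 'days' and split off unusable rows."""
    out = df.copy()
    start = pd.to_datetime(out[start_date_col], errors="coerce")
    event = pd.to_datetime(out[event_date_col], errors="coerce")
    out["days"] = (event - start).dt.days

    # Rows with negative or missing durations are set aside for inspection.
    bad = out["days"].isna() | (out["days"] < 0)
    return out[~bad].reset_index(drop=True), out[bad].reset_index(drop=True)


# Toy data: PID 2 has an event before its start date, PID 3 has no event date.
raw = pd.DataFrame(
    {
        "PID": [1, 2, 3],
        "start": ["2020-01-01", "2020-03-01", "2020-05-01"],
        "event": ["2020-01-31", "2020-02-01", None],
    }
)
df, df_negative_days = compute_days(raw, "start", "event")
```

In this toy example only PID 1 remains in df; PIDs 2 and 3 end up in df_negative_days.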

univariate_cox_regression(columns, verbose=False, penalizer=0.0, correction_values=None, nan_threshold=0.7, significant_only=True)[source]

Performs univariate Cox proportional hazards regression for each specified variable.

Parameters:
  • columns (list) – List of predictor column names to test individually.

  • verbose (bool, optional) – Whether to print detailed progress information. Defaults to False.

  • penalizer (float, optional) – L2 penalizer value to apply to the regression. Defaults to 0.0.

  • correction_values (list, optional) – List of column names to include as correction terms in each univariate model. Defaults to None.

  • nan_threshold (float, optional) – Threshold for NaN values if standardizing. Defaults to 0.7.

  • significant_only (bool, optional) – Whether to return only statistically significant variables. Defaults to True.

Returns:

DataFrame containing significant variables and their statistics.

Return type:

pd.DataFrame

Note

This method tests each variable individually in a Cox regression model and, by default, returns only statistically significant variables (p < 0.05). If correction_values is provided, those variables are included in each model to adjust for their effects.

kaplan_meier_plot(column, split_strategy='median', fixed_value=None, output_path=None, percentage=None, custom_title=None, dpi=400, custom_high_low_names=('low', 'high'))[source]

Generates a Kaplan-Meier survival plot for a specified variable.

Parameters:
  • column (str) – Column name to use for grouping.

  • split_strategy (str, optional) – Strategy for splitting data into high/low groups. Options: ‘mean’, ‘median’, ‘percentage’, ‘fixed’. Defaults to ‘median’.

  • fixed_value (float, optional) – Fixed threshold value when split_strategy is ‘fixed’. Defaults to None.

  • output_path (str, optional) – Directory path to save the plot. If None, saves in current directory. Defaults to None.

  • percentage (float, optional) – Percentile threshold when split_strategy is ‘percentage’. Defaults to None.

  • custom_title (str, optional) – Custom title for the plot. If None, a default title will be generated based on the column and split strategy. Defaults to None.

  • dpi (int, optional) – Resolution of the output image in dots per inch. Higher values result in better quality but larger file sizes. Defaults to 400.

  • custom_high_low_names (Tuple[str, str], optional) – Custom labels for the low and high groups. Defaults to (“low”, “high”).

Returns:

Dictionary containing the log-rank test p-value, plot filename, and test statistic.

Return type:

dict

Note

This method splits the data into “high” and “low” groups based on the specified variable and strategy, then generates a Kaplan-Meier survival plot comparing the two groups. It also performs a log-rank test to compare the survival curves.

multivariate_cox_regression(columns, penalizer=0.1)[source]

Performs multivariate Cox proportional hazards regression.

Parameters:
  • columns (list) – List of predictor column names.

  • penalizer (float, optional) – L2 penalizer value to apply to the regression. Defaults to 0.1.

Returns:

Fitted Cox proportional hazards model.

Return type:

lifelines.CoxPHFitter

Note

This method fits a Cox regression model with all specified variables simultaneously. It handles multicollinearity by iteratively removing variables with high variance inflation factor (VIF) values. The standardize parameter from the class initialization determines whether variables are standardized before analysis.
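One common way to implement iterative VIF-based removal is sketched below. It is not taken from the module: the helper names, the threshold of 5.0, and the rule of dropping the single worst variable per iteration are all assumptions. The sketch uses the fact that the VIF of each variable equals the corresponding diagonal entry of the inverse correlation matrix.

```python
import numpy as np
import pandas as pd


def vif(df):
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(df.to_numpy(dtype=float), rowvar=False)
    return pd.Series(np.diag(np.linalg.inv(corr)), index=df.columns)


def drop_high_vif(df, threshold=5.0):
    """Remove the column with the largest VIF until all are at or below threshold."""
    kept = df.copy()
    while kept.shape[1] > 1:
        scores = vif(kept)
        if scores.max() <= threshold:
            break
        kept = kept.drop(columns=scores.idxmax())
    return kept


# Toy predictors: 'c' is almost a copy of 'a', 'b' is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
predictors = pd.DataFrame({"a": a, "b": b, "c": c})

reduced = drop_high_vif(predictors)
```

In this toy data one of the two nearly identical columns is removed and the independent column b is kept. Note that perfectly collinear columns would make the correlation matrix singular, so a production version would need to handle that case.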