bca_survival.utils module

bca_survival.utils.make_quantile_split(df, column)[source]

Splits a DataFrame column into quantile-based groups (“low”, “high”, or missing).

This function computes the 25th (Q1) and 75th (Q3) percentiles for the specified column and assigns each row to one of three groups based on the value in the column:
  • “low” for values less than or equal to Q1.
  • “high” for values greater than or equal to Q3.
  • Missing (pd.NA) for values between Q1 and Q3.

Parameters:
  • df (pd.DataFrame) – The input DataFrame containing the data to split.

  • column (str) – The name of the column to be analyzed and split into quantile-based groups.

Returns:

A new DataFrame with an additional “group” column that indicates the assigned group.

Rows with missing values in the specified column are excluded from the result.

Return type:

pd.DataFrame

Example

>>> import pandas as pd
>>> data = {'values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
>>> df = pd.DataFrame(data)
>>> result = make_quantile_split(df, column="values")
>>> print(result)
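
The grouping rule described for make_quantile_split can be sketched in plain pandas. This is a minimal illustration under stated assumptions, not the library's source: the function name quantile_split_sketch is hypothetical, and it assumes pandas' default linear interpolation for the quantiles and that rows between Q1 and Q3 are kept with a missing group.

```python
import pandas as pd


def quantile_split_sketch(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical re-implementation of the documented grouping rule."""
    # Rows with a missing value in `column` are excluded, as documented.
    out = df.dropna(subset=[column]).copy()

    # Q1 and Q3, using pandas' default (linear) quantile interpolation.
    q1 = out[column].quantile(0.25)
    q3 = out[column].quantile(0.75)

    # Start with every row unassigned, then label the two tails.
    group = pd.Series(pd.NA, index=out.index, dtype="object")
    group[out[column] <= q1] = "low"
    group[out[column] >= q3] = "high"
    out["group"] = group
    return out


df = pd.DataFrame({"values": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
result = quantile_split_sketch(df, column="values")
print(result)
```

For the ten values above, Q1 is 3.25 and Q3 is 7.75 under linear interpolation, so 1 to 3 are labelled “low”, 8 to 10 are labelled “high”, and 4 to 7 are left missing.
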
bca_survival.utils.make_quantile_split_outter_vs_middle(df, column)[source]

Splits a DataFrame column into quantile-based groups (“outter”, “middle”).

This function computes the 25th (Q1) and 75th (Q3) percentiles for the specified column and assigns each row to one of two groups based on the value in the column:
  • “outter” for values less than or equal to Q1 or greater than or equal to Q3.
  • “middle” for values between these two quantiles.

Parameters:
  • df (pd.DataFrame) – The input DataFrame containing the data to split.

  • column (str) – The name of the column to be analyzed and split into quantile-based groups.

Returns:

A new DataFrame with an additional “group” column that indicates the assigned group.

Rows with missing values in the specified column are excluded from the result.

Return type:

pd.DataFrame
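
The two-group rule can be sketched the same way. Again, this is an illustration rather than the library's implementation: outer_vs_middle_sketch is a hypothetical name, and the quantiles assume pandas' default linear interpolation. The label “outter” is spelled as it appears in this documentation.

```python
import pandas as pd


def outer_vs_middle_sketch(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical re-implementation of the documented two-group rule."""
    # Rows with a missing value in `column` are excluded, as documented.
    out = df.dropna(subset=[column]).copy()

    q1 = out[column].quantile(0.25)
    q3 = out[column].quantile(0.75)

    # A row is in the outer group if it lies in either tail.
    is_outer = (out[column] <= q1) | (out[column] >= q3)
    out["group"] = is_outer.map({True: "outter", False: "middle"})
    return out


df_with_gap = pd.DataFrame({"values": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None]})
split = outer_vs_middle_sketch(df_with_gap, column="values")
print(split["group"].value_counts())
```
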

bca_survival.utils.calculate_age(df, birth_date_col, current_date_col, age_col_name='Age')[source]

Calculate age in years from two columns storing dates in a pandas DataFrame.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing the date columns.

  • birth_date_col (str) – The name of the column with the birth date.

  • current_date_col (str) – The name of the column with the current/reference date.

  • age_col_name (str) – The name of the new column to store calculated age. Default is ‘Age’.

Returns:

DataFrame with the new column for age.

Return type:

pd.DataFrame
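
The documentation does not state how the age is derived from the two dates, so the following is only one plausible sketch. It assumes the age is the number of elapsed days divided by 365.25, giving fractional years; the library may round or count calendar years differently. The function name calculate_age_sketch and the column names birth and scan are hypothetical.

```python
import pandas as pd


def calculate_age_sketch(
    df: pd.DataFrame,
    birth_date_col: str,
    current_date_col: str,
    age_col_name: str = "Age",
) -> pd.DataFrame:
    """Hypothetical age calculation: elapsed days / 365.25 (an assumption)."""
    out = df.copy()
    birth = pd.to_datetime(out[birth_date_col])
    current = pd.to_datetime(out[current_date_col])

    # Timedelta in whole days, converted to fractional years.
    out[age_col_name] = (current - birth).dt.days / 365.25
    return out


patients = pd.DataFrame(
    {
        "birth": ["1950-06-15", "1980-01-01"],
        "scan": ["2020-06-15", "2020-01-01"],
    }
)
aged = calculate_age_sketch(patients, "birth", "scan")
print(aged)
```
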

bca_survival.utils.clean_dates(df, date_column, date_format=None)[source]

Cleans a DataFrame by removing rows with invalid date values.

Parameters:
  • df (pandas.DataFrame) – The DataFrame to be cleaned.

  • date_column (str) – The name of the column containing date values.

  • date_format (str, optional) – The format of the date values (e.g., ‘%Y-%m-%d’). If None, various formats will be automatically detected.

Returns:

The cleaned DataFrame without invalid date values, together with a dict of statistics about the cleaning process.

Return type:

pandas.DataFrame, dict
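
A rough sketch of this kind of cleaning, using pd.to_datetime with errors="coerce" to mark unparseable values. It is not the library's implementation: clean_dates_sketch is a hypothetical name, the keys of the statistics dict are invented for illustration, and it assumes that missing values count as invalid and that the function returns the DataFrame and the dict as a pair.

```python
import pandas as pd


def clean_dates_sketch(df: pd.DataFrame, date_column: str, date_format=None):
    """Hypothetical cleaner: drop rows whose date cannot be parsed."""
    # Unparseable (and missing) values become NaT instead of raising.
    parsed = pd.to_datetime(df[date_column], format=date_format, errors="coerce")
    valid = parsed.notna()

    cleaned = df.loc[valid].copy()
    cleaned[date_column] = parsed[valid]

    # Illustrative statistics; the real dict's keys are not documented here.
    stats = {
        "total_rows": len(df),
        "invalid_rows": int((~valid).sum()),
        "remaining_rows": len(cleaned),
    }
    return cleaned, stats


raw = pd.DataFrame({"date": ["2021-01-05", "not a date", "2021-13-40", None]})
cleaned, stats = clean_dates_sketch(raw, "date", date_format="%Y-%m-%d")
print(stats)
```

Passing an explicit date_format keeps the parsing strict and predictable; with date_format=None, the result depends on pandas' own format inference.
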