API Reference

Below is a concise reference for DiNetxify’s classes and functions, summarizing their signatures and parameters. This is useful when writing your own scripts or if you need to quickly recall how to call a function.

Class `DiseaseNetworkData`

class DiseaseNetworkData(  
    study_design: str = 'cohort',  
    phecode_level: int = 1,  
    min_required_icd_codes: int = 1,  
    date_fmt: str = '%Y-%m-%d',  
    phecode_version: str = '1.2'  
)  

A class for creating and operating on disease network data in the DiNetxify module.

Parameters:

study_design (str): Specify the type of study design, either “cohort”, “matched cohort”, or “exposed-only cohort”. Defaults to 'cohort'.
phecode_level (int): The level of phecode to use for analysis, where level 1 (with a total of 585 medical conditions) corresponds to 3-digit ICD-10 codes and level 2 (with a total of 1257 medical conditions) to 4-digit ICD-10 codes. Level 2 phecodes offer a more granular analysis with potentially smaller sample sizes per disease category. For larger studies, level 2 phecodes may enhance result interpretation. For smaller studies, level 1 is recommended to maintain statistical power. Defaults to 1.
min_required_icd_codes (int): The minimum number of ICD codes mapping to a specific phecode required for the phecode to be considered valid. For example, if set to 2, a single diagnosis record will not be sufficient to count as an occurrence. Ensure that your medical records are complete (i.e., not limited to only the first occurrence for each code) when using this parameter. Defaults to 1.
date_fmt (str): The format of the date fields in your phenotype and medical record data. Defaults to '%Y-%m-%d'.
phecode_version (str): The version of the phecode system used for converting diagnosis codes. Version 1.2 is the official version of the phecode system, with mapping files available for ICD-9-CM, ICD-9-WHO, ICD-10-CM, and ICD-10-WHO codes. While option 1.3a is provided, it’s an unofficial version and not recommended for general use. Defaults to '1.2'.
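
To illustrate how min_required_icd_codes and date_fmt work together, the following minimal sketch counts records per participant and phecode and keeps only the combinations that meet the threshold. The record tuples, the phecode value, and the helper name `valid_phecodes` are hypothetical, and the ICD-to-phecode mapping that the class performs is skipped here.

```python
from collections import Counter
from datetime import datetime

DATE_FMT = "%Y-%m-%d"          # same syntax as the date_fmt parameter
MIN_REQUIRED_ICD_CODES = 2     # illustrative value

# Hypothetical records: (participant ID, phecode, diagnosis date string)
records = [
    ("P1", "250", "2015-03-01"),
    ("P1", "250", "2016-07-15"),
    ("P2", "250", "2014-01-20"),
]

def valid_phecodes(records, min_required, date_fmt):
    """Return the earliest date of each (participant, phecode) with enough records."""
    counts = Counter()
    first_date = {}
    for pid, phecode, date_str in records:
        date = datetime.strptime(date_str, date_fmt)  # raises ValueError on a format mismatch
        key = (pid, phecode)
        counts[key] += 1
        if key not in first_date or date < first_date[key]:
            first_date[key] = date
    return {key: first_date[key] for key, n in counts.items() if n >= min_required}

result = valid_phecodes(records, MIN_REQUIRED_ICD_CODES, DATE_FMT)
```

With a threshold of 2, participant P2's single record does not count as an occurrence, which is why complete medical records matter when raising this parameter.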

Instance Methods

`phenotype_data`

phenotype_data(
    self,
    phenotype_data_path: str,
    column_names: dict,
    covariates: list,
    is_single_sex: bool = False,
    force: bool = False
) -> None

Load phenotype data into the object.

Parameters:

phenotype_data_path (str): Path to CSV/TSV phenotype file with header row.
column_names (dict): Mapping of dataset column names. Required keys: 'Participant ID', 'Index date', 'End date', 'Exposure', 'Sex', 'Match ID'.
covariates (list): List of additional covariate names (e.g., ['age', 'BMI']).
is_single_sex (bool): True if dataset contains only one sex. Defaults to False.
force (bool): If True, overwrite existing data attributes. Defaults to False.
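
Before calling phenotype_data, it can help to build and check the column_names mapping. The sketch below is only an illustration: the dataset column names on the right-hand side (`eid`, `date_start`, and so on) are hypothetical, and the check uses the required keys exactly as listed above.

```python
# Keys expected by column_names, as listed in this reference.
REQUIRED_KEYS = ["Participant ID", "Index date", "End date", "Exposure", "Sex", "Match ID"]

# Hypothetical mapping from required keys to the columns of your own file.
column_names = {
    "Participant ID": "eid",
    "Index date": "date_start",
    "End date": "date_end",
    "Exposure": "exposure",
    "Sex": "sex",
    "Match ID": "group_id",
}
covariates = ["age", "BMI"]

# Keys that are still missing from the mapping.
missing = [key for key in REQUIRED_KEYS if key not in column_names]

# Covariates that accidentally reuse a column already mapped to a required key.
overlap = [name for name in covariates if name in column_names.values()]
```

Both lists being empty suggests the mapping is complete and the covariates do not collide with the required columns.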

Returns:

None

`Table1`

Table1(
    self,
    continuous_stat_mode: str = 'auto'
) -> pd.DataFrame

Generate a descriptive summary table of phenotype data.

Parameters:

continuous_stat_mode (str): Method for continuous variable statistics. Defaults to 'auto'. Choices:
- auto: Automatic normality-based choice.
- normal: Mean and standard deviation.
- nonnormal: Median and interquartile range.

Returns:

pd.DataFrame

`merge_medical_records`

merge_medical_records(
    self,
    medical_records_data_path: str,
    diagnosis_code: str,
    column_names: dict,
    date_fmt: str = None,
    chunksize: int = 1000000
) -> None

Load one or more medical record datasets.

Parameters:

medical_records_data_path (str): Path to CSV/TSV medical record file.
diagnosis_code (str): Code type: 'ICD-9-CM', 'ICD-9-WHO', 'ICD-10-CM', or 'ICD-10-WHO'.
column_names (dict): Mapping for dataset columns. Required keys: 'Participant ID', 'Diagnosis code', 'Date of diagnosis'.
date_fmt (str): Date format of the medical records. If None, the date format of the phenotype data is used. Defaults to None.
chunksize (int): Rows per chunk for large files. Defaults to 1000000.

Returns:

None
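
The chunksize parameter bounds how many rows are held in memory at once. How the library reads the file internally is not described here, so the following is only a minimal standard-library sketch of chunked reading; the helper name `iter_chunks` and the sample columns are hypothetical.

```python
import csv
import io
from itertools import islice

def iter_chunks(handle, chunksize):
    """Yield lists of at most `chunksize` rows from a CSV file handle."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            break
        yield chunk

# Small in-memory stand-in for a medical records file.
sample = io.StringIO(
    "eid,icd10,date\n"
    "P1,E11,2015-03-01\n"
    "P1,I10,2016-07-15\n"
    "P2,E11,2014-01-20\n"
)
sizes = [len(chunk) for chunk in iter_chunks(sample, chunksize=2)]
```

A smaller chunksize lowers peak memory use at the cost of more read iterations.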

`get_attribute`

get_attribute(
    self,
    attr_name: str
) -> Any

Retrieve the value of a private or protected attribute.

Parameters:

attr_name (str): Name of the attribute to retrieve.

Returns:

Attribute value (any)

`medical_records_to_dataframe`

medical_records_to_dataframe(
    self,
    phecode_list: list,
    medical_history: bool = False
) -> pd.DataFrame

Convert stored medical records into a tidy pandas DataFrame.

Parameters:

phecode_list (list): List of phecodes to extract from the medical record. Only phecodes valid for the current phecode_level are accepted.
medical_history (bool): If True, include a binary history column for each phecode. Defaults to False.

Returns:

pd.DataFrame

`modify_phecode_level`

modify_phecode_level(
    self,
    phecode_level: int
) -> None

Update the phecode level setting.

Parameters:

phecode_level (int): New phecode level (1 or 2).

Returns:

None

`disease_pair`

disease_pair(
    self,
    phewas_result: pd.DataFrame,
    min_interval_days: int = 0,
    max_interval_days: float = float('inf'),
    force: bool = False,
    n_process: int = 1,
    **kwargs
) -> None

Construct temporal and non-temporal disease pairs.

Parameters:

phewas_result (pd.DataFrame): DataFrame from phewas().
min_interval_days (int): Minimum days between diagnoses. Defaults to 0.
max_interval_days (float): Maximum days between diagnoses. Defaults to inf.
force (bool): Overwrite existing data. Defaults to False.
n_process (int): Number of parallel processes. Defaults to 1.
**kwargs: Additional mappings:
- phecode_col (str): Column for phecode. Defaults to 'phecode'.
- significance_col (str): Column for significance. Defaults to 'phewas_p_significance'.

Returns:

None
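
The interval parameters decide whether two diagnoses form a temporal or a non-temporal pair. The sketch below shows one plausible reading, in which a gap at or below min_interval_days, or above max_interval_days, yields a non-temporal pair. The boundary handling and the helper name `classify_pair` are assumptions, not the library's confirmed behavior.

```python
from datetime import date

def classify_pair(d1_date, d2_date, min_interval_days=0, max_interval_days=float("inf")):
    """Classify a D1/D2 diagnosis pair for one individual (assumed boundary rules)."""
    gap = abs((d2_date - d1_date).days)
    # Assumption: gaps outside (min_interval_days, max_interval_days] are non-temporal.
    if gap <= min_interval_days or gap > max_interval_days:
        return "non-temporal"
    return "D1->D2" if d1_date < d2_date else "D2->D1"

same_day = classify_pair(date(2015, 1, 1), date(2015, 1, 1))
forward = classify_pair(date(2015, 1, 1), date(2016, 1, 1))
backward = classify_pair(date(2016, 1, 1), date(2015, 1, 1))
too_far = classify_pair(date(2015, 1, 1), date(2016, 1, 1), max_interval_days=30)
```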

`save`

save(
    self,
    file: str
) -> None

Save object state to a gzip-compressed pickle file (.pkl.gz).

Parameters:

file (str): Filename or prefix (adds .pkl.gz).

Returns:

None

`load`

load(
    self,
    file: str,
    force: bool = False
) -> None

Load object state from a gzip-compressed pickle file.

Parameters:

file (str): Filename or prefix (adds .pkl.gz).
force (bool): Overwrite if True. Defaults to False.

Returns:

None
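
The save and load methods describe a gzip-compressed pickle format with a .pkl.gz suffix added to the prefix. The following is a minimal standard-library sketch of that round trip, not the class's own code; the helper names `save_state` and `load_state` and the stored dictionary are hypothetical.

```python
import gzip
import os
import pickle
import tempfile

SUFFIX = ".pkl.gz"

def save_state(state, file):
    """Write `state` to a gzip-compressed pickle, appending the suffix if missing."""
    path = file if file.endswith(SUFFIX) else file + SUFFIX
    with gzip.open(path, "wb") as handle:
        pickle.dump(state, handle)
    return path

def load_state(file):
    """Read a gzip-compressed pickle written by save_state."""
    path = file if file.endswith(SUFFIX) else file + SUFFIX
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)

with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "dnd_example")
    saved_name = os.path.basename(save_state({"phecode_level": 1}, prefix))
    restored = load_state(prefix)
```

As with any pickle-based format, only load files from sources you trust, because unpickling can execute arbitrary code.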

`save_npz`

save_npz(
    self,
    file: str
) -> None

Save object state to a NumPy .npz file.

Parameters:

file (str): Filename or prefix (adds .npz).

Returns:

None

`load_npz`

load_npz(
    self,
    file: str,
    force: bool = False
) -> None

Load object state from a NumPy .npz file.

Parameters:

file (str): Filename or prefix (adds .npz).
force (bool): Overwrite if True. Defaults to False.

Returns:

None

Analysis Functions

Function: `disease_network_pipeline`

disease_network_pipeline(
    data: DiseaseNetworkData,
    n_process: int,
    n_threshold_phewas: int,
    n_threshold_comorbidity: int,
    output_dir: str,
    project_prefix: str,
    keep_positive_associations: bool = False,
    save_intermediate_data: bool = False,
    system_exl: list = None,
    pipeline_mode: str = 'v1',
    method: str = 'RPCN',
    covariates: list = None,
    matching_var_dict: dict = {'sex':'exact'},
    matching_n: int = 2,
    min_interval_days: int = 0,
    max_interval_days: float = float('inf'),
    enforce_temporal_order: bool = False,
    correction: str = 'bonferroni',
    cutoff: float = 0.05,
    **kwargs
) -> dict

Parameters:

data (DiseaseNetworkData): The DiseaseNetworkData object.
n_process (int): Number of parallel processes to use. Required.
n_threshold_phewas (int): Minimum cases in exposed group for PheWAS inclusion. Passed to phewas().
n_threshold_comorbidity (int): Minimum co-occurrences for comorbidity strength. Passed to comorbidity_strength().
output_dir (str): Directory path for pipeline outputs.
project_prefix (str): Prefix for naming outputs.
keep_positive_associations (bool): Retain only positive associations. Defaults to False.
save_intermediate_data (bool): Save intermediate data objects. Defaults to False.
system_exl (list): Phecode systems to exclude. Defaults to None.
pipeline_mode (str): Analysis order mode ('v1' or 'v2'). Defaults to 'v1'.
method (str): Comorbidity network / trajectory method ('RPCN', 'PCN_PCA', 'CN'). Defaults to 'RPCN'.
covariates (list): Covariates for models. Defaults to None.
matching_var_dict (dict): Matching variables and criteria. Defaults to {'sex':'exact'}.
matching_n (int): Number of matched controls per case. Defaults to 2.
min_interval_days (int): Minimum days between diagnoses. Defaults to 0.
max_interval_days (float): Maximum days between diagnoses. Defaults to inf.
enforce_temporal_order (bool): Enforce temporal order in testing. Defaults to False.
correction (str): p-value correction method. Defaults to 'bonferroni'.
cutoff (float): Significance threshold. Defaults to 0.05.
**kwargs:
- alpha (float): L1 penalty weight. Defaults per method.
- auto_penalty (bool): Auto-select alpha. Defaults to True.
- alpha_range (tuple): Search range for alpha. Defaults to (1,15).
- scaling_factor (float): Scaling factor for alpha. Defaults to 1.
- n_PC (int): Number of principal components. Defaults to 5.
- explained_variance (float): Variance threshold for PCs.

Returns:

dict: Summary of significant results count.
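
A pipeline call can be prepared as a plain dictionary of keyword arguments. The sketch below only assembles and sanity-checks the arguments. All values, the output directory, and the assumption that the function is exposed as `DiNetxify.disease_network_pipeline` are illustrative and should be checked against your installation.

```python
# Keyword arguments named after the signature above; all values are illustrative.
pipeline_args = {
    "n_process": 2,
    "n_threshold_phewas": 100,
    "n_threshold_comorbidity": 100,
    "output_dir": "./results",     # hypothetical path
    "project_prefix": "example",   # hypothetical prefix
    "method": "RPCN",
    "correction": "bonferroni",
    "cutoff": 0.05,
}

# Arguments without a default in the signature (besides `data`).
required = {"n_process", "n_threshold_phewas", "n_threshold_comorbidity",
            "output_dir", "project_prefix"}
missing = sorted(required - pipeline_args.keys())

# Check the method against the documented choices.
method_ok = pipeline_args["method"] in {"RPCN", "PCN_PCA", "CN"}

# With a prepared DiseaseNetworkData object named `data` (not built here),
# the call would look like this, assuming a top-level export:
# import DiNetxify
# summary = DiNetxify.disease_network_pipeline(data=data, **pipeline_args)
```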

Function: `phewas`

phewas(
    data: DiseaseNetworkData,
    covariates: list = None,
    proportion_threshold: float = None,
    n_threshold: int = None,
    n_process: int = 1,
    correction: str = 'bonferroni',
    cutoff: float = 0.05,
    system_inc: list = None,
    system_exl: list = None,
    phecode_inc: list = None,
    phecode_exl: list = None,
    log_file: str = None,
    lifelines_disable: bool = False
) -> pd.DataFrame

Parameters:

data (DiseaseNetworkData): Input data object.
covariates (list): Phenotypic covariates. Defaults to None.
proportion_threshold (float): Minimum proportion of cases. Mutually exclusive with n_threshold. Defaults to None.
n_threshold (int): Minimum case count. Mutually exclusive with proportion_threshold. Defaults to None.
n_process (int): Parallel processes. Defaults to 1.
correction (str): p-value correction method. Defaults to 'bonferroni'.
cutoff (float): Significance threshold. Defaults to 0.05.
system_inc (list): Systems to include. Defaults to None.
system_exl (list): Systems to exclude. Defaults to None.
phecode_inc (list): Specific phecodes to include. Defaults to None.
phecode_exl (list): Specific phecodes to exclude. Defaults to None.
log_file (str): Log file prefix. Defaults to None.
lifelines_disable (bool): Disable lifelines. Defaults to False.

Returns:

pd.DataFrame: PheWAS results.
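
Because proportion_threshold and n_threshold are mutually exclusive, only one of them should be passed. The following sketch shows one way such a filter could behave. The helper name `passes_threshold`, the use of `>=` at the boundary, and the behavior when neither threshold is set are assumptions for illustration.

```python
def passes_threshold(n_cases, n_exposed, proportion_threshold=None, n_threshold=None):
    """Decide whether a disease has enough cases in the exposed group (assumed rules)."""
    if proportion_threshold is not None and n_threshold is not None:
        raise ValueError("proportion_threshold and n_threshold are mutually exclusive")
    if proportion_threshold is not None:
        return n_cases / n_exposed >= proportion_threshold
    if n_threshold is not None:
        return n_cases >= n_threshold
    return True  # assumption: no filtering when neither threshold is given

by_proportion = passes_threshold(60, 1000, proportion_threshold=0.05)
by_count = passes_threshold(40, 1000, n_threshold=50)
```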

Function: `phewas_multipletests`

phewas_multipletests(
    df: pd.DataFrame,
    correction: str = 'bonferroni',
    cutoff: float = 0.05
) -> pd.DataFrame

Parameters:

df (pd.DataFrame): Input results from phewas().
correction (str): p-value correction method. Defaults to 'bonferroni'.
cutoff (float): Significance threshold. Defaults to 0.05.

Returns:

pd.DataFrame: Adjusted results.

Function: `comorbidity_strength`

comorbidity_strength(
    data: DiseaseNetworkData,
    proportion_threshold: float = None,
    n_threshold: int = None,
    n_process: int = 1,
    log_file: str = None,
    correction_phi: str = 'bonferroni',
    cutoff_phi: float = 0.05,
    correction_RR: str = 'bonferroni',
    cutoff_RR: float = 0.05
) -> pd.DataFrame

Parameters:

data (DiseaseNetworkData): DiseaseNetworkData object.
proportion_threshold (float): The minimum proportion of individuals in the exposed group in which a disease pair must co-occur (temporal or non-temporal) to be included in the comorbidity strength estimation. If the proportion of co-occurrence is below this threshold, the disease pair is excluded from the analysis. proportion_threshold and n_threshold are mutually exclusive.
n_threshold (int): The minimum number of individuals in the exposed group in which a disease pair must co-occur (temporal or non-temporal) to be included in the comorbidity strength estimation. If the number of co-occurrences is below this threshold, the disease pair is excluded from the analysis. n_threshold and proportion_threshold are mutually exclusive.
n_process (int, default=1): Specifies the number of parallel processes to use for the analysis. Multiprocessing is enabled when n_process is set to a value greater than one.
correction_phi (str, default='bonferroni'): Method for phi-correlation p-value correction, as implemented in statsmodels.stats.multitest.multipletests.
- Available methods are:
  - none : no correction
  - bonferroni : one-step correction
  - sidak : one-step correction
  - holm-sidak : step down method using Sidak adjustments
  - holm : step-down method using Bonferroni adjustments
  - simes-hochberg : step-up method (independent)
  - hommel : closed method based on Simes tests (non-negative)
  - fdr_bh : Benjamini/Hochberg (non-negative)
  - fdr_by : Benjamini/Yekutieli (negative)
  - fdr_tsbh : two stage fdr correction (non-negative)
  - fdr_tsbky : two stage fdr correction (non-negative)
- See https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html for more details.
cutoff_phi (float, default=0.05): The significance threshold for adjusted phi-correlation p-values.
correction_RR (str, default='bonferroni'): Method for RR p-value correction, as implemented in statsmodels.stats.multitest.multipletests.
- Available methods are:
  - none : no correction
  - bonferroni : one-step correction
  - sidak : one-step correction
  - holm-sidak : step down method using Sidak adjustments
  - holm : step-down method using Bonferroni adjustments
  - simes-hochberg : step-up method (independent)
  - hommel : closed method based on Simes tests (non-negative)
  - fdr_bh : Benjamini/Hochberg (non-negative)
  - fdr_by : Benjamini/Yekutieli (negative)
  - fdr_tsbh : two stage fdr correction (non-negative)
  - fdr_tsbky : two stage fdr correction (non-negative)
- See https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html for more details.
cutoff_RR (float, default=0.05): The significance threshold for adjusted RR p-values.
log_file (str, default=None): Path and prefix for the text file where log will be recorded. If None, the log will be written to the temporary files directory with file prefix of DiseaseNet_com_strength_.
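
To make the correction options more concrete, here is a pure-Python sketch of two of the listed methods, bonferroni and fdr_bh. It is an illustration of the arithmetic only; in practice the adjustment is done by statsmodels.stats.multitest.multipletests. The p-values are made up, and comparing adjusted p-values to the cutoff with `<=` is an assumption.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def fdr_bh(p_values):
    """Benjamini/Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.20]   # illustrative raw p-values
cutoff = 0.05
adj_bonf = bonferroni(p_values)
adj_bh = fdr_bh(p_values)
significant_bonf = [p <= cutoff for p in adj_bonf]
significant_bh = [p <= cutoff for p in adj_bh]
```

In this toy input only the smallest p-value stays significant under either method, which shows how the cutoff is applied to adjusted rather than raw p-values.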

Function: `comorbidity_strength_multipletests`

comorbidity_strength_multipletests(
    df: pd.DataFrame,
    correction_phi: str = 'bonferroni',
    cutoff_phi: float = 0.05,
    correction_RR: str = 'bonferroni',
    cutoff_RR: float = 0.05
) -> pd.DataFrame

Parameters:

df (pd.DataFrame): DataFrame containing the results from the comorbidity_strength function.
correction_phi (str, default='bonferroni'): Method for phi-correlation p-value correction, as implemented in statsmodels.stats.multitest.multipletests.
- Available methods are:
  - none : no correction
  - bonferroni : one-step correction
  - sidak : one-step correction
  - holm-sidak : step down method using Sidak adjustments
  - holm : step-down method using Bonferroni adjustments
  - simes-hochberg : step-up method (independent)
  - hommel : closed method based on Simes tests (non-negative)
  - fdr_bh : Benjamini/Hochberg (non-negative)
  - fdr_by : Benjamini/Yekutieli (negative)
  - fdr_tsbh : two stage fdr correction (non-negative)
  - fdr_tsbky : two stage fdr correction (non-negative)
- See https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html for more details.
cutoff_phi (float, default=0.05): The significance threshold for adjusted phi-correlation p-values.
correction_RR (str, default='bonferroni'): Method for RR p-value correction, as implemented in statsmodels.stats.multitest.multipletests.
- Available methods are:
  - none : no correction
  - bonferroni : one-step correction
  - sidak : one-step correction
  - holm-sidak : step down method using Sidak adjustments
  - holm : step-down method using Bonferroni adjustments
  - simes-hochberg : step-up method (independent)
  - hommel : closed method based on Simes tests (non-negative)
  - fdr_bh : Benjamini/Hochberg (non-negative)
  - fdr_by : Benjamini/Yekutieli (negative)
  - fdr_tsbh : two stage fdr correction (non-negative)
  - fdr_tsbky : two stage fdr correction (non-negative)
- See https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html for more details.
cutoff_RR (float, default=0.05): The significance threshold for adjusted RR p-values.

Function: `binomial_test`

binomial_test(
    data: DiseaseNetworkData,
    comorbidity_strength_result: pd.DataFrame,
    comorbidity_network_result: pd.DataFrame = None,
    n_process: int = 1,
    log_file: str = None,
    correction: str = 'bonferroni',
    cutoff: float = 0.05,
    enforce_temporal_order: bool = False,
    **kwargs
) -> pd.DataFrame

Parameters:

data (DiseaseNetworkData): DiseaseNetworkData object.
comorbidity_strength_result (pd.DataFrame): DataFrame containing comorbidity strength analysis results produced by the ‘DiNetxify.comorbidity_strength’ function.
comorbidity_network_result (pd.DataFrame, default=None): DataFrame containing comorbidity network analysis results produced by the ‘DiNetxify.comorbidity_network’ function. When provided, the binomial test is limited to disease pairs deemed significant in the comorbidity network analysis.
n_process (int, default=1): Multiprocessing is disabled for this analysis.
correction (str, default='bonferroni'): Method for binomial p-value correction, as implemented in statsmodels.stats.multitest.multipletests.
- Available methods are:
  - none : no correction
  - bonferroni : one-step correction
  - sidak : one-step correction
  - holm-sidak : step down method using Sidak adjustments
  - holm : step-down method using Bonferroni adjustments
  - simes-hochberg : step-up method (independent)
  - hommel : closed method based on Simes tests (non-negative)
  - fdr_bh : Benjamini/Hochberg (non-negative)
  - fdr_by : Benjamini/Yekutieli (negative)
  - fdr_tsbh : two stage fdr correction (non-negative)
  - fdr_tsbky : two stage fdr correction (non-negative)
- See https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html for more details.
cutoff (float, default=0.05): The significance threshold for adjusted binomial p-values.
log_file (str, default=None): Path and prefix for the text file where log will be recorded. If None, the log will be written to the temporary files directory with file prefix of DiseaseNet_binomial_test_.
enforce_temporal_order (bool, default=False): If True, exclude individuals with non-temporal D1-D2 pair when performing the test. If False, include all individuals, including those with non-temporal D1-D2 pair.
**kwargs:
- phecode_d1_col (str, default='phecode_d1'): Name of the column in comorbidity_strength_result and comorbidity_network_result that specifies the phecode identifiers for disease 1 of the disease pair.
- phecode_d2_col (str, default='phecode_d2'): Name of the column in comorbidity_strength_result and comorbidity_network_result that specifies the phecode identifiers for disease 2 of the disease pair.
- n_nontemporal_col (str, default='n_d1d2_nontemporal'): Name of the column in comorbidity_strength_result that specifies the number of individuals with non-temporal d1-d2 disease pair.
- n_temporal_d1d2_col (str, default='n_d1d2_temporal'): Name of the column in comorbidity_strength_result that specifies the number of individuals with temporal d1->d2 disease pair.
- n_temporal_d2d1_col (str, default='n_d2d1_temporal'): Name of the column in comorbidity_strength_result that specifies the number of individuals with temporal d2->d1 disease pair.
- significance_phi_col (str, default='phi_p_significance'): Name of the column in comorbidity_strength_result that indicates the significance of phi-correlation for each disease pair.
- significance_RR_col (str, default='RR_p_significance'): Name of the column in comorbidity_strength_result that indicates the significance of RR for each disease pair.
- significance_coef_col (str, default='comorbidity_p_significance'): Name of the column in comorbidity_network_result that indicates the significance of comorbidity network analysis for each disease pair.
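
As a rough illustration of what a binomial test on temporal pair counts can look like, the sketch below computes an exact two-sided p-value for the number of D1→D2 pairs among all temporal pairs, against a null probability of 0.5. This test setup, the two-sided alternative, and the counts are assumptions for illustration; the function's actual procedure is not spelled out in this reference.

```python
from math import comb

def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))

n_d1d2_temporal = 8   # hypothetical count of D1 -> D2 pairs
n_d2d1_temporal = 2   # hypothetical count of D2 -> D1 pairs
p_value = binomial_two_sided(n_d1d2_temporal, n_d1d2_temporal + n_d2d1_temporal)
```

A small p-value here would suggest that one temporal direction occurs more often than expected if both orders were equally likely.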

Function: `binomial_multipletests`

binomial_multipletests(
    df: pd.DataFrame,
    correction: str = 'bonferroni',
    cutoff: float = 0.05
) -> pd.DataFrame

Parameters:

df (pd.DataFrame): DataFrame containing the results from the binomial_test function.
correction (str, default='bonferroni'): Method for binomial p-value correction, as implemented in statsmodels.stats.multitest.multipletests.
- Available methods are:
  - none : no correction
  - bonferroni : one-step correction
  - sidak : one-step correction
  - holm-sidak : step down method using Sidak adjustments
  - holm : step-down method using Bonferroni adjustments
  - simes-hochberg : step-up method (independent)
  - hommel : closed method based on Simes tests (non-negative)
  - fdr_bh : Benjamini/Hochberg (non-negative)
  - fdr_by : Benjamini/Yekutieli (negative)
  - fdr_tsbh : two stage fdr correction (non-negative)
  - fdr_tsbky : two stage fdr correction (non-negative)
- See https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html for more details.
cutoff (float, default=0.05): The significance threshold for adjusted binomial p-values.

Function: `comorbidity_network`

comorbidity_network(
    data: DiseaseNetworkData,
    comorbidity_strength_result: pd.DataFrame,
    binomial_test_result: pd.DataFrame = None,
    method: str = 'RPCN',
    covariates: list = None,
    n_process: int = 1,
    log_file: str = None,
    correction: str = 'bonferroni',
    cutoff: float = 0.05,
    **kwargs
) -> pd.DataFrame

Parameters:

data (DiseaseNetworkData): DiseaseNetworkData object.
comorbidity_strength_result (pd.DataFrame): DataFrame containing comorbidity strength analysis results produced by the ‘DiNetxify.comorbidity_strength’ function.
binomial_test_result (pd.DataFrame, default=None): DataFrame containing binomial test analysis results produced by the DiNetxify.binomial_test function.
method (str, default='RPCN'): Specifies the comorbidity network analysis method to use. Choices are:
- 'RPCN': Regularized Partial Correlation Network.
- 'PCN_PCA': Partial Correlation Network with PCA.
- 'CN': Correlation Network.
- Additional options for RPCN:
  - alpha (non-negative scalar): The weight multiplying the L1 penalty term for other disease covariates. Ignored if auto_penalty is enabled.
  - auto_penalty (bool, default=True): If True, automatically determine the optimal alpha based on model AIC value.
  - alpha_range (tuple, default=(1,15)): When auto_penalty is True, search for the optimal alpha in this range.
  - scaling_factor (positive scalar, default=1): The scaling factor for alpha when auto_penalty is True.
- Additional options for PCN_PCA:
  - n_PC (int, default=5): Fixed number of principal components to include in each model.
  - explained_variance (float): Determines the number of principal components based on the cumulative explained variance. Overrides n_PC if specified.
covariates (list, default=None): List of phenotypic covariates to include in the model. By default, includes [‘sex’] and all covariates specified in the DiNetxify.DiseaseNetworkData.phenotype_data() function. To include the required variable sex as a covariate, always use ‘sex’ instead of its original column name. For other covariates specified in the DiNetxify.DiseaseNetworkData.phenotype_data() function, use their original column names.
n_process (int, default=1): Specifies the number of parallel processes to use for the analysis. Multiprocessing is enabled when n_process is set to a value greater than one.
correction (str, default='bonferroni'): Method for comorbidity network analysis p-value correction, as implemented in statsmodels.stats.multitest.multipletests.
- Available methods are:
  - none : no correction
  - bonferroni : one-step correction
  - sidak : one-step correction
  - holm-sidak : step down method using Sidak adjustments
  - holm : step-down method using Bonferroni adjustments
  - simes-hochberg : step-up method (independent)
  - hommel : closed method based on Simes tests (non-negative)
  - fdr_bh : Benjamini/Hochberg (non-negative)
  - fdr_by : Benjamini/Yekutieli (negative)
  - fdr_tsbh : two stage fdr correction (non-negative)
  - fdr_tsbky : two stage fdr correction (non-negative)
- See https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html for more details.
cutoff (float, default=0.05): The significance threshold for adjusted comorbidity network analysis p-values.
log_file (str, default=None): Path and prefix for the text file where log will be recorded. If None, the log will be written to the temporary files directory with file prefix of DiseaseNet_comorbidity_network_.
**kwargs:
- phecode_d1_col (str, default='phecode_d1'): Name of the column in comorbidity_strength_result and binomial_test_result that specifies the phecode identifiers for disease 1 of the disease pair.
- phecode_d2_col (str, default='phecode_d2'): Name of the column in comorbidity_strength_result and binomial_test_result that specifies the phecode identifiers for disease 2 of the disease pair.
- significance_phi_col (str, default='phi_p_significance'): Name of the column in comorbidity_strength_result that indicates the significance of phi-correlation for each disease pair.
- significance_RR_col (str, default='RR_p_significance'): Name of the column in comorbidity_strength_result that indicates the significance of RR for each disease pair.
- significance_binomial_col (str, default='binomial_p_significance'): Name of the column in binomial_test_result that indicates the significance of binomial test for each disease pair.
- alpha (non-negative scalar): The weight multiplying the L1 penalty term for other disease covariates. Ignored if auto_penalty is enabled.
- auto_penalty (bool, default=True): If True, automatically determines the best alpha based on model AIC value.
- alpha_range (tuple, default=(1,15)): When auto_penalty is True, search for the optimal alpha in this range.
- scaling_factor (positive scalar, default=1): The scaling factor for alpha when auto_penalty is True.
- n_PC (int, default=5): Fixed number of principal components to include in each model.
- explained_variance (float): Cumulative explained variance threshold to determine the number of principal components. Overrides n_PC if specified.
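
The relationship between n_PC and explained_variance can be sketched as follows. This assumes that explained_variance selects the smallest number of components whose cumulative explained variance reaches the threshold; the helper name `n_components` and the variance ratios are hypothetical.

```python
def n_components(explained_variance_ratio, n_PC=5, explained_variance=None):
    """Pick the number of principal components (assumed selection rule)."""
    if explained_variance is None:
        # Fixed number of components, limited by how many are available.
        return min(n_PC, len(explained_variance_ratio))
    cumulative = 0.0
    for count, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        if cumulative >= explained_variance:
            return count  # explained_variance overrides n_PC
    return len(explained_variance_ratio)

ratios = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]   # illustrative, sorted from largest to smallest
fixed = n_components(ratios)
by_variance = n_components(ratios, explained_variance=0.75)
```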

Function: `comorbidity_multipletests`

comorbidity_multipletests(
    df: pd.DataFrame,
    correction: str = 'bonferroni',
    cutoff: float = 0.05
) -> pd.DataFrame

Parameters:

df (pd.DataFrame): DataFrame containing the results from the ‘comorbidity_network’ function.
correction (str, default='bonferroni'): Method for comorbidity network analysis p-value correction, as implemented in statsmodels.stats.multitest.multipletests.
- Available methods are:
  - none : no correction
  - bonferroni : one-step correction
  - sidak : one-step correction
  - holm-sidak : step down method using Sidak adjustments
  - holm : step-down method using Bonferroni adjustments
  - simes-hochberg : step-up method (independent)
  - hommel : closed method based on Simes tests (non-negative)
  - fdr_bh : Benjamini/Hochberg (non-negative)
  - fdr_by : Benjamini/Yekutieli (negative)
  - fdr_tsbh : two stage fdr correction (non-negative)
  - fdr_tsbky : two stage fdr correction (non-negative)
- See https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html for more details.
cutoff (float, default=0.05): The significance threshold for adjusted comorbidity network analysis p-values.

Function: `disease_trajectory`

disease_trajectory(
    data: DiseaseNetworkData,
    comorbidity_strength_result: pd.DataFrame,
    binomial_test_result: pd.DataFrame,
    method: str = 'RPCN',
    matching_var_dict: dict = {'sex':'exact'},
    matching_n: int = 2,
    max_n_cases: float = np.inf,
    global_sampling: bool = False,
    covariates: list = None,
    n_process: int = 1,
    log_file: str = None,
    correction: str = 'bonferroni',
    cutoff: float = 0.05,
    **kwargs
) -> pd.DataFrame

Parameters:

data (DiseaseNetworkData): DiseaseNetworkData object.
comorbidity_strength_result (pd.DataFrame): DataFrame containing comorbidity strength analysis results produced by the DiNetxify.comorbidity_strength() function.
binomial_test_result (pd.DataFrame): DataFrame containing binomial test analysis results produced by the DiNetxify.binomial_test() function.
method (str, default=’RPCN’): Specifies the comorbidity network analysis method to use. Choices are:
- 'RPCN': Regularized Partial Correlation Network.
- 'PCN_PCA': Partial Correlation Network with PCA.
- 'CN': Correlation Network.
matching_var_dict (dict, default={‘sex’:’exact’}): Specifies the matching variables and the criteria used for incidence density sampling. For categorical and binary variables, the matching criteria should always be 'exact'. For continuous variables, provide a scalar greater than 0 as the matching criterion, indicating the maximum allowed difference when matching. To include the required variable sex as a matching variable, always use 'sex' instead of its original column name. For other covariates specified in the DiNetxify.DiseaseNetworkData.phenotype_data() function, use their original column names.
matching_n (int, default=2): Specifies the maximum number of matched controls for each case.
max_n_cases (float, default=np.inf): Specifies the maximum number of D2 cases to include in the analysis. If the number of D2 cases exceeds this value, a random sample of cases will be selected.
global_sampling (bool, default=False): Indicates whether to perform independent incidence density sampling for each D1→D2 pair (if False), or to perform a single incidence density sampling for all Dx→D2 pairs with separate regression models for each D1→D2 pair (if True). Global sampling is recommended when processing large datasets, though it might reduce result heterogeneity.
covariates (list, default=None): List of phenotypic covariates to include in the model. By default, includes all covariates specified in the DiNetxify.DiseaseNetworkData.phenotype_data() function. Categorical and binary variables used for matching should not be included as covariates. Continuous variables used for matching can be included as covariates, but caution is advised. To include the required variable sex as a covariate, always use sex instead of its original column name. For other covariates specified in the DiNetxify.DiseaseNetworkData.phenotype_data() function, use their original column names.
n_process (int, default=1): Specifies the number of parallel processes to use for the analysis. Multiprocessing is enabled when n_process is set to a value greater than one.
correction (str, default='bonferroni'): Method for trajectory analysis p-value correction, as implemented in statsmodels.stats.multitest.multipletests.
- Available methods are:
  - none : no correction
  - bonferroni : one-step correction
  - sidak : one-step correction
  - holm-sidak : step down method using Sidak adjustments
  - holm : step-down method using Bonferroni adjustments
  - simes-hochberg : step-up method (independent)
  - hommel : closed method based on Simes tests (non-negative)
  - fdr_bh : Benjamini/Hochberg (non-negative)
  - fdr_by : Benjamini/Yekutieli (negative)
  - fdr_tsbh : two stage fdr correction (non-negative)
  - fdr_tsbky : two stage fdr correction (non-negative)
- See https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html for more details.
cutoff (float, default=0.05): The significance threshold for adjusted trajectory analysis p-values.
log_file (str, default=None): Path and prefix for the text file where log will be recorded. If None, the log will be written to the temporary files directory with file prefix of DiseaseNet_trajectory_.
**kwargs: Analysis options
- enforce_time_interval (bool, default=True): If True, applies the specified minimum and maximum time intervals when determining the D2 outcome among individuals diagnosed with D1. Define these time interval requirements using the DiNetxify.DiseaseNetworkData.disease_pair() function.
- phecode_d1_col (str, default='phecode_d1'): Name of the column in comorbidity_strength_result and binomial_test_result that specifies the phecode identifiers for disease 1 of the disease pair.
- phecode_d2_col (str, default='phecode_d2'): Name of the column in comorbidity_strength_result and binomial_test_result that specifies the phecode identifiers for disease 2 of the disease pair.
- significance_phi_col (str, default='phi_p_significance'): Name of the column in comorbidity_strength_result that indicates the significance of the phi-correlation for each disease pair.
- significance_RR_col (str, default='RR_p_significance'): Name of the column in comorbidity_strength_result that indicates the significance of the RR for each disease pair.
- significance_binomial_col (str, default='binomial_p_significance'): Name of the column in binomial_test_result that indicates the significance of the binomial test for each disease pair.
- alpha (non-negative scalar): The weight multiplying the L1 penalty term for the other-disease covariates. Ignored if auto_penalty is enabled.
- auto_penalty (bool, default=True): If True, automatically determines the best alpha based on the model AIC value.
- alpha_range (tuple, default=(1,15)): When auto_penalty is True, the range in which to search for the optimal alpha.
- scaling_factor (positive scalar, default=1): The scaling factor for alpha when auto_penalty is True.
- n_PC (int, default=5): Fixed number of principal components to include in each model.
- explained_variance (float): Cumulative explained variance threshold used to determine the number of principal components. Overrides n_PC if specified.
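
To make the relationship between n_PC and explained_variance concrete, here is a minimal sketch of one common way a cumulative explained variance threshold can be turned into a component count. It is not taken from DiNetxify's source: the helper name n_components_for_threshold, the "smallest k that reaches the threshold" rule, and the variance ratios are all assumptions for illustration.

```python
from itertools import accumulate


def n_components_for_threshold(variance_ratios, threshold):
    """Hypothetical helper: smallest number of leading components whose
    cumulative explained variance ratio reaches `threshold`."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    for k, cumulative in enumerate(accumulate(variance_ratios), start=1):
        if cumulative >= threshold - 1e-12:  # tolerate floating-point round-off
            return k
    return len(variance_ratios)  # threshold never reached: keep every component


# Made-up explained variance ratios, sorted from largest to smallest.
ratios = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
n_selected = n_components_for_threshold(ratios, threshold=0.80)
print(f"components needed for 80% of the variance: {n_selected}")
```

Under this reading, a threshold adapts the number of components to the data, whereas n_PC fixes it in advance.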

Function: `trajectory_multipletests`

trajectory_multipletests(
    df: pd.DataFrame,
    correction: str = 'bonferroni',
    cutoff: float = 0.05
) -> pd.DataFrame

Parameters:

df (pd.DataFrame): DataFrame containing the results from the disease_trajectory function.
correction (str, default='bonferroni'): Method for trajectory analysis p-value correction, as implemented in statsmodels.stats.multitest.multipletests.
- Available methods are:
  - none : no correction
  - bonferroni : one-step correction
  - sidak : one-step correction
  - holm-sidak : step down method using Sidak adjustments
  - holm : step-down method using Bonferroni adjustments
  - simes-hochberg : step-up method (independent)
  - hommel : closed method based on Simes tests (non-negative)
  - fdr_bh : Benjamini/Hochberg (non-negative)
  - fdr_by : Benjamini/Yekutieli (negative)
  - fdr_tsbh : two stage fdr correction (non-negative)
  - fdr_tsbky : two stage fdr correction (non-negative)
- See https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html for more details.
cutoff (float, default=0.05): The significance threshold for adjusted trajectory analysis p-values.
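
As a rough illustration of what the correction and cutoff parameters control, the sketch below re-implements two of the listed methods (bonferroni and fdr_bh) in plain Python and flags adjusted p-values against a cutoff. It is a teaching aid only: the library delegates to statsmodels, the p-values are made up, and the strict "<" comparison against the cutoff is an assumption.

```python
def bonferroni(p_values):
    """One-step Bonferroni: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini/Hochberg step-up adjustment (the 'fdr_bh' method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.001, 0.02, 0.03, 0.5]  # made-up p-values for four disease pairs
cutoff = 0.05
for name, adjust in [("bonferroni", bonferroni), ("fdr_bh", benjamini_hochberg)]:
    adjusted = adjust(raw_p)
    flags = [p < cutoff for p in adjusted]  # strict '<' is an assumption
    print(name, [round(p, 4) for p in adjusted], flags)
```

With these inputs, Bonferroni leaves only the first pair below the cutoff, while fdr_bh keeps the first three, which is the usual trade-off between family-wise error control and false discovery rate control.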

Class `Plot`

class Plot(
    phewas_result: pd.DataFrame,
    comorbidity_result: pd.DataFrame,
    trajectory_result: pd.DataFrame,
    exposure_name: str = None,
    exposure_location: Tuple[float, float, float] = None,
    exposure_size: float = None,
    phecode_col: str = 'phecode',
    disease_col: str = 'disease',
    system_col: str = 'system',
    phewas_number_col: str = 'N_cases_exposed',
    phewas_coef_col: str = 'phewas_coef',
    phewas_se_col: str = 'phewas_se',
    source_col: str = 'phecode_d1',
    target_col: str = 'phecode_d2',
    disease_pair_col: str = 'name_disease_pair',
    comorbidity_beta_col: str = 'comorbidity_beta',
    trajectory_beta_col: str = 'trajectory_beta',
    phewas_significance_col: str = 'phewas_p_significance',
    comorbidity_significance_col: str = 'comorbidity_p_significance',
    trajectory_significance_col: str = 'trajectory_p_significance',
    **kwargs
)

A class for integrating and visualizing disease relationships from PheWAS, comorbidity network, and trajectory analyses.

Parameters:

phewas_result (pd.DataFrame): Result DataFrame from the PheWAS analysis. Includes phecodes, effect sizes (for example, hazard ratios), case counts, and disease system classifications.
comorbidity_result (pd.DataFrame): Result DataFrame from the comorbidity network analysis. Includes non-temporal disease pairs (D1, D2), association metrics (for example, beta coefficients and p-values), and a boolean significance flag.
trajectory_result (pd.DataFrame): Result DataFrame from the temporal trajectory analysis. Includes temporal disease pairs (source -> target), temporal association metrics (for example, beta coefficients and p-values), and a boolean significance flag.
exposure_name (str, optional): Identifier for the primary exposure. Set to None for exposed-only cohort studies.
exposure_location (Tuple[float, float, float], optional): Custom 3D coordinates (x, y, z) for the exposure node. Defaults to (0, 0, 0) if None. Ignored when exposure_name is None.
exposure_size (float, optional): Relative size scaling for the exposure node. Ignored when exposure_name is None.

The following parameters name the columns in the result DataFrames. If your DataFrames use the default column names, leave them unchanged.

phecode_col (str, optional): Column in phewas_result with phecodes, default 'phecode'.
disease_col (str, optional): Column in phewas_result with disease names, default 'disease'.
system_col (str, optional): Column in phewas_result with disease system labels, default 'system'.
phewas_number_col (str, optional): Column in phewas_result with case counts, default 'N_cases_exposed'.
phewas_coef_col (str, optional): Column in phewas_result with effect sizes, default 'phewas_coef'.
phewas_se_col (str, optional): Column in phewas_result with standard errors, default 'phewas_se'.
source_col (str, optional): Column in comorbidity_result and trajectory_result for source or antecedent diseases, default 'phecode_d1'.
target_col (str, optional): Column in comorbidity_result and trajectory_result for target or consequent diseases, default 'phecode_d2'.
disease_pair_col (str, optional): Column in comorbidity_result and trajectory_result with disease pair identifiers, default 'name_disease_pair'.
comorbidity_beta_col (str, optional): Column in comorbidity_result with effect sizes, default 'comorbidity_beta'.
trajectory_beta_col (str, optional): Column in trajectory_result with effect sizes, default 'trajectory_beta'.
phewas_significance_col (str, optional): Column in phewas_result used for significance filtering, default 'phewas_p_significance'.
comorbidity_significance_col (str, optional): Column in comorbidity_result used for significance filtering, default 'comorbidity_p_significance'.
trajectory_significance_col (str, optional): Column in trajectory_result used for significance filtering, default 'trajectory_p_significance'.
**kwargs
- SYSTEM (List[str], optional): Use with COLOR to assign colors by phecode system. If not provided, systems and their order are inferred from phewas_result. Default order:
  ['neoplasms', 'genitourinary', 'digestive', 'respiratory', 'infectious diseases', 'mental disorders', 'musculoskeletal', 'hematopoietic', 'dermatologic', 'circulatory system', 'neurological', 'endocrine/metabolic', 'sense organs', 'injuries & poisonings', 'congenital anomalies', 'symptoms', 'others']
- COLOR (List[str], optional): Colors corresponding to SYSTEM, one to one. Length of COLOR must be at least the length of SYSTEM. Supported formats include 'red', '#ED9A8D', and 'rgb(255, 0, 0)'. Default palette:
  ['#F46D5A', '#5DA5DA', '#5EBCD1', '#C1D37F', '#CE5A57', '#A5C5D9', '#F5B36D', '#7FCDBB', '#ED9A8D', '#94B447', '#8C564B', '#E7CB94', '#8C9EB2', '#E0E0E0', '#F1C40F', '#9B59B6', '#4ECDC4', '#6A5ACD']
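
The one-to-one pairing of SYSTEM and COLOR can be checked before constructing a Plot. The sketch below is a hypothetical pre-flight helper, not part of DiNetxify: the function name pair_system_colors is invented here, and how Plot pairs the two lists internally is assumed from the description above.

```python
def pair_system_colors(systems, colors):
    """Hypothetical check: pair each system with the color at the same index."""
    if len(colors) < len(systems):
        raise ValueError(
            f"need at least {len(systems)} colors, got {len(colors)}"
        )
    return dict(zip(systems, colors))  # any surplus colors are left unused


# A custom subset of systems, using the three color formats listed above.
SYSTEM = ["circulatory system", "neoplasms", "respiratory"]
COLOR = ["red", "#ED9A8D", "rgb(255, 0, 0)"]

system_colors = pair_system_colors(SYSTEM, COLOR)
for system, color in system_colors.items():
    print(f"{system}: {color}")

# The same lists would then be passed as keyword arguments, for example:
# Plot(phewas_result, comorbidity_result, trajectory_result, SYSTEM=SYSTEM, COLOR=COLOR)
```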

Instance Methods

`three_dimension_plot`

three_dimension_plot(
    self,
    path: str,
    max_radius: float = 180.0,
    min_radius: float = 35.0,
    line_color: str = 'black',
    line_width: float = 1.0,
    size_reduction: float = 0.5,
    cluster_reduction_ratio: float = 0.4,
    layer_distance: float = 40.0,
    layout_width: float = 900.0,
    layout_height: float = 900.0,
    font_style: str = 'Times New Roman',
    font_size: float = 15.0
) -> None

Generate and save a 3D interactive HTML visualization.

Parameters:

path: File path to save the HTML visualization
max_radius: Maximum radial distance for node placement (default: 180.0)
min_radius: Minimum radial distance for node placement (default: 35.0)
line_color: Color for trajectory lines (default: "black")
line_width: Width for trajectory lines (default: 1.0)
size_reduction: Scaling factor for node sizes (default: 0.5)
cluster_reduction_ratio: Cluster compression factor for layout (default: 0.4)
layer_distance: Vertical distance between layers (default: 40.0)
layout_width: Figure width in pixels (default: 900.0)
layout_height: Figure height in pixels (default: 900.0)
font_style: Font family for text elements (default: 'Times New Roman')
font_size: Base font size in points (default: 15.0)

`comorbidity_network_plot`

comorbidity_network_plot(
    self,
    path: str,
    max_radius: float = 180.0,
    min_radius: float = 35.0,
    size_reduction: float = 0.5,
    cluster_reduction_ratio: float = 0.4,
    line_width: float = 1.0,
    line_color: str = 'black',
    layer_distance: float = 40.0,
    font_style: str = 'Times New Roman'
) -> None

Generate and save a 2D HTML visualization of the comorbidity network.

Parameters:

path: Output file path for saving HTML visualization
max_radius: Maximum radial position for nodes (default: 180.0)
min_radius: Minimum radial position for nodes (default: 35.0)
size_reduction: Scaling factor for node sizes (default: 0.5)
cluster_reduction_ratio: Compression factor for cluster layout (default: 0.4)
line_width: Width of comorbidity lines (default: 1.0)
line_color: Color of comorbidity lines (default: "black")
layer_distance: Distance between concentric circles (default: 40.0)
font_style: Font family for text elements (default: "Times New Roman")

`trajectory_plot`

trajectory_plot(
    self,
    path: str,
    dpi: float=500
) -> None

Generate and save trajectory plots, one per cluster, as .png files.

Parameters:

path: Directory path to save output images
dpi: Image resolution in dots per inch for output files (default: 500)

`phewas_plot`

phewas_plot(
    self,
    path: str,
    system_font_size: float = 17,
    disease_font_size: float = 10,
    HR_max: float = 2,
    incident_number_max: int = None,
    is_exposure_only: bool = False,
    dpi: float = 200
) -> None

Create a circular (polar) bar plot of PheWAS results across disease systems. The outer ring shows individual diseases, inner segments group them by system, a color gradient encodes the hazard ratio, and text rotates automatically for readability.

Parameters:

path: Output file path for saving the plot
system_font_size: Font size for disease system or category labels (default: 17)
disease_font_size: Font size for disease labels (default: 10)
HR_max: Upper bound of the HR color scale. Values greater than or equal to this bound render as the same red; this affects color only (default: 2)
incident_number_max: Upper bound of the incident count color scale, used for exposed-only cohorts. Values greater than or equal to this bound render as the same red. If None, the scale is set automatically to the maximum observed count (default: None)
is_exposure_only: Flag for an exposed-only cohort. Set to True for an exposed-only cohort, or False for a standard or matched cohort (default: False)
dpi: Image resolution in dots per inch for output files (default: 200)
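
The effect of HR_max on coloring can be pictured as a simple cap on the hazard ratio. The sketch below is illustrative only and not the library's plotting code: the helper hr_for_color is hypothetical, and it assumes that phewas_coef holds a log hazard ratio, so the hazard ratio is its exponential.

```python
import math


def hr_for_color(phewas_coef, hr_max=2.0):
    """Hypothetical helper: hazard ratio used for coloring, capped at hr_max."""
    hazard_ratio = math.exp(phewas_coef)  # assumes the coefficient is a log hazard ratio
    return min(hazard_ratio, hr_max)


coefs = [0.10, 0.65, 1.20]  # made-up coefficients
capped = [hr_for_color(c, hr_max=2.0) for c in coefs]
for coef, value in zip(coefs, capped):
    print(f"coef={coef:.2f} -> HR for color scale={value:.3f}")
```

Any disease whose hazard ratio is at or above the cap maps to the same color, so lowering HR_max spreads the gradient over the more modest effects without changing the underlying results.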
