Supplementary Material for "Analysis of Two-Stage Rollout Designs with Clustering for Causal Inference under Network Interference"

Required Packages

joblib 1.4.2
pymetis 2023.1.1
numpy 1.26.3
pandas 2.2.0
matplotlib 3.8.2
networkx 3.2.1
scipy 1.12.0
seaborn 0.13.2
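
A quick way to compare an environment against the versions listed above is sketched below. This is not part of the repository: the helper name check_versions is hypothetical, and the sketch assumes each package is installed under the distribution name shown in the list.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
REQUIRED = {
    "joblib": "1.4.2",
    "pymetis": "2023.1.1",
    "numpy": "1.26.3",
    "pandas": "2.2.0",
    "matplotlib": "3.8.2",
    "networkx": "3.2.1",
    "scipy": "1.12.0",
    "seaborn": "0.13.2",
}


def check_versions(required):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for name, expected in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(REQUIRED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

A mismatch does not necessarily mean the code will fail; the list records the versions the experiments were run with.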

Code Demos

figures_and_tables.ipynb
- Reproduces the figures and tables from the paper
preparing_network_data.ipynb
- Demo on preparing network data for running experiments
running_experiments.ipynb
- Demo on running experiments (use it to gather the data behind the plots/tables in the paper)

Folders and Files

Amazon

Experiments: contains .json files with experiment parameters for the Amazon network; also has data files (.pkl) generated in experiments
Network: data files for the network, plus some .py files for preparing the network data; the main file is data.pkl, which is used in the experiment files and contains a representation of the Amazon network as well as some clusterings (created by prepare_data.py); the file deg_hist.py generates a degree histogram for the network.
Leskovec, J., Adamic, L. A., & Huberman, B. A. (2007). The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1), 5-es.
Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.

BlogCatalog

Experiments: contains .json files with experiment parameters for the BlogCatalog network; also has data files (.pkl) generated in experiments
Network: data files for the network, plus some .py files for preparing the network data; the main file is data.pkl, which is used in the experiment files and contains a representation of the BlogCatalog network as well as some clusterings (created by prepare_data.py); the file deg_hist.py generates a degree histogram for the network.
Rossi, R., & Ahmed, N. (2015, March). The network data repository with interactive graph analytics and visualization. In Proceedings of the AAAI conference on artificial intelligence (Vol. 29, No. 1).
Tang, L., & Liu, H. (2009, June). Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 817-826).
Tang, L., & Liu, H. (2009, November). Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM conference on Information and knowledge management (pp. 1107-1116).

Email

Experiments: contains .json files with experiment parameters for the Email network; also has data files (.pkl) generated in experiments
Network: data files for the network, plus some .py files for preparing the network data; the main file is data.pkl, which is used in the experiment files and contains a representation of the Email network as well as some clusterings (created by prepare_data.py); the file deg_hist.py generates a degree histogram for the network.
Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
Leskovec, J., Kleinberg, J., & Faloutsos, C. (2007). Graph evolution: Densification and shrinking diameters. ACM transactions on Knowledge Discovery from Data (TKDD), 1(1), 2-es.
Yin, H., Benson, A. R., Leskovec, J., & Gleich, D. F. (2017, August). Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 555-564).

experiment_python_scripts

Scripts that run the experiments and plot the results, plus scripts that create the other tables and figures from the paper
interpolation_figure.py: Creates Figure 1: Visualization of extrapolated polynomials
run_compare_estimators_experiment.py: used to create data for Figures 2, 8, 9, 10, and 13 (compare bias/variance or MSE of different estimators)
- requires there to be a file called compare_estimators.json in the appropriate real-world network folder (sub-folder Experiments) containing experiment info; for an example, see Amazon/Experiments/compare_estimators.json
- creates a file called compare_estimators.pkl in same directory as .json file; this is used for plotting
compare_estimators_plot.py: plots the data to create Figures 2, 8, 9, 10, and 13
run_compare_clusterings_experiment.py: used to create data for Figures 4, 11, 12, and 14 (comparing 2-stage performance under different clusterings for real-world networks)
- requires there to be a file called compare_clusterings.json in the appropriate real-world network folder (sub-folder Experiments) containing experiment info; for an example, see Amazon/Experiments/compare_clusterings.json
- creates a file called compare_clusterings.pkl in same directory as .json file; this is used for plotting
compare_clusterings_plot.py: plots the data to create Figures 4, 11, 12, and 14
lattice_clustering_metrics.py: generates clustering metrics for Lattice (e.g. for Table 1) as a file called cluster_metrics_lattice.txt
clustering_metrics_table.py: generates clustering metrics for real-world networks (e.g. for Table 2) as a file called cluster_metrics_real-world.txt
experiment_functions.py: helper functions for running experiments (e.g. creating Lattice network, generating potential outcomes, randomized designs)

Ugander-Yin Potential Outcomes Model

Found in experiment_functions.py (see the functions pom_ugander_yin, _outcomes, and homophily_effects)
For no homophily, set $b=0$ in the pom_ugander_yin function
For some homophily, set $b$ in the pom_ugander_yin function to the desired value, e.g. $b=0.5$

Ugander, J., & Yin, H. (2023). Randomized graph cluster randomization. Journal of Causal Inference, 11(1), 20220014.

Example: Create Figure 2

In Amazon/Experiments there should be a file called compare_estimators.json containing the following information:
- name: the name of the experiment
- network: the name of the network
- input: where to find the file containing network info
- vary: which parameters will we vary in the experiment and over which values?
- fix: which parameters will we fix for the experiment and to what values should they be set?
- replications: how many times do you want to run the randomized controlled trial?
- gamma: parameter for the thresholded difference in means estimator
For example:
```
{ 
    "name" : "compare_estimators", 
    "network" : "Amazon",
    "input" : "Network/data.pkl",
    "vary" : {
        "beta" : [1,2,3]
    },
    "fix" : {
        "q" : 0.5,
        "nc" : 250
    },
    "replications" : 1000,
    "gamma" : 0.25
}
```
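Before starting a long run, it can help to check that the .json file has all of the fields listed above and to see which parameter settings it describes. The sketch below is illustrative only: load_config and parameter_grid are hypothetical helpers, not functions from the repository, and the sketch assumes the experiment visits every combination of the values under "vary" with the values under "fix" held constant.

```python
import itertools
import json

# Fields described in the list above.
REQUIRED_KEYS = ("name", "network", "input", "vary", "fix", "replications", "gamma")


def load_config(text):
    """Parse the JSON text and check that every documented field is present."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return config


def parameter_grid(config):
    """Yield one dict per combination of varied values, merged with the fixed values."""
    names = list(config["vary"])
    for values in itertools.product(*(config["vary"][n] for n in names)):
        yield {**config["fix"], **dict(zip(names, values))}


# Same content as the example file shown above.
example = """
{
    "name" : "compare_estimators",
    "network" : "Amazon",
    "input" : "Network/data.pkl",
    "vary" : { "beta" : [1,2,3] },
    "fix" : { "q" : 0.5, "nc" : 250 },
    "replications" : 1000,
    "gamma" : 0.25
}
"""

config = load_config(example)
grid = list(parameter_grid(config))
for setting in grid:
    print(setting)
```

With the example file this lists three settings, one per value of beta, each with q = 0.5 and nc = 250.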
From the main directory (Supplementary_Material), run the Python file run_compare_estimators_experiment.py
- Scroll to the bottom of the file and make sure the variable my_path is set correctly; in this example, it should be "Amazon/Experiments/compare_estimators.json"
- Depending on your computing power, this may take some time (for me it took just under 2 hours)
- Upon completion, this should create a file called compare_estimators.pkl in the Amazon/Experiments folder
To plot, open the Python file compare_estimators_plot.py, scroll to the bottom, and set the variables mse and network_name appropriately
- In this case, since we want to make Figure 2, which has bias/variance plots, set mse=False (setting it to True would generate Figure 8 instead) and network_name="Amazon"
- Running this file should create a figure and save it as compare_estimators_Amazon.png
- In draw_plots, you can customize the plot to match the parameters you used in the .json file and what you want to display
