Supplementary Material for "Analysis of Two-Stage Rollout Designs with Clustering for Causal Inference under Network Interference"
Required packages (with versions): joblib 1.4.2, pymetis 2023.1.1, numpy 1.26.3, pandas 2.2.0, matplotlib 3.8.2, networkx 3.2.1, scipy 1.12.0, seaborn 0.13.2
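Before running anything, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch using only the standard library; the helper name `check_versions` is hypothetical and not part of this repository, and it assumes each package's distribution name is the same as the name listed above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
PINNED = {
    "joblib": "1.4.2",
    "pymetis": "2023.1.1",
    "numpy": "1.26.3",
    "pandas": "2.2.0",
    "matplotlib": "3.8.2",
    "networkx": "3.2.1",
    "scipy": "1.12.0",
    "seaborn": "0.13.2",
}


def check_versions(pinned):
    """Return {package: (expected, installed)} for every package that is
    missing (installed is None) or installed at a different version."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Report anything that differs from the listed versions.
for name, (expected, installed) in check_versions(PINNED).items():
    print(f"{name}: expected {expected}, found {installed}")
```

A different version is not necessarily a problem; the check only flags where your environment departs from the listed one.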
- figures_and_tables.ipynb
  - Reproduces the figures and tables from the paper
- preparing_network_data.ipynb
  - Demo on preparing network data for running experiments
- running_experiments
  - Demo on running experiments (use to gather data to create the plots/tables in the paper)
- Experiments: contains .json files with experiment parameters for Amazon network; also has data files (.pkl) generated in experiments
- Network: data files for the network, plus some .py files for preparing the network data; the most important file is data.pkl, which is used in the experiment files and contains a representation of the Amazon network as well as some clusterings (created by prepare_data.py); the file deg_hist.py generates a degree histogram for the network.
- Leskovec, J., Adamic, L. A., & Huberman, B. A. (2007). The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1), 5-es.
- Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
- Experiments: contains .json files with experiment parameters for BlogCatalog network; also has data files (.pkl) generated in experiments
- Network: data files for the network, plus some .py files for preparing the network data; the most important file is data.pkl, which is used in the experiment files and contains a representation of the BlogCatalog network as well as some clusterings (created by prepare_data.py); the file deg_hist.py generates a degree histogram for the network.
- Rossi, R., & Ahmed, N. (2015, March). The network data repository with interactive graph analytics and visualization. In Proceedings of the AAAI conference on artificial intelligence (Vol. 29, No. 1).
- Tang, L., & Liu, H. (2009, June). Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 817-826).
- Tang, L., & Liu, H. (2009, November). Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM conference on Information and knowledge management (pp. 1107-1116).
- Experiments: contains .json files with experiment parameters for Email network; also has data files (.pkl) generated in experiments
- Network: data files for the network, plus some .py files for preparing the network data; the most important file is data.pkl, which is used in the experiment files and contains a representation of the Email network as well as some clusterings (created by prepare_data.py); the file deg_hist.py generates a degree histogram for the network.
- Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
- Leskovec, J., Kleinberg, J., & Faloutsos, C. (2007). Graph evolution: Densification and shrinking diameters. ACM transactions on Knowledge Discovery from Data (TKDD), 1(1), 2-es.
- Yin, H., Benson, A. R., Leskovec, J., & Gleich, D. F. (2017, August). Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 555-564).
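Each network folder described above ships a data.pkl file, but its internal layout is not documented here. The sketch below inspects a pickle's top-level structure without assuming what it contains. The helper `describe_pickle` is hypothetical, and the path `Amazon/Network/data.pkl` is an assumption based on the folder names above. Only unpickle files you trust, and note that loading may require the package versions the file was created with.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Load a pickle file and summarize its top-level structure."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        summary["keys"] = list(obj.keys())
    elif isinstance(obj, (list, tuple)):
        summary["length"] = len(obj)
        summary["item_types"] = [type(x).__name__ for x in obj]
    return summary


# Assumed location; adjust to the network folder you are working with.
path = Path("Amazon/Network/data.pkl")
if path.exists():
    print(describe_pickle(path))
```

For the authoritative description of how the file is built, see prepare_data.py in the corresponding Network folder.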
- Scripts to run experiments and plot the results, or otherwise to create tables/figures from the paper
  - interpolation_figure.py: creates Figure 1 (visualization of extrapolated polynomials)
  - run_compare_estimators_experiment.py: used to create data for Figures 2, 8, 9, 10, and 13 (compare bias/variance or MSE of different estimators)
    - requires a file called compare_estimators.json in the appropriate real-world network folder (sub-folder Experiments) containing experiment info; for an example, see Amazon/Experiments/compare_estimators.json
    - creates a file called compare_estimators.pkl in the same directory as the .json file; this is used for plotting
  - compare_estimators_plot.py: plots the data to create Figures 2, 8, 9, 10, and 13
  - run_compare_clusterings_experiment.py: used to create data for Figures 4, 11, 12, and 14 (comparing 2-stage performance under different clusterings for real-world networks)
    - requires a file called compare_clusterings.json in the appropriate real-world network folder (sub-folder Experiments) containing experiment info; for an example, see Amazon/Experiments/compare_clusterings.json
    - creates a file called compare_clusterings.pkl in the same directory as the .json file; this is used for plotting
  - compare_clusterings_plot.py: plots the data to create Figures 4, 11, 12, and 14
  - lattice_clustering_metrics.py: generates clustering metrics for Lattice (e.g. for Table 1) as a file called cluster_metrics_lattice.txt
  - clustering_metrics_table.py: generates clustering metrics for real-world networks (e.g. for Table 2) as a file called cluster_metrics_real-world.txt
  - experiment_functions.py: helper functions for running experiments (e.g. creating Lattice network, generating potential outcomes, randomized designs)
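To make the kinds of helpers mentioned for experiment_functions.py concrete, here is a small standard-library sketch of a lattice network, a block clustering, and a cluster-level Bernoulli assignment. This is not the repository's implementation: the function names, the non-periodic grid, the square-block clustering, and the single-stage Bernoulli design are all assumptions chosen for illustration.

```python
import random


def lattice_neighbors(n):
    """Adjacency lists for a non-periodic n x n grid.
    Node (i, j) has index i * n + j."""
    nbrs = {v: [] for v in range(n * n)}
    for i in range(n):
        for j in range(n):
            v = i * n + j
            if i + 1 < n:  # edge to the node below
                nbrs[v].append(v + n)
                nbrs[v + n].append(v)
            if j + 1 < n:  # edge to the node on the right
                nbrs[v].append(v + 1)
                nbrs[v + 1].append(v)
    return nbrs


def block_clusters(n, k):
    """Cluster label for every node, grouping the grid into k x k blocks."""
    blocks_per_row = -(-n // k)  # ceiling division
    return [(i // k) * blocks_per_row + (j // k)
            for i in range(n) for j in range(n)]


def cluster_bernoulli(clusters, p, seed=0):
    """Treat each cluster independently with probability p; every node
    inherits its cluster's assignment (1 = treated, 0 = control)."""
    rng = random.Random(seed)
    chosen = {c: rng.random() < p for c in sorted(set(clusters))}
    return [int(chosen[c]) for c in clusters]


neighbors = lattice_neighbors(4)
clusters = block_clusters(4, 2)
treatment = cluster_bernoulli(clusters, p=0.5)
print(clusters)
print(treatment)
```

Assigning treatment at the cluster level means neighboring nodes tend to share the same treatment status, which is the motivation for clustering under network interference.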
- the potential outcomes model is found in experiment_functions.py (see functions pom_ugander_yin, _outcomes, and homophily_effects)
- for no homophily, set $b=0$ in the pom_ugander_yin function
- for some homophily, set $b$ in the pom_ugander_yin function to the desired value, e.g. $b=0.5$

Ugander, J., & Yin, H. (2023). Randomized graph cluster randomization. Journal of Causal Inference, 11(1), 20220014.
- In Amazon/Experiments there should be a file called compare_estimators.json containing the following information:
  - name: the name of the experiment
  - network: the name of the network
  - input: where to find the file containing network info
  - vary: which parameters to vary in the experiment, and over which values
  - fix: which parameters to fix for the experiment, and the values to set them to
  - replications: how many times to run the randomized control trial
  - gamma: parameter for the thresholded difference in means estimator

  For example:

      {
        "name" : "compare_estimators",
        "network" : "Amazon",
        "input" : "Network/data.pkl",
        "vary" : { "beta" : [1,2,3] },
        "fix" : { "q" : 0.5, "nc" : 250 },
        "replications" : 1000,
        "gamma" : 0.25
      }

- From the main directory (Supplementary_Material), run the Python file run_compare_estimators_experiment.py
  - Scroll to the bottom of the file and make sure the variable my_path is set correctly; in this example, it should be "Amazon/Experiments/compare_estimators.json"
  - Depending on your computing power, this may take some time (for me it took just under 2 hours)
  - Upon completion, this should create a file called compare_estimators.pkl in the Amazon/Experiments folder
- To plot, open the Python file compare_estimators_plot.py, scroll to the bottom, and set the variables mse and network_name appropriately
  - In this case, since we want to make Figure 2, which has bias/variance plots, set mse=False (setting it to True would generate Figure 8 instead) and network_name="Amazon"
  - Running this file should create a figure and save it as compare_estimators_Amazon.png
  - In draw_plots, you can customize the plot depending on the parameters you used in the .json file to create the data and on what you want to plot
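The "vary" and "fix" entries of the example .json file describe a grid of parameter settings. The sketch below shows one way such a file could be expanded into individual settings. It is an illustration only: the helper `parameter_grid` is hypothetical, and how run_compare_estimators_experiment.py actually reads the file is not shown in this README.

```python
import itertools
import json

# The example configuration from above, embedded so the sketch is self-contained.
CONFIG_TEXT = """
{
  "name": "compare_estimators",
  "network": "Amazon",
  "input": "Network/data.pkl",
  "vary": { "beta": [1, 2, 3] },
  "fix": { "q": 0.5, "nc": 250 },
  "replications": 1000,
  "gamma": 0.25
}
"""


def parameter_grid(config):
    """Yield one dict per combination of the 'vary' values,
    each merged with the constant 'fix' values."""
    names = list(config["vary"])
    value_lists = [config["vary"][name] for name in names]
    for values in itertools.product(*value_lists):
        yield {**config["fix"], **dict(zip(names, values))}


config = json.loads(CONFIG_TEXT)
settings = list(parameter_grid(config))
for setting in settings:
    print(setting)
```

With the example file this yields three settings, one per value of beta, each with q = 0.5 and nc = 250. Adding a second key under "vary" would produce every combination of the two lists.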