Supplementary Material for "Analysis of Two-Stage Rollout Designs with Clustering for Causal Inference under Network Interference" (2025)

mayscortez/two-stage-rollout-2025

Required Packages

  • joblib 1.4.2
  • pymetis 2023.1.1
  • numpy 1.26.3
  • pandas 2.2.0
  • matplotlib 3.8.2
  • networkx 3.2.1
  • scipy 1.12.0
  • seaborn 0.13.2
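
To confirm that an environment matches the versions listed above, a small check along these lines may help. It is only a sketch: the helper name check_versions is hypothetical and not part of the repository, and it relies solely on the standard-library importlib.metadata module.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the "Required Packages" list above.
PINNED = {
    "joblib": "1.4.2",
    "pymetis": "2023.1.1",
    "numpy": "1.26.3",
    "pandas": "2.2.0",
    "matplotlib": "3.8.2",
    "networkx": "3.2.1",
    "scipy": "1.12.0",
    "seaborn": "0.13.2",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed)}; installed is None if the package is absent."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_versions().items():
        status = "ok" if installed == expected else "MISMATCH"
        print(f"{name:<12} expected {expected:<10} installed {installed}  {status}")
```

A mismatch does not necessarily mean the code will fail; it only flags a difference from the versions listed here.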

Code Demos

  • figures_and_tables.ipynb
    • Reproduces the figures and tables from the paper
  • preparing_network_data.ipynb
    • Demo on preparing network data for running experiments
  • running_experiments
    • Demo on running experiments (use to gather data to create the plots/tables in the paper)

Folders and Files

Amazon

  • Experiments: contains .json files with experiment parameters for the Amazon network; also holds the data files (.pkl) generated by the experiments
  • Network: data files for the network, plus some .py files for preparing the network data. The most important file is data.pkl, which is used in the experiment files and contains a representation of the Amazon network as well as some clusterings (created by prepare_data.py). The file deg_hist.py generates a degree histogram for the network.
  • Leskovec, J., Adamic, L. A., & Huberman, B. A. (2007). The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1), 5-es.
  • Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
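
Since data.pkl is the file the experiments read, it can be useful to look inside it before running anything. The sketch below makes no assumption about how the object is laid out; the helper names load_network_data and describe are hypothetical and not part of the repository. Unpickling may require the packages listed above to be installed, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def load_network_data(path):
    """Unpickle a file and return whatever object it stores."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def describe(obj):
    """Summarize an unpickled object without assuming its layout."""
    if isinstance(obj, dict):
        return {str(key): type(value).__name__ for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [type(value).__name__ for value in obj]
    return type(obj).__name__


if __name__ == "__main__":
    # Path taken from the folder layout described above.
    data_path = Path("Amazon/Network/data.pkl")
    if data_path.exists():
        print(describe(load_network_data(data_path)))
    else:
        print(f"{data_path} not found; run this from the repository root")
```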

BlogCatalog

  • Experiments: contains .json files with experiment parameters for the BlogCatalog network; also holds the data files (.pkl) generated by the experiments
  • Network: data files for the network, plus some .py files for preparing the network data. The most important file is data.pkl, which is used in the experiment files and contains a representation of the BlogCatalog network as well as some clusterings (created by prepare_data.py). The file deg_hist.py generates a degree histogram for the network.
  • Rossi, R., & Ahmed, N. (2015, March). The network data repository with interactive graph analytics and visualization. In Proceedings of the AAAI conference on artificial intelligence (Vol. 29, No. 1).
  • Tang, L., & Liu, H. (2009, June). Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 817-826).
  • Tang, L., & Liu, H. (2009, November). Scalable learning of collective behavior based on sparse social dimensions. In Proceedings of the 18th ACM conference on Information and knowledge management (pp. 1107-1116).

Email

  • Experiments: contains .json files with experiment parameters for the Email network; also holds the data files (.pkl) generated by the experiments
  • Network: data files for the network, plus some .py files for preparing the network data. The most important file is data.pkl, which is used in the experiment files and contains a representation of the Email network as well as some clusterings (created by prepare_data.py). The file deg_hist.py generates a degree histogram for the network.
  • Leskovec, J. and Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data.
  • Leskovec, J., Kleinberg, J., & Faloutsos, C. (2007). Graph evolution: Densification and shrinking diameters. ACM transactions on Knowledge Discovery from Data (TKDD), 1(1), 2-es.
  • Yin, H., Benson, A. R., Leskovec, J., & Gleich, D. F. (2017, August). Local higher-order graph clustering. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 555-564).
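
Each Network folder includes a deg_hist.py that generates a degree histogram. For readers unfamiliar with the term, the sketch below shows what such a histogram counts, using a plain edge list and the standard library only. It is not the repository's deg_hist.py; the function name and the toy edge list are illustrative assumptions.

```python
from collections import Counter


def degree_histogram(edges):
    """Map each degree to the number of nodes having it (undirected, simple graph)."""
    # Drop self-loops and collapse duplicate or reversed edges.
    unique_edges = {frozenset((u, v)) for u, v in edges if u != v}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    # Count how many nodes share each degree, sorted by degree.
    return dict(sorted(Counter(degree.values()).items()))


if __name__ == "__main__":
    toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
    print(degree_histogram(toy_edges))
```

Nodes that appear in no edge are invisible to an edge list, so isolated nodes would need to be counted separately.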

experiment_python_scripts

  • Scripts that run the experiments, plot the results, or otherwise create the tables and figures from the paper

  • interpolation_figure.py: Creates Figure 1: Visualization of extrapolated polynomials

  • run_compare_estimators_experiment.py: used to create data for Figures 2, 8, 9, 10, and 13 (compare bias/variance or MSE of different estimators)

    • requires a file called compare_estimators.json in the Experiments sub-folder of the appropriate real-world network folder, containing the experiment info; for an example, see Amazon/Experiments/compare_estimators.json
    • creates a file called compare_estimators.pkl in the same directory as the .json file; this is used for plotting
  • compare_estimators_plot.py: plots the data to create Figures 2, 8, 9, 10, and 13

  • run_compare_clusterings_experiment.py: used to create data for Figures 4, 11, 12, and 14 (comparing 2-stage performance under different clusterings for real-world networks)

    • requires a file called compare_clusterings.json in the Experiments sub-folder of the appropriate real-world network folder, containing the experiment info; for an example, see Amazon/Experiments/compare_clusterings.json
    • creates a file called compare_clusterings.pkl in the same directory as the .json file; this is used for plotting
  • compare_clusterings_plot.py: plots the data to create Figures 4, 11, 12, and 14

  • lattice_clustering_metrics.py: generates clustering metrics for Lattice (e.g. for Table 1) as a file called cluster_metrics_lattice.txt

  • clustering_metrics_table.py: generates clustering metrics for real-world networks (e.g. for Table 2) as a file called cluster_metrics_real-world.txt

  • experiment_functions.py: helper functions for running experiments (e.g. creating Lattice network, generating potential outcomes, randomized designs)
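
The estimator comparisons above are reported as bias/variance or MSE. As a reminder of how these quantities relate, the sketch below computes all three from a list of replicated estimates and a known true effect. It is a generic illustration, not the repository's code; the function name summarize_estimator is hypothetical, and the repository's plots may normalize these quantities differently.

```python
from statistics import fmean, pvariance


def summarize_estimator(estimates, true_effect):
    """Empirical bias, variance, and MSE of an estimator over replications."""
    mean_estimate = fmean(estimates)
    bias = mean_estimate - true_effect
    # Population variance, so that mse == bias**2 + variance holds exactly.
    variance = pvariance(estimates, mu=mean_estimate)
    mse = fmean([(e - true_effect) ** 2 for e in estimates])
    return {"bias": bias, "variance": variance, "mse": mse}


if __name__ == "__main__":
    summary = summarize_estimator([1.0, 2.0, 3.0], true_effect=1.5)
    print(summary)
    # The decomposition MSE = bias^2 + variance, up to floating-point error.
    print(abs(summary["mse"] - (summary["bias"] ** 2 + summary["variance"])) < 1e-12)
```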

Ugander-Yin Potential Outcomes Model

  • found in experiment_functions.py (see functions pom_ugander_yin, _outcomes, and homophily_effects)
  • for no homophily, set $b=0$ in pom_ugander_yin function
  • for some homophily, set $b$ in pom_ugander_yin function to the desired value, e.g. $b=0.5$

Ugander, J., & Yin, H. (2023). Randomized graph cluster randomization. Journal of Causal Inference, 11(1), 20220014.
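
For orientation, here is a minimal sketch of the response model as I understand it from Ugander and Yin (2023): Y_i(z) = (a + b h_i + sigma eps_i) (d_i / mean degree) (1 + delta z_i + spillover (fraction of treated neighbors)). This is an assumption about the cited paper, not a copy of pom_ugander_yin; the repository's function may use a different signature, parameter names, or functional form, so check experiment_functions.py before relying on it. The default parameter values are illustrative only. The sketch does show why setting b=0 removes homophily: the h_i term drops out of the baseline.

```python
import random


def ugander_yin_outcomes(adj, z, h, a=1.0, b=0.5, sigma=0.1,
                         delta=0.5, spillover=1.0, seed=0):
    """Hypothetical sketch of a Ugander-Yin style response model.

    adj: list of neighbor lists; z: 0/1 treatments; h: homophily effects.
    """
    rng = random.Random(seed)
    n = len(adj)
    degrees = [len(neighbors) for neighbors in adj]
    mean_degree = sum(degrees) / n
    outcomes = []
    for i in range(n):
        # Baseline: intercept + homophily term + Gaussian noise.
        baseline = a + b * h[i] + sigma * rng.gauss(0.0, 1.0)
        # Fraction of treated neighbors (0 for isolated nodes).
        frac_treated = sum(z[j] for j in adj[i]) / degrees[i] if degrees[i] else 0.0
        scale = degrees[i] / mean_degree
        outcomes.append(baseline * scale * (1.0 + delta * z[i] + spillover * frac_treated))
    return outcomes


if __name__ == "__main__":
    adj = [[1, 2], [0], [0]]  # node 0 is connected to nodes 1 and 2
    print(ugander_yin_outcomes(adj, z=[1, 0, 0], h=[0.3, -0.1, 0.2], b=0.0, sigma=0.0))
```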

Example: Create Figure 2

  • In Amazon/Experiments there should be a file called compare_estimators.json containing the following information:

    • name: the name of the experiment
    • network: the name of the network
    • input: where to find the file containing network info
    • vary: which parameters to vary in the experiment, and over which values
    • fix: which parameters to hold fixed for the experiment, and at what values
    • replications: how many times to run the randomized controlled trial
    • gamma: parameter for the thresholded difference in means estimator

    For example:

    { 
        "name" : "compare_estimators", 
        "network" : "Amazon",
        "input" : "Network/data.pkl",
        "vary" : {
            "beta" : [1,2,3]
        },
        "fix" : {
            "q" : 0.5,
            "nc" : 250
        },
        "replications" : 1000,
        "gamma" : 0.25
    }
    
  • From the main directory (Supplementary_Material), run the Python file run_compare_estimators_experiment.py

    • Scroll to the bottom of the file and make sure the variable my_path is set correctly; in this example, it should be "Amazon/Experiments/compare_estimators.json"
    • Depending on your computing power, this may take some time (for me it took just under 2 hours)
    • Upon completion, this should create a file called compare_estimators.pkl in the Amazon/Experiments folder
  • To plot, open the Python file compare_estimators_plot.py, scroll to the bottom, and set the variables mse and network_name appropriately

    • in this case, since we want to make Figure 2 which has bias/variance plots, set mse=False (setting to True would generate Figure 8 instead) and network_name="Amazon"
    • running this file should create a figure and save it as compare_estimators_Amazon.png
    • In draw_plots, you can customize the plot to match the parameters used in the .json file and what you want to show
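
The experiment file described in this example can be sanity-checked before starting a long run. The sketch below loads the .json file, verifies that the seven keys listed above are present, expands the vary values into one parameter setting per run, and derives the .pkl path next to the .json file. All helper names are hypothetical, and crossing the vary values with the fix values is an assumption about how the experiment script uses them, not a description of its code.

```python
import json
from itertools import product
from pathlib import Path

# Keys listed in the example above.
REQUIRED_KEYS = {"name", "network", "input", "vary", "fix", "replications", "gamma"}


def load_config(json_path):
    """Read an experiment file and check that the expected keys are present."""
    config = json.loads(Path(json_path).read_text())
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    return config


def parameter_grid(config):
    """One dict per combination of varied values, merged with the fixed values."""
    names = list(config["vary"])
    value_lists = [config["vary"][name] for name in names]
    return [{**config["fix"], **dict(zip(names, values))}
            for values in product(*value_lists)]


def output_path(json_path):
    """Path of the .pkl file written next to the .json file."""
    return Path(json_path).with_suffix(".pkl")


if __name__ == "__main__":
    my_path = Path("Amazon/Experiments/compare_estimators.json")
    if my_path.exists():
        for setting in parameter_grid(load_config(my_path)):
            print(setting)
        print("results expected at", output_path(my_path))
```

With the example file above, the grid would contain three settings, one for each value of beta, each with q = 0.5 and nc = 250.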
