🔬 Program Tracing

🏁 Introduction

If you are a student completing this project as part of a class at Allegheny College, you can find the due date on the schedule on the course web site or by asking the course instructor. You can also find the deadline for the project, as reported by GitHub Classroom, by clicking the grey box at the top of this file. Please note that the README.md file for this GitHub repository is an overview of the project and thus may not include the details for every step needed to successfully complete every project deliverable. This means that you may need to schedule a meeting during the course instructor's office hours to discuss aspects of this project.

🤝 Seeking Assistance

Even though the course instructor will have covered all of the concepts central to this project before you start to work on it, please note that not every detail needed to successfully complete the assignment will have been covered during prior classroom sessions. This is by design, as an important skill that you must practice as a security engineer is to search for, understand, and ultimately apply the technical content found in additional resources.

🛫 Project Overview

This project invites you to implement and use a program called programtracer that can produce a detailed trace of a program's execution in the service of automated malware analysis. As explained by CrowdStrike in the article entitled 10 Malware Detection Techniques, there are several different, yet often complementary, techniques for detecting malware on a system. This project invites you to explore dynamic malware analysis: you will write a programtracer that observes the execution of a Python program and then records a program trace that a malware analyst could study to better understand the program's behavior and perhaps to extract a behavior signature that could be used to detect it in the future. The programtracer should also be able to perform a rudimentary analysis of a saved trace and compare two different traces that it previously produced.

🏁 Getting Started

After cloning this repository to your computer, please take the following steps to get started on the project:

Make sure that you are using a recent version of Python 3.12 to complete this assignment by typing python --version in your terminal; if you are not using a recent version of Python please upgrade before proceeding.
Make sure that you are using a recent version of Poetry 1.8 to complete this assignment by typing poetry --version in your terminal; if you are not using a recent version of Poetry please upgrade before proceeding.
Before moving to the next step, you may need to again type poetry install one or more times in order to avoid the appearance of warnings when you next run the programtracer program.

Please note that you are invited to complete all of the background research, implementation, and experimentation needed to implement and use the programtracer, as outlined further in the following subsections.

📚 Background Research

Your programtracer will take as input a Python program and/or a Python program's test suite, and then produce a detailed trace of the program's behavior. The trace should record all of the details about the specific instructions that were run during the execution of the Python program. To learn more about program tracing, please consult the references organized into the following technical categories:

Concepts: Introduction to the technical concepts of program analysis and dynamic malware analysis.
Packages: Built-in packages for program tracing in Python.
Tools: Tools for program tracing in Python.
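
As a starting point for exploring the built-in packages, the following minimal sketch uses the standard library's sys.settrace hook to record every line that runs inside a function. The helper name trace_lines and the sample function absolute_value are hypothetical and exist only for illustration; this is one possible approach, not the required design for programtracer.

```python
import sys


def trace_lines(func, *args, **kwargs):
    """Run func and return (result, events), one event per executed line."""
    events = []

    def local_tracer(frame, event, arg):
        # Record every "line" event raised inside a traced frame.
        if event == "line":
            events.append((frame.f_code.co_name, frame.f_lineno))
        return local_tracer

    def global_tracer(frame, event, arg):
        # Called when a new frame is entered; returning a local tracer
        # asks the interpreter to report that frame's line events.
        if event == "call":
            return local_tracer
        return None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always restore whatever trace function was active before.
        sys.settrace(previous)
    return result, events


def absolute_value(number):
    """Hypothetical sample function used only to demonstrate tracing."""
    if number < 0:
        return -number
    return number
```

Calling trace_lines(absolute_value, -3) returns the function's result together with a list of (function name, line number) pairs. Note that sys.settrace only affects frames created after it is installed, which is why the traced function is called from inside the helper.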

🚀 Implementation Details

After reading all of the background research and exploring the references that the prior section provides, you should pick a small Python program and Pytest test suite (perhaps even one that you wrote yourself), attempt to run it, and then produce a trace of each line of source code that the test suite ran in the Python program. The trace that your tool produces should include all of the executed instructions at the level of the abstract syntax tree (AST), the Python source code, and/or the native code produced by the Python interpreter. Whenever possible, the trace should also include the values of variables that were accessed by each of the detected instructions. Finally, the trace should be stored in a file in either a plaintext, comma-separated value (CSV), or JavaScript object notation (JSON) format.
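
To make the idea of a source-level trace with variable values concrete, here is a small sketch that records the local variables visible at each executed line and saves the result as JSON. The function names (trace_with_variables, save_trace, total) and the record layout with the keys "function", "line", and "locals" are assumptions made for this example, not a format that the project prescribes.

```python
import json
import sys
from pathlib import Path


def trace_with_variables(func, *args, **kwargs):
    """Run func and return (result, trace) with local variables per line."""
    trace = []

    def local_tracer(frame, event, arg):
        if event == "line":
            trace.append(
                {
                    "function": frame.f_code.co_name,
                    "line": frame.f_lineno,
                    # Values are captured just before the line executes;
                    # repr() keeps every value JSON-serializable.
                    "locals": {k: repr(v) for k, v in frame.f_locals.items()},
                }
            )
        return local_tracer

    def global_tracer(frame, event, arg):
        return local_tracer if event == "call" else None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)
    return result, trace


def save_trace(trace, path):
    """Write the trace to path as indented JSON."""
    Path(path).write_text(json.dumps(trace, indent=2), encoding="utf-8")


def total(values):
    """Hypothetical sample function used only to demonstrate tracing."""
    running = 0
    for value in values:
        running += value
    return running
```

Storing repr() strings trades fidelity for simplicity: the trace can always be written to disk, but the original objects cannot be reconstructed from it. A CSV or plaintext writer could replace save_trace without changing the tracer.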

Once you have an implementation that is working for a small Python project, you should create a complete implementation of the programtracer project, using the main.py file to implement the command-line interface (CLI) for the program. As you add features to your tool you should confirm that it works for progressively larger Python programs and test suites. You should then implement the following features into your programtracer tool:

Command-Line Interface: The programtracer should have a command-line interface that accepts the name of a Python program and/or a Python program's test suite and then performs the program tracing when the tests run on the program.
Program Tracing: The programtracer should trace the execution of a Python program and save the trace in a suitable format in a specified directory and file.
Variable Tracking: The programtracer should track the values of variables as they are referenced by the specific instructions in the program's source code.
Trace Analysis: The programtracer should be able to analyze the trace by reporting information about, for instance, the number of instructions in the trace, the number of times each instruction was executed, the number of times a variable is accessed by instructions, and the number of unique values stored in the variables accessed by the instructions.
Trace Comparison: The programtracer should be able to compare two traces and surface the similarities and differences between them. This feature would be useful in the context of malware analysis to compare the behavior of a new program to a well-known malware program.
Efficiency Analysis: The programtracer should offer at least two efficiency analysis features that involve measuring the performance of tasks such as creating the trace, saving the trace, analyzing one or more traces, or the size of the traces when either stored in memory or on disk.
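
The trace analysis and trace comparison features listed above could be sketched as follows. This example assumes a hypothetical trace format, a list of dictionaries with the keys "function", "line", and "locals", and it counts how often a variable is visible in a recorded event as a stand-in for how often it is accessed; a real implementation may need a more precise notion of access. The comparison uses difflib.SequenceMatcher from the standard library, which is only one of several reasonable choices.

```python
from collections import Counter
from difflib import SequenceMatcher


def analyze_trace(trace):
    """Summarize a trace given as a list of event dictionaries."""
    executions = Counter((e["function"], e["line"]) for e in trace)
    observations = Counter()
    unique_values = {}
    for event in trace:
        for name, value in event["locals"].items():
            # Proxy for "access": the variable was visible at this event.
            observations[name] += 1
            unique_values.setdefault(name, set()).add(value)
    return {
        "total_events": len(trace),
        "executions_per_line": dict(executions),
        "variable_observations": dict(observations),
        "unique_values_per_variable": {
            name: len(values) for name, values in unique_values.items()
        },
    }


def compare_traces(first, second):
    """Return a similarity ratio and the non-matching spans of two traces."""
    a = [(e["function"], e["line"]) for e in first]
    b = [(e["function"], e["line"]) for e in second]
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # Each opcode is (tag, a_start, a_end, b_start, b_end).
    differences = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    return matcher.ratio(), differences
```

A ratio near 1.0 suggests that the two executions followed almost the same path, while the returned opcodes point at the positions where one trace inserted, deleted, or replaced events relative to the other.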

You should aim to fully implement all of these features as a part of your programtracer tool. If you are not able to implement a specific feature, then you must both document the steps that you took and explain why it was not possible to fully implement that feature in the writing/reflection.md file.

🎉 Experiment Details

To evaluate the programtracer tool, you should conduct an experiment that loosely follows these steps:

Select Python Programs and Test Suites: Choose at least five small- to medium-sized Python programs and their corresponding Pytest test suites. Make sure that these are all programs that you did not implement yourself. Aim to strike a balance between programs that are realistic and programs that are small enough that you can feasibly analyze and understand their traces.
Run the programtracer Tool: Execute the programtracer tool on each of the selected Python programs and its test suite. Ensure that the tool generates a trace file in the specified format (i.e., plaintext, CSV, or JSON).
Verify the Trace Output: For each selected program and its test suite, manually inspect the majority of the trace file to verify that it accurately records the program's execution. Check that the trace includes details such as executed instructions, variable values, and any other relevant runtime information.
Analyze the Trace: For each selected program and its test suite, use the programtracer tool's analysis features to gather information about the trace. This includes:
- The number of instructions in the trace.
- The number of times each instruction was executed.
- The number of times variables were accessed by instructions.
- The number of unique values stored in the variables accessed by the instructions.
Compare Traces: After making a change to the source code of each Python program, run the programtracer tool on it. Manually compare the traces that arise from this modified program and the original to identify the similarities and differences in their execution behavior. You could imagine that this is the step that a malware analyst would take to (a) compare the behavior of a new program to a well-known malware program or (b) compare the behavior of a program before it was infected with malware to after it was infected.
Efficiency Analysis: For each selected program and its test suite, time the execution of the programtracer tool when it is completing tasks such as creating the trace, saving the trace, and analyzing the trace. Record the size of the trace files when stored in memory and on disk.
Collect Data: Collect all relevant data from the analysis and efficiency measurements. Ensure that the data is well-organized and clearly labeled and add it to the writing/reflection.md file.
Report Results: Summarize the findings from the experiment in a report. The report in the writing/reflection.md file should include:
- An overview of the selected Python programs and their test suites.
- A description of the trace output and its verification.
- Results from the trace analysis, including any notable patterns or insights.
- A comparison of different traces, highlighting key differences.
- Efficiency analysis results, including performance metrics and trace file sizes.
- Any challenges encountered during the experiment and how they were addressed.
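
For the efficiency analysis step of the experiment, the measurements could be collected with helpers along these lines. The function names (time_task, peak_memory, size_on_disk) are hypothetical, and the sketch relies only on the standard library: time.perf_counter for wall-clock time, tracemalloc for peak allocations, and os.path.getsize for the size of a saved trace.

```python
import json
import os
import time
import tracemalloc


def time_task(task, *args, repeats=5, **kwargs):
    """Return the best wall-clock time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return min(timings)


def peak_memory(task, *args, **kwargs):
    """Return the peak number of bytes allocated while task runs."""
    tracemalloc.start()
    try:
        task(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def size_on_disk(trace, path):
    """Save the trace as JSON and return the file size in bytes."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(trace, handle)
    return os.path.getsize(path)
```

Timing and memory are measured in separate runs because tracemalloc itself slows execution down. Taking the minimum over several repeats reduces the influence of unrelated activity on the machine, although reporting the mean and spread is also a defensible choice.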

✨ Additional Information

If you have already installed the GatorGrade program that runs the automated grading checks provided by GatorGrader you can, from the repository's base directory, run the automated grading checks by typing gatorgrade --config config/gatorgrade.yml.
You may also review the output from running GatorGrader in GitHub Actions.
Don't forget to provide all of the required responses to the technical writing prompts in the writing/reflection.md file.
Please make sure that you completely delete the TODO markers and their labels from all of the provided source code. This means that instead of only deleting the TODO marker from the code you should delete the TODO marker and the entire prompt and then add your own comments to demonstrate that you understand all of the source code in this project.
Please make sure that you also completely delete the TODO markers and their labels from every line of the writing/reflection.md file. This means that you should not simply delete the TODO marker but instead delete the entire prompt so that your reflection is a document that contains polished technical writing that is suitable for publication on your professional web site.
