Bioinformatics with Python Cookbook

Solve advanced computational biology problems and build production pipelines with Python and AI tools

Product type Paperback
Published in Dec 2025
Publisher Packt
ISBN-13 9781836642756
Length 618 pages
Edition 4th Edition
Author: Shane Brubaker
Table of Contents

Preface
Chapter 1: Computer Specifications and Python Setup
Chapter 2: Basics of Data Manipulation
Chapter 3: Modern Coding Practices and AI-Generated Coding
Chapter 4: Data Science and Graphing
Chapter 5: Alignment and Variant Calling
Chapter 6: Annotation and Biological Interpretation
Chapter 7: Genomes and Genome Assembly
Chapter 8: Accessing Public Databases
Chapter 9: Protein Structure and Proteomics
Chapter 10: Phylogenetics
Chapter 11: Population Genetics
Chapter 12: Metabolic Modeling and Other Applications
Chapter 13: Genome Editing
Chapter 14: Cloud Basics
Chapter 15: Workflow Systems
Chapter 16: More Workflow Systems
Chapter 17: Deep Learning and LLMs for Nucleic Acid and Protein Design
Chapter 18: Single-Cell Technology and Imaging
Chapter 19: Unlock Your Exclusive Benefits
Index
Other Books You May Enjoy

Installing the required basic software with Anaconda

Next, we will begin setting up your required software libraries, including Python itself. If you are already using a different Python distribution, you are strongly encouraged to consider Anaconda, as it has become the de facto standard for data science and bioinformatics. Also, it is the distribution that will allow you to install software from bioconda (https://bioconda.github.io/).

Getting ready

Python can be run on top of different environments. For instance, you can use Python inside the Java Virtual Machine (JVM) via Jython, or with .NET via IronPython. However, here, we are concerned not only with Python but also with the complete software ecosystem around it. Therefore, we will use the standard (CPython) implementation, since the JVM and .NET versions exist mostly to interact with the native libraries of these platforms.
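
If you are unsure which implementation you are running, a quick check along the following lines can help. This is a minimal sketch that uses only the standard library; the helper name describe_interpreter is our own illustration and is not part of the book's code.

```python
import platform
import sys


def describe_interpreter():
    """Return (implementation, version) for the running interpreter."""
    # python_implementation() reports names such as "CPython", "Jython",
    # "IronPython", or "PyPy"
    implementation = platform.python_implementation()
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return implementation, version


implementation, version = describe_interpreter()
print(f"Running {implementation} {version}")
if implementation != "CPython":
    print("Note: the recipes in this book assume CPython.")
```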

For our code, we will be using Python 3.12. If you were starting with Python and bioinformatics, any operating system would work. But here, we are mostly concerned with intermediate to advanced usage, and so we will focus on macOS.

If you are on Windows and do not have easy access to macOS or Linux, don’t worry. Modern virtualization software (such as VirtualBox and Docker) will come to your rescue by allowing you to install a virtual OS on your operating system. Another option is to use Windows Subsystem for Linux (WSL2), which allows you to run Linux on Windows. For details, see the WSL2 documentation.

Another option for you will be to use a cloud workstation (see the Technical requirements section).

Bioinformatics and data science are moving at breakneck speed; this is not just hype, it’s a reality. When installing software libraries, choosing a version can be tricky: depending on the code that you have, it might not work with some old versions, or perhaps not even with a newer version. Hopefully, any code that you use will indicate the correct dependencies, though this is not guaranteed. In this book, we will fix the precise versions of all software packages, provide you with a minimal version, or specify one in the associated chapter YAML file, as appropriate, and we will make sure that the code works with them. Check your chapter’s README.md file or the Updates section of each notebook for more information. It is quite natural that the code might need tweaking with other package versions.
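
To see whether the packages installed in your environment match a set of pinned versions, you could use something like the following. It is a sketch under stated assumptions: the helper names parse_pin and check_pins are hypothetical, the pins are a subset of those in the install command used in this recipe, and only exact == pins are handled.

```python
from importlib import metadata


def parse_pin(requirement):
    """Split a pinned requirement such as 'numpy==2.1.0' into (name, version)."""
    name, _, version = requirement.partition("==")
    return name.strip(), version.strip()


def check_pins(requirements):
    """Map each package name to (pinned_version, installed_version or None)."""
    report = {}
    for requirement in requirements:
        name, pinned = parse_pin(requirement)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed in this environment
        report[name] = (pinned, installed)
    return report


# A subset of the pins from this recipe's install command
PINS = ["biopython==1.84", "numpy==2.1.0", "pandas==2.2.3"]

for name, (pinned, installed) in check_pins(PINS).items():
    status = "OK" if installed == pinned else "MISMATCH"
    print(f"{name}: pinned {pinned}, installed {installed} -> {status}")
```

A MISMATCH does not necessarily mean the code will fail; it only tells you that you are running a version other than the one the recipes were checked against.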

The software developed for this book is available at https://github.com/PacktPublishing/Bioinformatics-with-Python-Cookbook-fourth-edition. To access it, you will need to install Git. First, make sure Homebrew is installed (https://brew.sh/), and then run the following:

brew install git

You can go to the GitHub page for the book and get the HTTPS link for downloading the source:

git clone https://github.com/PacktPublishing/Bioinformatics-with-Python-Cookbook-Fourth-Edition.git

This will download the source code to your computer.

Before you install the Python stack, you will need to install all of the external non-Python software that you will be interoperating with. The list will vary from chapter to chapter, and all chapter-specific packages will be explained in their respective chapters. Most of the software is available through conda from the bioconda channel (https://bioconda.github.io/) or is installable with pip (https://pypi.org/project/pip/).

Where possible in this book, we will allow you to do everything from your Jupyter notebook, even installing the software. To do this, we will use the ! command, which allows you to run a command that you would normally run from your Terminal from the notebook instead – for example:

! ls

This will run the ls or list directory command as if it had been run from the Terminal.
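
The ! prefix is notebook syntax rather than plain Python, so it will not work in a regular .py script. If you need the same effect outside a notebook, the standard library's subprocess module is one option. The sketch below assumes a macOS or Linux system where ls is available; the helper name list_directory and the example file name are our own.

```python
import subprocess
import tempfile
from pathlib import Path


def list_directory(path):
    """Run `ls` on path and return the names it prints, one per line."""
    result = subprocess.run(
        ["ls", str(path)],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.splitlines()


# Demonstrate on a throwaway directory so that the result is predictable
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example.fasta").touch()
    names = list_directory(tmp)
    print(names)
```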

In some cases, for more involved installations, you will need to go into the Terminal, but we’ll advise you on how to do those steps as we go through the relevant recipes.

You will need to install some development compilers and libraries, all of which are free. On Ubuntu, consider installing the build-essential package (apt-get install build-essential), and on macOS, consider Xcode (https://developer.apple.com/xcode/).

We will mention many amazing Python libraries in this book, but here is a brief overview of some of the most important ones:

| Name | Application | URL | Purpose |
|---|---|---|---|
| Biopython | All chapters | https://biopython.org/ | Bioinformatics library |
| Biotite | Protein Design | https://www.biotite-python.org/latest/index.html | MultiTool and Protein Structure |
| Cython | Big data | http://cython.org/ | High performance |
| Dask | Big data | http://dask.pydata.org | Parallel processing |
| DendroPy | Phylogenetics | https://dendropy.org/ | Phylogenetics |
| HTSeq | NGS/Genomes | https://htseq.readthedocs.io | NGS processing |
| jupytext | Notebook conversion | https://jupytext.readthedocs.io/en/latest/ | Convert your notebook to Python text |
| Keras | Deep Learning | https://keras.io/ | Higher-level library for ML |
| Matplotlib | Visualization | https://matplotlib.org/ | Graphing library |
| NumPy | All chapters | http://www.numpy.org/ | Array/matrix processing |
| Numba | Big data | https://numba.pydata.org/ | High performance |
| Project Jupyter | All chapters | https://jupyter.org/ | Interactive computing |
| PyMOL | Proteomics | https://pymol.org | Molecular visualization |
| PyVCF | NGS | https://pyvcf.readthedocs.io | VCF processing |
| Pysam | NGS | https://github.com/pysam-developers/pysam | SAM/BAM processing |
| SciPy | All chapters | https://www.scipy.org/ | Scientific computing |
| TensorFlow | Machine learning | https://www.tensorflow.org/ | Machine learning library |
| pandas | All chapters | https://pandas.pydata.org/ | Data processing |
| scikit-learn | Machine learning | https://scikit-learn.org | Machine learning library |
| seaborn | All chapters | https://seaborn.pydata.org/ | Statistical chart library |

Table 1.1 – Major Python packages that are useful in bioinformatics

We will use pandas to process most table data.

How to do it...

To get started, take a look at the following steps:

  1. Start by downloading the Anaconda distribution from https://www.anaconda.com/products/individual. We will be using version 2024.06, although you will probably be fine with the most recent one. You can accept all the installation’s default settings, but you might want to make sure that the conda binaries are in your path (do not forget to open a new window so that the path can be updated).
  2. As an alternative to downloading from the website, you can use this command:
    curl -O https://repo.anaconda.com/archive/Anaconda3-2024.06-1-MacOSX-x86_64.sh
  3. If you have another Python distribution, be careful with PYTHONPATH and existing Python libraries. It’s probably better to unset PYTHONPATH. As much as possible, uninstall all other Python versions and installed Python libraries. These steps will help reduce future confusion about which installation of Python you are pointing to.
  4. Let’s go ahead with the libraries. We will now create a new conda environment called bioinformatics_base with Python 3.12, as shown in the following command (type it in your Terminal):
    conda create -n bioinformatics_base python=3.12
  5. Let’s activate the environment, as follows:
    conda activate bioinformatics_base
  6. Let’s add the bioconda and conda-forge channels to our source list:
    conda config --add channels bioconda
    conda config --add channels conda-forge

    Note: Conda channels are remote hosting locations that store common packages we may need.

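  7. Also, install the basic packages. The leading ! runs the command from a notebook cell (omit it if you are typing in the Terminal), and -y answers conda’s confirmation prompt, which a notebook cell cannot do interactively:
    ! conda install -y biopython==1.84 jupyterlab==4.3.0 matplotlib==3.9.2 numpy==2.1.0 pandas==2.2.3 scipy==1.14.1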

    As an alternative to the above, you can also set up your conda environment using a file that specifies the packages needed. It is provided as bioinformatics_base.yml. It is a YAML file, which stands for "YAML Ain't Markup Language" (https://yaml.org/). To use the file, run this command:

    conda env create -f ~/work/CookBook/Ch01/bioinformatics_base.yml

    This will install the required packages for you.
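
To give a feel for what such an environment file contains, the sketch below renders one from the packages installed in this recipe. It is an illustration only: the bioinformatics_base.yml shipped with the book may list different packages or channels, and the helper name render_environment_yaml is hypothetical. The layout (name, channels, dependencies) follows the standard conda environment file format, in which versions are pinned with a single = sign.

```python
def render_environment_yaml(name, channels, dependencies):
    """Return the text of a conda environment file as a string."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


# Assumed contents, mirroring the commands used in this recipe
yaml_text = render_environment_yaml(
    name="bioinformatics_base",
    channels=["conda-forge", "bioconda"],
    dependencies=[
        "python=3.12",
        "biopython=1.84",
        "jupyterlab=4.3.0",
        "matplotlib=3.9.2",
        "numpy=2.1.0",
        "pandas=2.2.3",
        "scipy=1.14.1",
    ],
)
print(yaml_text)
```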

Tip

We often install the latest version of the package by just typing something like conda install biopython, but in this book, we will often do something called “pinning the version.” This means we write an explicit version to help with the reproducibility of the code. We won’t pin the version in every example throughout the book. In most cases, your code should work fine with the latest version. However, we’ll include version pinning where it’s necessary. If any version-specific issues arise in the future, notes will be added to the README.md file for each chapter and in the Updates section of the corresponding notebook.

  8. Now, let’s save our environment so that we can reuse it later to create new environments on other machines, or in case you need to clean up the base environment:
    conda list -e > bioinformatics_base.txt
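
The exported file is plain text: a few comment lines starting with #, followed by one name=version=build entry per package. If you want to inspect it from Python, a small parser such as the following may be enough. This is a sketch; the function name parse_conda_export is our own, and the build strings in the sample are made up for illustration.

```python
def parse_conda_export(text):
    """Parse `conda list -e` output into {package: (version, build)}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        parts = line.split("=")
        if len(parts) < 2:
            continue  # not a name=version entry
        build = parts[2] if len(parts) > 2 else ""
        packages[parts[0]] = (parts[1], build)
    return packages


# Sample export text; the build strings are illustrative, not real
SAMPLE = """\
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
biopython=1.84=py312_0
numpy=2.1.0=py312_0
"""

for name, (version, build) in parse_conda_export(SAMPLE).items():
    print(name, version, build)
```

As the header comment in the export suggests, the file can later be passed to conda create with the --file option to recreate the environment.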

Tip

On the left side of your Terminal, you will see what Anaconda environment you are in so you can always tell where you are at. For instance, right now, it should say (bioinformatics_base).
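
You can make the same check from Python, which is handy inside a notebook where no prompt is visible. The sketch below relies on the CONDA_DEFAULT_ENV environment variable, which conda sets when an environment is activated, and it also flags a PYTHONPATH setting, since that can cause libraries to be imported from outside the environment. The helper name environment_report is hypothetical.

```python
import os
import sys


def environment_report(environ):
    """Summarise the conda environment and PYTHONPATH that a process sees."""
    return {
        # conda sets CONDA_DEFAULT_ENV to the name of the active environment
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        # treat an empty PYTHONPATH the same as an unset one
        "pythonpath": environ.get("PYTHONPATH") or None,
    }


report = environment_report(os.environ)
print("Active conda environment:", report["conda_env"])
print("Interpreter:", sys.executable)
if report["pythonpath"]:
    print("Warning: PYTHONPATH is set to", report["pythonpath"])
```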

One thing that can be confusing is that running python -V in this environment could show an older version. This can happen because, on some systems, Python 3 is invoked via the python3 command. To fix this, alias the python command. Typically, it is easiest to put the alias in your shell startup file, which is run every time you open a Terminal window. On Linux, this is typically .bashrc, but on macOS, you will use the .zshrc file (often pronounced z-shark).

Solution: Open your ~/.zshrc file in a text editor.

Add the following line to the end of the file:

alias python=python3

Now save it.

To run it, you can type source ~/.zshrc.

Now, when you run python -V or python --version, you should see that it is 3.12. If you are in a notebook and want to double-check your version, you can run ! python -V in a cell.
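
If you prefer to check from within Python itself, for example at the top of a notebook, something like the following would work. It is a minimal sketch; the constant REQUIRED and the helper meets_requirement are our own names, and the check compares only the major and minor version numbers.

```python
import sys

REQUIRED = (3, 12)  # the Python version used for the code in this book


def meets_requirement(version_info, required=REQUIRED):
    """Return True when the major.minor version equals the required one."""
    return tuple(version_info[:2]) == required


print("Python", ".".join(map(str, sys.version_info[:3])))
if not meets_requirement(sys.version_info):
    print(
        f"Expected Python {REQUIRED[0]}.{REQUIRED[1]}; "
        "check which environment is active."
    )
```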

There’s more...

If you prefer not to use Anaconda, you will be able to install many of the Python libraries via pip using whatever distribution you choose. If you do use Anaconda, you can go through the book and keep installing packages in bioinformatics_base. At times, however, you may want to create an environment specific to a particular chapter to help isolate any complexity in package installations. Let’s look at how to do that.

For example, imagine you want to create an environment for machine learning with scikit-learn. You can do the following:

  1. First, we need to deactivate our current environment:
    conda deactivate
  2. Create a clone of the original environment with the following:
    conda create -n scikit-learn --clone bioinformatics_base
  3. Add scikit-learn:
    conda activate scikit-learn
    conda install scikit-learn
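
If you expect to create several chapter-specific environments, you could generate the commands with a small helper instead of typing them each time. The sketch below only builds and prints the commands; it does not run them. The function name chapter_env_commands is hypothetical. It uses conda's -n option to target the new environment by name, so no activation step is needed, and -y to skip the confirmation prompt.

```python
import shlex


def chapter_env_commands(new_env, packages, base_env="bioinformatics_base"):
    """Build the conda commands that clone base_env and add packages to the clone."""
    return [
        ["conda", "create", "-n", new_env, "--clone", base_env, "-y"],
        # -n installs into the named environment without activating it
        ["conda", "install", "-n", new_env, "-y", *packages],
    ]


for command in chapter_env_commands("scikit-learn", ["scikit-learn"]):
    print(shlex.join(command))
```

To execute the commands rather than print them, each list could be passed to subprocess.run on a machine where conda is on the path.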
