Introduction
Multi-omics data for diverse species are growing at an unprecedented rate: new genome assemblies, related annotations, and high-throughput sequencing resources are submitted daily to various genomic data repositories. In response to this influx, both existing and new databases are establishing optimized hierarchical structures to manage the vast amount of information. However, the lack of accessible command-line tools, combined with the functional limitations and unintuitive design of existing options, presents significant challenges for researchers. This gap underscores a critical need for a tool that enables streamlined retrieval and integration of multi-omics data across these diverse repositories.
We have developed Gencube, a command-line tool that enables centralized retrieval and integration of six data types (genome assemblies, gene sets, annotations, sequences, comparative genomic data, and NGS-based omics resources) from multiple leading databases.
Code
https://github.com/snu-cdrc/gencube/tree/main
$ gencube
usage: gencube [-h] {genome,geneset,annotation,sequence,crossgenome,seqmeta,info} ...

gencube v1.0.0

positional arguments:
  {genome,geneset,annotation,sequence,crossgenome,seqmeta,info}
    genome       Search, download, and modify chromosome labels for genome assemblies
    geneset      Search, download, and modify chromosome labels for genesets (gene annotations)
    annotation   Search, download, and modify chromosome labels for various genome annotations, such as gaps and repeats
    sequence     Search and download sequence data of genesets
    crossgenome  Search and download comparative genomics data, such as homology, and codon or protein alignments
    seqmeta      Search, retrive, and integrate metadata of experimental sequencing data
    info         Resubmit email and NCBI API key for use with NCBI's Entrez Utilities (E-Utilities)

options:
  -h, --help     show this help message and exit
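
The help output above lists seven subcommands, six of which line up with the six data types named in the introduction. As a rough sketch of how one might explore them from Python, the snippet below builds the `gencube <subcommand> -h` command for each one. The subcommand names come from the usage line above; the data-type labels are my own pairing of those names with the introduction. The helper names `help_command` and `show_help` are hypothetical, and the per-subcommand `-h` flag is an assumption based on the argparse-style usage message, not something the output above confirms.

```python
import shutil
import subprocess
from typing import List, Optional

# Subcommand names copied from the `gencube` usage line; the labels are an
# informal pairing with the data types listed in the introduction.
SUBCOMMANDS = {
    "genome": "genome assemblies",
    "geneset": "gene sets",
    "annotation": "annotations",
    "sequence": "sequences",
    "crossgenome": "comparative genomic data",
    "seqmeta": "NGS-based omics resources (sequencing metadata)",
    "info": "email and NCBI API key settings",
}


def help_command(subcommand: str) -> List[str]:
    """Return the argv list that asks one subcommand for its help text."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown gencube subcommand: {subcommand!r}")
    # Assumption: each subcommand accepts -h, as argparse subparsers do by default.
    return ["gencube", subcommand, "-h"]


def show_help(subcommand: str) -> Optional[str]:
    """Run `gencube <subcommand> -h`; return None if gencube is not on PATH."""
    if shutil.which("gencube") is None:
        return None
    result = subprocess.run(
        help_command(subcommand), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    for name, data_type in SUBCOMMANDS.items():
        print(f"{name:<12} {data_type}")
        text = show_help(name)
        if text:  # only printed when gencube is actually installed
            print(text)
```

Keeping the command construction separate from the subprocess call means the mapping can be checked without Gencube installed; the actual search and download options of each subcommand are best read from its own help text rather than guessed.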
References
- Gencube: Centralized retrieval and integration of multi-omics resources from leading databases
- https://github.com/snu-cdrc/gencube/tree/main