Genome Data Mining: One Linkage Score Per DNA Letter.

The approach proposed here for preprocessing large-scale genome data for AI tools is to assign a single linkage score to every combination of DNA letter (out of 3 billion) and disease term. The resulting database of hundreds of millions of hot spots would form the backbone of AI approaches. The described algorithm addresses the problems of Probe Specificity and DNA Redundancy so that every DNA letter receives a single linkage score. In essence, it is a fine-tuned form of marker-based DNA data mining.

Uploaded by

Korkut Vata


Genome-Data Mining

Basis:

Getting a statistical score for every DNA position and for every disease term, to build a keyword-normalized database for Artificial Intelligence to work on.

Database Structure

n1 = ATGC---------------------------tt / Atherosclerosis, HER2-Positive Breast Cancer,
n2 = ATGC---------------------------tt / Hyperlipidemia, Anemia, ……
n3 = ATGC--------------------------tc / Glioma, polydactyly..
. .
n = 1.000.000
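
The database structure above can be pictured as one record per sample: a sequence plus the disease terms attached to it. The following is only a minimal sketch of that idea; the names `GenomeRecord` and `records_with_term` are hypothetical, and the short sequences are placeholders rather than real genomes.

```python
from dataclasses import dataclass, field


@dataclass
class GenomeRecord:
    """One database row: a sample's sequence plus its disease terms."""
    sample_id: int                      # n1, n2, ... in the slide
    sequence: str                       # full genome string (shortened here)
    disease_terms: list[str] = field(default_factory=list)


def records_with_term(db, term):
    """Return the records annotated with the given disease term."""
    wanted = term.lower()
    return [r for r in db if wanted in (t.lower() for t in r.disease_terms)]


# Toy database; the sequences are placeholders, not real genomes.
db = [
    GenomeRecord(1, "atgcaatt", ["Atherosclerosis", "HER2-Positive Breast Cancer"]),
    GenomeRecord(2, "atgcgatt", ["Hyperlipidemia", "Anemia"]),
    GenomeRecord(3, "atgcgatc", ["Glioma", "Polydactyly"]),
]

print([r.sample_id for r in records_with_term(db, "anemia")])
```

Matching terms case-insensitively is one simple reading of "keyword normalized"; the slides do not say how the normalization is actually done.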
Methodology

cttcacaagt(CATG)cgcgtcgtt

Search:
A - 345.000   0.007 seconds
T - 250.000   0.006 seconds
G - 255.000   0.011 seconds
C - 350.000   0.009 seconds
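
One way to read the search step above: the probe is a left flank, one variable letter, and a right flank, and the database is searched once per letter. The sketch below counts, for each letter, how many samples contain the probe with that letter. The function name `count_variants` and the toy samples are assumptions for illustration; the slides do not describe the actual search engine or how its timings were obtained.

```python
def count_variants(samples, left, right, letters="acgt"):
    """Count, per letter, how many samples contain left + letter + right."""
    counts = {letter: 0 for letter in letters}
    for seq in samples:
        for letter in letters:
            if left + letter + right in seq:
                counts[letter] += 1
    return counts


# Toy samples (hypothetical); the probe flanks are taken from the slide.
left, right = "cttcacaagt", "cgcgtcgtt"
samples = [
    "gg" + left + "a" + right + "at",
    "gg" + left + "t" + right + "at",
    "gg" + left + "a" + right + "at",
    "ggaaaaaaaaaaaaaaaaaaaaat",          # probe absent in this sample
]

print(count_variants(samples, left, right))
```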
Solution for:

Probe Specificity and DNA Redundancy

Probe Specificity:
cttcacaagt(CATG)cgcgtcgtt
A - 245.000
T - 175.000
G - 252.000
C - 218.000
Sum: 890.000 out of 1 million

The probe is specific for 890.000 samples. The sample size is 890.000 for this particular probe and this particular database.
Correction factor: 88%

DNA Redundancy:
If the sum (ATGC) > 1 million, then extend the probe on the 5' and 3' ends until the sum (ATGC) < 1 million.
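
The extension rule can be sketched as a loop: count all hits of the probe over the database, and while the total exceeds the number of samples (the probe also matches repeats), lengthen both flanks by one letter. This is a minimal sketch under stated assumptions: the function names, the toy sequences, and the choice to stop at "total hits <= number of samples" are illustrative, not the author's implementation. The returned ratio of hits to samples plays the role of the correction factor; for the slide's counts that ratio is 890.000 / 1.000.000 = 0,89, while the slide itself states 88%.

```python
def total_hits(samples, left, right, letters="acgt"):
    """Total occurrences of left + X + right over all samples and letters."""
    return sum(seq.count(left + x + right) for seq in samples for x in letters)


def extend_until_specific(samples, reference, pos, flank=1, max_flank=50):
    """Grow both flanks around reference[pos] until hits <= number of samples.

    Returns (left flank, right flank, hits, hits / number of samples).
    """
    n = len(samples)
    while flank <= max_flank:
        left = reference[max(0, pos - flank):pos]
        right = reference[pos + 1:pos + 1 + flank]
        hits = total_hits(samples, left, right)
        if hits <= n:                      # assumption: one hit per sample is specific
            return left, right, hits, hits / n
        flank += 1
    raise ValueError("no specific probe found within max_flank")


# Toy data (hypothetical): "cgt" occurs twice in the reference, so a
# one-letter flank around position 10 is redundant and must be extended.
reference = "gacgtcttacgtaa"
samples = [
    reference,
    "gacgtcttacataa",   # variant letter at position 10
    reference,
    "gacgtcttaaaaaa",   # site missing in this sample
]

print(extend_until_specific(samples, reference, pos=10))
```

With these toy sequences the one-letter flanks give 11 hits for 4 samples, so the probe is extended once; the two-letter flanks give 3 hits, and the sample size for this probe becomes 3 out of 4.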

Statistics for Every DNA Letter

--------2.345.789.156

1 cctggagcac ggaagattct tgcggacaca aatcgcaact gctaaataaa atttatttat
61 ttgagtgcac agccatgagt cttcacaagt(CATG)cgcgtcgtt atgcttgact tttaaccaaa
121 acacttcgat tgtttcgcgt agcaatagtc gcacaatttt tgaagctttc aaggagttcc
181 tggatttttg ggatatcggc aacgaagttt ctgcagagtc agcagttcgg gtctccagca
241 acggagcttt caacttgccg cagagttttg gcaacgaatc caacgaatat gcccacctgg
301 ctacgcctgt ggatccagcc tacggaggca acaacacgaa caacatgatg cagttcacga
361 acaatctgga aattttggcc aacaataatt ccgatggcaa taacaaaatt aatgcatgca
421 acaaattcgt ctgccacaag ggcactgatt ccgaggatga ctccacggag gtcgatatca
481 aggaggatat tccgaaaacg gtggaggtat cgggatcgga attgaccacg gaacccatgg
541 ccttcttgca gggattaaac tccgggaatc tgatgcagtt cagccagcaa tccgtgctgc
601 gcgaaatgat gctgcaggac attcagatcc aggcgaacac gctgcccaag ctagagaatc

2.346.478 ----------

DNA position 2.345.789.156
Position Identifier: “cttcacaagt(CATG)cgcgtcgtt”

     Disease     Normal
A    165.000     143.000
T    225.000     281.000
G    255.000     263.000
C    365.000     382.000

Delta Change 25% > Threshold

After normalization with respect to disease term frequencies.
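
The comparison above can be sketched as a per-letter relative change between the disease and normal counts, flagged when it exceeds a threshold. This is only an illustration under assumptions: the threshold of 0,18 is hypothetical, the normal count for A is taken to be 143.000, and the counts are used raw. The slides apply a normalization by disease term frequency that they do not specify, so the sketch does not reproduce the 25% and 27% figures; on raw counts the change for T is about -20%.

```python
def relative_change(disease, normal):
    """Signed relative change of each letter count, disease vs. normal."""
    return {b: (disease[b] - normal[b]) / normal[b] for b in disease}


def above_threshold(changes, threshold):
    """Letters whose absolute relative change exceeds the threshold."""
    return [b for b, c in changes.items() if abs(c) > threshold]


# Counts from the slide, before any normalization.
# Assumption: the normal count of 143.000 belongs to the letter A.
disease = {"A": 165_000, "T": 225_000, "G": 255_000, "C": 365_000}
normal = {"A": 143_000, "T": 281_000, "G": 263_000, "C": 382_000}

changes = relative_change(disease, normal)
flagged = above_threshold(changes, threshold=0.18)   # hypothetical threshold

for base, change in changes.items():
    print(f"{base}: {change:+.1%}")
print("flagged:", flagged)
```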

HotSpot Card:
For HER2 Positive Breast Cancer
Probe: cttcacaagt(CATG)cgcgtcgt
DNA position: 2.345.789.156
Score: 27% Decrease in T content
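
A HotSpot card like the one above could be stored as a small immutable record. The sketch below is a guess at a convenient shape, not the author's schema: the class name `HotSpotCard`, its field names, and the `describe` helper are hypothetical, and the change is stored as a signed fraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HotSpotCard:
    """One above-threshold data point for one disease term."""
    disease_term: str
    probe: str
    position: int
    base: str
    change: float       # signed relative change, disease vs. normal


def describe(card):
    """Render the score line of a card, e.g. '27% Decrease in T content'."""
    direction = "Decrease" if card.change < 0 else "Increase"
    return f"{abs(card.change):.0%} {direction} in {card.base} content"


card = HotSpotCard(
    disease_term="HER2 Positive Breast Cancer",
    probe="cttcacaagt(CATG)cgcgtcgt",
    position=2_345_789_156,
    base="T",
    change=-0.27,
)

print(describe(card))
```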
Genome-Data Mining

[Figure: hotspot positions marked along the genome: 2.345.789.91, 2.345.799.913, 2.945.534.915, 1.345.789.91, 2.128.867.985]

HER2-Positive Breast Cancer

DNA position 1.345.789.913   A   2x increased risk
DNA position 2.345.799.913   C   3x increased risk
DNA position 945.534.715   BRCA1 (known)   T   5x increased risk
DNA position 1.345.789.91   T   1,5x increased risk
DNA position 2.128.867.985   G   3x decreased risk

N = 17.546 data points for HER-2 Positive Breast Cancer
Risk Factors for HER-2 Positive Breast Cancer

Upload your genome data, then search for: “Her-2 positive Breast Cancer”

Search: “Her-2 positive Breast Cancer”

17.546 DNA letters are checked on both copies of the genome (maternal and paternal), and the combined risk factor is displayed:

2.3 times higher risk.
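
The slides do not say how the per-hotspot factors are combined into one number. As a rough sketch, the example below assumes a simple multiplicative model: for every hotspot and for each of the two genome copies, the risk factor of the letter found there is multiplied in, and letters without an entry count as 1. The function name, the data layout, and the toy genotypes are all hypothetical; the three risk factors are taken from the HER2 list for illustration only.

```python
def combined_risk(hotspot_risk, maternal, paternal):
    """Multiply the risk factors of the letters found on both genome copies.

    hotspot_risk: {position: {letter: relative risk}}
    maternal, paternal: {position: letter} for each genome copy
    """
    risk = 1.0
    for position, risk_by_base in hotspot_risk.items():
        for genome_copy in (maternal, paternal):
            base = genome_copy.get(position)
            risk *= risk_by_base.get(base, 1.0)   # unlisted letters are neutral
    return risk


# Three hotspots from the slide; "3x decreased risk" is encoded as 1/3.
hotspot_risk = {
    1_345_789_913: {"A": 2.0},
    2_345_799_913: {"C": 3.0},
    2_128_867_985: {"G": 1 / 3},
}

# Hypothetical individual: letters observed on each genome copy.
maternal = {1_345_789_913: "A", 2_345_799_913: "T", 2_128_867_985: "G"}
paternal = {1_345_789_913: "G", 2_345_799_913: "C", 2_128_867_985: "A"}

print(round(combined_risk(hotspot_risk, maternal, paternal), 2))
```

A multiplicative model treats hotspots as independent, which is a strong assumption; other combination rules would give different totals for the same genotype.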
Extent

For All Disease Terms and Disease Variants…

Problems Solved:
Repeat regions are recognized, and the correct positions within multiple regions are identified.
Correction factors for database size are determined.
Result

A database for every disease term.

Enter = “HER2 Positive Breast Cancer”

Result = 8x increased risk for individual XYZ, based on 17.546 hotspots on XYZ's genome


ARTIFICIAL INTELLIGENCE, AI WORKING BASE

HotSpot Card:
For HER2 Positive Breast Cancer
Probe: cttcacaagt(CATG)cgcgtcgt
DNA position: 2.345.789.156
Score: 27% Decrease in T content in Disease

~15.000 Disease Terms

~20.000 Hotspots / Disease Term

~300.000.000 Data Points Above Threshold Values

Structural Mapping: promoter, exon, intron, -5', -3'
Pathway Mapping
Metabolic Mapping
Phenotype Mapping
Literature Mapping
Restructuring Healthcare Globally

Medical education will be reframed with a weight towards data mining.

Surgery will be the main medical profession.

The $100 whole-genome test will provide the best genetic and clinical diagnosis for the next century.

All human characteristics will be known, including emotional status and tendencies towards certain behaviours.

The global healthcare system will be run by an international consortium.
LIMITS?

No Limit
But
Big-Pharma Politics

International Collaboration Against Big-Pharma Politics

By

Korkut Vata, Scientist
Tolunay Gümüş, Actor
