Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Repeated DNA Sequences in C++
Suppose we have a DNA sequence. As we know, all DNA is composed of a series of nucleotides abbreviated such as A, C, G, and T, for example: "ACGAATTCCG". When we are studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
We have to write one method to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
So if the input is like “AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT”, then the output will be ["AAAAACCCCC", "CCCCCAAAAA"].
To solve this, we will follow these steps −
Define an array ret, n := size of s, create two sets called visited and visited2
define a map called bitVal.
Store corresponding values for ACGT like 0123 into butVal.
mask := 0
-
for i in range 0 to n – 1
mask := mask * 4
mask := mast OR bitVal[s[i]]
mask := mask AND FFFFF
-
if i < 9, then just continue to the next iteration
insert substring form index i – 9 to 9, into ret
insert mark into visited2.
insert mask into visited
return ret
Example(C++)
Let us see the following implementation to get a better understanding −
#include <bits/stdc++.h>
using namespace std;
void print_vector(vector<auto> v){
cout << "[";
for(int i = 0; i<v.size(); i++){
cout << v[i] << ", ";
}
cout << "]"<<endl;
}
typedef long long int lli;
class Solution {
public:
vector<string>findRepeatedDnaSequences(string s) {
vector <string> ret;
int n = s.size();
set <int> visited;
set <int> visited2;
map <char, int> bitVal;
bitVal['A'] = 0;
bitVal['C'] = 1;
bitVal['G'] = 2;
bitVal['T'] = 3;
lli mask = 0;
for(int i = 0; i < n; i++){
mask <<= 2;
mask |= bitVal[s[i]];
mask &= 0xfffff;
if(i < 9) continue;
if(visited.count(mask) && !visited2.count(mask)){
ret.push_back(s.substr(i - 9, 10));
visited2.insert(mask);
}
visited.insert(mask);
}
return ret;
}
};
main(){
Solution ob;
print_vector(ob.findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
}
Input
"AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output
[AAAAACCCCC, CCCCCAAAAA, ]