187. Repeated DNA Sequences
LeetCode 187. Repeated DNA Sequences
Description
The DNA sequence is composed of a series of nucleotides abbreviated as 'A', 'C', 'G', and 'T'.
For example,
"ACGAATTCCG"is a DNA sequence.
When studying DNA , it is useful to identify repeated sequences within the DNA.
Given a string s that represents a DNA sequence , return all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Example 1:
Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"
Output: ["AAAAACCCCC","CCCCCAAAAA"]Constraints:
1 <= s.length <= 10^5s[i]is either'A','C','G', or'T'.
Tags
Hash Table, Bit Manipulation
Solution
We first encode each type of nucleotide into an interger between 0 (00) and 3 (11). Then, we slide a window with length of 10 over the given string and use a hash table to check duplicates. The encoding precedures are as follows:
Map
'A','C','G', and'T'into 0, 1, 2 and 3, respectively, and use an integerbitmaskto represent the substring a.k.a the sliding window in binary form;Left shift
bitmaskby 2 bits, since we are using 00~11 to represent each type of nucleotide;Add the incoming nucleotide (binary form) onto the
bitmask;Clear the highest 2 bits by
& (1111 1111 1111 1111 1111).
Complexity
Time complexity:
Space complexity:
Code
Reference
Last updated
Was this helpful?