187. Repeated DNA Sequences
LeetCode 187. Repeated DNA Sequences
Description
The DNA sequence is composed of a series of nucleotides abbreviated as 'A'
, 'C'
, 'G'
, and 'T'
.
For example,
"ACGAATTCCG"
is a DNA sequence.
When studying DNA , it is useful to identify repeated sequences within the DNA.
Given a string s
that represents a DNA sequence , return all the 10
-letter-long sequences (substrings) that occur more than once in a DNA molecule. You may return the answer in any order.
Example 1:
Constraints:
1 <= s.length <= 10^5
s[i]
is either'A'
,'C'
,'G'
, or'T'
.
Tags
Hash Table, Bit Manipulation
Solution
We first encode each type of nucleotide into an interger between 0 (00) and 3 (11). Then, we slide a window with length of 10 over the given string and use a hash table to check duplicates. The encoding precedures are as follows:
Map
'A'
,'C'
,'G'
, and'T'
into 0, 1, 2 and 3, respectively, and use an integerbitmask
to represent the substring a.k.a the sliding window in binary form;Left shift
bitmask
by 2 bits, since we are using 00~11 to represent each type of nucleotide;Add the incoming nucleotide (binary form) onto the
bitmask
;Clear the highest 2 bits by
& (1111 1111 1111 1111 1111)
.
Complexity
Time complexity:
Space complexity:
Code
Reference
Last updated
Was this helpful?