Post

LC 287 - Find the Duplicate Number

LC 287 - Find the Duplicate Number

Question

Given an array of integers nums containing n + 1 integers where each integer is in the range [1, n] inclusive.

There is only one repeated number in nums, return this repeated number.

You must solve the problem without modifying the array nums and using only constant extra space.

Example 1:

1
2
Input: nums = [1,3,4,2,2]
Output: 2

Example 2:

1
2
Input: nums = [3,1,3,4,2]
Output: 3

Example 3:

1
2
Input: nums = [3,3,3,3,3]
Output: 3

Constraints:

  • 1 <= n <= 105
  • nums.length == n + 1
  • 1 <= nums[i] <= n
  • All the integers in nums appear only once except for precisely one integer which appears two or more times.

Follow up:

  • How can we prove that at least one duplicate number must exist in nums?
  • Can you solve the problem in linear runtime complexity?

Question here and solution here

Solution

concept

The main problem here is the constraints: we cannot modify the array and only constant extra space. (this means we cannot use a hashset) This problem we need to recognise that:

  1. this is a linked list problem: specifically we want to find the cycle in a linked list and the start of the cycle is our answer
  2. in order to achieve this, we need to use Floyd’s tortoise and hare algorithm.

For the first part, we need to take note that each number in the array is a pointer that points to the next index of the array. Take note below the graph is not correct, it should be1 -> 3 -> .... Basically the node itself is the value in the array, and it points to the next index. Note that due to definition, 0 is not included in the array, this means that index 0 is never going to be part of the cycle, since no number can point to 0. This is important information for the Floyd’s algorithm.

The Floyd’s algorithm requires 2 phase:

  1. slow and fast pointer starts at the same place (and this place cannot be part of the cycle). In this question this is guarantee because index 0 is not part of cycle
    1. when the slow and fast meet in the cycle, we will stop and take note theslow position
  2. we then have another pointer slow2 which starts at the starting point (same as fast and slow pointer initially), and have slow2 and slow traverse again.
    1. when slow2 and slow meets, this is the starting point of the cycle
    2. this because the distance from slow2 to the starting point is the same as the distance of the slow to the starting of the cycle. If slow2 is very far away from the starting point of the cycle, slow might need to cycle many times before meeting slow2.

code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
class Solution:
    """
    Floyd's Tortoise and Hare (Cycle Detection)
    rephrase the problem to a linked list problem: the value of the array is the index of the next element
    time: O(n)
    space: O(1)
    Refer to NeetCode's solution: https://www.youtube.com/watch?v=wjYnzkAhcNk

    1. Find the intersection point of the slow and fast, these two pointers will meet if cycle is detected
    2. Find the entrance of the cycle:
    the distance of the overlap of fast and slow pointers to the entrance of the cycle is equal to the distance of the head to the entrance of the cycle
    """
    def findDuplicate(self, nums: List[int]) -> int:
        # first phase: find the intersection point of the slow and fast pointers
        slow, fast = 0, 0
        while True:
            slow = nums[slow]
            fast = nums[nums[fast]]
            if slow == fast:
                break
        # second phase: find the entrance of the cycle
        # if the distance from head to the entrance is very long, then slow pointer will loop many rounds in cycle
        slow_2 = 0
        while True:
            slow = nums[slow]
            slow_2 = nums[slow_2]
            if slow == slow_2:
                return slow

Complexity

time: $O(n)$
space: $O(1)$

This post is licensed under CC BY 4.0 by the author.