Writing better code: Finding duplicates in an array

A few days ago at work one of my colleagues posed a question about the most elegant way to find the duplicates in an array (in javascript), and after having seen the variety of solutions that people suggested I thought I’d write a small post about it. Hopefully it’ll help some think about different possible solutions to code problems.

For this post, we’re just going to use an array of strings and explore several different ways of finding & storing duplicates in arrays.

Here’s the array we’ll use:

const letters = ['a', 'b', 'd', 'b', 'c', 'a', 'e', 'c', 'a'];

So, for the first example we’re going to start with something that’s a bit unsophisticated, but probably close to the kind of solution most people would immediately think of.

const getDuplicates = (list) => {
  const seen = [];
  const dupes = [];

  for (var i = 0; i < list.length; i += 1) {
    const value = list[i];

    if (seen.indexOf(value) >= 0 && dupes.indexOf(value) < 0) {
      dupes.push(value);
    }

    seen.push(value);
  }

  return dupes;
};

// ["a", "b", "c"]

This solution relies on a for loop which leaves more margin for error. This could simply be improved by using an ES6 forEach. By doing so we’d also remove the need to define the value of each item to a variable.

Additionally we could use ES6 includes instead of checking for indices.

const getDuplicates = (list) => {
  const seen = [];
  const dupes = [];

  list.forEach((value) => {
    if (seen.includes(value) && !dupes.includes(value)) {
      dupes.push(value);
    }

    seen.push(value);
  });

  return dupes;
};

// ["a", "b", "c"]

This solution though, still relies on modifying scope outside of the for loop, which is not ideal.

Another similar solution is to use a while loop and remove items from the array as we loop.

const getDuplicates = (list) => {
  const copy = [...list];
  const dupes = [];

  while (copy.length) {
    const value = copy.shift();

    if (copy.includes(value) && !dupes.includes(value)) {
      dupes.push(value);
    }
  }

  return dupes;
};

// ["a", "b", "c"]

This has the benefit that we do not need to create a seen list of values, but it still has the same problems as the previous solution as far as scoping goes. Also we need to take a copy of our array before we can use shift on it as this would mutate the value.

Additionally, while loops are, in my opinion, something to avoid wherever possible as they can easily get stuck and cause your app / site to freeze up. E.g. If this function was accidentally passed an object with a length key instead of an array, the length would always be truthy and so you’d hit a maximum call stack size error.

The next solution is the over-engineered solution.

const getDuplicates = (list) => {
  const getDuplicatesInner = (lst, dupes) => {
    const [value, ...remaining] = lst;

    if (!lst || !lst.length) {
      return dupes;
    }

    if (!dupes.includes(value) && remaining.includes(value)) {
      return getDuplicatesInner(remaining, [...dupes, value]);
    }

    return getDuplicatesInner(remaining, dupes)
  };

  return getDuplicatesInner(list, []);
};

// ["a", "b", "c"]

This solution is a recursive function, but with the recursive part contained inside the scope of the external API (so that it cannot be incorrectly used). Unlike the previous examples there is no mutation in this example, and there is no accessing variables from an outer scope. Although this sounds good, this example is not great. It is incredibly verbose and the use of the ... spread operator is going to make it very slow, as this is equivalent to calling array.slice(1) in ES5. It is essentially creating another 2 arrays for every iteration of the original array.

The first concise solution.

const getDuplicates = (list) => {
  return list.reduce((memo, v, i) =>
    !memo.includes(v) && i !== list.lastIndexOf(v) ? [...memo, v] : memo
  , []);
};

// ["a", "b", "c"]

By simply using a reduce function we can create an array of the duplicate values without any mutation, while / for loops, or accessing external scope.

This can also be easy modified to return an array of the indices of the duplicates.

const getDuplicateIndices = (list) => {
  return list.reduce((memo, v, i) =>
    (i !== list.indexOf(v) || i !== list.lastIndexOf(v)) ? [...memo, i] : memo
  , []);
};

// [0, 1, 3, 4, 5, 7, 8]

Again, this could be easily modified to return an array with true if the value is a duplicate, and false if it is not.

const getDuplicateBooleans = (list) => {
  return list.reduce((memo, v, i) =>
    [...memo, i !== list.indexOf(v) || i !== list.lastIndexOf(v)]
  , []);
};

// [true, true, false, true, true, true, false, true, true]

Okay, so reduce is pretty flexible, but here we are still using the spread operator to avoid mutation, and actually, that last example can be done in an even simpler way, using map.

const getDuplicateBooleans = (list) => {
  return list.map((v, i) =>
    i !== list.indexOf(v) || i !== list.lastIndexOf(v)
  );
};

// [true, true, false, true, true, true, false, true, true]

So, if all you want is an array of true / false to find the duplicates this seems perfect. I wonder if we can simplify any of the other reduce examples… answer: we totally can.

Let’s get an array of the duplicates without using reduce.

const getDuplicates = (list) => {
  return list.filter((v, i) =>
    i === list.indexOf(v) && i !== list.lastIndexOf(v)
  );
};

// ["a", "b", "c"]

Well, that was really easy too. All we had to do was filter and exclude everything except the first index of a duplicate.

We could also adjust this to give us all instances of duplicates if we really wanted.

const getDuplicates = (list) => {
  return list.filter((v) =>
    list.indexOf(v) !== list.lastIndexOf(v)
  );
};

// ["a", "b", "b", "c", "a", "c", "a"]

Okay, good so far. Can we also use what we’ve done for these last few examples to get the indices of the duplicates without using reduce? Of course we can, we’re developers - we can do anything.

const getDuplicateIndices = (list) => {
  return list.map((v, i) =>
  	(i !== list.indexOf(v) || i !== list.lastIndexOf(v)) ? i : false
  ).filter((v) => v !== false);
};

// [0, 1, 3, 4, 5, 7, 8]

By simply combining our map & filter functions we can get an array of all the duplicates’ indices.

We can even modify this to only return the first index of a duplicate.

const getDuplicateIndices = (list) => {
  return list.map((v, i) =>
  	(i === list.indexOf(v) && i !== list.lastIndexOf(v)) ? i : false
  ).filter((v) => v !== false);
};

// [0, 1, 4]

I think that should be enough to give you an idea of how you can take a different approach to solving problems.

There is no correct answer for this, it can greatly depend on the use case, what libraries you’re using, what browsers you want to support, etc, but I hope I have expanded your mind at least a little.

…and if you just came here looking for a way to get the duplicates in an array, I hope at least one of these will help you.