Check Element Existence from Generator rather than List


Problem Statement

In Python, we often need to check if an element exists in a list, e.g.

if "y" in ["x", "y", "z"]:
    print("y is present")

The runtime cost of lookup is O(n) which is acceptable in a small list.

But in the case of a list with a large number of elements, especially an unwieldy list created by list comprehension, the cost of initializing the list is nontrivial.

Solution

Instead of looping through a list, we can check through a generator.

Generator expression is similar to a list comprehension but uses parentheses () instead of brackets [].

if "y" in ("x", "y", "z"):
    print("y is present")

Runtime Comparison

Assuming a requirement to check if integer 1_000_000 exists in a list with 100_000_000 elements, the following demonstrate the runtime comparison by using list and using generator.

import timeit

MAX_RANGE = 100_000_000


def if_exist_in_list(target):
    element_list = [i for i in range(MAX_RANGE)]
    return True if target in element_list else False


def if_exist_in_generator(target):
    element_generator = (i for i in range(MAX_RANGE))
    return True if target in element_generator else False


iterations = 1

total_time = timeit.timeit('if_exist_in_list(1_000_000)', number=iterations,
                           globals=globals())
print(f"Run time is {total_time / iterations:.4f} seconds by using list")

total_time = timeit.timeit(
    'if_exist_in_generator(1_000_000)', number=iterations, globals=globals()
)
print(f"Run time is {total_time / iterations:.4f} seconds by using generator")

element_list is created by list comprehension, while element_generator is created by generator expression.

By executing the above snipper, the runtime advantage on generator over a list is obvious:

Run time is 2.2028 seconds by using list
Run time is 0.0293 seconds by using generator

Analysis

Even though looping over a list and generator both have a runtime cost of O( n), the advantage of using generator is due to the minimum initialization cost of data structure.

In list comprehension, all elements have to be instantiated in memory before an O(n) lookup procedure could begin.

While looping a generator, each element is yield on the fly, no additional memory allocation required. Once an element is found, the generator terminates.

Therefore, using generator is a runtime-efficient and memory-optimized solution in checking element existence, at least compared with list data structure.