
Check Element Existence from Generator rather than List
Problem Statement
In Python, we often need to check if an element exists in a list, e.g.
if "y" in ["x", "y", "z"]:
print("y is present")
The runtime cost of lookup is O(n) which is acceptable in a small list.
But in the case of a list with a large number of elements, especially an unwieldy list created by list comprehension, the cost of initializing the list is nontrivial.
Solution
Instead of looping through a list, we can check through a generator.
Generator expression is similar to a list comprehension but uses parentheses
()
instead of brackets[]
.
if "y" in ("x", "y", "z"):
print("y is present")
Runtime Comparison
Assuming a requirement to check if integer 1_000_000
exists in a list with
100_000_000
elements, the following demonstrate the runtime comparison by
using list and using generator.
import timeit
MAX_RANGE = 100_000_000
def if_exist_in_list(target):
element_list = [i for i in range(MAX_RANGE)]
return True if target in element_list else False
def if_exist_in_generator(target):
element_generator = (i for i in range(MAX_RANGE))
return True if target in element_generator else False
iterations = 1
total_time = timeit.timeit('if_exist_in_list(1_000_000)', number=iterations,
globals=globals())
print(f"Run time is {total_time / iterations:.4f} seconds by using list")
total_time = timeit.timeit(
'if_exist_in_generator(1_000_000)', number=iterations, globals=globals()
)
print(f"Run time is {total_time / iterations:.4f} seconds by using generator")
element_list
is created by list comprehension, while element_generator
is
created by generator expression.
By executing the above snipper, the runtime advantage on generator over a list is obvious:
Run time is 2.2028 seconds by using list
Run time is 0.0293 seconds by using generator
Analysis
Even though looping over a list and generator both have a runtime cost of O( n), the advantage of using generator is due to the minimum initialization cost of data structure.
In list comprehension, all elements have to be instantiated in memory before an O(n) lookup procedure could begin.
While looping a generator, each element is yield
on the fly, no additional
memory allocation required. Once an element is found, the generator terminates.
Therefore, using generator is a runtime-efficient and memory-optimized solution in checking element existence, at least compared with list data structure.