Python provides a useful data type for easily finding common or unique items (also referred to as elements) between two or more groups of data: sets. Like dictionaries, lists, and tuples, sets are a type of collection—a container for separate pieces of data. Here’s what distinguishes sets from these other data types:
1) Sets, by design, contain no duplicate items. Any iterable can be passed to the set() constructor, but all duplicated elements in that iterable will automatically be removed:
>>> lst = ["row", "row", "row your boat"]
>>> set(lst)
{'row your boat', 'row'}
Note how the order of the “row” and “row your boat” strings in the set doesn’t match that of the original list. This naturally brings us to our next point. . .
2) Sets are unordered (unlike sequence data types like lists and tuples), and thus don’t support index or slice-based operations. Therefore, sets won’t “remember” the order in which the original elements were created or later added.
3) All items in a set must be hashable, but the set itself isn’t. Sets are mutable (meaning their contents can be changed after creation), and Python’s built-in mutable containers are unhashable. Consequently, sets can’t be members of other sets (i.e., nested sets aren’t possible), nor can sets be used as keys in a dictionary.
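To make the second point concrete, here is a minimal sketch that reuses the list from the first example. The names unique_rows, index_error, and ordered are illustrative only. It shows that indexing a set fails, that membership tests still work, and that sorting is one way to get a predictable order when you need one.

```python
# Same list as in the first example.
lst = ["row", "row", "row your boat"]
unique_rows = set(lst)

# A set has no positions, so indexing raises TypeError.
try:
    unique_rows[0]
except TypeError as exc:
    index_error = exc
else:
    index_error = None
print(type(index_error).__name__)  # TypeError

# Membership tests do not depend on order.
print("row" in unique_rows)  # True

# sorted() returns a list, which has a predictable order and can be indexed.
ordered = sorted(unique_rows)
print(ordered[0])  # row
```

If the original insertion order matters, a list (or a dictionary's keys) is usually a better fit than a set.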
But wait, didn’t we earlier create a set straight from a list, which isn’t hashable? Although the code may make it look like we merely converted a copy of the list into a set, in reality the set() constructor iterated over the list and added each of its elements (strings, which are hashable) to a new set. The list itself never became a member of the set. If we wrap that same list inside of another list, and then try to create a set from that nested list, the operation will fail because the inner list would have to become a member:
>>> lst2 = [lst]
>>> lst2
[['row', 'row', 'row your boat']]
>>> set(lst2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
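The rule behind this error is that set() iterates over whatever it is given and requires each element it encounters to be hashable. The sketch below illustrates this with three inputs; the variable names are illustrative, and converting the inner list to a tuple is just one possible workaround, not the only one.

```python
lst = ["row", "row", "row your boat"]

# set() iterates over the list and adds each string; strings are hashable.
from_list = set(lst)

# A string is itself iterable, so set() collects its individual characters.
from_string = set("boat")

# A tuple of hashable items is hashable, so it can be a set member
# where the equivalent list could not.
from_tuple = set([tuple(lst)])

print(len(from_list))        # 2
print(sorted(from_string))   # ['a', 'b', 'o', 't']
print(from_tuple)            # {('row', 'row', 'row your boat')}
```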
At this point we should point out that in Python, sets actually come in two flavors: the first is the regular set, which is mutable. The second is the frozenset, which is immutable and hashable, and thus can be used as a dictionary key or as a member of another set or frozenset.
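A short sketch of the difference between the two flavors follows. The data and the names vowels, set_of_sets, and labels are made up for illustration.

```python
vowels = {"a", "e", "i", "o", "u"}

# A regular set is unhashable, so it cannot be placed inside another set.
try:
    nested = {vowels}
except TypeError:
    nested = None
print(nested)  # None

# A frozenset built from the same items is hashable.
frozen_vowels = frozenset(vowels)
set_of_sets = {frozen_vowels, frozenset({"y"})}
labels = {frozen_vowels: "vowels"}

# Any frozenset with the same elements finds the same dictionary entry.
print(labels[frozenset("aeiou")])  # vowels

# frozenset has no mutating methods such as add() or remove().
print(hasattr(frozen_vowels, "add"))  # False
```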
Common Uses for Sets
Sets are designed for fast membership testing and for the mathematical set operations built on it:
Difference: Which items are present in one set but not the other? For example, consider two competing businesses that merge. Their databases of potential customers very likely overlap in part. Importing both databases as sets and performing a difference operation could quickly identify business opportunities that one company’s sales team didn’t even know existed.
Union: Essentially merging two or more sets and removing duplicates.
Intersection: Compare two or more sets to find which items are common to all of them. For example, which responses to survey questions appear across all age or ethnic groups?
Symmetric Difference: Which elements are found in exactly one of two sets, but not in both? For example, which survey responses appear in only one of two subgroups (e.g., Millennials or Boomers)?
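The four operations above can be sketched with Python's set operators. The company names below are hypothetical sample data, loosely following the merger scenario; results are passed through sorted() only so the printed order is predictable.

```python
# Hypothetical prospect lists for two merging companies.
company_a = {"Acme", "Globex", "Initech"}
company_b = {"Globex", "Initech", "Umbrella"}

# Difference: prospects known to A but not to B.
only_a = company_a - company_b
print(sorted(only_a))  # ['Acme']

# Union: every prospect from both lists, with duplicates removed.
merged = company_a | company_b
print(sorted(merged))  # ['Acme', 'Globex', 'Initech', 'Umbrella']

# Intersection: prospects that appear in both lists.
shared = company_a & company_b
print(sorted(shared))  # ['Globex', 'Initech']

# Symmetric difference: prospects that appear in exactly one list.
exclusive = company_a ^ company_b
print(sorted(exclusive))  # ['Acme', 'Umbrella']
```

Each operator also has a method form (difference(), union(), intersection(), symmetric_difference()). The method forms accept any iterable as an argument, whereas the operators require both operands to be sets or frozensets.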
Python sets can be powerful tools for consolidating, analyzing, comparing, and reporting collections of real-world data.
Copyright © Python People