Deduplication Functionality
NSHConfig provides utility functions to deduplicate config instances based on equality comparison rather than hashing.
Overview
The library includes dedicated functions for removing duplicate configs from collections:
deduplicate(): A general utility function that removes duplicates from any iterable
deduplicate_configs(): A specialized alias for deduplicating config instances
Implementation
Unlike traditional Python deduplication with sets (which uses hashing), NSHConfig’s deduplication functions use explicit equality comparison. This approach is particularly useful when:
You’re working with complex config objects
You need to deduplicate objects that might not be hashable
You want to control exactly how deduplication works
How It Works
The deduplication process:
Iterates through each item in the input iterable
Compares it to previously seen items using equality comparison (==)
Only keeps the first occurrence of each unique item
Returns a new list with duplicates removed
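The steps above can be sketched in a few lines. This is a minimal illustration of the described process, not NSHConfig's actual source; the function name dedupe_by_equality is hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable
from typing import TypeVar

T = TypeVar("T")


def dedupe_by_equality(items: Iterable[T]) -> list[T]:
    """Hypothetical sketch: keep the first occurrence of each item, comparing with ==."""
    seen: list[T] = []
    for item in items:
        # Linear scan over the items kept so far; no hashing involved.
        if not any(item == kept for kept in seen):
            seen.append(item)
    return seen


# Lists are unhashable, so set() could not deduplicate these.
print(dedupe_by_equality([[1, 2], [3], [1, 2]]))  # [[1, 2], [3]]
```

Because the sketch only ever calls ==, it accepts any iterable whose items support equality, including unhashable ones such as lists.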
Example Usage
from __future__ import annotations
import nshconfig as C
from nshconfig._src.utils import deduplicate_configs
class MyConfig(C.Config):
    name: str = "default"
    value: int = 0
# Creating configs
config1 = MyConfig(name="example", value=42)
config2 = MyConfig(name="example", value=42) # Same values as config1
config3 = MyConfig(name="different", value=100)
# Deduplicating configs
configs = [config1, config2, config3, MyConfig(name="example", value=42)]
unique_configs = deduplicate_configs(configs)
assert len(unique_configs) == 2 # Only 2 unique configs
assert unique_configs[0] == config1 # First occurrence of this config
assert unique_configs[1] == config3 # First occurrence of this config
Using With Custom Types
The deduplicate function is generic and can be used with any type that supports equality comparison:
from nshconfig._src.utils import deduplicate
# Deduplicating strings
strings = ["apple", "banana", "apple", "cherry", "banana"]
unique_strings = deduplicate(strings)
assert unique_strings == ["apple", "banana", "cherry"]
# Deduplicating custom objects
class CustomObject:
    def __init__(self, id: int, name: str):
        self.id = id
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, CustomObject):
            return False
        return self.id == other.id  # Equality based on ID only
objects = [
CustomObject(1, "First"),
CustomObject(2, "Second"),
CustomObject(1, "Different name, same ID"),
CustomObject(3, "Third")
]
unique_objects = deduplicate(objects)
assert len(unique_objects) == 3 # Only 3 unique objects by ID
Relation to Hash Functionality
While NSHConfig also provides automatic hash functionality for Config classes (see Hash Functionality), the deduplication utilities operate independently of hashing. This means:
You can deduplicate configs even if they don’t have a __hash__ method
Deduplication is based on equality comparison, not hash values
The order of items in the original collection is preserved
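To see why independence from hashing matters, consider a class that defines __eq__ but not __hash__. Python sets __hash__ to None for such a class, so set() rejects its instances. The sketch below uses a hypothetical Point class and an inline equality-based loop standing in for deduplicate; it does not call NSHConfig itself.

```python
class Point:
    """Hypothetical class: defines __eq__ only, so instances are unhashable."""

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


points = [Point(0, 0), Point(1, 1), Point(0, 0)]

# Hash-based deduplication fails because Point.__hash__ is None.
try:
    set(points)
    set_worked = True
except TypeError:
    set_worked = False

# Equality-based deduplication (a stand-in for deduplicate) still works
# and keeps first occurrences in their original order.
unique = []
for p in points:
    if not any(p == kept for kept in unique):
        unique.append(p)

print(set_worked)   # False
print(len(unique))  # 2
```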
Performance Considerations
The current implementation uses a linear search (any(config == seen_config for seen_config in seen)) for each item, which gives O(n²) time complexity in the worst case (when every item is unique). For very large collections, this is slower than hash-based deduplication, which runs in O(n) on average.
However, this approach:
Does not require objects to be hashable
Preserves the original order of items
Uses explicit equality comparison which may be desirable in some cases
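When every item is hashable and the quadratic cost matters, one hash-based alternative that still preserves order is dict.fromkeys, which relies on dictionaries keeping insertion order (guaranteed since Python 3.7). This is a standard-library idiom, not part of NSHConfig, and the helper name dedupe_hashable is hypothetical.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable


def dedupe_hashable(items: Iterable[Hashable]) -> list:
    """Hypothetical helper: order-preserving, hash-based deduplication."""
    # dict keys are unique and keep insertion order, so the first
    # occurrence of each item survives.
    return list(dict.fromkeys(items))


print(dedupe_hashable(["apple", "banana", "apple", "cherry", "banana"]))
# ['apple', 'banana', 'cherry']
```

This only gives the same result as equality-based deduplication when each item's __hash__ is consistent with its __eq__.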