HASH TABLE
A hash table is a data structure that stores key-value pairs. It provides fast access to the data by using a hashing function to map each key to a corresponding index in an array. The array is often called the hash table, and each index is called a bucket.
The basic idea behind a hash table is to use a hash function to map keys to array indices. A hash function takes a key as input and returns an index in the array where the value associated with that key should be stored. If two keys hash to the same index, a collision occurs. There are several techniques to handle collisions, including chaining and open addressing.
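As a concrete illustration, a toy hash function for strings might sum the character codes and reduce modulo the array size (the class and method names below are purely illustrative; real hash functions mix bits far more thoroughly):

```java
public class SimpleHash {
    // Toy hash function: sum of character codes modulo the table size.
    // Production hash functions mix bits much more aggressively to
    // spread keys evenly across the buckets.
    static int hash(String key, int tableSize) {
        int sum = 0;
        for (char c : key.toCharArray()) {
            sum += c;
        }
        return sum % tableSize;
    }
}
```

Because "listen" and "silent" contain the same letters, this function maps both to the same bucket: exactly the kind of collision that chaining or open addressing must resolve.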
Chaining involves storing a linked list at each index in the hash table. When a collision occurs, the new key-value pair is simply added to the end of the linked list at the corresponding index.
Open addressing, on the other hand, involves finding another open index in the hash table when a collision occurs. There are several methods for finding an open index, including linear probing, quadratic probing, and double hashing.
Hash tables have an average-case time complexity of O(1) for insertion, deletion, and lookup. In the worst case, however, these operations degrade to O(n), where n is the number of elements in the table: this happens when many keys collide, for example under a poorly chosen hash function, and the resulting probe sequences or chains grow long.
OPEN ADDRESSING
Open addressing is a method of resolving collisions in hash tables. When a collision occurs (i.e., when two or more keys hash to the same slot in the table), open addressing searches for another open slot in the table to store the key-value pair.
The most common method of open addressing is called linear probing. In linear probing, if a collision occurs at slot i, the algorithm will search for the next available slot by probing slot i+1, i+2, i+3, and so on until an open slot is found. When inserting a key-value pair into the table, linear probing checks each slot in order until it finds an empty slot, and then inserts the key-value pair into that slot.
There are other methods of open addressing, such as quadratic probing, double hashing, and more. These methods differ in the way they compute the next slot to probe when a collision occurs.
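The differences can be sketched as probe-sequence formulas: the k-th probe starts from the home slot i and advances by k (linear), by k² (quadratic), or by k times a key-dependent step derived from a second hash function (double hashing). The class below is illustrative only:

```java
public class ProbeSequences {
    // k-th probe from home slot i under linear probing: i, i+1, i+2, ...
    static int linear(int i, int k, int capacity) {
        return (i + k) % capacity;
    }

    // Quadratic probing: i, i+1, i+4, i+9, ... which spreads out clusters.
    static int quadratic(int i, int k, int capacity) {
        return (i + k * k) % capacity;
    }

    // Double hashing: the step comes from a second hash of the key, so keys
    // that collide at the same home slot still follow different sequences.
    static int doubleHash(int i, int k, int step, int capacity) {
        return (i + k * step) % capacity;
    }
}
```

For example, from home slot 7 in a table of capacity 10, linear probing visits 7, 8, 9, 0, while quadratic probing visits 7, 8, 1, 6.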
Here's an implementation of a hash table using open addressing in Java:
public class OpenAddressingHashTable<K, V> {
    private static final int DEFAULT_CAPACITY = 10;
    private static final double DEFAULT_LOAD_FACTOR = 0.75;
    // Sentinel marking a slot whose entry was removed; probing must continue
    // past it, otherwise keys inserted after a collision become unreachable.
    private static final Entry<?, ?> DELETED = new Entry<>(null, null);

    private int capacity;
    private double loadFactor;
    private int size;
    private Entry<K, V>[] table;

    public OpenAddressingHashTable() {
        this(DEFAULT_CAPACITY, DEFAULT_LOAD_FACTOR);
    }

    @SuppressWarnings("unchecked")
    public OpenAddressingHashTable(int capacity, double loadFactor) {
        this.capacity = capacity;
        this.loadFactor = loadFactor;
        this.size = 0;
        this.table = new Entry[capacity];
    }

    public void put(K key, V value) {
        int i = hash(key) % capacity;
        int j = i;
        int firstDeleted = -1;
        do {
            Entry<K, V> entry = table[j];
            if (entry == null) {
                // Reuse an earlier tombstone if we passed one.
                table[firstDeleted != -1 ? firstDeleted : j] = new Entry<>(key, value);
                size++;
                if (size > loadFactor * capacity) {
                    resize();
                }
                return;
            }
            if (entry == DELETED) {
                if (firstDeleted == -1) {
                    firstDeleted = j;
                }
            } else if (entry.key.equals(key)) {
                entry.value = value;
                return;
            }
            j = (j + 1) % capacity;
        } while (j != i);
        if (firstDeleted != -1) {
            table[firstDeleted] = new Entry<>(key, value);
            size++;
            return;
        }
        throw new RuntimeException("Table is full.");
    }

    public V get(K key) {
        int i = hash(key) % capacity;
        int j = i;
        do {
            Entry<K, V> entry = table[j];
            if (entry == null) {
                return null;
            }
            if (entry != DELETED && entry.key.equals(key)) {
                return entry.value;
            }
            j = (j + 1) % capacity;
        } while (j != i);
        return null;
    }

    @SuppressWarnings("unchecked")
    public void remove(K key) {
        int i = hash(key) % capacity;
        int j = i;
        do {
            Entry<K, V> entry = table[j];
            if (entry == null) {
                return;
            }
            if (entry != DELETED && entry.key.equals(key)) {
                // Leave a tombstone instead of null so probe chains stay intact.
                table[j] = (Entry<K, V>) DELETED;
                size--;
                return;
            }
            j = (j + 1) % capacity;
        } while (j != i);
    }

    public int size() {
        return size;
    }

    private int hash(K key) {
        return key.hashCode() & 0x7fffffff;
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        int newCapacity = 2 * capacity;
        Entry<K, V>[] newTable = new Entry[newCapacity];
        for (Entry<K, V> entry : table) {
            if (entry != null && entry != DELETED) {
                int j = hash(entry.key) % newCapacity;
                while (newTable[j] != null) {
                    j = (j + 1) % newCapacity;
                }
                newTable[j] = entry;
            }
        }
        table = newTable;
        capacity = newCapacity;
    }

    private static class Entry<K, V> {
        K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }
}
The previous implementation is an open addressing hash table using linear probing. The table is a fixed-size array, each element of which holds a key-value pair. When an element is added, the key is hashed to an index in the array; if that index is already occupied, linear probing simply moves to the next index until an empty slot is found. Removal requires care in open addressing: clearing a slot outright would break the probe chain for any key inserted past it, which is why removed slots are marked with a tombstone entry that lookups skip over but insertions may reuse.
This implementation also includes a resize method that grows the hash table when the load factor becomes too high. The load factor is the ratio of the number of elements in the hash table to its capacity; when it exceeds the configured threshold, the implementation doubles the capacity and rehashes every entry. Resizing maintains good performance by keeping probe sequences short and preventing the table from becoming too full.
To improve the performance of hash tables, a technique called "resizing" can be used. Resizing involves creating a new array with a larger size and rehashing every element from the old array into the new one, using the new array's size in the hash computation. The advantage of resizing is that it reduces the likelihood of collisions and keeps operations fast, at the cost of an occasional O(n) rebuild.
There are many strategies for determining when to resize a hash table, but a common approach is to resize when the load factor (the ratio of the number of elements to the size of the table) exceeds a certain threshold. A load factor of 0.75 is often used as a default.
Open addressing is a popular method for resolving hash collisions, but it is not the only approach. Another technique is called separate chaining, which involves using linked lists to store elements that hash to the same index. Each element in the table would be a pointer to the head of a linked list. When a collision occurs, the new element is added to the linked list at the appropriate index.
The choice of open addressing or separate chaining depends on the specific use case and the characteristics of the data being stored. Open addressing is cache-friendly and avoids per-element pointer overhead, but it is more difficult to implement correctly (deletion in particular) and can suffer from clustering, where many elements occupy nearby indices. Separate chaining handles high load factors more gracefully and is easier to implement, but each element carries an extra reference, and traversing the linked lists can be slower in practice due to poor cache locality.
In the average case, hash table operations using open addressing have constant time complexity, O(1), for search, insertion, and deletion, and each operation uses only O(1) extra space. However, in the worst case, when many keys collide or the table is nearly full, probe sequences can grow long and each operation can degrade to O(n), where n is the number of items in the table. The table itself always occupies O(n) space, since every item is stored in its own slot.
It's worth noting that the worst case scenario for open addressing hash tables can be improved with techniques such as resizing the hash table or using a secondary hash function.
CHAINING
Chaining is a technique used in hash tables to resolve collisions. Instead of having a fixed size array to store the values, chaining creates a linked list or another data structure at each index of the array, so that multiple values that have the same hash value can be stored at the same index.
When a hash function maps two keys to the same index, it causes a collision. The existing element at that index is used as the head of a linked list and the new element is added to the tail of the list. When searching for an element, the hash function is used to calculate the index and the linked list is traversed to find the matching element.
Insertion, search, and deletion operations in chaining hash tables typically take O(1) time when the linked lists are kept small, but can degrade to O(n) when the lists become long. The performance of chaining is largely dependent on the load factor, which is the ratio of the number of elements to the size of the hash table.
Chaining hash tables have the advantage of being relatively easy to implement, and can handle collisions effectively, even when the number of elements is large. However, they can consume a large amount of memory, as each element requires an extra reference to the next element in the list, which can make them less space-efficient than open addressing hash tables.
Here's an example implementation of a hash table with chaining in Java:
import java.util.ArrayList;
import java.util.LinkedList;
public class ChainingHashTable<K, V> {
    private int capacity;
    private ArrayList<LinkedList<Entry<K, V>>> table;

    private static class Entry<K, V> {
        K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }

        public K getKey() {
            return key;
        }

        public V getValue() {
            return value;
        }
    }

    public ChainingHashTable(int capacity) {
        this.capacity = capacity;
        table = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) {
            table.add(new LinkedList<>());
        }
    }

    public void put(K key, V value) {
        int index = hash(key);
        LinkedList<Entry<K, V>> list = table.get(index);
        for (Entry<K, V> entry : list) {
            if (entry.getKey().equals(key)) {
                entry.value = value;
                return;
            }
        }
        list.add(new Entry<>(key, value));
    }

    public V get(K key) {
        int index = hash(key);
        LinkedList<Entry<K, V>> list = table.get(index);
        for (Entry<K, V> entry : list) {
            if (entry.getKey().equals(key)) {
                return entry.getValue();
            }
        }
        return null;
    }

    public void remove(K key) {
        int index = hash(key);
        LinkedList<Entry<K, V>> list = table.get(index);
        for (Entry<K, V> entry : list) {
            if (entry.getKey().equals(key)) {
                list.remove(entry);
                return;
            }
        }
    }

    private int hash(K key) {
        return Math.abs(key.hashCode() % capacity);
    }
}
In this implementation, we use an ArrayList of LinkedLists to represent the hash table. When a new key-value pair is inserted, we calculate the hash of the key and add it to the linked list at the corresponding index in the ArrayList. When we want to retrieve a value given its key, we calculate the hash of the key, then iterate through the linked list at the corresponding index until we find the entry with the matching key. When we want to remove an entry, we again calculate the hash of the key and iterate through the linked list at the corresponding index to find and remove the entry with the matching key.
PERFECT HASHING
Perfect hashing is a technique for constructing a hash table with no collisions. A hash function is said to be perfect if it assigns each key in the key set to a unique location in the hash table.
In perfect hashing, there are two levels of hashing. The first level maps each key to a different slot in a table. The second level consists of a local hash table for each slot in the first level table, which maps each key in that slot to a unique location within the table.
Perfect hashing is useful when we know the set of keys in advance and we want to achieve constant-time lookups, without worrying about hash collisions. However, constructing a perfect hash function is not always easy, and in many cases, it requires an ad-hoc construction that is specific to the particular set of keys being hashed.
Perfect hashing is typically used when we have a large number of keys that are known in advance, and we need to perform fast lookups. A common example is in compilers, where we need to look up identifiers (variable names, function names, etc.) in the symbol table.
Implementing hash tables with perfect hashing is not a straightforward task, and it requires a more complicated setup. Perfect hashing is a technique that eliminates collisions by pre-computing the hash function to ensure that each key has a unique slot in the table. This can be done in two steps:
- Construct a hash function h that maps each key k to a unique slot in the table.
- If there are any collisions in the table, use a secondary hash function to resolve the collision.
The first step can be accomplished by using a hash function that guarantees that each key is hashed to a unique slot. There are many ways to do this, but a common approach is to use a variant of the universal hashing technique, where the hash function is constructed from a family of hash functions that are randomized at runtime.
The second step can be accomplished with a secondary hash function per slot. When two or more keys hash to the same first-level slot, that slot is given its own small table, typically of size b² for b colliding keys, and the secondary function's parameters are re-drawn at random until every key in the slot maps to a distinct position.
Here is an example implementation of a hash table with perfect hashing:
import java.util.Random;

public class PerfectHashTable {
    // Static two-level (FKS-style) perfect hash table for a fixed set of int
    // keys. Level one distributes the keys into buckets; each bucket with b
    // keys gets its own collision-free table of size b * b at level two.
    private static final long P = 2_147_483_647L; // prime modulus for universal hashing

    private final int numBuckets;
    private final long firstA, firstB;     // level-one hash parameters
    private final long[] secondA, secondB; // per-bucket level-two parameters
    private final int[][] slotKeys;
    private final int[][] slotValues;
    private final boolean[][] occupied;

    public PerfectHashTable(int[] keys, int[] values) {
        Random rnd = new Random();
        numBuckets = Math.max(1, keys.length);
        secondA = new long[numBuckets];
        secondB = new long[numBuckets];
        slotKeys = new int[numBuckets][];
        slotValues = new int[numBuckets][];
        occupied = new boolean[numBuckets][];

        // Level one: distribute the keys into buckets with a universal hash.
        firstA = 1 + rnd.nextInt((int) P - 1);
        firstB = rnd.nextInt((int) P);
        int[] count = new int[numBuckets];
        for (int k : keys) {
            count[firstHash(k)]++;
        }
        int[][] bucketKeys = new int[numBuckets][];
        int[][] bucketValues = new int[numBuckets][];
        int[] fill = new int[numBuckets];
        for (int j = 0; j < numBuckets; j++) {
            bucketKeys[j] = new int[count[j]];
            bucketValues[j] = new int[count[j]];
        }
        for (int i = 0; i < keys.length; i++) {
            int j = firstHash(keys[i]);
            bucketKeys[j][fill[j]] = keys[i];
            bucketValues[j][fill[j]] = values[i];
            fill[j]++;
        }

        // Level two: re-draw random parameters until the bucket's keys are
        // collision-free in a table of size b * b (expected O(1) retries).
        for (int j = 0; j < numBuckets; j++) {
            int b = count[j];
            slotKeys[j] = new int[b * b];
            slotValues[j] = new int[b * b];
            occupied[j] = new boolean[b * b];
            boolean collisionFree = (b == 0);
            while (!collisionFree) {
                secondA[j] = 1 + rnd.nextInt((int) P - 1);
                secondB[j] = rnd.nextInt((int) P);
                occupied[j] = new boolean[b * b];
                collisionFree = true;
                for (int i = 0; i < b; i++) {
                    int slot = secondHash(j, bucketKeys[j][i]);
                    if (occupied[j][slot]) {
                        collisionFree = false; // collision: re-draw parameters
                        break;
                    }
                    occupied[j][slot] = true;
                    slotKeys[j][slot] = bucketKeys[j][i];
                    slotValues[j][slot] = bucketValues[j][i];
                }
            }
        }
    }

    private int firstHash(int key) {
        return (int) (((firstA * (key & 0xffffffffL) + firstB) % P) % numBuckets);
    }

    private int secondHash(int bucket, int key) {
        return (int) (((secondA[bucket] * (key & 0xffffffffL) + secondB[bucket]) % P)
                % occupied[bucket].length);
    }

    public int get(int key) {
        int j = firstHash(key);
        if (occupied[j].length == 0) {
            return -1; // no keys hashed to this bucket
        }
        int slot = secondHash(j, key);
        return occupied[j][slot] && slotKeys[j][slot] == key ? slotValues[j][slot] : -1;
    }

    public void put(int key, int value) {
        // The key set is fixed at construction time; only existing keys can
        // be updated. Adding a new key requires rebuilding the table.
        int j = firstHash(key);
        if (occupied[j].length > 0) {
            int slot = secondHash(j, key);
            if (occupied[j][slot] && slotKeys[j][slot] == key) {
                slotValues[j][slot] = value;
                return;
            }
        }
        throw new IllegalArgumentException("Key is not in the fixed key set: " + key);
    }
}
This implementation builds a static two-level table. The first-level hash function distributes the keys into buckets, and each bucket with b keys receives its own secondary table of size b², whose hash parameters are re-drawn at random until the bucket's keys occupy distinct slots. Once construction finishes, both levels are collision-free, so a lookup computes two hash functions and performs a single key comparison.
One advantage of perfect hashing is that it provides constant-time access for all operations, without the need for additional collision resolution techniques. However, the downside is that the creation of a perfect hash function requires additional processing time and space for the creation of the two-level hash table. Additionally, any changes to the original set of keys would require the recreation of the hash function and the hash table.
In a hash table with perfect hashing, search takes O(1) time in both the average and the worst case, since each lookup computes two hash functions and makes one comparison. The space complexity is O(n), where n is the number of keys: the first-level table has n buckets, and although a bucket holding b keys uses a secondary table of size b², the expected total size of all secondary tables is O(n) when the first-level hash function is drawn from a universal family.
PATTERNS
FREQUENCY COUNTING
Frequency counting is a technique that involves counting the frequency of each item or value in a collection, such as an array or a string. This technique is commonly used in computer science to solve a variety of problems, including searching, sorting, and compression.
One example of frequency counting is finding the most common element in an array. To do this, you would iterate over the array and count the frequency of each element using a hash table. After counting the frequency of each element, you can then find the element with the highest frequency and return it as the most common element.
Another example of frequency counting is finding the number of distinct elements in an array. To do this, you would again use a hash table to count the frequency of each element. After counting the frequency of each element, you can then count the number of elements that have a frequency greater than zero to find the number of distinct elements in the array.
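Both tasks reduce to building the same frequency map; here is a sketch (class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class FrequencyExamples {
    // Most common element: build a frequency map, then take the max-count key.
    static int mostCommon(int[] arr) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (int x : arr) {
            freq.merge(x, 1, Integer::sum);
        }
        int best = arr[0];
        for (Map.Entry<Integer, Integer> e : freq.entrySet()) {
            if (e.getValue() > freq.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    // Distinct elements: every key in the frequency map occurred at least
    // once, so the distinct count is simply the number of keys in the map.
    static int countDistinct(int[] arr) {
        Map<Integer, Integer> freq = new HashMap<>();
        for (int x : arr) {
            freq.merge(x, 1, Integer::sum);
        }
        return freq.size();
    }
}
```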
Frequency counting can also be used to find patterns in strings. For example, you could use frequency counting to find the most common letter or group of letters in a string. This technique can be useful for tasks such as identifying keywords or analyzing text data.
Here's an example implementation of frequency counting in Java:
import java.util.HashMap;
import java.util.Map;

public static Map<String, Integer> countFrequencies(String[] arr) {
    Map<String, Integer> freqMap = new HashMap<>();
    for (String str : arr) {
        freqMap.put(str, freqMap.getOrDefault(str, 0) + 1);
    }
    return freqMap;
}
In this implementation, we use a HashMap to store the frequencies of the elements in the input array. We iterate over the array and increment the count of the element in the map for each occurrence. Finally, we return the frequency map.
GROUP ANAGRAMS
Group anagrams is a problem that involves categorizing a list of words into groups of anagrams. Anagrams are words that have the same letters but are rearranged differently. For instance, "elbow" and "below" are anagrams because they have the same letters. The goal is to group all anagrams together in a list of words.
To solve this problem, we can use a hash table where the keys are sorted versions of the words and the values are lists of words that correspond to the sorted key. We can iterate through each word in the list and sort its characters, then add the word to the list corresponding to its sorted version in the hash table.
Here is a simple implementation in Java:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public List<List<String>> groupAnagrams(String[] strs) {
    Map<String, List<String>> map = new HashMap<>();
    for (String str : strs) {
        char[] chars = str.toCharArray();
        Arrays.sort(chars);
        String key = new String(chars);
        if (!map.containsKey(key)) {
            map.put(key, new ArrayList<>());
        }
        map.get(key).add(str);
    }
    return new ArrayList<>(map.values());
}
The groupAnagrams method takes an array of strings as input and returns a list of lists of strings, where each inner list contains a group of anagrams.
We create a hash table map to store the sorted versions of the words as keys and their corresponding lists of anagrams as values.
We iterate through each word in the input array strs. For each word, we sort its characters and use the sorted string as the key to the hash table. If the key is not in the hash table, we create a new list and put it in the hash table. We then add the current word to the list corresponding to its key.
Finally, we return the values of the hash table as a list of lists, where each inner list contains a group of anagrams.
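Sorting each word costs O(L log L) for a word of length L. A common alternative, sketched below under the assumption that words contain only lowercase ASCII letters, derives the key from a 26-entry letter-count array instead, which is O(L) per word (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnagramKey {
    // Encode the 26 letter counts as a string such as "#1#0#0...";
    // anagrams produce identical keys without any sorting.
    static String countKey(String word) {
        int[] counts = new int[26];
        for (char c : word.toCharArray()) {
            counts[c - 'a']++;
        }
        StringBuilder sb = new StringBuilder();
        for (int count : counts) {
            sb.append('#').append(count);
        }
        return sb.toString();
    }

    static List<List<String>> groupAnagrams(String[] words) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String word : words) {
            groups.computeIfAbsent(countKey(word), k -> new ArrayList<>()).add(word);
        }
        return new ArrayList<>(groups.values());
    }
}
```

The '#' separator keeps multi-digit counts unambiguous; without it, counts like 1,12 and 11,2 could collide.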
SUBARRAY SUM PROBLEM
The subarray sum problem is a classic problem in computer science, which is about finding if there is a subarray with a given sum in an array of integers. The problem can be stated as follows:
Given an array of integers nums and an integer target, find a subarray that sums to target and return its starting and ending indices.
For example, given the array [2, 4, -2, 1, 7, 3] and the target 6, the subarray [2, 4] (meaning the elements with indices 2 through 4 inclusive) would be a valid solution, because (-2) + 1 + 7 = 6.
This problem has various solutions, with different time and space complexities, depending on the requirements of the use case. Some of the most common solutions include brute force, prefix sums, and hash tables.
In the brute force approach, we iterate over all possible subarrays of the array, and check if their sum equals the target. This approach has a time complexity of O(n^3), because there are O(n^2) subarrays and we need O(n) time to compute the sum of each subarray.
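The O(n^3) bound assumes each subarray's sum is recomputed from scratch; keeping a running sum per starting index, as in this sketch, already brings the brute force down to O(n^2) (class and method names are illustrative):

```java
public class SubarraySumBruteForce {
    // Try every starting index; extend the subarray one element at a time,
    // keeping a running sum so each (start, end) pair costs O(1).
    static boolean hasSubarrayWithSum(int[] nums, int target) {
        for (int start = 0; start < nums.length; start++) {
            int sum = 0;
            for (int end = start; end < nums.length; end++) {
                sum += nums[end];
                if (sum == target) {
                    return true;
                }
            }
        }
        return false;
    }
}
```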
In the prefix sums approach, we first compute the prefix sums of the array: an array of running sums of the original array up to each position. The sum of any subarray is then the difference of two prefix sums, so checking a given pair of start and end positions takes O(1) time, and checking all pairs takes O(n^2). Storing the prefix sums seen so far in a hash table removes the need to try every pair, reducing the whole check to O(n).
In the hash table approach, we iterate over the array, and store the running sum up to each position in a hash table. Then, for each position, we check if the difference between the running sum at that position and the target is present in the hash table. This approach also has a time complexity of O(n), because we perform a constant amount of work for each element of the array.
The subarray sum problem has many variants and applications, such as finding the maximum subarray sum, counting the number of subarrays that sum to a target, or solving the problem in the presence of negative numbers.
Here's a Java implementation of the subarray sum problem using the prefix sum technique:
import java.util.HashSet;
import java.util.Set;

public static boolean subarraySum(int[] nums, int k) {
    Set<Integer> set = new HashSet<>();
    int sum = 0;
    for (int num : nums) {
        sum += num;
        if (sum == k || set.contains(sum - k)) {
            return true;
        }
        set.add(sum);
    }
    return false;
}
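The method above only reports whether such a subarray exists. To recover the starting and ending indices that the problem statement asks for, the set can be upgraded to a map from each prefix sum to the index where it first occurs (a sketch with illustrative names; it returns null when no subarray matches):

```java
import java.util.HashMap;
import java.util.Map;

public class SubarraySumIndices {
    // Map each prefix sum to the index where it first occurs; if the current
    // prefix sum minus the target was seen at index i, then the subarray
    // (i+1 .. current) sums exactly to the target.
    static int[] subarraySumIndices(int[] nums, int target) {
        Map<Integer, Integer> firstIndex = new HashMap<>();
        firstIndex.put(0, -1); // empty prefix, so subarrays starting at 0 work
        int sum = 0;
        for (int i = 0; i < nums.length; i++) {
            sum += nums[i];
            Integer start = firstIndex.get(sum - target);
            if (start != null) {
                return new int[] { start + 1, i };
            }
            firstIndex.putIfAbsent(sum, i);
        }
        return null;
    }
}
```

On [2, 4, -2, 1, 7, 3] with target 6 this returns the earliest match, indices 0 through 1, since 2 + 4 = 6.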
PROBLEMS
Easy
Medium
- Group Anagrams
- Top K Frequent Elements
- Longest Substring Without Repeating Characters
- Minimum Window Substring
- LRU Cache
- Isomorphic Strings
Hard