Collections
The Java Collections Framework provides an architecture for storing and manipulating groups of
objects.
It supports the operations you typically perform on data, such as searching, sorting, insertion,
manipulation, and deletion.
A collection is a single object that groups multiple elements together. The framework provides many
interfaces (Set, List, Queue, Deque) and classes (ArrayList, Vector, LinkedList, PriorityQueue, HashSet,
LinkedHashSet, TreeSet).
Iterable interface
Iterable is the root interface of the Collection hierarchy. The Collection interface extends
Iterable, so every class that implements Collection also implements Iterable (Map and its
implementations do not). Its main purpose is to provide an iterator over the elements, which is also
what makes a collection usable in a for-each loop. It declares a single abstract method,
Iterator<T> iterator(), which returns an Iterator over the elements.
Collection Interface
Every List, Set and Queue class in the Collections Framework implements the Collection interface; Map
classes do not. No class implements Collection directly. Instead, it is implemented through its
subinterfaces, such as List, Queue, and Set.
Basic operations in the Collection interface:
Adding elements - add(E e) and addAll(Collection<? extends E> c)
Removing elements - remove(Object o) and removeAll(Collection<?> c)
Iterating over elements - iterator()
size() - returns the number of elements in the collection
stream() - returns a sequential stream
isEmpty() - returns true if the collection contains no elements.
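The main collection types are:
1. List - an ordered collection. Maintains insertion order and allows duplicates.
2. Set - a collection that doesn't allow duplicates.
3. Map - a collection of key-value pairs, well suited to fast lookups by key. Map is part of the
framework but does not extend the Collection interface.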
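The basic Collection operations listed above can be sketched as follows. This is a minimal illustration, not a reference implementation; the class name CollectionBasics and the helper exercise() are made up for the example. Running the same calls against a List and a Set shows the difference in how duplicates are treated.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class CollectionBasics {

    // Runs the same basic operations against any Collection implementation.
    static Collection<String> exercise(Collection<String> c) {
        c.add("a");                       // add a single element
        c.addAll(List.of("b", "b", "c")); // add every element of another collection
        c.remove("a");                    // remove one occurrence of "a"
        c.removeAll(List.of("c"));        // remove everything that is also in the argument
        return c;
    }

    public static void main(String[] args) {
        Collection<String> list = exercise(new ArrayList<>());
        Collection<String> set = exercise(new HashSet<>());

        System.out.println(list + " size=" + list.size()); // [b, b] size=2
        System.out.println(set + " size=" + set.size());   // [b] size=1
        System.out.println(list.isEmpty());                // false

        // Explicit iteration through the Iterator supplied by iterator().
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```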
List Interface
ArrayList is a resizable array: it stores its elements in an internal array that grows as needed. An
ArrayList created with the no-argument constructor gets a default capacity of 10 (in current JDKs the
backing array is allocated lazily, when the first element is added).
LinkedList also grows and shrinks automatically as items are added and removed, so no size has to be
specified when creating it. Unlike ArrayList, it is not backed by an array; it is implemented as a
doubly linked list. The difference between a singly linked list and a doubly linked list is that each
node of a doubly linked list holds an extra pointer, typically called the previous pointer, in addition
to the next pointer and the data found in a singly linked list node.
LinkedList Vs ArrayList
ArrayList has very fast random access: get(index) reads directly from the backing array in constant
time. A LinkedList has to walk its chain of nodes instead. For example, to get the 65th element of a
1000-element LinkedList, it starts from the first element and follows the next references until it
reaches position 65. So positional access is slow in LinkedList.
To insert a new element at the 45th position, ArrayList shifts every element from that position onward
one slot to the right, and if the backing array is already full it first allocates a larger array and
copies the existing elements into it. LinkedList still has to walk to the 45th node, but once there it
only creates a new node and updates the next and previous references of its neighbours. Adding and
removing elements is therefore cheapest in LinkedList at the ends of the list, or through an iterator
that is already positioned at the node.
When to choose ArrayList: If the list is mostly static, its values rarely change, and it is mainly used
for retrieving elements, use ArrayList, as random access is very fast.
When to choose LinkedList: When the program does little positional retrieval and focuses on adding and
removing elements, use LinkedList.
ArrayList | LinkedList
Internally uses a dynamic array to store the elements. | Internally uses a doubly linked list to store the elements.
ArrayList is better for storing and accessing data. | LinkedList is better for data manipulation.
Data access is faster in ArrayList. | Adding and removing elements are faster in LinkedList.
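As a small sketch of the comparison above, both classes can be used through the List interface. The class name ListChoice and the helper prepend() are invented for this example. The RandomAccess marker interface is how the JDK signals constant-time positional access.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical demo class; the names are illustrative only.
class ListChoice {

    // Inserts a value at the head of the list. This is cheap for LinkedList,
    // while ArrayList must shift every existing element one slot to the right.
    static List<Integer> prepend(List<Integer> list, int value) {
        list.add(0, value);
        return list;
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(List.of(1, 2, 3));
        List<Integer> linkedList = new LinkedList<>(List.of(1, 2, 3));

        // Both support positional access; only ArrayList advertises
        // constant-time access through the RandomAccess marker interface.
        System.out.println(arrayList.get(2) + " " + (arrayList instanceof RandomAccess));   // 3 true
        System.out.println(linkedList.get(2) + " " + (linkedList instanceof RandomAccess)); // 3 false

        System.out.println(prepend(arrayList, 0));  // [0, 1, 2, 3]
        System.out.println(prepend(linkedList, 0)); // [0, 1, 2, 3]
    }
}
```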
ArrayList Vs Vector
Differences
Vector is synchronized; ArrayList is not. Because it avoids the cost of acquiring a lock on every
operation, ArrayList is faster than Vector.
Set Interface
Set is a collection that does not allow duplicate elements. Its common implementations are HashSet,
LinkedHashSet, and TreeSet. Note that none of these implementations is synchronized by default.
HashSet
LinkedHashSet
LinkedHashSet maintains insertion order.
LinkedHashSet is non-synchronized.
Like HashSet, it allows at most one null element.
TreeSet
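TreeSet keeps its elements sorted in ascending order (natural ordering, unless a Comparator is supplied).
TreeSet access and retrieval times are quite fast: add, remove and contains take O(log n) time.
TreeSet doesn't allow null elements when natural ordering is used.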
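The ordering behaviour of the three Set implementations can be compared with a short sketch. The class name SetOrdering and the helper iterationOrder() are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class; the names are illustrative only.
class SetOrdering {

    // Adds the input to the given set and returns the elements in iteration order.
    static List<String> iterationOrder(Set<String> set, List<String> input) {
        set.addAll(input);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> input = List.of("pear", "apple", "fig", "apple");

        // HashSet: duplicates dropped, iteration order unspecified.
        System.out.println(iterationOrder(new HashSet<>(), input).size()); // 3
        // LinkedHashSet: duplicates dropped, insertion order kept.
        System.out.println(iterationOrder(new LinkedHashSet<>(), input));  // [pear, apple, fig]
        // TreeSet: duplicates dropped, elements sorted.
        System.out.println(iterationOrder(new TreeSet<>(), input));        // [apple, fig, pear]
    }
}
```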
List Vs Set
List | Set
List is an indexed sequence. | Set is a non-indexed collection.
Allows duplicates. | Doesn't allow duplicates.
Elements can be accessed by their position. | Positional access to elements is not supported.
Multiple null elements can be stored. | At most one null element can be stored (TreeSet allows none).
Implementations include ArrayList, LinkedList, and Stack. | Implementations include HashSet, LinkedHashSet, and TreeSet.
How does Set ensure that it does not have duplicates? What does it do?
The Set interface itself only specifies that duplicates are not allowed; enforcing this is left to the
implementation-specific mechanisms of classes like HashSet, LinkedHashSet, and TreeSet. Here's how it
works internally:
HashSet
Underlying Mechanism: HashSet uses a HashMap internally and stores its elements as the keys. The value
for every entry is a constant dummy object (PRESENT).
How Duplicates Are Avoided:
o When you add an element to a HashSet, its hashCode() is calculated to determine the bucket.
o If the bucket already contains an element with the same hash code, the equals() method checks
whether the new element is equal to that existing one.
o If hashCode() and equals() determine that the element is a duplicate, it is not added to the set,
and add() returns false.
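The duplicate check described above is visible through the boolean that Set.add() returns. The following is a small sketch; the class name HashSetDuplicates and the helper countDistinct() are invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; the names are illustrative only.
class HashSetDuplicates {

    // Counts the distinct values by adding them all to a HashSet.
    static int countDistinct(String... values) {
        Set<String> seen = new HashSet<>();
        for (String value : values) {
            boolean added = seen.add(value); // false when an equal element is already present
            if (!added) {
                System.out.println("duplicate ignored: " + value);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Prints "duplicate ignored: Ana" and then 2.
        System.out.println(countDistinct("Ana", "Bob", "Ana"));
    }
}
```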
LinkedHashSet
TreeSet
Underlying Mechanism: TreeSet is implemented using a red-black tree (a self-balancing binary search
tree).
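How Duplicates Are Avoided: TreeSet compares elements with compareTo() (or with the Comparator it was
given) rather than with hashCode() and equals(). An element that compares as equal to an existing one
(a result of 0) is treated as a duplicate and is not added.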
Map interface
A map stores values on the basis of keys, i.e. as key-value pairs. Each key-value pair is known
as an entry. A Map contains unique keys.
A Map is useful if you have to search, update or delete elements on the basis of a key.
The map hierarchy has two main interfaces, Map and SortedMap, and three commonly used classes:
HashMap, LinkedHashMap, and TreeMap.
A Map doesn't allow duplicate keys, but you can have duplicate values. HashMap and
LinkedHashMap allow one null key and any number of null values; TreeMap allows null values but, with
natural ordering, no null key.
A Map does not implement Iterable, so it can't be traversed directly. Iterate over one of its
collection views instead: keySet(), values(), or entrySet().
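A minimal sketch of traversing a Map through its views follows. The class name MapTraversal and the helper sumValues() are made up for the illustration; forEach() is a default method on Map available since Java 8.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; the names are illustrative only.
class MapTraversal {

    // Sums the values by walking the entry view of the map.
    static int sumValues(Map<String, Integer> map) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = new LinkedHashMap<>();
        stock.put("apple", 5);
        stock.put("pear", 2);

        for (String key : stock.keySet()) {     // keys only
            System.out.println(key);
        }
        for (Integer value : stock.values()) {  // values only
            System.out.println(value);
        }
        stock.forEach((key, value) -> System.out.println(key + "=" + value));
        System.out.println(sumValues(stock));   // 7
    }
}
```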
HashMap
HashMap implements the Map interface. It stores key-value pairs, where keys must be unique, and
allows fast retrieval of values by key.
HashMap may have one null key and multiple null values.
If you insert a key that already exists, the new value replaces the old one for that key. Updating
and deleting entries by key is straightforward.
HashMap doesn't maintain any order.
HashMap is not synchronized. Since Java 5 it is generic and written as HashMap<K,V>, where K stands
for the key type and V for the value type.
Capacity and Load Factor: HashMap has a default initial capacity (16) and load factor (0.75). The
initial capacity is the number of buckets when the HashMap is created, and the load factor determines
how full the table may get before the HashMap resizes.
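The properties above can be sketched in a few lines. The class name HashMapBasics and the helper build() are assumptions for the example; the constructor arguments simply spell out the documented defaults (16 and 0.75f).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the names are illustrative only.
class HashMapBasics {

    static Map<String, Integer> build() {
        // Initial capacity 16 and load factor 0.75f are the defaults, written out explicitly.
        Map<String, Integer> stock = new HashMap<>(16, 0.75f);
        stock.put("apple", 5);
        stock.put("apple", 7);  // same key: the value 5 is replaced by 7
        stock.put(null, 0);     // a single null key is allowed
        stock.put("pear", null); // null values are allowed
        return stock;
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = build();
        System.out.println(stock.get("apple"));      // 7
        System.out.println(stock.containsKey(null)); // true
        System.out.println(stock.size());            // 3
    }
}
```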
LinkedHashMap
A LinkedHashMap stores values based on keys. It implements the Map interface and extends
the HashMap class.
It contains only unique keys. It may have one null key and multiple null values.
It is non-synchronized.
It is the same as HashMap with one additional feature: it maintains insertion order. For
example, iterating over a LinkedHashMap returns the entries in the order they were inserted, whereas
the same code with a HashMap may return them in a different order.
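The ordering difference can be illustrated with a short sketch. The class name InsertionOrderDemo and the helper keyOrder() are invented for the example. No expected output is given for the HashMap line because its iteration order is not specified.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; the names are illustrative only.
class InsertionOrderDemo {

    // Inserts three keys and returns them in the map's iteration order.
    static List<String> keyOrder(Map<String, Integer> map) {
        map.put("zebra", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        System.out.println(keyOrder(new LinkedHashMap<>())); // [zebra, apple, mango]
        System.out.println(keyOrder(new HashMap<>()));       // order depends on hashing, not guaranteed
    }
}
```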
SynchronizedHashMap Vs ConcurrentHashMap
ConcurrentHashMap: A plain HashMap is not a good choice when several threads share it, because it is
not thread-safe, and wrapping every access in a single lock hurts performance. To resolve this, we use
ConcurrentHashMap. ConcurrentHashMap is thread-safe, so multiple threads can operate on a single
instance without corrupting it. Up to Java 7, the map is divided into a number of segments according to
the concurrency level, which is 16 by default. Any number of threads can perform retrieval operations
at the same time without locking, but to update the map a thread must lock the particular segment in
which it wants to operate. This type of locking mechanism is known as segment locking (lock striping).
Hence, with the default concurrency level, up to 16 update operations can proceed at the same time.
Since Java 8, segments have been replaced by finer-grained locking of individual buckets, so updates
to different buckets do not block each other.
Synchronized HashMap: HashMap is a non-synchronized collection class. If we need to perform
thread-safe operations on it, we must synchronize it explicitly.
The synchronizedMap() method of the java.util.Collections class is used for this. It returns a
synchronized (thread-safe) map backed by the specified map. Every method of the returned map locks the
whole map, so only one thread can use it at a time.
ConcurrentHashMap | Synchronized HashMap
Doesn't allow null as a key or value. | Allows null as a key (and null values), because the backing HashMap does.
Its iterators don't throw ConcurrentModificationException. | Its iterators throw ConcurrentModificationException if the map is modified during iteration.
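Both differences in the comparison above can be observed from a single thread. This is a sketch under that simplifying assumption; the class name ThreadSafeMaps and its two helpers are made up for the example. In real multi-threaded code, iteration over a synchronized map should additionally be wrapped in a synchronized block on the map, as the Collections.synchronizedMap() documentation requires.

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; the names are illustrative only.
class ThreadSafeMaps {

    // Returns true if the map accepts a null key.
    static boolean acceptsNullKey(Map<String, Integer> map) {
        try {
            map.put(null, 1);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    // Returns true if adding an entry while iterating makes the iterator fail fast.
    static boolean throwsOnModification(Map<String, Integer> map) {
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        try {
            for (String key : map.keySet()) {
                map.put("extra", 0); // structural change on the first pass
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(Collections.synchronizedMap(new HashMap<>()))); // true
        System.out.println(acceptsNullKey(new ConcurrentHashMap<>()));                    // false

        System.out.println(throwsOnModification(Collections.synchronizedMap(new HashMap<>()))); // true
        System.out.println(throwsOnModification(new ConcurrentHashMap<>()));                    // false
    }
}
```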
Method | Description
put(K key, V value) | Adds a key-value pair to the map. If the key already exists, updates its value.
putAll(Map<? extends K, ? extends V> m) | Copies all entries from the specified map into this map.
get(Object key) | Retrieves the value associated with the specified key, or null if the key is not present.
remove(Object key) | Removes the entry for the specified key from the map.
containsKey(Object key) | Returns true if the map contains the specified key.
containsValue(Object value) | Returns true if the map contains the specified value.
isEmpty() | Returns true if the map contains no key-value pairs.
size() | Returns the number of key-value pairs in the map.
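The methods in the table can be exercised together in a brief sketch. The class name MapMethods, the helper demo(), and the sample data are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the names are illustrative only.
class MapMethods {

    static Map<String, Integer> demo() {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("Ana", 30);                      // add one entry
        ages.putAll(Map.of("Bob", 25, "Cy", 41)); // copy entries from another map
        ages.remove("Cy");                        // remove the entry for "Cy"
        return ages;
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = demo();
        System.out.println(ages.get("Ana"));         // 30
        System.out.println(ages.get("Zed"));         // null, key not present
        System.out.println(ages.containsKey("Bob")); // true
        System.out.println(ages.containsValue(41));  // false, "Cy" was removed
        System.out.println(ages.isEmpty());          // false
        System.out.println(ages.size());             // 2
    }
}
```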
Hashtable
Hashtable is similar to HashMap, but it is synchronized. Hashtable stores key-value pairs in a hash
table.
In a Hashtable we specify an object that is used as a key, and the value we want to associate with
that key. The key is then hashed, and the resulting hash code is used to compute the index at which
the value is stored within the table.
The default initial capacity of the Hashtable class is 11.
HashMap Vs Hashtable
HashMap | Hashtable
Allows one null key and any number of null values. Putting a second null key replaces the value stored under the first. | Doesn't allow null keys or null values; put() throws NullPointerException.
Not synchronized. Better suited to single-threaded environments. | Synchronized. Safe to share in multi-threaded environments.
Faster than Hashtable for inserting, deleting and locating elements. | Slower, because every operation acquires the table's lock. Like HashMap, it does not sort its elements.
Example:
HashMap<String, String> map = new HashMap<>();
map.put(null, "1");
map.put(null, "2"); // replaces the value stored under the null key
System.out.println(map);
Output: {null=2}
ConcurrentHashMap Vs Hashtable
Both are thread-safe by default. In general, ConcurrentHashMap is the preferred choice in concurrent
programming scenarios in Java due to its efficiency and flexibility.
ConcurrentHashMap | Hashtable
Uses fine-grained locking (segments up to Java 7, individual buckets since Java 8). This allows lock-free concurrent reads and concurrent writes to different parts of the table, significantly improving performance in multithreading environments. | Uses a single lock for the entire table. This means only one thread can access the table at a time, even for reads, creating a bottleneck in high-concurrency scenarios.
Provides weakly consistent iterators (often called fail-safe) that can be used safely even if the map is modified during iteration. They never throw ConcurrentModificationException, and may or may not reflect changes made after the iterator was created. | The iterators over a Hashtable's collection views are fail-fast: if the map is modified during the iteration, a ConcurrentModificationException may be thrown.
Recommended for concurrent applications where a map needs to be shared among multiple threads, providing better performance and scalability. | Considered legacy and rarely used in modern Java applications. It is usually replaced by ConcurrentHashMap or other concurrent collections.
Offers better performance in multi-threaded scenarios. It allows concurrent reads and updates, which minimizes the impact of locking. | Performance can degrade under high contention due to its global lock.
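As an illustration of concurrent updates, the sketch below has several threads increment one counter stored in a ConcurrentHashMap. The class name ConcurrentCounter, the helper countConcurrently(), and the key "hits" are assumptions for the example. merge() performs the read-modify-write atomically, so no external locking is needed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; the names are illustrative only.
class ConcurrentCounter {

    // Starts the given number of threads, each incrementing the same key.
    static int countConcurrently(int threads, int incrementsPerThread) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counts.merge("hits", 1, Integer::sum); // atomic increment
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 1000)); // 4000
    }
}
```

With a plain HashMap in place of the ConcurrentHashMap, the same code could lose updates, since get-then-put is not atomic there.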
TreeMap
HashMap | TreeMap
Allows a single null key and multiple null values. | Does not allow null keys (with natural ordering) but can have multiple null values.
Faster than TreeMap, because it provides constant-time O(1) average performance for basic operations like get() and put(). | Slower than HashMap, because it provides O(log n) performance for most operations like put(), remove() and containsKey().
Contains only basic functions like get(), put(), keySet(), etc. | Rich in functionality, because it also contains functions like tailMap(), firstKey(), lastKey(), pollFirstEntry(), pollLastEntry().
Does not maintain any order. | Entries are sorted by key in natural (ascending) order, or by a supplied Comparator.
Should be used when we do not require key-value pairs in sorted order. | Should be used when we require key-value pairs in sorted (ascending) order.
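The TreeMap functions named in the table can be tried out with a small sketch. The class name TreeMapDemo, the helper sample(), and the sample data are invented for the example.

```java
import java.util.TreeMap;

// Hypothetical demo class; the names are illustrative only.
class TreeMapDemo {

    // Builds a map whose keys are inserted out of order.
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(30, "thirty");
        map.put(10, "ten");
        map.put(20, "twenty");
        return map;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = sample();
        System.out.println(map);                   // {10=ten, 20=twenty, 30=thirty}
        System.out.println(map.firstKey());        // 10
        System.out.println(map.lastKey());         // 30
        System.out.println(map.tailMap(20));       // {20=twenty, 30=thirty}
        System.out.println(map.pollFirstEntry());  // 10=ten (removed from the map)
        System.out.println(map);                   // {20=twenty, 30=thirty}
    }
}
```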
hashCode(): Object.hashCode() is a native method that returns the integer hash code value of the
object. The general contract of the hashCode() method is:
Multiple invocations of hashCode() on the same object during one execution must return the same
integer value, unless a property used in the equals() method is modified.
An object's hash code value may differ between executions of the same application.
If two objects are equal according to the equals() method, then their hash codes must be the same.
If two objects are unequal according to the equals() method, their hash codes are not required to be
different. Their hash code values may or may not be equal.
In other words, if o1.equals(o2), then o1.hashCode() == o2.hashCode() must always be true. If
o1.hashCode() == o2.hashCode() is true, it doesn't mean that o1.equals(o2) will be true.
Failing to override hashCode() when equals() is overridden breaks this contract and leads to
inconsistent behavior when the objects are used in hash-based collections.
Overriding equals and hashCode is crucial when working with collections. Collections
like HashSet, HashMap, or Hashtable rely on the hashCode method to organize and search for objects
efficiently.
If we don’t override equals and hashCode correctly, these collections may not function as expected.
Objects that should be considered equal might not be properly identified, leading to duplicates in sets or
incorrect retrieval from maps.
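A minimal sketch of a class that overrides both methods consistently is shown below. The Point class and the EqualsHashCodeDemo wrapper are hypothetical names chosen for the illustration. Both methods use the same fields, which is what keeps the contract intact.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical demo class; the names are illustrative only.
class EqualsHashCodeDemo {

    // Adds two equal points and one different point to a HashSet.
    static int distinctPoints() {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        points.add(new Point(1, 2)); // equal to the first, so it is not added
        points.add(new Point(3, 4));
        return points.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPoints()); // 2
        System.out.println(new Point(1, 2).hashCode() == new Point(1, 2).hashCode()); // true
    }
}

// Hypothetical value class with equals() and hashCode() based on the same fields.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point other = (Point) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

If hashCode() were left out of Point, the two equal points would usually land in different buckets and the set would keep both.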
A hash function is responsible for transforming an input (a key) into a hash value, which determines the
index where the corresponding value should be stored in the hash table. However, it is possible for two
different keys to generate the same hash value, leading to a collision. A hash collision occurs when two
different inputs produce the same hash value after being processed by a hashing algorithm.
Hash Map in Java: In a HashMap, the hashCode() of a key determines the bucket where the key-value
pair is stored. If two keys have the same hashCode() but are not equal (key1.equals(key2) returns false),
it results in a hash collision.
Collisions are resolved using techniques like separate chaining and open addressing. Java's HashMap
uses separate chaining.
1) Separate Chaining: Separate chaining is a technique that uses linked lists to store elements that
fall into the same bucket. It stores the new element at the end of that bucket's linked list.
2) Open Addressing: Open addressing is another collision resolution technique where all elements are
stored in the same table. In case of a collision, a new index is calculated using a probing sequence
until an empty slot is found.
For example, suppose a hash collision happens in a HashMap while you are inserting an item. What
happens then?
In case of a collision, the equals() method is used to check whether the new key matches any existing
key in the bucket:
If a matching key is found, the value is updated instead of inserting a new node.
If no match is found, the new node is appended to the end of the list.
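A collision is easy to reproduce with the strings "Aa" and "BB", which have the same hashCode() (2112) but are not equal. The sketch below uses them to show both outcomes; the class name CollisionDemo and the helper build() are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the names are illustrative only.
class CollisionDemo {

    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hash code as "Aa" but a different key: stored as a second entry
        map.put("Aa", 3); // equal key: the existing value is replaced
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        Map<String, Integer> map = build();
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 3
        System.out.println(map.get("BB")); // 2
    }
}
```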
A HashMap in Java is a data structure that stores key-value pairs and allows fast retrieval of values using
keys. HashMap uses Hashing mechanism internally.
Data Structure
A HashMap uses an array of nodes (buckets) where each bucket can hold multiple key-value pairs.
Each node in a bucket is represented by an instance of the Node<K, V> class, which stores:
o The key
o The value
o The hash code of the key
o A reference to the next node (for handling collisions)
Hashing:
When you insert a key-value pair, the key's hashCode() is computed. The hash code is processed
using a hash function (typically a bitwise operation like hash ^ (hash >>> 16)) to reduce the risk of
collisions.
This hash code is used to determine the bucket (an index in the internal array) where the key-value
pair should be stored.
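The hashing step can be sketched as two small functions. The class name HashSpread and both method names are invented for the illustration. Masking with (tableLength - 1) assumes the table length is a power of two, which is how OpenJDK's HashMap sizes its table; treat that detail as an assumption of this sketch rather than part of the description above.

```java
// Hypothetical demo class; the names are illustrative only.
class HashSpread {

    // Folds the high 16 bits of the hash code into the low 16 bits.
    static int spread(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two (assumption).
    static int bucketIndex(Object key, int tableLength) {
        return spread(key) & (tableLength - 1);
    }

    public static void main(String[] args) {
        System.out.println(spread("Aa"));          // 2112, the high bits are all zero
        System.out.println(bucketIndex("Aa", 16)); // 0
        System.out.println(bucketIndex("BB", 16)); // 0, same bucket as "Aa"
        System.out.println(bucketIndex(null, 16)); // 0, a null key always maps to bucket 0
    }
}
```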
Insertion
Step 1: The key's hashCode() is calculated, and the bucket index is determined.
Step 2: If the bucket is empty, the key-value pair is stored as a new node.
Step 3: If a collision occurs (multiple keys mapping to the same bucket), the equals() method
checks if the key already exists:
o If yes, the value is updated.
o If no, the new key-value pair is added to the bucket
Retrieval
To retrieve a value:
o The key's hashCode() is calculated to find the bucket index.
o The bucket is searched for the key using equals().
o If the key is found, its value is returned.
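The insertion and retrieval steps above can be sketched as a toy chained hash map. ToyHashMap is a hypothetical, heavily simplified class written for this illustration: it has a fixed table of 16 buckets, never resizes, and never converts chains to trees, so it is not how java.util.HashMap is actually implemented.

```java
import java.util.Objects;

// Hypothetical, simplified map used only to illustrate chaining.
class ToyHashMap<K, V> {

    // One entry in a bucket's chain.
    private static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value) {
            this.hash = hash;
            this.key = key;
            this.value = value;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16]; // fixed size, never resized
    private int size;

    private static int hash(Object key) {
        int h = (key == null) ? 0 : key.hashCode();
        return h ^ (h >>> 16);
    }

    public V put(K key, V value) {
        int hash = hash(key);                          // Step 1: hash and bucket index
        int index = hash & (table.length - 1);
        Node<K, V> node = table[index];
        if (node == null) {                            // Step 2: empty bucket
            table[index] = new Node<>(hash, key, value);
        } else {
            while (true) {                             // Step 3: walk the chain
                if (node.hash == hash && Objects.equals(node.key, key)) {
                    V old = node.value;
                    node.value = value;                // key exists: update the value
                    return old;
                }
                if (node.next == null) {
                    node.next = new Node<>(hash, key, value); // append at the tail
                    break;
                }
                node = node.next;
            }
        }
        size++;
        return null;
    }

    public V get(Object key) {
        int hash = hash(key);
        for (Node<K, V> node = table[hash & (table.length - 1)]; node != null; node = node.next) {
            if (node.hash == hash && Objects.equals(node.key, key)) {
                return node.value;
            }
        }
        return null; // key not found
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        ToyHashMap<String, Integer> map = new ToyHashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2); // same hash code as "Aa", so it lands in the same bucket
        map.put("Aa", 3); // existing key: value replaced, size unchanged
        System.out.println(map.get("Aa") + " " + map.get("BB") + " " + map.size()); // 3 2 2
    }
}
```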
Resize Operation
When the number of entries exceeds the threshold, which is the capacity multiplied by the load factor
(default 0.75), the HashMap resizes itself by doubling the capacity.
During resizing:
o A new array is created.
o All entries are redistributed into the new array according to their hash.
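The arithmetic behind the resize trigger is simple enough to sketch. The class name ResizeThreshold and the helper threshold() are invented for the example; the calculation assumes the default capacity of 16 and load factor of 0.75.

```java
// Hypothetical demo class; the names are illustrative only.
class ResizeThreshold {

    // Number of entries the table can hold before it is resized.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = 16;
        for (int round = 0; round < 3; round++) {
            // Prints: 16 -> 12, then 32 -> 24, then 64 -> 48
            System.out.println(capacity + " -> " + threshold(capacity, 0.75f));
            capacity *= 2; // each resize doubles the capacity
        }
    }
}
```

With the defaults, the first resize happens when the 13th entry is inserted, because 13 exceeds the threshold of 12.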
Performance
Best Case: O(1) for insertion, retrieval, and deletion if there are no collisions.
Worst Case: O(n) when all keys collide and form a single linked list in a bucket, improving to
O(log n) once that bucket has been converted to a tree (Java 8+).
Collision Resolution
Chaining: Colliding entries are stored in a linked list within the same bucket.
Treeification (Java 8+): If the number of entries in a bucket exceeds a threshold (default 8) and the
table has at least 64 buckets, the linked list is converted into a balanced tree (red-black tree) for
faster access.
Generics were introduced in Java 5 and made type-safe collections possible. Before that, collections
could hold any object type, which led to ClassCastException at runtime. With generics, you specify
the type of objects the collection holds, which gives better type checking at compile time.
Example:
List<String> list = new ArrayList<>(); // A list that holds only Strings
list.add("Hello");
list.add(123); // Compile-time error, as it expects a String
Optional: Introduced in Java 8, Optional provides a better way than returning null to handle values
that may be missing, for example the result of searching a collection.
Immutable Collections: Java 9 introduced factory methods for creating immutable collections using the
List.of(), Set.of(), and Map.of() methods.
Collection enhancements: Java 9 introduced new methods to the Collection interface, such as:
Immutable Collections
In Java, immutable collections are collections whose elements cannot be added, removed, or replaced
after they are created. This immutability provides several benefits, such as thread-safety, safe use
in parallel processing, and prevention of accidental modification of data. Java 9 added the List.of(),
Set.of(), and Map.of() factory methods, which make immutable collections easy to create and well
suited to read-only data.
Example :
Collections.unmodifiableList() and the related wrapper methods do not create truly immutable
collections; they only return an unmodifiable view of the wrapped collection.
If the underlying collection is modified, the changes are reflected in the unmodifiable view.
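The difference between an unmodifiable wrapper and an immutable collection can be sketched as follows. The class name ImmutableVsUnmodifiable and its two helpers are made up for the example; List.of() requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ImmutableVsUnmodifiable {

    // Returns true if the list refuses to accept a new element.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Wraps a mutable list, then changes the list behind the wrapper.
    static List<String> viewAfterBackingChange() {
        List<String> backing = new ArrayList<>(List.of("a", "b"));
        List<String> view = Collections.unmodifiableList(backing);
        backing.add("c"); // the view sees this change
        return view;
    }

    public static void main(String[] args) {
        List<String> immutable = List.of("a", "b");

        System.out.println(rejectsAdd(immutable));                // true
        System.out.println(rejectsAdd(viewAfterBackingChange())); // true
        System.out.println(viewAfterBackingChange());             // [a, b, c]
        System.out.println(immutable);                            // [a, b]
    }
}
```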
Thread Safety: Immutable collections are inherently thread-safe since their contents cannot be
changed after they are created. This means no synchronization is needed when reading the
collection in multi-threaded applications.
Safety: They prevent unintended modifications, which makes your code more predictable and easier
to reason about.
Simplified Code: By using immutable collections, you can avoid writing extra synchronization code
and reduce the risk of errors caused by unintended state changes.
Functional Programming: Immutable collections fit well with functional programming paradigms,
where data structures are often treated as immutable objects and transformations on them return
new collections rather than modifying the original collection.