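In Java, Set, like List, extends the Collection interface. A Set is a collection that contains no duplicate elements, and it has three main implementations: HashSet, LinkedHashSet, and TreeSet. Like the common List implementations, these classes are not thread-safe: when a set is modified while it is being iterated, which happens easily under concurrent access, the code can fail with java.util.ConcurrentModificationException.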
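As a quick illustration of the three implementations, the sketch below adds the same values, including one duplicate, to each kind of set. The class name SetKindsDemo and the helper fill are made up for this example; only the JDK collection classes are real.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the JDK.
class SetKindsDemo {
    // Adds the same four values to the given set and returns its iteration order.
    static List<String> fill(Set<String> set) {
        for (String s : new String[] {"pear", "apple", "pear", "fig"}) {
            set.add(s); // the second "pear" is rejected as a duplicate
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        System.out.println(fill(new HashSet<>()));       // 3 elements, order unspecified
        System.out.println(fill(new LinkedHashSet<>())); // [pear, apple, fig] (insertion order)
        System.out.println(fill(new TreeSet<>()));       // [apple, fig, pear] (sorted order)
    }
}
```

All three drop the duplicate; they differ only in the order in which the elements come back.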
/**
 * Adds the specified element to this set if it is not already present.
 * More formally, adds the specified element <tt>e</tt> to this set if
 * this set contains no element <tt>e2</tt> such that
 * <tt>(e==null ? e2==null : e.equals(e2))</tt>.
 * If this set already contains the element, the call leaves the set
 * unchanged and returns <tt>false</tt>.
 *
 * @param e element to be added to this set
 * @return <tt>true</tt> if this set did not already contain the specified
 * element
 */
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
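Looking at the HashSet source, we can see that HashSet is not a thread-safe class: add is not synchronized. There is also something interesting here: HashSet is implemented on top of HashMap, which I will come back to at the end. Because HashSet is not thread-safe, concurrent use can produce java.util.ConcurrentModificationException, typically when one thread iterates over the set while another modifies it. As for why add is not synchronized, my guess is that the reason is the same as for List: to keep the collection fast in cases where thread safety is not a strict requirement.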
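The exception is easiest to reproduce without any threads at all. The following minimal sketch modifies a HashSet while a for-each loop is iterating over it, which is the same situation that arises when another thread calls add during iteration. The class and method names (FailFastDemo, modifyWhileIterating) are invented for this example. With real threads the failure depends on timing and is not guaranteed, so a single thread is used here to keep the result repeatable.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
class FailFastDemo {
    // Returns true if modifying the set during iteration raised the exception.
    static boolean modifyWhileIterating() {
        Set<Integer> set = new HashSet<>();
        set.add(1);
        set.add(2);
        set.add(3);
        try {
            for (Integer value : set) {
                // Structural change made behind the iterator's back.
                set.add(value + 100);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            // The iterator noticed the change on its next step and failed fast.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating()); // true
    }
}
```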
Solutions:
1. Obtain the Set through the synchronizedSet method provided by Collections:
/**
 * Returns a synchronized (thread-safe) set backed by the specified
 * set. In order to guarantee serial access, it is critical that
 * <strong>all</strong> access to the backing set is accomplished
 * through the returned set.<p>
 *
 * It is imperative that the user manually synchronize on the returned
 * set when iterating over it:
 * <pre>
 *  Set s = Collections.synchronizedSet(new HashSet());
 *      ...
 *  synchronized (s) {
 *      Iterator i = s.iterator(); // Must be in the synchronized block
 *      while (i.hasNext())
 *          foo(i.next());
 *  }
 * </pre>
 * Failure to follow this advice may result in non-deterministic behavior.
 *
 * <p>The returned set will be serializable if the specified set is
 * serializable.
 *
 * @param <T> the class of the objects in the set
 * @param s the set to be "wrapped" in a synchronized set.
 * @return a synchronized view of the specified set.
 */
public static <T> Set<T> synchronizedSet(Set<T> s) {
    return new SynchronizedSet<>(s);
}
From the source we can see that synchronizedSet creates an instance of a class called SynchronizedSet.
/**
 * @serial include
 */
static class SynchronizedSet<E>
      extends SynchronizedCollection<E>
      implements Set<E> {
    private static final long serialVersionUID = 487447009682186044L;

    SynchronizedSet(Set<E> s) {
        super(s);
    }
    SynchronizedSet(Set<E> s, Object mutex) {
        super(s, mutex);
    }

    public boolean equals(Object o) {
        if (this == o)
            return true;
        synchronized (mutex) {return c.equals(o);}
    }
    public int hashCode() {
        synchronized (mutex) {return c.hashCode();}
    }
}
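Oddly, this class has no methods of its own for operating on the collection, but it extends a class called SynchronizedCollection. That parent class is the one that actually performs the collection operations, and it is where thread safety is really implemented.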
public boolean add(E e) {
    synchronized (mutex) {return c.add(e);}
}
Taking add as an example, thread safety is achieved by wrapping the call in a synchronized block on a shared mutex.
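To show how the wrapper is used in practice, here is a small sketch in which several threads add to one synchronized set, followed by an iteration that holds the set's lock as the Javadoc above requires. The class SynchronizedSetDemo and its methods fillConcurrently and sum are hypothetical names chosen for this example, and the thread and element counts are arbitrary.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of the JDK.
class SynchronizedSetDemo {
    // Starts `threads` writers that each add `perThread` distinct values.
    static Set<Integer> fillConcurrently(int threads, int perThread) {
        Set<Integer> set = Collections.synchronizedSet(new HashSet<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    set.add(base + i); // each call locks the wrapper's mutex
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // wait until every writer has finished
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return set;
    }

    // Iteration is a sequence of calls, so it must hold the lock manually.
    static long sum(Set<Integer> set) {
        long total = 0;
        synchronized (set) {
            for (int value : set) {
                total += value;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Set<Integer> set = fillConcurrently(4, 1000);
        System.out.println(set.size()); // 4000
        System.out.println(sum(set));   // 7998000
    }
}
```

Individual calls such as add are safe on their own; only compound operations such as iteration need the explicit synchronized block.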
2. Create the Set using copy-on-write:
Create the Set with new CopyOnWriteArraySet. This is thread-safe for the same reason as the copy-on-write List: the underlying source shows that it simply reuses the CopyOnWriteArrayList class.
/**
 * Creates an empty set.
 */
public CopyOnWriteArraySet() {
    al = new CopyOnWriteArrayList<E>();
}
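One visible consequence of copy-on-write is that an iterator works on a snapshot of the array taken when it was created, so modifying the set during iteration does not throw. The sketch below demonstrates this in a single thread; CopyOnWriteDemo and iterateWhileAdding are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical demo class, not part of the JDK.
class CopyOnWriteDemo {
    // Iterates over the set while adding to it and returns what the iterator saw.
    static List<Integer> iterateWhileAdding(Set<Integer> set) {
        List<Integer> seen = new ArrayList<>();
        for (Integer value : set) {
            seen.add(value);
            set.add(value + 100); // the write goes to a fresh copy of the array
        }
        return seen;
    }

    public static void main(String[] args) {
        Set<Integer> set = new CopyOnWriteArraySet<>();
        set.add(1);
        set.add(2);
        set.add(3);
        System.out.println(iterateWhileAdding(set)); // [1, 2, 3]
        System.out.println(set);                     // [1, 2, 3, 101, 102, 103]
    }
}
```

Because every write copies the whole backing array, this approach tends to suit sets that are read far more often than they are written.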
Addendum:
As mentioned earlier, HashSet is actually implemented on top of HashMap.
/**
 * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
 * default initial capacity (16) and load factor (0.75).
 */
public HashSet() {
    map = new HashMap<>();
}
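Judging from the HashSet constructor, when we create a new HashSet we are really creating a new HashMap. Our impression is that a Set is much like a List except that it does not allow duplicates, and quite unlike a Map, yet HashSet is implemented with a HashMap.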
/**
 * Adds the specified element to this set if it is not already present.
 * More formally, adds the specified element <tt>e</tt> to this set if
 * this set contains no element <tt>e2</tt> such that
 * <tt>(e==null ? e2==null : e.equals(e2))</tt>.
 * If this set already contains the element, the call leaves the set
 * unchanged and returns <tt>false</tt>.
 *
 * @param e element to be added to this set
 * @return <tt>true</tt> if this set did not already contain the specified
 * element
 */
public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}
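From the add method we can see that HashSet simply takes advantage of the fact that HashMap keys are unique. In other words, it uses only half of the map: the key stores the element, while every value is the same PRESENT, which is just a plain Object instance.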
// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();
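To make the idea concrete, here is a stripped-down set built the same way. MiniHashSet is a hypothetical class written for this post, not the JDK's HashSet; it only mirrors the pattern of storing elements as map keys with a shared dummy value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration only; the real HashSet has many more methods.
class MiniHashSet<E> {
    // Shared placeholder stored as the value for every key.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    // put returns the previous value, so null means the key was new.
    boolean add(E e) {
        return map.put(e, PRESENT) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    // remove returns the old value, which is PRESENT only if the key existed.
    boolean remove(Object o) {
        return map.remove(o) == PRESENT;
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MiniHashSet<String> set = new MiniHashSet<>();
        System.out.println(set.add("a"));  // true
        System.out.println(set.add("a"));  // false, already present
        System.out.println(set.size());    // 1
    }
}
```

Using a non-null placeholder is what lets add distinguish a new key from an existing one by checking the return value of put.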