public class FileHashMap<K,V>
extends java.util.AbstractMap<K,V>
FileHashMap implements a java.util.Map that keeps the keys in memory, but stores the values as serialized objects in a random access disk file. When an application attempts to access the value associated with a given key, the FileHashMap object determines the location of the serialized value object, seeks to that location in the random access file, and reads and deserializes the value object. In a sense, a FileHashMap is akin to a simple, classic indexed sequential file. This approach gives a FileHashMap object the following characteristics:
File Name Conventions
A FileHashMap is instantiated with a base file name (i.e., a file name without an extension). This base file name specifies the path to the file(s) that used to store the map's data; the FileHashMap tacks the following extensions onto the prefix to arrive at the actual file names.
Extensions and Meanings Extension Meaning .ix The saved in-memory index. This file is created only if the FileHashMap is not marked as transient. (See below.) The INDEX_FILE_SUFFIX
constant defines this string..db The on-disk data (value) file, where the serialized objects are stored. The DATA_FILE_SUFFIX
constant defines this string.
For instance, if you create a FileHashMap with the statement
Map map = new FileHashMap("/tmp/mymap");
the serialized value objects will be stored in file "/tmp/mymap.db", and the index (if saved) will be stored in "/tmp/mymap.ix".
Transient versus Persistent Maps
A FileHashMap is persistent by default. When a
FileHashMap object is finalized or explicitly closed
(using the close()
method), its in-memory index is saved
to disk. You can reopen the saved map by instantiating another
FileHashMap object, and specifying the same file prefix. The
new FileHashMap object will load its initial in-memory index
from the saved index; any modifications to the new object will be
written back to the on-disk index with the new object finalized or
closed.
A FileHashMap can be marked as non-persistent, or transient,
by passing the TRANSIENT
flag to the
constructor. A transient map is not saved; its disk files are
removed when the map is finalized or manually closed. You cannot create
a transient map using a file prefix that specifies an existing, saved
map. That is, the disk files for a transient map must not exist at the
time the map is created.
Optimizations
The FileHashMap class attempts to optimize access to the disk-resident values and to conserve memory use. These optimizations include the following specific measures.
Sequential access to values
The iterators attempt to minimize file seeking while looping over the
stored values. The values()
method returns a
Set whose iterator loops through the values in the order they
were written to the data file. Traversing the value set via the iterator
will access the FileHashMap's data file sequentially, from top
to bottom. For example, the following code fragment loops through a
FileHashMap's values; because the iterator returns the values
in the order they appear in the data file, the code fragment accesses
the data file sequentially:
FileHashMap map = new FileHashMap("myfile"); for (Iterator values = map.values().iterator(); values.hasNext(); ) { Object value = it.next(); System.out.println(value); }
Similarly, the Set objects returned by the
keySet()
and entrySet()
methods use iterators that return the keys in the order that
the associated values appear in the disk file. For example, the following
code fragment also accesses the data file sequentially:
FileHashMap map = new FileHashMap("myfile"); for (Iterator keys = map.keySet().iterator(); keys.hasNext(); ) { Object key = it.next(); Object value = map.get(key); System.out.println("key=" + key + ", value=" + value); }
Note that this optimization strategy can be foiled in a number of ways. For instance, the sequential access behavior can be thwarted if a second thread is accessing the map while the first thread is iterating over it, or if the thread that's iterating over the values is simultaneously inserting values in the table.
Accessing the keys, without reading the values, does not result in any file access at all, because the keys are cached in memory. See the next section for more details.
Memory Conservation
The values stored in the map are serialized and written to a data file; they are not cached in memory at all. Therefore, you can store a lot more objects in a FileHashMap before running out of memory (assuming you have enough disk space available).
The keys are stored in memory, however. Each key is associated with a special internal object that keeps track of the location and length of the associated value in the disk file. As you place more items in a FileHashMap, the index will grow, consuming memory. But it will consume far less memory than if you use an in-memory Map, such as a java.util.HashMap, which stores the keys and the values in memory.
The Iterator and Set objects returned by the
entrySet()
and values()
methods contain proxy objects, not real values. That is, they do
not actually contain any values. Instead, they contain proxies
for the values, objects that specify the locations of the values in the
disk file. A value is loaded from disk only when you actually attempt to
retrieve it from the Iterator or Set.
Reclaiming Gaps in the File
Normally, when you remove an object from the map, the space where the
object was stored in the data file is not reclaimed. This strategy allows
for faster insertions, since new objects are always added to the end of
the disk file. However, for long-lived FileHashMap objects that
are periodically modified, this strategy may not be appropriate. For that
reason, you can pass a special RECLAIM_FILE_GAPS
flag to the
constructor. If specified, the flag tells the object to keep track of
"gaps" in the file, and reuse them if possible. When a new object is
inserted into the map, and RECLAIM_FILE_GAPS
is enabled, the
object will attempt to find the smallest unused area in the file to
accommodate the new object. It will only add the new object to the end
of the file (enlarging the file) if it cannot find a suitable gap.
This mode is not the default, because it can add time to processing.
However, it does not access the file at all; the file gap maintenance
logic uses in-memory data only. So, while it adds a small amount of
computational overhead, the difference between running with
RECLAIM_FILE_GAPS
enabled and running with it disabled should not
be dramatic.
Restrictions
This class currently has the following restrictions and unimplemented behavior.
Modifier and Type | Field and Description |
---|---|
static java.lang.String |
DATA_FILE_SUFFIX
Data file suffix.
|
static int |
FORCE_OVERWRITE
Constructor flag value: Ignored unless
TRANSIENT is also
specified, FORCE_OVERWRITE tells the constructor to
overwrite the on-disk files if they already exist, instead of
throwing an exception. |
static java.lang.String |
INDEX_FILE_SUFFIX
Index file suffix.
|
static int |
NO_CREATE
Constructor flag value: If specified, the disk files will not be
created if they don't exist; instead, the constructor will throw an
exception.
|
static int |
RECLAIM_FILE_GAPS
Constructor flag value: Tells the object to reclaim unused space in
the file, whenever possible.
|
static int |
TRANSIENT
Constructor flag value: If specified, the on-disk hash table is
considered to be transient and will be removed when this object is
closed or finalized.
|
Constructor and Description |
---|
FileHashMap()
Create a new transient FileHashMap object.
|
FileHashMap(java.lang.String tempFilePrefix)
Create a new transient FileHashMap object.
|
FileHashMap(java.lang.String pathPrefix,
int flags)
Create a new FileHashMap object that will read its data
from and/or store its data in files derived from the specified
prefix.
|
Modifier and Type | Method and Description |
---|---|
void |
clear()
Removes all mappings from this map.
|
void |
close()
Close this map.
|
boolean |
containsKey(java.lang.Object key)
Returns true if this map contains a mapping for the
specified key.
|
boolean |
containsValue(java.lang.Object value)
Returns true if this map maps one or more keys that are
mapped to the specified value.
|
void |
delete()
Deletes the files backing this FileHashMap.
|
java.util.Set<java.util.Map.Entry<K,V>> |
entrySet()
Returns a "thin" set view of the mappings contained in this map.
|
boolean |
equals(java.lang.Object o)
Compares the specified object with this map for equality.
|
protected void |
finalize()
Finalizer
|
V |
get(java.lang.Object key)
Returns the value associated with the the specified key.
|
int |
hashCode()
Returns the hash code value for this map.
|
boolean |
isValid()
Determine whether the FileHashMap is valid or not.
|
java.util.Set<K> |
keySet()
Returns a Set containing all the keys in this map.
|
V |
put(K key,
V value)
Associates the specified value with the specified key in this map.
|
V |
remove(java.lang.Object key)
Removes the mapping for this key from this map, if present.
|
void |
save()
Save any in-memory index changes to disk without closing the map.
|
int |
size()
Returns the number of key-value mappings in this map.
|
java.util.Collection<V> |
values()
Returns a collection view of the values contained in this map.
|
public static final java.lang.String INDEX_FILE_SUFFIX
public static final java.lang.String DATA_FILE_SUFFIX
public static final int NO_CREATE
public static final int TRANSIENT
NO_CREATE
. Also, if TRANSIENT is specified to the
constructor, but the on-disk files already exist, the constructor
will throw an exception unless FORCE_OVERWRITE
is also
specified.public static final int FORCE_OVERWRITE
TRANSIENT
is also
specified, FORCE_OVERWRITE tells the constructor to
overwrite the on-disk files if they already exist, instead of
throwing an exception.public static final int RECLAIM_FILE_GAPS
public FileHashMap() throws java.io.IOException
Create a new transient FileHashMap object. The transient
disk files will be in the normal temporary directory and will have
the prefix "FileHashMap". The names of the associated disk files are
guaranteed to be unique, and will go away when the object is
finalized or when the Java VM goes away. Call this constructor is
identical to calling FileHashMap(String)
with a null
filePrefix parameter.
Note that this constructor implicitly sets the TRANSIENT
flag.
java.io.IOException
- Unable to create temp fileFileHashMap(String,int)
,
FileHashMap(String)
public FileHashMap(java.lang.String tempFilePrefix) throws java.io.IOException
Create a new transient FileHashMap object. The transient disk files will be in the normal temporary directory and will have the specified temporary file prefix. If no temporary file prefix is specified (i.e., the tempFilePrefix parameter is null), then "fhm" will be used. The names of the associated disk files are guaranteed to be unique, and they will go away when the object is finalized or when the Java VM goes away.
Note that this constructor implicitly sets the TRANSIENT
flag.
tempFilePrefix
- the prefix to use with the temporary file, or
null for default prefix "fhm"java.io.IOException
- Unable to create temp fileFileHashMap(String,int)
,
FileHashMap()
public FileHashMap(java.lang.String pathPrefix, int flags) throws java.io.FileNotFoundException, ObjectExistsException, java.lang.ClassNotFoundException, VersionMismatchException, java.io.IOException
Create a new FileHashMap object that will read its data from and/or store its data in files derived from the specified prefix. The prefix is a file prefix onto which data and index suffixes will be appended.
Values for the flag parameter
The flags parameter consists of a set of logically OR'd constants.
If (flags & NO_CREATE) is zero, the constructor will attempt to create the index files if they don't exist. Otherwise, it expects to find a saved hash map, and it will throw an exception if the files are not present.
If (flags & TRANSIENT) is non-zero, the constructor will attempt to create a transient map. A transient map is not saved; its disk files are removed when the map is finalized or manually closed. You cannot create a transient map using a file prefix that specifies an existing, saved map. Specifying the TRANSIENT flag causes any NO_CREATE flag to be ignored.
For example, the following statement creates a transient FileHashMap:
Map map = new FileHashMap ("/my/temp/dir", FileHashMap.TRANSIENT);
whereas this statements opens a persistent FileHashMap, creating it if it doesn't already exist:
Map map = new FileHashMap ("/my/map/dir", FileHashMap.CREATE);
pathPrefix
- The pathname prefix to the files to be usedflags
- Flags that control the disposition of the files.
A value of 0 means no flags are set. See above
for details.java.io.FileNotFoundException
- The specified hash files do not
exist, and the NO_CREATE
flag was specified.java.lang.ClassNotFoundException
- Failed to deserialize an objectVersionMismatchException
- Bad or unsupported version stamp
in FileHashMap index fileObjectExistsException
- One or both of the files already
exist, but the TRANSIENT
flag was set and the
FORCE_OVERWRITE
flag was
not set.java.io.IOException
- Other errorsNO_CREATE
,
TRANSIENT
,
FileHashMap()
,
FileHashMap(String)
protected void finalize() throws java.lang.Throwable
finalize
in class java.lang.Object
java.lang.Throwable
- on errorpublic void clear()
Removes all mappings from this map. The data file is cleared by closing it, deleting it, and reopening it. If an I/O error occurs at any point, this object will be closed and marked invalid.
public void close() throws java.io.NotSerializableException, java.io.IOException
Close this map. If the map is not marked as transient, this method saves the index to disk. Otherwise, it removes both the index file and data file.
java.io.NotSerializableException
- Can't save index (bug)java.io.IOException
- Error writing indexsave()
public boolean containsKey(java.lang.Object key)
Returns true if this map contains a mapping for the specified key. Since the keys are cached in an in-memory index, this method does not have to access the data file, and is fairly cheap.
public boolean containsValue(java.lang.Object value)
Returns true if this map maps one or more keys that are mapped to the specified value. Because it must iterate through some portion of the on-disk value set, this method can be slow.
public void delete()
close()
.public java.util.Set<java.util.Map.Entry<K,V>> entrySet()
Returns a "thin" set view of the mappings contained in this map. Each element in the returned set is a Map.Entry; each Map.Entry contains a key and a proxy for the associated value. The map's values themselves are not loaded into memory until requested via a call to Map.Entry.getValue().
The set is backed by the map, so changes to the map are reflected in the set. If the map is modified while an iteration over the set is in progress, the results of the iteration are undefined. Neither the set nor its associated iterator supports any of the set-modification methods (e.g., add(), remove(), etc). If you attempt to call any of those methods, the called method will throw an UnsupportedOperationException.
NOTE: Calling this method on an invalid or closed FileHashMap will result in an exception.
public boolean equals(java.lang.Object o)
Compares the specified object with this map for equality. Returns true if the given object is also a map and the two Maps represent the same mappings. More formally, two maps t1 and t2 represent the same mappings if t1.entrySet().equals(t2.entrySet()). This ensures that the equals method works properly across different implementations of the Map interface.
Warning:: Because this method must compare the actual values stored in the map, and the values in a file, this method can be quite slow.
public V get(java.lang.Object key)
Returns the value associated with the the specified key. Returns null if the map contains no mapping for this key.
get
in interface java.util.Map<K,V>
get
in class java.util.AbstractMap<K,V>
key
- key whose associated value is to be returned.java.lang.ClassCastException
- if the key is of an inappropriate type for
this map.java.lang.NullPointerException
- key is nullcontainsKey(Object)
public int hashCode()
Returns the hash code value for this map. The hash code of a map is defined to be the sum of the hash codes of each entry in the map's entrySet() view. This ensures that t1.equals(t2) implies that t1.hashCode()==t2.hashCode() for any two maps t1 and t2, as required by the general contract of Object.hashCode.
Warning:: The recommended hash code for each entry in the entrySet() view is a combination of hash codes for the entry's key and the entry's value. (See java.util.Map.Entry for details.) Because the values in a FileHashMap object are stored in a file, this method can be quite slow.
hashCode
in interface java.util.Map<K,V>
hashCode
in class java.util.AbstractMap<K,V>
Object.hashCode()
,
Object.equals(Object)
,
equals(Object)
public boolean isValid()
public java.util.Set<K> keySet()
Returns a Set containing all the keys in this map. Since the keys are cached in memory, this method is relatively efficient. The keys are returned in an order that optimizes sequential access to the data file; see the Optimizations section in the main class documentation for details.
The set is backed by the map, so changes to the map are reflected in the set. If the map is modified while an iteration over the set is in progress, the results of the iteration are undefined. Neither the set nor its associated iterator supports any of the set-modification methods (e.g., add(), remove(), etc). If you attempt to call any of those methods, the called method will throw an UnsupportedOperationException.
public V put(K key, V value) throws java.lang.ClassCastException, java.lang.IllegalArgumentException, java.lang.NullPointerException
Associates the specified value with the specified key in this map. If the map previously contained a mapping for this key, the old value is replaced. This map class does not permit a null value to be stored.
put
in interface java.util.Map<K,V>
put
in class java.util.AbstractMap<K,V>
key
- key with which the specified value is to be associated.value
- value to be associated with the specified key.java.lang.ClassCastException
- if the class of the specified key or
value prevents it from being stored
in this map.java.lang.IllegalArgumentException
- Value not serializable, or I/O error
while attempting to serialize value.java.lang.NullPointerException
- the specified key or value is
null.public V remove(java.lang.Object key)
Removes the mapping for this key from this map, if present. Note: The space occupied by the serialized value in the data file is not coalesced at this point.
public void save() throws java.io.IOException, java.io.NotSerializableException
Save any in-memory index changes to disk without closing the map. You can call this method even if the map is marked as temporary; on a temporary map, save() simply returns without doing anything.
java.io.IOException
- Error saving changes to disk.java.io.NotSerializableException
- Can't save index because it contains
one or more objects that cannot be
serialized.close()
public int size()
Returns the number of key-value mappings in this map. If the map contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE. This method queries the in-memory key index, so it is relatively efficient.
public java.util.Collection<V> values()
Returns a collection view of the values contained in this map. The returned collection is a "thin" view of the values contained in this map. The collection contains proxies for the actual disk-resident values; the values themselves are not loaded until a Collection method such as contains() is called.
The collection is backed by the map, so changes to the map are reflected in the set. If the map is modified while an iteration over the set is in progress, the results of the iteration are undefined. The set does not support any of the add() methods.
Warning:: The toArray() methods can be dangerous, since they will attempt to load every value from the data file into an in-memory array.