it.unimi.dsi.webgraph
Class ImmutableGraph

java.lang.Object
  extended by it.unimi.dsi.webgraph.ImmutableGraph
All Implemented Interfaces:
FlyweightPrototype<ImmutableGraph>
Direct Known Subclasses:
ArcLabelledImmutableGraph, BVGraph, ImmutableSequentialGraph, ImmutableSubgraph, UnionImmutableGraph

public abstract class ImmutableGraph
extends Object
implements FlyweightPrototype<ImmutableGraph>

A simple abstract class representing an immutable graph.

Warning: in WebGraph 2.0 the iterator architecture has been made fully lazy to achieve greater speed. The current implementation of this class uses lazy integer iterators and is thus incompatible with previous versions (which were using IntIterator). The LazyIntIterators.eager(LazyIntIterator) factory method can be used to turn lazy iterators into eager ones for easier refactoring, but we suggest some idioms that are significantly more efficient. Please read below.

Subclasses of this class are used to create and access immutable graphs, that is, graphs that are computed once for all, stored conveniently, and then accessed repeatedly. Moreover, immutable graphs are usually very large—so large that two such graphs may not fit into central memory (the main example being a sizable portion of the web).

A subclass of this class must implement methods to obtain the number of nodes, the outdegree of a node and the successors of a node (either successors(int) or successorArray(int)). Additionally, it may provide methods to obtain the number of arcs, and a basename.

This class provides equals(Object) and hashCode() methods that consider two graph equals if they have the same size and all their successor lists are equal.

Iterating on successors

Starting with WebGraph 2.0, the iterator architecture is fully lazy—you have no hasNext() method. Rather, the LazyIntIterator returned by successors(int) will return -1 when no more successors are available. The idiomatic forms for enumerating successors via iterators are

 LazyIntIterator successors = g.successors( x );
 int d = g.outdegree( x );
 while( d-- != 0 ) doSomething( successors.nextInt() );
 
and
 LazyIntIterator successors = g.successors( x );
 int t;
 while( ( t = successors.nextInt() ) != -1 ) doSomething( t );
 

The alternative method successorArray(int) provides an array containing the successors and possibly more elements. Use outdegree(int) to know how many elements are valid. The efficiency of successors(int) and successorArray(int) may vary depending on the implementation.

Refactoring for WebGraph 2+

If you want to refactor code using an older version of WebGraph for WebGraph 2+, there is a trivial way: each call to successors(int) must be wrapped into an eager (i.e., Java standard) iterator. If your code was something like

 IntIterator successors = g.successors( x );
 while( successors.hasNext() ) doSomething( successors.nextInt() );
 
it is sufficient to modify it as follows:
 IntIterator successors = LazyIntIterators.eager( g.successors( x ) );
 while( successors.hasNext() ) doSomething( successors.nextInt() );
 

If, however, you where not using hasNext(), but rather the first idiomatic form described above, changing the type of successors to LazyIntIterator will do the job.

Building an immutable graph

Due to their large size, immutable graphs have a peculiar serialisation scheme. Every subclass of this class must implement a number of static methods that create an immutable graph, given a string (usually a basename for a set of files) and, optionally, a ProgressLogger. The signatures that must be implemented are

Additionally, the following signatures can be implemented:

The special semantics associated with loadSequential() is that the graph should be loaded in a particular compact form that just allows sequential access. The special semantics associated to loadOffline() is that the immutable graph should be set up, and possibly some metadata could be read from disk, but no actual data is loaded into memory; the class should guarantee that offline sequential access (i.e., by means of nodeIterator(int)) is still possible. In other words, in most cases nodeIterator(int) will have to be overridden by the subclasses to behave properly even in an offline setting (see nodeIterator()). The special semantics associated with loadOnce() is that the graph can be traversed just once using a call to nodeIterator(). The special semantics associated with loadMapped() is that metadata could be read from disk, but the graph will be accessed by memory mapping; the class should guarantee that random access is possible.

Note that a simple class may just implement all special forms of graph loading delegating to the standard load method (see, e.g., ASCIIGraph). Specific implementations of ImmutableGraph may also decide to expose internal load methods to make it easier to write load methods for subclasses (see, e.g., loadInternal()).

Analogously, a subclass of this class may also implement

These methods must store in compressed form a given immutable graph, using the default values for compression parameters, etc. It is likely, however, that more of store methods are available, as parameters vary wildly from subclass to subclass. The method store(Class, ImmutableGraph, CharSequence, ProgressLogger) invokes by reflection the methods above on the provided class.

The standard method to build a new immutable graph is creating a (possibly anonymous) class that extends this class, and save it using a concrete subclass (e.g., BVGraph). See the source of Transform for several examples.

Properties Conventions

To provide a simple way to load an immutable graph without knowing in advance its class, the following convention may be followed: a graph with basename name may feature a Java property file name.properties with a property graphclass containing the actual class of the graph. In this case, you can use the implementation of the load/store methods contained in this class, similarly to the standard Java serialisation scheme. BVGraph, for instance, follows this convention, but ASCIIGraph does not.

The reason why this convention is not enforced is that it is sometimes useful to write lightweight classes, mostly for debugging purposes, whose graph representation is entirely contained in a single file (e.g., ASCIIGraph), so that loadOnce(InputStream) can be easily implemented.

Facilities for loading an immutable graph

ImmutableGraph provides ready-made implementations of the load methods that work as follows: they opens a property file with the given basename, and look for the graphclass property; then, they simply delegates the actual load to the specified graph class by reflection.

Thread-safety and flyweight copies

Implementations of this class need not be thread-safe. However, they implement the FlyweightPrototype pattern: the copy() method is thread-safe and will return a lightweight copy of the graph—usually, all immutable data will be shared between copies. Concurrent access to different copies is safe.

Note that by contract copy() is guaranteed to work only if randomAccess() returns true.


Nested Class Summary
static class ImmutableGraph.LoadMethod
          A list of the methods that can be used to load a graph.
 
Field Summary
static String GRAPHCLASS_PROPERTY_KEY
           
static String PROPERTIES_EXTENSION
          The standard extension of property files.
 
Constructor Summary
ImmutableGraph()
           
 
Method Summary
 CharSequence basename()
          Returns a symbolic basename for this graph (optional operation).
abstract  ImmutableGraph copy()
          Returns a flyweight copy of this immutable graph.
 boolean equals(Object o)
          Compare this immutable graph to another object.
 int hashCode()
          Returns a hash code for this immutable graph.
static ImmutableGraph load(CharSequence basename)
          Creates a new ImmutableGraph by loading a graph file from disk to memory, with all offsets, using no progress logger.
static ImmutableGraph load(CharSequence basename, ProgressLogger pl)
          Creates a new ImmutableGraph by loading a graph file from disk to memory, with all offsets, using a progress logger.
protected static ImmutableGraph load(ImmutableGraph.LoadMethod method, CharSequence basename, InputStream is, ProgressLogger pl)
          Creates a new immutable graph by loading a graph file from disk to memory, delegating the actual loading to the class specified in the graphclass property within the property file (named basename.properties).
static ImmutableGraph loadMapped(CharSequence basename)
          Creates a new ImmutableGraph by memory-mapping a graph file.
static ImmutableGraph loadMapped(CharSequence basename, ProgressLogger pl)
          Creates a new ImmutableGraph by memory-mapping a graph file.
static ImmutableGraph loadOffline(CharSequence basename)
          Creates a new ImmutableGraph by loading offline a graph file.
static ImmutableGraph loadOffline(CharSequence basename, ProgressLogger pl)
          Creates a new ImmutableGraph by loading offline a graph file.
static ImmutableGraph loadOnce(InputStream is)
          Creates a new ImmutableGraph by loading a read-once graph from an input stream.
static ImmutableGraph loadSequential(CharSequence basename)
          Creates a new ImmutableGraph by loading a graph file from disk to memory, without offsets.
static ImmutableGraph loadSequential(CharSequence basename, ProgressLogger pl)
          Creates a new ImmutableGraph by loading a graph file from disk to memory, without offsets.
 NodeIterator nodeIterator()
          Returns a node iterator for scanning the graph sequentially, starting from the first node.
 NodeIterator nodeIterator(int from)
          Returns a node iterator for scanning the graph sequentially, starting from the given node.
 long numArcs()
          Returns the number of arcs of this graph (optional operation).
abstract  int numNodes()
          Returns the number of nodes of this graph.
abstract  int outdegree(int x)
          Returns the outdegree of a node.
 IntIterator outdegrees()
          Returns an iterator enumerating the outdegrees of the nodes of this graph.
abstract  boolean randomAccess()
          Checks whether this graph provides random access to successor lists.
static void store(Class<?> graphClass, ImmutableGraph graph, CharSequence basename)
          Stores an immutable graph using a specified subclass.
static void store(Class<?> graphClass, ImmutableGraph graph, CharSequence basename, ProgressLogger pl)
          Stores an immutable graph using a specified subclass and a progress logger.
 int[] successorArray(int x)
          Returns a reference to an array containing the successors of a given node.
 LazyIntIterator successors(int x)
          Returns a lazy iterator over the successors of a given node.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

GRAPHCLASS_PROPERTY_KEY

public static final String GRAPHCLASS_PROPERTY_KEY
See Also:
Constant Field Values

PROPERTIES_EXTENSION

public static final String PROPERTIES_EXTENSION
The standard extension of property files.

See Also:
Constant Field Values
Constructor Detail

ImmutableGraph

public ImmutableGraph()
Method Detail

numNodes

public abstract int numNodes()
Returns the number of nodes of this graph.

Albeit this method is not optional, it is allowed that this method throws an UnsupportedOperationException if this graph has never been entirely traversed using a node iterator. This apparently bizarre behaviour is necessary to support implementations as ArcListASCIIGraph, which do not know the actual number of nodes until a traversal has been completed.

Returns:
the number of nodes.

numArcs

public long numArcs()
Returns the number of arcs of this graph (optional operation).

Returns:
the number of arcs.

randomAccess

public abstract boolean randomAccess()
Checks whether this graph provides random access to successor lists.

Returns:
true if this graph provides random access to successor lists.

basename

public CharSequence basename()
Returns a symbolic basename for this graph (optional operation).

Implementors of this class may provide a basename (usually a pathname from which various files storing the graph are stemmed). This method is optional because it is sometimes unmeaningful (e.g., for one-off anonymous classes).

Returns:
the basename.

successors

public LazyIntIterator successors(int x)
Returns a lazy iterator over the successors of a given node. The iteration terminates when -1 is returned.

This implementation just wraps the array returned by successorArray(int). Subclasses are encouraged to override this implementation.

The semantics of this method has been significantly modified in WebGraph 2.0 to take advantage of the new, faster lazy architecture.

Parameters:
x - a node.
Returns:
a lazy iterator over the successors of the node.

successorArray

public int[] successorArray(int x)
Returns a reference to an array containing the successors of a given node.

The returned array may contain more entries than the outdegree of x. However, only those with indices from 0 (inclusive) to the outdegree of x (exclusive) contain valid data.

This implementation just unwraps the iterator returned by successors(int). Subclasses are encouraged to override this implementation.

Warning: all implementations must guarantee that a distinct array is returned for each node. The caller, in turn, must treat the array as a read-only object.

Parameters:
x - a node.
Returns:
an array whose first elements are the successors of the node; the array must not be modified by the caller.

outdegree

public abstract int outdegree(int x)
Returns the outdegree of a node.

Parameters:
x - a node.
Returns:
the outdegree of the given node.
Throws:
IllegalStateException - if called without offsets.

nodeIterator

public NodeIterator nodeIterator(int from)
Returns a node iterator for scanning the graph sequentially, starting from the given node.

This implementation just calls the random-access methods (successors(int) and outdegree(int)). More specific implementations may choose to maintain some extra state to make the enumeration more efficient.

Parameters:
from - the node from which the iterator will iterate.
Returns:
a NodeIterator for accessing nodes and successors sequentially.

nodeIterator

public NodeIterator nodeIterator()
Returns a node iterator for scanning the graph sequentially, starting from the first node.

Returns:
a NodeIterator for accessing nodes and successors sequentially.

copy

public abstract ImmutableGraph copy()
Returns a flyweight copy of this immutable graph.

Specified by:
copy in interface FlyweightPrototype<ImmutableGraph>
Returns:
a flyweight copy of this immutable graph.
Throws:
UnsupportedOperationException - if flyweight copies are not supported: support is guaranteed only if randomAccess() returns true.
See Also:
FlyweightPrototype

toString

public String toString()
Overrides:
toString in class Object

outdegrees

public IntIterator outdegrees()
Returns an iterator enumerating the outdegrees of the nodes of this graph.

Returns:
an iterator enumerating the outdegrees of the nodes of this graph.

loadSequential

public static ImmutableGraph loadSequential(CharSequence basename)
                                     throws IOException
Creates a new ImmutableGraph by loading a graph file from disk to memory, without offsets.

This method uses the properties convention described in the introduction.

Parameters:
basename - the basename of the graph.
Returns:
an ImmutableGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

loadSequential

public static ImmutableGraph loadSequential(CharSequence basename,
                                            ProgressLogger pl)
                                     throws IOException
Creates a new ImmutableGraph by loading a graph file from disk to memory, without offsets.

This method uses the properties convention described in the introduction.

Parameters:
basename - the basename of the graph.
pl - a progress logger used while loading the graph, or null.
Returns:
an ImmutableGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

loadOffline

public static ImmutableGraph loadOffline(CharSequence basename)
                                  throws IOException
Creates a new ImmutableGraph by loading offline a graph file.

This method uses the properties convention described in the introduction.

Parameters:
basename - the basename of the graph.
Returns:
an ImmutableGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

loadOffline

public static ImmutableGraph loadOffline(CharSequence basename,
                                         ProgressLogger pl)
                                  throws IOException
Creates a new ImmutableGraph by loading offline a graph file.

This method uses the properties convention described in the introduction.

Parameters:
basename - the basename of the graph.
pl - a progress logger used while loading the graph, or null.
Returns:
an ImmutableGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

loadMapped

public static ImmutableGraph loadMapped(CharSequence basename,
                                        ProgressLogger pl)
                                 throws IOException
Creates a new ImmutableGraph by memory-mapping a graph file.

This method uses the properties convention described in the introduction.

Parameters:
basename - the basename of the graph.
pl - a progress logger used while loading the offsets, or null.
Returns:
an ImmutableGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while memory-mapping the graph or reading the offsets.

loadMapped

public static ImmutableGraph loadMapped(CharSequence basename)
                                 throws IOException
Creates a new ImmutableGraph by memory-mapping a graph file.

This method uses the properties convention described in the introduction.

Parameters:
basename - the basename of the graph.
Returns:
an ImmutableGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while memory-mapping the graph or reading the offsets.

loadOnce

public static ImmutableGraph loadOnce(InputStream is)
                               throws IOException
Creates a new ImmutableGraph by loading a read-once graph from an input stream.

This implementation just throws a UnsupportedOperationException. There is no way to write a generic implementation, because there is no way to know in advance the class that should read the graph.

Parameters:
is - an input stream containing the graph.
Returns:
an ImmutableGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.
UnsupportedOperationException - if this graph class does not support read-once graphs.

load

public static ImmutableGraph load(CharSequence basename)
                           throws IOException
Creates a new ImmutableGraph by loading a graph file from disk to memory, with all offsets, using no progress logger.

This method uses the properties convention described in the introduction.

Parameters:
basename - the basename of the graph.
Returns:
an ImmutableGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

load

public static ImmutableGraph load(CharSequence basename,
                                  ProgressLogger pl)
                           throws IOException
Creates a new ImmutableGraph by loading a graph file from disk to memory, with all offsets, using a progress logger.

This method uses the properties convention described in the introduction.

Parameters:
basename - the basename of the graph.
pl - a progress logger used while loading the graph, or null.
Returns:
an ImmutableGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

load

protected static ImmutableGraph load(ImmutableGraph.LoadMethod method,
                                     CharSequence basename,
                                     InputStream is,
                                     ProgressLogger pl)
                              throws IOException
Creates a new immutable graph by loading a graph file from disk to memory, delegating the actual loading to the class specified in the graphclass property within the property file (named basename.properties). The exact load method to be used depends on the method argument.

This method uses the properties convention described in the introduction.

Parameters:
method - the method to be used to load the graph.
basename - the basename of the graph, if method is not ImmutableGraph.LoadMethod.ONCE.
is - an input stream the containing the graph, if method is ImmutableGraph.LoadMethod.ONCE.
pl - the progress logger; it can be null.
Returns:
an ImmutableGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

store

public static void store(Class<?> graphClass,
                         ImmutableGraph graph,
                         CharSequence basename,
                         ProgressLogger pl)
                  throws IOException
Stores an immutable graph using a specified subclass and a progress logger.

This method is a useful shorthand that invoke by reflection the store method of a given subclass. Note, however, that usually a subclass will provide more refined store methods with more parameters.

Parameters:
graphClass - the subclass of ImmutableGraph that should store the graph.
graph - the graph to store.
basename - the basename.
pl - a progress logger, or null.
Throws:
IOException

store

public static void store(Class<?> graphClass,
                         ImmutableGraph graph,
                         CharSequence basename)
                  throws IOException
Stores an immutable graph using a specified subclass.

Parameters:
graphClass - the subclass of ImmutableGraph that should store the graph.
graph - the graph to store.
basename - the basename.
Throws:
IOException
See Also:
store(Class, ImmutableGraph, CharSequence, ProgressLogger)

equals

public boolean equals(Object o)
Compare this immutable graph to another object.

Overrides:
equals in class Object
Returns:
true iff the given object is an immutable graph of the same size, and the successor list of every node of this graph is equal to the successor list of the corresponding node of o.

hashCode

public int hashCode()
Returns a hash code for this immutable graph.

Overrides:
hashCode in class Object
Returns:
a hash code for this immutable graph.