it.unimi.dsi.webgraph
Class BVGraph

java.lang.Object
  extended by it.unimi.dsi.webgraph.ImmutableGraph
      extended by it.unimi.dsi.webgraph.BVGraph
All Implemented Interfaces:
FlyweightPrototype<ImmutableGraph>, CompressionFlags

public class BVGraph
extends ImmutableGraph
implements CompressionFlags

An immutable graph represented using the techniques described in “The WebGraph Framework I: Compression Techniques”, by Paolo Boldi and Sebastiano Vigna, in Proc. of the Thirteenth World–Wide Web Conference, pages 595−601, 2004, ACM Press.

This class provides a flexible and configurable way to store and access web graphs in a compressed form. Its main method can load an ImmutableGraph and compress it. The resulting compressed BVGraph is described by a graph file (with extension .graph), an offset file (with extension .offsets) and a property file (with extension .properties). The latter, not surprisingly, is a Java property file. Optionally, an offset big-list file (with extension .obl) can be created to load graphs faster.

As a rule of thumb, random access is faster using successors(int), whereas while iterating using a NodeIterator it is better to use NodeIterator.successorArray().

The Graph File

This class stores a graph as an bit stream. The bit stream format depends on a number of parameters and encodings that can be mixed orthogonally. The parameters are:

Successor Lists

The graph file is a sequence of successor lists, one for each node. The list of node x can be thought of as a sequence of natural numbers (even though, as we will explain later, this sequence is further coded suitably as a sequence of bits):

  1. The outdegree of the node; if it is zero, the list ends here.
  2. If the window size is not zero, the reference part, that is:
    1. a nonnegative integer, the reference, which never exceeds the window size; if the reference is r, the list of successors will be specified as a modified version of the list of successors of xr; if r is 0, then the list of successors will be specified explicitly;
    2. if r is nonzero:
      1. a natural number b, the block count;
      2. a sequence of b natural numbers B1, …, Bb, called the copy-block list; only the first number can be zero.
  3. Then comes the extra part, specifying some more entries that the list of successors contains (or all of them, if r is zero), that is:
    1. If the minimum interval length is finite,
      1. an integer i, the interval count;
      2. a sequence of i pairs, whose first component is the left extreme of an interval, and whose second component is the length of the interval (i.e., the number of integers contained in the interval).
    2. Finally, the list of residuals, which contain all successors not specified by previous methods.

The above data should be interpreted as follows:

How Successor Lists Are Coded

As we said before, the list of integers corresponding to each successor list should be coded into a sequence of bits. This is (ideally) done in two phases: we first modify the sequence in a suitable manner (as explained below) so to obtain another sequence of integers (some of them might be negative). Then each single integer is coded, using a coding that can be specified as an option; the integers that may be negative are first turned into natural numbers using Fast.int2nat(int).

  1. The outdegree of the node is left unchanged, as well as the reference and the block count;
  2. all blocks are decremented by 1, except for the first one;
  3. the interval count is left unchanged;
  4. all interval lengths are decremented by the minimum interval length;
  5. the first left extreme is expressed as its difference from x (it will be negative if the first extreme is less than x); the remaining left extremes are expressed as their distance from the previous right extreme plus 2 (e.g., if the interval is [5..11] and the previous one was [1..3], then the left extreme 5 is expressed as 5-(3+2)=5-5=0);
  6. the first residual is expressed as its difference from x (it will be negative if the first residual is less than x); the remaining residuals are expressed as decremented differences from the previous residual.

The Offset File

Since the graph is stored as a bit stream, we must have some way to know where each successor list starts. This information is stored in the offset file, which contains the bit offset of each successor list (in particular, the offset of the first successor list will be zero). As a commodity, the offset file contains an additional offset pointing just after the last successor list (providing, as a side-effect, the actual bit length of the graph file). Each offset (except for the first one) is stored as a suitably coded difference from the previous offset.

The list of offsets can be additionally stored as a serialised EliasFanoMonotoneLongBigList using a suitable command-line option. If the serialised big list is detected, it is loaded instead of parsing the offset list.

The Property File

This file contains self-explaining entries that are necessary to correctly decode the graph and offset files, and moreover give some statistical information about the compressed graph (e.g., the number of bits per link).

nodes
the number of nodes of the graph.
nodes
the number of arcs of the graph.
version
a version number.
graphclass
the name of the class that should load this graph (ImmutableGraph convention).
bitsperlink
the number of bits per link (overall graph size in bits divided by the number of arcs).
bitspernode
the number of bits per node (overall graph size in bits divided by the number of nodes).
compratio
the ratio between the graph size and the information-theoretical lower bound (the binary logarithm of the number of subsets of size arcs out of a universe of nodes2 elements).
compressionflags
flags specifying the codes used for the components of the compression algorithm.
zetak
if ζ codes are selected for residuals, the parameter k.
windowsize
the window size.
maxref
the maximum reference count.
minintervallength
the minimum length of an interval.
avgdist
the average distance of a reference.
avgref
the average length of reference chains.
bitsfor*
number of bits used by a specific compoenent of the algorithm (the sum is the number of bits used to store the graph).
avgbitsfor*
number of bits used by a specific compoenent of the algorithm, divided by the number of nodes (the sum is the number of bits per node).
*arcs
the number of arcs stored by each component of the algorithm (the sum is the number of arcs).
*expstats
frequencies of the floor of the logarithm of successor gaps and residual gaps, separated by a comma; the statistics include the gap between each node and its first successor, after it has been passed through Fast.int2nat(int), but discarding zeroes (which happen in very rare circumstance, and should be considered immaterial).
*avg[log]gap
the average of the gaps (or of their logarithm) of successors and residuals: note that this data is computed from the exponential statistics above, and thus it is necessarily approximate.

How The Graph File Is Loaded Into Memory

The natural way of using a graph file is to load it into a byte array and then index its bits using the suitable offset. This class will use a byte array for graphs smaller than Integer.MAX_VALUE bytes, and a FastMultiByteArrayInputStream otherwise: in the latter case, expect a significant slowdown (as an InputBitStream can wrap directly a byte array).

Offsets are loaded using an EliasFanoMonotoneLongBigList, which occupies exponentially less space than the graph itself (unless your graph is pathologically sparse). There is of course a cost involved in accessing the list with respect to accessing an array of longs.

Note that by default the EliasFanoMonotoneLongBigList instance is created from scratch using the file of offsets. This is a long and tedious process, in particular with large graphs. The main method of this class has an option that will generate such a list once for all and serialise it in a file with extension .obl. The list will be quickly deserialised if its modification date is later than that of the offset file.

Optionally, this class may load no offsets at all (see loadSequential(CharSequence). In this case, the only way to access the graph is by creating a ImmutableGraph.nodeIterator().

Not Loading the Graph File at All

For some applications (such as transposing a graph) it is not necessary to load the graph file in memory. Since this class is able to enumerate the links of a graph without using random access, it is possible not to load in memory any information at all, and obtain iterators that directly read from the graph file. To obtain this effect, you must call loadOffline(CharSequence).

Memory–Mapping a Graph

Another interesting alternative is memory mapping. When using loadMapped(CharSequence), the graph will be mapped into memory, and the offsets loaded. The graph will provide random access and behave as if it was loaded into memory, but of course the access will be slower.


Nested Class Summary
 
Nested classes/interfaces inherited from class it.unimi.dsi.webgraph.ImmutableGraph
ImmutableGraph.LoadMethod
 
Field Summary
protected  CharSequence basename
          The basename of the graph.
static int BLOCK_COUNT_DELTA
          Flag: write block counts using δ coding.
static int BLOCK_COUNT_GAMMA
          Flag: write block counts using γ coding (default).
static int BLOCK_COUNT_UNARY
          Flag: write block counts using unary coding.
protected  int blockCoding
          The coding for copy-block lists.
protected  int blockCountCoding
          The coding for block counts.
static int BLOCKS_DELTA
          Flag: write copy-block lists using δ coding.
static int BLOCKS_GAMMA
          Flag: write copy-block lists using γ coding (default).
static int BVGRAPH_VERSION
          This number classifies the present graph format.
static int DEFAULT_MAX_REF_COUNT
          Default backward reference maximum length.
static int DEFAULT_MIN_INTERVAL_LENGTH
          Default minimum interval length.
static int DEFAULT_WINDOW_SIZE
          Default window size.
static int DEFAULT_ZETA_K
          Default value of k.
static String GRAPH_EXTENSION
          The standard extension for the graph bit stream.
protected  byte[] graphMemory
          The byte array storing the compressed graph, if isMemory is true and offsetType is not -1.
protected  FastMultiByteArrayInputStream graphStream
          The multi-byte array input stream storing the compressed graph, if isMemory is false, isMapped is false and offsetType is not -1.
protected static int INITIAL_SUCCESSOR_LIST_LENGTH
          The initial length of an array that will contain a successor list.
protected  boolean isMapped
          When offsetType is not -1, whether this graph is directly loaded into graphMemory, or rather memory-mapped.
protected  boolean isMemory
          When offsetType is not -1, whether this graph is directly loaded into graphMemory, or rather wrapped in a FastMultiByteArrayInputStream specified by graphStream.
protected  long m
          The number of arcs of the graph.
protected  ByteBufferInputStream mappedGraphStream
          The memory-mapped input stream storing the compressed graph, if isMapped is true.
protected  int maxRefCount
          The maximum reference count.
protected  int minIntervalLength
          The minimum interval length.
protected  int n
          The number of nodes of the graph.
protected static int NO_INTERVALS
          A special value for minIntervalLength interpreted as meaning that the minimum interval length is infinity.
static int OFFLINE
          The offset step parameter corresponding to offline load.
protected  int offsetCoding
          The coding for offsets.
protected  LongBigList offsets
          This variable is null iff offsetType is zero or less (implying that offsets have not been loaded).
static String OFFSETS_BIG_LIST_EXTENSION
          The standard extension for the cached LongBigList containing the graph offsets.
static int OFFSETS_DELTA
          Flag: write offsets using δ coding.
static String OFFSETS_EXTENSION
          The standard extension for the graph-offsets bit stream.
static int OFFSETS_GAMMA
          Flag: write offsets using γ coding (default).
protected  int offsetType
          The offset type: 2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that the we do not want even load the graph file.
protected  int outdegreeCoding
          The coding for outdegrees.
protected  InputBitStream outdegreeIbs
          A bit stream wrapping graphMemory, or graphStream, used only by outdegree(int).
static int OUTDEGREES_DELTA
          Flag: write outdegrees using δ coding.
static String OUTDEGREES_EXTENSION
          The standard extension for the stream of node outdegrees.
static int OUTDEGREES_GAMMA
          Flag: write outdegrees using γ coding (default).
protected  int referenceCoding
          The coding for references.
static int REFERENCES_DELTA
          Flag: write references using δ coding.
static int REFERENCES_GAMMA
          Flag: write references using γ coding.
static int REFERENCES_UNARY
          Flag: write references using unary coding (default).
protected  int residualCoding
          The coding for residuals.
static int RESIDUALS_DELTA
          Flag: write residuals using δ coding.
static int RESIDUALS_GAMMA
          Flag: write residuals using γ coding.
static int RESIDUALS_GOLOMB
          Flag: write residuals using &golomb; coding.
static int RESIDUALS_NIBBLE
          Flag: write residuals using variable-length nibble coding.
static int RESIDUALS_ZETA
          Flag: write residuals using ζk coding (default).
static int SEQUENTIAL
          The offset step parameter corresponding to sequential load.
protected  int windowSize
          The window size.
protected  int zetaK
          The value of k for ζk coding (for residuals).
 
Fields inherited from class it.unimi.dsi.webgraph.ImmutableGraph
GRAPHCLASS_PROPERTY_KEY, PROPERTIES_EXTENSION
 
Fields inherited from interface it.unimi.dsi.webgraph.CompressionFlags
CODING_NAME, DELTA, GAMMA, GOLOMB, NIBBLE, SKEWED_GOLOMB, UNARY, ZETA
 
Constructor Summary
protected BVGraph()
           
 
Method Summary
 CharSequence basename()
          Returns a symbolic basename for this graph (optional operation).
 BVGraph copy()
          Returns a flyweight copy of this immutable graph.
protected static int intervalize(IntArrayList x, int minInterval, IntArrayList left, IntArrayList len, IntArrayList residuals)
          This method tries to express an increasing sequence of natural numbers x as a union of an increasing sequence of intervals and an increasing sequence of residual elements.
static BVGraph load(CharSequence basename)
          Creates a new BVGraph by loading a compressed graph file from disk to memory, with no progress logger and all offsets.
static BVGraph load(CharSequence basename, int offsetType)
          Creates a new BVGraph by loading a compressed graph file from disk to memory, with no progress logger.
static BVGraph load(CharSequence basename, int offsetType, ProgressLogger pl)
          Creates a new BVGraph by loading a compressed graph file from disk to memory.
static BVGraph load(CharSequence basename, ProgressLogger pl)
          Creates a new BVGraph by loading a compressed graph file from disk to memory, with all offsets.
protected  BVGraph loadInternal(CharSequence basename, int offsetType, ProgressLogger pl)
          Loads a compressed graph file from disk into this graph.
static BVGraph loadMapped(CharSequence basename)
          Creates a new BVGraph by memory-mapping a graph file.
static BVGraph loadMapped(CharSequence basename, ProgressLogger pl)
          Creates a new BVGraph by memory-mapping a graph file.
static BVGraph loadOffline(CharSequence basename)
          Creates a new BVGraph by loading just the metadata of a compressed graph file.
static BVGraph loadOffline(CharSequence basename, ProgressLogger pl)
          Creates a new BVGraph by loading just the metadata of a compressed graph file.
static BVGraph loadSequential(CharSequence basename)
          Creates a new BVGraph by loading a compressed graph file from disk to memory, with no progress logger and without offsets.
static BVGraph loadSequential(CharSequence basename, ProgressLogger pl)
          Creates a new BVGraph by loading a compressed graph file from disk to memory, without offsets.
static void main(String[] args)
          Reads an immutable graph and stores it as a BVGraph.
 int maxRefCount()
          Returns the maximum reference count of this graph.
 NodeIterator nodeIterator(int from)
          This method returns a node iterator for scanning the graph sequentially, starting from the given node.
 long numArcs()
          Returns the number of arcs of this graph (optional operation).
 int numNodes()
          Returns the number of nodes of this graph.
 int outdegree(int x)
          Returns the outdegree of a node.
 boolean randomAccess()
          Checks whether this graph provides random access to successor lists.
protected  int readBlock(InputBitStream ibs)
          Reads a block from the given stream.
protected  int readBlockCount(InputBitStream ibs)
          Reads a block count from the given stream.
protected  long readLongResidual(InputBitStream ibs)
          Reads a long residual from the given stream.
protected  long readOffset(InputBitStream ibs)
          Reads an offset difference from the given stream.
protected  int readOutdegree(InputBitStream ibs)
          Reads an outdegree from the given stream.
protected  int readOutdegree(InputBitStream ibs, long offset)
          Reads an outdegree from the given stream at a given offset.
protected  int readReference(InputBitStream ibs)
          Reads a reference from the given stream.
protected  int readResidual(InputBitStream ibs)
          Reads a residual from the given stream.
static void store(ImmutableGraph graph, CharSequence basename)
          Writes the given graph using a given base name, without any progress logger and with all parameters set to their default values.
static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength, int zetaK, int flags)
          Writes the given graph using a given base name, without any progress logger.
static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength, int zetaK, int flags, ProgressLogger pl)
          Writes the given graph using a given base name.
static void store(ImmutableGraph graph, CharSequence basename, ProgressLogger pl)
          Writes the given graph using a given base name, with all parameters set to their default values.
 LazyIntIterator successors(int x)
          Returns an iterator over the successors of a given node.
protected  LazyIntIterator successors(int x, long offset, InputBitStream ibs, int[][] window, int[] outd)
          Given an InputBitStream wrapping a graph file, returns an iterator over the successors of a given node x.
 int windowSize()
          Returns the window size of this graph.
protected  int writeBlock(OutputBitStream obs, int block)
          Writes a block to the given stream.
protected  int writeBlockCount(OutputBitStream obs, int count)
          Writes a block count to the given stream.
protected  int writeOffset(OutputBitStream obs, long x)
          Writes an offset difference to the given stream.
 void writeOffsets(OutputBitStream obs, ProgressLogger pl)
          Write the offset file to a given bit stream.
protected  int writeOutdegree(OutputBitStream obs, int d)
          Writes an outdegree to the given stream.
protected  int writeReference(OutputBitStream obs, int ref)
          Writes a reference to the given stream.
protected  int writeResidual(OutputBitStream obs, int residual)
          Writes a residual to the given stream.
protected  int writeResidual(OutputBitStream obs, long residual)
          Writes a residual to the given stream.
 
Methods inherited from class it.unimi.dsi.webgraph.ImmutableGraph
equals, hashCode, load, loadOnce, nodeIterator, outdegrees, store, store, successorArray, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

SEQUENTIAL

public static final int SEQUENTIAL
The offset step parameter corresponding to sequential load.

See Also:
Constant Field Values

OFFLINE

public static final int OFFLINE
The offset step parameter corresponding to offline load.

See Also:
Constant Field Values

GRAPH_EXTENSION

public static final String GRAPH_EXTENSION
The standard extension for the graph bit stream.

See Also:
Constant Field Values

OFFSETS_EXTENSION

public static final String OFFSETS_EXTENSION
The standard extension for the graph-offsets bit stream.

See Also:
Constant Field Values

OFFSETS_BIG_LIST_EXTENSION

public static final String OFFSETS_BIG_LIST_EXTENSION
The standard extension for the cached LongBigList containing the graph offsets.

See Also:
Constant Field Values

OUTDEGREES_EXTENSION

public static final String OUTDEGREES_EXTENSION
The standard extension for the stream of node outdegrees.

See Also:
Constant Field Values

BVGRAPH_VERSION

public static final int BVGRAPH_VERSION
This number classifies the present graph format. When new features require introducing binary incompatibilities, this number is bumped so to ensure that old classes do not try to read graphs they cannot understand.

See Also:
Constant Field Values

INITIAL_SUCCESSOR_LIST_LENGTH

protected static final int INITIAL_SUCCESSOR_LIST_LENGTH
The initial length of an array that will contain a successor list.

See Also:
Constant Field Values

NO_INTERVALS

protected static final int NO_INTERVALS
A special value for minIntervalLength interpreted as meaning that the minimum interval length is infinity.

See Also:
Constant Field Values

basename

protected CharSequence basename
The basename of the graph. This may be null, but trying to load the graph with an offset step of -1 will cause an exception.


n

protected int n
The number of nodes of the graph.


m

protected long m
The number of arcs of the graph.


isMemory

protected boolean isMemory
When offsetType is not -1, whether this graph is directly loaded into graphMemory, or rather wrapped in a FastMultiByteArrayInputStream specified by graphStream.


isMapped

protected boolean isMapped
When offsetType is not -1, whether this graph is directly loaded into graphMemory, or rather memory-mapped.


graphMemory

protected byte[] graphMemory
The byte array storing the compressed graph, if isMemory is true and offsetType is not -1.

This variable is loaded with a copy of the graph file, or with a rearrangement of the latter, depending on whether offsetType is smaller than or equal to one. If offsetType is -1, this variable is null, and node iterators are generated by opening streams directly on the graph file.


graphStream

protected FastMultiByteArrayInputStream graphStream
The multi-byte array input stream storing the compressed graph, if isMemory is false, isMapped is false and offsetType is not -1.

It is loaded with a copy of the graph file. If offsetType is -1, this variable is null, and node iterators are generated by opening streams directly on the graph file.


mappedGraphStream

protected ByteBufferInputStream mappedGraphStream
The memory-mapped input stream storing the compressed graph, if isMapped is true.

It is loaded with a copy of the graph file. If offsetType is -1, this variable is null, and node iterators are generated by opening streams directly on the graph file.


offsets

protected LongBigList offsets
This variable is null iff offsetType is zero or less (implying that offsets have not been loaded). Otherwise, it is an Elias–Fano monotone list containing the pointers of the bit streams of one each offsetType nodes.


offsetType

protected int offsetType
The offset type: 2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that the we do not want even load the graph file.


maxRefCount

protected int maxRefCount
The maximum reference count.


DEFAULT_MAX_REF_COUNT

public static final int DEFAULT_MAX_REF_COUNT
Default backward reference maximum length.

See Also:
Constant Field Values

windowSize

protected int windowSize
The window size. Zero means no references.


DEFAULT_WINDOW_SIZE

public static final int DEFAULT_WINDOW_SIZE
Default window size.

See Also:
Constant Field Values

minIntervalLength

protected int minIntervalLength
The minimum interval length.


DEFAULT_MIN_INTERVAL_LENGTH

public static final int DEFAULT_MIN_INTERVAL_LENGTH
Default minimum interval length.

See Also:
Constant Field Values

zetaK

protected int zetaK
The value of k for ζk coding (for residuals).


DEFAULT_ZETA_K

public static final int DEFAULT_ZETA_K
Default value of k.

See Also:
Constant Field Values

OUTDEGREES_GAMMA

public static final int OUTDEGREES_GAMMA
Flag: write outdegrees using γ coding (default).

See Also:
Constant Field Values

OUTDEGREES_DELTA

public static final int OUTDEGREES_DELTA
Flag: write outdegrees using δ coding.

See Also:
Constant Field Values

BLOCKS_GAMMA

public static final int BLOCKS_GAMMA
Flag: write copy-block lists using γ coding (default).

See Also:
Constant Field Values

BLOCKS_DELTA

public static final int BLOCKS_DELTA
Flag: write copy-block lists using δ coding.

See Also:
Constant Field Values

RESIDUALS_GAMMA

public static final int RESIDUALS_GAMMA
Flag: write residuals using γ coding.

See Also:
Constant Field Values

RESIDUALS_ZETA

public static final int RESIDUALS_ZETA
Flag: write residuals using ζk coding (default).

See Also:
Constant Field Values

RESIDUALS_DELTA

public static final int RESIDUALS_DELTA
Flag: write residuals using δ coding.

See Also:
Constant Field Values

RESIDUALS_NIBBLE

public static final int RESIDUALS_NIBBLE
Flag: write residuals using variable-length nibble coding.

See Also:
Constant Field Values

RESIDUALS_GOLOMB

public static final int RESIDUALS_GOLOMB
Flag: write residuals using &golomb; coding.

See Also:
Constant Field Values

REFERENCES_GAMMA

public static final int REFERENCES_GAMMA
Flag: write references using γ coding.

See Also:
Constant Field Values

REFERENCES_DELTA

public static final int REFERENCES_DELTA
Flag: write references using δ coding.

See Also:
Constant Field Values

REFERENCES_UNARY

public static final int REFERENCES_UNARY
Flag: write references using unary coding (default).

See Also:
Constant Field Values

BLOCK_COUNT_GAMMA

public static final int BLOCK_COUNT_GAMMA
Flag: write block counts using γ coding (default).

See Also:
Constant Field Values

BLOCK_COUNT_DELTA

public static final int BLOCK_COUNT_DELTA
Flag: write block counts using δ coding.

See Also:
Constant Field Values

BLOCK_COUNT_UNARY

public static final int BLOCK_COUNT_UNARY
Flag: write block counts using unary coding.

See Also:
Constant Field Values

OFFSETS_GAMMA

public static final int OFFSETS_GAMMA
Flag: write offsets using γ coding (default).

See Also:
Constant Field Values

OFFSETS_DELTA

public static final int OFFSETS_DELTA
Flag: write offsets using δ coding.

See Also:
Constant Field Values

outdegreeCoding

protected int outdegreeCoding
The coding for outdegrees. By default, we use γ coding.


blockCoding

protected int blockCoding
The coding for copy-block lists. By default, we use γ coding.


residualCoding

protected int residualCoding
The coding for residuals. By default, we use ζ coding.


referenceCoding

protected int referenceCoding
The coding for references. By default, we use unary coding.


blockCountCoding

protected int blockCountCoding
The coding for block counts. By default, we use γ coding.


offsetCoding

protected int offsetCoding
The coding for offsets. By default, we use γ coding.


outdegreeIbs

protected InputBitStream outdegreeIbs
A bit stream wrapping graphMemory, or graphStream, used only by outdegree(int). It is declared here for efficiency reasons.

Constructor Detail

BVGraph

protected BVGraph()
Method Detail

copy

public BVGraph copy()
Description copied from class: ImmutableGraph
Returns a flyweight copy of this immutable graph.

Specified by:
copy in interface FlyweightPrototype<ImmutableGraph>
Specified by:
copy in class ImmutableGraph
Returns:
a flyweight copy of this immutable graph.
See Also:
FlyweightPrototype

numNodes

public int numNodes()
Description copied from class: ImmutableGraph
Returns the number of nodes of this graph.

Albeit this method is not optional, it is allowed that this method throws an UnsupportedOperationException if this graph has never been entirely traversed using a node iterator. This apparently bizarre behaviour is necessary to support implementations as ArcListASCIIGraph, which do not know the actual number of nodes until a traversal has been completed.

Specified by:
numNodes in class ImmutableGraph
Returns:
the number of nodes.

numArcs

public long numArcs()
Description copied from class: ImmutableGraph
Returns the number of arcs of this graph (optional operation).

Overrides:
numArcs in class ImmutableGraph
Returns:
the number of arcs.

randomAccess

public boolean randomAccess()
Description copied from class: ImmutableGraph
Checks whether this graph provides random access to successor lists.

Specified by:
randomAccess in class ImmutableGraph
Returns:
true if this graph provides random access to successor lists.

basename

public CharSequence basename()
Description copied from class: ImmutableGraph
Returns a symbolic basename for this graph (optional operation).

Implementors of this class may provide a basename (usually a pathname from which various files storing the graph are stemmed). This method is optional because it is sometimes unmeaningful (e.g., for one-off anonymous classes).

Overrides:
basename in class ImmutableGraph
Returns:
the basename.

maxRefCount

public int maxRefCount()
Returns the maximum reference count of this graph.

Returns:
the maximum reference count.

windowSize

public int windowSize()
Returns the window size of this graph.

Returns:
the window size.

readOffset

protected final long readOffset(InputBitStream ibs)
                         throws IOException
Reads an offset difference from the given stream.

Parameters:
ibs - an offset-file input bit stream.
Returns:
the next offset difference.
Throws:
IOException

writeOffset

protected final int writeOffset(OutputBitStream obs,
                                long x)
                         throws IOException
Writes an offset difference to the given stream.

Parameters:
obs - an offset-file output bit stream.
x - an offset difference to be stored in the stream.
Returns:
the number of bits written.
Throws:
IOException

readOutdegree

protected final int readOutdegree(InputBitStream ibs)
                           throws IOException
Reads an outdegree from the given stream.

Parameters:
ibs - a graph-file input bit stream.
Returns:
the next outdegree.
Throws:
IOException

readOutdegree

protected final int readOutdegree(InputBitStream ibs,
                                  long offset)
                           throws IOException
Reads an outdegree from the given stream at a given offset.

Parameters:
ibs - a graph-file input bit stream.
offset - the offset at which the stream must be positioned.
Returns:
the next outdegree.
Throws:
IOException

writeOutdegree

protected final int writeOutdegree(OutputBitStream obs,
                                   int d)
                            throws IOException
Writes an outdegree to the given stream.

Parameters:
obs - a graph-file output bit stream.
d - an outdegree to be stored in the stream.
Returns:
the number of bits written.
Throws:
IOException

readReference

protected final int readReference(InputBitStream ibs)
                           throws IOException
Reads a reference from the given stream.

Parameters:
ibs - a graph-file input bit stream.
Returns:
the next reference.
Throws:
IOException

writeReference

protected final int writeReference(OutputBitStream obs,
                                   int ref)
                            throws IOException
Writes a reference to the given stream.

Parameters:
obs - a graph-file output bit stream.
ref - the reference.
Returns:
the number of bits written.
Throws:
IOException

readBlockCount

protected final int readBlockCount(InputBitStream ibs)
                            throws IOException
Reads a block count from the given stream.

Parameters:
ibs - a graph-file input bit stream.
Returns:
the next block count.
Throws:
IOException

writeBlockCount

protected final int writeBlockCount(OutputBitStream obs,
                                    int count)
                             throws IOException
Writes a block count to the given stream.

Parameters:
obs - a graph-file output bit stream.
count - the block count.
Returns:
the number of written bits.
Throws:
IOException

readBlock

protected final int readBlock(InputBitStream ibs)
                       throws IOException
Reads a block from the given stream.

Parameters:
ibs - a graph-file input bit stream.
Returns:
the next block.
Throws:
IOException

writeBlock

protected final int writeBlock(OutputBitStream obs,
                               int block)
                        throws IOException
Writes a block to the given stream.

Parameters:
obs - a graph-file output bit stream.
block - the block.
Returns:
the number of written bits.
Throws:
IOException

readResidual

protected final int readResidual(InputBitStream ibs)
                          throws IOException
Reads a residual from the given stream.

Parameters:
ibs - a graph-file input bit stream.
Returns:
the next residual.
Throws:
IOException

readLongResidual

protected final long readLongResidual(InputBitStream ibs)
                               throws IOException
Reads a long residual from the given stream.

Parameters:
ibs - a graph-file input bit stream.
Returns:
the next residual.
Throws:
IOException

writeResidual

protected final int writeResidual(OutputBitStream obs,
                                  int residual)
                           throws IOException
Writes a residual to the given stream.

Parameters:
obs - a graph-file output bit stream.
residual - the residual.
Returns:
the number of written bits.
Throws:
IOException

writeResidual

protected final int writeResidual(OutputBitStream obs,
                                  long residual)
                           throws IOException
Writes a residual to the given stream.

Parameters:
obs - a graph-file output bit stream.
residual - the residual.
Returns:
the number of written bits.
Throws:
IOException

outdegree

public int outdegree(int x)
              throws IllegalStateException
Description copied from class: ImmutableGraph
Returns the outdegree of a node.

Specified by:
outdegree in class ImmutableGraph
Parameters:
x - a node.
Returns:
the outdegree of the given node.
Throws:
IllegalStateException - if called without offsets.

successors

public LazyIntIterator successors(int x)
Returns an iterator over the successors of a given node.

Overrides:
successors in class ImmutableGraph
Parameters:
x - a node.
Returns:
an iterator over the successors of the node.

successors

protected LazyIntIterator successors(int x,
                                     long offset,
                                     InputBitStream ibs,
                                     int[][] window,
                                     int[] outd)
                              throws IllegalStateException
Given an InputBitStream wrapping a graph file, returns an iterator over the successors of a given node x.

This method can be used in two different ways:

  1. by providing a node and an input bit stream wrapping a graph file, it is possible to access the successor list of the node (provided that offsets have been loaded);
  2. by providing additional data, which essentially are used to keep some state about the graph, it is possible to perform an efficient sequential visit of all successor lists (even when no offsets were loaded).

This method may modify the offset and the outdegree caches if window is null.

Parameters:
x - a node.
offset - the offset of x; can be -1 if window is not null.
ibs - an input bit stream wrapping a graph file. After this method returns, the state of ibs is undefined: however, after the iterator returned is exhausted, ibs will positioned just after the successor list of x.
window - either null, or a double array with the following meaning: window[(x-i) mod windowSize] contains, for all i between 1 (inclusive) and windowSize (exclusive), the list of successors of node xi. If window is not null then ibs must be positioned before the successor list of x. This parameter will not be modified.
outd - if window is not null, this is an array with as many elements as windowSize; outd[(x-i) mod windowSize] contains the outdegree of node xi for i greater than 0; at the end, this will be true also for i equal to 0.
Returns:
an iterator over the successors of x.
Throws:
IllegalStateException - if window is null and offsetType is 0.

nodeIterator

public NodeIterator nodeIterator(int from)
This method returns a node iterator for scanning the graph sequentially, starting from the given node. It keeps track of a sliding window of windowSize() previous successor lists to speed up the iteration of graphs with significant referentiation.

Overrides:
nodeIterator in class ImmutableGraph
Parameters:
from - the node from which the iterator will iterate.
Returns:
a NodeIterator for accessing nodes and successors sequentially.

load

public static BVGraph load(CharSequence basename,
                           int offsetType,
                           ProgressLogger pl)
                    throws IOException
Creates a new BVGraph by loading a compressed graph file from disk to memory.

Parameters:
basename - the basename of the graph.
offsetType - the desired offset type (2 is memory mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that the we do not want even load the graph file).
pl - a progress logger used while loading the graph, or null.
Returns:
a BVGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

load

public static BVGraph load(CharSequence basename,
                           int offsetType)
                    throws IOException
Creates a new BVGraph by loading a compressed graph file from disk to memory, with no progress logger.

Parameters:
basename - the basename of the graph.
offsetType - the desired offset type (2 is memory mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that the we do not want even load the graph file).
Returns:
a BVGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

load

public static BVGraph load(CharSequence basename)
                    throws IOException
Creates a new BVGraph by loading a compressed graph file from disk to memory, with no progress logger and all offsets.

Parameters:
basename - the basename of the graph.
Returns:
a BVGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

load

public static BVGraph load(CharSequence basename,
                           ProgressLogger pl)
                    throws IOException
Creates a new BVGraph by loading a compressed graph file from disk to memory, with all offsets.

Parameters:
basename - the basename of the graph.
pl - a progress logger used while loading the graph, or null.
Returns:
a BVGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

loadMapped

public static BVGraph loadMapped(CharSequence basename,
                                 ProgressLogger pl)
                          throws IOException
Creates a new BVGraph by memory-mapping a graph file.

Parameters:
basename - the basename of the graph.
pl - a progress logger used while loading the offsets, or null.
Returns:
an BVGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while memory-mapping the graph or reading the offsets.

loadMapped

public static BVGraph loadMapped(CharSequence basename)
                          throws IOException
Creates a new BVGraph by memory-mapping a graph file.

Parameters:
basename - the basename of the graph.
Returns:
an BVGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while memory-mapping the graph or reading the offsets.

loadSequential

public static BVGraph loadSequential(CharSequence basename,
                                     ProgressLogger pl)
                              throws IOException
Creates a new BVGraph by loading a compressed graph file from disk to memory, without offsets.

Parameters:
basename - the basename of the graph.
pl - a progress logger used while loading the graph, or null.
Returns:
a BVGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

loadSequential

public static BVGraph loadSequential(CharSequence basename)
                              throws IOException
Creates a new BVGraph by loading a compressed graph file from disk to memory, with no progress logger and without offsets.

Parameters:
basename - the basename of the graph.
Returns:
a BVGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

loadOffline

public static BVGraph loadOffline(CharSequence basename,
                                  ProgressLogger pl)
                           throws IOException
Creates a new BVGraph by loading just the metadata of a compressed graph file.

Parameters:
basename - the basename of the graph.
pl - a progress logger, or null.
Returns:
a BVGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the metadata.

loadOffline

public static BVGraph loadOffline(CharSequence basename)
                           throws IOException
Creates a new BVGraph by loading just the metadata of a compressed graph file.

Parameters:
basename - the basename of the graph.
Returns:
a BVGraph containing the specified graph.
Throws:
IOException - if an I/O exception occurs while reading the metadata.

loadInternal

protected BVGraph loadInternal(CharSequence basename,
                               int offsetType,
                               ProgressLogger pl)
                        throws IOException
Loads a compressed graph file from disk into this graph. Note that this method should be called only on a newly created graph.

Parameters:
basename - the basename of the graph.
offsetType - the desired offset type (2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that the we do not want even load the graph file).
pl - a progress logger used while loading the graph, or null.
Returns:
this graph.
Throws:
IOException - if an I/O exception occurs while reading the graph.

intervalize

protected static int intervalize(IntArrayList x,
                                 int minInterval,
                                 IntArrayList left,
                                 IntArrayList len,
                                 IntArrayList residuals)
This method tries to express an increasing sequence of natural numbers x as a union of an increasing sequence of intervals and an increasing sequence of residual elements. More precisely, this intervalization works as follows: first, one looks at x as a sequence of intervals (i.e., maximal sequences of consecutive elements); those intervals whose length is ≥ minInterval are stored in the lists left (the list of left extremes) and len (the list of lengths; the length of an integer interval is the number of integers in that interval). The remaining integers, called residuals are stored in the residual list.

Note that the previous content of left, len and residual is lost.

Parameters:
x - the list to be intervalized (an increasing list of natural numbers).
minInterval - the least length that a maximal sequence of consecutive elements must have in order for it to be considered as an interval.
left - the resulting list of left extremes of the intervals.
len - the resulting list of interval lengths.
residuals - the resulting list of residuals.
Returns:
the number of intervals.

store

public static void store(ImmutableGraph graph,
                         CharSequence basename,
                         int windowSize,
                         int maxRefCount,
                         int minIntervalLength,
                         int zetaK,
                         int flags,
                         ProgressLogger pl)
                  throws IOException
Writes the given graph using a given base name.

Parameters:
graph - a graph to be compressed.
basename - a base name.
windowSize - the window size (-1 for the default value).
maxRefCount - the maximum reference count (-1 for the default value).
minIntervalLength - the minimum interval length (-1 for the default value, NO_INTERVALS to disable).
zetaK - the parameter used for residual ζ-coding, if used (-1 for the default value).
flags - the flag mask.
pl - a progress logger to log the state of compression, or null if no logging is required.
Throws:
IOException - if some exception is raised while writing the graph.

store

public static void store(ImmutableGraph graph,
                         CharSequence basename,
                         int windowSize,
                         int maxRefCount,
                         int minIntervalLength,
                         int zetaK,
                         int flags)
                  throws IOException
Writes the given graph using a given base name, without any progress logger.

Parameters:
graph - a graph to be compressed.
basename - a base name.
windowSize - the window size (-1 for the default value).
maxRefCount - the maximum reference count (-1 for the default value).
minIntervalLength - the minimum interval length (-1 for the default value, NO_INTERVALS to disable).
zetaK - the parameter used for residual ζ-coding, if used (-1 for the default value).
flags - the flag mask.
Throws:
IOException - if some exception is raised while writing the graph.

store

public static void store(ImmutableGraph graph,
                         CharSequence basename,
                         ProgressLogger pl)
                  throws IOException
Writes the given graph using a given base name, with all parameters set to their default values.

Parameters:
graph - a graph to be compressed.
basename - a base name.
pl - a progress logger to log the state of compression, or null if no logging is required.
Throws:
IOException - if some exception is raised while writing the graph.

store

public static void store(ImmutableGraph graph,
                         CharSequence basename)
                  throws IOException
Writes the given graph using a given base name, without any progress logger and with all parameters set to their default values.

Parameters:
graph - a graph to be compressed.
basename - a base name.
Throws:
IOException - if some exception is raised while writing the graph.

writeOffsets

public void writeOffsets(OutputBitStream obs,
                         ProgressLogger pl)
                  throws IOException
Write the offset file to a given bit stream.

Parameters:
obs - the output bit stream to which offsets will be written.
pl - a progress logger, or null.
Throws:
IOException

main

public static void main(String[] args)
                 throws SecurityException,
                        IllegalAccessException,
                        InvocationTargetException,
                        NoSuchMethodException,
                        IOException,
                        JSAPException,
                        ClassNotFoundException,
                        InstantiationException
Reads an immutable graph and stores it as a BVGraph.

Throws:
SecurityException
IllegalAccessException
InvocationTargetException
NoSuchMethodException
IOException
JSAPException
ClassNotFoundException
InstantiationException