it.unimi.dsi.webgraph
Class ScatteredArcsASCIIGraph

java.lang.Object
  extended by it.unimi.dsi.webgraph.ImmutableGraph
      extended by it.unimi.dsi.webgraph.ImmutableSequentialGraph
          extended by it.unimi.dsi.webgraph.ScatteredArcsASCIIGraph
All Implemented Interfaces:
FlyweightPrototype<ImmutableGraph>

public class ScatteredArcsASCIIGraph
extends ImmutableSequentialGraph

An ImmutableGraph that corresponds to a graph stored as a scattered list of arcs.

A scattered list of arcs describes a graph in a fairly loose way. Each line contains an arc specified as two node identifiers separated by whitespace (but we suggest exactly one TAB character). Sources and targets can be in any order. Node identifiers can be in the range [-263..263): they will be remapped in a compact identifier space by assigning to each newly seen identifier a new node number. The list of identifiers in order of appearance is available in ids. Lines can be empty, or comments starting with #. Characters following the target will be discarded with a warning.

Warning: Lines not conforming the above specification will cause an error to be logged, but will be otherwise ignored.

Additionally, the resulting graph can be symmetrized, and its loops be removed, using suitable constructor options.

This class has no load method, and its main method converts an scattered-arcs representation directly into a BVGraph.

Using ScatteredArcsASCIIGraph to convert your data

A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs specified by scattered arcs. Suppose you create the following file, named example.arcs:

  # My graph
  -1 15
  15 2
  2 -1 This will cause a warning to be logged
  OOPS! (This will cause an error to be logged)
  -1 2
  
Then, the command
  java it.unimi.dsi.webgraph.ScatteredArcsASCIIGraph example <example.arcs
  
will produce a compressed graph in BVGraph format with basename example. The file example.ids will contain the list of longs -1, 15, 2. The node with identifer -1 will be the node 0 in the output graph, the node with identifier 15 will be node 1, and the node with identifier 2 will be node 2. The graph example will thus have three nodes and four arcs (viz., <0,1>, <0,2>, <1,2> and <2,0>).

Memory requirements

To convert node identifiers to node numbers, instances of this class use a custom map that in the worst case will require 19.5×2log(4n/3) ≤ 52n bytes, where n is the number of distinct identifiers. Storing batches of arc in memory requires 12 bytes per arc.


Nested Class Summary
 
Nested classes/interfaces inherited from class it.unimi.dsi.webgraph.ImmutableGraph
ImmutableGraph.LoadMethod
 
Field Summary
static int DEFAULT_BATCH_SIZE
          The default batch size.
 long[] ids
          The list of identifiers in order of appearance.
 
Fields inherited from class it.unimi.dsi.webgraph.ImmutableGraph
GRAPHCLASS_PROPERTY_KEY, PROPERTIES_EXTENSION
 
Constructor Summary
ScatteredArcsASCIIGraph(InputStream is)
          Creates a scattered-arcs ASCII graph.
ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize)
          Creates a scattered-arcs ASCII graph.
ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops)
          Creates a scattered-arcs ASCII graph.
ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize)
          Creates a scattered-arcs ASCII graph.
ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir)
          Creates a scattered-arcs ASCII graph.
ScatteredArcsASCIIGraph(InputStream is, boolean symmetrize, boolean noLoops, int batchSize, File tempDir, ProgressLogger pl)
          Creates a scattered-arcs ASCII graph.
 
Method Summary
static void main(String[] args)
           
 NodeIterator nodeIterator(int from)
          Returns a node iterator for scanning the graph sequentially, starting from the given node.
 long numArcs()
          Returns the number of arcs of this graph (optional operation).
 int numNodes()
          Returns the number of nodes of this graph.
 
Methods inherited from class it.unimi.dsi.webgraph.ImmutableSequentialGraph
copy, outdegree, randomAccess, successorArray
 
Methods inherited from class it.unimi.dsi.webgraph.ImmutableGraph
basename, equals, hashCode, load, load, load, loadMapped, loadMapped, loadOffline, loadOffline, loadOnce, loadSequential, loadSequential, nodeIterator, outdegrees, store, store, successors, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_BATCH_SIZE

public static final int DEFAULT_BATCH_SIZE
The default batch size.

See Also:
Constant Field Values

ids

public long[] ids
The list of identifiers in order of appearance.

Constructor Detail

ScatteredArcsASCIIGraph

public ScatteredArcsASCIIGraph(InputStream is)
                        throws IOException
Creates a scattered-arcs ASCII graph.

Parameters:
is - an input stream containing a scattered list of arcs.
Throws:
IOException

ScatteredArcsASCIIGraph

public ScatteredArcsASCIIGraph(InputStream is,
                               boolean symmetrize)
                        throws IOException
Creates a scattered-arcs ASCII graph.

Parameters:
is - an input stream containing a scattered list of arcs.
symmetrize - the new graph will be forced to be symmetric.
Throws:
IOException

ScatteredArcsASCIIGraph

public ScatteredArcsASCIIGraph(InputStream is,
                               boolean symmetrize,
                               boolean noLoops)
                        throws IOException
Creates a scattered-arcs ASCII graph.

Parameters:
is - an input stream containing a scattered list of arcs.
symmetrize - the new graph will be forced to be symmetric.
noLoops - the new graph will have no loops.
Throws:
IOException

ScatteredArcsASCIIGraph

public ScatteredArcsASCIIGraph(InputStream is,
                               boolean symmetrize,
                               boolean noLoops,
                               int batchSize)
                        throws IOException
Creates a scattered-arcs ASCII graph.

Parameters:
is - an input stream containing a scattered list of arcs.
symmetrize - the new graph will be forced to be symmetric.
noLoops - the new graph will have no loops.
batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
Throws:
IOException

ScatteredArcsASCIIGraph

public ScatteredArcsASCIIGraph(InputStream is,
                               boolean symmetrize,
                               boolean noLoops,
                               int batchSize,
                               File tempDir)
                        throws IOException
Creates a scattered-arcs ASCII graph.

Parameters:
is - an input stream containing a scattered list of arcs.
symmetrize - the new graph will be forced to be symmetric.
noLoops - the new graph will have no loops.
batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
Throws:
IOException

ScatteredArcsASCIIGraph

public ScatteredArcsASCIIGraph(InputStream is,
                               boolean symmetrize,
                               boolean noLoops,
                               int batchSize,
                               File tempDir,
                               ProgressLogger pl)
                        throws IOException
Creates a scattered-arcs ASCII graph.

Parameters:
is - an input stream containing a scattered list of arcs.
symmetrize - the new graph will be forced to be symmetric.
noLoops - the new graph will have no loops.
batchSize - the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
tempDir - a temporary directory for the batches, or null for File.createTempFile(java.lang.String, java.lang.String)'s choice.
pl - a progress logger, or null.
Throws:
IOException
Method Detail

numNodes

public int numNodes()
Description copied from class: ImmutableGraph
Returns the number of nodes of this graph.

Albeit this method is not optional, it is allowed that this method throws an UnsupportedOperationException if this graph has never been entirely traversed using a node iterator. This apparently bizarre behaviour is necessary to support implementations as ArcListASCIIGraph, which do not know the actual number of nodes until a traversal has been completed.

Specified by:
numNodes in class ImmutableGraph
Returns:
the number of nodes.

numArcs

public long numArcs()
Description copied from class: ImmutableGraph
Returns the number of arcs of this graph (optional operation).

Overrides:
numArcs in class ImmutableGraph
Returns:
the number of arcs.

nodeIterator

public NodeIterator nodeIterator(int from)
Description copied from class: ImmutableGraph
Returns a node iterator for scanning the graph sequentially, starting from the given node.

This implementation just calls the random-access methods (ImmutableGraph.successors(int) and ImmutableGraph.outdegree(int)). More specific implementations may choose to maintain some extra state to make the enumeration more efficient.

Overrides:
nodeIterator in class ImmutableSequentialGraph
Parameters:
from - the node from which the iterator will iterate.
Returns:
a NodeIterator for accessing nodes and successors sequentially.

main

public static void main(String[] args)
                 throws IllegalArgumentException,
                        SecurityException,
                        IOException,
                        JSAPException
Throws:
IllegalArgumentException
SecurityException
IOException
JSAPException