Few statistical tools exist for exploratory data analysis in graph databases.
The basic observation is the graph, but it may also be a connected component, or simply a node,
an edge, a cycle, a path, a concentric layer, etc. [2].
Many univariate and multivariate distributions may be generated from these populations.
One of the most useful distributions is the number of connected components having a given
radius R and a given diameter D. Recall that the eccentricity of a node is the maximum of
the distances from that node to all the nodes of its connected component; the radius is the
minimum of the eccentricities of the nodes of the component, and the diameter is their
maximum. It is known that D takes values between R and 2R,
so that the bivariate distribution in the (R,D) plane lies in an angular sector
bounded by the two lines D=R and D=2R. Displaying the clusters in this bivariate distribution
offers a schematic graphical summary of the population, which is called the Radius Diameter
Diagram [6,7].
The quantity I=(D-R)/R takes values in the interval [0,1]. It is used as a shape index, and its
distribution can be plotted (see the example in [7]).
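
As an illustration of the definitions above, the following Python sketch tabulates the number of connected components for each (R,D) pair and evaluates the shape index I. It is a minimal example under stated assumptions, not the implementation used in [6,7]: the graph is undirected and unweighted, it is stored as an adjacency dictionary, distances are hop counts obtained by breadth-first search, and the function names and the toy graph are hypothetical. A single-node component has R=0, so I is left undefined (None) in that case.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Return hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def radius_diameter_table(adj):
    """Count connected components for each (radius, diameter) pair."""
    seen = set()
    table = Counter()
    for start in adj:
        if start in seen:
            continue
        # Nodes reachable from start form one connected component.
        component = list(bfs_distances(adj, start))
        seen.update(component)
        # Eccentricity: largest distance from a node to any node of its component.
        ecc = [max(bfs_distances(adj, u).values()) for u in component]
        table[(min(ecc), max(ecc))] += 1
    return table


def shape_index(radius, diameter):
    """I = (D - R) / R; None for a single-node component, where R = 0."""
    if radius == 0:
        return None
    return (diameter - radius) / radius


# Hypothetical toy graph: a path a-b-c-d-e, a triangle x-y-z, an isolated node s.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"),
         ("x", "y"), ("y", "z"), ("z", "x")]
adj = {"s": set()}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

table = radius_diameter_table(adj)
for (r, d), count in sorted(table.items()):
    print(r, d, count, shape_index(r, d))
```

On this toy graph the path falls on the line D=2R (R=2, D=4, I=1), the triangle on the line D=R (R=1, D=1, I=0), and the isolated node at the origin. Running one breadth-first search per node is quadratic in the component size, which is adequate for a sketch but may need replacing for large components.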