The scripts in dm_graphs.py provide a way to explore the relationships between words in a distributional model. It relies on the NetworkX package. Please see the paper for more context. For an interactive demo, go here.
This work was presented as a poster at the ADS workshop. Bibtex:
@unpublished{miltenburg2015exploring,
Author = {Emiel van Miltenburg},
Date-Added = {2016-07-17 09:22:11 +0000},
Date-Modified = {2016-07-17 16:43:09 +0000},
Note = {Presented as a poster at the Advances in Distributional Semantics workshop, collocated with IWCS. GitHub page: \url{https://github.com/evanmiltenburg/dm-graphs}},
Title = {Exploring and visualizing distributional models using graphs},
Year = {2015}}
The file googlenews.py shows how to use the dm_graphs module. Just create an iterable that contains tuples (u,v,w) corresponding to edges between u and v with weight w. The weight in this case is the cosine similarity between u and v. Then create a graph, and fill it with the data. The rest of the code shows how to make the network easier to visualize by using dm_graphs.graph_reduce(G) and dm_graphs.MST_pathfinder(G). These return a sparser version of the network.
The folder googlenews-demo contains some output files (.gexf) that you can open with Gephi. The pdf files contain visualizations of this data. For these, I used YifanHu's MultiLevel layout algorithm, followed by ForceAtlas2. The colors were randomly generated by Gephi. I used the modularity detection function to detect clusters, and then applied partition coloring by modularity class.
Main functions
MST_pathfinder()is an implementation of MST-pathfinder algorithm (Quirin et al. 2008).graph_reduce()reduces the graph by only including the edges that link each node to its top-n similar neighbors. It also has an optional restriction such that every edge should have a weight above a particular threshold.maxmax_transform()is an implementation of phase 1 of the MaxMax algorithm (Hope & Keller 2013). It transforms a weighted undirected graph into a directed graph.maxmax_clusters()is a function inspired by the MaxMax algorithm that produces soft clusters of words. Clusters should correspond to word senses.
Graph analysis
main_graph()returns the largest connected component.graph_analysis()returns some statistics about the graph, including a suggested partition based on the Louvain method.
Utilities
add_partition_data()adds partition data from the analysis to the graph.invert_weights()changes weights on all edges to 1-weight.rank_reweight()reweights the edges based on the similarity ranks of the nodes.remove_weights()removes the weights from the graph. Useful if you don't want Gephi to make the edges thicker.write_functions()displays the available methods to write out the graphs.