Commit 1f72f69

committed
Update to notebooks.
1 parent 598a51e commit 1f72f69

2 files changed

+135
-46
lines changed

notebooks/gitnet_walkthrough.ipynb

Lines changed: 35 additions & 5 deletions
@@ -1,5 +1,16 @@
11
{
22
"cells": [
3+
{
4+
"cell_type": "code",
5+
"execution_count": 1,
6+
"metadata": {
7+
"collapsed": true
8+
},
9+
"outputs": [],
10+
"source": [
11+
"%matplotlib inline"
12+
]
13+
},
314
{
415
"cell_type": "markdown",
516
"metadata": {},
@@ -382,17 +393,17 @@
382393
"outputs": [],
383394
"source": [
384395
"# Try calling one of the advanced graph methods, such as *collapse_edges*\n",
385-
"basic_graph = logs_tensor.generate_network('author', 'files')\n",
396+
"basic_graph = logs_tensor.generate_network('author', 'files', colours=\"simple\")\n",
386397
"# Sum_weights = True is an optional argument that creates a weighted multigraph.\n",
387398
"collapsed_graph = basic_graph.collapse_edges(sum_weights = True)\n",
388-
"collapsed_graph.quickplot(fname = \"ok_net.pdf\")"
399+
"collapsed_graph.quickplot(\"ok_net.pdf\", layout=\"spring\")"
389400
]
390401
},
391402
{
392403
"cell_type": "markdown",
393404
"metadata": {},
394405
"source": [
395-
"Optional: try reading an output file into R.\n",
406+
"Optional: try reading a file into R.\n",
396407
"\n",
397408
"Use the edge list created earlier, or create a new *tnet file* or *graphml file* and try reading it into R."
398409
]
@@ -407,14 +418,33 @@
407418
"source": [
408419
"# The graphml file will be saved at the specified path, while the tnet file will be saved in the current directory.\n",
409420
"basic_graph.write_tnet('filename')\n",
410-
"basic_graph.write_graphml('path/to/file')"
421+
"basic_graph.write_graphml('filename')"
422+
]
423+
},
424+
{
425+
"cell_type": "markdown",
426+
"metadata": {},
427+
"source": [
428+
"If you prefer, you can use the write_edges() function to export a weighted edgelist which can be read into R.\n",
429+
"These edgelists also contain datetime entries, as a fourth column, which can be used to order nodes and create dynamic networks."
430+
]
431+
},
432+
{
433+
"cell_type": "code",
434+
"execution_count": null,
435+
"metadata": {
436+
"collapsed": true
437+
},
438+
"outputs": [],
439+
"source": [
440+
"basic_graph.write_edges('filename.txt', weighted=True)"
411441
]
412442
},
413443
{
414444
"cell_type": "markdown",
415445
"metadata": {},
416446
"source": [
417-
"If you prefer, you can use two columns of the TSV file as the 'source' and 'target' of a networkx graph object in R."
447+
"As you may have noticed, there is a `colours` argument in the `generate_network()` function. It is used at the time of network creation to specify whether the user wants to create colour tags for the nodes. These colours are based on the type of node, and by extension on the contents of the \"file\" node type."
418448
]
419449
},
420450
{

notebooks/gitnet_walkthrough.py

Lines changed: 100 additions & 41 deletions
@@ -1,34 +1,41 @@
11

22
# coding: utf-8
33

4+
# In[1]:
5+
6+
get_ipython().magic('matplotlib inline')
7+
8+
49
# # User Testing gitnet
5-
#
6-
# #### *June 2016, using version 0.0.8 of gitnet on testpypi*
7-
#
10+
#
11+
# #### *Written in June 2016, using version 0.0.8 of gitnet on testpypi*
12+
#
813

9-
#
10-
# ## *Introduction*
11-
#
14+
#
15+
# ## *Introduction*
16+
#
1217

13-
# To follow this exercise successfully, you need to have:
18+
# ## To follow this exercise successfully, you need to have:
1419
# - Python 3 (Anaconda with Python 3.5 is the best bet)
1520
# - Git (Git is not installed through pip; install or update it from https://git-scm.com or your system's package manager)
16-
# - The current version of gitnet is 0.0.8.
21+
# - The current version of git is 2.9.
1722
# - NetworkX (you can install by running in the terminal: pip install networkx)
1823
# - Matplotlib (you can install by running in the terminal: pip install matplotlib)
1924
# - Pygraphviz (not necessarily required, only for the default layout, which happens to be the best one we could find)
20-
#
25+
#
2126
# **Note:** Unfortunately, Pygraphviz can potentially be difficult to install on Windows. If pip is not able to find vcvarsall.bat, then avoid editing the environment variables and use this website: http://www.lfd.uci.edu/~gohlke/pythonlibs/ to download the binary for Python 3.4. Unfortunately, although Pygraphviz will install, there still may be errors with the graph output.
22-
#
27+
#
2328
# Installing gitnet with pip will automatically install bash if you do not already have it installed
2429
# To install gitnet, open a terminal window and type:
25-
#
30+
#
2631
# `pip install -i https://testpypi.python.org/pypi gitnet`
2732

33+
# In[ ]:
34+
2835
# For all sections of this exercise, you will need to use the following libraries:
2936

3037
import os
31-
# import pygraph # Needed for defaults used by quickplot, if you can't install, use layout='spring'.
38+
# import pygraphviz # Needed for defaults used by quickplot, if you can't install, use layout='spring'.
3239
import gitnet as gn
3340
import networkx as nx
3441
import matplotlib.pyplot as plt
@@ -37,24 +44,32 @@
3744
# ## *1. Write-Good Repo*
3845

3946
# For this exercise, we are going to use the project: https://github.com/btford/write-good
40-
#
47+
#
4148
# In a new terminal window, type:
42-
#
49+
#
4350
# `git clone https://github.com/btford/write-good.git`
44-
#
51+
#
4552
# OR open the page in a browser and download the zip folder.
4653

54+
# In[ ]:
55+
4756
# Set the current working directory, so that all files created will be stored there.
4857
# The best bet is to create a folder named 'temp' on your desktop.
4958
os.chdir('path')
5059

60+
61+
# In[ ]:
62+
5163
# Insert the path to the write-good folder on your machine.
5264
mylogs = gn.get_log('path')
5365
# You can generate a network using any two tags that exist in the log. For a list of tags, just call .attributes() on your log object.
5466
graph = mylogs.generate_network('author', 'files')
5567
# Quickplot is a preset function that can be used to quickly visualize a network.
5668
graph.quickplot('write_good_net.pdf', layout = 'spring')
5769

70+
71+
# In[ ]:
72+
5873
# You can get a list of all of the values of any tag in the log object.
5974
# First, let's take a look at all of the possible tags.
6075
print(mylogs.attributes())
@@ -65,33 +80,42 @@
6580
# ## *2. NetworkX*
6681

6782
# For this exercise, we are going to use this project: https://github.com/networkx/networkx
68-
#
83+
#
6984
# In a new terminal window, type:
70-
#
85+
#
7186
# `git clone https://github.com/networkx/networkx.git`
72-
#
87+
#
7388
# OR open the page in a browser and download the zip folder.
7489

90+
# In[ ]:
91+
7592
# First, we are going to create another log object.
7693
networkx_log = gn.get_log('path')
7794

95+
96+
# In[ ]:
97+
7898
# Now you can export the log as a TSV file.
7999
networkx_log.tsv(fname = 'networkx_data.tsv')
80100

81101

82102
# Take a minute to open this file and look at the contents.
83-
#
103+
#
84104
# Notice that there are similar author names that use the same email address.
85-
#
86-
# **Hint:** since version 0.0.8, we have simplified the process of identifying duplicate authors.
87-
# Use `author_email_list` along with `detect_dup_emails` to find potentially duplicate authors. See the cheat sheet for more details.
105+
#
106+
# **Hint:** since version 0.0.8, we have simplified the process of identifying duplicate authors. Use `author_email_list` along with `detect_dup_emails` to find potentially duplicate authors. See the cheat sheet for more details.
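The duplicate-author check described in the hint can be sketched in plain Python. This is only an illustration, not gitnet's `detect_dup_emails`: it assumes the exported TSV has a header row with `author` and `email` columns (the column names are an assumption), and `find_shared_emails` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def find_shared_emails(tsv_text):
    """Return {email: set of author names} for emails used by more than one name."""
    names_by_email = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        names_by_email[row["email"]].add(row["author"])
    # Keep only the emails that appear under two or more different names.
    return {email: names for email, names in names_by_email.items() if len(names) > 1}


# Made-up rows standing in for the contents of networkx_data.tsv.
sample = (
    "author\temail\n"
    "jdoe\tjane@example.org\n"
    "Jane Doe\tjane@example.org\n"
    "Someone Else\tother@example.org\n"
)
shared = find_shared_emails(sample)
print(shared)
```

Each email that comes back is a candidate for a `replace_val` call that merges the aliases into one name.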
107+
108+
# In[ ]:
88109

89110
# Gitnet cannot automatically predict when a single author uses two different names to commit to a repo.
90111
# For this reason, you may need to replace one of their aliases with the other.
91112
replaced_netx = networkx_log.replace_val('author', 'aric', 'Aric Hagberg')
92113
# To make sure that this worked, just create a new TSV and look at the contents.
93114
replaced_netx.tsv(fname = 'replaced_data.tsv')
94115

116+
117+
# In[ ]:
118+
95119
# You can also create an edgelist from any two tags.
96120
# Check the possible tags.
97121
print(replaced_netx.attributes())
@@ -104,85 +128,120 @@
104128
# ## *3. Tensorflow*
105129

106130
# For this exercise, we are going to use this project: https://github.com/tensorflow/tensorflow
107-
#
131+
#
108132
# In a new terminal window, type:
109-
#
133+
#
110134
# `git clone https://github.com/tensorflow/tensorflow.git`
111-
#
135+
#
112136
# OR open the page in a browser and download the zip folder.
113137

138+
# In[ ]:
139+
114140
# Let's start by creating a log object and a graph object, just as in the first exercise.
115141
logs_tensor = gn.get_log('path')
116142
graph_tensor = logs_tensor.generate_network('author', 'files')
117143

144+
118145
# For now, hold off on plotting or exporting, and try out some of the advanced methods
119-
#
146+
#
120147
# Below are some usage examples for filter and ignore
121148

149+
# In[ ]:
150+
122151
# Filter seems to have an error in IPYNB format.
123152

153+
124154
# Filter records based on the email domain.
125155
filtered_email = logs_tensor.filter('email', 'has', '@gmail.com')
126156
# Filter records based on the author name.
127157
filtered_author = logs_tensor.filter('author', 'equals', 'Martin Wicke')
128158
# Filter records based on commits that have occurred after a certain date.
129159
filtered_date = logs_tensor.filter('date', 'since', 'Fri Jun 10 15:41:25 2016 -0400')
130160

131-
# One of the limitations of filter is that because of the date-string format used by git, you need to type a pattern that at least partially matches the appearance of date-strings in the actually commits.
132-
#
161+
162+
# One of the limitations of filter is that, because of the date-string format used by git, you need to type a pattern that at least partially matches the appearance of date-strings in the actual commits.
163+
#
133164
# However, it is still possible to use expressions such as `Fri Jun 10 *`, so there is still some room for flexible filtering.
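If string patterns prove too coarse, one alternative is to parse the date strings and compare them as datetimes. The sketch below uses only the standard library and does not call gitnet. It assumes the dates appear in git's default format, as in the `since` example above, and `parse_git_date` is a hypothetical helper.

```python
from datetime import datetime

# Git's default date layout, e.g. "Fri Jun 10 15:41:25 2016 -0400".
# %a and %b are locale-dependent; this assumes English day and month names.
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"


def parse_git_date(text):
    """Turn a git date string into a timezone-aware datetime."""
    return datetime.strptime(text, GIT_DATE_FORMAT)


cutoff = parse_git_date("Fri Jun 10 15:41:25 2016 -0400")

# Made-up commit dates for illustration.
commit_dates = [
    "Thu Jun 9 09:00:00 2016 -0400",
    "Sat Jun 11 12:30:00 2016 -0700",
]
recent = [d for d in commit_dates if parse_git_date(d) >= cutoff]
print(recent)
```

Because the parsed values carry their UTC offsets, commits made in different time zones compare correctly.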
134165

166+
# In[ ]:
167+
135168
# Save one of these to a TSV file to check that it worked.
136169
filtered_author.tsv(fname = 'tensorflow_martin.tsv')
137170

171+
172+
# In[ ]:
173+
138174
# You can also ignore files and file edits that match any specified pattern.
139175
# Ignore python files:
140176
ignore_python = logs_tensor.ignore('.py')
141177
# Ignore files with the _ prefix:
142178
ignore_prefix = logs_tensor.ignore('_*')
143179

144180

145-
# Keep in mind that both `filter` and `ignore` can have a significant impact on the network graph.
146-
#
147-
# It is best to use them sparingly, and only when it is certainly useful to remove certain information.
148-
# In many cases, it makes more sense to simply export the full graph and all its data (as a graphml file, for example) and then prune the data in R.
181+
# Keep in mind that both `filter` and `ignore` can have a significant impact on the network graph.
182+
#
183+
# It is best to use them sparingly, and only when it is certainly useful to remove certain information. In many cases, it makes more sense to simply export the full graph and all its data (as a graphml file, for example) and then prune the data in R.
184+
185+
# In[ ]:
149186

150187
# Save one of these to a TSV file to check that it worked.
151188
ignore_python.tsv(fname = 'nopy_data.tsv')
152189

190+
191+
# In[ ]:
192+
153193
# Try generating a network using one of these modified log objects, and compare it to previous results.
154194
modified_graph = ignore_python.generate_network('author', 'files')
155195
modified_graph.quickplot('modified_graph.pdf', layout = 'spring') # this runs very slowly.
156196

157197

158-
# One note about the quickploy function is that it typically uses the `neato` layout from `matplotlib`.
159-
#
198+
# One note about the quickplot function is that it typically uses the `neato` layout from Graphviz (through `pygraphviz`).
199+
#
160200
# Here we are using the `spring` layout from `NetworkX`, but if you did get pygraphviz installed, then you can simply leave
161201
# out the layout argument. It defaults to `neato`.
162202

203+
# In[ ]:
204+
163205
# Try calling describe on both a log object and a graph object.
164206
# Is there any other information you would like to see in the describe output?
165207
ignore_python.describe()
166208
modified_graph.describe()
167209

168210

169-
# The last advanced method we have to show you is collapse graph. This quickly creates a one-mode network, using *mode1* of the
211+
# The last advanced method we have to show you is collapse graph. This quickly creates a one-mode network, using *mode1* of the
170212
# original graph object.
171213

214+
# In[ ]:
215+
172216
# Try calling one of the advanced graph methods, such as *collapse_edges*
173-
basic_graph = logs_tensor.generate_network('author', 'files')
217+
basic_graph = logs_tensor.generate_network('author', 'files', colours="simple")
174218
# Sum_weights = True is an optional argument that creates a weighted multigraph.
175219
collapsed_graph = basic_graph.collapse_edges(sum_weights = True)
176-
collapsed_graph.quickplot(fname = "ok_net.pdf")
220+
collapsed_graph.quickplot("ok_net.pdf", layout="spring")
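To make the idea of collapsing a two-mode network concrete, here is a small plain-Python sketch. It is not gitnet's `collapse_edges` implementation, and gitnet may weight edges differently. In this sketch two authors are linked when they edited the same file, and the weight is the number of files they share. The data and the `project_authors` helper are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project_authors(author_file_edges):
    """Project (author, file) edges onto authors, weighting by shared files."""
    authors_by_file = defaultdict(set)
    for author, filename in author_file_edges:
        authors_by_file[filename].add(author)

    weights = Counter()
    for authors in authors_by_file.values():
        # Every pair of authors who touched this file gains one unit of weight.
        for pair in combinations(sorted(authors), 2):
            weights[pair] += 1
    return dict(weights)


# Made-up two-mode edges for illustration.
edges = [
    ("alice", "README.md"),
    ("bob", "README.md"),
    ("alice", "setup.py"),
    ("bob", "setup.py"),
    ("carol", "setup.py"),
]
projected = project_authors(edges)
print(projected)
```

The result contains only author nodes, which is what makes it a one-mode network.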
177221

178222

179-
# Optional: try reading an output file into R.
180-
#
223+
# Optional: try reading a file into R.
224+
#
181225
# Use the edge list created earlier, or create a new *tnet file* or *graphml file* and try reading it into R.
182226

227+
# In[ ]:
228+
183229
# The graphml file will be saved at the specified path, while the tnet file will be saved in the current directory.
184230
basic_graph.write_tnet('filename')
185-
basic_graph.write_graphml('path/to/file')
231+
basic_graph.write_graphml('filename')
232+
233+
234+
# If you prefer, you can use the write_edges() function to export a weighted edgelist which can be read into R.
235+
# These edgelists also contain datetime entries, as a fourth column, which can be used to order nodes and create dynamic networks.
236+
237+
# In[ ]:
238+
239+
basic_graph.write_edges('filename.txt', weighted=True)
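Before moving to R, it can help to sanity-check the exported edge list in Python. The sketch below is a guess at the layout rather than a documented format: it assumes tab-separated rows with no header, in the order source, target, weight, datetime. Check the file that `write_edges` actually produces and adjust the delimiter and columns accordingly. `read_edge_list` is a hypothetical helper.

```python
import csv
import io


def read_edge_list(handle, delimiter="\t"):
    """Read rows of (source, target, weight, datetime) from an open file."""
    edges = []
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        source, target, weight, stamp = row[:4]
        # The datetime column is kept as raw text; parse it once you know its format.
        edges.append((source, target, float(weight), stamp))
    return edges


# Made-up rows standing in for 'filename.txt'.
sample = io.StringIO(
    "alice\tREADME.md\t2\tFri Jun 10 15:41:25 2016 -0400\n"
    "bob\tsetup.py\t1\tSat Jun 11 12:30:00 2016 -0700\n"
)
edge_rows = read_edge_list(sample)
print(edge_rows)
```

With a real export, replace the `io.StringIO` sample with `open('filename.txt', newline='')`.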
240+
241+
242+
# As you may have noticed, there is a `colours` argument in the `generate_network()` function. It is used at the time of network creation to specify whether the user wants to create colour tags for the nodes. These colours are based on the type of node, and by extension on the contents of the "file" node type.
243+
244+
# In[ ]:
245+
186246

187247

188-
# If you prefer, you can use two columns of the TSV file as the 'source' and 'target' of a networkx graph object in R.
