I wanted to find a minimal conda-requirements.txt file for my projects
using only the information in the conda-recipes repository.
To do that I had to parse every conda recipe and extract the dependencies
of each package.
A typical conda recipe file (meta.yaml) looks like this:
package:
  name: seawater
  version: !!str 3.3.2
source:
  fn: seawater-3.3.2.tar.gz
  url: https://pypi.python.org/packages/source/s/seawater/seawater-3.3.2.tar.gz
  md5: d2aa85c5b80f5dde84e0046468609be2
requirements:
  build:
    - python
    - setuptools
    - numpy
  run:
    - python
    - numpy
test:
  imports:
    - seawater
    - seawater.test
about:
  home: http://pypi.python.org/pypi/seawater/
  license: MIT License
  summary: "Seawater Library for Python"
The next two cells find all the meta.yaml files under a given directory,
then drop packages with no dependencies, as well as dependencies that say
nothing about how Python modules relate to each other, like testing modules
and Python itself (Python is always present).
import os
import fnmatch

import yaml


def get_meta_yaml(directory=os.getcwd()):
    """Return the path of every meta.yaml found under `directory`."""
    metas = []
    for root, dirs, fnames in os.walk(directory):
        for fname in fnmatch.filter(fnames, 'meta.yaml'):
            metas.append(os.path.join(root, fname))
    return metas


# Build tools, test tools, and Python itself are not interesting here.
ignore = ['python', 'setuptools', 'conda_build', 'python.app', 'osx',
          'None', 'nose', 'coverage', 'pytest', 'cov-core', 'pytest-cov']


def parse_depency(fname):
    """Return {package name: [run dependencies]} for one recipe."""
    with open(fname) as f:
        # safe_load handles the !!str tag and needs no Loader argument.
        data = yaml.safe_load(f)
    name = data['package']['name']
    deps = data.get('requirements', {}).get('run', ['None'])
    # Weird workaround when run is defined but empty.
    if not deps:
        deps = ['None']
    # Keep only the package name, dropping any version pin.
    deps = [d.split()[0] for d in deps]
    for pack in ignore:
        if pack in deps:  # No dependencies but python.
            deps.remove(pack)
    return {name: deps}
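To make the cleaning step concrete, here is a minimal sketch on a made-up `run` section. The helper name `clean_deps`, the shortened ignore set, and the requirement strings are all hypothetical, chosen only for illustration. Unlike the loop above, the list comprehension drops every occurrence of an ignored name, not just the first.

```python
# Hypothetical helper mirroring the cleaning step of the parser.
IGNORE = {'python', 'setuptools', 'nose'}  # shortened list, illustration only


def clean_deps(raw_deps, ignore=IGNORE):
    """Strip version pins and drop the packages listed in `ignore`."""
    names = [d.split()[0] for d in raw_deps]  # 'numpy >=1.7' -> 'numpy'
    return [n for n in names if n not in ignore]


# Invented requirement strings in the style of a recipe's run section.
run_section = ['python', 'numpy >=1.7', 'scipy 0.14*', 'nose']
print(clean_deps(run_section))  # ['numpy', 'scipy']
```

A recipe whose only run requirement is `python` ends up with an empty list, which is why such packages are dropped in the next cell.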
import os.path

path = os.path.join(os.path.expanduser("~"), 'IOOS', 'conda-recipes')
metas = get_meta_yaml(path)

packages = {}
for fname in metas:
    packages.update(parse_depency(fname))

for pack in ignore:
    packages.pop(pack, None)

# Iterate over a copy: a dict cannot change size while being iterated.
for pack, deps in list(packages.items()):
    if not deps:
        packages.pop(pack)
Here is where I fell in love with NetworkX. Look how simple it is to create a directed graph from the dictionary parsed above.
%matplotlib inline
import networkx as nx

G = nx.DiGraph()
for pac in packages:
    deps = packages[pac]
    for dep in deps:
        G.add_edge(pac, dep)

kw = dict(node_size=5, node_color='w', edge_color='b', alpha=0.25)
nx.draw(G, with_labels=True, **kw)
OK, it is easy, but that graph is way too confusing with all those packages. Maybe we should trim the nodes that have only a few connections for a better view.
import matplotlib.pyplot as plt


def trim_nodes(G, d):
    """Returns a copy of G without the nodes with a degree of d or less.
    http://glowingpython.blogspot.com/2012/11/first-steps-with-networx.html

    """
    Gt = G.copy()
    # Snapshot the degrees so they do not change while nodes are removed.
    dn = dict(nx.degree(Gt))
    for n in list(Gt.nodes()):
        if dn[n] <= d:
            Gt.remove_node(n)
    return Gt


fig, ax = plt.subplots(figsize=(9, 9))
Gt = trim_nodes(G, d=3)
# Needs pygraphviz; older NetworkX releases exposed this as nx.graphviz_layout.
pos = nx.nx_agraph.graphviz_layout(Gt, prog="twopi", root='numpy')
nx.draw(Gt, pos, with_labels=True, **kw)
The main conclusion, as expected, is that all paths lead to NumPy!
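The same conclusion can be read off the graph without a plot by counting incoming edges. The sketch below uses a small made-up dependency dictionary (the package relations are invented, not taken from the recipes) and assumes NetworkX is installed; `dict(G.in_degree())` is used because it works whether `in_degree()` returns a dictionary or a view of pairs.

```python
import networkx as nx

# Invented dependencies, for illustration only.
toy = {
    'iris': ['numpy', 'scipy', 'netcdf4'],
    'oceans': ['numpy', 'seawater'],
    'seawater': ['numpy'],
    'scipy': ['numpy'],
}

G_toy = nx.DiGraph()
for pack, deps in toy.items():
    for dep in deps:
        G_toy.add_edge(pack, dep)  # edge points from package to dependency

# Number of packages that directly require each node.
in_deg = dict(G_toy.in_degree())
most_required = max(in_deg, key=in_deg.get)
print(most_required, in_deg[most_required])  # numpy 4
```

On the real graph the same two lines would rank every package by how many recipes list it as a run requirement.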
However, my original question could actually be answered very quickly using only Python's set().
deps = set([item for sublist in packages.values() for item in sublist])
packs = set(packages.keys())
print('\n'.join(sorted(packs.difference(deps))))
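As a rough illustration of why the set difference answers the question, here is the same idea on a toy dictionary. The package relations are invented for the example: anything that appears as a key but never as somebody else's dependency is a top-level package and belongs in the requirements file.

```python
# Invented dependencies, for illustration only.
toy = {
    'iris': ['numpy', 'scipy', 'netcdf4'],
    'oceans': ['numpy', 'seawater'],
    'seawater': ['numpy'],
    'scipy': ['numpy'],
}

# Every name that some other package pulls in.
required_by_others = {dep for deps in toy.values() for dep in deps}

# Packages nobody else depends on: installing these brings in the rest.
top_level = sorted(set(toy) - required_by_others)
print(top_level)  # ['iris', 'oceans']
```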
But that is no fun. I wanted to explore this list graphically, or at least with
more verbosity. For example, what are the dependencies of iris or oceans?
def plot_deps(G, node):
    # neighbors() may return an iterator, so materialize it as a list.
    h = nx.from_dict_of_lists({node: list(G.neighbors(node))})
    nx.draw(h, with_labels=True, **kw)


plot_deps(G, 'iris')
plot_deps(G, 'oceans')
And while reading about NetworkX I found this answer on Stack Overflow and voilà:
for pac in sorted(packs):
    try:
        deps_pacs = list(nx.dfs_edges(G, pac))
    except KeyError:  # No dep.
        continue
    print(pac)
    spacer = {pac: 0}
    for prereq, target in deps_pacs:
        spacer[target] = spacer[prereq] + 2
        fmt = '{spacer}+-{t}'.format
        print(fmt(spacer=' ' * spacer[prereq], t=target))
    print('')
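For readers without NetworkX at hand, the indentation logic can be sketched in plain Python. This is not the Stack Overflow answer, just a hypothetical recursive helper (`dependency_lines`) over an invented dictionary; like a depth-first edge traversal, it lists each package only the first time it is reached.

```python
def dependency_lines(graph, node, depth=0, seen=None):
    """Return one indented line per package reachable from `node`."""
    seen = {node} if seen is None else seen
    lines = []
    for dep in graph.get(node, []):
        if dep in seen:  # already listed higher up in the tree
            continue
        seen.add(dep)
        lines.append(' ' * depth + '+-' + dep)
        # Dependencies of `dep` are indented two more spaces.
        lines.extend(dependency_lines(graph, dep, depth + 2, seen))
    return lines


# Invented dependencies, for illustration only.
toy = {'oceans': ['seawater', 'numpy'], 'seawater': ['numpy']}
print('oceans')
print('\n'.join(dependency_lines(toy, 'oceans')))
```

Here `numpy` shows up nested under `seawater` and is not repeated at the top level, because it was already visited.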
That is a nice and simple text-graph visualization. Bear in mind that the dependency information is only as good as whoever wrote the meta.yaml (ipython, for example, has several other dependencies that are not listed in the file).
To end the post I will leave a cool D3 graph rendered inside the IPython notebook. Courtesy of the IPython Cookbook by Cyrille Rossant. (Sadly I could not figure out how to show it in the rendered HTML, you will have to run the notebook locally to see it.)
import json
from IPython.display import HTML
from networkx.readwrite import json_graph

for n in G:
    # G.node was removed in NetworkX 2.4; G.nodes[n] is the current spelling.
    G.nodes[n]['name'] = n

data = json_graph.node_link_data(G)
with open('graph.json', 'w') as f:
    json.dump(data, f, indent=4)
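One caveat worth hedging: depending on the NetworkX version, the links written by `node_link_data` may refer to nodes by their id (here, the package name) instead of by their position in the node list, while the d3 v3 force layout used below expects positions or node objects. The sketch below is an assumption-laden workaround, not part of the cookbook recipe: it assumes the keys are named `nodes`, `links`, `id`, `source`, and `target`, and the helper name `links_to_indices` is hypothetical.

```python
import json


def links_to_indices(data):
    """Return a copy of node-link data whose links use node positions."""
    # Map each node id to its position in the 'nodes' list.
    index = {node['id']: i for i, node in enumerate(data['nodes'])}
    links = [dict(link,
                  source=index[link['source']],
                  target=index[link['target']])
             for link in data['links']]
    return dict(data, links=links)


# Hand-written sample in the assumed id-based layout.
sample = {
    'nodes': [{'id': 'oceans', 'name': 'oceans'},
              {'id': 'numpy', 'name': 'numpy'}],
    'links': [{'source': 'oceans', 'target': 'numpy'}],
}
print(json.dumps(links_to_indices(sample)['links']))
```

If the JSON already stores integer positions, this step is unnecessary and the lookup would fail, so check the file first.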
%%html
<div id="d3-example"></div>
<style>
.node {stroke: #fff; stroke-width: 1.5px;}
.link {stroke: #999; stroke-opacity: .6;}
</style>
%%javascript
// http://nbviewer.ipython.org/github/davidrpugh/cookbook-code/blob/master/notebooks/chapter06_viz/04_d3.ipynb
// We load the d3.js library from the Web.
require.config({paths: {d3: "http://d3js.org/d3.v3.min"}});
require(["d3"], function(d3) {
    // The code in this block is executed when the
    // d3.js library has been loaded.

    // First, we specify the size of the canvas containing
    // the visualization (size of the <div> element).
    var width = 600,
        height = 600;

    // We create a color scale.
    var color = d3.scale.category10();

    // We create a force-directed dynamic graph layout.
    var force = d3.layout.force()
        .charge(-120)
        .linkDistance(30)
        .size([width, height]);

    // In the <div> element, we create a <svg> graphic
    // that will contain our interactive visualization.
    var svg = d3.select("#d3-example").select("svg");
    if (svg.empty()) {
        svg = d3.select("#d3-example").append("svg")
            .attr("width", width)
            .attr("height", height);
    }

    // We load the JSON file.
    d3.json("graph.json", function(error, graph) {
        // In this block, the file has been loaded
        // and the 'graph' object contains our graph.

        // We load the nodes and links in the force-directed
        // graph.
        force.nodes(graph.nodes)
            .links(graph.links)
            .start();

        // We create a <line> SVG element for each link
        // in the graph.
        var link = svg.selectAll(".link")
            .data(graph.links)
            .enter().append("line")
            .attr("class", "link");

        // We create a <circle> SVG element for each node
        // in the graph, and we specify a few attributes.
        var node = svg.selectAll(".node")
            .data(graph.nodes)
            .enter().append("circle")
            .attr("class", "node")
            .attr("r", 5)  // radius
            .style("fill", function(d) {
                // The cookbook example colors nodes by a `club` attribute.
                // Our nodes have none, so they all get the same color.
                return color(d.club);
            })
            .call(force.drag);

        // The tooltip of each node is the package name.
        node.append("title")
            .text(function(d) { return d.name; });

        // We bind the positions of the SVG elements
        // to the positions of the dynamic force-directed graph,
        // at each time step.
        force.on("tick", function() {
            link.attr("x1", function(d) { return d.source.x; })
                .attr("y1", function(d) { return d.source.y; })
                .attr("x2", function(d) { return d.target.x; })
                .attr("y2", function(d) { return d.target.y; });

            node.attr("cx", function(d) { return d.x; })
                .attr("cy", function(d) { return d.y; });
        });
    });
});
HTML(html)