I wanted to find a minimal conda-requirements.txt file for my projects using only the information in the conda-recipes repository. To do that I had to parse all my conda recipe files and extract the dependencies of each package.

A typical conda recipe file (meta.yaml) looks like this:

package:
  name: seawater
  version: !!str 3.3.2

source:
  fn: seawater-3.3.2.tar.gz
  url: https://pypi.python.org/packages/source/s/seawater/seawater-3.3.2.tar.gz
  md5: d2aa85c5b80f5dde84e0046468609be2

requirements:
  build:
    - python
    - setuptools
    - numpy

  run:
    - python
    - numpy

test:
  imports:
    - seawater
    - seawater.test

about:
  home: http://pypi.python.org/pypi/seawater/
  license: MIT License
  summary: "Seawater Library for Python"

The next two cells find all the meta.yaml files under a given directory, drop the dependencies that say nothing about how Python modules relate to each other, like testing modules and Python itself (Python is always present), and finally drop the packages that are left with no dependencies.

In [2]:

import os
import yaml
import fnmatch


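def get_meta_yaml(directory=None):
    """Return the paths of all meta.yaml files found under directory."""
    # Resolve the default at call time, not when the function is defined.
    if directory is None:
        directory = os.getcwd()
    metas = []
    for root, dirs, fnames in os.walk(directory):
        for fname in fnmatch.filter(fnames, 'meta.yaml'):
            metas.append(os.path.join(root, fname))
    return metas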



ignore = ['python', 'setuptools', 'conda_build', 'python.app', 'osx',
          'None', 'nose', 'coverage', 'pytest', 'cov-core', 'pytest-cov']

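def parse_depency(fname):
    """Return {package name: [run dependencies]} for one meta.yaml file."""
    with open(fname) as f:
        # safe_load only builds plain Python objects; a bare yaml.load(f)
        # is rejected by recent PyYAML releases.
        data = yaml.safe_load(f)
    name = data['package']['name']
    # `requirements` or `run` may be defined but empty (None).
    deps = (data.get('requirements') or {}).get('run') or ['None']
    # Keep only the package name: 'numpy >=1.7' -> 'numpy'.
    deps = [d.split()[0] for d in deps]
    # Drop everything in `ignore`, including the 'None' placeholder.
    deps = [d for d in deps if d not in ignore]
    return {name: deps}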

In [3]:

import os.path

path = os.path.join(os.path.expanduser("~"), 'IOOS', 'conda-recipes')
metas = get_meta_yaml(path)

packages = {}
for fname in metas:
    packages.update(parse_depency(fname))

for pack in ignore:
    packages.pop(pack, None)

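# Drop the packages left without dependencies. Build a new dict instead of
# popping while iterating, which raises RuntimeError on Python 3.
packages = {pack: deps for pack, deps in packages.items() if deps}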
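
To see what the clean-up does to a single recipe, here is a small self-contained sketch that applies the same two steps to a hand-written list of run requirements. The `clean_requirements` helper, the shortened `IGNORE` set, and the sample values are made up for illustration; only the `split()[0]` trick and the idea of an ignore list come from the cells above.

```python
# Hypothetical helper mirroring the clean-up done in parse_depency.
IGNORE = {'python', 'setuptools', 'nose', 'pytest'}  # shortened ignore list


def clean_requirements(run_requirements):
    """Strip version pins and drop the packages we do not care about."""
    # 'numpy >=1.7' -> 'numpy': the name is the first whitespace-separated token.
    names = [req.split()[0] for req in run_requirements]
    return [name for name in names if name not in IGNORE]


# Made-up requirements, similar to what a meta.yaml `run` section holds.
sample = ['python', 'numpy >=1.7', 'netcdf4 1.1.*', 'pytest']
print(clean_requirements(sample))
```

A package whose only requirement is python ends up with an empty list, which is why such packages are dropped from the dictionary afterwards.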

Here is where I fell in love with NetworkX. Look how simple it is to create a directed graph from the dictionary parsed above.

In [4]:

%matplotlib inline
import networkx as nx

G = nx.DiGraph()

for pac in packages:
    deps = packages[pac]
    for dep in deps:
        G.add_edge(pac, dep)

kw = dict(node_size=5, node_color='w', edge_color='b', alpha=0.25)
nx.draw(G, with_labels=True, **kw)

OK, it is easy, but that graph is way too confusing with all those packages. Maybe we should trim the least connected nodes for a better view.

In [5]:

import matplotlib.pyplot as plt

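def trim_nodes(G, d):
    """Return a copy of G without the nodes with a degree of d or less.
    http://glowingpython.blogspot.com/2012/11/first-steps-with-networx.html
    """
    Gt = G.copy()
    # Snapshot the degrees and the node list first: removing nodes while
    # iterating over the live views fails on NetworkX 2 and later.
    dn = dict(nx.degree(G))
    for n in list(G.nodes()):
        if dn[n] <= d:
            Gt.remove_node(n)
    return Gt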

fig, ax = plt.subplots(figsize=(9, 9))
Gt = trim_nodes(G, d=3)
# On NetworkX 2 and later graphviz_layout lives in nx.nx_agraph (needs pygraphviz).
pos = nx.nx_agraph.graphviz_layout(Gt, prog="twopi", root='numpy')
nx.draw(Gt, pos, with_labels=True, **kw)

The main conclusion, as expected, is that all paths lead to NumPy!
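
The same conclusion can be checked without a plot by counting how many packages list each dependency, which is the in-degree of that node in the graph. A minimal sketch with `collections.Counter`, using a made-up `toy_packages` dictionary that has the same shape as `packages`:

```python
from collections import Counter

# Made-up subset with the same {package: [dependencies]} shape as `packages`.
toy_packages = {
    'oceans': ['numpy', 'seawater', 'gsw', 'matplotlib'],
    'seawater': ['numpy'],
    'gsw': ['numpy'],
    'folium': ['jinja2', 'pandas'],
}

# Count how many packages depend directly on each name.
dependents = Counter(dep for deps in toy_packages.values() for dep in deps)

for name, count in dependents.most_common(3):
    print('{:<12}{}'.format(name, count))
```

Running the same count over the real `packages` dictionary should put the most depended-upon packages at the top of the list.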

However, my original question could actually be answered very quickly using nothing but Python sets.

In [6]:

deps = {item for sublist in packages.values() for item in sublist}
packs = set(packages.keys())

print('\n'.join(sorted(packs.difference(deps))))

airsea
compliance-checker
folium
gensim
ipython
iris
mplleaflet
oceans
octant
prettyplotlib
pymc
pyoos
pyugrid
rasterio
sparqlwrapper
tappy
thredds_crawler
ulmo
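
Why this works: a package that never shows up in anybody else's dependency list cannot be pulled in indirectly, so it has to be listed explicitly, and everything else comes along as a dependency. Here is a small sketch on made-up data, assuming the dependency graph has no cycles. The `toy_packages` dictionary and the `closure` helper are illustrative and not part of the notebook.

```python
toy_packages = {
    'pyoos': ['paegan', 'lxml'],
    'paegan': ['numpy', 'shapely'],
    'shapely': ['geos'],
}

deps = {dep for sublist in toy_packages.values() for dep in sublist}
top_level = set(toy_packages) - deps


def closure(roots, packages):
    """Return roots plus everything reachable through their dependencies."""
    seen, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(packages.get(name, []))
    return seen


print(sorted(top_level))
# Installing the top-level packages pulls in every other package.
print(closure(top_level, toy_packages) == set(toy_packages) | deps)
```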

But that is no fun. I wanted to explore this list graphically, or at least more verbosely. For example, what are the dependencies of iris or oceans?

In [7]:

def plot_deps(G, node):
    # G.neighbors returns an iterator on NetworkX 2 and later, so make a list.
    h = nx.from_dict_of_lists({node: list(G.neighbors(node))})
    nx.draw(h, with_labels=True, **kw)

plot_deps(G, 'iris')

In [8]:

plot_deps(G, 'oceans')

And while reading about NetworkX I found this answer on Stack Overflow and voilà:

In [9]:

for pac in sorted(packs):
    try:
        deps_pacs = list(nx.dfs_edges(G, pac))
    except KeyError:  # No dep.
        continue
    print(pac)
    spacer = {pac: 0}
    for prereq, target in deps_pacs:
        spacer[target] = spacer[prereq] + 2
        fmt = '{spacer}+-{t}'.format
        print(fmt(spacer=' ' * spacer[prereq], t=target))
    print('')

airsea
+-numpy

compliance-checker
+-lxml
+-owslib
+-python-dateutil
+-wicken
  +-petulant-bear
    +-numpy
    +-netcdf4
+-udunitspy
  +-udunits2
  +-numexpr
+-requests

fiona
+-six
+-gdal

folium
+-numpy
+-jinja2
+-openpyxl
+-pandas

gensim
+-scipy
+-six

ipython
+-pyreadline

iris
+-netcdf4
+-geos
+-matplotlib
+-biggus
+-pyke
+-scipy
+-shapely
+-cartopy
+-numpy
+-udunits

mplexporter
+-matplotlib

mplleaflet
+-jinja2
+-mplexporter
  +-matplotlib

oceans
+-netcdf4
+-gsw
+-matplotlib
+-ctd
+-seawater
  +-numpy
+-scipy
+-pandas
+-shapely
  +-geos

octant
+-basemap
+-numpy
+-netcdf4
+-matplotlib

paegan
+-netcdf4
+-scipy
+-pytz
+-python-dateutil
+-numpy
+-shapely
  +-geos

petulant-bear
+-lxml
+-numpy
+-netcdf4

prettyplotlib
+-brewer2mpl
+-matplotlib

pymc
+-scipy
+-numpy

pyoos
+-paegan
  +-netcdf4
  +-scipy
  +-pytz
  +-python-dateutil
  +-numpy
  +-shapely
    +-geos
+-lxml
+-owslib
+-fiona
  +-six
  +-gdal
+-requests
+-beautifulsoup4

pyugrid
+-numpy
+-netcdf4

rasterio
+-enum34
+-affine
+-click
+-gdal

rdflib
+-pyparsing
+-isodate
+-html5lib

seawater
+-numpy

shapely
+-geos

sparqlwrapper
+-rdflib
  +-pyparsing
  +-isodate
  +-html5lib

tappy
+-scipy
+-numpy

thredds_crawler
+-requests
+-lxml

udunitspy
+-udunits2
+-numexpr

ulmo
+-lxml
+-pytables
+-isodate
+-appdirs
+-suds
+-requests
+-beautifulsoup4
+-numpy
+-pandas
+-mock

wicken
+-petulant-bear
  +-lxml
  +-numpy
  +-netcdf4

That is a nice and simple text-graph visualization. Bear in mind that the dependency information is only as good as whoever wrote the meta.yaml (ipython, for example, has several other dependencies that are not listed in the file).
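
One thing to keep in mind when reading the trees above: `nx.dfs_edges` visits each node only once, so a dependency shared by several packages is printed only under the first parent that reaches it (in the compliance-checker tree, lxml shows up at the top level but not under petulant-bear). If you prefer every sub-tree to be complete, a plain recursive walk over the dictionary does it. This is just a sketch on made-up data; `tree_lines` and `toy_packages` are hypothetical names.

```python
def tree_lines(packages, name, indent=0, ancestors=()):
    """Return the indented dependency lines for name, repeating shared deps."""
    lines = []
    ancestors = ancestors + (name,)
    for dep in packages.get(name, []):
        lines.append('{}+-{}'.format(' ' * indent, dep))
        if dep not in ancestors:  # Guard against circular dependencies.
            lines.extend(tree_lines(packages, dep, indent + 2, ancestors))
    return lines


# Made-up subset with the same {package: [dependencies]} shape as `packages`.
toy_packages = {
    'compliance-checker': ['lxml', 'wicken'],
    'wicken': ['petulant-bear'],
    'petulant-bear': ['lxml', 'numpy'],
}

print('compliance-checker')
print('\n'.join(tree_lines(toy_packages, 'compliance-checker')))
```

The trade-off is a longer listing, since popular packages like numpy get repeated under every parent.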

To end the post I will leave a cool D3 graph rendered inside the IPython notebook, courtesy of the IPython Cookbook by Cyrille Rossant. (Sadly I could not figure out how to show it in the rendered HTML; you will have to run the notebook locally to see it.)

In [10]:

import json
from IPython.display import HTML
from networkx.readwrite import json_graph

for n in G:
    # G.node was removed in NetworkX 2.4; G.nodes works from 2.0 onwards.
    G.nodes[n]['name'] = n

data = json_graph.node_link_data(G)
with open('graph.json', 'w') as f:
    json.dump(data, f, indent=4)
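
One caveat, as far as I can tell: the D3 v3 force layout used below expects each link's `source` and `target` to be an index into the `nodes` array (or the node object itself), while recent NetworkX releases write the node names there instead. If the graph does not render, a hand-rolled export along these lines may help. The `to_d3_json` helper and the toy dictionary are made up for illustration.

```python
import json


def to_d3_json(packages):
    """Build a node-link dict whose links refer to nodes by position."""
    names = set(packages)
    for deps in packages.values():
        names.update(deps)
    names = sorted(names)
    index = {name: i for i, name in enumerate(names)}
    nodes = [{'name': name} for name in names]
    links = [{'source': index[pack], 'target': index[dep]}
             for pack, deps in sorted(packages.items()) for dep in deps]
    return {'nodes': nodes, 'links': links}


# Made-up subset with the same {package: [dependencies]} shape as `packages`.
toy_packages = {'seawater': ['numpy'], 'oceans': ['seawater', 'numpy']}
print(json.dumps(to_d3_json(toy_packages), indent=4))
```

Writing the result of `to_d3_json(packages)` to graph.json instead of `data` would keep the JavaScript cell unchanged.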

In [11]:

%%html
<div id="d3-example"></div>
<style>
.node {stroke: #fff; stroke-width: 1.5px;}
.link {stroke: #999; stroke-opacity: .6;}
</style>

In [12]:

%%javascript
// http://nbviewer.ipython.org/github/davidrpugh/cookbook-code/blob/master/notebooks/chapter06_viz/04_d3.ipynb
// We load the d3.js library from the Web.
require.config({paths: {d3: "http://d3js.org/d3.v3.min"}});
require(["d3"], function(d3) {
    // The code in this block is executed when the 
    // d3.js library has been loaded.
    
    // First, we specify the size of the canvas containing
    // the visualization (size of the <div> element).
    var width = 600,
        height = 600;

    // We create a color scale.
    var color = d3.scale.category10();

    // We create a force-directed dynamic graph layout.
    var force = d3.layout.force()
        .charge(-120)
        .linkDistance(30)
        .size([width, height]);

    // In the <div> element, we create a <svg> graphic
    // that will contain our interactive visualization.
    var svg = d3.select("#d3-example").select("svg")
    if (svg.empty()) {
        svg = d3.select("#d3-example").append("svg")
                    .attr("width", width)
                    .attr("height", height);
    }
        
    // We load the JSON file.
    d3.json("graph.json", function(error, graph) {
        // In this block, the file has been loaded
        // and the 'graph' object contains our graph.
        
        // We load the nodes and links in the force-directed
        // graph.
        force.nodes(graph.nodes)
            .links(graph.links)
            .start();

        // We create a <line> SVG element for each link
        // in the graph.
        var link = svg.selectAll(".link")
            .data(graph.links)
            .enter().append("line")
            .attr("class", "link");

        // We create a <circle> SVG element for each node
        // in the graph, and we specify a few attributes.
        var node = svg.selectAll(".node")
            .data(graph.nodes)
            .enter().append("circle")
            .attr("class", "node")
            .attr("r", 5)  // radius
            .style("fill", function(d) {
                // The cookbook example colors nodes by 'club'; our nodes
                // have no such attribute, so they all get the same color.
                return color(d.club); 
            })
            .call(force.drag);

        // The tooltip of each node is the package name.
        node.append("title")
            .text(function(d) { return d.name; });

        // We bind the positions of the SVG elements
        // to the positions of the dynamic force-directed graph,
        // at each time step.
        force.on("tick", function() {
            link.attr("x1", function(d) { return d.source.x; })
                .attr("y1", function(d) { return d.source.y; })
                .attr("x2", function(d) { return d.target.x; })
                .attr("y2", function(d) { return d.target.y; });

            node.attr("cx", function(d) { return d.x; })
                .attr("cy", function(d) { return d.y; });
        });
    });
});

In [13]:

HTML(html)

Out[13]:

This post was written as an IPython notebook. It is available for download or as a static html.

python4oceanographers by Filipe Fernandes is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://ocefpaf.github.io/.


Analyzing software dependencies with networkX
