Python Scripts, Modules and Packages

16th October 2018 - Version 27


Yesterday:

  • Introduction to the shell

In this lecture:

  • The Python Interpreter
    • Notebooks
    • The command line interpreter
    • The IPython console
  • Python Scripts
    • ways to run a script, shebangs
    • Text encoding
    • PEP8 and pylint - code linters
    • Options parsers
    • `matplotlib` in scripts
  • Python Modules
    • Python docstrings, PEP 257 & numpydoc
    • APIs
    • `import`, `sys.path` & `$PYTHONPATH`
    • Extension Modules
  • Python Packages
    • Directory Structure
    • `setup.py` & `setuptools`
    • `pip` & `conda` installation
    • Introduction to virtual environments
    • Automatic documentation via sphinx
    • Release of Assessment 2: Cellular Automata due __Friday 26th October__

Contact details:

  • Dr. James Percival
  • Room 4.85 RSM building
  • email: j.percival@imperial.ac.uk
  • Slack: `@James Percival` in #General & Random, or DM me.

By the end of this lecture you should:

  • Understand the difference between Python scripts, modules and packages
  • Know about coding standards, PEP8 and linters
  • Be able to make & install your own Python package.
  • Understand the basics of automatic documentation generation

The many ways to use Python

Python code gets used in at least five different ways:

  1. Jupyter notebooks
  2. Hacking in the interpreter/ipython console
  3. Small, frequently modified scripts
  4. Module files, grouping together useful code for reuse
  5. Large, stable project packages

Please follow along as we look at some of these methods.

Jupyter notebooks

These should need no further introduction, since you're currently reading one. Jupyter notebooks combine data permanence, editable code and text comments in the same place.

When a cell is marked as a code cell, and a python kernel is running, it becomes an editable coding environment.

In [ ]:
# we can write and run code here

The Python interpreter

On Windows Anaconda you can type

python

from the Anaconda command prompt to start a basic, no frills python interpreter session. On linux/Mac you may need to use python3 instead.

Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:14:23) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

This is probably the least user-friendly way to run an interactive Python session, although it is the best supported (most Mac and Linux systems come with Python as a default installation, so it has a very high probability of being installed on machines you are asked to ssh into). The easiest way to quit is to call exit().

The IPython console

IPython (or Interactive Python) provides a much more "batteries included" Python experience, with a built-in history editor, tab completion and inline matplotlib support. Anaconda provides a version, QtConsole, which runs in its own Qt window, so that the user experience on Windows, Mac and Linux is virtually identical.

These features should be familiar even to those of you who have only used Jupyter notebooks before, since they are also available inside Jupyter notebook code cells. In fact "under the hood" Jupyter is running an ipython console (the "kernel") to process Python3 code.

Exercise:

Run the following commands

def square_and_cube(n):
    return sorted([i**2 for i in range(n)]+[i**3 for i in range(n)])
print(square_and_cube(3))
print(square_and_cube(10))

in a notebook, in a vanilla Python interpreter and in a QtConsole/IPython console.

  • In each case, try modifying it to also include the 4th power of n.
  • The `sorted` function returns a new sorted list from an iterable. Try accessing the Python online help in the ordinary interpreter to invert the order of the list. Note you'll need to use the `help` function, since the `sorted?` syntax is in IPython only.

Tip: for the IPython console, you may find the IPython magic command %paste useful.

Python Scripts

A Python script file is just a regular plain text file containing only valid Python code and comments (i.e. lines starting with the hash/pound character, #), which the Python interpreter turns into instructions for the computer to perform. Script files are written in the same way you would write Python code in an interactive interpreter.

An example script, rot13.py might look like

#!/usr/bin/env python3
# -*- coding: ascii -*-

import codecs
import sys

print(codecs.encode(sys.argv[1], 'rot13'))

This file reads a string from the command line and applies the ROT-13 cypher, which cycles letters in the Latin alphabet through to the one 13 places forward/backward (i.e. maps A => N, N => A, g => t and so on). This cypher is its own inverse.
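As a quick check of the claim that the cypher is its own inverse, the short sketch below applies ROT-13 twice using the same codecs call as the script. The helper name rot13 and the test message are illustrative choices, not part of rot13.py.

```python
import codecs


def rot13(text):
    """Return the ROT-13 encoding of a string."""
    return codecs.encode(text, 'rot13')


message = "Hello everybody!"
scrambled = rot13(message)

print(scrambled)                    # Uryyb rirelobql!
print(rot13(scrambled) == message)  # True: encoding twice gets us back
```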

Warning

ROT-13 is useful to make text hard to read casually, but is not remotely cryptographically secure. Never use it in a situation with any risk attached.

In [ ]:
#inside a notebook, the ! allows calls out to the OS shell
!curl https://msc-acse.github.io/ACSE-1/lectures/rot13.py -o rot13.py
!python rot13.py "Uryyb rirelobql!" 

The above command will only work if the script is already in the same directory as the notebook, or if your computer is connected to the internet so that it can be downloaded. Inside the IPython console and in notebooks, we can also use the %run magic command, as in the cell after the following warning.

Warning

The ! command lets Jupyter notebooks run commands in the operating system with the same privileges that the user (i.e. you) have. Don't just run random notebooks off the internet unless you understand what they're doing.

In [ ]:
%run rot13.py "Uryyb rirelobql!"

Now that we can run a file, let's look at the contents.

Reviewing the contents of a script

Shebangs and executable files

The "shebang line", #!/usr/bin/env python3, tells Linux/macOS systems that this script should be run with Python 3. If it is present, we can also turn the script into an executable file on those systems and run it directly:

chmod 755 rot13.py
./rot13.py "This works on Linux/Mac systems"

Warning

Note that the shebang line refers to Python 3 explicitly as python3. This is typical behaviour on computer systems with both Python 3 & Python 2 installed, where python will run Python 2. For those of you running Anaconda on Windows, python means Python 3 there, and the python3 executable does not exist.

Text encoding

The next line # -*- coding: ascii -*- tells python (and possibly your text editor) that the script uses the ASCII (American Standard Code for Information Interchange) text encoding. Text encodings map the numbers that computers are able to store onto the characters that humans can read. If a file is opened using the wrong encoding, then it will either read as nonsense, or contain many blank "unknown" characters.
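To make the idea of an encoding concrete, here is a minimal sketch that stores one short string as bytes and then reads those bytes back with the right and the wrong encoding. The example word is an arbitrary choice; the encodings named are standard ones shipped with Python.

```python
text = "café"

# Encoding turns characters into the numbers (bytes) that get stored.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                       # b'caf\xc3\xa9'

# Decoding with the same encoding recovers the text...
print(utf8_bytes.decode("utf-8"))       # café

# ...but decoding with the wrong one gives nonsense.
garbled = utf8_bytes.decode("latin-1")
print(garbled)                          # cafÃ©

# ASCII has no code for the accented letter at all.
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("Not representable in ASCII:", err.reason)
```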

[Figure: table of ASCII character codes.] Table above by Tom Gibara, CC-BY-SA.

The file doesn't have to be in ASCII. In fact the Python 3 default is to use the UTF-8 Unicode encoding if no explicit encoding is given. You can even use letter-like symbols from the Unicode standard, as well as the more usual Latin characters, in the names of functions and objects. For example, let's write a more international "Hello World" function.

In [ ]:
def 你好(x):
    print('Hello', x)

你好('World!')

Similarly with the default utf-8 encoding you can use any Unicode characters from the standard you like in comments and strings.

In [ ]:
def sorry():
    """😊"""
    return "不好意思, 我不会说中文."
sorry()

Fortunately, you can't actually use emoji in function names.

In [ ]:
def 😊(x):
    return "This doesn't work"

This code will raise a syntax error exception.

Writing a Python script

Since a python script is just a text file, you just need a text editor to write them. Indeed, providing you save it as Plain Text, you could even write it in Microsoft Word (please, please don't). Your lecture on the shell introduced some console text editors which can be used on remote systems, but the Anaconda installation you made also includes Spyder, an Integrated Development Environment (IDE) which makes writing, running and understanding Python scripts easier.

There are many reasons not to write code in Microsoft Word, including the autocorrect tool, which has an annoying tendency to "fix" code keywords like elif in a way which tends to break code. However the most insidious feature (which also affects many code listings on the web) is "smart quotes". Using Unicode punctuation like “ and ” or ‘ and ’ instead of the unidirectional ASCII " and ' turns Python strings into nonsense.
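The effect of smart quotes can be checked without running anything dangerous. The sketch below asks Python to compile two one-line programs, one with ASCII quotes and one with the curly Unicode quotes; the helper name compiles is made up for this illustration.

```python
good_source = 'greeting = "hello"'
# The same line, but with the curly quotes U+201C and U+201D
bad_source = 'greeting = \u201chello\u201d'


def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


print(compiles(good_source))  # True
print(compiles(bad_source))   # False
```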

Some other IDEs (multilanguage):

Text editors with syntax highlighting:

  • Jupyter - as well as notebooks, it can edit plain text files.
  • Emacs (cross platform) Console/Windowed text editor.
  • Nano (cross platform) Console text editor.
  • Notepad++ (Windows only)
  • Vim (cross platform) Console text editor.

Your choice of editing platform is personal, and each individual should find out what works for them. Don't be afraid to experiment, but if you have already spent a lot of time writing code using a tool which supports Python, then we recommend you carry on using it.

Let's open up the Spyder IDE and have a look at the interface.

Exercise: Find the primes

Using Spyder (or your own preferred method), write a Python script to output the first 20 prime numbers.

Tips:

  • One way of doing this uses an outer loop counting how many primes you have, and then code to find the next prime number.
  • Note that a number cannot be prime if it is divisible by a smaller prime number, and that 1 is not prime.
  • If a number is not prime, it must have at least one factor greater than 1 that is no larger than its own square root. This can be used to improve the efficiency of your search.
  • If a divides b exactly, then b%a==0, which gives a quick test.

When testing your code, you should expect the output for the first 5 primes to be [2, 3, 5, 7, 11].

If you have time, convert the script into a function to calculate all prime numbers smaller than an input, $n$.

A model solution for the script is available.

Reading from the command line with sys.argv

The sys.argv variable is a list of the string arguments given when executing the script, with the first element (sys.argv[0]) being the name of the script itself. We can use this to communicate with the script from the command line, so that one file can do many things without needing to edit it. For example, the following script counts the number of uses of the letter 'e' in a file:

import sys

e_count = 0
with open(sys.argv[1],'r') as infile:
    # iterate over every line of the file, not just the first one
    for line in infile:
        e_count += line.count('e')
print("There were %d letter e's"%e_count)
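A script like this fails with an IndexError if no filename is given. One possible way to guard against that, sketched below, is to wrap the logic in functions that take the argument list explicitly, which also makes them easy to try out from an interpreter. The names count_letter, main and count_e.py are hypothetical choices for this illustration.

```python
def count_letter(path, letter="e"):
    """Return how many times letter appears in the text file at path."""
    total = 0
    with open(path, "r") as infile:
        for line in infile:
            total += line.count(letter)
    return total


def main(argv):
    """Run the counter on argv, a list laid out like sys.argv."""
    if len(argv) != 2:
        # argv[0] is the script name, so exactly one real argument is expected
        print("usage: %s FILENAME" % argv[0])
        return 1
    print("There were %d letter e's" % count_letter(argv[1]))
    return 0


# Simulate running "python count_e.py" with the filename forgotten
status = main(["count_e.py"])
print(status)  # 1
```

In a real script the last lines would instead pass the true argument list, for example by calling main(sys.argv) after an import sys.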

argparse and options parsing

To pass more complicated options to a script, there is the argparse module, part of the standard Python library. This module gives Python scripts the (relatively) simple ability to take flags and process (or parse) other complicated inputs.

For full details, you should read the argparse documentation, but as a simple example, we can write a program which says hello or goodbye depending on how it is called, which takes one or more names as input, and optionally a title:

hello.py:

import argparse
parser = argparse.ArgumentParser()
parser.add_argument("name", nargs='+', help="names to talk to")
parser.add_argument("-t", "--title", type=str,
                    help="title to use", default="")
parser.add_argument("-g", "--goodbye", help="say goodbye",
                    action="store_true")
args = parser.parse_args()

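if args.title:
    # str.join takes a single iterable, so build one list of all the words
    fullname = " ".join([args.title] + args.name)
else:
    # args.name is a list (because of nargs='+'), so join it into one string
    fullname = " ".join(args.name)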

if args.goodbye:
    word = "Goodbye"
else:
    word = "Hello"

print("%s %s"%(word, fullname))
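To see what the parser produces without leaving the interpreter, parse_args can be given an explicit list of strings instead of reading sys.argv. The sketch below rebuilds the same three arguments as hello.py inside a helper, build_parser, whose name is an assumption made for this example.

```python
import argparse


def build_parser():
    """Return a parser with the same arguments as hello.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("name", nargs="+", help="names to talk to")
    parser.add_argument("-t", "--title", type=str, default="",
                        help="title to use")
    parser.add_argument("-g", "--goodbye", action="store_true",
                        help="say goodbye")
    return parser


# Equivalent to running: python hello.py -g -t Dr. James Percival
args = build_parser().parse_args(["-g", "-t", "Dr.", "James", "Percival"])

print(args.name)     # ['James', 'Percival']
print(args.title)    # Dr.
print(args.goodbye)  # True
```

Note that the positional argument arrives as a list of strings, while the flags become a string and a boolean.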

Exercise: Find the mean

Write a script to calculate the mean of a sequence of numbers. If you have time, make it take extra options (using the `argparse` module) -b, -o and -x to work with binary (i.e. base 2, with 101 == 5 decimal), octal (i.e. base 8, with 31 == 25 decimal) and hexadecimal (i.e. base 16, with 2A == 42 decimal) numbers.

Test your basic script on the following sequences: 1 (mean 1); 1, 5, 7, 13, 8 (mean 6.8); and 2.5, 4, -3.2, 9.3 (mean 3.15).

Also try feeding it no input.

Tips:

For the longer version, you can use the two-argument version of the int function to change the base of numbers. For example int('11', 2) == 3 and int('3A', 16) == 58.

Model answers are provided for the short exercise and for the long version which takes options flags.

interlude

Let's take a break from talking about Python scripts to point out a weird way that python behaves that can sometimes catch people out when writing code.

If you haven't seen this before, try and guess the output produced by repeatedly calling these functions in the cells below.

In [ ]:
def f(tmp=[]):
    """Try to default to have tmp as an empty list."""
    for i in range(4):
        tmp.append(i)
    return tmp

def g(tmp=None):
    """Doing the same thing explicitly."""
    tmp = tmp or []
    for i in range(4):
        tmp.append(i)
    return tmp
In [ ]:
print('f()', f())
print('g()', g())
In [ ]:
print('f()', f())
print('g()', g())

A reminder on using matplotlib in scripts

In scripts run in the terminal, rather than in a notebook or an IPython console, matplotlib may not automatically put interactive plots on screen. In this case, you will need to call the matplotlib.pyplot.show() function (pyplot.show() if you have imported the pyplot submodule by name) to see your figures.

Alternatively, as you learnt last week, you can just use a command like pyplot.savefig('mycoolplot.png') to write the images to disk without any interaction. The output format is guessed from the filename that you give.

Exercise: Plots in scripts

Write a script to plot the functions $y=\sin(x)$, $y=\cos(x)$ and $y=\tan(x)$ to screen over the range [0,$2\pi$] and then run it in a terminal/prompt.

Make sure to include labels on your axes.

Change the script to output a .png file to disk.

Next do the same to write a .pdf.

Model answers are available.

PEP8 - The Python style guide

Although, as you saw earlier, non-ASCII function names and comments are allowed in the Python 3 standards, you are strongly discouraged from using them in code which other people are going to see (including the assignments on this Masters course). That is actually one of the recommendations of the Python Style Guide, known as PEP8.

Python Enhancement Proposals (PEPs) are the mechanism through which Python expands and improves, with suggestions discussed and debated before being implemented or rejected. PEP8 describes suggestions for good style in Python code, based on Guido van Rossum (the original Python creator) noting that (with the exception of throw-away scripts) most code is read more often than it is written. As such, the single most important aspect of code is readability, by you and by others.

Note that PEP8 does not cover every single decision necessary in generating Python code in a consistent style. As such, there are many more detailed guides, either at the project level, or for entire organizations. For an example of the former, see the discussion of numpy later in this lecture. For an example of the latter, see the Google Python Style Guide. When choosing what to do on your own projects, you are the boss, but PEP8 is a useful minimum (and will gain/lose you marks during the assessed exercises in this course) and it is useful to consider the thinking in the choices other projects make.

Code linters, and static code analysis

For Python, as with many other languages, there exist automated tools which check your code against an encoding of a style guide and point out mistakes. These are known as code linters, by analogy with clothes lint and the lint rollers used in laundries. Like the cleaning tool they remove mess and 'fluff' from your code to leave things looking neat and tidy.

[Image: a lint roller.] By Frank C. Müller, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=636140

There are many tools to perform code linting with Python, including the lightweight pep8 package (since renamed pycodestyle), which simply checks for conformity with the basic PEP8 guidelines. Some tools, such as pyflakes and pylint, also perform static code analysis. That is, they parse and examine your code, without actually running it, looking for bad "code smells", or for syntax which is guaranteed to fail.
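To give a flavour of what "examining code without running it" means, here is a toy static check built on the standard library ast module. It is only a sketch of the idea, not how pylint or pyflakes are implemented; the function name functions_missing_docstrings and the sample source are invented for the example.

```python
import ast

SOURCE = '''
def documented(x):
    """Return x unchanged."""
    return x

def undocumented(x):
    return x
'''


def functions_missing_docstrings(source):
    """Return the names of functions in source that have no docstring."""
    tree = ast.parse(source)  # parse only: nothing in source is executed
    return [node.name for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)
            and ast.get_docstring(node) is None]


print(functions_missing_docstrings(SOURCE))  # ['undocumented']
```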

The pylint tool is actually included by default in the Spyder IDE, available under the menu option Source>Run static code analysis, or by pressing F8. You can also elect to turn on automatic pep8 analysis as you type. This is under the Editor window in the Preferences screen. If you select the Code Introspection/Analysis tab, the checkbox is towards the bottom of the page.

Other hints for writing good Python scripts:

  • Explicit is better than implicit.
  • Don't duplicate code, use functions.
  • Try to keep things compact enough to read in one go.
  • Make variable names meaningful if used on more than one line.
  • Simple is often better than clever.
  • Practise the principle of least astonishment.
  • Add comments when they add meaning.

Exercise: Fix the script

Copy the following script into your IDE and run the static analysis tool (`pylint`) on it. Fix the errors and warnings that it gives you.

value={1:'Ace',11:'Jack',12:'Queen',13:'King'}; 
for _ in range(2,11):
    value[_]=_
suit={0:'Spades',1:'Hearts',2:'Diamonds',3:'Clubs'}
def  the_name_of_your_card(v,s = 0,*args, **kwargs):


   """Name of card as a string.
   """ 
   if (v < 1  or v > 13 or s not in (0,1,2,3)):
      raise ValueError


   return """I have read your mind and predict your card is the %s of %s."""%(value[v], suit[ s])
print( the_name_of_your_card(2,  s= 3))

interlude

If you haven't seen this before, try and guess the output produced by the functions in the cells below. Can you explain what's going on?

In [ ]:
a = [_**2 for _ in range(5)]

for i, k in enumerate(a):
    print('%s: %s'%(i, k))
print('sum:', sum(a))
In [ ]:
b = (_**2 for _ in range(5))

for i, k in enumerate(b):
    print('%s: %s'%(i, k))
print('sum:', sum(b))

Python Modules

A module file has the same format as a script, except it expects to be imported into other files, or into the interpreter directly. This means that a typical module file contains definitions for functions and classes, but doesn't produce any output by itself.

The code for a module, code_mod.py:

"""Wrapper for rot13 encoding.""" 

import codecs

def rot13(input):
    """Return the rot13 encoding of an input string."""
    return codecs.encode(str(input), 'rot13')
In [ ]:
import code_mod
code_mod.rot13("Uryyb rirelobql!")

The import command

The import search path

After looking in the current directory, Python uses the directories inside the sys.path variable, in order, when asked to find files via an import command.
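The search path can be inspected and changed from inside Python. As a minimal sketch, the code below writes a throwaway module into a temporary directory, shows that it cannot be found, and then adds the directory to sys.path so that the import succeeds. The module name hypothetical_mod is made up for this example; directories listed in the $PYTHONPATH environment variable are added to sys.path in a similar way when the interpreter starts.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    module_path = os.path.join(tmpdir, "hypothetical_mod.py")
    with open(module_path, "w") as outfile:
        outfile.write("ANSWER = 42\n")

    # The directory is not on the search path yet, so nothing is found.
    found_before = importlib.util.find_spec("hypothetical_mod") is not None

    sys.path.insert(0, tmpdir)       # search this directory first
    importlib.invalidate_caches()    # the file was created while running
    try:
        hypothetical_mod = importlib.import_module("hypothetical_mod")
    finally:
        sys.path.remove(tmpdir)      # leave the search path as we found it

print(found_before)              # False
print(hypothetical_mod.ANSWER)   # 42
```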

The reload and %reset commands.

The Python function importlib.reload (a builtin called reload in Python 2) tells the interpreter to update its record of the contents of a module. This can be useful in a running session if you update a module or package, whether automatically or by editing the file by hand.
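The following sketch shows the effect in a self-contained way, using the Python 3 spelling importlib.reload. It creates a small module in a temporary directory, imports it, rewrites the file, and then reloads it. The module name hypothetical_greeting and its contents are assumptions made purely for illustration.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    module_path = os.path.join(tmpdir, "hypothetical_greeting.py")
    with open(module_path, "w") as outfile:
        outfile.write('GREETING = "hello"\n')

    sys.path.insert(0, tmpdir)
    importlib.invalidate_caches()
    try:
        import hypothetical_greeting
        before = hypothetical_greeting.GREETING

        # Edit the file on disk; the running session does not notice yet.
        with open(module_path, "w") as outfile:
            outfile.write('GREETING = "goodbye"\n')
        unchanged = hypothetical_greeting.GREETING

        # Reloading re-reads the file and updates the module in place.
        importlib.reload(hypothetical_greeting)
        after = hypothetical_greeting.GREETING
    finally:
        sys.path.remove(tmpdir)

print(before, unchanged, after)  # hello hello goodbye
```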

Warning

The reload command only updates the contents of the module passed as an argument, not the contents of modules that are imported inside it.

In the IPython console and inside Jupyter notebooks, there is also the magic %reset command, which clears the interpreter history and resets things back to their original state.

In [ ]:
x=7
print(x)
In [ ]:
%reset
print(x)

Python docstrings

As you were told last week, the text between the """ blocks is called a docstring. It should appear at the top of scripts & module files (or just below the file encoding, if one is needed) and as the first text lines inside classes or function def blocks. Python uses it to generate help information if asked. This information is stored in the object's __doc__ attribute.

In [ ]:
code_mod.rot13?
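Outside IPython, the same information can be read directly. A minimal sketch, using a small example function and class defined on the spot:

```python
def mod5(a):
    """Return the value of a number modulo 5."""
    return a % 5


class Counter:
    """Count how many times something has happened."""


# The docstring is stored on the object itself.
print(mod5.__doc__)     # Return the value of a number modulo 5.
print(Counter.__doc__)  # Count how many times something has happened.
```

Calling help(mod5) in the ordinary interpreter displays the same text together with the function signature.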

There is a PEP, PEP257 which gives suggestions for a good docstring. In particular:

  • One line docstrings should look like
    def mod5(a):
        """Return the value of a number modulus 5."""
        return a%5
    
    I.e. the docstring is a full sentence, ending in a period, describing the effect as a command ("Do this", "Return that").
  • Multiline docstrings should start with a one line summary with similar syntax and have the terminating """ on its own line.
  • The docstring of a script should be a "usage" message.
  • The docstring for a module should list the classes & functions (and any other objects) exported by the module, with a one-line summary of each.
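As an illustration of the multiline convention in the list above, here is one possible docstring laid out in that style, together with a way of pulling out just the summary line. The function clamp is a made-up example.

```python
import inspect


def clamp(value, lower, upper):
    """Return value limited to the range [lower, upper].

    Values below lower are replaced by lower, and values above
    upper are replaced by upper.
    """
    return max(lower, min(value, upper))


# inspect.getdoc strips the indentation, so the first line is the summary.
summary = inspect.getdoc(clamp).splitlines()[0]
print(summary)  # Return value limited to the range [lower, upper].
```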

numpydoc

The numpy package has its own standards, which are well suited to numerical code, especially code interfacing with numpy. You have already seen examples of the numpydoc style in previous lectures.

In [ ]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as pyplot

def mandelbrot(c, a=2.0, n=20):
    """
    Approximate the local Mandelbrot period of a point.

    Parameters
    ----------
    c : complex
        Point in the complex plane.
    a : float
        A positive bounding length on the horizon of the point z_n.
    n : int
        Maximum number of iterations.

    Returns
    -------
    int
        i such that |z_i|>a if i < n, NaN otherwise.
    
    """
    
    z = c
    for _ in range(n):
        if abs(z)>a:
            return _
        z = z**2 + c
    return np.nan

dx = np.linspace(-2, 1, 300)
dy = np.linspace(-1.5, 1.5, 300)
x, y= np.meshgrid(dx, dy)
z = np.empty(x.shape)

for i in range(len(dx)):
    for j in range(len(dy)):
        # pass the iteration limit by keyword, so it is not mistaken for a
        z[i, j] = mandelbrot(x[i, j]+1j*y[i, j], n=100)
    
    
pyplot.pcolormesh(x, y, z)
pyplot.xlabel('$x$')
pyplot.ylabel('$y$')
pyplot.get_cmap().set_bad('black')

In the numpydoc style, the Parameters and Returns sections prescribe the data types (int, float, complex, str etc.) of the inputs and outputs of the method. This uses the syntax of a text markup language called reStructuredText. We will revisit this later when we introduce the documentation generator, sphinx.

By default Python practises a form of dynamic typing called "duck typing", where "as long as it looks like a duck and quacks like a duck, it's a duck". This can sometimes cause strange problems when the names of functions clash.

In [ ]:
class Duck(object):
    def quack(self):
        print ("Quack!")
        return self
    def fly(self):
        print("Flap, flap, flap")
        return self
        
class Bugs(object):
    def spider(self):
        print("8 legs")
    def fly(self):
        print("6 legs")
        
def takeoff(x):
    return x.fly()

duck = Duck()
takeoff(duck).quack()
bugs = Bugs()
takeoff(bugs).quack()
        

Those of you used to statically typed languages like C will find the numpydoc specification familiar. The numpydoc docstrings are also a weak example of a wider code design philosophy called design by contract, or programming by contract. In that system, the developer explicitly lists all the assumptions that a function makes about its inputs, as well as the guarantees that it makes about its outputs.
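Python has no built-in contract system, but the idea can be approximated by hand. The sketch below is one possible way of doing so, with an explicit check on the input (the precondition) and an assert on the result (the postcondition). The function integer_sqrt is a hypothetical example chosen because its guarantee is easy to state exactly.

```python
def integer_sqrt(n):
    """Return the largest integer r such that r*r <= n.

    Parameters
    ----------
    n : int
        A non-negative integer.

    Returns
    -------
    int
        r satisfying r*r <= n < (r+1)*(r+1).
    """
    # Precondition: check the assumption made about the input.
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")

    r = 0
    while (r + 1) ** 2 <= n:
        r += 1

    # Postcondition: check the guarantee made about the output.
    assert r * r <= n < (r + 1) ** 2
    return r


print(integer_sqrt(17))  # 4
```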

Exercise: Complex square root

Write a function which accepts a real number and returns the complex square roots of that number.

Your function should include a docstring conforming to the numpydoc standard.

Tips:

  • You can use the `sqrt` function in the `math` module to obtain the square root of a positive real number.
  • Python uses the notation `1j` for a unit length imaginary number (which a mathematician would typically denote $i$), where $\sqrt{-1}=\pm 1j$.

Questions: how many complex square roots does each real number have? Is it the same for **every** real number?

A model answer is available.

interlude: Code Quality

Code quality is often a balance between three things:

  • Maintainability: The code is easy to read and to understand.
  • Performance: The code is as fast and secure to run as we can make it.
  • Resources: This is both the size of the machine and the developer time available to address the problem.

This is frequently a case of "which two do you want?" As such, there are compromises necessary when designing code. However, it's important that they are recognised, and only made when appropriate. To quote Donald Knuth:

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

The minute that code is going to be read a second time (including by you in two months' time) it becomes unacceptable to write it as though it is disposable. Functions need docstrings, and variables should have names which make sense (and not just to you now).

Similarly, when you've tested your code, and you know that a specific function takes 90% of the runtime, it may make sense to rewrite it in a faster way, even if that is harder to maintain (more numpy, writing your own C extension modules, and so on).

Combined files

A file can be both a script and a module providing you use a special if test:

rot13m.py:

import codecs

# module definitions

def rot13(input):
    """Return the rot13 encoding of an input"""
    return codecs.encode(str(input), 'rot13')

if __name__ == "__main__":
    # Code in this block runs only as a script,
    # not as an import
    import sys
    print(rot13(sys.argv[1]))

Exercise: A primes module

Make a copy of your script to calculate prime numbers and:

  1. add the ability to read the number of primes to output from the command line,
  2. turn it into a version which can also be used as a module,
  3. test this by running a copy of the interpreter and `import`ing it, then calling your routine.
  4. Try running it from the terminal/Anaconda command prompt using the following syntax:
python -m rot13m "this runs a python module"


See what happens if you change directories.

A model answer is available.

Python Packages

Python packages bundle multiple modules into one place, to make installing and uninstalling them easier and to simplify usage. A simple python package just consists of python files inside a directory tree.

A typical template for a basic python package called mycoolproject might look like:

mycoolproject/
    __init__.py
    cool_module.py
    another_cool_module.py
requirements.txt
setup.py

The __init__.py file is slightly special (as is common in python with double underscore names, or dunders), in that it gets read when you run import mycoolproject (or whatever the name of the directory is). The other files can be imported by themselves as mycoolproject.cool_module, mycoolproject.another_cool_module, etc.

A typical package __init__.py file mostly consists of import commands to load functions and classes from the other modules in the package into the main namespace, as well as possibly defining a few special variables itself.

__init__.py

from .cool_module import my_cool_function, my_cool_class
from .another_cool_module import *
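To check how this behaves, the sketch below builds a tiny package on the fly in a temporary directory and imports it. Everything here (the package name hypothetical_pkg and the function it contains) is invented for the illustration; note that in Python 3 the import inside __init__.py is written relative to the package, with a leading dot.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    package_dir = os.path.join(root, "hypothetical_pkg")
    os.mkdir(package_dir)

    with open(os.path.join(package_dir, "cool_module.py"), "w") as outfile:
        outfile.write("def my_cool_function():\n    return 'cool'\n")

    # __init__.py lifts the function into the package's own namespace.
    with open(os.path.join(package_dir, "__init__.py"), "w") as outfile:
        outfile.write("from .cool_module import my_cool_function\n")

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        import hypothetical_pkg
        import hypothetical_pkg.cool_module
    finally:
        sys.path.remove(root)

# The same function is reachable by the short and the long route.
print(hypothetical_pkg.my_cool_function())              # cool
print(hypothetical_pkg.cool_module.my_cool_function())  # cool
```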

Exercise: A primes package

Turn your "find the primes" module file into a package called `primes` by creating a suitable directory structure and an `__init__.py` so that you can access a function to give you the first $n$ primes as well as all primes smaller than $n$.

Try `import`ing your new package from the IPython console. Check that you can call your function.

If you have time, add a function to the package to give you a list of the prime factors of an integer.

A model answer is available.

Setup.py, distutils and setuptools

The setup.py file is a standard name for an install script for python packages. Python even comes with a module in its standard library, distutils, to automate this as much as possible. We will use an enhanced version called setuptools, compatible with the Python package manager, pip. For a simple, python only package the setup.py file might look like the following:

from setuptools import setup

setup(
    name='mycoolproject',  # Name of package, required
    version='1.0.0',  # Version number, required
    packages=['mycoolproject'],  # directories to install, required
    # One-line description or tagline of what your project does
    description='A sample implementation of quaternions.',  # Optional
    url='https://www.mycoolproject.com',  # Optional
    author='James Percival',  # Optional
    author_email='j.percival@imperial.ac.uk',  # Optional
)

This script can be run in several modes. For pure python packages, the most useful is

python setup.py install

This copies the files into a directory in the standard search path.

Version Numbers

There are many formats for version numbers used in software development. These range from the absurdly simple (build 1, build 2, build 3 ...) to the complicated (the Linux kernel has versions like 4.15.0-36-generic), to the very unusual (The TeX typesetting system is currently on version 3.14159265, with a successive digit of $\pi$ added with each new version). As is often the case, there is even a PEP about it (PEP440).

Unless you have a good reason to do something different, "semantic versioning" is a convenient standard to stick with. This is just an ordered set of three integers, separated by dots, e.g. 0.2.3 or 13.4.2. The structure is (major version).(minor version).(patch version), where a major version increment (e.g. from 10.2.3 to 11.0.0) implies big changes in the code, which are likely to break code built with previous versions, while a minor increment means smaller changes, such as new features, which should not break existing code but might occasionally cause problems. Incrementing the patch version implies only bug fixes, with no changes to any APIs.
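Because the three parts are integers, versions should be compared number by number rather than as text. The following is a minimal sketch of that idea for plain three-part versions only; the helper names parse_version and change_type are hypothetical, and real tools handle many more formats.

```python
def parse_version(text):
    """Return a (major, minor, patch) tuple of ints from text like '1.2.3'."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def change_type(old, new):
    """Classify the step from version old to version new."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    if new_parts[0] != old_parts[0]:
        return "major"
    if new_parts[1] != old_parts[1]:
        return "minor"
    if new_parts[2] != old_parts[2]:
        return "patch"
    return "none"


# Tuples compare element by element, which is the ordering we want...
print(parse_version("1.10.0") > parse_version("1.9.1"))  # True
# ...whereas comparing the raw strings gets it wrong.
print("1.10.0" > "1.9.1")                                # False
print(change_type("10.2.3", "11.0.0"))                   # major
```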

Because differences in major versions can prevent people upgrading, it's common to "backport" fixes and features from the mainline trunk of development back to new minor versions of the previous generation of code. A good example is Python itself, where version 2.7.0 was released on July 3rd, 2010 (it's now up to 2.7.13), whereas Python 3.0 was released on December 3rd, 2008.

Some communities (e.g. the Linux kernel developers) add additional meaning to the semantic numbers. For example a common scheme is that odd minor versions are "development" or "unstable", whereas even numbers are for general release, or "stable". That means there are more likely to be bugs (and thus more patches) in the unstable releases of the code base, but new features appear there first.

The pip and conda package managers

Although you can install packages yourself by hand, it is more useful to use a tool, called a "package manager", to control things. This allows for easier installs, uninstalls and sandboxing (described in the next section).

Your Anaconda installation comes with two inbuilt package managers: conda, written specifically for Anaconda, and pip, which is more widely available. Since conda understands about pip, we will describe that tool in more detail here.

Dependencies

An individual Python package typically has its own Python dependencies (i.e. other packages which this package itself imports). A requirements.txt file consists of a list of package names (one per line), possibly also indicating a minimum or exact version number to be installed.

requirements.txt

jupyter
numpy >= 1.13.0
scipy == 1.0.0
mpltools

The lines with just the name allow any version, the lines with >= demand a version which is "greater than or equal to" that specified (where e.g. 2.0.0 > 1.9.1 and 1.2.0 > 1.1.9) and the lines with == demand a specific version. The packages listed in the requirements.txt file, or at least suitable versions of them, can then be pip installed in one go, via the compact command:

pip install -r requirements.txt

The conda manager accepts similar files in the YAML format, with the extension .yml or .yaml (YAML is short for "yet another markup language", or possibly "YAML ain't markup language"). YAML files are normally used for software configuration, where data elements mostly consist of named strings and lists. A conda environment.yml file looks like:

environment.yml

name: acse
dependencies:
 - jupyter
 - numpy
 - scipy
 - pip:
   - mpltools

Exercise:

Make a `setup.py` script for your module and try `install`ing and `uninstall`ing it using `pip`. In the directory containing the `setup.py` file run

pip install .

and

pip uninstall <the name of your module>

From another directory, see when you can and can't import your new module.
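One way to check importability from a script, without triggering an `ImportError`, is to ask Python's import machinery whether it can find the module. This is only a sketch: `is_importable` is a hypothetical helper, and the second module name is a placeholder that you would replace with the name of your own module.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can currently find this top-level module."""
    return importlib.util.find_spec(module_name) is not None


print(is_importable("json"))  # True: part of the standard library
print(is_importable("placeholder_module_that_is_not_installed"))  # False
```

Running this before and after `pip install .`, and again after `pip uninstall`, should show the result change for your module, since the search uses the current `sys.path`.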

More programming exercises.

The website Project Euler contains a large number of computational mathematics problems which can be used as exercises in any language to practise thinking algorithmically (warning, some of them use complicated mathematics). We will list a few here:

Exercise: Problem 1

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

Exercise: Problem 5

2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.

What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?

Exercise: Problem 8

Consider the 1000 digit number

73167176531330624919225119674426574742355349194934 96983520312774506326239578318016984801869478851843 85861560789112949495459501737958331952853208805511 12540698747158523863050715693290963295227443043557 66896648950445244523161731856403098711121722383113 62229893423380308135336276614282806444486645238749 30358907296290491560440772390713810515859307960866 70172427121883998797908792274921901699720888093776 65727333001053367881220235421809751254540594752243 52584907711670556013604839586446706324415722155397 53697817977846174064955149290862569321978468622482 83972241375657056057490261407972968652414535100474 82166370484403199890008895243450658541227588666881 16427171479924442928230863465674813919123162824586 17866458359124566529476545682848912883142607690042 24219022671055626321111109370544217506941658960408 07198403850962455444362981230987879927244284909188 84580156166097919133875499200524063689912560717606 05886116467109405077541002256983155200055935729725 71636269561882670428252483600823257530420752963450

The four adjacent digits in this number that have the greatest product are 9 × 9 × 8 × 9 = 5832. Find the thirteen adjacent digits in the 1000-digit number that have the greatest product. What is the value of this product?

Assessment 2: Cellular Automata

You should now know the URL to sign up for the second assessed exercise via GitHub Classrooms. Please follow it and look at your repository for a README file giving an overview of the problem and a .pdf file explaining the exercise in detail.

In this lecture we learned:

  • The difference between Python scripts, modules and packages.
  • Code standards and code linters.
  • To make & install your own Python package.

Tomorrow:

  • Version control and git.

Further Reading:

In [7]:
# This cell sets the css styles for the rest of the notebook.
# Unless you are particularly interested in that kind of thing, you can safely ignore it

from IPython.display import HTML

def css_styling():
    styles = """<style>
div.warn {    
    background-color: #fcf2f2;
    border-color: #dFb5b4;
    border-left: 5px solid #dfb5b4;
    padding: 0.5em;
    }

div.exercise {    
    background-color: #B0E0E6;
    border-color: #B0E0E6;
    border-left: 5px solid #1E90FF;
    padding: 0.5em;
    }

div.info {    
    background-color: #F5F5DC;
    border-color: #F5F5DC;
    border-left: 5px solid #DAA520;
    padding: 0.5em;
    }

div.interlude {    
    background-color: #E6E6FA;
    border-color: #E6E6FA;
    border-left: 5px solid #4B0082;
    padding: 0.5em;
    }
    
div.assessment {    
    background-color: #98FB98;
    border-color: #228B22;
    border-left: 5px solid #228B22;
    padding: 0.5em;
    }

 </style>
"""
    return HTML(styles)
css_styling()