The GitHub Workflow

<div class= info>

Yesterday:

  • git and version control

In this lecture:

  • Readme files & Markdown
  • Organizations & Teams
  • AUTHORS & CONTRIBUTING.md
  • Copyright & Licences
  • GitHub flow
    • Code Review & Pull Requests
    • Merging, Squashing, Rebasing
  • GitHub pages & Jekyll, wikis

</div>

<div class= warn>

Contact details:

  • Dr. James Percival
  • 4.85 RSM building
  • email: j.percival@imperial.ac.uk
  • Slack: @James Percival in #General & #Random, or DM me.

</div>

By the end of this lecture you should:

  • Know a little bit about Markdown.
  • Understand the basics of code licensing.
  • Know the GitHub flow.
  • Understand the basic concepts of code review.
  • Have tried some paired programming.

Revision on git

Git is a distributed Automated Version Control/Revision Control tool, which can track, revert and tag changes in simple data files such as the text of program source code. You should have already configured your system to know who you are by running a command like:

git config --global user.name "Gerard Gorman"
git config --global user.email "g.gorman@imperial.ac.uk"

git cheatsheet

  • Make a new repository (place to store files)
    git init my_new_repo
    
  • Check for changes
    git status
    
    or without untracked files
    git status -uno
    
  • Copy an existing repository
    git clone https://github.com/jrper/my_new_repo
    
  • Store the current state of some files in the working directory
    git add letter_to_granny.txt *.png
    git commit -m "Update my letter to Granny."
    

Remember that for each git command help is available:

git help
git add -h

GitHub.com and the octocat

octocat

GitHub.com is a web-based repository and collaboration system for code (and other stuff) controlled via the git version control system. Octocat is the name of its mascot, who also has a useful profile page to practise cloning and forking repositories from. Recently purchased by Microsoft, GitHub is also the home for a large number of major open source projects, including numpy.

Similar websites:

  • Bitbucket: originally limited to the Mercurial (hg) revision control tool, it now supports git as well.
  • GitLab: perhaps the most similar to GitHub.
  • SourceForge: one of the earliest free code-hosting sites.
  • Launchpad: built around the Bazaar (bzr) revision control tool.

<div class= exercise>

Exercise:

  1. Create a new public repository on GitHub.
  2. Create a new private repository on GitHub (you need to have [registered as a student](https://education.github.com/discount_requests/new) on GitHub to get private repositories).
  3. Clone that repository to your local machine using git clone <url>.
  4. Add the Python package you worked on on Tuesday to the local repository using
    • git add <files> and
    • git commit -m "<log message>".
  5. Push it back to GitHub.
  6. Look at the repositories created by another student.
  7. [Fork that repository](https://help.github.com/articles/fork-a-repo/) using the green fork button.

</div>

Organizations, Collaborators & Teams

GitHub public repositories can be searched and read by anyone, whether logged in or anonymous. Writing to a repository, or administering it (i.e. having control over deletion, renaming and write access), requires both authentication and explicit permission. Private repositories, meanwhile, can only be accessed by authenticated users with the proper permissions. Only paying (and educational) accounts can create new private repositories.

GitHub accounts can either be individual (i.e. personal) or for organizations (i.e. companies, project communities & formal groups). Any existing GitHub account can create and manage a new organization. However, only one individual account is allowed per username, and your interactions with GitHub are linked to your personal identity (i.e. to your individual account). Each code repository must exist under an account (whether an individual or an organization) and has a standard URL assigned to it:

https://github.com/<account_name>/<repository_name>
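
As a small illustration of that URL pattern, the following Python sketch builds and splits such addresses. The function names are hypothetical, and the parsing is deliberately naive: it assumes a plain https://github.com/ address with exactly two path components.

```python
from urllib.parse import urlparse


def repo_url(account_name, repository_name):
    """Build the standard web address of a GitHub repository."""
    return f"https://github.com/{account_name}/{repository_name}"


def split_repo_url(url):
    """Return (account_name, repository_name) from a repository URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 2:
        raise ValueError(f"not a repository URL: {url}")
    account_name, repository_name = parts
    # Clone URLs often carry a trailing ".git" that is not part of the name.
    if repository_name.endswith(".git"):
        repository_name = repository_name[:-4]
    return account_name, repository_name


print(repo_url("jrper", "my_new_repo"))
print(split_repo_url("https://github.com/jrper/my_new_repo"))
```

The same account name can appear in front of many repository names, which is why the pair of names, not the repository name alone, identifies a project.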

Adding other users to a repository

GitHub is based around collaboration, so it's natural to want to interact with repositories you don't own. The easiest way to grant permissions for another GitHub user is to add them as an external collaborator. This works for both individual repositories and for organizations.

Members of organizations can be assigned to subgroups called "teams", each of which can be given read, write or admin rights to the repositories that organization owns. This gives better mass controls for projects with large numbers of people. You will revisit the team structure when you start the mini-projects with Gareth Collins later this term.

<div class= exercise>

Exercises: git + GitHub revision

  1. Invite another student to collaborate on your private repository, and ask them to do the same for you.
  2. Clone the other student's repository to your local computer using git clone.
  3. Make, commit and push a change.
  4. Have a look at the updates in the web interface page for the repository.

</div>


A Python GitHub project

Extending our example project from Tuesday, a version of the project stored on GitHub might look like

.gitignore
.travis.yml
AUTHORS
CONTRIBUTING
docs/
    conf.py
    index.rst
LICENSE.txt
mycoolproject/
    __init__.py
    cool_module.py
    another_cool_module.py
    tests/
        test_mycoolproject.py
requirements.txt
README.md
setup.py

You will see several changes. The .gitignore file is just the same as that used for vanilla git, containing a list of patterns for files git shouldn't track in your working directory. These files aren't automatically listed with git status or added with git add -A:

.gitignore:

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/


# Sphinx documentation
docs/_build/
/docs/html
/docs/pdf

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
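
To get a feel for how the wildcard patterns in this list behave, here is a rough Python sketch using the standard library's `fnmatch` module. It is only an approximation of git's matching rules (directory patterns ending in "/", leading-slash anchoring and "!" negation are not handled), and the helper name `looks_ignored` is made up for this example.

```python
from fnmatch import fnmatchcase

# A few file patterns copied from the list above.
patterns = ["*.py[cod]", "*.so", "*.egg", "pip-log.txt", ".coverage.*"]


def looks_ignored(filename, patterns=patterns):
    """Return True if a bare file name matches any of the patterns.

    This only approximates git: directory patterns (ending in "/"),
    leading-slash anchoring and "!" negation are not handled.
    """
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


for name in ["cool_module.py", "cool_module.pyc", "fast_code.so", ".coverage.host"]:
    print(name, looks_ignored(name))
```

Note how `*.py[cod]` catches the byte-compiled `.pyc`, `.pyo` and `.pyd` files but leaves the `.py` source itself alone.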

The tests/ and the .travis.yml file will be covered in tomorrow's lecture on testing and continuous integration so we will not cover them in detail today.

However, let's briefly discuss the other new files.

README.md and Markdown

One of the most visible files on any GitHub project is the README.md file placed in the repository root directory (or in a docs/ or in a hidden .github/ directory). GitHub automatically runs this file through a Markdown preprocessor and displays the resulting HTML output on the main landing page of the project.

There exist numerous pages giving recommendations for good content, but at the very least you should address the following:

  • What does this code do?
  • How to make it work?
  • Who will use this code?

Describing what your code does (or will do) is naturally very important.

For a Python project, the usual answer to the "how?" question is to include pip installation instructions, whether from the PyPI repository, or directly from source downloaded from the GitHub project page.

"Who?" will depend on the content of the project. You may want to invite anyone to use your code, or you may be trying to solve a very specific problem, and know of another project to use for the general case.

Markdown

GitHub Flavoured Markdown (the easiest way to write your README.md file) is a markup language something like HTML, but with a very compact and human-readable syntax in its native form (in fact, quite similar to the reStructuredText used with Sphinx for Python documentation, which GitHub also supports). GFM is a GitHub extension (see their guide) to the popular Markdown language.

Markdown uses simple punctuation characters to indicate style and formatting in a plain text file. This makes it a lot less cluttered than the equivalent HTML markup file. For example, an HTML-formatted itemized list looks like

<ul>
    <li>Item one</li>
    <li>Item two</li>
    <li>Item three</li>
</ul>

Meanwhile the Markdown equivalent looks like

- Item one
- Item two
- Item three
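
To make the correspondence between the two forms concrete, the toy Python function below turns a flat "- item" list into the HTML shown earlier. It is a sketch for illustration only: real Markdown processors handle nesting, other bullet characters and much more, and the function name is hypothetical.

```python
def markdown_list_to_html(markdown_text):
    """Convert a flat "- item" Markdown list into an HTML <ul> block."""
    html_lines = ["<ul>"]
    for line in markdown_text.splitlines():
        line = line.strip()
        if line.startswith("- "):
            # Drop the "- " marker and wrap the remaining text.
            html_lines.append(f"    <li>{line[2:]}</li>")
    html_lines.append("</ul>")
    return "\n".join(html_lines)


markdown_text = "- Item one\n- Item two\n- Item three"
print(markdown_list_to_html(markdown_text))
```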

<div class= info>

In fact, Markdown is also the language used to create formatted text blocks inside Jupyter notebooks. If you look at some of the text in this notebook, as well as the previous ones in this course, you will see some more examples of ways to mark up your text.

</div>

As well as the full specification linked to above, there is also a quick cheat-sheet available for many of the more common commands.
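
Putting the earlier what/how/who questions together with these Markdown features, a README.md skeleton could be generated along the following lines. This is only a sketch: the function name is hypothetical, and the pip command assumes the package has been published under the name mycoolproject, which is just the example project name used in these notes.

```python
def readme_skeleton(title, description, package_name):
    """Return Markdown text answering the what/how/who questions."""
    lines = [
        f"# {title}",              # a top-level heading
        "",
        description,               # what does the code do?
        "",
        "## Installation",         # how do you make it work?
        "",
        f"    pip install {package_name}",  # indented text renders as code
        "",
        "## Who is this for?",     # who will use the code?
        "",
        "- Describe the intended users here.",
    ]
    return "\n".join(lines) + "\n"


print(readme_skeleton("My Cool Project", "Does cool things.", "mycoolproject"))
```

Writing the file by hand is just as good; the point is only that a few "#" and "-" characters are enough to give a README its structure.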

Exercise

Create and upload a README.md for your new public repository you created earlier. It should include:

  • A project title
  • A description of the project
  • installation instructions

You can add it either using the web interface (where you can preview your work) or by using git add, git commit and git push commands if you would like the practice.

In particular, try adding lists, links and headings.

Issue Tracking & Project Management

GitHub has limited support for bug tracking on repositories via its issues pages. A separate issues page exists for every repository you create. Issues could be serious ("I used your code and now I'm blind.") or minor ("There is a spelling mistake on the special screen which appears on the 29th of February"). The GitHub interface allows you to assign an "owner" to each one, who can then take charge of dealing with the problem as they see fit.

Compared to some other software on the market, the GitHub Issues interface is relatively generic and lightweight. The principal advantage it has is the integration with the code storage side of things. You can easily link to people and code branches by using GitHub Flavoured Markdown special formatting inside your issues or replies. In particular

  • People can be mentioned using the @ sign (just like Slack, e.g. @jrper).
  • Other issues and pull requests (more later) can be referred to with # (e.g. "See issue #7").
  • Commits can be referred to by their SHA-1 hash.
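
As a rough illustration of the first two conventions, the Python sketch below picks @mentions and #references out of a piece of text. The regular expressions are simplified assumptions for this example, not GitHub's actual parsing rules.

```python
import re

# "@name" not preceded by a word character (so e-mail addresses are skipped).
MENTION = re.compile(r"(?<!\w)@([A-Za-z0-9-]+)")
# "#123" not preceded by a word character.
ISSUE_REF = re.compile(r"(?<!\w)#(\d+)")


def find_references(text):
    """Return the user names and issue numbers mentioned in some text."""
    users = MENTION.findall(text)
    issues = [int(number) for number in ISSUE_REF.findall(text)]
    return users, issues


comment = "Thanks @jrper, this looks related to #7 and #12."
print(find_references(comment))
```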

There is also a (new and currently very limited) project management tool called project boards, allowing you to arrange issues and pull requests (see later) on a timeline of to do versus work in progress versus done. This can be useful to make sure all collaborators are aware of the current priorities and schedule, especially when people are working on different sites, or in different timezones.

<div class= exercise>

Exercise: Raise an issue

Create some issues on your project repository, and on another student's project repository. Try to include some Markdown in your issue.

Start a project board for the project. Organize the issues.

</div>

AUTHORS & CONTRIBUTING

An AUTHORS file is basically a credits (or blame) list for static versions of your project. The typical format is a list of the names of authors (in the contributor's own preferred format), one per line, with an optional email address/webpage as contact details following it within angled brackets.

AUTHORS

Ada Lovelace <ada@babbage.com>
Albert Einstein <a.einstein@princeton.edu>
Bill Gates
Grace Hopper <http://www.cs.yale.edu/homes/tap/Files/hopper-story.html>
Tan Jiazhen
Elon Musk <https://www.spacex.com/elon-musk>
Marie Curie
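
Because the format is so regular, it is easy to read by machine. The Python sketch below parses lines of the "Name <contact>" form described above; the function name and regular expression are assumptions made for this example, since there is no single official AUTHORS specification.

```python
import re

# A name, optionally followed by contact details in angle brackets.
AUTHOR_LINE = re.compile(r"^(?P<name>[^<]+?)\s*(?:<(?P<contact>[^>]+)>)?$")


def parse_authors(text):
    """Return a list of (name, contact) pairs; contact is None if absent."""
    authors = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = AUTHOR_LINE.match(line)
        if match:
            authors.append((match.group("name"), match.group("contact")))
    return authors


example = """Ada Lovelace <ada@babbage.com>
Bill Gates
"""
print(parse_authors(example))
```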

This is useful both for recognition of your collaborators' work, and as a starting point if you ever need to relicense your code (see later). It is perhaps less important than it used to be in the age of popular use of Revision Control/Version Control Software, but ensures that authors are credited, even when users receive software via routes other than GitHub.

The CONTRIBUTING file is a recommended addition coming out of the GitHub community. It documents the standards and procedures a project expects contributors to follow.

LICENSE.txt

The LICENSE.txt file is the GitHub standard place to put information dealing with the copyright status and software license under which a project is distributed. When starting a new repository, GitHub gives the option to include a LICENSE.txt matching several popular open source licences; otherwise, you are free to add your own.

<div class= warn>

Warning

I am not a lawyer! More specifically, I am not your lawyer. Lawyers spend a lot of money on insurance, so that they are safe to give specific legal advice without the fear of liability. While I will try to be as accurate as possible in the information provided here, don't plan on using these notes as a defence in court. </div>

The author, or commissioner (for work done "for hire" for an employer), of software code has certain property rights (called copyrights) to control the ability of other people to copy and distribute their work, just as the authors of a book or the producers of a film do. Depending on the jurisdictions involved, and the particulars of the case, breach of copyright may be either a civil matter (one person sues another for money or to make them stop doing something) or a criminal one (the State prosecutes an individual, possibly leading to imprisonment).

| Country          | UK      | EU      | USA     | China   | India   |
|------------------|---------|---------|---------|---------|---------|
| Copyright period | life+70 | life+70 | life+70 | life+50 | life+60 |

There are some exceptions to these time periods. In the UK, "where a work is made by Her Majesty or by an officer or servant of the Crown in the course of his duties" it is placed under Crown Copyright. New Crown copyright material that is unpublished has copyright protection for 125 years from date of creation. Published Crown copyright material has protection for 50 years from date of publication. Meanwhile the copyright to the play Peter Pan (which the author J. M. Barrie gifted to the Great Ormond Street children's hospital) is specifically legislated to last forever.

Although various methods exist to register the date at which works were created, there is now generally no need to do anything to copyright your work. Your rights exist automatically from the moment of creation (i.e. when you first wrote the code), and continue to exist unless you explicitly give them up, or until the legally mandated time has passed. In fact, in some jurisdictions (specifically some parts of the EU) authors are unable to opt out of their moral rights over their work.

For computer software specifically (a "literary work"), UK copyright laws allow creators to control the acts of:

  • copying,
  • adapting,
  • issuing (i.e distributing),
  • renting and lending copies to the public.

The specific relevant legislation is the UK Copyright, Designs and Patents Act 1988 (CDPA 1988) and the EU Directive 91/250/EEC (the Software Directive).

Since they aren't paid by the universities, students in the UK (even Ph.D. students) are not employees, and own the copyright on the code they write by default. On the other hand, work done by university staff as part of their job officially belongs to the university. If you write software for an employer while also writing your own code in your free time, it's important to keep the two activities separate. There have been legal cases (particularly in the USA) over copyright when people used work-owned resources (e.g. computers), or even [worked on the same topic](https://www.michalsons.com/blog/employer-v-employee-who-owns-that-copyright/1403), while developing their own code.

Software Patents

In the US in particular (but often not the EU) software can also be patented. This gives a non-trivial idea (an 'invention') additional protections for a limited period of time (often 20 years) against others making, using, selling or importing/exporting it. Unlike copyright, this doesn't apply to a specific implementation (expression) but to more general concepts (e.g. the one-click button to buy something, which Amazon held the US patent to until September 2017).

<div class= warn>

Plagiarism

Although the two are sometimes confused, academic plagiarism is a separate issue from copyright. While a copyright holder can give you permission to use or copy their work, that should not be assumed as permission to pass their work off as your own, which is never academically acceptable. In particular, when you use others' work during this course, you should provide proper attribution, regardless of the licence you obtained it under. Direct copy-pasting of code for assessed exercises is serious academic misconduct, and would have serious implications if discovered.

</div>

That's enough about copyright in general. Next we'll talk specifically about copyright for software, and the Open Source movement.

Free/Libre Open Source Software (FLOSS)

The word "free" in English has two main meanings

  1. Without cost : "Buy one, get one free!"
  2. Unrestrained : "They set the prisoners free."

The free software movement is aimed at encouraging software to be distributed under terms matching the second meaning.

Stallman's four freedoms:

  • Freedom 0 The freedom to run the program, for any purpose.
  • Freedom 1 The freedom to study how the program works, and change it so it does your computing as you wish. (Access to the source code is a precondition for this.)
  • Freedom 2 The freedom to redistribute copies so you can help your neighbor.
  • Freedom 3 The freedom to distribute copies of your modified versions to others. By doing this you can give the whole community a chance to benefit from your changes. (Access to the source code is a precondition for this.)

Licences grant permissions

As a copyright holder, you can always grant others the ability to use, copy and distribute your software. The easiest and simplest way to do this is to publish a licence together with your code. As a user & developer, ensuring that software you use has a licence with terms compatible with what you intend to do with it prevents long, costly and embarrassing legal action further down the line.

Although in theory you could always write your own licence, few scientists are also lawyers. Because legal text has legal meaning, it is always safer to use one of the well known and well understood existing licences.

The public domain

The "most free" thing you may be able to do with code (depending on the local legal system) is to release it into the public domain. This is the same state that literary works are left in after the legally mandated time has expired. At this point, anyone is free to use or reapply the material in any way they see fit.

This is not possible everywhere, since some legal systems (particularly the civil law practised in much of the EU) can make it practically impossible for authors to give up their "moral rights".

(Lack of) warranties

In many jurisdictions, especially those based on the English Common Law (including the USA), the transfer or sale of goods or services can carry an implied warranty that they are fit for the usual purpose the product would be put to. For example an item sold as a "child's high chair" would be expected to take the weight of a child without breaking.

Many FLOSS licences include specific wording attempting (as far as they can) to explicitly deny any such warranty. For example the MIT license states:

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Permissive licences versus "copyleft"

Many licences, while retaining copyright over the work and not releasing it into the public domain, otherwise give users relatively unrestricted rights to copy, modify and distribute the code. In particular, they allow the code to be used (often with attribution) as part of greater works released under more restrictive licences (for example, ones which prohibit distributing your own copy of the source for the larger project, or the modified version of the existing code). These are often called "permissive" licences.

On the other hand, a set of licences modelled after the GNU General Public License are intended to ensure that once software is released as "free software", it stays as "free software". As such, they place restrictions on the immediate recipient of the work, in order to ensure that people later down the chain retain their version of the four freedoms:

  • the freedom to use the software for any purpose,
  • the freedom to change the software to suit your needs,
  • the freedom to share the software with your friends and neighbors, and
  • the freedom to share the changes you make.

Specifically, the various versions of the GPL all require that when modified versions of GPL'd projects are distributed, the new version is placed under a GPL licence (e.g. the distributors must also release the source code on demand, and allow other users the right to modify and distribute it). This "carry forward" operation has caused such licences to be called "copyleft" (a play on words from "copyright").

Strong versus weak copyleft.

Various bodies, including the Free Software Foundation, the organization behind GNU, have recognised that software is seldom used in isolation. One component interacts with another component, which calls a third component etc. With a "strong" copyleft licence such as the GPL, this requires every piece of code in the ecosystem to also be copyleft. In most practical environments, this is impossible to ensure past a given size, since some components (e.g. the "binary blob" provided to run your graphics card) are likely to be provided under a permissive open source or proprietary commercial licence.

As such, a second class of "weak" copyleft licences, such as the GNU Lesser General Public License (LGPL), allow their code to be linked to (i.e. called in an automated sense) in derivative works by code not under an (L)GPL licence. Specifically, if the code is called or used as a library then no restriction is implied, but if the code of the library itself is modified then the standard restrictions still apply. The word "lesser" is used in terms of the rights of a theoretical third party user, who may no longer be guaranteed the right to modify the code that links to the original library.

Licence compatibility

Because "copyleft" licences require derivative works to also be released under suitable "copyleft" licences, it is impossible to release packages containing GPL components entirely under more permissive licences such as BSD.

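| Package licence | BSD component | LGPL component | GPL component |
|-----------------|---------------|----------------|---------------|
| BSD             | Yes           | No             | No            |
| LGPL            | Yes           | Yes            | No            |
| GPL             | Yes           | Yes            | Yes           |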
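
The compatibility table can also be encoded as a small lookup, sketched below in Python. It assumes that each row gives the licence of the released package and each column the licence of a component it contains, which is the reading suggested by the text above. The names `CAN_INCLUDE` and `allowed_package_licences` are invented for this example, and the result is an illustration rather than legal advice.

```python
# Rows: licence of the released package; columns: licence of a component.
# This reading of the table is an assumption based on the surrounding text.
CAN_INCLUDE = {
    "BSD": {"BSD": True, "LGPL": False, "GPL": False},
    "LGPL": {"BSD": True, "LGPL": True, "GPL": False},
    "GPL": {"BSD": True, "LGPL": True, "GPL": True},
}


def allowed_package_licences(component_licences):
    """List the package licences compatible with every component."""
    return [
        package
        for package, row in CAN_INCLUDE.items()
        if all(row[component] for component in component_licences)
    ]


print(allowed_package_licences(["BSD", "LGPL"]))
print(allowed_package_licences(["BSD", "GPL"]))
```

As soon as a single GPL component is included, GPL is the only option left for the package as a whole.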

Commercial rights

Some licences make a distinction between "commercial" and "non-commercial" uses. In particular the work may be freely licenced for non-commercial use, with the right reserved to charge a fee for commercial use. In general "commercial use" can be interpreted fairly broadly as related to income-generating use of any kind, whether direct or indirect. This means that for code under a non-commercial license, not only should you not sell the work itself, you probably shouldn't use it in a way that earns money.

Fortunately, academic study and pure research uses are frequently specifically excluded as non-commercial activities, avoiding the awkward question of "the code lets me do my research, for which a funding body pays me, is that commercial?". However, this can be an issue when the intellectual property (IP) produced at the end of a project contractually becomes the property of an industrial partner. Many organizations (including Imperial College) have lawyers on retainer to deal with this kind of question.

Choosing a licence.

The best time to choose a licence for a project is right at the beginning. Provided all the copyright holders agree, a project can always be relicenced, but collecting this agreement becomes increasingly difficult as time passes and contact details and levels of interest change. As such, the bigger and older a project is, the harder switching a licence becomes.

You may be tempted to create your own licence. For example, you may want to exclude a certain group of people (e.g. weapons manufacturers or animal rights activists) from using your code. Alternatively, you may not like one specific element in an existing licence and want to use a licence which is otherwise the same, with that one thing changed.

Unless you have a very good reason, don't do this. Even if you do have a good reason, you probably shouldn't do this. The popular existing licences are the ones which have been tested and refined in law courts, and increasing the number of licences vastly increases the potential licence incompatibility problems.

Unlicence

Apache

MIT

GPL

BSD/MIT vs GPL

This is more an ethical than a technical question. The "copyleft" licenses ensure that modified versions of your code stay publicly available, at the price of removing some options for how your immediate users can apply your code.

The GitHub flow

Now that you've created a repository with a cool name, a sensible licence and some rules for how to contribute, you need to get round to actually writing some code. When adding new features, it is very easy to break existing ones, either through deliberate acts of evil (these are rare), or accidentally (this is really common). There are a couple of techniques to minimise this potential damage. One of them is to test your code, preferably automatically (we'll revisit this tomorrow). Another is to make use of the GitHub collaboration features to always practise code review, so that no new code is placed into the "production" system until someone other than the original author has examined it.

The GitHub flow can be summarised as:

  1. New work is written in git branches or under a GitHub fork of the original repository.
    git fetch
    git checkout -b feature_branch origin/master
    # do some work
    git add my_new_file.py my_old_file.py
    git commit -m "Add the ability to frotz your foobar."
    # repeat as necessary.
    git push --set-upstream origin feature_branch
    
  2. These branches are kept up-to-date with the main GitHub production branch.
    git fetch
    git merge origin/master
    ## if there are conflicts
    git mergetool
    git commit
    ### otherwise carry on working
    git push
    
  3. When a "unit" of work is finished, the work is git pushed to GitHub (in the branch/fork) and a pull request is opened to inform collaborators with sufficient permission that there is new work to examine.
  4. Another member of the team/organization looks at the new code and checks that:
    • it does not appear malicious.
    • it matches the project code and community standards.
    • it does what it is described as doing
    • the code is tested suitably, and doesn't break anything
  5. If these standards are met, the change is merged into the target branch (often the project master), otherwise the author of the new work is requested to make changes until they are.
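
The key rule in this flow, that nothing is merged until someone other than the author has looked at it, can be captured in a toy Python model. Everything here (the `PullRequest` class, its fields and its methods) is hypothetical and much simpler than what GitHub actually does. It is only meant to make the logic of steps 3 to 5 explicit.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """A hypothetical, much simplified model of a pull request."""
    author: str
    branch: str
    approvals: set = field(default_factory=set)

    def approve(self, reviewer):
        # Authors cannot sign off their own work.
        if reviewer == self.author:
            raise ValueError("a pull request needs an independent reviewer")
        self.approvals.add(reviewer)

    def can_merge(self):
        """True once at least one other person has approved the change."""
        return len(self.approvals) > 0


request = PullRequest(author="alice", branch="feature_branch")
print(request.can_merge())   # no reviews yet
request.approve("bob")
print(request.can_merge())   # an independent reviewer has approved
```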

<div class= exercise>

Exercise:

  1. Upload a file to a fellow student's repository in a branch.
  2. Raise a pull request for it.
  3. Try and include some Markdown in your pull request message.
</div>

Doing good code reviews:

Code reviews are/can be hard work. Doing good code reviews is even harder. There are several things to keep in mind:

  • What does the pull request claim to do? Are you sure it does it?
  • Is new code placed somewhere you would expect it to be?
  • Are there suitable tests, examples and documentation for new features?
  • What are the project coding standards? Does anything in the submission break them? As much as possible this should be addressed by code linting and automated code testing.

Pair Programming

One method to both practise critiquing code and to improve the standard of code written is pair programming. Here two coders sit together at one computer, planning to write code to solve a specific problem. One programmer 'drives' by controlling the keyboard and mouse, while the other 'navigates' by watching the screen.

As the driver:

  • Concentrate on the lowest level, the "words on the page" of variables etc.
  • Tell the navigator what you're doing and why.
  • Keep typing.

As the navigator:

  • Concentrate on the higher level strategy, "what functions/loops do we need?".
  • Point out mistakes being made, or coming up.
  • Point out alternatives you know about.
  • Ask questions, say what you know.
  • Look up relevant information
  • Don't check your emails!

Above all, keep on topic and keep talking.

There are lots of successful pairing patterns:

  • Driver - Instructor: when the navigator is generally more experienced.
  • Demonstrator - Student: when the driver is generally more experienced.
  • Collaborative Learning: when both driver and navigator are novices.
  • Meeting of minds: when both driver and navigator are experts.

    One final note: don't try group coding with many more than two people.

    "A camel is a horse designed by a committee." attributed to Sir Alec Issigonis

<div class= exercise>

Exercise: Pair Programming

Find a partner and prepare to try paired programming to write some Python. There are several exercises, so be the driver at some points and the navigator at others. It's ok if you don't get the entire code finished, concentrate on the interaction with your partner.

  1. Write a script to find all the proper divisors of an integer and so find pairs of [amicable numbers](https://en.wikipedia.org/wiki/Amicable_numbers). The proper divisors of a number are the numbers other than itself which divide it exactly (e.g. the proper divisors of 6 are 1, 2, and 3). Two numbers $n$ and $m$ are an amicable pair if the sum of the proper divisors of $n$ is $m$ and the sum of the proper divisors of $m$ is $n$. Since the sum of the proper divisors of $6$ is $6$, it is a special kind of amicable number called a [perfect number](https://en.wikipedia.org/wiki/Perfect_number).

    For testing, the first amicable pairs are (220, 284) and (1184, 1210).

    Tips:

    • You saw on Tuesday how to find the prime factors of a number, but you need to extend this to get every factor.
    • You should only use each factor once when summing. There is a Python datatype which only stores unique entries.

  2. Write a script to list the names of the numbers from 0 to 100, as strings, in alphabetical order.

    For testing purposes, an acceptable answer for the numbers from 0 up to 5 is

    ['five', 'four', 'one', 'three', 'two', 'zero']

    Tips:

    • In theory you could do this all by hand, but loops will mean writing less code, and will work better when someone asks you to do it from 0-1,000,000
    • The numbers from 0-19 are irregular, but from 21-99 you can often form a string like '%s-%s' % (tens[2], ones[1]), e.g. 'twenty-one'.
    • Dictionaries let you look up a string from a number, see for example the card name code which you PEP8 cleaned on Tuesday.
  3. Write a script to turn a `.png` image upside down. Add a linear gradient in transparency from bottom to top.

    Tips:

    • `matplotlib.pyplot` has a function `imread`, which can read `.png` files into an array.
    • The [RGBA colourspace](https://en.wikipedia.org/wiki/RGBA_color_space) array format stores the [alpha channel](https://en.wikipedia.org/wiki/Alpha_compositing) value in the last index of the last dimension of the data.

Model answers are available for the [amicable number](https://msc-acse.github.io/ACSE-1/lectures/lecture7-solutions.html#exercise1), [alphabetized numbers](https://msc-acse.github.io/ACSE-1/lectures/lecture9-solutions.html#exercise2) and [inverted image](https://msc-acse.github.io/ACSE-1/lectures/lecture9-solutions.html#exercise3) problems.

</div>

<div class= interlude>

Interlude: Fast Python

One of the biggest criticisms made against the Python programming language is that it is slow. Compared to compiled languages, this is often true. However, not all Python operations happen at the same speed, so that often the biggest reason for code to be slow is that you are doing slow things. In particular, Python loops are known to take a relatively long time to execute. Where possible, vectorize your code using tools like numpy.

</div>

In [11]:
import numpy as np

l = [int(1000000*np.random.random()) for _ in range(100000)]
s = set(l)

print("Time to search list")
%time 500 in l
print("Time to search set")
%time 500 in s
Time to search list
CPU times: user 3.74 ms, sys: 43 µs, total: 3.79 ms
Wall time: 3.81 ms
Time to search set
CPU times: user 7 µs, sys: 1 µs, total: 8 µs
Wall time: 13.1 µs
Out[11]:
False
In [21]:
import numpy as np

N = 1000
a = np.random.random(N)
b = np.random.random(N)

def crude_sum(x, y):
    c = np.empty(x.shape)
    for i, s in enumerate(zip(x, y)):
        c[i] = s[0]+s[1]
    return c

def numpy_sum(x, y):
    return x+y

%time c = crude_sum(a, b)
%time c = numpy_sum(a, b)
CPU times: user 586 µs, sys: 32 µs, total: 618 µs
Wall time: 801 µs
CPU times: user 33 µs, sys: 5 µs, total: 38 µs
Wall time: 41.2 µs
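
The `%time` magic used in these cells only works inside IPython or Jupyter. As a hedged alternative for plain Python scripts, the sketch below repeats the list-versus-set comparison with the standard library's `timeit` module. The variable names and the repeat count of 100 are arbitrary choices for this example, and the absolute timings will differ from machine to machine.

```python
import random
import timeit

random.seed(0)  # fixed seed so repeated runs search the same data
values = [random.randrange(1_000_000) for _ in range(100_000)]
unique_values = set(values)

# Both containers give the same answer; only the lookup cost differs.
list_time = timeit.timeit(lambda: 500 in values, number=100)
set_time = timeit.timeit(lambda: 500 in unique_values, number=100)

print(f"list: {list_time:.6f} s for 100 searches")
print(f"set:  {set_time:.6f} s for 100 searches")
```

A list has to be scanned element by element, while a set uses hashing to jump almost directly to the answer, which is why the second timing is usually far smaller.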

GitHub pages & wikis

GitHub provides every repository with free webspace and a free wiki site. These spaces can be used to give detailed information that it would be inappropriate to place in a short README.md.

The GitHub pages interface uses software called Jekyll to convert your gh-pages branch into a static website. This means that you can rapidly add new pages just by uploading new files containing Markdown-formatted text. In particular, you can make your own personalized blog page with almost no work (apart from writing the content).

<div class= info>

In this lecture we learned:

  • The basics of Markdown
  • To recognise some standard code licences
  • The fundamentals of the GitHub flow
  • Code review and pair programming.

</div>

Further reading:

In [1]:
# This cell sets the css styles for the rest of the notebook.
# Unless you are particularly interested in that kind of thing, you can safely ignore it

from IPython.display import HTML

def css_styling():
    styles = """<style>
div.warn {    
    background-color: #fcf2f2;
    border-color: #dFb5b4;
    border-left: 5px solid #dfb5b4;
    padding: 0.5em;
    }

div.exercise {    
    background-color: #B0E0E6;
    border-color: #B0E0E6;
    border-left: 5px solid #1E90FF;
    padding: 0.5em;
    }

div.info {    
    background-color: #F5F5DC;
    border-color: #F5F5DC;
    border-left: 5px solid #DAA520;
    padding: 0.5em;
    }

div.interlude {    
    background-color: #E6E6FA;
    border-color: #E6E6FA;
    border-left: 5px solid #4B0082;
    padding: 0.5em;
    }
    
div.assessment {    
    background-color: #98FB98;
    border-color: #228B22;
    border-left: 5px solid #228B22;
    padding: 0.5em;
    }

 </style>
"""
    return HTML(styles)
css_styling()
Out[1]: