Package management


  • xarray and iris are the core Python libraries used in the atmosphere and ocean sciences.
  • Use conda to install and manage your Python environments.

Data processing and visualisation


  • Libraries such as xarray can make loading, processing and visualising netCDF data much easier.
  • The cmocean library contains colormaps custom made for the ocean sciences.

Functions


  • Define a function using def name(...params...).
  • The body of a function must be indented.
  • Call a function using name(...values...).
  • Use help(thing) to view help for something.
  • Put docstrings in functions to provide help for that function.
  • Specify default values for parameters when defining a function using name=value in the parameter list.
  • The readability of your code can be greatly enhanced by using numerous short functions.
  • Write (and import) modules to avoid code duplication.
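
The points above can be illustrated with a minimal sketch. The function name celsius_to_kelvin and its offset parameter are hypothetical, chosen only for illustration.

```python
def celsius_to_kelvin(temp_c, offset=273.15):
    """Convert a temperature from degrees Celsius to kelvin.

    Args:
        temp_c (float): Temperature in degrees Celsius.
        offset (float): Value added to convert to kelvin (default 273.15).

    Returns:
        float: Temperature in kelvin.
    """
    # The indented lines form the body of the function
    return temp_c + offset


# Call the function, relying on the default value for offset
freezing_k = celsius_to_kelvin(0.0)

# Override the default using name=value
rounded_k = celsius_to_kelvin(0.0, offset=273)

# help(celsius_to_kelvin) would display the docstring written above
```

Saving a function like this in a module file and importing it from other scripts is one way to avoid copying the same code into several places.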

Command line programs


  • Libraries such as argparse can be used to efficiently handle command line arguments.
  • Most Python scripts have a similar structure that can be used as a template.
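
One possible script template is sketched below, assuming a hypothetical plotting script. The argument names (infile, outfile, --month) and their defaults are illustrative assumptions, and the positional arguments are given defaults only so that the sketch runs without any arguments.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments (argv=None means read sys.argv)."""
    parser = argparse.ArgumentParser(
        description="Plot a variable from a data file."
    )
    # Hypothetical arguments; nargs="?" makes each positional optional
    parser.add_argument("infile", nargs="?", default="input.nc",
                        help="input file name")
    parser.add_argument("outfile", nargs="?", default="output.png",
                        help="output file name")
    parser.add_argument("--month", type=int, default=1,
                        choices=range(1, 13), help="month to plot (1-12)")
    return parser.parse_args(argv)


def main(args):
    """Run the program using the parsed arguments."""
    # A real script would load, process and plot the data here
    message = (
        f"Would read {args.infile} for month {args.month} "
        f"and write {args.outfile}"
    )
    print(message)
    return message


if __name__ == "__main__":
    main(parse_args())
```

Keeping the argument parsing separate from main means the same functions can be imported and called from another script or a notebook.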

Version control


  • Use git config to configure a user name, email address, editor, and other preferences once per machine.
  • git init initializes a repository.
  • git status shows the status of a repository.
  • Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).
  • git add puts files in the staging area.
  • git commit saves the staged content as a new commit in the local repository.
  • Always write a log message when committing changes.
  • git diff displays differences between commits.
  • git restore recovers old versions of files.

GitHub


  • A local Git repository can be connected to one or more remote repositories.
  • You can use the SSH protocol to connect to remote repositories.
  • git push copies changes from a local repository to a remote repository.
  • git pull copies changes from a remote repository to a local repository.

Vectorisation


  • For large arrays, looping over each element can be slow in high-level languages like Python.
  • Vectorised operations can be used to avoid looping over array elements.

Defensive programming


  • Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.
  • You can raise exceptions in your own code.
  • Put try-except blocks in programs to catch and handle exceptions.
  • Put assertions in programs to check their state as they run.
  • Use a logging framework instead of print statements to report program activity.
  • There are more advanced lessons you can read to learn about code testing.
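
The techniques listed above can be combined as in the following minimal sketch. The function names mean_precip and safe_mean_precip are hypothetical, as is the assumption that precipitation values should never average to a negative number.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def mean_precip(values):
    """Return the mean of a sequence of precipitation values."""
    # Raise an exception when the input cannot be processed
    if len(values) == 0:
        raise ValueError("values must not be empty")
    result = sum(values) / len(values)
    # Assertion checks the program state as it runs
    assert result >= 0, "mean precipitation should not be negative"
    return result


def safe_mean_precip(values):
    """Return the mean, or None if it could not be computed."""
    try:
        result = mean_precip(values)
    except ValueError as err:
        # Report the problem through the logger instead of print
        logger.warning("Could not compute mean: %s", err)
        return None
    logger.info("Mean precipitation: %.2f", result)
    return result
```

Calling safe_mean_precip([]) logs a warning and returns None, whereas calling mean_precip([]) directly lets the ValueError propagate to the caller.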

Data provenance


  • It is possible (in only a few lines of code) to record the provenance of a data file or image.
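
A rough sketch of what such a record might contain is shown below, using only the standard library. The function name make_provenance_record and the record layout (time stamp, Python executable, command line) are assumptions made for illustration, not a prescribed format.

```python
import datetime
import sys


def make_provenance_record(argv=None):
    """Build a one-line record of when and how a script was run."""
    if argv is None:
        argv = sys.argv
    # Time stamp for when the record was created
    timestamp = datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S")
    # The command line entry that produced the output
    command = " ".join(argv)
    return f"{timestamp}: {sys.executable} {command}"


# Hypothetical command line, passed explicitly for demonstration
record = make_provenance_record(["plot.py", "in.nc", "out.png"])
```

A string like this could then be written to the metadata of the output file, for example a global attribute of a netCDF file, so that the record travels with the data.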

Large data


  • Libraries such as dask and xarray can make loading, processing and visualising netCDF data much easier.
  • Dask can speed up processing through parallelism but care may be needed particularly with data chunking.