Package management
- xarray and iris are the core Python libraries used in the atmosphere and ocean sciences.
- Use conda to install and manage your Python environments.
Data processing and visualisation
- Libraries such as xarray can make loading, processing and visualising netCDF data much easier.
- The cmocean library contains colormaps custom made for the ocean sciences.
Functions
- Define a function using `def name(...params...)`.
- The body of a function must be indented.
- Call a function using `name(...values...)`.
- Use `help(thing)` to view help for something.
- Put docstrings in functions to provide help for that function.
- Specify default values for parameters when defining a function using `name=value` in the parameter list.
- The readability of your code can be greatly enhanced by using numerous short functions.
- Write (and import) modules to avoid code duplication.
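As a minimal sketch of the function points above, the example below defines a short function with an indented body, a docstring, and a default parameter value. The function name `kelvin_to_celsius` and its `offset` parameter are hypothetical and chosen only for illustration.

```python
def kelvin_to_celsius(temperature, offset=273.15):
    """Convert a temperature from Kelvin to degrees Celsius.

    Args:
      temperature (float): Temperature in Kelvin.
      offset (float): Kelvin value of 0 degrees Celsius (default 273.15).

    Returns:
      float: Temperature in degrees Celsius.
    """
    # The indented lines form the body of the function
    return temperature - offset


# Call the function; offset is not given, so its default value is used
freezing = kelvin_to_celsius(273.15)

# Override the default by passing name=value in the call
shifted = kelvin_to_celsius(300.0, offset=273.0)
```

Running `help(kelvin_to_celsius)` in an interactive session would display the docstring written above.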
Command line programs
- Libraries such as `argparse` can be used to efficiently handle command line arguments.
- Most Python scripts have a similar structure that can be used as a template.
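One possible shape for such a script template is sketched below using the standard library `argparse` module. The argument names (`infile`, `outfile`, `--cmap`), the file names, and the body of `main` are assumptions made for illustration, not the lesson's own script.

```python
import argparse


def parse_arguments(argv=None):
    """Parse command line arguments (argv=None means read sys.argv)."""
    parser = argparse.ArgumentParser(description="Plot a data file.")
    parser.add_argument("infile", type=str, help="Input file name")
    parser.add_argument("outfile", type=str, help="Output file name")
    # Hypothetical optional argument with a default value
    parser.add_argument("--cmap", type=str, default="viridis",
                        help="Name of the colormap to use")
    return parser.parse_args(argv)


def main(args):
    """Run the program (here it only reports what it would do)."""
    message = f"Would plot {args.infile} to {args.outfile} using {args.cmap}"
    print(message)
    return message


# In a real script the final lines would typically be:
#     if __name__ == "__main__":
#         main(parse_arguments())
# An explicit argument list is passed here so the sketch runs anywhere.
args = parse_arguments(["input.nc", "output.png", "--cmap", "plasma"])
summary = main(args)
```

Keeping argument parsing separate from `main` means the same functions can be imported and reused from another module or a notebook.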
Version control
- Use `git config` to configure a user name, email address, editor, and other preferences once per machine.
- `git init` initializes a repository.
- `git status` shows the status of a repository.
- Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).
- `git add` puts files in the staging area.
- `git commit` saves the staged content as a new commit in the local repository.
- Always write a log message when committing changes.
- `git diff` displays differences between commits.
- `git restore` recovers old versions of files.
GitHub
- A local Git repository can be connected to one or more remote repositories.
- You can use the SSH protocol to connect to remote repositories.
- `git push` copies changes from a local repository to a remote repository.
- `git pull` copies changes from a remote repository to a local repository.
Vectorisation
- For large arrays, looping over each element can be slow in high-level languages like Python.
- Vectorised operations can be used to avoid looping over array elements.
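To illustrate the difference, the sketch below performs the same calculation twice with NumPy: once with an explicit loop over elements and once as a single vectorised operation. The function names and the made-up temperature data are assumptions for illustration only.

```python
import numpy as np


def to_celsius_loop(kelvin):
    """Convert element by element using an explicit Python loop."""
    celsius = np.empty_like(kelvin)
    for index in range(kelvin.size):
        celsius[index] = kelvin[index] - 273.15
    return celsius


def to_celsius_vectorised(kelvin):
    """Convert the whole array with one vectorised operation."""
    return kelvin - 273.15


# Made-up one-dimensional data in Kelvin
temperatures = np.linspace(250.0, 310.0, 1000)

looped = to_celsius_loop(temperatures)
vectorised = to_celsius_vectorised(temperatures)
```

Both functions return the same values; the vectorised version hands the loop to NumPy's compiled code, which is generally where the speed-up on large arrays comes from.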
Defensive programming
- Program defensively, i.e., assume that errors are going to arise, and write code to detect them when they do.
- You can raise exceptions in your own code.
- Put try-except blocks in programs to catch and handle exceptions.
- Put assertions in programs to check their state as they run.
- Use a logging framework instead of `print` statements to report program activity.
- There are more advanced lessons you can read to learn about code testing.
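The defensive techniques listed above can be combined in a few lines, as in the sketch below. It uses only the standard library; the function `mean_precipitation`, its checks, and its messages are hypothetical examples rather than code from the lesson.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def mean_precipitation(values):
    """Return the mean of a list of precipitation values."""
    if len(values) == 0:
        # Raise an exception when the input cannot be processed
        raise ValueError("values must not be empty")
    # Use an assertion to check the state of the program as it runs
    assert all(value >= 0 for value in values), "precipitation cannot be negative"
    result = sum(values) / len(values)
    # Report activity through the logging framework instead of print
    logger.info("Computed the mean of %d values", len(values))
    return result


# Catch and handle the exception so the program can continue
try:
    outcome = mean_precipitation([])
except ValueError as error:
    logger.warning("Skipping calculation: %s", error)
    outcome = None
```

A call such as `mean_precipitation([1.0, 2.0, 3.0])` logs one message at the INFO level and returns the mean.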
Data provenance
- It is possible (in only a few lines of code) to record the provenance of a data file or image.
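As one hedged illustration of how little code this can take, the sketch below builds a single-line record of when and how a program was run, using only the standard library. The helper `new_history_entry`, the entry format, and the file names are assumptions for illustration; dedicated provenance libraries may do this differently.

```python
import datetime
import sys


def new_history_entry(argv=None):
    """Return a one-line record of when and how a program was run."""
    if argv is None:
        argv = sys.argv  # the command line used to start the program
    time_stamp = datetime.datetime.now().strftime("%a %b %d %H:%M:%S %Y")
    command = " ".join(argv)
    return f"{time_stamp}: {sys.executable} {command}"


# Hypothetical script and file names, passed explicitly for the example
entry = new_history_entry(["plot_precipitation.py", "input.nc", "output.png"])
```

An entry like this could then be stored with the output, for example in a global attribute of a netCDF file or in the metadata of an image.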
Large data
- Libraries such as dask and xarray can make loading, processing and visualising netCDF data much easier.
- Dask can speed up processing through parallelism, but care may be needed, particularly with data chunking.