Using git to edit and publish code#

Andrew Delman, 2024-09-29

For newcomers to GitHub, this tutorial covers some of the basic git commands that you can use to manage code changes in a given repository and publish those changes. To illustrate this process, we will make some “test” changes to the example tutorial notebook. For further reference, the Atlassian Git tutorial is a good resource.

Getting ready to make changes to a repo
Making changes in a git repo
Updating your repo with changes from upstream
Sharing your changes with the upstream repo

Getting ready to make changes to a repo#

The following schematic illustrates the steps that a user might take before making code changes that will be contributed back to the shared remote repository.

Getting ready to write and edit

In the previous tutorial the setup included the first two steps: (1) forking the repo so that you have an individually-owned repo on which to base your changes, and (2) cloning that repo so you have it on your local machine.

Creating a branch#

When writing or revising code (including Jupyter notebooks), it is a good practice to first create a branch. Each branch is a place where you make a coherent set of changes to a repo (e.g., writing a new notebook, adding a new feature to your code). This helps organize contributions when they are merged into the remote repo. Let’s open a terminal window in JupyterHub and navigate to the ecco-2024 directory/repo: cd ~/ecco-2024. We can make sure we are in the ecco-2024 repo by running the command:

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status

On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

This confirms that you are in the main branch of the ecco-2024 repository, and any changes you make will be implemented in that branch. But now we create a “topic” branch called test_changes:

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git branch test_changes
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git branch -a
  git_tutorial
* main
  test_changes
  remotes/origin/HEAD -> origin/main
  remotes/origin/main

The last command git branch -a shows all the branches that git is aware of for this repository, on both local and remote repos. We can see the new branch created in this list.

Note that a quick check of git status shows that you are not in the new branch yet, and any changes you make will still be associated with the main branch. So let’s move to the new branch using git checkout.

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch main
Your branch is ahead of 'origin/main' by 2 commits.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout test_changes
Switched to branch 'test_changes'
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch test_changes
nothing to commit, working tree clean

We have confirmed that we are now working in the test_changes branch.
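The two steps above (create the branch, then check it out) can also be combined with git checkout -b. The following is a minimal sketch run in a throwaway repository so that your real ecco-2024 clone is untouched; the sandbox path and the placeholder user name and email are assumptions made for illustration only.

```shell
# Throwaway repository (hypothetical sandbox, not the ecco-2024 repo).
sandbox=$(mktemp -d)
git init -q "$sandbox"
cd "$sandbox"
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example User"        # placeholder identity, local to the sandbox
git config user.email "user@example.com"
git commit -q --allow-empty -m "initial commit"

# Create the topic branch and switch to it in a single command.
git checkout -q -b test_changes

# Print the name of the branch we are on now.
git rev-parse --abbrev-ref HEAD
```

The last command should print test_changes, which is the same information that git status reports in its first line.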

[!NOTE] It’s a good idea to check git status frequently while working in your repo, and definitely before you add or commit any changes you have made. git status will quickly tell you what branch you are working in and which files have changes pending, or are not yet tracked by git.

Making changes in a git repo#

Change the example notebook#

Now we’re going to make a small change to the example/tutorial-notebook.ipynb notebook. In the JupyterHub left sidebar, navigate to the ecco-2024/book/tutorials/example folder if you are not there already, and double-click on tutorial-notebook.ipynb to open it.

Scroll down to the cell with bbox = [-108.3, 39.2, -107.8, 38.8] and change the bounding box limits to bbox = [-118.2, 34.2, -118.1, 34.1] so that the box is more or less centered on Pasadena, CA. Run the notebook at least through the Interactive visualization section to confirm this change. Save your changes (using Ctrl-S or equivalent).

Move between branches#

Moving around your local repo

A check of git status in the terminal window shows that tutorial-notebook.ipynb has indeed been modified in your working directory, but these changes have not been staged or committed.

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch test_changes
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   book/tutorials/example/tutorial-notebook.ipynb

no changes added to commit (use "git add" and/or "git commit -a")

To put it another way, the changes exist only in your working directory; the latest commit on the current branch (which the HEAD “reference pointer” points to) still contains the notebook as it was before the edit. Uncommitted changes are not tied to any branch: if we move back to main or another branch, git will either carry them along into that branch or refuse to switch if they would be overwritten. So before we leave this branch we can “stash” our working directory changes, setting them aside so we can restore them when we return.

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git stash
Saved working directory and index state WIP on test_changes: bdc8fed Merge remote-tracking branch 'origin/main'

The state of the working directory has been reset to HEAD, but the changes you were just working on have been stored. Now it is safe to leave the branch (you can check git status to be sure). Let’s go back to the main branch:

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout main
Switched to branch 'main'
Your branch is ahead of 'origin/main' by 2 commits.
  (use "git push" to publish your local commits)

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch main
Your branch is ahead of 'origin/main' by 2 commits.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

You can see that the main branch doesn’t have any of the changes we made on the other branch. You can open up tutorial-notebook.ipynb to verify this.

Now return to the test_changes branch:

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout test_changes
Switched to branch 'test_changes'
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch test_changes
nothing to commit, working tree clean

Let’s restore the stashed changes to our working directory:

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git stash pop
On branch test_changes
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   book/tutorials/example/tutorial-notebook.ipynb

no changes added to commit (use "git add" and/or "git commit -a")
Dropped refs/stash@{0} (69b55857c19738e45c134e447b19febaf22aa970)

And now we can see our changes again.
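The stash round trip above can be rehearsed safely in a throwaway repository. This is only a sketch: the sandbox, the file name notes.txt, and the placeholder identity are hypothetical stand-ins for the real repo and notebook.

```shell
# Throwaway repository with one committed file (all names are hypothetical).
sandbox=$(mktemp -d)
git init -q "$sandbox"
cd "$sandbox"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "bbox = [-108.3, 39.2, -107.8, 38.8]" > notes.txt
git add notes.txt
git commit -q -m "add notes"
git checkout -q -b test_changes

# Edit the tracked file without committing, then set the edit aside.
echo "bbox = [-118.2, 34.2, -118.1, 34.1]" > notes.txt
git stash
git stash list            # lists the stashed work in progress

# Visit main and come back; the working tree is clean in between.
git checkout -q main
cat notes.txt             # shows the committed (original) line
git checkout -q test_changes

# Reapply the stashed edit and drop it from the stash list.
git stash pop
cat notes.txt             # shows the edited line again
```

If you want to keep the stash entry after reapplying it, git stash apply does the same restore without dropping the entry.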

Updating your repo with changes from upstream#

Updating your repos

While you are working, other contributors may be merging their changes into the shared upstream repo. If you and other contributor(s) are working on the same file, then you may have a merge conflict between the two sets of changes. It’s best to update your local repos from the remote before pushing your own changes; if you do have a merge conflict, it is easier to work out locally.

Updating main with upstream changes#

Here we demonstrate three ways of updating your local repos with remote changes:

  • git pull

  • git fetch and then git merge

  • git fetch and then git reset --hard

The first two methods are essentially the same. While the first method is simpler, the second can be useful if you want to “fetch” a record of the changes to your local machine first, and then merge later (e.g., when you may not have an Internet connection). The last method is different in that it will overwrite the local branch with the remote; this can result in cleaner updates but local work can potentially be lost, so use with caution.

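Note: git pull and git merge will both attempt what is called a fast-forward merge, but this is only possible if the local branch has no commits of its own that are missing from the remote branch, so that the local branch pointer can simply be moved forward to the remote branch’s latest commit. If not, the two histories are combined with a dedicated merge commit (depending on your git version and configuration, git pull may first ask you to choose how to reconcile the diverged branches).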
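To make the difference between a fast-forward and a merge commit concrete, here is a small sketch in a throwaway repository. The branch name feature and the file names are hypothetical; git merge-base --is-ancestor is used only as one way to check whether a fast-forward is possible.

```shell
# Throwaway repository (hypothetical names throughout).
sandbox=$(mktemp -d)
git init -q "$sandbox"
cd "$sandbox"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo a > a.txt; git add a.txt; git commit -q -m "base"

# Case 1: only the other branch has new commits, so main can fast-forward.
git checkout -q -b feature
echo b > b.txt; git add b.txt; git commit -q -m "feature work"
git checkout -q main
git merge-base --is-ancestor main feature && echo "fast-forward possible"
git merge -q --ff-only feature          # moves main forward, no merge commit

# Case 2: both branches gain commits, so the histories diverge.
echo c > c.txt; git add c.txt; git commit -q -m "work on main"
git checkout -q feature
echo d > d.txt; git add d.txt; git commit -q -m "more feature work"
git checkout -q main
git merge-base --is-ancestor main feature || echo "fast-forward not possible"
git merge -q --no-edit feature          # records a merge commit with the default message

# Count the merge commits in main's history.
git rev-list --count --merges HEAD
```

The final count should be 1: the first merge only moved the branch pointer, while the second had to record a commit joining the two histories.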

git pull#

Assuming you already added the upstream repo as a remote, you can pull from the main branch at the upstream repo. To pull to your local version of the main branch, you first need to check out main if you are not already on it.

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout main
Switched to branch 'main'
Your branch is ahead of 'origin/main' by 2 commits.
  (use "git push" to publish your local commits)
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git pull upstream main
Enter passphrase for key '/home/jovyan/.ssh/id_ed25519':

[!NOTE] If you get an error message when using your SSH key such as WARNING: UNPROTECTED PRIVATE KEY FILE!, it means the permissions on your private key are too open. You may see this message from time to time, as OSS seems to reset the permissions periodically. This can be resolved by making your private key readable only by you (the owner of the file): chmod 400 ~/.ssh/id_ed25519

When invoking git pull or git merge, the terminal window may show a “merge commit” message window which looks something like

  GNU nano 6.2                                                /home/jovyan/ecco-2024/.git/MERGE_MSG  
Merge remote-tracking branch 'upstream/main'
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.

The uncommented line Merge remote-tracking branch 'upstream/main' is the merge commit message and can be changed (but does not need to be). When ready to complete the merge, press Ctrl-x to “commit” the merge.

Merge made by the 'ort' strategy.
 book/_toc.yml                                         |   1 +
 book/tutorials/pcluster/Run_MITgcm_on_P-Cluster.ipynb |  10 ++++++++-
 book/tutorials/pcluster/example.bashrc                |   7 ++++++
 book/tutorials/pcluster/pcluster-login.ipynb          |  36 +++++++++++++++++++------------
 book/tutorials/pcluster/reproducing_v4r4.ipynb        | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 book/tutorials/pcluster/run_script_slurm.bash         |  75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 book/tutorials/pcluster_tutorial_index.md             |   1 +
 7 files changed, 225 insertions(+), 15 deletions(-)
 create mode 100644 book/tutorials/pcluster/reproducing_v4r4.ipynb
 create mode 100644 book/tutorials/pcluster/run_script_slurm.bash

git fetch + git merge#

Alternatively you can do the equivalent of git pull in two steps by fetching changes on the upstream repo first, and then merging:

git checkout main
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git fetch upstream
Enter passphrase for key '/home/jovyan/.ssh/id_ed25519': 
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git merge upstream/main
From github.com:ECCO-Hackweek/ecco-2024
 * branch            main       -> FETCH_HEAD
Already up to date.
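One advantage of the two-step approach is that you can look at what was fetched before merging it. The sketch below simulates this with local directories: a bare repository stands in for the shared upstream repo, and the collab and mine clones, the file names, and the placeholder identity are all hypothetical.

```shell
work=$(mktemp -d)

# A local bare repository standing in for the shared upstream repo.
git init -q --bare "$work/upstream.git"
git -C "$work/upstream.git" symbolic-ref HEAD refs/heads/main

# A collaborator publishes the first commit.
git init -q "$work/collab"
cd "$work/collab"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Collaborator"
git config user.email "collab@example.com"
echo "one" > file.txt; git add file.txt; git commit -q -m "first commit"
git remote add upstream "$work/upstream.git"
git push -q upstream main

# Your clone, with the remote named "upstream" instead of the default "origin".
git clone -q -o upstream "$work/upstream.git" "$work/mine"

# The collaborator publishes more work.
echo "two" >> file.txt; git commit -q -am "second commit"
git push -q upstream main

# Step 1: fetch. The local main is untouched; only upstream/main moves.
cd "$work/mine"
git fetch -q upstream
incoming=$(git rev-list --count main..upstream/main)
echo "commits waiting to be merged: $incoming"
git log --oneline main..upstream/main   # inspect them before merging

# Step 2: merge when ready.
git merge -q upstream/main
```

The range main..upstream/main means "commits reachable from upstream/main but not from main", so it lists exactly what the merge is about to bring in.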

git reset --hard#

git pull and git merge will work smoothly if the history of the remote branch can be cleanly added to that of the local branch. If this is not possible, merges (while still manageable) can be a little messier. One way to handle this is to use the main branch of your local repo as a mirror of the upstream repo, and then work out any conflicts with your topic branches locally. A hard reset of your local main to the upstream main will force your local main to match the upstream version.

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout main
Already on 'main'
Your branch is ahead of 'origin/main' by 13 commits.
  (use "git push" to publish your local commits)
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git fetch upstream
Enter passphrase for key '/home/jovyan/.ssh/id_ed25519': 
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git reset --hard upstream/main
HEAD is now at 4470e10 Merge pull request #42 from owang01/fix_toc

Using git reset --hard will never result in a merge commit, since no merge is happening; the local main (or other branch) is simply overwritten by the remote version.
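Because a hard reset discards commits that exist only on the local branch, one cautious habit is to leave a backup branch pointing at the old state first. This is a sketch under stated assumptions: the branch names upstream_copy (standing in for upstream/main) and backup_main are hypothetical, and everything happens in a throwaway repository.

```shell
# Throwaway repository (hypothetical names throughout).
sandbox=$(mktemp -d)
git init -q "$sandbox"
cd "$sandbox"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > file.txt; git add file.txt; git commit -q -m "shared history"

# upstream_copy stands in for upstream/main in this sketch.
git branch upstream_copy

# A commit that exists only on the local main.
echo "two" >> file.txt; git commit -q -am "local-only work"

# Keep a pointer to the current state, then overwrite main.
git branch backup_main
git reset -q --hard upstream_copy

git log --oneline -1 main           # now at "shared history"
git log --oneline -1 backup_main    # still at "local-only work"
```

If the local-only work turns out to matter, it can be merged or cherry-picked from backup_main; otherwise the backup branch can be deleted with git branch -D backup_main.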

Updating origin with changes from upstream (via local repo)#

As the local main has been updated from the upstream repo, git status shows that it no longer matches origin/main, the equivalent branch on your GitHub fork of the upstream repo. To update origin we can use git push with the --force option, which overwrites origin/main with the local main even when their histories have diverged (much as git reset --hard overwrites the local branch). Any commits that exist only on origin/main are discarded, so use this option with caution.

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch main
Your branch and 'origin/main' have diverged,
and have 10 and 1 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)

nothing to commit, working tree clean
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git push origin main --force
Enter passphrase for key '/home/jovyan/.ssh/id_ed25519': 
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
To github.com:andrewdelman/ecco-2024.git
 + c383dc5...4470e10 main -> main (forced update)
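A somewhat safer variant of a forced push is git push --force-with-lease, which only overwrites the remote branch if it is still where your last fetch or push left it. The sketch below is not part of the original workflow; it uses a local bare repository as a hypothetical stand-in for your fork (origin).

```shell
work=$(mktemp -d)

# A local bare repository standing in for your GitHub fork ("origin").
git init -q --bare "$work/origin.git"
git -C "$work/origin.git" symbolic-ref HEAD refs/heads/main

git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/origin.git"
echo "one" > file.txt; git add file.txt; git commit -q -m "first"
echo "two" >> file.txt; git commit -q -am "second"
git push -q -u origin main

# Make local main diverge from origin/main: drop "second", add a replacement.
git reset -q --hard HEAD~1
echo "three" >> file.txt; git commit -q -am "replacement"

# A plain push is refused because it would discard "second" on origin.
git push -q origin main 2>/dev/null || echo "plain push rejected (non-fast-forward)"

# Overwrite origin/main, but only if it has not moved since we last saw it.
git push -q --force-with-lease origin main
```

If someone else had pushed to origin/main in the meantime, --force-with-lease would refuse the push instead of silently discarding their commits.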

Merging local main to a local topic branch#

In a linear workflow, you would not need to merge the main branch to a topic branch; instead you would create a new branch from the latest version of the upstream repo, commit your changes to the topic branch quickly, and push them to the remote repo before any potential merge conflicts can occur. But in a collaborative workflow environment, this may not always be possible. So it is good to have the ability to merge updates from other contributors to your topic branch and working directory, especially where they might affect your code development.

After having pulled/merged changes from remote into our local main, we can merge those changes into the topic branch:

(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout test_changes
Switched to branch 'test_changes'
Your branch is up to date with 'origin/test_changes'.
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git merge main

A merge commit page may open again. The message can be edited or not. Press Ctrl-x to exit the page and finalize the merge commit.

Merge made by the 'ort' strategy.
 book/_toc.yml                                         |   1 +
 book/tutorials/pcluster/Run_MITgcm_on_P-Cluster.ipynb |  10 ++++++++-
 book/tutorials/pcluster/example.bashrc                |   7 ++++++
 book/tutorials/pcluster/pcluster-login.ipynb          |  36 +++++++++++++++++++------------
 book/tutorials/pcluster/reproducing_v4r4.ipynb        | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 book/tutorials/pcluster/run_script_slurm.bash         |  75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 book/tutorials/pcluster_tutorial_index.md             |   1 +
 7 files changed, 225 insertions(+), 15 deletions(-)
 create mode 100644 book/tutorials/pcluster/reproducing_v4r4.ipynb
 create mode 100644 book/tutorials/pcluster/run_script_slurm.bash
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch test_changes

nothing to commit, working tree clean
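The merge above completed cleanly because main and the topic branch touched different files. When both branches change the same lines, git stops and reports a conflict. The following sketch provokes such a conflict in a throwaway repository and then backs out with git merge --abort; the file name settings.txt and its contents are hypothetical.

```shell
# Throwaway repository (hypothetical names throughout).
sandbox=$(mktemp -d)
git init -q "$sandbox"
cd "$sandbox"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "bbox = [-108.3, 39.2, -107.8, 38.8]" > settings.txt
git add settings.txt; git commit -q -m "base"

# The topic branch and main each change the same line differently.
git checkout -q -b test_changes
echo "bbox = [-118.2, 34.2, -118.1, 34.1]" > settings.txt
git commit -q -am "center on Pasadena"
git checkout -q main
echo "bbox = [-110.0, 40.0, -109.0, 39.0]" > settings.txt
git commit -q -am "different bounds on main"

# Merging main into the topic branch now stops with a conflict.
git checkout -q test_changes
git merge --no-edit main || echo "merge stopped: conflict in settings.txt"
conflict_state=$(git status --short)
echo "$conflict_state"            # "UU" marks a file changed on both sides

# Back out and return the branch to its state before the merge.
git merge --abort
git status --short                # prints nothing: working tree is clean again
```

Instead of aborting, you could edit settings.txt to remove the conflict markers, stage it with git add, and finish the merge with git commit.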

Sharing your changes with the upstream repo#

Sharing your changes

In order to get the changes we just made into the source “upstream” repo, there are four steps:

  • Stage your changes (preparing to make a commit)

  • Commit your changes

  • Push your branch with the changes to your remote repo (“origin”)

  • Merge the branch into the source “upstream” repo with a pull request

Stage and commit changes#

Before we can share the changes we have made with the source repo, we need to “commit” them to the branch we have been working on. Making a “commit” means we are moving the HEAD reference pointer for the branch to include the changes, so that when the branch is pushed or merged elsewhere these changes are included. Making a “commit” also provides a chance to include a short message explaining what feature is being added, or bug is being fixed, etc.

Before committing the changes we need to stage them. Use git add to stage the changes in our notebook.

(notebook) jovyan@jupyter-adelman:~$ cd ~/ecco-2024/book/tutorials/
(notebook) jovyan@jupyter-adelman:~/ecco-2024/book/tutorials$ git add example/tutorial-notebook.ipynb

[!NOTE] Why are “staging” and “committing” two separate steps? Sometimes you may have made changes in a working directory that you want to log as two or more commits, since they deal with different features or issues. So you can add the files for one commit and carry out that commit, and then repeat the process for another commit.
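As an illustration of the note above, this sketch splits two unrelated edits in one working directory into two commits by staging one file at a time. The repository, file names, and commit messages are hypothetical.

```shell
# Throwaway repository with two tracked files (hypothetical names).
sandbox=$(mktemp -d)
git init -q "$sandbox"
cd "$sandbox"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "plot code" > plot.py
echo "docs" > README.md
git add plot.py README.md
git commit -q -m "initial commit"

# Two unrelated edits made in the same working session.
echo "new colormap" >> plot.py
echo "typo fix" >> README.md

# Stage and commit them separately so each commit has a single purpose.
git add plot.py
git commit -q -m "use new colormap"
git add README.md
git commit -q -m "fix typo in README"

git log --oneline                 # three commits, newest first
```

Running git status between the two commits would show plot.py as committed and README.md still listed under "Changes not staged for commit".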

Now check git status:

(notebook) jovyan@jupyter-adelman:~/ecco-2024/book/tutorials$ git status
On branch test_changes
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   example/tutorial-notebook.ipynb

We are ready to commit! Typically git commit is used with -m to include a brief message summarizing the change(s).

(notebook) jovyan@jupyter-adelman:~/ecco-2024/book/tutorials$ git commit -m "changed map bounds"
[test_changes 6fe52a1] changed map bounds
 1 file changed, 21 insertions(+), 5 deletions(-)

The commit message that was included can be seen in the “log”, along with the alphanumeric identifier of the commit:

(notebook) jovyan@jupyter-adelman:~/ecco-2024/book/tutorials$ git log

commit 6fe52a1e20b1ec1dd908089de746964ce56f9b2f (HEAD -> test_changes)
Author: Andrew Delman <andrew.s.delman@gmail.com>
Date:   Sun Sep 29 07:43:41 2024 +0000

    changed map bounds
...

And git status tells us that there are no untracked, unstaged, or uncommitted changes:

(notebook) jovyan@jupyter-adelman:~/ecco-2024/book/tutorials$ git status
On branch test_changes
nothing to commit, working tree clean

Push changes to remote repo#

Now the change has been committed, but only to the local repo. To push the new branch with its committed changes to the “origin” repo (i.e., your fork of the online GitHub repository), use git push -u origin *branchname*:

Note: once the branch exists on the origin repo and your local branch is tracking it (which is what the -u option sets up), the -u and branchname are no longer needed; new commits can be pushed with git push origin

(notebook) jovyan@jupyter-adelman:~/ecco-2024/book/tutorials$ git push -u origin test_changes
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0760 for '/home/jovyan/.ssh/id_ed25519' are too open.
It is required that your private key files are NOT accessible by others.
This private key will be ignored.
Load key "/home/jovyan/.ssh/id_ed25519": bad permissions
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Uh oh, for some reason the permissions on the private key get changed periodically, and more restrictive permissions are needed to authenticate with GitHub. Setting chmod 400 ~/.ssh/id_ed25519 does the trick:

(notebook) jovyan@jupyter-adelman:~/ecco-2024/book/tutorials$ chmod 400 ~/.ssh/id_ed25519
(notebook) jovyan@jupyter-adelman:~/ecco-2024/book/tutorials$ git push -u origin test_changes
Enter passphrase for key '/home/jovyan/.ssh/id_ed25519': 
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 4 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 816 bytes | 816.00 KiB/s, done.
Total 6 (delta 4), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
remote: 
remote: Create a pull request for 'test_changes' on GitHub by visiting:
remote:      https://github.com/andrewdelman/ecco-2024/pull/new/test_changes
remote: 
To github.com:andrewdelman/ecco-2024.git
 * [new branch]      test_changes -> test_changes
Branch 'test_changes' set up to track remote branch 'test_changes' from 'origin'.
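The last line of the output says the local branch now tracks the branch on origin. The sketch below shows what that tracking relationship provides, using a local bare repository as a hypothetical stand-in for your fork; the file names and placeholder identity are also assumptions.

```shell
work=$(mktemp -d)

# A local bare repository standing in for your GitHub fork ("origin").
git init -q --bare "$work/origin.git"
git -C "$work/origin.git" symbolic-ref HEAD refs/heads/main

git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/origin.git"
echo "one" > file.txt; git add file.txt; git commit -q -m "first"
git push -q origin main

# First push of a new topic branch: -u records origin/test_changes as its upstream.
git checkout -q -b test_changes
echo "two" >> file.txt; git commit -q -am "changed map bounds"
git push -q -u origin test_changes

# Show which remote branch the current branch tracks.
git rev-parse --abbrev-ref "@{upstream}"

# Later commits can be pushed without naming the remote or the branch.
echo "three" >> file.txt; git commit -q -am "another change"
git push -q
```

The tracking information is also what lets git status report whether your branch is ahead of or behind its counterpart on origin.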

Create a pull request to merge with upstream repo#

Now that the new branch is in your forked repo online, you can create a “pull request” to merge it with the shared upstream repo. You can see in the output above that a URL is provided to initiate the pull request. Alternatively, if you are signed in to GitHub online and in the repo (either your fork or upstream) you will often see a yellow box that allows you to initiate the pull request.

Pull request 1

Either way, you should arrive at a screen that shows you the branch that will be merged, and you can enter a title and (optional) description for the pull request.

Pull request 2

Once the pull request has been created, a test build of the website with the changes from your branch will be initiated. You should check the diagnostic output at the Details link to see if there are any errors that will prevent your notebook(s) from being rendered correctly. If all looks OK, you can merge your changes into the upstream repo.

Pull request 3

This time, the actual website will be updated with your changes!