Using git to edit and publish code#
Andrew Delman, 2024-09-29
For newcomers to GitHub, this tutorial covers some of the basic git commands that you can use to manage code changes in a given repository and publish those changes. To illustrate this process, we will make some “test” changes to the example tutorial notebook. For further reference, the Atlassian Git tutorial is a good resource.
Getting ready to make changes to a repo
Making changes in a git repo
Updating your repo with changes from upstream
Sharing your changes with the upstream repo
Getting ready to make changes to a repo#
The following schematic illustrates the steps that a user might take before making code changes that will be contributed back to the shared remote repository.
In the previous tutorial, the setup included the first two steps: (1) forking the repo so that you have an individually owned repo on which to base your changes, and (2) cloning that repo so you have it on your local machine.
Creating a branch#
When writing or revising code (including Jupyter notebooks), it is good practice to first create a branch. Each branch is a place where you make a coherent set of changes to a repo (e.g., writing a new notebook, adding a new feature to your code). This helps organize contributions when they are merged into the remote repo. Let’s open a terminal window in JupyterHub and navigate to the ecco-2024 directory/repo: cd ~/ecco-2024. We can make sure we are in the ecco-2024 repo by running the command:
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
This confirms that you are in the main branch of the ecco-2024 repository, and any changes you make will be implemented in that branch. But now we create a “topic” branch called test_changes:
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git branch test_changes
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git branch -a
git_tutorial
* main
test_changes
remotes/origin/HEAD -> origin/main
remotes/origin/main
The last command, git branch -a, shows all the branches that git is aware of for this repository, in both the local and remote repos. We can see the new branch in this list.
Note that a quick check of git status shows that you are not in the new branch yet, and any changes you make will still be associated with the main branch. So let’s move to the new branch using git checkout.
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch main
Your branch is ahead of 'origin/main' by 2 commits.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout test_changes
Switched to branch 'test_changes'
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch test_changes
nothing to commit, working tree clean
We have confirmed that we are now working in the test_changes branch.
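The two steps above (create the branch, then switch to it) can also be combined. The following is a minimal sketch, run in a throwaway repository so that it does not touch your real ecco-2024 clone; the file name notes.txt and the demo identity are placeholders, and git init -b assumes a reasonably recent git (2.28 or later).

```shell
# Throwaway repository so the sketch does not touch your real clone.
demo=$(mktemp -d)
git init -q -b main "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"
echo "hello" > notes.txt                    # hypothetical file
git add notes.txt
git commit -q -m "Initial commit"

# Create the topic branch and switch to it in a single step.
git checkout -q -b test_changes

# Print the name of the branch we are on now.
current_branch=$(git branch --show-current)
echo "$current_branch"
```

In your own clone only the git checkout -b test_changes line would be needed; the rest is scaffolding for the demonstration.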
[!NOTE] It’s a good idea to check git status frequently while working in your repo, and definitely before you add or commit any changes you have made. git status will quickly tell you which branch you are working in and which files have pending changes or are not yet tracked by git.
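If the full git status output feels verbose when you check it often, a compact form is available. This sketch uses a throwaway repository with hypothetical file names (tracked.txt, untracked.txt) to show what the short format reports for a modified file and for a file git is not tracking yet.

```shell
# Throwaway repository with one committed file.
demo=$(mktemp -d)
git init -q -b main "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"
echo "original" > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"
git checkout -q -b test_changes

echo "edited" >> tracked.txt                # modify a tracked file
echo "new" > untracked.txt                  # create a file git does not know about

# Compact status: the first line names the branch, then one line per file.
# " M" marks a modified (unstaged) file and "??" marks an untracked file.
short_status=$(git status --short --branch)
echo "$short_status"
```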
Making changes in a git repo#
Change the example notebook#
Now we’re going to make a small change to the example/tutorial-notebook.ipynb notebook. In the JupyterHub left sidebar, navigate to the ecco-2024/book/tutorials/example folder if you are not there already, and double-click on tutorial-notebook.ipynb to open it.
Scroll down to the cell with bbox = [-108.3, 39.2, -107.8, 38.8] and change the bounding box limits to bbox = [-118.2, 34.2, -118.1, 34.1] so that the box is more or less centered on Pasadena, CA. Run the notebook at least through the Interactive visualization section to confirm this change. Save your changes (using Ctrl-S or equivalent).
Move between branches#
A check of git status in the terminal window shows that tutorial-notebook.ipynb has indeed been modified in your working directory, but these changes have not been staged or committed.
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch test_changes
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: book/tutorials/example/tutorial-notebook.ipynb
no changes added to commit (use "git add" and/or "git commit -a")
To put it another way, the HEAD “reference pointer” for the current branch is still pointing to the last commit, which contains the notebook as it was before the change. Uncommitted changes like these do not belong to any branch yet: if we move back to main or another branch, git will either carry them along into that branch's working directory or, if the switch would overwrite them, refuse to switch and warn you. So before we leave this branch we can “stash” our working directory changes and restore them when we return.
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git stash
Saved working directory and index state WIP on test_changes: bdc8fed Merge remote-tracking branch 'origin/main'
The state of the working directory has been reset to HEAD
, but the changes you were just working on have been stored. Now it is safe to leave the branch (you can check git status
to be sure). Let’s go back to the main
branch:
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout main
Switched to branch 'main'
Your branch is ahead of 'origin/main' by 2 commits.
(use "git push" to publish your local commits)
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch main
Your branch is ahead of 'origin/main' by 2 commits.
(use "git push" to publish your local commits)
nothing to commit, working tree clean
You can see that the main branch doesn’t have any of the changes we made on the other branch. You can open up tutorial-notebook.ipynb to verify this.
Now return to the test_changes branch:
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout test_changes
Switched to branch 'test_changes'
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch test_changes
nothing to commit, working tree clean
Let’s restore the stashed changes to our working directory:
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git stash pop
On branch test_changes
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: book/tutorials/example/tutorial-notebook.ipynb
no changes added to commit (use "git add" and/or "git commit -a")
Dropped refs/stash@{0} (69b55857c19738e45c134e447b19febaf22aa970)
And now we can see our changes again.
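The whole stash round trip can be tried safely outside your real repo. This is a sketch in a throwaway repository; settings.txt is a hypothetical stand-in for the notebook, and the stash message is optional (git stash push -m assumes git 2.13 or later).

```shell
# Throwaway repository with a committed file standing in for the notebook.
demo=$(mktemp -d)
git init -q -b main "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"
echo "bbox = [-108.3, 39.2, -107.8, 38.8]" > settings.txt
git add settings.txt
git commit -q -m "Initial commit"
git checkout -q -b test_changes

# Make an uncommitted edit, then stash it with a descriptive message.
echo "bbox = [-118.2, 34.2, -118.1, 34.1]" > settings.txt
git stash push -q -m "Pasadena bounding box"

git stash list                              # the stash entry is listed here
stashed_content=$(cat settings.txt)         # file is back to the committed version

# Restore the edit and remove the entry from the stash list.
git stash pop -q
restored_content=$(cat settings.txt)
echo "$restored_content"
```

Labeling a stash with a message makes it easier to recognize later if you accumulate more than one entry.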
Updating your repo with changes from upstream#
While you are working, other contributors may be merging their changes into the shared upstream repo. If you and other contributors are working on the same file, you may end up with a merge conflict between the two sets of changes. It’s best to update your local repo from the remote before pushing your own changes; if you do have a merge conflict, it is easier to work out locally.
Updating main with upstream changes#
Here we demonstrate three ways of updating your local repo with remote changes:
git pull
git fetch and then git merge
git fetch and then git reset --hard
The first two methods are essentially the same. While the first method is simpler, the second can be useful if you want to “fetch” a record of the changes to your local machine first, and then merge later (e.g., when you may not have an Internet connection). The last method is different in that it will overwrite the local branch with the remote; this can result in cleaner updates but local work can potentially be lost, so use with caution.
Note: git pull and git merge will both attempt what is called a fast-forward merge, but this is only possible if the local branch has no commits that are missing from the remote branch, so that the local branch can simply be moved forward to the remote branch's latest commit. If not, there are other types of merges that involve a specific commit dedicated to the merge.
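Whether a fast-forward is possible can be checked before merging. The sketch below uses a throwaway repository in which a local branch named incoming (a hypothetical stand-in for a remote branch such as upstream/main) is one commit ahead of main. It then merges with --ff-only, which refuses to create a merge commit and fails instead if the histories have diverged.

```shell
# Throwaway repository: main has one commit, "incoming" has one more on top.
demo=$(mktemp -d)
git init -q -b main "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"

git checkout -q -b incoming                 # stand-in for a remote branch
echo "two" >> file.txt
git commit -q -am "Second commit"
git checkout -q main

# A fast-forward is possible when main's tip is an ancestor of the other branch.
if git merge-base --is-ancestor main incoming; then
    can_fast_forward=yes
else
    can_fast_forward=no
fi
echo "fast-forward possible: $can_fast_forward"

# Move main forward without creating a merge commit.
git merge -q --ff-only incoming
```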
git pull#
Assuming you already added the upstream repo as a remote, you can pull from the main branch of the upstream repo. To pull to your local version of the main branch, you first need to check out main if you are not already on it.
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout main
Switched to branch 'main'
Your branch is ahead of 'origin/main' by 2 commits.
(use "git push" to publish your local commits)
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git pull upstream main
Enter passphrase for key '/home/jovyan/.ssh/id_ed25519':
[!NOTE] If you get an error message when using your SSH key such as WARNING: UNPROTECTED PRIVATE KEY FILE!, it means the permissions on your private key are too open. You may see this message again from time to time, as OSS seems to reset the permissions periodically for some reason. This can be resolved by making your private key read-only by you (the owner of the file): chmod 400 ~/.ssh/id_ed25519
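To see what chmod 400 does without touching your real key, you can try it on a dummy file. This sketch creates a hypothetical file named id_ed25519_demo in a temporary directory and reads back its permission string with ls -l.

```shell
# Dummy file standing in for a private key; nothing here touches ~/.ssh.
demo=$(mktemp -d)
key="$demo/id_ed25519_demo"
echo "not a real key" > "$key"
chmod 644 "$key"                      # too open: group and others can read it

# Read-only for the owner, no access for anyone else (as in the note above).
chmod 400 "$key"

# The first ten characters of the long listing are the permission string.
perms=$(ls -l "$key" | cut -c1-10)
echo "$perms"                         # prints -r--------
```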
When invoking git pull or git merge, the terminal window may show a “merge commit” message window which looks something like
GNU nano 6.2 /home/jovyan/ecco-2024/.git/MERGE_MSG
Merge remote-tracking branch 'upstream/main'
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
The uncommented line Merge remote-tracking branch 'upstream/main' is the merge commit message and can be changed (but does not need to be). When ready to complete the merge, press Ctrl-x to “commit” the merge.
Merge made by the 'ort' strategy.
book/_toc.yml | 1 +
book/tutorials/pcluster/Run_MITgcm_on_P-Cluster.ipynb | 10 ++++++++-
book/tutorials/pcluster/example.bashrc | 7 ++++++
book/tutorials/pcluster/pcluster-login.ipynb | 36 +++++++++++++++++++------------
book/tutorials/pcluster/reproducing_v4r4.ipynb | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
book/tutorials/pcluster/run_script_slurm.bash | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
book/tutorials/pcluster_tutorial_index.md | 1 +
7 files changed, 225 insertions(+), 15 deletions(-)
create mode 100644 book/tutorials/pcluster/reproducing_v4r4.ipynb
create mode 100644 book/tutorials/pcluster/run_script_slurm.bash
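If you are happy with the default merge message, the editor step can be skipped with the --no-edit option, which both git merge and git pull accept. Here is a sketch in a throwaway repository where a local branch named incoming (a hypothetical stand-in for upstream/main) and main have each gained a commit, so a merge commit is required.

```shell
# Throwaway repository in which main and "incoming" have diverged.
demo=$(mktemp -d)
git init -q -b main "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base commit"

git checkout -q -b incoming                 # stand-in for upstream/main
echo "from upstream" > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"

git checkout -q main
echo "local" > local.txt
git add local.txt
git commit -q -m "Local change"

# Accept the default merge message instead of opening the editor.
git merge -q --no-edit incoming
merge_subject=$(git log -1 --format=%s)
echo "$merge_subject"
```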
git fetch + git merge#
Alternatively, you can do the equivalent of git pull in two steps by fetching changes on the upstream repo first, and then merging:
git checkout main
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git fetch upstream
Enter passphrase for key '/home/jovyan/.ssh/id_ed25519':
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git merge upstream/main
From github.com:ECCO-Hackweek/ecco-2024
* branch main -> FETCH_HEAD
Already up to date.
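One advantage of the two-step approach is that you can look at what was fetched before merging it. The sketch below builds a small stand-in for the setup in this tutorial entirely on the local disk: a bare repository plays the role of upstream, a clone named colleague publishes commits to it, and a clone named mine is ours. All of these names and paths are hypothetical.

```shell
demo=$(mktemp -d)
git init -q --bare -b main "$demo/upstream.git"   # stand-in for the shared upstream

# "colleague" publishes the first commit to the stand-in upstream.
git init -q -b main "$demo/colleague"
cd "$demo/colleague"
git config user.name "Demo Colleague"             # placeholder identity
git config user.email "colleague@example.com"
git remote add upstream "$demo/upstream.git"
echo "v1" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git push -q upstream main

# "mine" is our clone, with the remote named upstream as in this tutorial.
git clone -q -o upstream "$demo/upstream.git" "$demo/mine"

# The colleague publishes another commit while we are working.
echo "v2" >> file.txt
git commit -q -am "Colleague update"
git push -q upstream main

# Fetch first, then list commits on upstream/main that local main lacks.
cd "$demo/mine"
git fetch -q upstream
incoming=$(git log --oneline main..upstream/main)
echo "$incoming"

# Merge once we are happy with what is coming in.
git merge -q upstream/main
```

The range main..upstream/main reads as "commits reachable from upstream/main but not from main", which is exactly the set that the merge will bring in.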
git reset --hard#
git pull and git merge will work smoothly if the history of the remote branch can be cleanly added to that of the local branch. If this is not possible, merges (while still manageable) can be a little messier. One way to handle this is to use the main branch of your local repo as a mirror of the upstream repo, and then work out any conflicts with your topic branches locally. A hard reset of your local main to the upstream main will force your local main to match the upstream version.
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout main
Already on 'main'
Your branch is ahead of 'origin/main' by 13 commits.
(use "git push" to publish your local commits)
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git fetch upstream
Enter passphrase for key '/home/jovyan/.ssh/id_ed25519':
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git reset --hard upstream/main
HEAD is now at 4470e10 Merge pull request #42 from owang01/fix_toc
Using git reset --hard will never result in a merge commit, since no merge is happening; the local main (or other branch) is simply overwritten by the remote version.
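Because a hard reset discards local commits on the branch being reset, one cautious habit is to leave a backup branch pointing at the old state first. This is a sketch in a throwaway repository; upstream_main is a hypothetical local branch standing in for upstream/main, and backup_main is an arbitrary name.

```shell
# Throwaway repository: main has one local-only commit on top of shared history.
demo=$(mktemp -d)
git init -q -b main "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"
echo "shared" > file.txt
git add file.txt
git commit -q -m "Shared history"
git branch upstream_main                    # stand-in for upstream/main

echo "local work" >> file.txt
git commit -q -am "Local-only commit"

# Keep a pointer to the current state before discarding it.
git branch backup_main
git reset -q --hard upstream_main

# main now matches the stand-in; the local-only commit survives on backup_main.
main_subject=$(git log -1 --format=%s main)
backup_subject=$(git log -1 --format=%s backup_main)
echo "main: $main_subject"
echo "backup_main: $backup_subject"
```

Once you are sure nothing on the backup branch is needed, it can be deleted with git branch -D backup_main.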
Updating origin with changes from upstream (via local repo)#
As the local main has been updated from the upstream repo, git status shows that it no longer matches origin/main, the equivalent branch on your GitHub fork of the remote repo. To update origin we can use git push with the --force option, which overwrites origin/main with your local main even where their histories have diverged (similar to the --hard option with git reset).
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch main
Your branch and 'origin/main' have diverged,
and have 10 and 1 different commits each, respectively.
(use "git pull" to merge the remote branch into yours)
nothing to commit, working tree clean
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git push origin main --force
Enter passphrase for key '/home/jovyan/.ssh/id_ed25519':
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
To github.com:andrewdelman/ecco-2024.git
+ c383dc5...4470e10 main -> main (forced update)
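A somewhat safer variant of a forced push is the --force-with-lease option, which overwrites the remote branch only if it still points where your local repo last saw it. The sketch below demonstrates this against a bare repository on the local disk standing in for your GitHub fork; the paths and commit messages are hypothetical.

```shell
demo=$(mktemp -d)
git init -q --bare -b main "$demo/origin.git"     # stand-in for your GitHub fork

git init -q -b main "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"                  # placeholder identity
git config user.email "demo@example.com"
git remote add origin "$demo/origin.git"
echo "v1" > file.txt
git add file.txt
git commit -q -m "Original commit"
git push -q -u origin main

# Rewrite local history so that main and origin/main diverge.
echo "v2" > file.txt
git commit -q --amend -am "Rewritten commit"

# Overwrite origin/main, but only if it still points where we last saw it.
git push -q --force-with-lease origin main

# Confirm what the stand-in fork now has at the tip of main.
remote_subject=$(git --git-dir="$demo/origin.git" log -1 --format=%s main)
echo "$remote_subject"
```

If someone else had pushed to the fork in the meantime, the push would be rejected rather than silently discarding their commits.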
Merging local main to a local topic branch#
In a linear workflow, you would not need to merge the main branch to a topic branch; instead you would create a new branch from the latest version of the upstream repo, commit your changes to the topic branch quickly, and push them to the remote repo before any potential merge conflicts can occur. But in a collaborative workflow environment, this may not always be possible. So it is good to have the ability to merge updates from other contributors to your topic branch and working directory, especially where they might affect your code development.
After having pulled/merged changes from remote into our local main, we can merge those changes into the topic branch:
git checkout test_changes
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git checkout test_changes
Switched to branch 'test_changes'
Your branch is up to date with 'origin/test_changes'.
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git merge main
A merge commit page may open again. The message can be edited or not. Press Ctrl-x to exit the page and finalize the merge commit.
Merge made by the 'ort' strategy.
book/_toc.yml | 1 +
book/tutorials/pcluster/Run_MITgcm_on_P-Cluster.ipynb | 10 ++++++++-
book/tutorials/pcluster/example.bashrc | 7 ++++++
book/tutorials/pcluster/pcluster-login.ipynb | 36 +++++++++++++++++++------------
book/tutorials/pcluster/reproducing_v4r4.ipynb | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
book/tutorials/pcluster/run_script_slurm.bash | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
book/tutorials/pcluster_tutorial_index.md | 1 +
7 files changed, 225 insertions(+), 15 deletions(-)
create mode 100644 book/tutorials/pcluster/reproducing_v4r4.ipynb
create mode 100644 book/tutorials/pcluster/run_script_slurm.bash
(notebook) jovyan@jupyter-adelman:~/ecco-2024$ git status
On branch test_changes
nothing to commit, working tree clean
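The merge above completed cleanly, but if main and the topic branch have both changed the same lines, git stops and reports a conflict. The following sketch provokes a conflict in a throwaway repository and then backs out with git merge --abort. The file settings.txt and the third set of bounding box values are made up for illustration.

```shell
# Throwaway repository in which both branches edit the same line.
demo=$(mktemp -d)
git init -q -b main "$demo/repo"
cd "$demo/repo"
git config user.name "Demo User"            # placeholder identity for the demo
git config user.email "demo@example.com"
echo "bbox = [-108.3, 39.2, -107.8, 38.8]" > settings.txt
git add settings.txt
git commit -q -m "Initial commit"

git checkout -q -b test_changes
echo "bbox = [-118.2, 34.2, -118.1, 34.1]" > settings.txt
git commit -q -am "Topic branch edit"

git checkout -q main
echo "bbox = [-110.0, 40.0, -109.0, 39.0]" > settings.txt   # made-up values
git commit -q -am "Conflicting edit on main"
git checkout -q test_changes

# The merge stops with a conflict because both branches changed the same line.
if git merge --no-edit main > /dev/null 2>&1; then
    merge_result=clean
else
    merge_result=conflict
    git status --short          # "UU" marks a file with unresolved conflicts
    git merge --abort           # return to the state before the merge started
fi
echo "merge result: $merge_result"
```

Aborting is only one option; you can instead edit the conflicted file to keep the lines you want, then git add it and git commit to finish the merge.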