Git Basics
Git is a distributed version control system. Originally written by Linus Torvalds to be used with the development of the Linux kernel, it has now become the go-to way to share work between multiple developers.
In this article I will summarise what I feel to be the next-step basics of git, explaining each notion along the way.
I assume at least passing knowledge of git, and will therefore skip the justifications for using git instead of flinging tarballs at one another.
I will also be skipping the explanation for the basic workflow of git add, git commit, and git push. You can consider this guide to be aimed at 3rd year students at EPITA, who have used git for a whole year to submit their project but have not explored some of its more powerful features.
Starting out with branches and references
To me, this is the most essential thing you need to remember when you are using git. It is part of what makes it special, and will be used throughout your career.
Why you should use branches in git
What makes git so useful, and so powerful, is the fact that it was conceived from the ground up to operate in a decentralised manner, to accommodate the Linux kernel programming workflow.
That model de facto means that branching must be a lightweight operation, and merging should not be a hassle. Indeed, as soon as you start having people work in parallel on a decentralised system, you end up creating “hidden branches”: each person’s development tree is a branch on its own.
If you try merging branches that do not have any conflict, the operation is basically instantaneous: to take advantage of that fact I encourage you to use branches in your workflow when using git.
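As a minimal sketch of that workflow, the script below creates a throwaway repository, does some work on a branch, and merges it back. The file name file.txt and the branch name feature are placeholders of my own choosing; because master did not move in the meantime, the merge is a simple fast-forward.

```shell
set -e
# Throwaway repository so the example does not touch any real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name, whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Do the work on a separate branch (the name "feature" is hypothetical).
git checkout -q -b feature
echo "world" >> file.txt
git commit -q -am "Add world"

# master has not moved, so merging only advances the master pointer.
git checkout -q master
git merge -q --ff-only feature

master_tip=$(git rev-parse master)
feature_tip=$(git rev-parse feature)
echo "master:  $master_tip"
echo "feature: $feature_tip"
```

Both names end up pointing at the same commit, which is why this kind of merge costs next to nothing.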
Where is my HEAD
The notion of HEAD in git can seem strange. You might first have encountered it when checking out an older commit. git status helpfully tells you that you have been guillotined: HEAD detached at 78f604b.
To make it short, HEAD is a reference pointing to the commit that you are currently working on top of. It usually points to a branch name (e.g. main or master), but can also point to a specific commit (such as in the checkout scenario I just mentioned).
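To see both situations for yourself, here is a small sketch in a throwaway repository (the file and commit names are placeholders). It asks git symbolic-ref what HEAD points to while on a branch, then checks out an older commit and asks again; in the detached case there is no branch to report, so the command fails.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name, whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "First"
echo two >> file.txt
git commit -q -am "Second"

# On a branch, HEAD is a symbolic reference to that branch.
attached=$(git symbolic-ref HEAD)
echo "HEAD points to: $attached"

# Checking out a commit directly detaches HEAD.
git checkout -q HEAD~1
if git symbolic-ref -q HEAD > /dev/null; then
    detached=no
else
    detached=yes   # HEAD now holds a commit hash, not a branch name
fi
echo "detached: $detached"
```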
Revisions
Most of the commands I will show you need you to provide them with what git calls a revision. This usually means a way to specify a commit.
There are multiple ways to specify a revision; you should know at least two of them: refname and describeOutput, which loosely correspond to branch names and git tags respectively. Note that @ is a shortcut for referring to HEAD.
You can also specify the SHA-1 commit hash directly, or relative revisions.
A relative revision allows you to select an ancestor of a specific commit; you can use the following revision specifiers:
- ~: select the first-parent commit
- ^: select the nth-parent commit (useful for merge commits)
You can append numbers to those two specifiers; they differ in how they handle merges. If you are applying them to a merge commit, ~2 will give you the grand-parent of your commit, following the “first parent”, whereas ^2 will give you the “second parent” of your commit.
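The difference between ~ and ^ is easier to see on an actual merge commit. The sketch below builds a tiny history in a throwaway repository where M merges feature (commit C) into master (commits A then B); the commit and file names are made up for the illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name, whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo a > a.txt; git add a.txt; git commit -q -m "A"

git checkout -q -b feature
echo c > c.txt; git add c.txt; git commit -q -m "C"

git checkout -q master
echo b > b.txt; git add b.txt; git commit -q -m "B"

# M has two parents: B (first parent, from master) and C (second parent, from feature).
git merge -q --no-ff -m "M" feature

first_parent=$(git log -1 --format=%s HEAD~1)   # same commit as HEAD^1
grandparent=$(git log -1 --format=%s HEAD~2)    # first parent of the first parent
second_parent=$(git log -1 --format=%s HEAD^2)  # the branch that was merged in
echo "HEAD~1 = $first_parent, HEAD~2 = $grandparent, HEAD^2 = $second_parent"
```

Here ~2 walks back two steps along master, while ^2 jumps sideways to the tip of the merged branch.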
History manipulation
Once you start using git for non-trivial projects, using some of the practices that I aim to teach you, rewriting history will become your secret weapon for productivity.
I have to insist on one point though, which is that re-writing history that was published and used by other people is often seen as a faux-pas, or worse! You should only use it on private branches, making sure to never rewrite published history unless absolutely necessary.
Picking cherries
The easiest way to manipulate history is the cherry-pick command. It allows you to “lift” a commit from any other place in history, and plop it down in your current branch.
For example, you can pick a commit which fixes a bug in another branch and apply it onto yours: simply do git cherry-pick <my-commit-with-the-bugfix>.
It is however most likely not what you want to do if you later intend to merge your branch with the one you lifted the commit from. Both branches will then carry the same change as two distinct commits, and git may run into conflicts during the merge if either side has touched those lines since.
In those cases, consider merging from a common branch whose purpose is applying the fix. git will then happily merge your branches later on without making a fuss.
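Here is a minimal cherry-pick sketch in a throwaway repository. The branch name other and the file names are hypothetical; the point is that only the bugfix commit is copied onto master, not the unrelated work that precedes it on the other branch.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name, whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo base > app.txt; git add app.txt; git commit -q -m "Initial commit"

# Another branch with some unrelated work followed by a bugfix.
git checkout -q -b other
echo feature > feature.txt; git add feature.txt; git commit -q -m "Unrelated work"
echo fixed > fix.txt; git add fix.txt; git commit -q -m "Fix bug"
fix_commit=$(git rev-parse HEAD)

# Back on master, lift only the bugfix.
git checkout -q master
git cherry-pick "$fix_commit" > /dev/null

picked_subject=$(git log -1 --format=%s)
picked_hash=$(git rev-parse HEAD)
echo "picked '$picked_subject' as $picked_hash (original: $fix_commit)"
```

Note that the copy gets a new hash: it is a different commit carrying the same change, which is exactly why merging the two branches later can get awkward.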
All your rebase are belong to us
This is probably the single best command in all of git in my mind. Having access to git rebase allows you to commit as you work, without caring about atomicity, commit messages, or even having working/compiling code.
Rebasing allows you to make various changes to your branch’s history:
- Rewording a commit’s message
- Reordering commits
- Removing commits
- Squashing: merging a commit into another one
This tool allows you to work on your own, commit early and commit often as you work on your changes, and keep a clean result before merging back into the main branch.
Fixup, a practical example
A specific kind of squashing which I use frequently is the notion of fixups. Say you’ve committed a change (A), and later on notice that it is missing a part of the changeset. You can decide to commit that missing part (A-bis) and annotate it to mean that it is linked to A.
Let’s say you have this history:
42sh$ git log --oneline
* 787dd36 (HEAD -> master) Add README
* 8d08529 Add baz
* 7188fb1 Frobulate bar
* 961d8fb Fix foo
And notice that you missed a change that belongs to Add baz. You can add it to your staged changes, and issue git commit --fixup @~. This will create a commit named fixup! Add baz.
42sh$ git log --oneline
* 92912ee (HEAD -> master) fixup! Add baz
* 787dd36 Add README
* 8d08529 Add baz
* 7188fb1 Frobulate bar
* 961d8fb Fix foo
If you then rebase using -i --autosquash, you will be presented with this interactive rebase screen:
pick 961d8fb Fix foo
pick 7188fb1 Frobulate bar
pick 8d08529 Add baz
fixup 92912ee fixup! Add baz
pick 787dd36 Add README
After applying the rebase, you find yourself with the complete change inside Add baz, which can be confirmed with another git log:
* 0174e54 (HEAD -> master) Add README
* b0a47ae Add baz
* 7188fb1 Frobulate bar
* 961d8fb Fix foo
This is especially useful when you want to apply suggestions on a merge request after it was reviewed. You can keep a clean history without those pesky Apply suggestion ... commits being part of your history.
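The whole fixup round-trip can be replayed in a throwaway repository with the sketch below. The file contents are placeholders, and setting GIT_SEQUENCE_EDITOR=true is a shortcut of mine, not part of the workflow described above: it accepts the todo list that --autosquash generates without opening an editor, so that the script can run unattended.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name, whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

echo foo > foo.txt;   git add foo.txt; git commit -q -m "Fix foo"
echo bar > bar.txt;   git add bar.txt; git commit -q -m "Frobulate bar"
echo baz > baz.txt;   git add baz.txt; git commit -q -m "Add baz"
echo readme > README; git add README;  git commit -q -m "Add README"

# The forgotten part of "Add baz": stage it and mark it as a fixup of @~.
echo "more baz" >> baz.txt
git add baz.txt
git commit -q --fixup @~
fixup_subject=$(git log -1 --format=%s)
echo "created: $fixup_subject"

# Replay the last three commits; --autosquash moves the fixup right after
# its target, and the "true" editor accepts that todo list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash @~3

subjects=$(git log --format=%s | tr '\n' '|')
echo "history: $subjects"
```

In day-to-day use you would leave GIT_SEQUENCE_EDITOR alone and review the todo list in your editor before letting the rebase proceed.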
Lost commits and the reflog
When doing this kind of history manipulation, you might end up making a mistake and losing a commit that was very important.
Obviously, git has a way to save us in this situation. If we look at the man page for git reflog, we can read the following sentence:
Reference logs, or "reflogs", record when the tips of branches and other
references were updated in the local repository.
What does this mean exactly? Simply put, you can use it to check out a previous version of your repository, in the state it was in before you manipulated the history. Let’s illustrate with a small example.
Mapping lost commits: a practical example
Let’s say you have this repository state at the beginning.
42sh$ git log --oneline
* 524de22 (HEAD -> master) Documentation update
* d60ddb5 USELESS COMMIT
* e81b5fb Remove baz dependency
* 44cea7d VERY IMPORTANT COMMIT
* 58eb2d9 Use foo without bar
* dab7792 Simplify frobulation
And decide to drop d60ddb5 (USELESS COMMIT), but inadvertently drop 44cea7d (VERY IMPORTANT COMMIT) at the same time. For this example, I simply dropped both commits in a rebase operation.
I notice now that I am missing my VERY IMPORTANT COMMIT in my history:
42sh$ git log --oneline
* ec8508b (HEAD -> master) Documentation update
* 3866067 Remove baz dependency
* 58eb2d9 Use foo without bar
* dab7792 Simplify frobulation
If I now try to see what happened to my HEAD reference using reflog, I can find the last update I did before starting my rebase to cancel the whole operation.
42sh$ git reflog
ec8508b (HEAD -> master) HEAD@{0}: rebase (finish): returning to refs/heads/master
ec8508b (HEAD -> master) HEAD@{1}: rebase (pick): Documentation update
3866067 HEAD@{2}: rebase (pick): Remove baz dependency
58eb2d9 HEAD@{3}: rebase: fast-forward
dab7792 HEAD@{4}: rebase: fast-forward
612e6f5 HEAD@{5}: rebase (start): checkout 612e6f5a055280aac1d7608af2dd2443aed6875c
524de22 HEAD@{6}: commit: Documentation update
d60ddb5 HEAD@{7}: commit: USELESS COMMIT
e81b5fb HEAD@{8}: commit: Remove baz dependency
44cea7d HEAD@{9}: commit: VERY IMPORTANT COMMIT
58eb2d9 HEAD@{10}: commit: Use foo without bar
dab7792 HEAD@{11}: commit (initial): Simplify frobulation
By reading the reflog, I can see that my rebase started at HEAD@{5} (reads: HEAD’s fifth prior value). If I want to return to the state of my repository before starting that rebase, I can simply do git checkout HEAD@{6}, which will take me back to the state prior to the rebase.
42sh$ git checkout HEAD@{6} # Checkout my `HEAD`'s 6th prior value
42sh$ git log --oneline # Are we back before the rebase?
* 524de22 (HEAD) Documentation update
* d60ddb5 USELESS COMMIT
* e81b5fb Remove baz dependency
* 44cea7d VERY IMPORTANT COMMIT
* 58eb2d9 Use foo without bar
* dab7792 Simplify frobulation
Now, I want to make sure that I have my master branch back to that state too, and not simply my disembodied HEAD.
42sh$ git branch -f master # Change where `master` is pointing at
42sh$ git checkout master # Checkout `master` branch
42sh$ git log --oneline # Is everything in order?
* 524de22 (HEAD -> master) Documentation update
* d60ddb5 USELESS COMMIT
* e81b5fb Remove baz dependency
* 44cea7d VERY IMPORTANT COMMIT
* 58eb2d9 Use foo without bar
* dab7792 Simplify frobulation
And voila! I can now try my rebase again, and be careful not to lose VERY IMPORTANT COMMIT this time.
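The same rescue can be replayed end to end with the sketch below, in a throwaway repository. Two things differ from the walkthrough above and are my own choices: the accidental drop is simulated with git rebase --onto rather than an interactive rebase, and the recovery uses the branch's own reflog with git reset --hard master@{1} instead of a checkout followed by git branch -f. This relies on the rebase updating master only once, when it finishes, so master@{1} is the tip from before the rebase.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name, whatever the local default is
git config user.name "Example"
git config user.email "example@example.com"

# One commit per message, each touching its own (hypothetical) file.
n=0
for msg in "Simplify frobulation" "VERY IMPORTANT COMMIT" "USELESS COMMIT" "Documentation update"; do
    n=$((n + 1))
    echo "$msg" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "$msg"
done
before=$(git rev-parse master)

# Simulate the mistake: replay only the last commit on top of the first one,
# which drops both middle commits.
git rebase -q --onto master~3 master~1
after_count=$(git rev-list --count master)
echo "commits left after the rebase: $after_count"

# The reflog still remembers where master used to point.
git reflog show master

# Move master (and the working tree) back to its previous value.
git reset -q --hard "master@{1}"
restored=$(git rev-parse master)
restored_count=$(git rev-list --count master)
echo "commits after recovery: $restored_count"
```

Be aware that git reset --hard discards uncommitted changes, so this shortcut is only appropriate with a clean working tree.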
Tips and tricks
Here are some basic pieces of knowledge which don’t really belong to any other section, but which I think need to be said.
The importance of small commits
You might have noticed that people keep saying that commits should be kept atomic. What does that mean and why should it matter?
Keeping commits atomic means that you should strive to commit your changes in the smallest unit of work possible. Rather than making one commit named WIP: add stuff at the end of the day, you should try to cut your work up into small units: add tests for frobulator, account for foo in bar processing, etc.
This way of working has multiple things going for it once you start taking advantage of git’s power: you can more easily reason about a line of code by using blame, you can more easily squash bugs using revert, you can more easily review the changes in an MR and keep its scope narrow.
One very useful command you can add to your tool belt is git add -p, which prompts you interactively for each patch in your working directory: you can easily choose which parts of your changes should end up in the same commit.
Miscellaneous commands
Here’s a list of commands that you should read up on, but that I won’t be presenting further:
- git bisect
- git rerere
- git stash
- and more…
Going further
I advise you to check out Learn git branching to practice a few of the notions I just wrote about, with a nice visualization of the commit graph to explain what you are doing along the way.
Furthermore, the Pro Git book is available online for free, and contains a lot of great content. You can read it whole, but I especially recommend checking out chapter 7 (Git Tools) and chapter 8 (Git Configuration). If you want to learn about the inner workings of git and how it stores the repository on your hard-drive, check out chapter 10 (Git Internals).