---
title: "Renames in Git explained"
date: 2020-11-28T12:07:00Z
draft: false
toc: true
tags: ['tech', 'git', 'rename']
author: "Gaël Depreeuw"
---
|
|
|
|
|
|
|
|
## Introduction
|
|
|
|
|
|
|
|
One of the questions I'm often asked when teaching or explaining Git is how Git
handles file and/or directory renames. The short answer to this is: **It doesn't**.
|
|
|
|
|
|
|
|
The slightly longer answer is: **It does, but probably not in the way you envision.**
|
|
|
|
|
|
|
|
Let's first take a look at how Git works internally. If you don't quite
understand everything that follows, I recommend reading chapter 10 of
the [Pro Git book, 2nd edition](https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain).
|
|
|
|
|
|
|
|
## Git stores content, not files
|
|
|
|
|
|
|
|
When you commit to a Git repository, it basically does the following:

Create a **blob** object for every file in the index (a.k.a. the staging area).
(Strictly speaking, `git add` already creates these blobs when it stages a file.)
A blob object is created by taking the content of the file and prepending a
header (`blob <size>` followed by a NUL byte). A SHA-1 hash is calculated over
this header plus content, and it is used to identify the object. The object is
then compressed (with zlib) and stored in the aptly named object store (found in
`.git/objects`). The first 2 characters of the hash (in hex format) are used as
a directory within this store, while the remaining 38 characters are the
filename of the blob object.
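To check the description above, here is a minimal sketch that recomputes the
blob id by hand: SHA-1 over the header `blob <size>`, a NUL byte and the raw
content. It assumes bash and GNU coreutils' `sha1sum` (on macOS, `shasum -a 1`
should give the same digest); `blob_id` is a hypothetical helper, not a Git
command. Compression plays no part in the id.

```bash
#!/usr/bin/env bash
# Sketch: recompute a blob id without Git.
# Assumes bash and GNU coreutils' sha1sum; blob_id is a hypothetical helper.
set -euo pipefail

blob_id() {   # usage: blob_id <file>
  local size
  size=$(wc -c < "$1" | tr -d ' ')   # content length in bytes
  # Header "blob <size>", a NUL byte, then the raw (uncompressed) content
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

cd "$(mktemp -d)"
echo "foo" > foo.txt   # "foo" plus a newline: 4 bytes
blob_id foo.txt        # 257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```

The file name never enters the computation, so a copy of `foo.txt` under any
other name produces the very same id.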
|
|
|
|
|
|
|
|
Let's look at an example. Create a git repo somewhere and create a file.
|
|
|
|
|
|
|
|
```bash
git init foo
cd foo
echo "foo" >> foo.txt
```
|
|
|
|
|
|
|
|
If you look into your `.git/objects` directory, it will be empty, aside from
two empty subdirectories (`info` and `pack`). Let's create a blob out of this file now.
|
|
|
|
|
|
|
|
```bash
$ git hash-object -w foo.txt
257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```
|
|
|
|
|
|
|
|
You will now find an object in the store:
|
|
|
|
|
|
|
|
```bash
$ find .git/objects -type f
.git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
```
|
|
|
|
|
|
|
|
Here's an interesting exercise: what happens if you rename the file and create
the blob with the renamed file?
|
|
|
|
|
|
|
|
```bash
$ mv foo.txt bar.txt
$ git hash-object -w bar.txt
257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```
|
|
|
|
|
|
|
|
That's right, nothing changed! This makes sense as we're only adding the content
to the object store! So how does Git remember the file names?
|
|
|
|
|
|
|
|
## Filenames are part of tree objects
|
|
|
|
|
|
|
|
Aside from **blob** objects, Git also creates **tree** objects. You can roughly
compare them to the directories in your worktree, i.e. for each directory in your
worktree, you will have a tree object. When pretty-printed (e.g. with
`git cat-file -p`), a tree object's content looks like:
|
|
|
|
|
|
|
|
```txt
<mode> <type> <hash> <name>
...
<mode> <type> <hash> <name>
```
|
|
|
|
|
|
|
|
You can create a tree object yourself by doing:
|
|
|
|
|
|
|
|
```bash
$ git update-index --add --cacheinfo 100644 \
    257cc5642cb1a054f08cc83f2d943e56fd3ebe99 foo.txt
$ git write-tree
fcf0be4d7e45f0ef9592682ad68e42270b0366b4
$ git cat-file -p fcf0be4d7e45f0ef9592682ad68e42270b0366b4
100644 blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 foo.txt
```
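What does `git write-tree` actually hash? The pretty-printed form is what
`git cat-file -p` shows; on disk, each entry is (as far as I know) stored as
`<mode> <name>`, a NUL byte and the 20 binary bytes of the id, behind a
`tree <size>` + NUL header. Below is a sketch that serialises a one-entry tree
that way. It assumes bash, sed and coreutils' `sha1sum`; `tree_id` is a
hypothetical helper. Hashing the same blob under two names shows why a rename
produces a new tree.

```bash
#!/usr/bin/env bash
# Sketch: serialise a one-entry tree object by hand and hash it.
# Assumes bash, sed and GNU coreutils' sha1sum; tree_id is a hypothetical helper.
set -euo pipefail

blob=257cc5642cb1a054f08cc83f2d943e56fd3ebe99   # the "foo" blob from above

tree_id() {   # usage: tree_id <mode> <name> <blob-id-in-hex>
  local mode=$1 name=$2 hex=$3 entry
  entry=$(mktemp)
  # Raw entry: "<mode> <name>", a NUL byte, then the id as 20 *binary* bytes
  printf '%s %s\0' "$mode" "$name" > "$entry"
  printf '%b' "$(sed 's/../\\x&/g' <<< "$hex")" >> "$entry"
  # Object: header "tree <size>" and a NUL byte, followed by the entries
  { printf 'tree %s\0' "$(wc -c < "$entry" | tr -d ' ')"; cat "$entry"; } \
    | sha1sum | cut -d' ' -f1
  rm -f "$entry"
}

tree_id 100644 foo.txt "$blob"   # same content...
tree_id 100644 bar.txt "$blob"   # ...different name, so a different tree id
```

If the sketch is right, the id printed for `foo.txt` should match what
`git write-tree` returned for the same index. A real tree with several entries
also needs them sorted by name, which this one-entry sketch ignores.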
|
|
|
|
|
|
|
|
There are 3 different types (that I know of) which can be referred to in a tree
object: blob, tree and commit. Blobs represent file content, trees represent
other trees (i.e. subdirectories) and commits represent submodules (i.e. the
commit at which they are included). A commit is also a type of object in its own
right, stored outside the tree objects. A commit object contains the top tree
object (representing the top level of your repository), links to zero or more
parent commits (the very first commit has none, a merge commit has several) and
some metadata (author, committer, commit message, dates, ...). Finally, and for
completeness' sake, there is also an object for annotated tags, which contains
the object it points to (usually a commit) as well as some metadata.
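The relationship between blobs, trees and commits can be made visible with
plumbing commands. Here's a sketch, assuming a throwaway repository and a
made-up identity, that writes a blob and a tree, then creates a root commit
(no parent) and a second commit on top of it with `git commit-tree`.

```bash
#!/usr/bin/env bash
# Sketch: blobs, trees and commits built with plumbing commands.
# Throwaway repository; the identity below is a made-up placeholder.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

blob=$(printf 'foo\n' | git hash-object -w --stdin)       # content only
git update-index --add --cacheinfo 100644 "$blob" foo.txt   # the name lives in the index...
tree=$(git write-tree)                                      # ...and ends up in the tree
root=$(git commit-tree "$tree" -m "Add foo.txt")            # no -p: a root commit
child=$(git commit-tree "$tree" -p "$root" -m "Second commit, same tree")

git cat-file -p "$root"    # "tree <id>", author, committer, message; no parent line
git cat-file -p "$child"   # the same tree line, plus "parent <root id>"
```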
|
|
|
|
|
|
|
|
## Renaming
|
|
|
|
|
|
|
|
Armed with this knowledge about trees and blobs, it should be fairly easy to
understand what happens when you rename a file. To keep things simple, consider
a basic example: we just rename a file at the top level.
|
|
|
|
|
|
|
|
> **Note**: more complex examples just take more time to explain, not to
> understand. The same principles apply.
|
|
|
|
|
|
|
|
When you commit such a rename, your repository will be impacted as follows:
|
|
|
|
|
|
|
|
- The blob representing the file remains unchanged.
- The top level tree object changes, because the filename associated with
  the blob is different. This results in a new tree object with a new hash.
- A new commit object is created which points to the new tree. (Its parent
  commit still points to the old tree.)
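Here's a sketch of these three points in a throwaway repository (made-up
identity, same `foo.txt` content as before): after `git mv` and a commit,
`git ls-tree` should list the same blob id under the new name, while the two
commits point to different trees.

```bash
#!/usr/bin/env bash
# Sketch: what a committed rename changes in the object store.
# Throwaway repository; identity and file names are made-up placeholders.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "foo" > foo.txt
git add foo.txt
git commit -qm "Add foo.txt"
git mv foo.txt bar.txt
git commit -qm "Rename foo.txt to bar.txt"

git ls-tree HEAD~1                           # 100644 blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99  foo.txt
git ls-tree HEAD                             # 100644 blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99  bar.txt
git rev-parse 'HEAD~1^{tree}' 'HEAD^{tree}'  # two different tree ids
git cat-file -p HEAD                         # new tree, "parent" is the previous commit
```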
|
|
|
|
|
|
|
|
Nowhere is there any special mention of a rename occurring. Remember, we're just
storing content! As such, Git is not aware of any name changes. This is why the
short answer was: Git doesn't handle file renames. The repository itself has no
notion of this action. It just has content and a structure for that content.
|
|
|
|
|
|
|
|
However, that does not mean you lose your history when you rename a file.
|
|
|
|
|
|
|
|
### How to see the history of a renamed file
|
|
|
|
|
|
|
|
Git might not store information on renames in the repository, but it does come
packed with an algorithm that detects file renames. Whenever Git compares two
snapshots (for example two commits, or a commit and the index), it looks at the
files that were deleted and the files that were added, and tries to determine a
rename candidate for every deleted file. It does this by comparing how similar
the paired files are. If they are at least 50% similar, it considers the pair to
have been a rename. If there are multiple rename candidates for one file, it
takes the one with the highest similarity percentage. If multiple files have the
same percentage, it picks one depending on the implementation.
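To see the detection at work, here's a minimal sketch in a throwaway repository
(file names, identity and content are made up). The commit itself only records
two trees; the `R` line in the output is computed by `git diff` when it compares
them. The exact similarity score depends on Git's internal estimate, so the
comment only shows its shape.

```bash
#!/usr/bin/env bash
# Sketch: rename detection is computed when two snapshots are compared.
# Throwaway repository; names, identity and content are made-up placeholders.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

seq 1 100 > old.txt                    # 100 short lines of content
git add old.txt
git commit -qm "Add old.txt"

git mv old.txt new.txt                 # rename...
echo 101 >> new.txt                    # ...and change the content a little
git commit -qam "Rename old.txt and add a line"

# Nothing in the commit says "rename": the diff machinery infers it
git diff -M --name-status HEAD~1 HEAD  # R<score>  old.txt  new.txt
```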
|
|
|
|
|
|
|
|
> **Note**: I believe, but am not sure, it basically takes the first
> alphabetical match in the last case.
|
|
|
|
|
|
|
|
By default, `git log -- <file>` does not track across renames. If you want to
see the history across renames, you will need to add the `--follow` option
(which only works for a single file).
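A small sketch of the difference, assuming a throwaway repository with a
made-up identity and the `foo.txt`/`bar.txt` names used earlier:

```bash
#!/usr/bin/env bash
# Sketch: git log with and without --follow on a renamed file.
# Throwaway repository; identity and file names are made-up placeholders.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "foo" > foo.txt
git add foo.txt
git commit -qm "Add foo.txt"
git mv foo.txt bar.txt
git commit -qm "Rename foo.txt to bar.txt"

git log --oneline -- bar.txt            # only the rename commit
git log --oneline --follow -- bar.txt   # the rename commit and "Add foo.txt"
```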
|
|
|
|
|
|
|
|
You can also set the threshold percentage to something other than 50%. This is
done via the `-M<n>` or `--find-renames=<n>` option, e.g. `-M90%`. Note that
without a `%` sign the number is read as a fraction, so `-M5` means 50%, not 5%.
See the git documentation for the details.
|
|
|
|
|
|
|
|
You can also turn off rename detection entirely by passing `--no-renames`.
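Here's a sketch of the same commit viewed with different thresholds, assuming a
throwaway repository with made-up names and content. Most of `new.txt` comes
from `old.txt`, so it should count as a rename at 50% but not at 99%, and never
with `--no-renames`.

```bash
#!/usr/bin/env bash
# Sketch: the same commit seen with different rename thresholds.
# Throwaway repository; identity, file names and content are made up.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

seq 1 20 > old.txt
git add old.txt
git commit -qm "Add old.txt"
git mv old.txt new.txt
echo 21 >> new.txt                      # one extra line, the rest is unchanged
git commit -qam "Rename old.txt and add a line"

git diff --name-status -M50% HEAD~1 HEAD         # R<score>  old.txt  new.txt
git diff --name-status -M5 HEAD~1 HEAD           # same as -M50% (no % sign: a fraction)
git diff --name-status -M99% HEAD~1 HEAD         # A new.txt, D old.txt
git diff --name-status --no-renames HEAD~1 HEAD  # A new.txt, D old.txt
```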
|
|
|
|
|
|
|
|
### Rename best practice
|
|
|
|
|
|
|
|
Because of the threshold, and because commits are cheap, it is recommended that
when you rename a file or directory, you commit the rename first, before you
continue working on the renamed file. That way the rename is detected even with
a threshold of 100%.
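A sketch of this practice in a throwaway repository (made-up names, identity
and content): the rename gets its own commit, the rewrite comes afterwards, and
the rename is then recognised even with `-M100%`.

```bash
#!/usr/bin/env bash
# Sketch: a rename committed on its own survives even a 100% threshold.
# Throwaway repository; identity, file names and content are made up.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

seq 1 5 > old.txt
git add old.txt
git commit -qm "Add old.txt"

git mv old.txt new.txt
git commit -qm "Rename old.txt to new.txt"   # content untouched
printf 'completely\ndifferent\ncontent\n' > new.txt
git commit -qam "Rewrite new.txt"            # content changes in a separate commit

git diff --name-status -M100% HEAD~2 HEAD~1  # R100  old.txt  new.txt
git diff --name-status HEAD~1 HEAD           # M  new.txt
git log --oneline --follow -- new.txt        # all three commits
```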
|
|
|
|
|
|
|
|
### Why did Git do it this way?
|
|
|
|
|
|
|
|
This is pure speculation, but here are my thoughts on it:
|
|
|
|
|
|
|
|
Filenames are actually part of the underlying file systems, so for a version
control system to support multiple file systems, it has to handle filenames in
its own way. This includes renames. If you think about what this would require
for Git, it would not be very straightforward. Git could have chosen to provide
a command to store rename data, let's say `git rename fileA fileB`, but what
should this command do? We can imagine it creating a new **rename** object,
which would hold the blob hash and the name of the previous file. Now, every
time you walk through history and encounter this object type, you would need to
remember these redirections. There are probably a lot of little nuances which
are not immediately apparent, and it does not deal with one of the major
drawbacks of this new command: what happens if the user forgets it and just
does `mv fileA fileB`?
|
|
|
|
|
|
|
|
Well, we'd actually want some mechanism to detect this as a rename, as once
this is committed it becomes more difficult to undo the change (especially if
we already pushed the commit!). So it sure would be nice if Git could somehow
figure out that it was a rename. Which is exactly what they did. But now that
we have this functionality, what actually is the point of the new command we
wanted to implement? This is probably highly subjective, but to me it seems
completely irrelevant now. Instead of having a command which can be forgotten
and for which we need a contingency, just use the contingency as the solution!
It makes the behaviour a lot more consistent!
|
|
|
|
|
|
|
|
### Can I fix my commit if I changed a lot of content after renaming?
|
|
|
|
|
|
|
|
First, to prevent this: always use `git status` to check whether or not the
rename is being detected. Now, how do you fix it?
|
|
|
|
|
|
|
|
It depends. If your commit is local only and it is the last commit, then you can
fix this easily. There are many ways to do it, but one option is:
|
|
|
|
|
|
|
|
```bash
git mv <newname> <oldname>   # Undo the file rename
git commit --amend           # Commit the changes to the file
git mv <oldname> <newname>   # Rename the file
git commit                   # Commit the rename
```
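Here's a sketch of the recipe above run end to end in a throwaway repository,
with `old.txt`/`new.txt` standing in for `<oldname>`/`<newname>` and a made-up
identity. `--no-edit` is only there so the amend does not open an editor; in
practice you would probably reword the message.

```bash
#!/usr/bin/env bash
# Sketch: the first recipe, run in a throwaway repository.
# old.txt/new.txt stand in for <oldname>/<newname>; identity is made up.
set -euo pipefail
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

seq 1 5 > old.txt
git add old.txt
git commit -qm "Add old.txt"

# The problematic commit: rename and rewrite in one go
git mv old.txt new.txt
printf 'brand\nnew\ncontent\n' > new.txt
git commit -qam "Rename and rewrite"
git diff -M --name-status HEAD~1 HEAD   # A new.txt, D old.txt: no rename detected

# The recipe
git mv new.txt old.txt
git commit -q --amend --no-edit         # now only modifies old.txt
git mv old.txt new.txt
git commit -qm "Rename old.txt to new.txt"

git diff -M --name-status HEAD~2 HEAD~1 # M  old.txt
git diff -M --name-status HEAD~1 HEAD   # R100  old.txt  new.txt
```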
|
|
|
|
|
|
|
|
If you want the rename first and the changes second, you can also do this, but
it is a bit more complex:
|
|
|
|
|
|
|
|
```bash
git reset --soft HEAD~                     # Go back one commit, but keep the changes staged
git restore --staged <oldname> <newname>   # Unstage the deletion and the addition
git restore <oldname>                      # Undelete the old file
mv <newname> <newname.tmp>                 # Make a temporary backup of the new file
git mv <oldname> <newname>                 # Rename the old file
git commit                                 # Commit the rename
mv <newname.tmp> <newname>                 # Apply the new changes (removes the backup)
git commit -a                              # Commit the changes
```
|
|
|
|
|
|
|
|
If the commit is further back in history, you can do the same during an
interactive rebase (`git rebase -i`): mark the commit as `edit` and apply either
of the above when the rebase stops at it.
|
|
|
|
|
|
|
|
If you already pushed your commits, you will have to check with the team whether
you can rewrite the history and push it. If this is not possible, you might need
to find the right threshold to have Git mark it as a rename.
|
|
|
|
|
|
|
|
## Summary
|
|
|
|
|
|
|
|
So in summary: no, Git does not store renames in its repository. Instead, for
every add/delete pair between the two snapshots it compares, Git does a
similarity analysis, and when the files are at least X% alike (default 50%), it
assumes a rename occurred.
|
|
|
|
|
|
|
|
Some commands influenced by this are `git log`, `git diff`, `git status` and
`git merge`. Options related to renames are:
|
|
|
|
|
|
|
|
```txt
-M[<n>], --find-renames[=<n>]         # where n is the threshold percentage, e.g. -M90%
--no-renames                          # don't do any rename detection
-X find-renames=<n>, -X no-renames    # the equivalent strategy options for git merge
```
|
|
|
|
|
|
|
|
It is best practice to handle renames in their own commits. Try to avoid
renaming and modifying a file within the same commit.
|