---
title: "Renames in Git explained"
date: 2020-11-28T12:07:00Z
draft: false
toc: true
tags: ['tech', 'git', 'rename']
author: "Gaël Depreeuw"
---
## Introduction
One of the questions I'm often asked when teaching or explaining Git is how Git
handles file and/or directory renames. The short answer is: **It doesn't**.
The slightly longer answer is: **It does, but probably not in the way you**
**envision it**.
Let's first take a look at how Git works internally. If you don't quite
understand everything that follows, I can recommend reading chapter 10 of
the [Pro Git book, 2nd Edition](
## Git stores content, not files
When you commit to a Git repository it basically does the following:
Create a **blob** object for every file in the index (a.k.a. the staging area).
A blob object is created by taking the content of the file and prepending a
small header. A SHA-1 hash is calculated over this header plus content and is
used to identify the object. The result is then compressed and stored in the
aptly named object store (found in `.git/objects`). Strictly speaking, the blob
is already written when you stage the file with `git add`; the commit merely
refers to it. The first 2 characters of the hash (in hex format) are used as a
directory within this store, while the remaining characters are the filename of
the blob object.
Let's look at an example. Create a Git repo somewhere and create a file.
```bash
git init foo
cd foo
echo "foo" >> foo.txt
```
If you look into your `.git/objects` directory, it will be empty, aside from
two empty subdirectories. Let's create a blob out of this file now.
```bash
$ git hash-object -w foo.txt
257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```
You will now find an object in the store:
```bash
$ find .git/objects -type f
.git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
```
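To see where that ID comes from, here is a minimal sketch that recomputes it without Git. It assumes `sha1sum` from GNU coreutils is installed (on macOS, `shasum -a 1` is the usual substitute), and the variable names are made up for this example. It hashes the header `blob <size>`, a NUL byte and the file content, and should print the same ID as `git hash-object` did.

```bash
# Recompute the ID of the blob for a file containing "foo\n", without Git.
# Assumption: sha1sum (GNU coreutils) is available on this machine.
set -eu

content_file=$(mktemp)
echo "foo" > "$content_file"

# Size of the content in bytes ("foo" plus the newline added by echo).
size=$(wc -c < "$content_file" | tr -d ' ')

# The hashed data is "blob <size>", a NUL byte, then the raw content.
blob_id=$( { printf 'blob %s\0' "$size"; cat "$content_file"; } | sha1sum | cut -d' ' -f1 )

echo "$blob_id"
rm -f "$content_file"
```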
Here's an interesting exercise: what happens if you rename the file and create
the blob with the renamed file?
```bash
$ mv foo.txt bar.txt
$ git hash-object -w bar.txt
257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```
That's right, nothing changed! This makes sense as we're only adding the content
to the object store! So how does Git remember the file names?
## Filenames are part of tree objects
Aside from **blob** objects, Git also creates **tree** objects. You can roughly
compare them to the directories in your worktree, i.e. for each directory in
your worktree, you will have a tree object. A tree object's content looks like:
```code
<mode> <type> <hash> <name>
...
<mode> <type> <hash> <name>
```
You can create a tree object yourself by doing:
```bash
$ git update-index --add --cacheinfo 100644 \
    257cc5642cb1a054f08cc83f2d943e56fd3ebe99 foo.txt
$ git write-tree
fcf0be4d7e45f0ef9592682ad68e42270b0366b4
$ git cat-file -p fcf0be4d7e45f0ef9592682ad68e42270b0366b4
100644 blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 foo.txt
```
There are 3 different types (that I know of) which can be referred to in a tree
object: blob, tree and commit. Blobs represent file content, trees represent
other trees (i.e. subdirectories) and commits represent submodules (i.e. the
commit at which they are included). A commit is a type of object which also
exists outside of tree objects. It contains the top tree object (representing
the top level of your repository), a link to zero or more parent commits and
some metadata (author, commit message, date, ...). Finally, and for
completeness' sake, there is also an object for annotated tags, which contains
the commit it is pointing to as well as some metadata.
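The object types described above can be inspected with `git cat-file -t` and `git ls-tree`. The following is only a sketch in a throwaway repository: the temporary path, the identity settings and the file names are assumptions made up for this example.

```bash
# Sketch: look at the object types in a throwaway repository.
# The repository location, identity and file names are hypothetical.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

mkdir sub
echo "foo" > sub/foo.txt
git add sub/foo.txt
git commit -q -m "Add sub/foo.txt"

commit_type=$(git cat-file -t HEAD)                         # the commit itself
top_tree_type=$(git cat-file -t 'HEAD^{tree}')              # the top-level tree
sub_entry_type=$(git ls-tree HEAD sub | awk '{print $2}')   # entry for directory sub
file_type=$(git cat-file -t HEAD:sub/foo.txt)               # the file content

echo "$commit_type $top_tree_type $sub_entry_type $file_type"

# Pretty-print the commit, then list all tree entries including subtrees.
git cat-file -p HEAD
git ls-tree -r -t HEAD
```

The `echo` line should report `commit tree tree blob`: the directory `sub` shows up as a tree entry inside the top-level tree, and the file as a blob inside that subtree.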
## Renaming
Armed with the knowledge about trees and blobs, it should be fairly easy to
understand what happens if you rename a file. To keep things simple, consider
a basic example: we just rename a file at the top level.
> Note: more complex examples are just more time consuming to explain, but
> not to understand. The same principles apply.

When you commit such a rename, your repository will be impacted as follows:
- The blob representing the file remains unchanged.
- The top-level tree object changes, because the filename associated with
  the blob is different.
- The commit object will point to the new tree. (Its parent will point to the
  old tree.)

Nowhere is there any special mention of a rename occurring. Remember, we're just
storing content! As such Git is not aware of any name changes. This is why the
short answer was: Git doesn't handle file renames. The repository itself has no
notion of this action. It just has content and a structure for that content.
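The effect of a rename on blobs and trees can be checked directly. This is a minimal sketch in a throwaway repository (the path, identity and file names are assumptions for illustration): it records the blob and top-level tree IDs before and after a rename-only commit. The blob ID should stay the same while the tree ID changes.

```bash
# Sketch: compare blob and tree IDs before and after a rename-only commit.
# The repository location, identity and file names are hypothetical.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

echo "foo" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
blob_before=$(git rev-parse HEAD:foo.txt)
tree_before=$(git rev-parse 'HEAD^{tree}')

git mv foo.txt bar.txt
git commit -q -m "Rename foo.txt to bar.txt"
blob_after=$(git rev-parse HEAD:bar.txt)
tree_after=$(git rev-parse 'HEAD^{tree}')

echo "blob: $blob_before -> $blob_after"
echo "tree: $tree_before -> $tree_after"
```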
However, that does not mean you lose your history when you rename a file.
### How to see history of a renamed file
Git might not store information on renames in the repository, but it does come
packed with an algorithm that detects file renames. The way it works is that
whenever Git compares two states (two commits, or a commit and the index), it
tries to determine a rename candidate among the added files for every deleted
file. It does this by comparing how similar the paired files are. If they are
at least 50% similar, it considers the pair to have been a rename. If there are
multiple rename candidates for one file, it takes the one with the highest
similarity percentage. If multiple files have the same percentage, it picks one
depending on the implementation.
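The detection described above can be observed with `git diff --name-status`. What follows is a sketch under assumptions: the throwaway repository, identity and file names are invented for the example. It renames a 20-line file and appends one line in the same commit, then compares the default threshold (`-M`) with one that only accepts identical content (`-M100%`).

```bash
# Sketch: watch rename detection react to the similarity threshold.
# The repository location, identity and file names are hypothetical.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

# Create a file with 20 short lines and commit it.
i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

# Rename the file and change it slightly within the same commit.
git mv notes.txt renamed.txt
echo "one extra line" >> renamed.txt
git commit -q -a -m "Rename and edit in one commit"

# Default threshold (50%): the pair is similar enough to count as a rename.
default_result=$(git diff -M --name-status HEAD~1 HEAD)
# Threshold of 100%: only identical content counts, so no rename is reported.
exact_only_result=$(git diff -M100% --name-status HEAD~1 HEAD)

echo "$default_result"
echo "$exact_only_result"
```

With the default threshold the first line of output should start with `R` followed by the similarity score. With `-M100%` the same change should be reported as a separate deletion and addition.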
> **Note**: I believe, but am not sure, that when several candidates tie it
> basically takes the first alphabetical match.

By default `git log -- <file>` does not track across renames. If you want to
see the history across renames, you will need to add the `--follow` option.
You can also set the threshold percentage to something other than 50%. This is
done via the `-M<n>` or `--find-renames=<n>` option. See the Git documentation
for the correct syntax.
You can also turn off rename detection with `--no-renames`.
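The difference `--follow` makes can be shown by counting the commits that `git log` reports for a renamed file. This is a minimal sketch in a throwaway repository; the path, identity and file names are assumptions for illustration.

```bash
# Sketch: count commits for a renamed file with and without --follow.
# The repository location, identity and file names are hypothetical.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

echo "foo" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

git mv foo.txt bar.txt
git commit -q -m "Rename foo.txt to bar.txt"

plain_count=$(git log --oneline -- bar.txt | wc -l | tr -d ' ')
follow_count=$(git log --oneline --follow -- bar.txt | wc -l | tr -d ' ')

echo "without --follow: $plain_count commit(s)"
echo "with --follow:    $follow_count commit(s)"
```

Without `--follow` only the rename commit is listed, because that is where `bar.txt` first appears. With `--follow` the commit that originally added `foo.txt` is listed as well.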
### Rename best practice
Because of the threshold and the cheapness of commits, it is recommended that
when you rename a file/directory, you commit the rename first, before you
continue working on the renamed file. This basically makes it so you can use
a threshold of 100% all the time.
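As a sketch of this workflow (the throwaway repository, identity and file names are assumptions for the example), the rename gets its own commit and the edit follows in a second one. Even with the strictest threshold, `-M100%`, the rename commit should then be reported as `R100`.

```bash
# Sketch: commit the rename first, then the changes.
# The repository location, identity and file names are hypothetical.
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt"

# Commit 1: the rename only, content untouched.
git mv notes.txt renamed.txt
git commit -q -m "Rename notes.txt to renamed.txt"

# Commit 2: the actual changes to the renamed file.
echo "one extra line" >> renamed.txt
git commit -q -a -m "Edit renamed.txt"

# Inspect the rename commit with the strictest threshold.
rename_commit_result=$(git diff -M100% --name-status HEAD~2 HEAD~1)
echo "$rename_commit_result"
```

Because the content is identical on both sides of the rename commit, the detection does not depend on any similarity guesswork.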
### Why did Git do it this way
This is pure speculation, but here are my thoughts on it:
Filenames are actually part of the underlying file system, so for a version
control system to support multiple file systems it has to handle filenames
in its own way. This includes renames. If you think about what this would
require for Git, it would not be very straightforward: Git could have chosen to
provide a command to store rename data, let's say `git rename fileA fileB`, but
what should this command do? We can imagine it creating a new **rename**
object, which would hold the blob hash and the name of the previous file. Now,
every time you walk through history and encounter this object type, you would
need to remember this redirection. There are probably a lot of little nuances
which are not immediately apparent, though, and it does not deal with one of
the major drawbacks of this new command: what happens if the user forgets it
and just does `mv fileA fileB`?
Well, we'd actually want to have some mechanism to detect this as a rename, as
once this is committed it becomes more difficult to undo this change
(especially if we already pushed the commit!). So it sure would be nice if Git
could somehow figure out that it was a rename. Which is exactly what they did.
But now that we have this functionality, what actually is the point of the
new command we wanted to implement? This is probably highly subjective, but to
me it seems completely irrelevant now. Instead of having a command which can be
forgotten and for which we need a contingency, just use the contingency as the
solution! It makes the behaviour a lot more consistent!
### Can I fix my commit if I did change a lot of content after renaming
First, to prevent this: always check using `git status` whether or not the
rename is being detected. Now, how to solve it?
It depends. If your commit is local only and it is the last commit, then you can
fix this easily. There are many ways to do it, but one option is:
```bash
git mv <newname> <oldname>  # Undo the file rename
git commit --amend          # Commit the changes to the file
git mv <oldname> <newname>  # Rename the file
git commit                  # Commit the rename
```
If you want to commit the rename first and the changes second, you can also do
this, but it is a bit more complex:
```bash
git reset --soft HEAD~                    # Go back one commit, but keep the changes
git restore --staged <oldname> <newname>  # Unstage the deletion and addition
git restore <oldname>                     # Undelete the old file
mv <newname> <newname.tmp>                # Make a temp backup of the new file
git mv <oldname> <newname>                # Rename the old file
git commit                                # Commit the rename
mv <newname.tmp> <newname>                # Apply the new changes (removes the backup)
git commit -a                             # Commit the changes
```
If the commit is already a couple of commits back, you can do the same with an
interactive rebase, doing either of the above at the correct point.
If you already pushed your commits, you will have to check with the team whether
you can rewrite the history and push it. If this is not possible, you might need
to find the right threshold to have Git mark it as a rename.
## Summary
So in summary: no, Git does not store renames in its repository. Instead, for
every add/delete pair in a commit, Git will do a similarity analysis and
when they are at least X% alike (default 50%), it will assume a rename occurred.
Some commands influenced by this are: `git log`, `git diff` and `git merge`.
Options related to renames are:
```txt
-M<n>, --find-renames=<n>  # where n is the threshold percentage.
--no-renames               # don't do any rename detection
```
It is best practice to handle renames in their own commits. Try to avoid
renaming and modifying a file within the same commit.
