---
title: "Renames in Git explained"
date: 2020-11-28T12:07:00Z
draft: false
toc: true
tags: ['tech', 'git', 'rename']
author: "Gaël Depreeuw"
---

## Introduction

One of the questions I'm often asked when teaching or explaining Git is how Git
handles file and/or directory renames. The short answer to this is: **It**
**doesn't**.

The slightly longer answer is: **It does, but probably not in the way you**
**envision it**.

Let's first take a look at how Git works internally. If you don't quite
understand everything that follows, I can recommend reading chapter 10 of
the [Pro Git book, 2nd Edition](https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain).

## Git stores content, not files

When you commit to a Git repository, it basically does the following:

It creates a **blob** object for every file in the index (a.k.a. the staging area).
A blob object is created by taking the content of the file and prepending a
header. A SHA-1 hash is calculated over this header plus content and is used to
identify the object. The result is then compressed and stored in the aptly
named object store (found in `.git/objects`). The first 2 characters of the
hash (in hex format) are used as a directory within this store, while the
remaining 38 characters are the filename of the blob object.

Let's look at an example. Create a Git repo somewhere and create a file.

```bash
git init foo
cd foo
echo "foo" >> foo.txt
```

If you look into your `.git/objects` directory, it will be empty, aside from
two empty subdirectories (`info` and `pack`). Let's create a blob out of this
file now.

```bash
$ git hash-object -w foo.txt
257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```

You will now find an object in the store:

```bash
$ find .git/objects -type f
.git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
```

Here's an interesting exercise: what happens if you rename the file and create
the blob with the renamed file?

```bash
$ mv foo.txt bar.txt
$ git hash-object -w bar.txt
257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```

That's right, nothing changed! This makes sense, as we're only adding the content
to the object store! So how does Git remember the file names?

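The hash from the example can also be reproduced without Git. The following is a minimal sketch, assuming `sha1sum` (GNU coreutils) is available and the repository uses Git's default SHA-1 object format. The blob header consists of the word `blob`, a space, the content size in bytes and a NUL byte; the variable name `blob_hash` is only for illustration.

```bash
# Content of the example file: "foo" plus a newline, i.e. 4 bytes.
# Hash the header ("blob 4" + NUL byte) followed by the content.
blob_hash=$(printf 'blob 4\0foo\n' | sha1sum | cut -d' ' -f1)
echo "$blob_hash"   # prints 257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```

The file name is not part of the hashed data, which is why `foo.txt` and `bar.txt` end up as the same object.
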
## Filenames are part of tree objects

Aside from **blob** objects, Git also creates **tree** objects. You can sort of
compare them to the directories in your worktree, i.e. for each directory in your
worktree, you will have a tree object. A tree object's content looks like:

```code
<mode> <type> <hash> <name>
...
<mode> <type> <hash> <name>
```

You can create a tree object yourself by doing:

```bash
$ git update-index --add --cacheinfo 100644 \
  257cc5642cb1a054f08cc83f2d943e56fd3ebe99 foo.txt
$ git write-tree
fcf0be4d7e45f0ef9592682ad68e42270b0366b4
$ git cat-file -p fcf0be4d7e45f0ef9592682ad68e42270b0366b4
100644 blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99    foo.txt
```

There are 3 different types (that I know of) which can be referred to in a tree
object: blob, tree and commit. Blobs represent file content, trees represent other
trees (i.e. subdirectories) and commits represent submodules (i.e. the commit
at which they are included). A commit is a type of object which is also present
outside the tree objects. It contains the top tree object (representing the
top level of your repository), links to its parent commits (if any) and some
metadata (author, commit message, date, ...). Finally, and for completeness' sake,
there is also an object for annotated tags, which contains the commit it is
pointing to as well as some metadata.

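To see these object types for yourself, here is a small sketch that builds a throwaway repository and asks Git for the type of each object. The temporary directory, the identity `Example` and the variable names are assumptions made for this illustration only.

```bash
# Throwaway repository in a temporary directory (illustrative names only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "foo" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Ask Git for the type of each object reachable from the commit.
commit_type=$(git cat-file -t HEAD)           # the commit itself
tree_type=$(git cat-file -t 'HEAD^{tree}')    # its top-level tree
blob_type=$(git cat-file -t HEAD:foo.txt)     # the file content
echo "$commit_type $tree_type $blob_type"     # prints: commit tree blob

# Pretty-print the commit: a tree line, author/committer lines and the message.
git cat-file -p HEAD
```
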
## Renaming

Armed with the knowledge about trees and blobs, it should be fairly easy to
understand what happens if you rename a file. To make it easier to understand,
consider a simple example: we just rename a file at the top level.

> Note: more complex examples are just more time consuming to explain, but
> not to understand. The same principles apply.

When you commit such a rename, your repository will be impacted as follows:

- The blob representing the file remains unchanged.
- The top-level tree object changes, because the filename associated with
  the blob is different.
- The new commit object points to the new tree. (Its parent still points to the
  old tree.)

Nowhere is there any special mention of a rename occurring. Remember, we're just
storing content! As such, Git is not aware of any name changes. This is why the
short answer was: Git doesn't handle file renames. The repository itself has no
notion of this action. It just has content and a structure for that content.

However, that does not mean you lose your history when you rename a file.

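The effects of committing a rename can be checked with a short experiment. This is only a sketch: the temporary repository, the identity `Example` and the variable names are assumptions for the sake of the example.

```bash
# Throwaway repository in a temporary directory (illustrative names only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "foo" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"

# Rename the file without touching its content and commit that.
git mv foo.txt bar.txt
git commit -q -m "Rename foo.txt to bar.txt"

# Resolve the blob and tree hashes before and after the rename.
blob_before=$(git rev-parse HEAD~1:foo.txt)
blob_after=$(git rev-parse HEAD:bar.txt)
tree_before=$(git rev-parse 'HEAD~1^{tree}')
tree_after=$(git rev-parse 'HEAD^{tree}')

echo "blob: $blob_before -> $blob_after"   # the same hash twice
echo "tree: $tree_before -> $tree_after"   # two different hashes
```
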
### How to see history of a renamed file

Git might not store information on renames in the repository, but it does come
packed with an algorithm that detects file renames. The way it works is that for
every add/delete pair added to the index, it tries to determine a rename
candidate for every deleted file. It does this by comparing how similar the
paired files are. If they are at least 50% similar, it considers the pair to
have been a rename. If there are multiple rename candidates for one file, it
takes the one with the highest similarity percentage. If multiple files have the
same percentage, it picks one depending on the implementation.

> **Note**: I believe, but am not sure, it basically takes the first
> alphabetical match in the last case.

By default, `git log -- <file>` does not track across renames. If you want to
see the history across renames, you will need to add the `--follow` option
(which only works for a single file).

You can also define the threshold percentage to be different from 50%. This is
done via the `-M<n>` or `--find-renames=<n>` option. See the Git documentation
for the correct syntax.

You can also turn off rename detection with `--no-renames`.

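As a rough illustration of these options, the sketch below commits a pure rename in a throwaway repository and compares what `git log` and `git diff` report with and without rename detection. The repository, the identity `Example` and the variable names are hypothetical.

```bash
# Throwaway repository in a temporary directory (illustrative names only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "foo" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
git mv foo.txt bar.txt
git commit -q -m "Rename foo.txt to bar.txt"

# Without --follow, only the commit that introduced the path bar.txt is listed.
plain_count=$(git log --format=%h -- bar.txt | wc -l | tr -d ' ')
# With --follow, the history continues across the rename.
follow_count=$(git log --format=%h --follow -- bar.txt | wc -l | tr -d ' ')
echo "plain: $plain_count, follow: $follow_count"

# With rename detection, the last commit shows up as a single rename (R) entry.
with_detection=$(git diff -M --name-status HEAD~1 HEAD)
# Without it, the same commit shows up as one addition (A) and one deletion (D).
without_detection=$(git diff --no-renames --name-status HEAD~1 HEAD)
printf '%s\n---\n%s\n' "$with_detection" "$without_detection"
```
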
### Rename best practice

Because of the threshold and the cheapness of commits, it is recommended that
when you rename a file/directory, you commit those renames first, before you
continue working on the renamed file. This basically makes it so you can use
a threshold of 100% all the time.

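To get a feel for why this matters, here is a sketch of what happens when a rename and an edit land in the same commit. The file names, the 20-line content and the identity `Example` are made up for the illustration, and the exact similarity score Git reports may vary, so none is shown here.

```bash
# Throwaway repository in a temporary directory (illustrative names only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# A 20-line file, committed under its old name.
seq 1 20 | sed 's/^/line /' > old.txt
git add old.txt
git commit -q -m "Add old.txt"

# Rename the file AND rewrite its last 5 lines in the same commit.
git mv old.txt new.txt
{ seq 1 15 | sed 's/^/line /'; seq 16 20 | sed 's/^/changed /'; } > new.txt
git commit -q -a -m "Rename and modify"

# Default threshold (50%): reported as a rename, but with a score below 100.
default_status=$(git diff -M --name-status HEAD~1 HEAD)
# Strict threshold (100%): only exact renames count, so this one is missed.
strict_status=$(git diff -M100% --name-status HEAD~1 HEAD)
printf '%s\n---\n%s\n' "$default_status" "$strict_status"
```

Had the rename been committed on its own first, both commands would have reported it as an exact rename.
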
### Why did Git do it this way

This is pure speculation, but here are my thoughts on it:

Filenames are actually part of the underlying file system, so for a version
control system to support multiple file systems, it has to handle filenames
in its own way. This includes renames. If you think about what this would
require for Git, it would not be very straightforward: Git could have chosen to
provide a command to store rename data, let's say: `git rename fileA fileB`, but
what should this command do? We can imagine it could create a new **rename**
object, which would hold the blob hash and the name of the previous file. Now,
every time you walk through history and encounter this object type,
you would need to remember this redirection. There are probably a lot of little
nuances which are not immediately apparent though, and it does not deal with one
of the major drawbacks of this new command: What happens if the user forgets it
and just does `mv fileA fileB`?

Well, we'd actually want to have some mechanism to detect this as a rename, as
once this is committed it becomes more difficult to undo this change
(especially if we already pushed the commit!). So it sure would be nice if Git
could somehow figure out that it was a rename. Which is exactly what they did.
But now that we have this functionality, what actually is the point of the
new command we wanted to implement? This is probably highly subjective, but to
me it seems completely irrelevant now. Instead of having a command which can be
forgotten and for which we need a contingency, just use the contingency as the
solution! It makes the behaviour a lot more consistent!

### Can I fix my commit if I did change a lot of content after renaming

First, to prevent this: always check using `git status` whether or not the
rename is being detected. Now, how to solve it?

It depends. If your commit is local only and it is the last commit, then you can
fix this easily. There are many ways to do it, but one option is:

```bash
git mv <newname> <oldname>   # Undo the file rename
git commit --amend           # Commit the changes to the file
git mv <oldname> <newname>   # Rename the file
git commit                   # Commit the rename
```

If you want to commit the rename first and the changes second, you can also do
this, but it is a bit more complex:

```bash
git reset --soft HEAD~                    # Go back one commit, but keep the changes
git restore --staged <oldname> <newname>  # Unstage the deletion and addition
git restore <oldname>                     # Undelete the old file
mv <newname> <newname.tmp>                # Move the new file aside as a temp backup
git mv <oldname> <newname>                # Rename the old file
git commit                                # Commit the rename
mv <newname.tmp> <newname>                # Apply the new changes (removes the backup)
git commit -a                             # Commit the changes
```

If the commit is already a couple of commits back, you can do the same with an
interactive rebase, doing either of the above at the correct time.

If you already pushed your commits, you will have to check with the team if you
can rewrite the history and push it. If this is not possible, you might need to
find the right threshold to have Git mark it as a rename.

## Summary

So in summary: no, Git does not store renames in its repository. Instead, for
every add/delete pair in a commit, Git will do a similarity analysis and
when they are X% alike (default 50%), it will assume a rename occurred.

Some commands influenced by this are: `git log`, `git diff` and `git merge`.
Options related to renames are:

```txt
-M<n>, --find-renames=<n>  # where n is the threshold percentage.
--no-renames               # don't do any rename detection
```

It is best practice to handle renames in their own commits. Try to avoid
renaming and modifying a file within the same commit.