|
|
@ -0,0 +1,219 @@ |
|
|
|
|
|
|
|
--- |
|
|
|
|
|
|
|
title: "Renames in Git explained" |
|
|
|
|
|
|
|
date: 2020-11-28T12:07:00Z |
|
|
|
|
|
|
|
draft: false |
|
|
|
|
|
|
|
toc: true |
|
|
|
|
|
|
|
tags: ['tech', 'git', 'rename'] |
|
|
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Introduction |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
One of the questions I'm often asked when teaching or explaining Git is how Git |
|
|
|
|
|
|
|
handles file and/or directory renames. The short answer to this is: **It** |
|
|
|
|
|
|
|
**doesn't**. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The slightly longer answer is: **It does, but probably not in the way you** |
|
|
|
|
|
|
|
**envision it**. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
To help you understand this topic a bit more, we first have to go back to the |
|
|
|
|
|
|
|
basics: What actually is a file or directory name? The answer to this question |
|
|
|
|
|
|
|
is highly dependent on the underlying file system, but in general it can be |
|
|
|
|
|
|
|
boiled down to this: |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
> A file (or directory) name is an index used by the file system to look up the |
|
|
|
|
|
|
|
> contents of the file. (Note: from now on I will only refer to file names, but |
|
|
|
|
|
|
|
> the same applies to directory names as well.) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
What you should note from this is that a filename is actually not a property of |
|
|
|
|
|
|
|
the file content itself, but part of the meta-data regarding the content. In |
|
|
|
|
|
|
|
Linux, for instance, the filename of a file is stored in the directory, which |
|
|
|
|
|
|
|
is basically an associative array which maps filenames to inodes (the object |
|
|
|
|
|
|
|
which stores the meta-data of a file). |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
When renaming a file, what you are actually doing is updating a lookup table. |
|
|
|
|
|
|
|
In Linux, this would be updating the associative array of the directory. If you |
|
|
|
|
|
|
|
move a file, then you remove the element from one directory and add it to |
|
|
|
|
|
|
|
another directory. |
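
To make this concrete, here is a minimal sketch for a Linux-like system. The scratch directory and the file names `old.txt` and `new.txt` are made up for the illustration. It assumes both names live on the same file system, so `mv` only has to update the directory entry.

```bash
# Hypothetical scratch directory; nothing here touches a real project.
workdir=$(mktemp -d)
cd "$workdir"

echo "hello" > old.txt

# `ls -i` prints the inode number followed by the file name.
inode_before=$(ls -i old.txt | awk '{print $1}')

mv old.txt new.txt
inode_after=$(ls -i new.txt | awk '{print $1}')

# The name changed, but the inode (and the content behind it) did not.
echo "before: $inode_before, after: $inode_after"
```

Both numbers should be identical: only the directory's name-to-inode mapping was updated.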
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
How this all works internally depends on the OS and the underlying file system, |
|
|
|
|
|
|
|
but, more importantly, it is seldom related to the content of a file. Which brings us |
|
|
|
|
|
|
|
to the next section. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Git stores content, not files |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
When you commit to a Git repository, Git basically does the following: |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
For each directory (including the top one), create a **tree** object. This is |
|
|
|
|
|
|
|
done by looking at every file and directory to be committed and creating **blob** |
|
|
|
|
|
|
|
objects for the files and tree objects for the directories. The hash of each |
|
|
|
|
|
|
|
such object is added to this tree object together with the filename if the |
|
|
|
|
|
|
|
object type is blob and the directory name if the object type is tree. This is |
|
|
|
|
|
|
|
then prepended with a header, hashed with SHA-1, and compressed. Finally, |
|
|
|
|
|
|
|
the object is stored in the object store (`.git/objects`), using the first two |
|
|
|
|
|
|
|
characters of the hash as a directory and the rest as the filename. |
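
The header-plus-hash step can be checked by hand for a blob. The sketch below is an illustration under a few assumptions: the scratch directory is made up, `sha1sum` is the GNU coreutils tool (on macOS `shasum` would be the equivalent), and the repository uses the default SHA-1 object format. The object ID is computed over the header and the uncompressed content; compression only affects how the object is stored on disk.

```bash
# Hypothetical scratch directory; no repository is needed for this check.
workdir=$(mktemp -d)
cd "$workdir"
printf 'README\n' > README.md

# Let Git compute the blob ID without writing anything to an object store.
git_hash=$(git hash-object README.md)

# Recompute it by hand: the header "blob <size>\0" followed by the content.
size=$(wc -c < README.md | tr -d ' ')
manual_hash=$( { printf 'blob %s\0' "$size"; cat README.md; } | sha1sum | awk '{print $1}')

echo "git:    $git_hash"
echo "manual: $manual_hash"
```

The two values should match, and the first two characters of that hash are the directory name you would find under `.git/objects`.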
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
It then creates a commit object which points to the top level tree's hash. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
> Note: it of course only really does this for files which were part of the |
|
|
|
|
|
|
|
> staging area, which is the most efficient approach. Of course, if the content of a file |
|
|
|
|
|
|
|
> was changed, its hash will change and thus the tree object it was part of will |
|
|
|
|
|
|
|
> change and its hash will also change and so on until the top level tree |
|
|
|
|
|
|
|
> object. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
As an example, suppose you have the following structure: |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```txt |
|
|
|
|
|
|
|
. |
|
|
|
|
|
|
|
├── README.md |
|
|
|
|
|
|
|
├── bar |
|
|
|
|
|
|
|
│ ├── bar.md |
|
|
|
|
|
|
|
│ └── baz |
|
|
|
|
|
|
|
│ └── baz.md |
|
|
|
|
|
|
|
└── foo |
|
|
|
|
|
|
|
└── foo.md |
|
|
|
|
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
If you were to commit this structure to Git, you would have (simplified): |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
- 4 blob objects (README.md, bar.md, foo.md, baz.md) |
|
|
|
|
|
|
|
- 4 tree objects (., ./foo, ./bar and ./bar/baz) |
|
|
|
|
|
|
|
- 1 commit object |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
In my case: |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash |
|
|
|
|
|
|
|
gael@Aviendha:~/git/tmp$ git commit -m "First commit" |
|
|
|
|
|
|
|
[master (root-commit) 8be3cf0] First commit |
|
|
|
|
|
|
|
4 files changed, 4 insertions(+) |
|
|
|
|
|
|
|
create mode 100644 README.md |
|
|
|
|
|
|
|
create mode 100644 bar/bar.md |
|
|
|
|
|
|
|
create mode 100644 bar/baz/baz.md |
|
|
|
|
|
|
|
create mode 100644 foo/foo.md |
|
|
|
|
|
|
|
gael@Aviendha:~/git/tmp$ find .git/objects/ -type f |
|
|
|
|
|
|
|
.git/objects/52/01cdd884658a103819d66f910ea25ba1dad2e0 |
|
|
|
|
|
|
|
.git/objects/be/e527307ae70706c20eb89f205f444c3bb385e9 |
|
|
|
|
|
|
|
.git/objects/6b/dd34e3e9ab26062ab881adb1024923923b5f8e |
|
|
|
|
|
|
|
.git/objects/8b/e3cf05d01320a124991a8e7c10fe83ec9cd5e3 |
|
|
|
|
|
|
|
.git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99 |
|
|
|
|
|
|
|
.git/objects/57/16ca5987cbf97d6bb54920bea6adde242d87e6 |
|
|
|
|
|
|
|
.git/objects/f9/07d059fcdc9b594c6e14dc0c3826f26ab47832 |
|
|
|
|
|
|
|
.git/objects/e8/45566c06f9bf557d35e8292c37cf05d97a9769 |
|
|
|
|
|
|
|
.git/objects/0c/7d27db1f575263efdcab3dc650f4502a2dbcbf |
|
|
|
|
|
|
|
``` |
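If you want to reproduce this count yourself, the following sketch builds the same structure in a throwaway repository and counts the objects per type. The file contents, the author identity and the scratch path are placeholders. The files get different contents on purpose, since identical contents would share a single blob.

```bash
# Hypothetical throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

# Recreate the example structure with distinct file contents.
mkdir -p foo bar/baz
echo "README" > README.md
echo "foo" > foo/foo.md
echo "bar" > bar/bar.md
echo "baz" > bar/baz/baz.md
git add .
git commit -q -m "First commit"

# List every object as "<hash> <type> <size>" and count per type.
objects=$(git cat-file --batch-check --batch-all-objects)
blobs=$(printf '%s\n' "$objects" | grep -c ' blob ')
trees=$(printf '%s\n' "$objects" | grep -c ' tree ')
commits=$(printf '%s\n' "$objects" | grep -c ' commit ')

echo "blobs=$blobs trees=$trees commits=$commits"
```

The hashes will differ from the ones shown above, because the commit object includes the author, committer and timestamps.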
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
To get the top level tree object, just look at the commit: |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash |
|
|
|
|
|
|
|
gael@Aviendha:~/git/tmp$ git cat-file -p 8be3cf0 |
|
|
|
|
|
|
|
tree 5201cdd884658a103819d66f910ea25ba1dad2e0 |
|
|
|
|
|
|
|
author Gaël Depreeuw <gael@depreeuw.dev> 1606569688 +0100 |
|
|
|
|
|
|
|
committer Gaël Depreeuw <gael@depreeuw.dev> 1606569688 +0100 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
First commit |
|
|
|
|
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
And if we look at the tree: |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash |
|
|
|
|
|
|
|
gael@Aviendha:~/git/tmp$ git cat-file -p 5201cdd |
|
|
|
|
|
|
|
100644 blob e845566c06f9bf557d35e8292c37cf05d97a9769 README.md |
|
|
|
|
|
|
|
040000 tree f907d059fcdc9b594c6e14dc0c3826f26ab47832 bar |
|
|
|
|
|
|
|
040000 tree 0c7d27db1f575263efdcab3dc650f4502a2dbcbf foo |
|
|
|
|
|
|
|
``` |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The content of `README.md` is: |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash |
|
|
|
|
|
|
|
gael@Aviendha:~/git/tmp$ git cat-file -p e845566 |
|
|
|
|
|
|
|
README |
|
|
|
|
|
|
|
``` |
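Instead of following the hashes one `git cat-file` call at a time, the whole structure can be listed in one go. This is a small sketch using `git ls-tree` in a throwaway repository. The repository path, identity and file contents are placeholders for the illustration.

```bash
# Hypothetical throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

mkdir -p foo bar/baz
echo "README" > README.md
echo "foo" > foo/foo.md
echo "bar" > bar/bar.md
echo "baz" > bar/baz/baz.md
git add .
git commit -q -m "First commit"

# The top-level tree the commit points to.
root_tree=$(git rev-parse 'HEAD^{tree}')

# -r recurses into subdirectories, -t also prints the tree entries themselves.
listing=$(git ls-tree -r -t HEAD)
entry_count=$(printf '%s\n' "$listing" | wc -l | tr -d ' ')

echo "root tree: $root_tree"
printf '%s\n' "$listing"
```

Every line pairs a name with the hash of a blob or tree, which is exactly the name-to-content mapping described earlier.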
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
So what does all this mean when we rename a file? |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Renaming a file |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
If we're just looking at renaming a file, then the contents of the file will |
|
|
|
|
|
|
|
not change. This means the blob object representing the file does not change. |
|
|
|
|
|
|
|
What does change is: |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1. The old file's name is removed from the tree object it belonged to. |
|
|
|
|
|
|
|
2. The new file's name is added to the tree object it belongs to (with the same |
|
|
|
|
|
|
|
hash in this case). |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
As such, Git is not aware of any name changes. This is why the short answer is: |
|
|
|
|
|
|
|
Git doesn't handle file renames. The repository itself has no notion of this |
|
|
|
|
|
|
|
action. It just has content and a structure for that content. |
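
A quick way to convince yourself of this is to compare hashes before and after a pure rename. The sketch below uses a throwaway repository. The name `GUIDE.md`, the identity and the scratch path are made up for the illustration.

```bash
# Hypothetical throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

echo "README" > README.md
git add README.md
git commit -q -m "First commit"
blob_before=$(git rev-parse HEAD:README.md)
tree_before=$(git rev-parse 'HEAD^{tree}')

# Rename without touching the content.
git mv README.md GUIDE.md
git commit -q -m "Rename README.md to GUIDE.md"
blob_after=$(git rev-parse HEAD:GUIDE.md)
tree_after=$(git rev-parse 'HEAD^{tree}')

echo "blob: $blob_before -> $blob_after"
echo "tree: $tree_before -> $tree_after"
```

The blob hash stays the same, while the tree hash changes because the tree now lists a different name for that blob.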
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
However, that does not mean you lose your history when you rename a file. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### How to see history of a renamed file |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
When you remove and add a file (which is what a rename is for Git), Git will |
|
|
|
|
|
|
|
analyze this and when the files are X% alike (with X being defaulted to 50), |
|
|
|
|
|
|
|
it will assume a rename occurred. You can show the log of a file including |
|
|
|
|
|
|
|
renames using: |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash |
|
|
|
|
|
|
|
git log --follow -- <file> |
|
|
|
|
|
|
|
``` |
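As a small illustration of the difference `--follow` makes, the sketch below renames a file in a throwaway repository and counts the commits reported with and without the option. The file names, identity and scratch path are placeholders.

```bash
# Hypothetical throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

echo "README" > README.md
git add README.md
git commit -q -m "Add README.md"

git mv README.md GUIDE.md
git commit -q -m "Rename README.md to GUIDE.md"

# Without --follow, history stops at the commit that introduced the new name.
without_follow=$(git log --oneline -- GUIDE.md | wc -l | tr -d ' ')

# With --follow, Git continues through the detected rename.
with_follow=$(git log --follow --oneline -- GUIDE.md | wc -l | tr -d ' ')

echo "without --follow: $without_follow commit(s)"
echo "with --follow:    $with_follow commit(s)"
```

Here the plain log should report one commit and the `--follow` log two.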
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
If you want to adjust the threshold, you can use the `-MX%` option, where X is the |
|
|
|
|
|
|
|
percentage you want (0-100). |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Because there is a percentage threshold, the recommendation is that you do not |
|
|
|
|
|
|
|
combine renaming a file with modifying it. If files are 100% identical when |
|
|
|
|
|
|
|
adding/removing, it is much easier to see them as renames. If, on the other |
|
|
|
|
|
|
|
hand, you rename a file and start modifying it heavily, Git might not detect |
|
|
|
|
|
|
|
this as a rename, unless you lower the threshold. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
You can also turn off rename detection by passing `--no-renames`. |
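
The effect of the threshold can be seen with `git diff --name-status`, which prints `R<score>` for detected renames and separate `A`/`D` lines otherwise. The following is a sketch in a throwaway repository. The file names `numbers.txt` and `digits.txt`, the identity and the scratch path are invented for the example.

```bash
# Hypothetical throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

seq 1 20 > numbers.txt
git add numbers.txt
git commit -q -m "Add numbers.txt"

# Rename the file and make a small change in the same commit.
git mv numbers.txt digits.txt
echo "one more line" >> digits.txt
git add digits.txt
git commit -q -m "Rename and tweak"

# Default threshold: similar enough to count as a rename.
default_status=$(git diff --name-status -M HEAD~1 HEAD)

# -M100% only accepts exact renames, so this shows an add and a delete.
exact_only=$(git diff --name-status -M100% HEAD~1 HEAD)

# --no-renames skips detection altogether.
no_renames=$(git diff --name-status --no-renames HEAD~1 HEAD)

printf 'default:\n%s\n\n-M100%%:\n%s\n\n--no-renames:\n%s\n' \
  "$default_status" "$exact_only" "$no_renames"
```

The same commit is reported as a rename or as an add/delete pair depending purely on the options passed at diff time, which shows that the rename is not stored anywhere.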
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### Can I fix my commit if I did change a lot of content after renaming? |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
First, to prevent this: always check using `git status` whether or not the |
|
|
|
|
|
|
|
rename is being detected. Now, how to solve it? |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
It depends. If your commit is local only and it is the last commit, then you can |
|
|
|
|
|
|
|
fix this easily. There are many ways to do it, but one option is: |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```bash |
|
|
|
|
|
|
|
git mv <newname> <oldname> # Undo the file rename |
|
|
|
|
|
|
|
git commit --amend # Commit the changes to the file |
|
|
|
|
|
|
|
git mv <oldname> <newname> # Rename the file |
|
|
|
|
|
|
|
git commit # Commit the rename |
|
|
|
|
|
|
|
``` |
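To see the sequence above in action, here is a worked sketch in a throwaway repository. It first creates the problematic commit (a rename combined with a complete rewrite) and then splits it. The file names `old.txt` and `new.txt`, the commit messages, the identity and the scratch path are all placeholders.

```bash
# Hypothetical throwaway repository with a placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

seq 1 20 > old.txt
git add old.txt
git commit -q -m "Add old.txt"

# The problematic commit: rename and rewrite at the same time.
git mv old.txt new.txt
seq 21 60 > new.txt
git add new.txt
git commit -q -m "Rename and rewrite"
before=$(git diff --name-status -M HEAD~1 HEAD)

# Split it: first keep only the content change under the old name...
git mv new.txt old.txt
git commit -q --amend -m "Rewrite old.txt"

# ...then do the rename on its own.
git mv old.txt new.txt
git commit -q -m "Rename old.txt to new.txt"
after=$(git diff --name-status -M HEAD~1 HEAD)

printf 'before the fix:\n%s\n\nafter the fix:\n%s\n' "$before" "$after"
```

Before the fix the commit shows up as an unrelated add and delete. Afterwards the last commit is a pure rename that Git detects with a 100% score.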
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
If the commit is already a couple of commits ago, you can do the same with an |
|
|
|
|
|
|
|
interactive rebase and amending the commit at the right time. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
If you already pushed your commits you will have to check with the team if you |
|
|
|
|
|
|
|
can rewrite the history and push it. If this is not possible, you might need to |
|
|
|
|
|
|
|
find the right threshold to have Git mark it as a rename. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### Why did Git do it this way? |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This is pure speculation, but dealing with renames is not as easy as it first |
|
|
|
|
|
|
|
looks. For instance, you could add a Git command to do a rename (like Subversion |
|
|
|
|
|
|
|
has), which could create a new type of object, a rename object, which links two |
|
|
|
|
|
|
|
objects (old and new). But what if the user forgets to do this and just uses |
|
|
|
|
|
|
|
`mv fileA fileB` and commits this? Should Git automatically assume this is a |
|
|
|
|
|
|
|
rename? It could use the same threshold discussed earlier to determine so. That |
|
|
|
|
|
|
|
would make it easier. But then what is the point of having a dedicated rename |
|
|
|
|
|
|
|
command? I think for ease of use, they just decided not to add such a command, |
|
|
|
|
|
|
|
because it is not a solution for all instances. Instead, the rename detection |
|
|
|
|
|
|
|
works well enough for everything and they leave it up to the committer to make |
|
|
|
|
|
|
|
sure their renames are detected properly. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Summary |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
So in summary: no, Git does not store renames in its repository. Instead, |
|
|
|
|
|
|
|
for every add/delete pair that is part of a commit, Git will do a similarity analysis and |
|
|
|
|
|
|
|
when they are X% alike (default 50%), it will assume a rename occurred. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Some commands influenced by this are: `git log`, `git diff` and `git merge`. Options |
|
|
|
|
|
|
|
related to renames are: |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
```txt |
|
|
|
|
|
|
|
-M[<n>], --find-renames[=<n>] # where n is the threshold percentage. |
|
|
|
|
|
|
|
--no-renames # don't do any rename detection |
|
|
|
|
|
|
|
``` |