---
title: "Renames in Git explained"
date: 2020-11-28T12:07:00Z
draft: false
toc: true
tags: ['tech', 'git', 'rename']
author: "Gaël Depreeuw"
---
## Introduction

One of the questions I'm often asked when teaching or explaining Git is how Git
handles file and/or directory renames. The short answer to this is: **It
doesn't**.

The slightly longer answer is: **It does, but probably not in the way you
envision it**.

To help you understand this topic a bit more, we first have to go back to the
basics: What actually is a file or directory name? The answer to this question
is highly dependent on the underlying file system, but in general it can be
boiled down to this:

> A file (or directory) name is an index used by the file system to look up the
> contents of the file. (Note: from now on I will only refer to file names, but
> the same applies to directory names as well.)

What you should note from this is that a filename is actually not a property of
the file content itself, but part of the metadata regarding the content. In
Linux, for instance, the filename of a file is stored in the directory, which
is basically an associative array that maps filenames to inodes (the objects
that store the metadata of a file).

When renaming a file, what you are actually doing is updating a lookup table.
In Linux, this would be updating the associative array of the directory. If you
move a file, then you remove the entry from one directory and add it to
another directory.
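
You can observe this yourself on a Linux machine. The following is a minimal
sketch, assuming a POSIX shell and that both names live on the same file system
(the temporary directory and the file names are made up for illustration). The
inode number printed by `ls -i` should be the same before and after the `mv`,
since only the directory entry changes:

```bash
# Work in a throwaway directory so nothing of yours is touched.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/old.txt"

# `ls -i` prints the inode number in front of the file name.
inode_before=$(ls -i "$tmp/old.txt" | awk '{print $1}')

# Rename the file: only the directory's name -> inode mapping is updated.
mv "$tmp/old.txt" "$tmp/new.txt"
inode_after=$(ls -i "$tmp/new.txt" | awk '{print $1}')

echo "inode before: $inode_before, inode after: $inode_after"
```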
How this all works internally depends on the OS and the underlying file system,
but more importantly, it is seldom related to the content of a file. Which
brings us to the next section.
## Git stores content, not files

When you commit to a Git repository, it basically does the following:

For each directory (including the top one), create a **tree** object. This is
done by looking at every file and directory to be committed and creating
**blob** objects for the files and tree objects for the directories. The hash of
each such object is added to this tree object, together with the filename if
the object type is blob and the directory name if the object type is tree. The
result is then prepended with a header, and the SHA-1 hash of header plus
content is calculated. Finally, the object is compressed and stored in the
object store (`.git/objects`), using the first two characters of the hash as a
directory and the rest as the filename.

It then creates a commit object which points to the top-level tree's hash.
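
The hashing step described above can be reproduced by hand. This is only a
sketch, assuming Git uses SHA-1 (the default) and that `sha1sum` from GNU
coreutils is available; the sample text is arbitrary. Both IDs should match,
because a blob's ID is the SHA-1 of the header `blob <size>\0` followed by the
raw content, calculated before compression:

```bash
content='test content'   # arbitrary sample text, not taken from the article

# 1. Let Git compute the blob ID (nothing is written to any repository).
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

# 2. Recompute it by hand: header "blob <size>\0", then the content itself.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')
manual_id=$({ printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } \
    | sha1sum | cut -d' ' -f1)

echo "git hash-object: $git_id"
echo "manual sha1sum:  $manual_id"
```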
> Note: of course, it only really does this for files which were part of the
> staging area. That's the most efficient. If the content of a file was
> changed, its hash will change, and thus the tree object it is part of will
> change and get a new hash as well, and so on up to the top-level tree
> object.

As an example, suppose you have the following structure:

```bash
.
├── README.md
├── bar
│   ├── bar.md
│   └── baz
│       └── baz.md
└── foo
    └── foo.md
```
If you were to commit this structure to Git, you would have (simplified):

- 4 blob objects (README.md, bar.md, foo.md, baz.md)
- 4 tree objects (., ./foo, ./bar and ./bar/baz)
- 1 commit object

In my case:

```bash
gael@Aviendha:~/git/tmp$ git commit -m "First commit"
[master (root-commit) 8be3cf0] First commit
 4 files changed, 4 insertions(+)
 create mode 100644 README.md
 create mode 100644 bar/bar.md
 create mode 100644 bar/baz/baz.md
 create mode 100644 foo/foo.md
gael@Aviendha:~/git/tmp$ find .git/objects/ -type f
.git/objects/52/01cdd884658a103819d66f910ea25ba1dad2e0
.git/objects/be/e527307ae70706c20eb89f205f444c3bb385e9
.git/objects/6b/dd34e3e9ab26062ab881adb1024923923b5f8e
.git/objects/8b/e3cf05d01320a124991a8e7c10fe83ec9cd5e3
.git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
.git/objects/57/16ca5987cbf97d6bb54920bea6adde242d87e6
.git/objects/f9/07d059fcdc9b594c6e14dc0c3826f26ab47832
.git/objects/e8/45566c06f9bf557d35e8292c37cf05d97a9769
.git/objects/0c/7d27db1f575263efdcab3dc650f4502a2dbcbf
```
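
If you want to check the object count yourself, something like the following
should work. It is a sketch that rebuilds the structure above in a throwaway
repository; the helper `g`, the identity passed to it and the file contents are
my own placeholders. Each file gets different content, because identical files
would share a single blob:

```bash
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical helper: run git inside $repo with a placeholder identity.
g() { git -C "$repo" -c user.name=Example -c user.email=example@example.com "$@"; }

# Recreate the directory structure, one distinct line of content per file.
mkdir -p "$repo/bar/baz" "$repo/foo"
for f in README.md bar/bar.md bar/baz/baz.md foo/foo.md; do
    echo "content of $f" > "$repo/$f"
done
g add .
g commit -q -m "First commit"

# Print "<hash> <type> <size>" for every object, then count per type.
g cat-file --batch-all-objects --batch-check | awk '{print $2}' | sort | uniq -c
```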
To get the top-level tree object, just look at the commit:

```bash
gael@Aviendha:~/git/tmp$ git cat-file -p 8be3cf0
tree 5201cdd884658a103819d66f910ea25ba1dad2e0
author Gaël Depreeuw <gael@depreeuw.dev> 1606569688 +0100
committer Gaël Depreeuw <gael@depreeuw.dev> 1606569688 +0100

First commit
```

And if we look at the tree:

```bash
gael@Aviendha:~/git/tmp$ git cat-file -p 5201cdd
100644 blob e845566c06f9bf557d35e8292c37cf05d97a9769 README.md
040000 tree f907d059fcdc9b594c6e14dc0c3826f26ab47832 bar
040000 tree 0c7d27db1f575263efdcab3dc650f4502a2dbcbf foo
```
The content of `README.md` is:

```bash
gael@Aviendha:~/git/tmp$ git cat-file -p e845566
README
```

So what does this all mean when we rename a file?

## Renaming a file
If we're just looking at renaming a file, then the contents of the file will
not change. This means the blob object representing the file does not change.
What does change is:

1. The old file name is removed from the tree object it belonged to.
2. The new file name is added to the tree object it belongs to (with the same
   hash in this case).

As such, Git is not aware of any name changes. This is why the short answer is:
Git doesn't handle file renames. The repository itself has no notion of this
action. It just has content and a structure for that content.
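
To convince yourself that a pure rename leaves the blob untouched, you can
compare object IDs before and after the rename. A small sketch, using a
throwaway repository and a hypothetical helper `g` that supplies a placeholder
identity; the blob ID should stay the same while the tree ID changes:

```bash
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical helper: run git inside $repo with a placeholder identity.
g() { git -C "$repo" -c user.name=Example -c user.email=example@example.com "$@"; }

echo "README" > "$repo/README.md"
g add README.md
g commit -q -m "Add README"

# Rename without touching the content.
g mv README.md README.txt
g commit -q -m "Rename README"

# Same content, so the same blob; different name, so a different tree.
blob_before=$(g rev-parse HEAD~1:README.md)
blob_after=$(g rev-parse HEAD:README.txt)
tree_before=$(g rev-parse 'HEAD~1^{tree}')
tree_after=$(g rev-parse 'HEAD^{tree}')

echo "blob: $blob_before -> $blob_after"
echo "tree: $tree_before -> $tree_after"
```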
However, that does not mean you lose your history when you rename a file.

### How to see the history of a renamed file

When you remove and add a file (which is what a rename is for Git), Git will
analyze this whenever it compares commits (for example in `git log` or
`git diff`), and when the files are at least X% alike (with X defaulting to
50), it will assume a rename occurred. You can show the log of a file including
renames using:

```bash
git log --follow -- <file>
```
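
To see what `--follow` changes, you can try it on a tiny history. This is a
sketch in a throwaway repository (the helper `g`, the identity and the file
names are placeholders of mine). Without `--follow` the log should stop at the
commit that introduced the new name; with it, the log continues past the
rename:

```bash
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical helper: run git inside $repo with a placeholder identity.
g() { git -C "$repo" -c user.name=Example -c user.email=example@example.com "$@"; }

echo "some notes" > "$repo/old.md"
g add old.md
g commit -q -m "Add old.md"

g mv old.md new.md
g commit -q -m "Rename old.md to new.md"

# Count the commits each variant reports for the new name.
plain=$(g log --oneline -- new.md | wc -l | tr -d ' ')
followed=$(g log --oneline --follow -- new.md | wc -l | tr -d ' ')

echo "without --follow: $plain commit(s)"
echo "with --follow:    $followed commit(s)"
```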
If you want to adjust the threshold, you can use the `-M<X>%` option, where X is
the percentage you want (0-100), for example `-M90%`.

Because there is a percentage threshold, the recommendation is that you do not
combine renaming a file with modifying it in the same commit. If the files are
100% identical when adding/removing, it is much easier to see them as renames.
If, on the other hand, you rename a file and start modifying it heavily, Git
might not detect this as a rename, unless you lower the threshold.
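
The effect of the threshold can be made visible with `git diff --name-status`,
which marks detected renames with an `R` and a similarity score. The sketch
below assumes a throwaway repository, a hypothetical helper `g` and the `seq`
utility; the file names and contents are made up. With the default threshold
the slightly modified file should still show up as a rename, while `-M100%`
(exact renames only) should report a separate delete and add:

```bash
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical helper: run git inside $repo with a placeholder identity.
g() { git -C "$repo" -c user.name=Example -c user.email=example@example.com "$@"; }

seq 1 20 > "$repo/old.txt"          # 20 short lines of sample content
g add old.txt
g commit -q -m "Add old.txt"

# Rename the file and modify it a little in the same change.
g mv old.txt new.txt
echo "one extra line" >> "$repo/new.txt"
g add new.txt

# Default threshold (50%) versus exact renames only.
default_status=$(g diff --cached -M --name-status)
exact_status=$(g diff --cached -M100% --name-status)

echo "default: $default_status"
echo "exact only:"
echo "$exact_status"
```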
You can also turn off rename detection by passing `--no-renames`.

### Can I fix my commit if I did change a lot of content after renaming

First, to prevent this: always check using `git status` whether or not the
rename is being detected. Now, how to solve it?

It depends. If your commit is local only and it is the last commit, then you can
fix this easily. There are many ways to do it, but one option is:

```bash
git mv <newname> <oldname>  # Undo the file rename
git commit --amend          # Commit the changes to the file
git mv <oldname> <newname>  # Rename the file
git commit                  # Commit the rename
```
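
Here is the same recipe played out end to end, as a sketch in a throwaway
repository. The helper `g`, the identity, the file names and the commit
messages are placeholders of mine, and `seq` is assumed to be available. The
file is rewritten completely on purpose, so the original combined commit is not
seen as a rename, while the split version ends with an exact one:

```bash
repo=$(mktemp -d)
git init -q "$repo"

# Hypothetical helper: run git inside $repo with a placeholder identity.
g() { git -C "$repo" -c user.name=Example -c user.email=example@example.com "$@"; }

seq 1 20 > "$repo/old.txt"
g add old.txt
g commit -q -m "Add old.txt"

# The problematic commit: rename and rewrite the file in one go.
g mv old.txt new.txt
seq 101 120 > "$repo/new.txt"
g commit -q -a -m "Rename and rewrite"
before=$(g diff -M --name-status HEAD~1 HEAD)

# The fix: undo the rename, amend, then rename in a commit of its own.
g mv new.txt old.txt
g commit -q --amend -m "Rewrite old.txt"
g mv old.txt new.txt
g commit -q -m "Rename old.txt to new.txt"

content_commit=$(g diff -M --name-status HEAD~2 HEAD~1)
rename_commit=$(g diff -M --name-status HEAD~1 HEAD)

echo "before the fix:"
echo "$before"
echo "after the fix: $content_commit, then $rename_commit"
```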
If the commit is already a couple of commits back, you can do the same with an
interactive rebase, amending the commit at the right time.

If you already pushed your commits, you will have to check with the team
whether you can rewrite the history and force-push it. If this is not possible,
you might need to find the right threshold to have Git mark it as a rename.

### Why did Git do it this way

This is pure speculation, but dealing with renames is not as easy as it first
looks. For instance, you could add a Git command to do a rename (like
Subversion has), which could create a new type of object, a rename object,
which links two objects (old and new). But what if the user forgets to do this
and just uses `mv fileA fileB` and commits this? Should Git automatically
assume this is a rename? It could use the same threshold discussed earlier to
determine so. That would make it easier. But then what is the point of having a
dedicated rename command? I think for ease of use, they just decided not to add
a command that records renames (`git mv` exists, but it is only shorthand for
moving the file and staging the result), because it is not a solution for all
instances. Instead, the rename detection works well enough for most cases and
they leave it up to the committer to make sure their renames are detected
properly.
## Summary

So in summary: no, Git does not store renames in its repository. Instead, for
every add/delete pair that is part of a commit, Git will do a likeness analysis
and when they are at least X% alike (default 50%), it will assume a rename
occurred.

Some commands influenced by this are: git log, git diff and git merge. Options
related to renames are:

```txt
-M[<n>], --find-renames[=<n>]  # where n is the threshold percentage.
--no-renames                   # don't do any rename detection
```