---
title: "Renames in Git explained"
date: 2020-11-28T12:07:00Z
draft: false
toc: true
tags: ['tech', 'git', 'rename']
author: "Gaël Depreeuw"
---

## Introduction

One of the questions I'm often asked when teaching or explaining Git is how Git
handles file and/or directory renames. The short answer to this is: **It**
**doesn't**.

The slightly longer answer is: **It does, but probably not in the way you**
**envision it**.

Let's first take a look at how Git works internally. If you don't quite
understand everything that follows, I can recommend reading chapter 10 of
the [Pro Git book, 2nd Edition](https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain).

## Git stores content, not files

When you commit to a Git repository it basically does the following:

Create a **blob** object for every file in the index (a.k.a. the staging area).
Strictly speaking, this already happens when you `git add` the file. A blob
object is created by taking the content of the file and prepending a small
header. A SHA-1 hash is then calculated over this header plus content, and that
hash is used to identify the object. The result is compressed and stored in the
aptly named object store (found in `.git/objects`). The first 2 characters of
the hash (in hex format) are used as a directory within this store, while the
remaining characters are the filename of the blob object.

Let's look at an example. Create a git repo somewhere and create a file.

```bash
git init foo
cd foo
echo "foo" >> foo.txt
```

If you look into your `.git/objects` directory, it will be empty, aside from
two empty subdirectories (`info` and `pack`). Let's create a blob out of this
file now.

```bash
$ git hash-object -w foo.txt
257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```

You will now find an object in the store at:

```bash
$ find .git/objects -type f
.git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
```
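The ID above can be reproduced without Git, which shows that only the content
goes into it. Git's blob header is `blob <size>` followed by a NUL byte. The
sketch below assumes the coreutils `sha1sum` tool is available (on macOS,
`shasum` plays the same role); the variable names are only for illustration.

```bash
# Hash "blob <size>\0<content>" by hand; no file name is involved anywhere.
content='foo'
size=$(( ${#content} + 1 ))   # +1 for the newline that echo appended
blob_id=$(printf 'blob %d\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)
echo "$blob_id"   # 257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```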

Here's an interesting exercise: what happens if you rename the file and create
the blob with the renamed file?

```bash
$ mv foo.txt bar.txt
$ git hash-object -w bar.txt
257cc5642cb1a054f08cc83f2d943e56fd3ebe99
```

That's right, nothing changed. This makes sense, as we're only adding the
content to the object store! So how does Git remember the file names?

## Filenames are part of tree objects

Aside from **blob** objects, Git also creates **tree** objects. You can sort of
compare them to the directories in your worktree, i.e. for each directory in
your worktree, you will have a tree object. Pretty-printed, a tree object's
content looks like:

```code
<mode> <type> <hash>    <name>
```

You can create this one for yourself by doing:

```bash
$ git update-index --add --cacheinfo 100644 \
  257cc5642cb1a054f08cc83f2d943e56fd3ebe99 foo.txt
$ git write-tree
fcf0be4d7e45f0ef9592682ad68e42270b0366b4
$ git cat-file -p fcf0be4d7e45f0ef9592682ad68e42270b0366b4
100644 blob 257cc5642cb1a054f08cc83f2d943e56fd3ebe99    foo.txt
```

There are 3 different types (that I know of) which can be referred to in a tree
object: blob, tree and commit. Blobs represent file content, trees represent
other trees (i.e. subdirectories) and commits represent submodules (i.e. the
commit at which they are included). A commit is a type of object which is also
present outside the tree objects. It contains the top tree object (representing
the top level of your repository), a link to zero or more parent commits and
some metadata (author, commit message, date, ...). Finally, and for
completeness' sake, there is also an object for annotated tags, which contains
the object it is pointing to (usually a commit) as well as some metadata.

## Renaming

Armed with the knowledge about trees and blobs, it should be fairly easy to
understand what happens if you rename a file. To make it easier to
understand, consider a simple example: we just rename a file at the top level.

> Note: more complex examples are just more time-consuming to explain, but
> not to understand. The same principles apply.

In case of such a rename, when you commit this rename, your repository will
be impacted as follows:

- The blob representing the file remains unchanged.
- The top level tree object changes as it now has a different file name.
- The commit object will point to the new tree. (Its parent will point to the
  old tree.)

Nowhere is there any special mention of a rename occurring. Remember, we're just
storing content! As such, Git is not aware of any name changes. This is why the
short answer was: Git doesn't handle file renames. The repository itself has no
notion of this action. It just has content and a structure for that content.
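You can check this for yourself. The following is a minimal sketch in a
throwaway repository; the temporary directory, the identity and the commit
messages are arbitrary choices made for the example.

```bash
# Create a scratch repo with one committed file.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

echo "foo" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
blob_before=$(git rev-parse HEAD:foo.txt)     # blob behind foo.txt
tree_before=$(git rev-parse 'HEAD^{tree}')    # top level tree of the commit

# Rename the file and commit nothing but the rename.
git mv foo.txt bar.txt
git commit -q -m "Rename foo.txt to bar.txt"
blob_after=$(git rev-parse HEAD:bar.txt)
tree_after=$(git rev-parse 'HEAD^{tree}')

echo "blob: $blob_before -> $blob_after"   # same ID twice
echo "tree: $tree_before -> $tree_after"   # two different IDs
```

The blob ID is identical before and after, while the tree ID differs, because
the file name lives in the tree.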

However, that does not mean you lose your history when you rename a file.

### How to see history of a renamed file

Git might not store information on renames in its repository, but it does come
packed with an algorithm that detects file renames. For every add/delete pair
added to the index, it determines how alike the paired files are. If they are
at least 50% alike, it considers the pair to have been a rename. If there
are multiple possibilities, it takes the one with the highest percentage. If
multiple files have the same percentage, it picks one depending on the
implementation.

> **Note**: I believe, but am not sure, it basically takes the first
> alphabetical match in the last case.

By default `git log -- <file>` does not track across renames. If you want to
see the history across renames, you will need to add the `--follow` option.
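As a small illustration, the sketch below counts the commits `git log` reports
for a renamed file, with and without `--follow`. The scratch repository and
file names are made up for the example.

```bash
# Scratch repo: one commit adds foo.txt, a second one renames it to bar.txt.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

echo "foo" > foo.txt
git add foo.txt
git commit -q -m "Add foo.txt"
git mv foo.txt bar.txt
git commit -q -m "Rename foo.txt to bar.txt"

# Count the commits listed for the current name of the file.
plain=$(git log --oneline -- bar.txt | wc -l | tr -d ' ')
followed=$(git log --oneline --follow -- bar.txt | wc -l | tr -d ' ')
echo "without --follow: $plain, with --follow: $followed"
```

Without `--follow` only the rename commit is listed; with it, the commit that
originally added `foo.txt` shows up as well.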

You can also define the threshold percentage to be different from 50%. This is
done via the `-M<n>` or `--find-renames=<n>` option, for example `-M90%`. See
the Git documentation for the exact syntax.

You can also turn off rename detection by passing `--no-renames`.
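To get a feel for these options, here is a hedged sketch that renames and
edits a file in a single commit and then asks `git diff` to classify the
change in three ways. The repository, the file names and the content are
invented for the example; the exact similarity score Git reports will depend
on the content.

```bash
# Scratch repo with a 20-line file.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

seq 1 20 > old.txt
git add old.txt
git commit -q -m "Add old.txt"

# Rename and modify in the same commit.
git mv old.txt new.txt
echo "one extra line" >> new.txt
git commit -q -a -m "Rename and edit in one commit"

default=$(git diff --name-status -M HEAD~ HEAD)          # default 50% threshold
strict=$(git diff --name-status -M100% HEAD~ HEAD)       # exact renames only
off=$(git diff --name-status --no-renames HEAD~ HEAD)    # detection disabled

printf 'default:\n%s\nstrict:\n%s\noff:\n%s\n' "$default" "$strict" "$off"
```

With the default threshold the change is reported as a rename (an `R` entry
with a score), while the strict and disabled variants report a separate
deletion and addition.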

### Rename best practice

Because of the threshold and the cheapness of commits, it is recommended that
when you rename a file/directory, you commit those renames first, before you
continue working on the renamed file. This basically makes it so you can use
a threshold of 100% all the time.
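A minimal sketch of that workflow, again in a made-up scratch repository: the
rename gets its own commit, the edit follows in a second one, and the rename
commit is still recognised with the strictest threshold.

```bash
# Scratch repo with a 20-line file.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

seq 1 20 > old.txt
git add old.txt
git commit -q -m "Add old.txt"

git mv old.txt new.txt
git commit -q -m "Rename old.txt to new.txt"   # rename only

echo "one extra line" >> new.txt
git commit -q -a -m "Edit new.txt"             # content change only

# Inspect the rename commit with a 100% threshold.
rename_status=$(git diff --name-status -M100% HEAD~2 HEAD~1)
echo "$rename_status"   # R100    old.txt    new.txt
```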

### Why did Git do it this way

This is pure speculation, but here are my thoughts on it:

Filenames are actually part of the underlying file systems, so for a version
control system to support multiple file systems, it has to handle filenames
in its own way. This includes renames. If you think about what this would
require for Git, it would not be very straightforward: Git could have chosen to
provide a command to store rename data, let's say: `git rename fileA fileB`, but
what should this command do? We can imagine it could create a new **rename**
object, which would hold the blob hash and the name of the previous file. Now,
every time you would walk through history, when you encounter this object type,
you would need to remember this redirection. There are probably a lot of little
nuances which are not immediately apparent though, and it does not deal with one
of the major drawbacks of this new command: What happens if the user forgets it
and just does `mv fileA fileB`?

Well, we'd actually want to have some mechanism to detect this as a rename, as
once this is committed it becomes more difficult to undo this change
(especially if we already pushed the commit!). So it sure would be nice if Git
could somehow figure out that it was a rename. Which is exactly what they did.
But now that we have this functionality, what actually is the point of the
new command we wanted to implement? This is probably highly subjective, but to
me it seems completely irrelevant now. Instead of having a command which can be
forgotten and for which we need a contingency, just use the contingency as the
solution! It makes the behaviour a lot more consistent!

### Can I fix my commit if I did change a lot of content after renaming

First, to prevent this: always check using `git status` whether or not the
rename is being detected. Now, how to solve it?

It depends. If your commit is local only and it is the last commit, then you can
fix this easily. There are many ways to do it, but one option is:

```bash
git mv <newname> <oldname> # Undo the file rename
git commit --amend # Commit the changes to the file
git mv <oldname> <newname> # Rename the file
git commit # Commit the rename
```

If you want to commit the rename first and the changes second, you can also do
this, but it is a bit more complex:

```bash
git reset --soft HEAD~ # Go back one commit, but keep the changes
git restore --staged <oldname> <newname> # unstage the deletion and addition
git restore <oldname> # undelete the old file
mv <newname> <newname.tmp> # make a temp backup of the new file
git mv <oldname> <newname> # Rename the old file
git commit # commit the rename
mv <newname.tmp> <newname> # apply the new changes and drop the temp backup
git commit -a # Commit the changes
```

If the commit is already a couple of commits ago, you can do the same with an
interactive rebase and doing either of the above at the correct time.

If you already pushed your commits you will have to check with the team if you
can rewrite the history and push it. If this is not possible, you might need to
find the right threshold to have Git mark it as a rename.

## Summary

So in summary: no, Git does not store renames in its repository. Instead, for
every add/delete pair in a commit, Git will do a similarity analysis and
when they are X% alike (default 50%), it will assume a rename occurred.

Some commands influenced by this are: `git log`, `git diff` and `git merge`.
Options related to renames are:

```txt
-M<n>, --find-renames=<n> # where n is the threshold percentage.
--no-renames # don't do any rename detection
```

It is best practice to handle renames in their own commits. Try to avoid
renaming and modifying a file within the same commit.