.gitattributes Best Practices

Git

Update (2020-08-04)

Added some more information about CRLF line endings for .cmd and .bat files. Also added a section about Git LFS support on GitHub.

.gitignore

If you've messed with Git for long enough, you're aware that you can use the .gitignore file to exclude files from being checked into your repository. There is even a whole GitHub repository with nothing but pre-made .gitignore files you can download. If you work with anything vaguely in the Microsoft space with Visual Studio, you probably want the 'Visual Studio' .gitignore file.

.gitattributes

There is a lesser know .gitattributes file that can control a bunch of Git settings that you should consider adding to almost every repository as a matter of course.

Line Endings

If you've studied a little computer science, you'll have seen that operating systems use different characters to represent line feeds in text files. Windows uses a Carriage Return (CR) followed by the Line Feed (LF) character, while Unix based operating systems use the Line Feed (LF) alone. All of this has it's origin in typewriters which is pretty amazing given how antiquated they are. I recommend reading the Newline Wikipedia article for more on the subject.

Newline characters often cause problems in Git when you have developers working on different operating systems (Windows, Mac and Linux). If you've ever seen a phantom file change where there are no visible changes, that could be because the line endings in the file have been changed from CRLF to LF or vice versa.

Git can actually be configured to automatically handle line endings using a setting called autocrlf. This automatically changes the line endings in files depending on the operating system. However, you shouldn't rely on people having correctly configured Git installations. If someone with an incorrect configuration checked in a file, it would not be easily visible in a pull request and you'd end up with a repository with inconsistent line endings.

The solution to this is to add a .gitattributes file at the root of your repository and set the line endings to be automatically normalised like so:

# Set default behavior to automatically normalize line endings.
* text=auto

# Force batch scripts to always use CRLF line endings so that if a repo is accessed
# in Windows via a file share from Linux, the scripts will work.
*.{cmd,[cC][mM][dD]} text eol=crlf
*.{bat,[bB][aA][tT]} text eol=crlf
*.{ics,[iI][cC][sS]} text eol=crlf

# Force bash scripts to always use LF line endings so that if a repo is accessed
# in Unix via a file share from Windows, the scripts will work.
*.sh text eol=lf

Only the first line is strictly necessary. It hard codes the line endings for Windows cmd and batch scripts to CRLF and bash scripts to be LF, so that they can be executed via a file share. It's a practice I picked up from the corefx repository.

Git Large File System (LFS)

It's pretty common to want to check binary files into your Git repository. Building a website for example, involves images, fonts, maybe some compressed archives too. The problem with these binary files is that they bloat the repository a fair bit. Every time you check-in a change to a binary file, you've now got both files saved in Git's history. Over time this bloats the repository and makes cloning it slow. A much better solution is to use Git Large File System (LFS). LFS stores binary files in a separate file system. When you clone a repository, you only download the latest copies of the binary files and not every single changed version of them.

LFS is supported by most source control providers like GitHub, Bitbucket and Azure DevOps. It a plugin to Git that has to be separately installed (It's a checkbox in the Git installer) and it even has it's own CLI command 'git lfs' so you can run queries and operations against the files in LFS. You can control which files fall under LFS's remit in the .gitattributes file like so:

 # Archives
*.7z filter=lfs diff=lfs merge=lfs -text
*.br filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text

# Documents
*.pdf filter=lfs diff=lfs merge=lfs -text

# Images
*.gif filter=lfs diff=lfs merge=lfs -text
*.ico filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.psd filter=lfs diff=lfs merge=lfs -text
*.webp filter=lfs diff=lfs merge=lfs -text

# Fonts
*.woff2 filter=lfs diff=lfs merge=lfs -text

# Other
*.exe filter=lfs diff=lfs merge=lfs -text

So here I've added a whole list of file extensions for various file types I want to be controlled by Git LFS. I tell Git that I want to filter, diff and merge using the LFS tool and finally the -text argument tells Git that this is not a text file, which is a strange way to tell it that it's a binary file.

A quick warning about adding LFS to an existing repository with existing binary files checked into it. The existing binary files will be checked into Git and not LFS without rewriting Git history which would be bad and you shouldn't do unless you are the only developer. You will have to add a one off commit to take the latest versions of all binary files and add them to LFS. Everyone who uses the repository will also have to re-clone the repository (I found this out the hard way in a team of 15 people. Many apologies were made over the course of a week). Ideally you add this from day one and educate developers about Git's treatment of binary files, so people don't check-in any binary files not controlled by LFS.

GitHub

GitHub does technically provide Git LFS for free but they limit the bandwidth to 1GB. If your repository is public and you have any traffic going to your site whatsoever, you will get through that very quickly. GitHub charges $5 per month for a data pack which gives you 50GB of bandwidth per month which I've found is enough for a moderately popular GitHub repository.

I really don't understand why GitHub charges for Git LFS because people who don't want to pay are just going to check in binary files in Git instead which presumably would cost them more bandwidth. Surely they should be encouraging it's use by making it free?

Binary Files

When talking about the .gitattributes file, you will quite often hear some people talk about explicitly listing all binary files instead of relying on Git to auto-detect binary files (yes Git is clever enough to do that) like this:

# Denote all files that are truly binary and should not be modified.
*.png binary
*.jpg binary

As you saw above, we already do this with Git LFS but if you don't use LFS, read on as you may need to explicitly list binary files in certain rare circumstances.

I was interested so I asked a Stack Overflow question and got great answers. If you look at the Git source code, it checks first 8,000 bytes of a file to see if it contains a NUL character. If it does, the file is assumed to be binary. However, there are cases where you may need to do it explicitly:

  • UTF-16 encoded files could be mis-detected as binary.
  • Some image format or file that consists only of printable ASCII bytes. This is pretty weird and sounds unlikely to happen.

Final Form

This is what the final .gitattributes file I copy to most repositories looks like:

###############################
# Git Line Endings            #
###############################

# Set default behaviour to automatically normalize line endings.
* text=auto

# Force batch scripts to always use CRLF line endings so that if a repo is accessed
# in Windows via a file share from Linux, the scripts will work.
*.{cmd,[cC][mM][dD]} text eol=crlf
*.{bat,[bB][aA][tT]} text eol=crlf

# Force bash scripts to always use LF line endings so that if a repo is accessed
# in Unix via a file share from Windows, the scripts will work.
*.sh text eol=lf

###############################
# Git Large File System (LFS) #
###############################

# Archives
*.7z filter=lfs diff=lfs merge=lfs -text
*.br filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text

# Documents
*.pdf filter=lfs diff=lfs merge=lfs -text

# Images
*.gif filter=lfs diff=lfs merge=lfs -text
*.ico filter=lfs diff=lfs merge=lfs -text
*.jpg filter=lfs diff=lfs merge=lfs -text
*.pdf filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.psd filter=lfs diff=lfs merge=lfs -text
*.webp filter=lfs diff=lfs merge=lfs -text

# Fonts
*.woff2 filter=lfs diff=lfs merge=lfs -text

# Other
*.exe filter=lfs diff=lfs merge=lfs -text

Conclusions

All of the above are bits and pieces I've put together over time. Are there any other settings that should be considered best practice and added to any .gitattributes file?

Web Mentions

What's this?

0 Replies

Comment

Initializing...