Within large, mature organizations, technology platforms evolve over years or even decades, and so do their codebases, since numerous developers and application managers modify that code with every change and technology addition. Keeping that code resilient is a pressing need, and it has fueled an explosion of code versioning and management tools.
Two tools in particular, Git and Apache Subversion (SVN), have become top names in the code versioning tool market. But while they chase similar goals, there are some key differences in the way they handle versioning and in the environments they are best suited for.
To highlight these differences, let's take a look at some of the basics concerning Git and SVN, including their major benefits and most notable drawbacks.
SVN has mostly found a home in development shops where teams work with large repositories and binary files and need fine-grained access control. For instance, many game development studios still favor a centralized model for their primary version control. By design, SVN centralizes code management by storing code and related metadata on a single server. Client machines must connect to that server to retrieve a copy of the code in a particular repository.
What are SVN's benefits?
The entire code repository and its metadata reside on that single server; clients hold only a working copy of the code they are actively changing. This means individual clients need to store only the files they want to alter, and they commit their changes directly to the server.
Like many other versioning tools, SVN takes a file-centric approach to version retrieval. SVN stores the latest version of the codebase in full, reflecting every change made so far.
SVN still keeps a record of each past change, but rather than storing a complete copy of the repository for every version, it stores only the sections of code that changed. This technique is known as delta differencing. To retrieve a specific older version, SVN applies the stored deltas one after another, starting from the latest version.
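The retrieval process described above can be sketched in Python. This is a minimal model under stated assumptions, not SVN's real storage format: the newest version is kept in full, each older version is stored as a reverse delta computed with `difflib.SequenceMatcher`, and the names `make_delta`, `apply_delta` and `ReverseDeltaStore` are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(newer: list[str], older: list[str]) -> list[tuple]:
    """Describe how to rebuild `older` from `newer` (hypothetical delta format).

    ("copy", i1, i2) reuses lines newer[i1:i2]; ("insert", lines) adds literal lines,
    so only the sections that actually changed are stored as text.
    """
    ops = []
    matcher = SequenceMatcher(a=newer, b=older, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": the older text has lines the newer lacks
            ops.append(("insert", older[j1:j2]))
        # "delete": those lines exist only in the newer text, so nothing is recorded
    return ops


def apply_delta(newer: list[str], delta: list[tuple]) -> list[str]:
    """Rebuild the older version by replaying the delta against the newer one."""
    older = []
    for op in delta:
        if op[0] == "copy":
            older.extend(newer[op[1]:op[2]])
        else:
            older.extend(op[1])
    return older


class ReverseDeltaStore:
    """Keeps the latest version in full and older versions as reverse deltas."""

    def __init__(self, first_version: list[str]):
        self.head = list(first_version)       # latest version, stored in full
        self.deltas: list[list[tuple]] = []   # deltas[k] turns version k+1 into version k

    def commit(self, new_version: list[str]) -> int:
        new_version = list(new_version)
        # The previous full copy is replaced by a delta that points back to it.
        self.deltas.append(make_delta(new_version, self.head))
        self.head = new_version
        return len(self.deltas)               # number of the new latest version

    def checkout(self, version: int) -> list[str]:
        text = list(self.head)
        # Start from the latest version and apply the deltas one after another.
        for k in range(len(self.deltas) - 1, version - 1, -1):
            text = apply_delta(text, self.deltas[k])
        return text


store = ReverseDeltaStore(["alpha", "beta", "gamma"])
store.commit(["alpha", "BETA", "gamma"])
store.commit(["alpha", "BETA", "gamma", "delta"])
print(store.checkout(0))  # ['alpha', 'beta', 'gamma']
```

Keeping the newest version whole makes the most common request (the latest code) cheap, at the cost of replaying a longer chain of deltas for very old versions.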
What are SVN's disadvantages?
SVN's approach does come with drawbacks that users should be aware of. Because of its centralized design, most operations that involve repository state or history, such as viewing the log or comparing older revisions, require a direct connection to the central server.
Every commit creates a new revision number that applies to the entire repository, including files that did not change. A commit may also be rejected until the client updates its working copy, which is necessary when a locally changed file has a newer revision on the server.
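A toy Python model illustrates both behaviors. It assumes a single in-memory server, whole-file contents and no merging; `CentralServer` and `OutOfDateError` are hypothetical names rather than Subversion APIs. Each commit advances one repository-wide revision number, and the server rejects a commit when a changed file was modified after the client's base revision.

```python
class OutOfDateError(Exception):
    """Raised when a changed file was modified on the server after the client's base revision."""


class CentralServer:
    """Toy model of a centralized repository (hypothetical, not Subversion's API)."""

    def __init__(self):
        self.revision = 0          # a single revision number for the whole repository
        self.files = {}            # path -> current content
        self.last_changed = {}     # path -> revision that last modified the file

    def checkout(self):
        """Return the current revision number and a copy of every file."""
        return self.revision, dict(self.files)

    def commit(self, base_revision, changes):
        """Apply `changes` (path -> new content) that were made against `base_revision`."""
        # Refuse the commit if someone else changed one of these files in the meantime.
        for path in changes:
            if self.last_changed.get(path, 0) > base_revision:
                raise OutOfDateError(f"{path} is out of date; update first")
        # Every successful commit advances the repository-wide revision number,
        # even though only the listed files changed.
        self.revision += 1
        for path, content in changes.items():
            self.files[path] = content
            self.last_changed[path] = self.revision
        return self.revision


server = CentralServer()
server.commit(0, {"a.txt": "v1", "b.txt": "v1"})   # revision 1
base, _ = server.checkout()                        # two clients both start at revision 1
server.commit(base, {"a.txt": "alice"})            # revision 2
try:
    server.commit(base, {"a.txt": "bob"})          # Bob's copy of a.txt is now stale
except OutOfDateError as err:
    print("rejected:", err)
print(server.commit(base, {"b.txt": "bob"}))       # b.txt unchanged since revision 1, so this is revision 3
```

In Subversion itself, the usual remedy for an out-of-date commit is to run `svn update`, resolve any conflicts, and commit again.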
SVN uses "branches" to isolate code experiments or incomplete features, and tags to take snapshots of the code. SVN also lets you quickly retrieve versions of a code repository through the checkout process. While SVN doesn't support nested repositories, you can still pull code from multiple repositories into one working copy using the svn:externals property.
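For illustration, the Python sketch below parses an svn:externals property value written in the Subversion 1.5+ style, where each line holds an optional `-r` revision, a source URL and a local directory. `parse_externals` is a hypothetical, simplified helper: it ignores peg revisions, quoting, paths containing spaces and the older pre-1.5 syntax, and the example URLs are placeholders.

```python
from typing import NamedTuple, Optional


class External(NamedTuple):
    url: str
    local_path: str
    revision: Optional[int]    # None means "follow the latest revision of the external"


def parse_externals(prop_value: str) -> list[External]:
    """Parse a simplified svn:externals value ([-rN] URL LOCAL_PATH per line)."""
    externals = []
    for raw_line in prop_value.splitlines():
        tokens = raw_line.split()
        if not tokens:
            continue                           # skip blank lines
        revision = None
        if tokens[0] == "-r":                  # "-r 148 URL PATH"
            revision, tokens = int(tokens[1]), tokens[2:]
        elif tokens[0].startswith("-r"):       # "-r148 URL PATH"
            revision, tokens = int(tokens[0][2:]), tokens[1:]
        if len(tokens) != 2:
            raise ValueError(f"unsupported externals line: {raw_line!r}")
        url, local_path = tokens
        externals.append(External(url, local_path, revision))
    return externals


# Hypothetical property value pulling two other repositories into one working copy.
value = """
http://svn.example.com/repos/sounds third-party/sounds
-r148 http://svn.example.com/skinproj third-party/skins
"""
for ext in parse_externals(value):
    pinned = ext.revision if ext.revision is not None else "HEAD"
    print(ext.local_path, "<-", ext.url, "at", pinned)
```

Pinning an external to a revision keeps the combined working copy reproducible; leaving it unpinned tracks whatever the other repository currently holds.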
In contrast to SVN, Git uses a decentralized model designed to address the needs of large-scale software projects. All Git nodes are peers, because all data, metadata and history are replicated across participating nodes. A node can be designated as a server for centralized pushes and pulls of code, but that is not a prerequisite: Git is designed to support peer-to-peer exchange of data without a central server.
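A toy Python model can make the peer-to-peer idea concrete. Each node keeps a complete object store, cloning copies everything, and fetching from any peer transfers only the objects the receiver lacks. The `Node` class and its methods are hypothetical and commits here do not record parents; real Git negotiates which objects to send and transfers them in packfiles.

```python
import hashlib


class Node:
    """Hypothetical peer in a decentralized repository: every node holds the full history."""

    def __init__(self, name):
        self.name = name
        self.objects = {}   # object ID -> content (every version, not just the latest)
        self.refs = {}      # branch name -> object ID of its latest commit

    def commit(self, branch, content):
        oid = hashlib.sha1(content.encode()).hexdigest()
        self.objects[oid] = content
        self.refs[branch] = oid
        return oid

    def clone(self, name):
        """Copy the entire object store and refs into a new, fully independent node."""
        replica = Node(name)
        replica.objects = dict(self.objects)
        replica.refs = dict(self.refs)
        return replica

    def fetch(self, peer):
        """Pull missing objects from any peer; no central server is involved."""
        missing = {oid: c for oid, c in peer.objects.items() if oid not in self.objects}
        self.objects.update(missing)
        for branch, oid in peer.refs.items():
            # Record the peer's branches under a prefix, similar in spirit to remote-tracking refs.
            self.refs[f"{peer.name}/{branch}"] = oid
        return len(missing)


alice = Node("alice")
alice.commit("main", "first version")
bob = alice.clone("bob")          # Bob gets the full history, not just a working copy
bob.commit("main", "bob's change")
carol = alice.clone("carol")
print(carol.fetch(bob))           # only Bob's new object is transferred
```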
What are Git's advantages?
In addition to its decentralized design, Git does not take a file-centric view of version management. Instead of tracking each file's history separately, Git records snapshots of the whole project and stores them locally. Each snapshot is made of objects identified by object IDs derived from their content: blobs (the contents of files), trees (directory listings) and commits. A commit object records the snapshot's root tree along with metadata such as the author's name and email, the date and the parent commit(s). Because objects are addressed by their content, identical files are stored only once, which keeps a repository from bloating over time. All objects and metadata are kept in the .git directory at the root of the repository, and the storage design makes navigation efficient. Git also supports nested repositories using Git submodules.
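Git computes each object ID from the object's type, size and content, which is why identical file contents always map to the same blob. The Python sketch below reproduces that hashing rule for blobs under the default SHA-1 object format and uses it in a toy content-addressed store; `git_object_id` and `ObjectStore` are hypothetical names, and real Git additionally zlib-compresses objects and packs them into packfiles.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """SHA-1 over the header '<type> <size>' plus a NUL byte, followed by the content."""
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store (no compression, no packfiles)."""

    def __init__(self):
        self.objects = {}   # object ID -> raw content

    def add_blob(self, content: bytes) -> str:
        oid = git_object_id("blob", content)
        self.objects.setdefault(oid, content)   # identical content is stored only once
        return oid


store = ObjectStore()
first = store.add_blob(b"hello world\n")
second = store.add_blob(b"hello world\n")   # e.g. the same file, unchanged in a later commit
print(first)               # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(len(store.objects))  # 1
```

The printed ID should match what `git hash-object` reports for a file containing `hello world` followed by a newline.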
Thanks to these design choices, Git is fast, decentralized and supports non-linear history, in which branches develop in parallel and are later merged. Git repositories can be forked and cloned by many client nodes, and web platforms such as GitHub and GitLab let contributors submit changes from downstream forks back upstream. This model scales well for projects contributed to by large communities, which is one reason GitHub and GitLab have been so widely adopted for open source development.
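Non-linear history means a commit can have more than one parent, so the history forms a directed acyclic graph rather than a line. The Python sketch below walks a small hypothetical graph in which commit B has two children (C on main, D on a feature branch) and M merges them. `merge_bases` follows the definition of a "best common ancestor" given in the `git merge-base` documentation; it is an illustration, not Git's implementation.

```python
from collections import deque

# Hypothetical commit graph: each commit ID maps to its parent IDs.
# B branches into C (main) and D (feature); M is a merge commit with two parents.
GRAPH = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "M": ["C", "D"]}


def ancestors(graph, commit):
    """Return the commit plus everything reachable through its parents (breadth-first)."""
    seen, queue = set(), deque([commit])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(graph[current])
    return seen


def merge_bases(graph, a, b):
    """Best common ancestors: common ancestors that are not ancestors of another one."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {
        c for c in common
        if not any(c != other and c in ancestors(graph, other) for other in common)
    }


print(sorted(ancestors(GRAPH, "M")))   # ['A', 'B', 'C', 'D', 'M']
print(merge_bases(GRAPH, "C", "D"))    # {'B'}
```

A merge tool compares both branch tips against this base, which is what lets parallel lines of work be combined.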
What are Git's drawbacks?
While Git has several benefits, its command-line interface (CLI) options can be overwhelming, and unleashing its full power requires a solid understanding of Git internals. And because Git stores all data locally, repositories containing large binary files tend to be cumbersome to work with.
To handle such repositories, you'll need an extension like Git LFS (Large File Storage), which pushes large project files to a remote data store and leaves only lightweight references in the Git repository itself.
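Those references are small text "pointer" files. The Python sketch below builds and parses a pointer in the format described by the Git LFS specification: a `version` line, the content's SHA-256 `oid`, and its `size` in bytes. `make_pointer`, `parse_pointer` and the dict standing in for the remote store are hypothetical; the real `git lfs` client also handles uploads, downloads and the clean/smudge filters that swap pointers and file contents.

```python
import hashlib

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def make_pointer(content: bytes) -> str:
    """Build the small text file that Git commits in place of a large file."""
    oid = hashlib.sha256(content).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(content)}\n"


def parse_pointer(pointer: str) -> dict:
    """Read a pointer's key/value lines back into the oid and size they describe."""
    fields = dict(line.split(" ", 1) for line in pointer.splitlines())
    if fields.get("version") != LFS_SPEC:
        raise ValueError("not a Git LFS pointer")
    return {"oid": fields["oid"].removeprefix("sha256:"), "size": int(fields["size"])}


large_file = bytes(1_000_000)             # stand-in for a large binary asset
remote_store = {}                         # hypothetical remote LFS storage: oid -> bytes
pointer = make_pointer(large_file)
info = parse_pointer(pointer)
remote_store[info["oid"]] = large_file    # the real bytes live outside the Git repository
print(len(pointer.encode()), "bytes committed instead of", info["size"])
```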
A major difference between Git and SVN is that SVN has a simpler CLI and pulls down large binary files only when a user actually checks them out, rather than replicating every version to every client. But SVN relies on a central server reachable over a network and offers only limited functionality without connectivity.
Additionally, in a centralized model, all contributors must have access to the central repository, which limits its usefulness for public projects contributed to by large communities. Git was originally created to address some of these issues.