Date: 2023-03-21

At some point, developers working in modular software environments will likely encounter references to "high cohesion, low coupling." This turn of phrase refers to the balance of dependency and autonomy among the various modules that make up an application or software architecture.

However, the execution of this guiding principle isn't always as simple as its wording. Finding the right balance of coupling and cohesion requires close attention to the signs of overdependence, the degree to which developers split code and the overall impact of application changes. This article explores the idea of high cohesion and low coupling by examining the two terms independently, the relative tradeoffs involved and the approaches that can help maintain the right relationship between modules.

What is cohesion?

The concept of cohesion can be succinctly characterized using Robert C. Martin's famed single-responsibility principle: "A module should be responsible to one, and only one, actor." In other words, there should never be more than one reason for a class to change. This concept plays a foundational role in the C programming language, where the standard library headers are organized by themes like math, string, time and standard I/O.

With today's application development approaches and software architecture styles, these modules could take the form of subroutines, classes, APIs or individual microservices. No matter the module's form, however, a system exhibits low cohesion if a single change requires edits to many modules in different places. This leads to a significant number of problems, not least the difficulty it adds to finding errors within the codebase.
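As a rough illustration of cohesion, the sketch below gives each class a single actor to answer to. The class names (`OrderLineParser`, `OrderReportFormatter`), the comma-separated input and the report layout are all assumptions made for this example; the article does not prescribe them.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical names throughout; a sketch, not a prescribed design.

// Responsible to one actor: whoever defines the input file format.
class OrderLineParser {
public:
    // Splits a comma-separated line into its fields.
    std::vector<std::string> parse(const std::string& line) const {
        std::vector<std::string> fields;
        std::istringstream in(line);
        std::string field;
        while (std::getline(in, field, ',')) {
            fields.push_back(field);
        }
        return fields;
    }
};

// Responsible to a different actor: whoever defines the report layout.
class OrderReportFormatter {
public:
    // Joins the fields with " | " for display.
    std::string format(const std::vector<std::string>& fields) const {
        std::string out;
        for (std::size_t i = 0; i < fields.size(); ++i) {
            if (i > 0) {
                out += " | ";
            }
            out += fields[i];
        }
        return out;
    }
};
```

If the file format changes, only `OrderLineParser` should need to change; if the report layout changes, only `OrderReportFormatter` should. Folding both into one class would give that class two unrelated reasons to change.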

What is coupling?

To implement the single-responsibility principle effectively, we also need to focus on coupling, which refers to the direct, interdependent relationships that exist between application components. In systems with tight coupling, component relationships are often fixed and rigid. In loosely coupled systems, those relationships tend to be more flexible and modular.

For example, it's arguable that the tight coupling found in a monolithic application architecture provides a valuable measure of stability and predictability. However, developers need to weigh that benefit against the restrictions a monolith places on updates and feature additions. A simple change to one interface component could trigger a complicated series of file changes, recompiled code or even complete redeployments.
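One common way to loosen a relationship between two components is to have one depend on an abstract interface rather than on a concrete class. The following is a minimal sketch of that idea; `OrderSink`, `OrderDispatcher` and `RecordingSink` are hypothetical names invented for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical names; a sketch of loose coupling through an interface.

// The only thing a caller needs to know about a destination.
class OrderSink {
public:
    virtual ~OrderSink() = default;
    virtual void send(const std::string& order) = 0;
};

// A tightly coupled version would construct one concrete destination
// inside this class, fixing the relationship at compile time.
// Here the destination is passed in, so it can be swapped freely.
class OrderDispatcher {
public:
    explicit OrderDispatcher(OrderSink& sink) : sink_(sink) {}

    // Forwards every order to whichever sink was supplied.
    void dispatch(const std::vector<std::string>& orders) {
        for (const std::string& order : orders) {
            sink_.send(order);
        }
    }

private:
    OrderSink& sink_;
};

// A stand-in destination that only records what it receives.
class RecordingSink : public OrderSink {
public:
    void send(const std::string& order) override {
        received.push_back(order);
    }

    std::vector<std::string> received;
};
```

Because `OrderDispatcher` knows only the `OrderSink` interface, replacing the destination should not require editing or recompiling the dispatcher's logic, which is the flexibility the loosely coupled side of the tradeoff is meant to buy.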

Loose vs. tight coupling in code

Imagine a programmer is writing code for an electronic data interchange application. When a file appears in a directory, the application picks it up, reads each line and sends those orders off to a production environment. The programmer decides that the simplest way to write that process is to create a while loop that repeatedly scans the directory for new files until certain conditions are met. Once the loop finds a file, it waits five seconds to make sure the file isn't still growing (i.e., actively receiving new data from a source). Once that's confirmed, the program creates a database connection, opens the file, works through it, runs a few SQL inserts and moves the file somewhere else.

However, if something goes wrong, the programmer will need to debug the process manually by clearing and refreshing the database. This means the programmer will need to store the name of the database in a separate location and provide an alternative name for testing purposes. Finally, they'll need to use a series of SELECT statements to make sure the database returns the right results.

To improve this design, we'll look for what Michael Feathers calls "seams" in his book, Working Effectively with Legacy Code. Seams are places where it makes sense to split code into pieces; for example, write one program that checks a directory for changes, and then write another that inserts records into the database. That way, developers can test each loop, method, SQL process and other code component independently.

Instead of creating a new database connection inside the main method, the developer can now call a factory method that retrieves database objects. Those objects could be connections to a production database, a test database or even a mock object. As such, there are now five separate components, one for each of the following tasks:

1. Check a directory and deal with new files appropriately.
2. Loop through a file, processing each row as a string.
3. Create a SQL INSERT INTO statement for each string.
4. Call a factory method to create database objects.
5. Execute the necessary SQL commands.

Now, if a change needs to happen in this particular piece of software, that change is more likely to happen in one place. A real code implementation of one of these methods may look like this:

    #include <string>

    // Runs one SQL statement against the given database handle.
    // Assumes a database type with an execute() method; the name "do"
    // cannot be used because it is a reserved word in C and C++.
    int ExecuteCommand(const std::string& sql, database* db)
    {
        return db->execute(sql);
    }

Assuming exception handling is done elsewhere, this method is unlikely to ever change. It can probably safely be included in some higher-level method at the end. Yes, we could spend the time and energy to make it fully independent, for example by implementing the command pattern today because we might want to email the file later.

Of course, this independence does not eliminate the need for thorough end-to-end testing. However, when tests continually fail or new requirements emerge, developers will have a much easier time extending that code. Keep in mind, though, that problems may emerge if programmers are continually forced to make flurries of unanticipated changes. A worst-case example of this would involve large software teams making a series of sweeping changes to an application that inadvertently overlap with other critical system operations.
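To make the split along seams more concrete, here is a minimal sketch of four of the five components described above: the row loop, the INSERT builder, the factory method and the command executor. Directory scanning is left out to keep the example short. Everything named here is an assumption made for illustration: the `Database` interface and its `execute` method, `MockDatabase`, `DbKind`, `MakeDatabase`, `BuildInsert`, `ProcessFile` and the `orders`/`raw_line` table and column. No real database library is used.

```cpp
#include <istream>
#include <memory>
#include <sstream>  // std::istringstream is handy for feeding in test input
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical names throughout; a sketch under stated assumptions.

// Seam: the rest of the code sees only this interface.
class Database {
public:
    virtual ~Database() = default;
    // Runs one SQL statement and returns the number of affected rows.
    virtual int execute(const std::string& sql) = 0;
};

// Mock object that records statements instead of running them.
class MockDatabase : public Database {
public:
    int execute(const std::string& sql) override {
        statements.push_back(sql);
        return 1;
    }

    std::vector<std::string> statements;
};

enum class DbKind { Production, Test, Mock };

// Factory method: the one place that decides which database to build.
// Only the mock is implemented; real connections are out of scope here.
std::unique_ptr<Database> MakeDatabase(DbKind kind) {
    if (kind == DbKind::Mock) {
        return std::unique_ptr<Database>(new MockDatabase());
    }
    throw std::logic_error("only the mock database exists in this sketch");
}

// Builds an INSERT statement for one row. Single quotes are doubled;
// real code should prefer the parameter binding offered by whatever
// database library is in use.
std::string BuildInsert(const std::string& row) {
    std::string escaped;
    for (char c : row) {
        if (c == '\'') {
            escaped += '\'';
        }
        escaped += c;
    }
    return "INSERT INTO orders (raw_line) VALUES ('" + escaped + "');";
}

// Executes one command against whichever database it is handed.
int ExecuteCommand(const std::string& sql, Database& db) {
    return db.execute(sql);
}

// Loops through an open file, treating each non-empty line as one row.
// Returns the number of rows sent to the database.
int ProcessFile(std::istream& file, Database& db) {
    int rows = 0;
    std::string line;
    while (std::getline(file, line)) {
        if (line.empty()) {
            continue;
        }
        ExecuteCommand(BuildInsert(line), db);
        ++rows;
    }
    return rows;
}
```

Because `ProcessFile` accepts any `std::istream` and any `Database`, a test can pass it a `std::istringstream` and a `MockDatabase` and then inspect the recorded statements, with no directory, file or live database involved. That is the kind of independent testing the seams are meant to allow.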