How Refactoring Helps Dealing with Legacy Code

You are writing code either professionally or for fun. A lot of times, we think that our code will never change.

But then we run into many situations where that's not the case.

Clients/business partners call you and ask to update business logic and add more functionalities.
You are working on a personal project and you want to add a new feature as fast as possible.
A bug is reported and it's not easy to test your code thus requires changing the code for the sake of testing.

What would help you to mitigate these issues?

I recently read Beyond Legacy Code: Nine Practices to Extend the Life (and Value) of Your Software by David Scott Berstein. The book is about how to deal with codes that are hard to maintain and test. Chapter 13 deals with a practice of refactoring. When you constantly refactor your code, it becomes more understandable and testable.

"Beyond Legacy Code" discusses why refactoring is necessary but a concrete implementation example is not present. I'd like you to be familiarized with steps necessary to make a method more maintainable and testable after a series of refactorings.

Let's implement a method, GetIndexAfterFoundWord, which searches a word within a text and returns an index after the found word.

E.g.) If you search for a word "long" in "This is a long text to be searched". "long" is found at index 10 (zero-based) and the method returns an index after found word. In this case, the method returns 14 (index 10 + 4, which is the length of "long").

Note: Don't try to understand the code. Just skim through and imagine how hard it'd be to update the code.

1private static int GetIndexAfterFoundWord(string text, string searchWord)2{3	// Build Prefix KMP Table4	int j = 0;5	int i = j + 1;6	int[] T = Enumerable.Repeat(0, searchWord.Length).ToArray();7	T[0] = 0;8
9	while (i < searchWord.Length)10	{11		if (searchWord[i] == searchWord[j])12		{13			T[i] = j + 1;14			j++;15			i++;16		}17		else18		{19			while (j >= 1 && searchWord[j] != searchWord[i])20			{21				j = T[j - 1];22				if (j == 0) break;23			}24
25			if (searchWord[j] == searchWord[i])26				T[i] = j + 1;27
28			i++;29		}30	}31
32	// Search searchWord using the table.33	int wi = 0;  // index position for searchWord34	int m = 0;  // index position for text35	List<int> found = new List<int>();36
37	while (m + wi < text.Length)38	{39		if (text[m + wi] == searchWord[wi])40		{41			wi++;42			if (wi == searchWord.Length)43			{44				found.Add(m);45				m = m + wi - T[wi - 1];46				wi = T[wi - 1];47			}48		}49		else50		{51			if (T[wi] == 0)52			{53				m = m + wi + 1;54				wi = 0;55			}56			else57			{58				m = m + wi;59				wi = (wi - 1) < 0 ? 0 : T[wi - 1];60			}61		}62	}63
64	// return the index after found word65	return found.First() + searchWord.Length;66}

It uses Knuth-Morris-Pratt (KMP) algorithm to return found indices and simply returns found index + search word length.

The problem is that GetIndexAfterFoundWord is very hard to read and also hard to make changes. It doesn't have to know about building a prefix table. The method only needs to know what KMP does and doesn't need to know how it works.

Let's abstract table building and search code into methods by using Extract Method refactoring.

First Refactoring

Here is GetIndexAfterFoundWord2 after the refactoring.

1private static int GetIndexAfterFoundWord2(string text, string searchWord)2{3	int[] prefixTable = BuildPrefixTable(searchWord);4	int[] found = SearchByKmp(text, searchWord, prefixTable);5	return found.First() + searchWord.Length;6}

Now the code is more intention revealing and the code is self-documenting (at this point, comments are redundant since the method name shows what each line does).

But the problem mentioned before refactor still exists.

GetIndexAfterFoundWord2 still knows too much about the internals of KMP thus not operating at the same level of abstraction of GetIndexAfterFoundWord2.

Let's wrap the KMP search logic in a class using Extract Class refactoring.

Second Refactoring

Here is GetIndexAfterFoundWord3 after 2nd refactoring.

1private static int GetIndexAfterFoundWord3(string text, string searchWord)2{3	KmpSearch kmpSearch = new KmpSearch();4	int[] found = kmpSearch.Find(text, searchWord);5	return found.First() + searchWord.Length;6}

Now KMP code can be reused and the abstraction level is the same. KMP algorithm is now reusable.

The last problem is that, if we need to test GetIndexAfterFoundWord3 or make changes to the underlying implementation of the search algorithm, then we need to update GetIndexAfterFoundWord3, which violates the Open-Close Principle.

Open-Close Principle states that

software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification

GetIndexAfterFoundWord3 doesn't need to know which algorithm it is using so let's abstract one more time by creating an interface and interact with search code via the interface (Extract Interface Refactoring).

Third Refactoring

Here is GetIndexAfterFoundWord4 after Extract Interface Refactoring.

1private static int GetIndexAfterFoundWord4(string text, string searchWord, ITextSearch textSearch)2{3	int[] found = textSearch.Find(text, searchWord);4	return found.First() + searchWord.Length;5}

Now GetIndexAfterFoundWord4 doesn't know how text search functionality is implemented. It just knows that textSearch returns an index and that's all GetIndexAfterFoundWord4 has to know about.

This method is easily testable because you can mock out ITextSearch and pass any object you need to.

Let me show you how easy it is to replace the algorithm.

I created a class, SlowSearch that implements ITextSearch.

1public static void Main(string[] args)2{3	string text = "This is a long text to be searched";4	// A word to search within "text".5	string searchWord = "long";6
7	int nextIndex = GetIndexAfterFoundWord4(text, searchWord, new KmpSearch());8	Console.WriteLine("Result of GetIndexAfterFoundWord4 using KmpSearch  = " + nextIndex);9
10	nextIndex = GetIndexAfterFoundWord4(text, searchWord, new SlowSearch());11	Console.WriteLine("Result of GetIndexAfterFoundWord4 using SlowSearch = " + nextIndex);12}

Both of them return the same result using different search algorithm.

Result of GetIndexAfterFoundWord4 using KmpSearch = 14 Result of GetIndexAfterFoundWord4 using SlowSearch = 14

I've implemented this demo in a console application so the method is static and accepts all parameters. But if GetIndexAfterFoundWord4 were a class method, you can inject ITextSearch into a method using a Strategy Pattern.

Last Refactoring

1public class IndexIterator2{3	private readonly ITextSearch _textSearch;4
5	public IndexIterator(ITextSearch textSearch)6	{7		_textSearch = textSearch;8	}9
10	public int GetIndexAfterFoundWord(string text, string searchWord)11	{12		int[] found = _textSearch.Find(text, searchWord);13		return found.First() + searchWord.Length;14	}15}

You can either mock out ITextSearch object instance in a test using a mock or pass an instance using a Dependency Injection using a framework like Ninject.

After reaching the final step, GetIndexAfterFoundWord is

easy to read and understand the logic
easily testable
easy to find bugs
easily extendable
operating on the same level of abstraction

And lastly, it does one thing and one thing well (Single Responsibility Principle).

Conclusion

After series of refactorings, we achieved, more modular, reusable code, and also easily testable.

I hope you can now see why refactoring helps improving legacy code.

If you want to know more about refactoring, check out Martin Fowler's book, Refactoring Improving the Design of Existing Code.

The code is available on GitHub.