Applying Scientific Method to Software Development
by Graciele Damasceno
Sometimes (actually, quite often) it’s easy to get stuck while coding, debugging or solving a programming problem. Especially early in their careers, people tend to approach a programming issue (such as fixing bugs or running tests…) head-on: assuming a probable solution, trying it out, checking whether the problem was solved, and repeating the cycle. Applying the scientific method to this problem-solving process can make it more systematic.
What is the Scientific Method?
The Scientific Method is an empirical technique used to construct and test a scientific hypothesis. Those fancy words mean: using logic, we build a hypothesis to explain a fact, and then check it against evidence. This method involves observation based on skepticism (since cognitive assumptions can distort the interpretation of observations). After observing, questions are raised and answers are collected through tests and experiments. This methodology is useful across various fields, including scientific research, the natural sciences and, as we are going to see below, even engineering!
This technique consists of the following steps:
- Observation/Question: The scientific method begins with a question about something.
- Research Topic/Background: Before starting to find a potential solution, research is conducted to learn more about the subject and collect references.
- Hypothesis: Form an educated guess about the expected answer to the proposed question.
- Test the Hypothesis with an Experiment: Check if your hypothesis is accurate by conducting a fair test, changing one factor at a time while maintaining the same conditions to verify the result.
- Analyze Data: Analyze the collected data to determine if they support or refute the hypothesis.
- Draw a Conclusion: Determine whether the hypothesis is acceptable based on the data analysis.
Those steps are conducted in a controlled and monitored environment, meaning that we are able to observe the variables that can interfere with the result. Those variables can be:
- Control variables: Also known as “constant variables”, those are variables that are kept the same throughout the experiment.
- Independent variables: Variables or conditions that can be tweaked to affect the result of the experiment.
- Dependent variables: Variables that are measured or observed to see how they respond to changes made to the independent variables.
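To make the three kinds of variables concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of any real project: the independent variable is the size of a list, the dependent variable is the number of comparisons a linear search performs, and the control variable is that the target is always absent (so every run is a worst case).

```python
def count_comparisons(size):
    """Dependent variable: comparisons made by a linear search over `size` items."""
    data = list(range(size))
    target = -1  # control variable: the target is never in the list (worst case)
    comparisons = 0
    for value in data:
        comparisons += 1
        if value == target:
            break
    return comparisons


def run_experiment(sizes):
    """Vary only the independent variable (the list size) and record the result."""
    return {size: count_comparisons(size) for size in sizes}


if __name__ == "__main__":
    for size, comparisons in run_experiment([10, 100, 1000]).items():
        print(f"size={size} comparisons={comparisons}")
```

Because only the list size changes between runs, any change in the number of comparisons can be attributed to it.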
OK, but what does it have to do with Software Development?
You might be wondering what the scientific method has to do with software development. At first glance, it may seem like a mismatch - after all, science can be argued to be about studying the natural world, while coding is about writing… well, code. But think again! The principles of the scientific method are surprisingly relevant to the daily work of software developers. Let’s see how.
Error Analysis
The techniques of the scientific method are useful for identifying and fixing bugs in code. Applying the steps above:
- Identify an issue or bug to be investigated. This observation is the starting point for the hypothesis. For example, “The program crashes unexpectedly”.
- If needed, do some additional research on the tools, languages, configurations and/or frameworks used in the program in question. This will give you more ground to cover when designing an experiment.
- Design a controlled experiment to test your hypothesis. For the example above, you can write unit tests or create a reproducer for the bug. The objective is to collect more information about the issue and rule out possible variables that may interfere with the result.
- Execute the experiment (e.g., run the program in order to observe the crash). Collect the results, any unexpected behavior, variables…
- Analyze the data, trying to correlate the result with the variables.
- If necessary, repeat the experiment, changing variables and/or the environment. When the potential culprit is found, the hypothesis can be confirmed and the bug fixed.
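The steps above can be sketched as a small reproducer in Python. This is only an illustration under assumptions: `average` is a hypothetical function standing in for the code under investigation, and the hypothesis being tested is “the crash happens only when the input list is empty”. Each trial changes one factor relative to the baseline, and the outcome is recorded instead of letting the program crash.

```python
def average(values):
    """Hypothetical function under investigation (it crashes on some inputs)."""
    return sum(values) / len(values)


def run_trial(values):
    """Run one experiment and record the outcome instead of crashing."""
    try:
        return ("ok", average(values))
    except Exception as exc:
        return ("crash", type(exc).__name__)


# Each trial differs from the baseline in a single factor.
trials = {
    "baseline": [1, 2, 3],
    "single element": [5],
    "negative numbers": [-1, -2],
    "empty list": [],
}

results = {name: run_trial(values) for name, values in trials.items()}

if __name__ == "__main__":
    for name, outcome in results.items():
        print(f"{name}: {outcome}")
```

If only the “empty list” trial reports a crash, the data supports the hypothesis, and the reproducer can be turned into a regression test once the bug is fixed.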
We might intuitively follow a process similar to the one above when solving a problem, and that’s great! Explicitly (and consciously) applying those steps can improve the accuracy of our problem-solving.
Hypothesis-Driven Development
What if we treated each feature or bug fix as a hypothesis, and then designed our development process around testing and verifying that hypothesis?
This is the concept known as Hypothesis-Driven Development (HDD).
Similar to the steps of the scientific method described above, we start by defining a clear hypothesis, such as “Implementing this feature will improve user engagement by 30%”. After that, a plan to test the hypothesis is created, such as:
- Implementing and deploying x, y and z as key points of the feature;
- Conducting A/B tests to compare the new feature with previous behavior;
- Gathering data on user interaction, performance metrics, business metrics…
With the plan created, the changes are implemented and the experiment is executed. In some cases, a simulation might be run to conduct the experiment, or the new feature can be released to a small portion of users for a specific period of time. After that, the results are analyzed and the hypothesis is evaluated: if the expected results were achieved, the hypothesis is confirmed and the change can be adopted; otherwise, new rounds of experiments can be conducted with new hypotheses.
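As a rough sketch of the analysis step, the Python snippet below evaluates an A/B test against the “30% improvement” hypothesis. It rests on several assumptions that are not part of the plan above: engagement is measured as the share of users who engaged, the user counts are made up for illustration, and a one-sided two-proportion z-test with a 5% significance level is used as the decision rule. A real experiment may well need a different metric or test.

```python
from math import sqrt
from statistics import NormalDist


def evaluate_ab_test(control_users, control_engaged,
                     variant_users, variant_engaged,
                     target_lift=0.30, alpha=0.05):
    """Compare engagement rates and check them against the hypothesized lift."""
    p_control = control_engaged / control_users
    p_variant = variant_engaged / variant_users

    # Relative improvement of the variant over the control group.
    lift = (p_variant - p_control) / p_control

    # One-sided two-proportion z-test: is the variant better than the control?
    pooled = (control_engaged + variant_engaged) / (control_users + variant_users)
    std_error = sqrt(pooled * (1 - pooled) * (1 / control_users + 1 / variant_users))
    z_score = (p_variant - p_control) / std_error
    p_value = 1 - NormalDist().cdf(z_score)

    return {
        "lift": lift,
        "p_value": p_value,
        "supported": lift >= target_lift and p_value < alpha,
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(evaluate_ab_test(control_users=1000, control_engaged=200,
                           variant_users=1000, variant_engaged=270))
```

Note that this rule only checks that the observed lift reaches the target and that the difference from the control group is statistically significant; it does not prove that the true lift is at least 30%.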
This way of developing might look similar to Test-Driven Development (TDD). However, keep in mind that HDD focuses on developing features or bug fixes based on specific hypotheses about how they will improve the system or user experience, while TDD emphasizes writing unit tests for individual code changes to ensure correctness and reliability at a smaller scale.
Experimentation with Different Algorithms
We can also use the scientific method to compare different algorithms for solving a particular problem and determine which one is most effective. With a hypothesis like “Algorithm A will outperform algorithm B in terms of accuracy when used to predict credit risk”, we can design an experiment that compares both algorithms while keeping the input data, evaluation metrics and subject the same, as control variables. It’s also important to use a controlled environment, so that external factors do not influence the results.
After running the experiment, the performance of each algorithm is collected and analyzed to determine whether the results support the hypothesis. We can also expand and refine the hypothesis, identifying limitations or biases in the experiment and proposing alternative explanations for the results (e.g., “Algorithm A might perform better due to its better handling of imbalanced data”).
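A minimal sketch of such a comparison in Python follows. Everything here is hypothetical: the dataset is synthetic (generated from a fixed seed so the experiment is repeatable), and `algorithm_a` and `algorithm_b` are toy rules standing in for real credit-risk models. The point is only the experimental setup: both algorithms see exactly the same data and are scored with the same metric.

```python
import random


def make_dataset(n, seed):
    """Synthetic, illustrative data: ((income, debt), label), where 1 means risky."""
    rng = random.Random(seed)  # fixed seed: the input data is a control variable
    dataset = []
    for _ in range(n):
        income = rng.uniform(0, 100)
        debt = rng.uniform(0, 100)
        label = 1 if debt - income + rng.gauss(0, 10) > 0 else 0
        dataset.append(((income, debt), label))
    return dataset


def algorithm_a(features):
    """Toy rule A: risky when debt exceeds income."""
    income, debt = features
    return 1 if debt > income else 0


def algorithm_b(features):
    """Toy rule B: risky when debt exceeds a fixed threshold."""
    _, debt = features
    return 1 if debt > 50 else 0


def accuracy(algorithm, dataset):
    """Evaluation metric shared by both algorithms."""
    correct = sum(1 for features, label in dataset if algorithm(features) == label)
    return correct / len(dataset)


dataset = make_dataset(1000, seed=7)
results = {
    "A": accuracy(algorithm_a, dataset),
    "B": accuracy(algorithm_b, dataset),
}

if __name__ == "__main__":
    print(results)
```

Since the labels here are generated from the difference between debt and income, rule A has an advantage by construction, which is exactly the kind of bias in the experiment that should be written down when interpreting the results.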
…and many more!
There are many other interesting problems that can benefit from using the scientific method, such as:
- Code Review, when evaluating the effectiveness of different solutions to a specific problem
- Feature Prioritization, when determining which feature has the greatest importance and impact given certain constraints
- Technical Debt Analysis, using metrics to determine which technical debt, when resolved, will have the greatest positive impact on the project
The benefit of using those techniques is being able to approach problems - and, as shown, not only scientific propositions! - in a systematic, data-driven way, isolating underlying issues and gaining a more rounded knowledge of the problem in question. Try it out!