Adam's blog: How I found (and fixed) a critical bug a day before official launch day

01 Jun 2022, 1888 words

A day before launching the new online seminar Jump-start the FI (jump@fi for short), I was doing some final testing to make sure that everything was OK and clear to launch. This new online seminar is expected to be officially advertised to the hundreds of newly accepted students of the Faculty of Informatics at Masaryk University, so the stakes for making it behave correctly from the start are pretty high. Fortunately, about forty people had already tested the entirely new web frontend, so the assurance that nothing could go wrong was also relatively high. Unfortunately, everyone (myself included) managed to miss a critical bug in the application.

How does jump@fi work?

Jump@fi is an online seminar whose main components are tasks with tests that its participants can solve. These tasks are presented in the form of a directed acyclic graph, where the participant first has to solve all tasks with an edge pointing to a given task before that task is unlocked. So you would guess that this part is critical-bug-free, as it was tested the most. You would be wrong.
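The unlocking rule described above can be sketched in a few lines of TypeScript. This is only an illustration: the `SeminarTask` type, its fields, and the flat list of prerequisite IDs are assumptions made for the example, not the application's real data model.

```typescript
// Hypothetical, simplified task model; the real TaskWithIcon type is not shown in this post.
interface SeminarTask {
  id: number;
  solved: boolean;
  // IDs of tasks with an edge pointing to this task (assumed to be a flat list here)
  prerequisites: number[];
}

// A task is unlocked once every one of its prerequisites is solved.
function isUnlocked(task: SeminarTask, tasksById: Map<number, SeminarTask>): boolean {
  return task.prerequisites.every((id) => {
    const prerequisite = tasksById.get(id);
    return prerequisite !== undefined && prerequisite.solved;
  });
}

const turtle: SeminarTask = { id: 1, solved: false, prerequisites: [] };
const egg: SeminarTask = { id: 2, solved: false, prerequisites: [1] };
const tasksById = new Map<number, SeminarTask>([[1, turtle], [2, egg]]);

// The egg task stays locked while the turtle task is unsolved...
const lockedBefore = !isUnlocked(egg, tasksById);
// ...and should unlock as soon as the turtle task is solved.
turtle.solved = true;
const unlockedAfter = isUnlocked(egg, tasksById);
```

The rule itself is trivial; the interesting part is making sure that something re-evaluates it when a prerequisite changes.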

The bug

To take a look at the bug, let’s visit our test environment. As seen in the picture below, we have a greyed-out egg-like task in the second row with an edge pointing to it from the turtle task.

The initial state in the test environment

This arrangement means that the participant must first solve the turtle task before the egg-like task is unlocked (and is not greyed out anymore). Easy, let me just solve the turtle task, and we can move to the egg-like one.

The bug itself - first task is solved, but the second one is not open

Oh no. I have solved the turtle task (as indicated by the green check), and now I want to move to the egg-like one, but it is still locked! Most participants will do exactly this and end up with a locked task, even though it should be open by now. They are going to be stuck if someone does not fix it, which is certainly not the best start for an activity officially advertised by the Faculty of Informatics. We have found our critical bug, just a day before the official launch day.

The cause

After investigating the issue, I discovered that its source is a logical mistake made when designing the task icon component. This is the code responsible for updating the task icon:

@Input()
set task(val: TaskWithIcon) {
  // subscribe to the task so that task info is updated on its solve, etc.
  this.task$ = this.tasks.getTask(val.id);
}

As you can see, this code takes care of updating the task icon whenever the task changes – this is how the green checkmark appeared in the turtle task icon. Do you see where the logical mistake is? The task itself is not updated; only its prerequisite is, and nobody asked the task icon to watch for changes of that.
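To make the mistake concrete, here is a minimal sketch that does not use Angular or RxJS at all. `TinyTaskStore` and `countEggIconRefreshes` are hypothetical names invented for this illustration; the only assumption is that change notifications are delivered per task ID, as the setter above suggests.

```typescript
type Listener = () => void;

// Hypothetical stand-in for the task service: listeners are registered per task ID.
class TinyTaskStore {
  private listeners = new Map<number, Listener[]>();

  subscribe(taskId: number, listener: Listener): void {
    const existing = this.listeners.get(taskId) || [];
    this.listeners.set(taskId, existing.concat([listener]));
  }

  // Notifies only the listeners registered for the task that changed.
  notifyChanged(taskId: number): void {
    (this.listeners.get(taskId) || []).forEach((listener) => listener());
  }
}

// Returns how many times the egg icon refreshes after the turtle task is solved.
function countEggIconRefreshes(watchPrerequisites: boolean): number {
  const TURTLE_ID = 1;
  const EGG_ID = 2;
  const store = new TinyTaskStore();
  let refreshes = 0;
  const refreshIcon = () => { refreshes += 1; };

  // What the original setter did: watch only the task itself.
  store.subscribe(EGG_ID, refreshIcon);
  if (watchPrerequisites) {
    // What was missing: also watch the prerequisite.
    store.subscribe(TURTLE_ID, refreshIcon);
  }

  // The participant solves the turtle task.
  store.notifyChanged(TURTLE_ID);
  return refreshes;
}
```

When only the egg task is watched, solving the turtle task never reaches the egg icon, so it has no reason to re-render as unlocked.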

The solution

The solution is simple – just watch for changes in the prerequisites of the task. The only problem is that when a task is solved, the user is on another page, so the icon of the connected task does not exist at that moment. How can something watch for anything when it does not exist? Fortunately, Angular with RxJS comes to the rescue, because it internally remembers the component state when we come back to the graph page after solving the turtle task. Let’s edit the task-watching observable to account for changes in the prerequisites:

@Input()
set task(val: TaskWithIcon) {
  // watch tasks requirements for state change
  const requirementsIDs = Utils.flatArray(val.prerequisities);
  const requirementsState: { [requirementId: number]: string } = {};

  const requirementsChange$ = combineLatest(requirementsIDs.map((watchedTaskId) => this.tasks.getTask(watchedTaskId))).pipe(
    filter((tasks: TaskWithIcon[]) => {
      const changed: number[] = [];

      for (const requirement of tasks) {
        // compare states for changes
        const savedState = requirementsState[requirement.id];
        environment.logger.debug(`[TASK] requirement ${requirement.id} of task ${val.id} was ${savedState} and now is ${requirement.state}`);
        if (savedState !== requirement.state) {
          requirementsState[requirement.id] = requirement.state;
          if (savedState !== undefined) {
            changed.push(requirement.id);
          }
        }
      }

      const isChange = changed.length > 0;
      if (isChange) {
        environment.logger.debug(`[TASK] requirements of task ${val.id} have changed their state: ${changed}`);
      }
      return isChange;
    })
  );

  // refresh task's data when its requirements change
  const refreshTaskCache$: Observable<boolean> = merge(
    requirementsChange$.pipe(mapTo(true)),
    this.tasks.getTask(val.id).pipe(mapTo(false))
  );

  // subscribe to the task so that task info is updated on its solve, etc.
  this.task$ = refreshTaskCache$.pipe(
    mergeMap((refreshCache) => this.tasks.getTaskOnce(val.id, refreshCache))
  );
}

Wow. What an ugly piece of code. It works, but it could use a little separation into multiple smaller functions. Let me move the requirementsChange$ Observable to a separate function, away from the task icon component to the Task service, which is already responsible for watching for the changes in the task itself.

// TASK SERVICE
/**
 * Watches for changes in the state of the requirements of the task
 * @param task task whose requirements should be watched
 * @return Observable<true> emits whenever the state of the requirements has changed from the initial state
 */
public watchTaskRequirementsStateChange(task: TaskWithIcon): Observable<true> {
  // watch tasks requirements for state change
  const requirementsIDs = Utils.flatArray(task.prerequisities);
  const requirementsState: { [requirementId: number]: string } = {};

  return combineLatest(requirementsIDs.map((watchedTaskId) => this.getTask(watchedTaskId))).pipe(
    filter((tasks: TaskWithIcon[]) => {
        /* ... same as before ... */
    }),
    mapTo(true)
  );
}

// TASK ICON COMPONENT
@Input()
set task(val: TaskWithIcon) {
  // refresh task's data when its requirements change
  const refreshTaskCache$: Observable<boolean> = merge(
    this.tasks.watchTaskRequirementsStateChange(val).pipe(mapTo(true)),
    this.tasks.getTask(val.id).pipe(mapTo(false))
  );

  // subscribe to the task so that task info is updated on its solve, etc.
  this.task$ = refreshTaskCache$.pipe(
    mergeMap((refreshCache) => this.tasks.getTaskOnce(val.id, refreshCache))
  );
}

This is much better. Now we can call the service function from the component and have nicer-looking code. Except for one small thing: it does not work anymore. As I found out, for some reason, the requirementsState dictionary empties itself when used inside the service, while it keeps its content when inside the component. Even when I tried passing requirementsState from the component as a parameter to the service function, it still emptied itself every time. Curious.

The final solution

At this point, I concluded that I have to store the requirementsState in some static cache. But how to decide what to store and when, to prevent unwanted memory consumption? Fortunately, the task service already has a caching mechanism for tasks. We can just extend that to also cache the state of task prerequisites! It also makes sense that we do not need to watch for changes in the prerequisites of a task that has not been seen for long enough to be evicted from the cache. So does that work?

All tasks are now

Yes! It does! Finally!
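The idea can be sketched roughly as follows. This is not the actual code from the task service; `RequirementStateCache`, `RequirementSnapshot`, and the state names `"open"` and `"solved"` are hypothetical and only illustrate keeping the last seen prerequisite states in a cache that outlives a single component, with the same "ignore the first observation" rule as the filter shown earlier.

```typescript
// Hypothetical minimal view of a requirement: its ID and its current state.
interface RequirementSnapshot {
  id: number;
  state: string;
}

class RequirementStateCache {
  // task ID -> (requirement ID -> last seen state)
  private statesByTask = new Map<number, Map<number, string>>();

  // Records the current states and returns the IDs of requirements whose state
  // differs from the previously recorded one. A first observation is recorded
  // but not reported as a change.
  update(taskId: number, requirements: RequirementSnapshot[]): number[] {
    let saved = this.statesByTask.get(taskId);
    if (saved === undefined) {
      saved = new Map<number, string>();
      this.statesByTask.set(taskId, saved);
    }
    const changed: number[] = [];
    for (const requirement of requirements) {
      const previous = saved.get(requirement.id);
      if (previous !== requirement.state) {
        saved.set(requirement.id, requirement.state);
        if (previous !== undefined) {
          changed.push(requirement.id);
        }
      }
    }
    return changed;
  }

  // Called when the task itself is dropped from the task cache.
  evict(taskId: number): void {
    this.statesByTask.delete(taskId);
  }
}

const cache = new RequirementStateCache();
// First observation of the egg task's prerequisite: recorded, nothing reported.
const first = cache.update(2, [{ id: 1, state: "open" }]);
// The turtle task gets solved: requirement 1 is reported as changed.
const second = cache.update(2, [{ id: 1, state: "solved" }]);
// After eviction, the next observation counts as a first observation again.
cache.evict(2);
const third = cache.update(2, [{ id: 1, state: "open" }]);
```

Tying eviction to the existing task cache means the prerequisite states take up memory only for tasks that are cached anyway.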

Why has nobody discovered this bug before?

The only thing that remains is to answer why not even one of the forty people (myself included) who tested this web application noticed this critical bug. These forty people can be divided into three groups: the developer (me), the KSI organizers, and the KSI participants.

Note: KSI is another online seminar which uses a slightly modified version of the web application (e.g. different colours).

As the developer of this web application, I missed this bug, possibly because it vanishes when the page reloads, which happens automatically after a source code change. The KSI organizers see all tasks already unlocked, so they were unaffected by this bug. Finally, the last group, the KSI participants, would be affected only if they had not already solved all tasks, and it is quite possible that they had, because they were given access to the web application only in the last part of the KSI year. Oh well.

Conclusion

Just a single day before the official launch day of the new online seminar, I discovered a critical bug in the web application that would leave all its participants stuck with no way to progress. After four hours of debugging what exactly went wrong and why a dictionary sometimes kept its state and sometimes did not, I settled on a solution. The most fun part is that this application was quite well tested, but everyone missed this bug due to bad timing.