Think like a hacker pt. 3 – Break things!

Data tools like Tableau or Alteryx have made data analysis more accessible to the average user than ever before. Non-technical users can now, with only a few hours of training, start building data pipelines, data sets and dashboards that in the past required complex coding and data skills. It is still hugely beneficial to know basic development concepts and techniques to make sure whatever you build doesn’t fall over with the next data refresh. You can read part 1 and part 2 here.

In parts 1 and 2 we tried to make our solution as robust and maintainable as possible; now it’s time for the fun part: let’s try to break it!

Typically, testing is done to make sure that the solution behaves as specified in the requirements. It’s a “glass half full” approach – we confirm that everything works as expected. In this article we want to look at it through the “glass half empty” lens – we want to prove that we can break it!

I typically think of it in one of two ways:

  • acting like an upstream system, throwing random changes in the data at my solution to see if it copes
  • acting like the most inexperienced user, trying to end up in a situation where they either have no idea what to do next or could feasibly walk away with a wrong answer to their question

In both cases, the main thing to keep in mind is: nothing is too stupid or unlikely to actually happen! I have seen source systems switch their reporting from kg to tons without notification, column names change, and values within a column suddenly be spelled differently. And if you can think of a stupid thing to do in a dashboard, the likelihood that somebody will actually do it approaches 1 as your user base grows.
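A silent unit switch like kg to tons is hard to spot by eye but easy to catch with a simple sanity check. Here is a minimal sketch in Python/pandas, assuming you keep the previous refresh around for comparison; the function name and the threshold of 100 are illustrative, not a rule:

```python
import pandas as pd

def check_scale_shift(previous: pd.Series, current: pd.Series, factor: float = 100.0) -> None:
    """Raise if a numeric column's typical magnitude jumps by more than `factor`
    between refreshes -- e.g. a silent switch from kg to tons (a factor of 1000)."""
    prev_median, curr_median = previous.median(), current.median()
    if prev_median > 0 and not (1 / factor < curr_median / prev_median < factor):
        raise ValueError(f"Suspicious scale shift: median {prev_median} -> {curr_median}")
```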

Pretending to be an upstream system

In part 2 we talked about anticipating changes to the data, so – in theory – we should be able to change anything in our data and our solution should cope with it. I hope it is obvious that we should not make changes to our production data; instead you could (one option is sketched after the list):

  • use a test environment
  • create a view in your database that allows you to change the data
  • use a tool like Tableau Prep or Alteryx to create a new data source to point your dashboard to
  • export your data into Excel, make the changes manually and use this as the new source

We also need to distinguish between non-destructive and destructive changes. Non-destructive changes are typically operational changes in the source system: your order process gets a new status, you introduce new product categories, you restructure your geographic hierarchies. With those you want to make sure that you have little to no work to do to adjust your dashboard. Destructive changes often happen at the database level: a table is deleted, a view is refactored, columns change names or are deleted. With those changes you usually cannot prevent downtime, but you want to make sure it is as short as possible, i.e. you want to be notified as soon as the change hits your system, and ideally you want to be able to pinpoint the root cause of the failure as accurately as possible.
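One cheap way to get that notification is to make every pipeline step fail loudly and by name. The sketch below is a generic Python pattern, not tied to any specific tool; `run_step` and the logger name are my own invention:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_step(name, func, *args, **kwargs):
    """Run one pipeline step and report which step failed.
    Naming the failing step is what lets you pinpoint the root cause
    the moment a destructive change hits your system."""
    try:
        log.info("Running step: %s", name)
        return func(*args, **kwargs)
    except Exception:
        # Swap the log call for an email/Slack alert in a real setup.
        log.exception("Step failed: %s", name)
        raise  # fail fast instead of producing partial or stale output
```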

So let’s try it.

1. Changed values (usually non-destructive)

What happens if you add new categories? Try to pick the ones you are actively using in filters or groupings. Do new values behave as intended (i.e. appear or not appear, depending on requirements)? What happens if you change the dates to a different month or year? What happens if there is a value that is unusually high or low? Could it be cut off because you fixed an axis somewhere?
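To make these scenarios concrete, here is a minimal pandas sketch that injects all three into the test extract created earlier; the column names (`category`, `order_date`, `revenue`) are placeholders for your own schema:

```python
import pandas as pd

df = pd.read_csv("test_extract.csv")  # the sandbox copy from above

# Inject a category value the dashboard has never seen before.
df.loc[df.sample(5, random_state=1).index, "category"] = "Brand New Category"

# Shift dates into a year the dashboard has never displayed.
df["order_date"] = pd.to_datetime(df["order_date"]) + pd.DateOffset(years=1)

# Add an implausibly large value to see whether a fixed axis cuts it off.
df.loc[df.index[0], "revenue"] = df["revenue"].max() * 1000

df.to_csv("test_extract.csv", index=False)
```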

2. Additional records

We work with databases…of course you expect additional records! But can you handle additional records when you usually expect just one? Imagine you run a SaaS company that offers one service to your customers. You have a table with all customers and when they subscribed to your service. The only thing you need to do is count the number of records in that table. A few months later you expand your offering with a 2nd and 3rd service; suddenly one customer can subscribe to three services, but that customer shouldn’t be counted three times (it’s still the same person). Would your solution cope with that change? To fix it, you could use a different table or use the distinct count of customer IDs. A subset of this problem comes up in your aggregations: if you have only one record, your sum, average, minimum, maximum and median are all the same number. Consider the aggregation you would need if there were multiple records and apply it there and then. Even if it doesn’t make a difference at that point, it might in the future.
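The distinct-count fix is a one-liner in most tools; here it is in pandas, with a made-up subscriptions table to show the difference:

```python
import pandas as pd

# A made-up subscriptions table: customer 102 has two services, 103 has three.
subscriptions = pd.DataFrame({
    "customer_id": [101, 102, 102, 103, 103, 103],
    "service":     ["basic", "basic", "premium", "basic", "premium", "support"],
})

print(len(subscriptions))                      # 6 -- counts subscriptions, not customers
print(subscriptions["customer_id"].nunique())  # 3 -- distinct customers, robust to extra rows
```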

3. Changed data structure (usually destructive)

Can you add or remove columns without causing problems? Adding is often not a big problem, but when you remove columns, will you be notified about it, or is the data for the end user just missing? This is typically more applicable in ETL tools like Alteryx than in Tableau.
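A small schema assertion at the start of a pipeline turns a silently missing column into a loud, easy-to-diagnose failure. A minimal pandas sketch, assuming you can list the columns your solution depends on (`EXPECTED_COLUMNS` here is illustrative):

```python
import pandas as pd

# The columns your dashboard or pipeline actually relies on -- adjust to your schema.
EXPECTED_COLUMNS = {"customer_id", "order_date", "revenue"}

def check_schema(df: pd.DataFrame) -> None:
    """Fail loudly if a column the solution depends on has disappeared."""
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise KeyError(f"Source is missing expected columns: {sorted(missing)}")
```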

4. No data at all (destructive or non-destructive)

Typically ETL pipelines are supposed to refresh your data on a regular basis. Are you notified if the source data is the same as it was last week, i.e. the data at the source was not refreshed as expected? This might not be an issue, but depending on how the data is consumed, users may assume they are working with refreshed data when in reality it is stale. This could lead to wrong decisions, so at a minimum make sure your users always know the age of the data they are working with.
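A freshness check is a natural companion to the schema check above. A minimal sketch, assuming your data has a timestamp column and that "older than a week" counts as stale; adjust both for your own refresh cadence:

```python
import pandas as pd

def check_freshness(df: pd.DataFrame, date_column: str, max_age_days: int = 7) -> None:
    """Raise if the newest record is older than expected, i.e. the source
    was probably not refreshed."""
    newest = pd.to_datetime(df[date_column]).max()
    age = pd.Timestamp.now() - newest
    if age > pd.Timedelta(days=max_age_days):
        raise RuntimeError(f"Data looks stale: newest record is {age.days} days old")
```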

Pretending to be a user

This is the fun part. Imagine you are your most inexperienced user, try to do as many things wrong as you can, and see if you can deliberately misunderstand your solution.

5. Click anywhere and everywhere

You know the correct flow for your dashboard, but at least one of your users might not. What happens if they click on the filter for the last chart first? If they select multiple values? Click on all of your charts and observe how they behave: do they highlight or filter things? Should they? Can you click on charts in a random order, and is it clear afterwards what is filtered by which value? Can any action be reverted easily? Can you get stuck at a point where the only thing that helps is a page refresh? If you have the slightest doubt that something could be misunderstood or cause a problem, chances are that it will occur for one of your users in the future. So try to work around it and mitigate this behaviour.

6. Do creative things with your filters

Does it make sense that you can select multiple values in a filter? If you have US revenue data in USD and German revenue data in EUR, should a user be able to select the two countries and get the sum of the two values? (The answer is NO!) Is there an “All” option in your filters? Or can you select values that a user should never look at in this dashboard? As a rule of thumb, a filter in a Tableau production dashboard should never just use the default settings; in my experience the defaults cause problems in most situations.
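The USD + EUR trap is easy to demonstrate and to guard against at the data layer. A minimal pandas sketch with made-up figures; the guard refuses to produce a total across mixed currencies rather than showing a meaningless sum:

```python
import pandas as pd

# Made-up figures: revenue per country, each in its local currency.
revenue = pd.DataFrame({
    "country":  ["US", "Germany"],
    "currency": ["USD", "EUR"],
    "revenue":  [1_000_000, 900_000],
})

selected = revenue  # imagine the user picked both countries in a filter

# Refuse to add apples to oranges instead of showing a meaningless total.
if selected["currency"].nunique() > 1:
    raise ValueError("Cannot sum revenue across mixed currencies")
print(selected["revenue"].sum())
```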

7. Try to misrepresent

If your revenue is going up, can you make it look like it is going down? Are filters enabled in your tooltips, so that a user could just remove random values from a line chart? Can you just randomly exclude sales from a salesperson you don’t like? Try to be as blatant as possible. The assumption is not that one of your colleagues will do this to deceive; the more likely scenario is that they do something like this by accident without even realising.

8. Throw everything at it

Doing a few clicks here and there on dev data or refreshing your data on occasion with a sample is one thing; production loads often look very different. What works well in your tests might utterly fail once it runs in production for a while, so make sure you test under circumstances that are as close as possible to, or the same as, production. If you are able to load test, simulate what it looks like if thousands of people access the workbook at the same time. If you run a data pipeline, run it at the same intervals on the same volumes as production…and let it run for a day or two. Some problems only appear after a while, when limits are reached and thresholds are broken. Consider options like “hyper care”, where processes and dashboards are in place but constantly monitored to make sure they don’t break in the first few hours or days.
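If you don’t have a dedicated load-testing tool like JMeter at hand, even a crude script tells you something. A rough Python sketch using `requests` and a thread pool; the URL, user count and request count are placeholders, and a real Tableau Server test would also need authentication:

```python
import concurrent.futures
import requests

URL = "https://tableau.example.com/views/my-dashboard"  # placeholder URL

def open_dashboard(_):
    # Each call simulates one user opening the dashboard.
    return requests.get(URL, timeout=30).status_code

# 1000 requests from 50 concurrent "users"; summarise the status codes.
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    statuses = list(pool.map(open_dashboard, range(1000)))

print({code: statuses.count(code) for code in set(statuses)})
```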

Summary

This part is all about assuming the worst possible set of circumstances and checking how your solution holds up. Of course it’s unlikely that all of these things will happen, and you might be confident that your user base knows their way around your tool of choice. Nevertheless, if you are able to break your solution with these tests, you can fix the behavior and stay one step ahead of your users. You can never be 100% sure who is using your solutions, so making them as foolproof as possible is the best way to avoid problems down the track. It also makes you a better developer, as next time you will be able to anticipate the behavior during development and work around it early on, without having to fix something later.

As with everything in this series, common sense and your judgement apply. Not all of the things above need testing every time, and at some point you get diminishing returns if you try to catch every last problem that could theoretically occur.

