How dummy data is generated in Alteryx

In this blog I’d like to show another way to generate a set of dummy Data in Alteryx. You can also refer to my colleage’s blog, which is about Mockaroo, a very user-friendly dummy data generator software. This task can sometimes come up as the requirement to test the feasibility of any business project or the technical function of any feature in any software.

Let’s begin as usual with the task requirement. A sample of dataset is as below, which contains 30 rows data of the sustainable alternative measures (activities), the year the experiment was conducted, the Amount of CO2 Emission in Tons that a household would consume less by application of the measures, the percentage if the emission reduction contributed to Private, Company, Investor or Government Sector.

What we want to do is generate a continual data for 20 years dated back to year 2000, each of which contains the 29 above mentioned activities. The difficult part is to replicate a repeating random dataset which takes decimal values between -50 and 8 (sAmount). The negative values are of course some rare cases and should not be distributed evenly in the set. The other columns are assumed to be the same.

That is going to be a creation of another 580 rows of data. What you can do is connect the existing excel file to input data tool in Alteryx and start the workflow!

Creating another 580 rows for 29 activities in the last 20 year requires the Generate Row Tool. In the configuration window, just tick on the button update existing field (iYear). Remember to set the condition to year 2000. Because Alteryx will start counting from the existing year which is 2020, let set the loop expression as iYear – 1 so that it is going to count reversely.

After that, we get the following dataset which contains 609 rows (including the existing 29 rows). Task 1 is done!

Task 2 : you need to sort the year in descending order so that our task can be ordered correctly as required.

Task 3: Create an amount for C02 emission reduction with restrictions. Fortunately, the random function in Formula Tool would help us achieve this.

We want to keep the sAmount of the year 2020 unchanged. The RAND function returns a random decimal value or number between 0 and 1. In order to return the values within a particular range you need to fixed the largest possible outcome which is 7 and the minimal value which is -50. A good way is to factor the existing field sAmount with RAND() into the formula. So the maximal outcome could be generated could be 7*1 =7, however we would like it to be maximal of 8, therefore add 1 to the whole thing above. Applying formula to test the minimal end: -50*1 +1 = -49, which meets the condition (greater than -50). After that just wrap the whole thing inside the round function and make sure the value has 1 decimal digit to get this:

Round ([sAmount]*RAND() +1, 0.1)

The workflow looks very simple:

I hope this trick would be helpful for you, have fun to play around with dummy data!

Author:
Tram Nguyen
Powered by The Information Lab
1st Floor, 25 Watling Street, London, EC4M 9BR
Subscribe
to our Newsletter
Get the lastest news about The Data School and application tips
Subscribe now
© 2024 The Information Lab