In data analysis, analysts often work with the boolean data type, whose value is either True or False. However, I often use the two digits 1 and 0 instead of True and False. In this blog, I will summarize the reasons why I love using 1 and 0 when preparing data.
1/ Saving time with logical operators
For complicated problems, we often need multiple conditions to filter the data. If a problem has more than 5 or 6 conditions, how can we keep track of whether the result is what we want?
For example, given the conditions below, how can you check the result quickly?
(True OR False) AND (True AND False) AND True OR (True AND True)
Instead of using True or False, I use the digit 1 for True and 0 for False. Then, I replace OR with addition (+) and AND with multiplication (*). This gives a new equation:
(1 + 0) * (1 * 0) * 1 + (1 * 1) = 1 * 0 * 1 + 1 = 1
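To double-check the conversion, both versions of the example can be evaluated in Python, where `and`/`or` work on True/False and `*`/`+` work on 1/0. This is a minimal sketch; Python, like the usual convention, evaluates `and` before `or`:

```python
# Original boolean expression (`and` binds tighter than `or`)
boolean_result = (True or False) and (True and False) and True or (True and True)

# Same expression with 1 for True, 0 for False, + for OR and * for AND
numeric_result = (1 + 0) * (1 * 0) * 1 + (1 * 1)

print(boolean_result)  # True
print(numeric_result)  # 1
```

Both lines agree: the boolean version gives True and the numeric version gives 1.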
It takes me less than a minute to solve that equation. The result is 1, which means the whole expression returns True. To build that equation, I used the behavior of the OR and AND logic gates.
The result of the logical OR operator matches the result of addition, with one exception: 1 + 1. A binary digit can only be 1 or 0, so 1 OR 1 = 1. In practice, any sum greater than 0 can be read as True.
The result of the logical AND operator matches the result of multiplication. Conveniently, multiplication is evaluated before addition, just as AND is conventionally evaluated before OR, so the converted equation keeps the original order of operations.
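The two rules can be checked against Python's own `or` and `and` over all four input pairs. This is a small sketch; `or_as_sum` and `and_as_product` are hypothetical helper names for illustration, not part of any library:

```python
from itertools import product

def or_as_sum(a: int, b: int) -> int:
    # Addition, capped at 1 because a binary digit can only be 0 or 1
    return min(a + b, 1)

def and_as_product(a: int, b: int) -> int:
    # Multiplication: the result is 1 only when both inputs are 1
    return a * b

# Compare against Python's logical operators for every pair of 0/1 inputs
for a, b in product([0, 1], repeat=2):
    assert or_as_sum(a, b) == int(bool(a) or bool(b))
    assert and_as_product(a, b) == int(bool(a) and bool(b))
    print(a, b, "OR:", or_as_sum(a, b), "AND:", and_as_product(a, b))
```

The cap in `or_as_sum` is exactly the "1 OR 1 = 1" exception; without it, 1 + 1 would give 2.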
By switching True and False values to 1 and 0, I can:
- Solve the problem faster
- Track each condition easily
- Debug more easily
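To make the tracking and debugging points concrete, here is a minimal pandas sketch, assuming a made-up table and three hypothetical conditions stored as the columns `c_grade`, `c_attendance` and `c_project`. Each condition gets its own 1/0 column, so when the final result is 0 you can see which condition caused it:

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "grade": [85, 72, 91, 60],
    "attendance": [0.95, 0.80, 0.70, 0.99],
    "submitted_project": [1, 1, 0, 1],
})

# One 1/0 column per condition, so each one can be inspected on its own
df["c_grade"] = (df["grade"] > 80).astype(int)
df["c_attendance"] = (df["attendance"] >= 0.9).astype(int)
df["c_project"] = df["submitted_project"]

# (c_grade OR c_attendance) AND c_project:
# OR becomes + (capped at 1 with clip), AND becomes *
df["result"] = (df["c_grade"] + df["c_attendance"]).clip(upper=1) * df["c_project"]

print(df[["c_grade", "c_attendance", "c_project", "result"]])
```

In the third row, for example, `c_grade` is 1 but `c_project` is 0, which immediately explains why `result` is 0.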
2/ Easy to summarize the total result
Assume that you have a dataset of student grades like this:
Now, I would like to count all students whose grade is greater than 80. So, I need to create a new field that returns True if the grade is greater than 80 and False otherwise.
I will solve that problem in two ways. The first solution uses the boolean data type to return True or False. The second solution returns 1 or 0 instead.
a) Using the boolean data type (returning True or False)
I need four tools to solve the problem, counting the tool that reads in the data. I use a Formula tool to return True if Grade > 80 and False otherwise. Then, I use a Filter tool to keep only the records with True in the [Passed ?] column. Finally, I use a Summarize tool to count the remaining records.
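The Formula, Filter and Summarize tools are workflow tools rather than code, but the same three steps can be sketched in pandas as a swapped-in equivalent. The grades below are made-up sample values, and the column name `Passed ?` is copied from the text:

```python
import pandas as pd

# Made-up sample grades, for illustration only
df = pd.DataFrame({"Student": ["A", "B", "C", "D", "E"],
                   "Grade": [85, 72, 91, 80, 66]})

# "Formula" step: True if Grade > 80, otherwise False
df["Passed ?"] = df["Grade"] > 80

# "Filter" step: keep only the rows where Passed ? is True
passed = df[df["Passed ?"]]

# "Summarize" step: count the remaining records
count_passed = len(passed)
print(count_passed)  # 2
```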
b) Using binary numbers (1 and 0)
I only need three tools to solve the problem. In the Formula tool, I use an IF function to return 1 if Grade is greater than 80 and 0 otherwise. Then, I use a Summarize tool to sum all values in the [Passed ?] column. I get the same answer, but the workflow is shorter: the sum of a 1/0 column is already the number of passing records, so no Filter tool is needed. This leads to the next advantage of using 1 and 0.
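The 1/0 version can be sketched the same way in pandas, again as a swapped-in equivalent on made-up grades. `np.where` plays the role of the IF function, and the Filter step disappears because summing the 1/0 column counts the passing students directly:

```python
import numpy as np
import pandas as pd

# Made-up sample grades, for illustration only
df = pd.DataFrame({"Student": ["A", "B", "C", "D", "E"],
                   "Grade": [85, 72, 91, 80, 66]})

# "Formula" step: 1 if Grade > 80, otherwise 0
df["Passed ?"] = np.where(df["Grade"] > 80, 1, 0)

# "Summarize" step: the sum of the 1/0 column is the count of passing students
count_passed = int(df["Passed ?"].sum())
print(count_passed)  # 2
```

In pandas specifically, summing a boolean column also works, because True is counted as 1 when summed; that is the same idea applied under the hood.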
3/ Saving space and increasing performance
If we use binary numbers as return values, we need fewer tools than the other way (as shown in part 2). This also affects the size of the workflow file.
In the image above, the 1_0 workflow file is 1 KB smaller than the boolean_workflow file. If we work with big datasets and many workflows, these savings can add up in physical or cloud storage. Skipping the Filter step also means one less pass over the data, which can make the program run faster and smoother when debugging or testing.
4/ Applications of Binary Numbers
Computers use binary numbers and logic gates to operate. Characters, for example, are encoded as sequences of 1s and 0s. In many machines, 1 represents ON and 0 represents OFF.
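As a small illustration of characters stored as 1s and 0s, this sketch encodes a short string as UTF-8 bytes and prints each byte as an 8-digit binary string. For plain ASCII characters like these, each character takes exactly one byte:

```python
text = "Hi"

# Encode the text to bytes, then show each byte as 8 binary digits
bits = [format(byte, "08b") for byte in text.encode("utf-8")]

for ch, b in zip(text, bits):
    print(ch, b)
# H 01001000
# i 01101001
```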
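A minimal one-hot encoding sketch with pandas `get_dummies`, on a made-up `Color` column. Passing `dtype=int` asks for 1/0 values; recent pandas versions otherwise return True/False columns:

```python
import pandas as pd

# Made-up category column, for illustration only
df = pd.DataFrame({"Color": ["Red", "Green", "Blue", "Green"]})

# One column per category (sorted: Blue, Green, Red);
# 1 marks the row's category, 0 marks every other category
one_hot = pd.get_dummies(df["Color"], dtype=int)
print(one_hot)
```

Each row contains exactly one 1, so summing a column counts how many rows belong to that category, the same trick used in part 2.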
1 and 0 are only two digits, but they are powerful enough to help me solve problems and track long conditions. They can also save space and improve the performance of a workflow. I hope this blog encourages you to try applying these digits to your own problems and challenges.
See you in the next blog!