Sunday, August 28, 2011

Creating Time-Series Atoms - Quantum “Atoms” of Data


This article explains how to create quantum "atoms" of data to avoid a paradox of uncertainty in systems that have distributed transactions or human data entry.

Banks Do It All The Time

If you go to your online banking application, you'll notice that a number of recent transactions are in a "pending" state. Because the global banking system is distributed, there is no single, unified order in which events happen, so the order of transactions can be interpreted in different ways. Banks use this selective interpretation either to give the customer the benefit of the doubt or to charge extra penalties, depending on the nature of the account and how they want to treat their customers.

It Applies To All Time Series Data

Unless your system assigns a unique identifier or timestamp to every piece of data with perfect accuracy, some uncertainty will exist in your data. Most of it comes from human data entry: unless the people entering data synchronize their watches down to the microsecond and record entries with that level of accuracy, there will be some uncertainty about the order in which things happened. To solve this, we will create an "atom" of time-series data at a specified level of granularity. For most applications, daily granularity is sufficient.
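As a rough illustration of what "a specified level of granularity" means, here is a minimal Python sketch. The helper name `atom_key` and its set of supported granularities are hypothetical; it simply truncates a timestamp to the start of its day, hour, or month, so entries recorded a few seconds or minutes apart land in the same bucket and their relative order stops mattering.

```python
from datetime import datetime


def atom_key(ts: datetime, granularity: str = "day") -> datetime:
    """Hypothetical helper: map a timestamp to the start of its atom.

    Daily granularity is the default, as suggested in the text.
    """
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


# Two entries typed in by different people a few seconds apart...
a = datetime(2011, 8, 26, 9, 15, 2)
b = datetime(2011, 8, 26, 9, 14, 58)

# ...fall into the same daily atom.
print(atom_key(a) == atom_key(b))  # True
```

A coarser granularity removes more ordering ambiguity but also hides more detail, which is why the text settles on daily atoms as a practical default.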

Creating Atoms of Data To Remove Quantum Uncertainty

Quantum physics teaches us that the nature of subatomic particles is inherently uncertain. Our world stops being as real as we'd like, and we can no longer observe it with absolute precision. We run into the same issue in our data model when multiple events supposedly occur at the same time. When credits and debits claim to happen in the same microsecond, any order-dependent aggregation, such as a running balance, becomes ambiguous.
There is no correct order in which to put the transactions, and all permutations of order are equally valid. Sometimes we get lucky and the order in which the data was entered works itself out, but there are always exceptions to the rule.
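A small Python sketch makes the ambiguity concrete. The opening balance and the two same-day amounts are made-up values, and `lowest_balance` is a hypothetical helper: every ordering ends the day at the same balance, but whether the account ever dips below zero (the kind of event that can trigger an overdraft penalty) depends entirely on which order we choose.

```python
from itertools import permutations

# Made-up example: an opening balance of 100, then a debit of 150 and a
# credit of 80 that both carry the same date.
opening = 100
same_day = [-150, 80]


def lowest_balance(start, txns):
    """Lowest running balance reached while applying txns in the given order."""
    balance = low = start
    for amount in txns:
        balance += amount
        low = min(low, balance)
    return low


for order in permutations(same_day):
    end = opening + sum(order)
    print(order, "end:", end, "lowest:", lowest_balance(opening, order))
# (-150, 80) end: 30 lowest: -50
# (80, -150) end: 30 lowest: 30
```

Both orderings are equally valid readings of the same day, yet only one of them shows the account overdrawn.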
If you look at your online banking app, you'll see that banks deal with this by putting transactions in a "pending" state until the end of the day. They also use this to maximize or minimize the interest charged, depending on their objective. You probably have a friend who has paid extra penalties after thinking their deposit was in on time.
Rather than play this game, let's remove the uncertainty by atomizing our data. How do we do this? We pick the smallest possible element in our date dimension and have our data model reflect the net change for that period. For most systems, daily granularity is sufficient, so let's pick that. Essentially, we are combining all invoices and payments for each customer on each day into a single record that we will use for deeper analysis. The resulting view gives us one record per customer for every date that has either an invoice or a payment. Every record in this view is an "atom" of data that cannot be reduced any further without inviting ambiguity.
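Here is a minimal Python sketch of atomizing, assuming transactions arrive as hypothetical `(customer, timestamp, amount)` tuples with invoices positive and payments negative. The customer codes and amounts are invented, and `Decimal` is used to avoid floating-point rounding in money totals.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# Invented sample data: (customer, timestamp, amount).
transactions = [
    ("CUST1", datetime(1997, 8, 25, 9, 30), Decimal("29.46")),
    ("CUST1", datetime(1997, 8, 25, 16, 5), Decimal("-29.46")),
    ("CUST1", datetime(1997, 10, 3, 11, 0), Decimal("61.02")),
    ("CUST2", datetime(1997, 8, 25, 10, 0), Decimal("43.90")),
]


def atomize(txns):
    """Collapse all of a customer's transactions on one day into a net amount."""
    net = defaultdict(Decimal)  # Decimal() starts each atom at 0
    for customer, ts, amount in txns:
        net[(customer, ts.date())] += amount
    return dict(net)


atoms = atomize(transactions)
for (customer, day), amount in sorted(atoms.items()):
    print(customer, day, amount)
# CUST1 1997-08-25 0.00
# CUST1 1997-10-03 61.02
# CUST2 1997-08-25 43.90
```

Because addition does not depend on order, feeding `atomize` the same transactions in any order yields the same atoms, which is exactly the property that removes the ambiguity.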

Making It Work With Northwind

To make this example easy to work with, let's use the Orders table from the Northwind sample database to create a view that simulates a dataset with credit and debit transactions. You could create a similar view on nearly any database. This query uses the OrderDate to simulate invoices:
SELECT CustomerID AS Customer,
    OrderDate AS TransactionDate,
    Freight AS Invoiced,
    0 AS Paid  -- value missing from the original listing; 0 assumed for invoices
FROM Orders
Next, let's create a view called Transactions to simulate the invoice and payment data we need from Northwind. You could create a similar view in your own system.
CREATE VIEW Transactions AS
SELECT 'Invoice' AS TransactionType,
    CustomerID AS Customer,
    OrderDate AS TransactionDate,
    Freight AS Amount
FROM Orders
UNION ALL  -- keep every row; plain UNION would merge identical transactions
SELECT 'Payment' AS TransactionType,
    CustomerID AS Customer,
    ShippedDate AS TransactionDate,
    -Freight AS Amount  -- payments are recorded as negative amounts
FROM Orders
WHERE ShippedDate IS NOT NULL  -- unshipped orders have no simulated payment
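This gives us a clean view in which every order produces an invoice on its OrderDate and, once shipped, an offsetting payment (a negative amount) on its ShippedDate.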


Now we simply SUM the Amount and GROUP BY Customer and TransactionDate to compress all of the data for each date into a single "atom" for each customer:
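SELECT Customer,
    TransactionDate,
    SUM(Amount) AS Net
FROM Transactions
GROUP BY Customer, TransactionDate
-- If TransactionDate includes a time of day, select and group by
-- CAST(TransactionDate AS date) instead so each day forms one atom.

Each date now has a single net amount of change for each customer that you can work with.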
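To check the view and the aggregation end to end without a SQL Server instance, the sketch below runs equivalent SQL against an in-memory SQLite database from Python. The four-row Orders table is hypothetical sample data containing only the columns the view uses, and the SQL is adapted to SQLite, so treat it as an approximation of the T-SQL above rather than a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for Northwind's Orders table (only the needed columns).
CREATE TABLE Orders (
    OrderID     INTEGER PRIMARY KEY,
    CustomerID  TEXT,
    OrderDate   TEXT,
    ShippedDate TEXT,
    Freight     NUMERIC
);
INSERT INTO Orders VALUES
    (1, 'CUST1', '1997-08-25', '1997-08-27', 10.50),
    (2, 'CUST1', '1997-08-25', NULL,         4.25),
    (3, 'CUST1', '1997-08-27', '1997-08-27', 7.00),
    (4, 'CUST2', '1997-08-25', '1997-08-26', 3.00);

-- Invoices on OrderDate, negative payments on ShippedDate.
CREATE VIEW Transactions AS
SELECT 'Invoice' AS TransactionType, CustomerID AS Customer,
       OrderDate AS TransactionDate, Freight AS Amount
FROM Orders
UNION ALL
SELECT 'Payment', CustomerID, ShippedDate, -Freight
FROM Orders
WHERE ShippedDate IS NOT NULL;
""")

# One net "atom" per customer per date.
rows = conn.execute("""
    SELECT Customer, TransactionDate, SUM(Amount) AS Net
    FROM Transactions
    GROUP BY Customer, TransactionDate
    ORDER BY Customer, TransactionDate
""").fetchall()

for row in rows:
    print(row)
```

In this sample, order 3 is invoiced and paid on the same day, so its invoice and payment collapse into CUST1's 1997-08-27 atom together with the payment for order 1, and nobody has to decide which of them happened first. Order 2 has no ShippedDate, so it appears only as an invoice.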
Creating an "atomic" view results in a compact data source, with at most one row per customer per day, that guarantees consistency and eliminates the paradox of uncertainty around how to interpret the order of transactions within a given day.
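Once each customer has at most one record per day, a running balance has only one valid order: by date. Here is a minimal Python sketch with hypothetical atoms for a single customer; the `running_balance` helper is illustrative, not part of any library.

```python
from datetime import date
from decimal import Decimal
from itertools import accumulate

# Hypothetical daily atoms for one customer: (day, net change).
atoms = [
    (date(1997, 8, 27), Decimal("-10.50")),
    (date(1997, 8, 25), Decimal("14.75")),
    (date(1997, 9, 2), Decimal("6.00")),
]


def running_balance(daily_atoms):
    """Return (day, end-of-day balance) pairs in date order."""
    ordered = sorted(daily_atoms)  # one record per day, so date order is unique
    balances = accumulate(net for _, net in ordered)
    return [(day, bal) for (day, _), bal in zip(ordered, balances)]


for day, balance in running_balance(atoms):
    print(day, balance)
# 1997-08-25 14.75
# 1997-08-27 4.25
# 1997-09-02 10.25
```

No matter what order the atoms arrive in, the end-of-day balances come out the same, which is the consistency guarantee described above.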
We now have a view that we can plug into a rich reporting experience like Izenda.


Shahzad Afzal
