What exactly is a modern data warehouse?

G’day,

I was asked recently what a modern data warehouse is – and that’s a vey thought provoking question.

First, there’s the tag “Modern” – I think what’s modern now may not be quite so modern in a year or twos time – maybe sooner, judging by the speed of some technology changes at present.

My initial response – jokingly – was ‘it depends who you ask’, which is a pretty useless – but true – answer.

Firstly let me put some context around my answer. I’ve built my career around Microsoft technologies – so my answer is going to centre around their latest technologies – which are going to be all cloud based, and essentially incorporate Platform-as-a-Service (PaaS) functionality. Your experience may be different, you might be in a different cloud, you might have different ideas – but for me it’s PaaS Services in the Azure cloud. I’ll expand on this shortly.

But let’s (quickly) look back on how I’ve seen data warehouses develop – in my Microsoft world – over the years. They used to be simply built on SQL Server, incorporating fact and dimension tables. There’d be debate about star and snowflake schemas, about kimball or Inmon. More recently people talk about Data Vault. Essentially, people pulled data from their operational (transactional) system into their data warehouse (decision support system or Enterprise data warehouse or whatever they chose to call it).

They did this using some Extract, Transform and Load (ETL) tool (commonly called a data integration tool) like SQL Server integration services.

They might have had a single data source (one transactional system) or they may have had more data sources – so firstly staging the data made sense, then cleaning it and then putting it into the main data warehouse.

We’d have to deal with things like slowly changing dimensions and very often we may not have modelled data exactly as it is in the real world.

We might put analytical models over the data warehouse, initially these were multi-dimensional (MDX) but later Tabular. Then we’d either report straight off the data warehouse or the analytical model – using tools like SQL Server Reporting Services or – lately – Power BI.

For me, things have moved to the cloud and our options have expanded (or should I say exploded)

We can take data from literally any data source and land it in our Azure data lake. This area can be segregated, as it likely contains our raw data. we can now keep this – untouched – as space is literally no longer an impediment (ok, it still costs money, but it’s cheap, and we can have different degrees of storage (hot / cold /archive) and we have to trade off access times with cost. We also don’t have to worry about upgrading and migrating data at any time – this is all part of the service provide by a modern cloud.

We can then start segregating the data into different areas (of the data lake), say a staging area, a clean area, and maybe an area full of curated data sets.

We can pull in different types of data in different formats – CSV, JSON, AVRO, Parquet, and more.

Once we have the data in out desired area we can then pull it into a traditional data warehouse – say in Azure SQL Database for instance (although there maybe better cloud solution). We can use Polybase to virtualise the data.

We could also use Synapse, CosmosDB or maybe databricks.

We can build Machine Learning Models and enable Artificial Intelligence services.

Basically, I think the ‘modern’ data warehouse is about cloud based data services and simply adds to the traditional models of data warehousing.

Whereas the traditional data warehouse is about historical reporting, such as diagnostic and descriptive analytics – the new tools enable predictive and prescriptive analytics along with cognitive capabilities.

So, basically the answer to ‘what is the modern data warehouse?’ could be a lot of things, pieced together in many different ways. and we’re only going to get more as time marches on.

So maybe we should start talking about the changing data warehouse, or the evolving data warehouse or the cloud data warehouse – but not the modern data warehouse, as what’s modern today is sure to change tomorrow.

Anyway, these are just my thoughts. Some will agree, some won’t and they’ll be plenty of diverse ideas – that’s great for innovation. Um, maybe the innovative data warehouse is a better name 🙂

Anyway, have a great day

Cheers

Martin.

96 GB lost (or so I thought)

The other day I was looking to store some files, so I went and found myself a 128 GB USB 3.1 flash drive.

Upon inserting this into my computer (initially the wrong way up, and then correcting myself, as is the accepted method) – I was amazed to see it only registered as 32 GB

wow, I thought this was 128 GB

Whatever, I thought. I’ll just reformat this to the correct size.

Only 32 GB of space yet the drive had 128 GB firmly stamped on the back.

But no matter what combination of file systems that I chose, I could not get this to recognize more than 32 GB

no matter what file system I chose to format the drive with, I could not get it to recognize more than 32GB of space

I soon realised that because it was FAT32 perhaps this was the reason. But how to reclaim the apparent lost space.

Over to Disk Management, where I seen

So, I found the ‘lost’ space by using disk management.

I quickly deleted the 32GB Volume.

The 32 GB volume was deleted.

To reveal the full quota of unallocated space

So, I found the full quote of space

and I allocated that as a simple volume and re-formatted the drive as NTFS.

I found a few things out on that journey.

I could create multiple volumes of the same flash drive and each volume registers as a separate drive in Windows explorer.

But if you want to use the flash drive for something like re-imaging your surface – then it’ll have to be formatted as FAT32. Then you’ll be back to square one again and wondering where your space disappeared to.

I hope this saves somebody some time at some point.

Have a great day.

Cheers

Marty