What exactly is a modern data warehouse?

G’day,

I was asked recently what a modern data warehouse is – and that’s a vey thought provoking question.

First, there’s the tag “Modern” – I think what’s modern now may not be quite so modern in a year or twos time – maybe sooner, judging by the speed of some technology changes at present.

My initial response – jokingly – was ‘it depends who you ask’, which is a pretty useless – but true – answer.

Firstly let me put some context around my answer. I’ve built my career around Microsoft technologies – so my answer is going to centre around their latest technologies – which are going to be all cloud based, and essentially incorporate Platform-as-a-Service (PaaS) functionality. Your experience may be different, you might be in a different cloud, you might have different ideas – but for me it’s PaaS Services in the Azure cloud. I’ll expand on this shortly.

But let’s (quickly) look back on how I’ve seen data warehouses develop – in my Microsoft world – over the years. They used to be simply built on SQL Server, incorporating fact and dimension tables. There’d be debate about star and snowflake schemas, about kimball or Inmon. More recently people talk about Data Vault. Essentially, people pulled data from their operational (transactional) system into their data warehouse (decision support system or Enterprise data warehouse or whatever they chose to call it).

They did this using some Extract, Transform and Load (ETL) tool (commonly called a data integration tool) like SQL Server integration services.

They might have had a single data source (one transactional system) or they may have had more data sources – so firstly staging the data made sense, then cleaning it and then putting it into the main data warehouse.

We’d have to deal with things like slowly changing dimensions and very often we may not have modelled data exactly as it is in the real world.

We might put analytical models over the data warehouse, initially these were multi-dimensional (MDX) but later Tabular. Then we’d either report straight off the data warehouse or the analytical model – using tools like SQL Server Reporting Services or – lately – Power BI.

For me, things have moved to the cloud and our options have expanded (or should I say exploded)

We can take data from literally any data source and land it in our Azure data lake. This area can be segregated, as it likely contains our raw data. we can now keep this – untouched – as space is literally no longer an impediment (ok, it still costs money, but it’s cheap, and we can have different degrees of storage (hot / cold /archive) and we have to trade off access times with cost. We also don’t have to worry about upgrading and migrating data at any time – this is all part of the service provide by a modern cloud.

We can then start segregating the data into different areas (of the data lake), say a staging area, a clean area, and maybe an area full of curated data sets.

We can pull in different types of data in different formats – CSV, JSON, AVRO, Parquet, and more.

Once we have the data in out desired area we can then pull it into a traditional data warehouse – say in Azure SQL Database for instance (although there maybe better cloud solution). We can use Polybase to virtualise the data.

We could also use Synapse, CosmosDB or maybe databricks.

We can build Machine Learning Models and enable Artificial Intelligence services.

Basically, I think the ‘modern’ data warehouse is about cloud based data services and simply adds to the traditional models of data warehousing.

Whereas the traditional data warehouse is about historical reporting, such as diagnostic and descriptive analytics – the new tools enable predictive and prescriptive analytics along with cognitive capabilities.

So, basically the answer to ‘what is the modern data warehouse?’ could be a lot of things, pieced together in many different ways. and we’re only going to get more as time marches on.

So maybe we should start talking about the changing data warehouse, or the evolving data warehouse or the cloud data warehouse – but not the modern data warehouse, as what’s modern today is sure to change tomorrow.

Anyway, these are just my thoughts. Some will agree, some won’t and they’ll be plenty of diverse ideas – that’s great for innovation. Um, maybe the innovative data warehouse is a better name 🙂

Anyway, have a great day

Cheers

Martin.

Is NoSQL still a thing?

G’day,

Looking back, perhaps one of the (many) mistakes that I’ve made in my career was to initially ignore the N0SQL movement.

I’ve focused my career on Relational Database systems – predominantly Microsoft SQL Server – and so using a NoSQL database felt ‘wrong’, for a number of reasons.

Firstly, the name itself – NoSQL – felt like the message was that the SQL language was in someway being “devalued” and ‘kicked to the kerb”

Secondly, the lack of transaction support in NoSQL databases also felt wrong. I assumed SQL implied relational.

I think now-a-days it’s probably better to think of SQL as simply a language to manipulate data (SELECT, INSERT, UPDATE, DELETE etc). (Event if this is a deviation from the technical ANSI definition) Nearly all vendors have their own propriety extensions (Microsoft have T-SQL, Oracle have PL/SQL, Sybase also have T-SQL but it is diverging from the Microsoft language of the same name where it originally has its roots) and pure ANSI standard SQL is hard to find.

We see lots of database (relational or not) using derivatives of SQL.

We also see relational databases using non-relational functionality and non-relational databases starting to use relational functionality.

And so, both relational and non-relational have their place in today’s world and so I wonder how long the term NoSQL will continue to be used.

Quite sometime ago I started to look at CosmosDB – which a lot of people still talk about in 2020 as being the ‘new kid on the block’ – but it’s been evolving for 10 years now and is gaining a good foothold in the market.

If you’ve not seem it before it would be worth a look and although it shares some familiar terms with relational database it is not one and approaching it with the same mindset may land you in a mess – but once you start to get to grips with how it operates and the different models it employs I think it’ll start to grow on you.

Have a great day.

Cheers

Martin.

Moving to a Modern Microsoft Platform

Melbourne Event – Monday 17th June 2019 (Free – including lunch and prizes)

PASS, in conjunction with Microsoft, Intel and community leaders are offering free training on Moving to a Modern Microsoft Platform

In case you’ve never heard of PASS, they are a not profit organisation run by, and for, a diverse data community.

PASS (formerly known as The Professional Association of SQL Server) focuses on supporting data professionals throughout the world who use the Microsoft Data Platform.

You can read more about PASS on their site

Microsoft SQL Server has been around for some 25+ years and there is a major release coming up later this year – SQL Server 2019.

Another thing that is happening this year – in fact very soon – is that Microsoft SQL Server 2008 (which includes SQL Server 2008R2) is moving out of extended supported.

So, if you are using Microsoft SQL Server 2008 / 2008R2 you’ll need to prepare for SQL Server 2008 End of support

It’s 10 years since that version was released and it was one of the most popular with a lot of organisations still relying on keeping one of their prized business assets on the 2008 platform.

To ensure that organisations stay current and to help them with some of the options that are available, PASS, Microsoft and Intel have teamed up to bring users around the globe a free in-person event in many cities to provide advice, and support to ensure that they fully aware of all options to keep their users data safe, secure and highly available in an ever changing data-driven world.

The events throughout the world are listed here.

And if you would like to register for the Melbourne event – that will run in the CBD on Monday 17th June 2019 at

Telstra Customer Insights Centre
Level 1, 242 Exhibition Street
Melbourne, VIC 3000
Australia

then please head to the EventBrite page and Register for FREE – we’d love to see you there, learn about your pain points, priorities and talk through your next steps on your data-driven journey to keep your clients data safe, secure and highly available to provide the insight that a modern business needs.

They’ll be information sessions, hands-on-labs (if that’s your thing), expert instruction and of course prizes.

There’s no charge and lunch will be provided.

We look forward to meeting you!

Speaking at Events

G’day,

I recently got the opportunity to speak at the “PASS Marathon, The New World of Data Privacy”.

I had not spoken at a virtually event for quite some time and I’d forgotten how different it is to a live presentation.

There’s pros and cons to both live presentations and virtually presentations, but for me, the main difference is having audience interaction.

At a live event you are able to see the faces in the audience and constantly gauge how you feel things are going – and if appropriate change on the fly. You can also take questions ‘in real time’.

However, with a virtual event you get to ‘stay on script’ the entire time and just carry on, not really getting a gauge of how your presentation is ‘going down’.

I’d like to thank everybody who gave me feedback and take the opportunity to encourage viewers of such events to provide feedback – in as much detail as you feel comfortable with. For example if you didn’t like an element of the presentation then please say why (not just that you didn’t like it please) – this helps people to improve.

I’d additionally like to encourage audience members who have not spoken previously to consider doing so. If a virtual presentation is not the place to start for you, then consider heading along to a local SQL Saturday event – there’s always a call for speakers before hand.

There’s plenty of people to help you along on your presentation journey at these events and also in the wider SQL community.

These’s also a lot of online courses aimed at presenting and finding one of those might be a good starting point for you.

I hope to be viewing your presentation soon.

Have a great day.

Cheers

Martin

Azure Notebooks – a nice little tool.

Gday,

I’ve been playing with Azure notebooks lately and have found them invaluable as

  • A teaching aid
  • A test sandbox

I’ve been using Python code, and this means that I don’t have to install Python on the machine that I am working at.

It also means that anybody I am demonstrating my code to has only to open a web page that shows my Azure Notebook.

So – how do you get to Azure Notebooks I hear you ask.

Head along to https://notebooks.azure.com and log in with your Microsoft account.

Once there you’ll be able to start a new project

Give in a name, a URL and decide if you want to make this public or not

Create a Project and Share (if you want)

Start the Azure Project up

Start the Project up

Add a Notebook to the project

Add a Notebook

Choose the language, R and Python are available – but also F#

Choose the language for the Azure Notebook.

And then have fun!

I’d really love to see more languages added in the future. These notebooks have potential – on many levels – and they are also FREE!

Have a great day!

Cheers

Marty.

A lack of foreign keys – a bad culture example.

G’day,

One of the things that I’ve been seeing more often than I would like to lately is large databases with no foreign keys – or minimal foreign keys (and often with those minimal set of keys disabled)

By large databases I’m roughly meaning databases with several hundred tables, and I usually see a lot of these tables with several hundred GB’s of data in them.

When I generally ask about the reason for no foreign key, I’m told

  1. they add  overhead
  2. they give no benefit
  3. we can’t enter our data properly when we have them

The last one in the above list is generally down to poor modelling – an example being entering a later part of a financial transaction before the first part of the financial transaction has taken place. Once both parts of the financial transaction have taken place then the database is left in a consistent state – BUT, that generally being handled by the app NOT the database – OUCH!

My first though here is “I’m glad I don’t use this company” – and I also wonder why auditing or testing has not picked it up. Quite often it has but somebody, somewhere has decided that there’s a reasonably low chance of that occurring and so no action has been taken – until the ‘bug’ is discovered sometime later and at that point takes a long time – and a lot of money – to fix.

Points 1 and 2 are quoted more often than I would like also – usually stemming from statements that were made by somebody who worked with a long dead version of the database and didn’t quite take the time to understand fully how relational databases work – but who had tremendous influence in the company and a reputation that meant they were not to be crossed. (they have since been proved wrong but have failed to accept it)

These are simply examples of now bad culture can lead to problems years down the road.

Looking at a lot of database that lacked foreign keys, it’s usually pretty simply to find examples of inconsistent data in the tables. – caused by the three issues above.

The moral of the story here is

Always question, always learn and really try to avoid issues that are bound to come back and haunt you.

Always try to build good culture.

Have a great day.

Cheers

Martin.

 

 

 

 

 

Welcome to my new home.

G’day,

I initially set up a blog back in 2010 and wrote semi-regularly until I started back in higher education.

Then, time seemed to be at a premium and I gradually blogged less and less until one day I completely stopped.

Well, now its time to pick my writing up again – and so here we are at a new home

I generally try to syndicate my blog to SQL Server Central – as nearly all of my stuff is data related – generally about the Microsoft Data Platform.

I might bring my historic content across at some stage – but if you are interested then you can find it all on sqlservercentral.com

I hope you might find stuff here interesting.

Have a great day.

Cheers

Martin.