Azure Data Explorer – Using KQL Magic inside Azure Data Studio

One of my previous blogs looked at using the free resources that Microsoft provides to link to an Azure Data Explorer Cluster and start using Kusto Query Language.

In this post, I want to look at doing a similar thing, but this time using Azure Data Studio (ADS). The cluster that we will link to will be the same one that we previously used when we linked to an ADX cluster via the Azure Portal.

Open Azure Data Studio (ADS) and install the KQL extension.

KQL Extension in Azure Data Studio

Now select “New Notebook” – then select the Python Kernel

If you don’t have a Python installation then don’t worry – as soon as we execute our first command, ADS will prompt you to install one.

I’m going to summarize the instructions in this post and add my own take to them – if you want to switch to that post (or use the two combined), feel free.

The first two code cells just set up Kqlmagic for us.

Here’s the actual code

!pip install Kqlmagic --no-cache-dir --upgrade

%reload_ext Kqlmagic

The next part requires authentication – but all you need is a Microsoft account (Outlook, Hotmail etc.)

Here's the code cell

%kql AzureDataExplorer://tenant="Microsoft.com";code;cluster='help';database='Samples'

When you run the cell, you’ll be given a code and a link to a Microsoft site.

Simply navigate to the link, authenticate using a Microsoft account and type in the code.

You’ll get a message stating everything is fine, and then you’ll be able to head back to ADS and carry on with the final statements, which will query the database.

Next, let’s run some code

KQL Statement

And the results are

Results from the KQL Statement
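The actual statement I ran is in the screenshot, but if you’d like something to type in yourself, a cell along these lines works against the Samples database – it uses the StormEvents sample table on the help cluster (the column and query choices here are just illustrative, not the exact statement from the screenshot).

%%kql
// count storm events per state and show the ten busiest states
StormEvents
| summarize EventCount = count() by State
| top 10 by EventCount desc

Because we already connected with %kql above, the cell runs against the help cluster’s Samples database.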

Here’s another KQL statement

KQL Statement
Results from the KQL Statement
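Again, the statement I actually ran is in the screenshot – as another purely illustrative example against the same StormEvents sample table, a cell like this summarizes events by type and renders a chart (the date range is just an example):

%%kql
// summarize 2007 storm events by type and chart the ten most common
StormEvents
| where StartTime between (datetime(2007-01-01) .. datetime(2007-12-31))
| summarize EventCount = count() by EventType
| top 10 by EventCount desc
| render columnchart

Kqlmagic understands the render operator and can draw the chart inline in the notebook.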

The %%kql prefix is a “magic” command that signals to the kernel that what we are about to run is not Python – but Kusto Query Language!

Hence KQL Magic!

Hope this helps somebody

Cheers

Marty

Azure Data Explorer – A great way to learn Kusto Query Language

Log Analytics is a great resource for storing our monitoring data from Azure.

But in order to get the best from your nicely acquired monitoring data, you’ll need a way to investigate it. You can do that in the Azure Portal by clicking around in the UI.

Kusto Query Language is the way to investigate data that is stored in Log Analytics.

We can either experiment in our own Log Analytics workspace – or we can use the workspace that Microsoft provides for such things.

The Log Analytics Playground

You can find that here

You’ll need to authenticate with a Microsoft account (e.g. Outlook, Hotmail)

Personally, I think a great resource for learning Kusto Query Language (KQL) is the Azure Data Explorer (ADX) cluster that Microsoft provides.

You can find that here

There’s a bit more real estate – but you choose 🙂

The ADX cluster is clean and easy to find your way around

Azure Data Explorer (ADX)

There’s enough data in there to run some KQL queries – and get to grips with the language.
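If you want something to paste in and run straight away, here’s a small query to get started – it assumes the StormEvents table, one of the sample tables in the help cluster’s Samples database (the aggregation itself is just an arbitrary example):

// count the storm events of each type, most common first
StormEvents
| summarize EventCount = count() by EventType
| order by EventCount desc
| take 10

From there it’s easy to start layering on where filters, project statements and render charts to explore the rest of the language.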

If you’re looking to learn Kusto Query language then I’d suggest heading along to this resource.

Hope this helps somebody.

Cheers

Marty

Azure Data Explorer – Setting up a Cluster

I’ve worked with Log Analytics quite a bit over the years and that’s taken me down the road of the Kusto Query Language – or KQL

So, using Azure Data Explorer Clusters seemed the next logical step on that journey.

Log Analytics is great for storing and querying diagnostic data from Azure (or any other infrastructure) – but I decided that I might want to investigate my own data.

So, with that in mind – I decided to set up my own Azure Data Explorer Cluster and start looking around.

It’s easy enough to find in the Azure Portal – I chose to favorite it as I expect to be using it a fair bit over the next little while

Find “Azure Data Explorer Clusters” via the Azure Portal search bar

You can also get to it via a shortcut

Select “Azure Data Explorer Clusters” – optionally favorite

Next, I selected “Create” and then filled in the minimum amount of information needed to get up and running.

Information for our ADX cluster

Just a few things to focus on here. Firstly, the combination of the cluster name and the region forms part of a URI – so it has to be unique. The name also has to meet certain format rules, but you’ll soon know if you haven’t met them, as the screen calls them out instantly.

Additionally, I’m just creating a Dev cluster here – note that there is no SLA.

Also, the region that I chose has Availability Zones – yours might not, or you might choose not to take advantage of them. I chose to include all three.

Availability Zones

You’ll then notice that you can step through all of the remaining setup.

Additional setup (can be skipped – I accepted the defaults)

I simply accepted the defaults, but it’s great to know you can define additional settings at this stage.

It’ll take a little time to set the cluster up – but you’ll soon be up and running.

And once we are up and running we can see that the URI is made up of the cluster name and region

ADX Cluster name

In order to create a database on the cluster, head to the resource, click on “Databases” in the left-hand menu (you may need to scroll), click “Add database” near the top of the screen and then enter a name for the database in the fly-out pane.

Create a new database on our ADX cluster

That’s it – we now have our first Azure Data Explorer Cluster created with a new shiny database.
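If you want to sanity-check the new database, you can open the query editor on it and run a couple of Kusto management commands. This is just a rough sketch – the table name and columns are placeholders I’ve made up, and each statement needs to be run on its own:

// create a small test table (name and columns are placeholders)
.create table TestEvents (Timestamp: datetime, Message: string)

// ingest a single inline row to confirm ingestion works
.ingest inline into table TestEvents <|
2021-01-01T00:00:00Z,"Hello from our new ADX database"

// query the row back
TestEvents
| take 10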

Hope this helps somebody.

Cheers

Marty.

Azure Monitoring Diagrams

Recently, I’ve been digging into Monitoring in Azure.

I was really pleased to find that a visual representation of an Azure VNET can be produced from inside the Azure Portal.

Here’s an example

An Azure VNET diagram from the Azure Portal

It’s an SVG image.

You can import these into your documentation.

If you select a VNET in the Azure Portal and then locate the “Monitoring” section, you’ll see a menu item called “Diagram”

Just select it and you’ll end up with a downloadable image like the one above

Have a great day

cheers

Marty

Azure dependent resources

G’day

They say you learn by your mistakes.

And today (amongst other days) – I did just that.

But in a good way.

I wanted to create a simple Network Interface Card (NIC) for one of my Azure VMs.

Unfortunately (at least partly), I picked the wrong subscription.

The subscription just happened to be absolutely empty – not a single resource (even a resource group) in the subscription.

Ordinarily, I’d have just gone ahead and created the resources (in this case a NIC) and then realised my mistake at another point down the road – but something caught my eye.

So, exactly like the message says – I had no VNET in my subscription.

I’m sort of pleased the Azure Portal flagged this to me and it left me wondering if it’d flag other resources that were dependent on things that didn’t exist.

Now that’s another thing I have on my (ever-growing) TODO list – to identify resources that absolutely rely on other resources.

Have a great day.

Cheers

Marty.

What exactly is a modern data warehouse?

G’day,

I was asked recently what a modern data warehouse is – and that’s a very thought-provoking question.

First, there’s the tag “Modern” – I think what’s modern now may not be quite so modern in a year or two’s time – maybe sooner, judging by the speed of some technology changes at present.

My initial response – jokingly – was ‘it depends who you ask’, which is a pretty useless – but true – answer.

Firstly let me put some context around my answer. I’ve built my career around Microsoft technologies – so my answer is going to centre around their latest technologies – which are going to be all cloud based, and essentially incorporate Platform-as-a-Service (PaaS) functionality. Your experience may be different, you might be in a different cloud, you might have different ideas – but for me it’s PaaS Services in the Azure cloud. I’ll expand on this shortly.

But let’s (quickly) look back on how I’ve seen data warehouses develop – in my Microsoft world – over the years. They used to be built simply on SQL Server, incorporating fact and dimension tables. There’d be debate about star and snowflake schemas, about Kimball or Inmon. More recently people talk about Data Vault. Essentially, people pulled data from their operational (transactional) system into their data warehouse (decision support system, enterprise data warehouse or whatever they chose to call it).

They did this using some Extract, Transform and Load (ETL) tool (commonly called a data integration tool) like SQL Server Integration Services.

They might have had a single data source (one transactional system) or they may have had several – so it made sense to first stage the data, then clean it, and then load it into the main data warehouse.

We’d have to deal with things like slowly changing dimensions and very often we may not have modelled data exactly as it is in the real world.

We might put analytical models over the data warehouse – initially these were multidimensional (queried with MDX), later Tabular. Then we’d either report straight off the data warehouse or off the analytical model – using tools like SQL Server Reporting Services or – lately – Power BI.

For me, things have moved to the cloud and our options have expanded (or should I say exploded)

We can take data from literally any data source and land it in our Azure data lake. This area can be segregated, as it likely contains our raw data. We can now keep this – untouched – as space is no longer an impediment (OK, it still costs money, but it’s cheap, we can choose different storage tiers (hot / cool / archive), and we trade off access times against cost). We also don’t have to worry about upgrading and migrating data at any time – this is all part of the service provided by a modern cloud.

We can then start segregating the data into different areas (of the data lake), say a staging area, a clean area, and maybe an area full of curated data sets.

We can pull in different types of data in different formats – CSV, JSON, AVRO, Parquet, and more.

Once we have the data in our desired area, we can then pull it into a traditional data warehouse – say Azure SQL Database for instance (although there may be better cloud solutions). We can use PolyBase to virtualise the data.

We could also use Azure Synapse Analytics, Cosmos DB or maybe Databricks.

We can build Machine Learning Models and enable Artificial Intelligence services.

Basically, I think the ‘modern’ data warehouse is about cloud based data services and simply adds to the traditional models of data warehousing.

Whereas the traditional data warehouse is about historical reporting, such as diagnostic and descriptive analytics – the new tools enable predictive and prescriptive analytics along with cognitive capabilities.

So, basically the answer to ‘what is the modern data warehouse?’ could be a lot of things, pieced together in many different ways – and we’re only going to get more options as time marches on.

So maybe we should start talking about the changing data warehouse, or the evolving data warehouse or the cloud data warehouse – but not the modern data warehouse, as what’s modern today is sure to change tomorrow.

Anyway, these are just my thoughts. Some will agree, some won’t, and there’ll be plenty of diverse ideas – that’s great for innovation. Um, maybe the innovative data warehouse is a better name 🙂

Anyway, have a great day

Cheers

Martin.