This month's T-SQL Tuesday (January 2022) invite is brought to you by Andy Yun – @sqlbek.
Andy’s asked us to write about preconceived notions.
I've not written a T-SQL Tuesday post for quite some time – so, why not resolve to get 12 T-SQL Tuesday posts in for '22 (12 in 22 🙂 ), and while I'm at it revisit my blogging journey. I'm not sure why I stopped blogging regularly – maybe I have a preconceived notion that nobody would read my stuff!
Anyway, before I put pen to paper (so to speak) – or more accurately fingers to keyboard – I decided to have a look around the web at definitions of "preconceived notions". I like this one:
“a personal belief or judgment that is not founded on proof or certainty”
And my problem, for at least the early part of my career, was that I was too willing to take the word of other people at face value – even when I suspected they weren't correct – rather than debating with them. I guess I had the preconceived notion that other people's preconceived notions (which they were often quite happy to tell anybody who cared to listen about) were correct.
Here's an example: somebody once told me 'All cursors are bad, never write one' – and then their preconceived notion became mine (unfortunately).
Now, I soon found that cursors did have their uses – maybe not as a standard approach in production, but certainly to help me 'write code that writes code'.
I used them for things like generating comma-separated column lists for SELECT statements, pulled from the INFORMATION_SCHEMA views.
I used them to create multiple DDL statements and cycle through a list, executing individual statements programmatically – usually just on my local dev machine.
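For instance, here's a minimal sketch of that 'code that writes code' pattern – the table name dbo.SomeTable is purely illustrative:

```sql
/*
   Sketch: cycle through INFORMATION_SCHEMA.COLUMNS with a cursor
   and build a comma-separated column list for a SELECT statement.
   The schema/table names here are just an illustration.
*/
DECLARE @ColumnName SYSNAME;
DECLARE @ColumnList NVARCHAR(MAX) = N'';

DECLARE ColumnCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = N'dbo' AND TABLE_NAME = N'SomeTable'
    ORDER BY ORDINAL_POSITION;

OPEN ColumnCursor;
FETCH NEXT FROM ColumnCursor INTO @ColumnName;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @ColumnList += QUOTENAME(@ColumnName) + N',';
    FETCH NEXT FROM ColumnCursor INTO @ColumnName;
END;

CLOSE ColumnCursor;
DEALLOCATE ColumnCursor;

/* Trim the trailing comma and emit a SELECT statement */
SELECT N'SELECT ' + STUFF(@ColumnList, LEN(@ColumnList), 1, N'')
       + N' FROM dbo.SomeTable;' AS GeneratedStatement;
```

Nothing production-worthy – just the kind of throwaway dev-machine helper I'm describing.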
Most of this type of thing I can now do with other concepts, such as XML for generating CSV lists or window functions for retrieving values from rows other than the current one.
While my use of cursors has probably subsided over the years, I still use them – but mainly when I'm developing, to 'write code that writes code'.
The moral of the story – I think – is to avoid letting other people's preconceived notions become your preconceived notions. That's how myths start and are perpetuated.
I soon learned my lesson to question everything. The world revolves around people – or is it data? – but that's just an expression. Or is it yet another preconceived notion 🙂
The STUFF function in SQL Server is one of those little gems that is very under-used but when needed can be a real handy utility – at least that’s how it appears to me.
The STUFF function allows you to replace specific characters in a string with the contents of another string – we identify those characters using numbers signifying where to start and how many characters to replace.
At this stage, you might be wondering why the function is not called REPLACE – great question. It turns out that T-SQL already has a REPLACE function, so (obviously 🙂 ) STUFF became the name for this one.
The function basically takes a string, the number of the character to begin the removal from, the length of the string to remove, and another string containing the characters that should be STUFFed into the original string.
Here's an example.
Let's turn the string "Your welcome" into the string "You're welcome" using T-SQL and the STUFF function.
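In its simplest form, that looks like this – start at character 4 (the 'r'), remove 1 character, and stuff in the string 're (the doubled single quotes are just T-SQL escaping):

```sql
/* STUFF(string, start, length, replacement) */
SELECT STUFF('Your welcome', 4, 1, '''re') AS [Correction];
-- returns: You're welcome
```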
The strings can even be the result of SELECT statements themselves
SELECT STUFF((SELECT 'Your Welcome') , 4,1,(SELECT '''re')) AS [Correction(Again)];
The replacement string can even be an empty string – in which case we are essentially removing characters. I've found this really useful when constructing comma-separated strings and then removing either the first or the last comma.
Here's an example (removing the leading comma):
SELECT STUFF((SELECT ',Martin,Julie,John,Bert,Stephen,Jean,Claire') , 1,1,'') AS [First Comma removed];
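The same idea works for a trailing comma – LEN gives the position of the last character, so we remove one character there:

```sql
/* Remove the trailing comma by replacing the last character with an empty string */
DECLARE @list VARCHAR(100) = 'Martin,Julie,John,Bert,Stephen,Jean,Claire,';
SELECT STUFF(@list, LEN(@list), 1, '') AS [Last Comma removed];
```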
One of my children was exploring sounds at school and asked me to give them a hand.
One of the tasks that was set for them was writing down a list of word pairs that are spelt differently but pronounced identically.
While I managed to come up with a few, it was a bit more of a challenge than I originally thought.
Anyway, being the techie that I am (or at least think I am 🙂 ) – I decided to see if technology could help me. Specifically, could SQL Server help me out?
I’d worked with SOUNDEX in Oracle many, many years ago, but I’d never really had reason to explore it in SQL Server – well, now was my chance.
So, not only does SQL Server have a SOUNDEX function, it also has a DIFFERENCE function that indicates how similar SQL Server thinks two words sound to one another.
As you might expect with functions based around strings, collation is a factor.
SOUNDEX returns a four-character code that is based on how the string sounds when spoken in English. You can read more about how those codes are put together in the docs.
DIFFERENCE returns an integer value between 0 and 4, with 0 meaning no similarity and 4 representing a very close match.
I experimented with a few different combinations, ranging from the identical to the clearly different – I even used some numbers.
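Comparisons along these lines can be run as follows – a sketch of the sort of pairs I mean (identical SOUNDEX codes score a 4; the other scores will depend on your collation, so run it to see your own results):

```sql
/* SOUNDEX codes for a homophone pair - both should produce the same code */
SELECT SOUNDEX('two') AS [two], SOUNDEX('too') AS [too];

/* DIFFERENCE scores: 4 = very similar, 0 = not similar at all */
SELECT DIFFERENCE('two', 'too')     AS [Homophones];            -- identical codes: 4
SELECT DIFFERENCE('martin', 'luke') AS [ClearlyDifferent];
SELECT DIFFERENCE('right', 'write') AS [DifferentFirstLetters];
SELECT DIFFERENCE('boy', 'girl')    AS [Unrelated];
```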
We can see that it found most homophones to be a 4 – indicating a very strong similarity.
Interestingly enough, it compared numbers OK, and it got the difference between the strings 'martin' and 'luke' right – not similar at all. However, it thought the strings 'boy' and 'girl' had at least some similarity, which seemed odd. It didn't get 'right' and 'write' correct, likely because they begin with different letters.
On the whole very encouraging – but not perfect.
An interesting venture into the world of sounds in SQL Server – and into what my kids are up to at school.
I was recently looking at getting some sales data from SQL Server, aggregating some totals and then displaying a grand total.
This is the sort of thing that very often gets done on the client side – but as it turns out, this can be accomplished using standard T-SQL, and, before you ask, no, there are no cursors in sight.
So I put together some sample data – here it is:
USE tempdb;
GO
/*
Create a simple table.
We'll simply group by ClientName
*/
IF OBJECT_ID('dbo.Sales') IS NOT NULL
BEGIN
DROP TABLE dbo.Sales;
END;
GO
CREATE TABLE dbo.Sales
(
SaleID INT NOT NULL IDENTITY(1,1)
,ClientName NVARCHAR(10) NULL
,Amount INT NOT NULL
);
GO
/*
Insert some data
*/
INSERT INTO dbo.Sales(ClientName , Amount)
SELECT
ClientName
, Amount
FROM
(
VALUES
('Marty' , 10),
('Marty' , 1),
('Marty' , 3),
('Callum' , 6),
('Callum' , 3),
('Callum' , 23),
('Jake' , 45),
('Jake' , 3),
('Jake' , 67),
('Bert' , 5),
('Bert' , 1),
('Bert' , 0),
('Bert' , 6),
('Zeb' , 35),
('Zeb' , 23),
('Zeb' , 12)
) AS TempSales(ClientName,Amount);
-----------------------------------------------------------------
The output that we'd really like – using only T-SQL – is shown below.
Note that the last line contains a grand total.
While there are a few ways we might be able to get this result set, let's concentrate on one in particular: using ROLLUP and GROUPING.
The following query comes pretty close
SELECT
[Full Name] = ClientName
, [Units] = SUM(Amount)
, [Grouping] = GROUPING(ClientName)
FROM
dbo.Sales
GROUP BY ROLLUP(ClientName)
ORDER BY
[Grouping] ASC , [Full Name];
and gives us
Which is close – but note that we have an extra column called "Grouping", and while the Total row looks fine, its client name is registering as NULL!
So, how did the “Grouping” column get there?
Well, first off – I decided to call it "Grouping", but I could have called it literally anything I wanted.
The column actually came from the use of the T-SQL GROUPING function. This function indicates whether a column in the GROUP BY list is aggregated (rolled up) in a given row.
The totalled column (in this case there is only one) has a NULL value – and the 1 in the "Grouping" column indicates that this NULL row is giving us a grand total, which differentiates it from any NULL values in the same column that are not grand totals.
To demonstrate, let's change the value 'Bert' in our original table to NULL:
UPDATE dbo.Sales SET ClientName = NULL WHERE ClientName = 'Bert';
and when we run the query again
SELECT
[Full Name] = ClientName
, [Units] = SUM(Amount)
, [Grouping] = GROUPING(ClientName)
FROM
dbo.Sales
GROUP BY ROLLUP(ClientName)
ORDER BY
[Grouping] ASC , [Full Name];
we get the result
Note that we now have two NULL values – one of which actually represents a Grand Total and the other just represents NULL values in the table.
Also note that SQL has grouped the NULL values from the table together – so watch out for how NULLs are handled in various T-SQL expressions.
So the only way we know which NULL represents a Grand Total is by looking at the "Grouping" column – the Grand Total row has a 1, while the aggregated NULL values from the table are represented by a 0 in the "Grouping" column.
With this knowledge, let's try and tidy up this output to get the output that we originally wanted – I chose to use a CTE.
WITH ClientAmountsCTE
AS
(
SELECT
[Client Name] = ClientName
, [Units] = SUM(Amount)
, [Total] = GROUPING(ClientName)
FROM
dbo.Sales
GROUP BY ROLLUP(ClientName)
)
SELECT
[Client Name] =
CASE
WHEN [Total] = 1 THEN 'Total'
WHEN [Client Name] IS NULL THEN 'UNKNOWN'
ELSE [Client Name]
END
, [Units]
--, [Total]
FROM
ClientAmountsCTE
ORDER BY
[Total] ASC;
GO
which gives us the result
Notice that a CASE expression has been used to differentiate the NULL rows based on the value of the "Grouping" column (which is itself not used in the display output).
ROLLUP can do a little more than this when multiple columns are involved, so check out the docs.
Note also that there are two ways to use ROLLUP – one is non-ISO-compliant (uses WITH) and the other is ISO-compliant (does not use WITH).
Examples
-----------------------------------------------------------------
/*
Group the data and use the ROLLUP syntax.
Non-ISO-compliant syntax (uses WITH)
*/
SELECT
[Full Name] = ClientName
, [Units] = SUM(Amount)
, [Total] = GROUPING(ClientName)
FROM
dbo.Sales
GROUP BY ClientName WITH ROLLUP
ORDER BY
[Full Name] DESC;
-----------------------------------------------------------------
/*
Group the data and use the ROLLUP syntax.
Fully ISO-compliant syntax (does not use WITH)
*/
SELECT
[Full Name] = ClientName
, [Units] = SUM(Amount)
, [Total] = GROUPING(ClientName)
FROM
dbo.Sales
GROUP BY ROLLUP(ClientName)
ORDER BY
[Full Name] DESC;
I blogged the other day about having some issues translating Oracle date formats to SQL Server date data types.
I find the way SQL Server handles dates with a defined data type to be much more intuitive than the string manipulation techniques that Oracle seems to favour.
My new-found friend is the SQL Server FORMAT() function.
Here are some examples of using FORMAT() with dates – and while you can likely use CONVERT to achieve similar results, I find the option of using FORMAT() to be very useful.
It is also possible to format much more than dates with the FORMAT() function.
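A few illustrative calls (the format strings follow .NET conventions, and the output will of course depend on the current date and the culture you pass):

```sql
/* Date formatting with FORMAT() */
SELECT FORMAT(GETDATE(), 'yyyy-MM-dd')   AS [ISO style];
SELECT FORMAT(GETDATE(), 'dd MMMM yyyy') AS [Long style];
SELECT FORMAT(GETDATE(), 'dddd')         AS [Day name];

/* An optional third argument specifies a culture */
SELECT FORMAT(GETDATE(), 'd', 'en-GB')   AS [British short date];

/* FORMAT() also handles numbers */
SELECT FORMAT(1234567.891, 'N2')         AS [Number];
SELECT FORMAT(0.253, 'P1')               AS [Percentage];
```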
Recently I was asked if I could identify columns in a single SQL Server database that had a collation. I didn't need to change anything (well, not initially, but we'll look at doing that anyway) – simply identify columns that were candidates to have their collation changed.
Now, there’s a few other considerations here, notably the database collation and server collation, but more on that later. For now, back to the problem at hand.
My first stop was to identify all data types that could possibly have a collation.
In order to do that I’m going to head to my old friends – the system tables.
First off, a quick trip to sys.types to see which data types may have collations:
SELECT T.[name] FROM sys.types AS T WHERE T.collation_name IS NOT NULL;
This yields the following results:
text
ntext
varchar
char
nvarchar
nchar
sysname
SYSNAME is an interesting one – it’s basically equivalent to NVARCHAR(128), there’s a discussion about that here
But in order to find the columns in our database that have collations, we can use a query such as:
SELECT
[Table Name] = T.[name]
,[Column Name] = C.[name]
,C.collation_name
,TY.[name]
,[Typelength] = CASE
WHEN TY.[name] IN ('text' , 'ntext') THEN UPPER(TY.[name])
WHEN C.[max_length] = -1 THEN UPPER(TY.[name]) + N'(MAX)'
/* max_length is in bytes, so halve it for the n-types */
WHEN TY.[name] IN ('nvarchar' , 'nchar') THEN UPPER(TY.[name]) + N'(' + CAST(C.[max_length] / 2 AS NVARCHAR(31)) + N')'
ELSE UPPER(TY.[name]) + N'(' + CAST(C.[max_length] AS NVARCHAR(31)) + N')'
END
FROM sys.columns AS C
JOIN sys.tables AS T ON C.[object_id] = T.[object_id]
JOIN sys.types AS TY ON TY.[user_type_id] = C.[user_type_id]
WHERE C.collation_name IS NOT NULL
AND T.[is_ms_shipped] = 0
From here, it's just a matter of extending the query to produce some T-SQL that will update the collations of the columns – I chose to put this in a Common Table Expression (CTE), but go with whatever suits you.
WITH CTE
AS
(
SELECT
[Table Name] = T.[name]
,[Column Name] = C.[name]
,C.collation_name
,TY.[name]
,[Typelength] = CASE
WHEN TY.[name] IN ('text' , 'ntext') THEN UPPER(TY.[name])
WHEN C.[max_length] = -1 THEN UPPER(TY.[name]) + N'(MAX)'
/* max_length is in bytes, so halve it for the n-types */
WHEN TY.[name] IN ('nvarchar' , 'nchar') THEN UPPER(TY.[name]) + N'(' + CAST(C.[max_length] / 2 AS NVARCHAR(31)) + N')'
ELSE UPPER(TY.[name]) + N'(' + CAST(C.[max_length] AS NVARCHAR(31)) + N')'
END
FROM sys.columns AS C
JOIN sys.tables AS T ON C.[object_id] = T.[object_id]
JOIN sys.types AS TY ON TY.[user_type_id] = C.[user_type_id]
WHERE C.collation_name IS NOT NULL
AND T.[is_ms_shipped] = 0
)
SELECT N'ALTER TABLE ' + [Table Name] + N' ALTER COLUMN ' + [Column Name] + N' ' + [Typelength] + N' COLLATE Latin1_General_CI_AS' + ';' FROM CTE
This will produce statements that change the collation of columns on an individual basis. Notice that in the final SELECT statement from the CTE, I've stated COLLATE Latin1_General_CI_AS. This will generate a lot of statements such as:
ALTER TABLE CollationChangeTable ALTER COLUMN ColumnOne NCHAR(8) COLLATE Latin1_General_CI_AS;
I prefer this approach to a cursor as it gives me an opportunity to eyeball the T-SQL and column names.
A cursor may be more appropriate in a large database – your choice, there are lots of them around.
Proceed with caution, as getting column collations out of sync with the database and instance can bite you hard down the track – such as causing collation conflicts in tempdb joins.
OPENJSON is pretty central to manipulating JSON documents in T-SQL.
As we’ve seen, we can use a default schema that will return metadata about the JSON document or we can use an explicit schema where we supply a WITH clause and a PATH to the data that we want.
So far, in our examples we have either used the default schema or an explicit schema – but we have not mixed the two.
Indeed, if you try this in a simple query you'll find it is not possible without the use of the APPLY operator.
OPENJSON is actually a table-valued function that parses JSON – either into an explicit schema (that we provide) or the default schema.
We can mix both the default and explicit schemas by using CROSS APPLY, like so:
DECLARE @json NVARCHAR(MAX) =
N'
{
"Configuration Property": {
"Configuration name": "recovery interval (min)",
"Value": 0,
"minimum": 0,
"maximum": 32767,
"value_in_use": 0,
"description": "Maximum recovery interval in minutes",
"is_dynamic": true,
"is_advanced": true
}
}
';
SELECT
DS.[key]
,DS.[value]
,DS.[type]
,ES.[Configuration name]
,ES.[Value]
,ES.[minimum]
,ES.[maximum]
,ES.[value_in_use]
,ES.[description]
,ES.[is_dynamic]
,ES.[is_advanced]
FROM
OPENJSON(@json) AS DS
CROSS APPLY
OPENJSON(DS.[value])
WITH
(
[Configuration name] NVARCHAR(35)
,[Value] NVARCHAR(35)
,[minimum] NVARCHAR(35)
,[maximum] NVARCHAR(35)
,[value_in_use] NVARCHAR(35)
,[description] NVARCHAR(35)
,[is_dynamic] NVARCHAR(35)
,[is_advanced] NVARCHAR(35)
) AS ES
GO
This technique can come in extremely handy when attempting to drill down into sub arrays and objects contained in a JSON document.
Notice also that aliasing the default and explicit schemas differently has allowed us to request data from both. This is what lets us get to the element named value – that name appears in both the default and explicit schemas and so requires an alias to disambiguate it.
I hope this has helped on your SQL Server JSON journey. We’ll see more real soon.
We can alias both Tables (FROM clause) and Columns (SELECT clause) and some other things too. It’s all pretty fundamental to writing good T-SQL queries.
Here’s an example
SELECT
dbs.collation_name
,dbs.database_id AS [DatabaseID]
,[IsReadOnly] = dbs.is_read_only
FROM
sys.databases AS dbs
JOIN
sys.master_files MF
ON
dbs.database_id = MF.database_id
WHERE
dbs.[name] = 'master';
GO
Note that we can use different styles – however, I recommend you standardise on whichever you choose.
Anyway, the point of this post is to outline how aliasing is done when using the default and explicit JSON schemas in our T-SQL queries.
Here is an example of aliasing with the default schema
DECLARE @json NVARCHAR(MAX) =
N'
{
"Configuration Property": {
"Configuration name": "recovery interval (min)",
"Value": 0,
"minimum": 0,
"maximum": 32767,
"value_in_use": 0,
"description": "Maximum recovery interval in minutes",
"is_dynamic": true,
"is_advanced": true
}
}
';
SELECT
DS.[key]
,DS.[value]
,DS.[type]
FROM
OPENJSON(@json, '$."Configuration Property"') AS DS;
GO
There’s nothing particularly unusual here.
The slightly unusual part may appear when aliasing with an explicit schema.
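As a sketch (reusing a trimmed-down version of the configuration-property document from earlier), the alias goes after the closing parenthesis of the WITH clause, and column references then use that alias as usual:

```sql
DECLARE @json NVARCHAR(MAX) =
N'
{
"Configuration Property": {
    "Configuration name": "recovery interval (min)",
    "Value": 0
}
}
';
/* With an explicit schema, the alias follows the WITH clause */
SELECT
    ES.[Configuration name]
    ,ES.[Value]
FROM
    OPENJSON(@json, '$."Configuration Property"')
    WITH
    (
        [Configuration name] NVARCHAR(35)
        ,[Value] NVARCHAR(35)
    ) AS ES;
GO
```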
We've looked at pulling data from a JSON document into relational table format using an explicit schema that was defined in the WITH clause of the OPENJSON table-valued function.
However, in that example, we used a PATH expression that was taken from the root of the JSON document.
A question that I recently had was 'What if you want to take the PATH from a certain point in the JSON document?' – and that's what we will look at here.
We use this simple document, which contains one object
Notice also that Configuration name has a space in it. So, in the PATH we simply enclose it in double quotes “..” – in a similar way to how square brackets would be used in a T-SQL expression.
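For example, a sketch using the same document – the quoted name appears in the column paths of the WITH clause (the column aliases ConfigName and ConfigValue are my own invention here):

```sql
DECLARE @json NVARCHAR(MAX) =
N'
{
"Configuration Property": {
    "Configuration name": "recovery interval (min)",
    "Value": 0
}
}
';
/* Property names containing spaces are wrapped in double quotes
   inside the PATH expression - much like square brackets in T-SQL */
SELECT
    ES.[ConfigName]
    ,ES.[ConfigValue]
FROM
    OPENJSON(@json)
    WITH
    (
        [ConfigName]   NVARCHAR(35) '$."Configuration Property"."Configuration name"'
        ,[ConfigValue] NVARCHAR(35) '$."Configuration Property".Value'
    ) AS ES;
GO
```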
Let's look at a slightly different shaped JSON document and see how we would change the PATH in the WITH clause to locate all of the desired values.