Identifying Columns with Collations in SQL Server

Recently I was asked if I could identify columns in a single SQL Server database that had a collation. I didn’t need to change anything (well, not initially, but we’ll look at doing that anyway ), simply identify columns that were candidates to have their collation changed.

Now, there’s a few other considerations here, notably the database collation and server collation, but more on that later. For now, back to the problem at hand.

My first stop was to identify all data types that could possibly have a collation.

In order to do that I’m going to head to my old friends – the system tables.

First off, a quick trip to sys.types to see which datatypes may have collations

SELECT T.[name] FROM sys.types AS T WHERE T.collation_name IS NOT NULL;

This yields the following data results

  • text
  • ntext
  • varchar
  • char
  • nvarchar
  • nchar
  • sysname

SYSNAME is an interesting one – it’s basically equivalent to NVARCHAR(128), there’s a discussion about that here

But in order to find the columns in our database that have collations we can use a query such as

	SELECT 
		 [Table Name]  = T.[name]
		,[Column Name] = C.[name]
		,C.collation_name
		,TY.[name]
		,[Typelength] = CASE
						WHEN TY.[name] IN ('text' , 'ntext') THEN UPPER(TY.[name])
						WHEN C.[max_length] = 8000 THEN TY.[name] + N'(MAX)'
						ELSE UPPER(TY.[name]) + + N'(' + CAST(C.[max_length] AS NVARCHAR(31)) + N')'
					END
	FROM sys.columns AS C
	JOIN sys.tables AS T ON C.[object_id] = T.[object_id]
	JOIN sys.types AS TY ON TY.[user_type_id] = C.[user_type_id]
	WHERE C.collation_name IS NOT NULL
	AND T.[is_ms_shipped] = 0

From here, it’s just a matter of extending the query to produced some T-SQL that will update the collations of the columns – I chose to put this in a Common Table Expression (CTE), but go with whatever suits you.

WITH CTE
AS
(
	SELECT 
		 [Table Name]  = T.[name]
		,[Column Name] = C.[name]
		,C.collation_name
		,TY.[name]
		,[Typelength] = CASE
						WHEN TY.[name] IN ('text' , 'ntext') THEN UPPER(TY.[name])
						WHEN C.[max_length] = 8000 THEN TY.[name] + N'(MAX)'
						ELSE UPPER(TY.[name]) + + N'(' + CAST(C.[max_length] AS NVARCHAR(31)) + N')'
					END
	FROM sys.columns AS C
	JOIN sys.tables AS T ON C.[object_id] = T.[object_id]
	JOIN sys.types AS TY ON TY.[user_type_id] = C.[user_type_id]
	WHERE C.collation_name IS NOT NULL
	AND T.[is_ms_shipped] = 0
)
SELECT N'ALTER TABLE ' + [Table Name] + N' ALTER COLUMN ' + [Column Name] + N' ' + [Typelength] + N' COLLATE Latin1_General_CI_AS' + ';' FROM CTE

This will produced statements that will change the collation of columns on an individual basis. Notice in the final SELECT statement from the CTE, I’ve stated COLLATE Latin1_General_CI_AS’. This will generate a lot of statements such as

ALTER TABLE CollationChangeTable ALTER COLUMN ColumnOne NCHAR(8) COLLATE Latin1_General_CI_AS;

I prefer this approach to a cursor as it gives me an opportunity to eyeball the T-SQL and column names.

A cursor may be more appropriate in a large database – you choice, there’s lots of them around.

Proceed with caution as getting column COLLATIONs out of sync with the database and instance can bite tou hard down the track – such as conflicts with tempdb joins.

Have a great day.

Cheers

Marty.

Leave a Reply

Your email address will not be published. Required fields are marked *