SQL SERVER How to deleting big volume of data from very huge table

Issue

We one of very huge database which running out of server disk space. I was asked to remove millions of historical data from some huge table. When I try to run DELETE with where condition directly, I got two problems. Firstly, whole table blocked for long time (long then 12 hours). Secondly, after run DELETE the ldf Database log file increased exponentially and the script ending with error because disk is full.

Solution

Base on the server’s situation, we may using different way to solve this issue.

1. We just want to Deleting All the data from the table then the best option will be Truncate table directly, as below:

TRUNCATE TABLE huge_history_table


This TRUNCATE TABLE command will remove all rows from a table, without logging the individual row deletions. TRUNCATE TABLE is similar to the DELETE statement with no WHERE clause; however, TRUNCATE TABLE is faster and uses fewer system and transaction log resources.

2. We want to deleting more than 80-90 Percent of the data from the huge table and this table can be offline for short period. We can using SELECT INTO script to move the data (data need to keep) to another staging table. Then Truncate this Large table and Insert back the staging data. as below:

SELECT * INTO staging_table
FROM huge_table
WHERE create_date>’DateAdd(month, -1, Convert(datetime, GetDate()))’


TRUNCATE TABLE huge_table


INSERT INTO huge_table (column_name)
SELECT column_name
FROM staging_table

3. One last option I can think of is to change your database’s Recovery Mode to SIMPLE and then delete rows in smaller batches using a while loop something like this. We want to delete large historical data from the table, the table need to be accessed by system 24×7 and cannot find a time take if offline. Then we delete rows in smaller batches using a while loop something as below:

DECLARE @Deleted_Rows INT;
SET @Deleted_Rows = 1;

WHILE (@Deleted_Rows > 0)
BEGIN
— Delete some small number of rows at a time
DELETE TOP (10000) LargeTable
WHERE create_date<‘DateAdd(month, -1, Convert(datetime, GetDate()))’

SET @Deleted_Rows = @@ROWCOUNT;
END

PowerShell : How to replace string in files

Issue

How can we replace text in multiple files and folders quickly if folder name and file name are dynamically change.

Solution

Using PowerShell built-in cmdlets we can easily read and replace any kind of content inside of a text file.

First, we can using Get-ChildItem to get all the files. Because the folder name and file name are dynamical then need to use wildcard in the path as below:

$files = Get-ChildItem ‘D:\test\*\files**.txt’ -Recurse

Then we need to loop the $files to get then content for each file and replace.

$files | %{
(gc $_ -Encoding UTF8) -replace ‘test’,” | Set-Content $_.fullname -Encoding UTF8
}

Why left join not returning nulls?

Issue

I am trying to get all products and category name if category is still active with below left join script. But I cannot get production information if category is not active. What’s wrong with this left join script?

select p.name, c.name
from products p
left join categories c on p.categoryid = c.categoryid
where c.isactive = 1 or c.isactive is null

Reason

To find out what’s the issue of this script have, we need to understand order of a SQL query. In the script above we used SELECT, FROM, LEFT JOIN and WHERE

  1. FROM and JOIN: The FROM clause, and subsequent JOINs are first executed to determine the total working set of data that is being queried
  2. WHERE: Once we have the total working set of data, the WHERE constraints are applied to the individual rows, and rows that do not satisfy the constraint are discarded.
  3. SELECT: Any expressions in the SELECT part of the query are finally computed.

The WHERE clause is execution after the ON clause. which means the predicates in the ON clause are applied to the table before the join, WHERE clause is semantically applied to the result of the join.

So below script will get all the data from product table and will keep categories data as null if cannot find category.

select p.name, c.name
from products p
left join categories c on p.categoryid = c.categoryid

WHERE clause is applied to the JOIN result which will remove products result if category missing.

where c.isactive = 1 or c.isactive is null

Solution

if we want to get all product information even if categories is missing or is not active, we need to changed the WHERE clause to ON clause as below:

select p.name, c.name
from products p
left join categories c on p.categoryid = c.categoryid
and (c.isactive = 1 or c.isactive is null )

SSIS Converting date to YYYYMMDD

SSIS Expression format date to YYYYMMDD

Mostly, when we create an auto file creation SSIS package we want to have timestamp in file name. To achieve this purpose, we can simply use below SSIS Expression to produce file name with current timestamp.

“D:\\OutPut\\Transactions_”
+ (DT_STR, 4, 1252) DATEPART(“yyyy” , GETDATE())
+ RIGHT(“0” + (DT_STR, 2, 1252) DATEPART(“mm” , GETDATE()), 2)
+ RIGHT(“0” + (DT_STR, 2, 1252) DATEPART(“dd” , GETDATE()), 2)
+ “.csv”

SSIS Expression format YYYYMMDD to Date

(DT_DATE)(SUBSTRING(@[yourDateAsStr],1,4)
+ “-” + SUBSTRING(@[yourDateAsStr],5,2)
+ “-” + SUBSTRING(@[yourDateAsStr],7,2))

Set SSIS Expression to format File Connection Manager Property

Powershell copy or move files from multiple folder to another folder

In the daily data processing task, we always need to copy or move files from multiple source place to one destination folder. In this blog, I’ll show you how to do those task quickly and easily via PowerShell.

Sample Code

I am using below code to move all csv files under Archives folder to Source folder. Archives folder has more than thousands sub-folders.

get-childitempath e:\archives\*.csv -recurse | move-item -destination d:\source

Get-childitem

Get-ChildItem by default only can get the items in one or more specified folder, to perform sub-folders copy and move , we need to user parameter -recurse .

Pip the result

To perform loop copy or move, we need to pipe (|) the recurse result first, then we can use move-item and copy-item cmdlets to do copy or move.

RDP unable to connect Azure server

This week I have been having trouble connecting to my Azure server via RDP. I keep getting below errors:

-The number of connections to this computer is limited and all connections are in use right now.

I can connect to server if I changed inbound security rule to allowed my IP address only. But I cannot use white IP list inbound rule as I don’t have static IP.

Then I did some deep investigation and try to tracking failed logon information from system event.

I checked Event Viewer -> Windows Logs -> Security there are lots of Audit Failure event looks like below:

But as you see, it is completely useless. I only can guess their are some attackers tried to logon with a username of Administrator and the Logon Type is set to 3 (generic network logon), and there is no Source Network Address recorded.

But in Event Viewer -> Applications and Services Logs -> Microsoft -> Windows -> RemoteDesktopServices-RDPCoreTS I found lots of below warnings:

In Windows Server 2012 and later version, if an attacker attempts to logon but fails to do so AND uses a username that DOES NOT EXIST on the targeted RDS host or domain that the host is a member of, Event ID 140 is logged, showing you the source IP of the attacker.

I added this IP to Azure blocked IP list then issue is fixed.

SQL server nchar and nvarchar size

This blog I will show you a very trick issue thing a bout nchar and nvarchar actual size and storage size.

Issue

When I use sp_help to check the table property. I got below result:

which shows length 100

but when I using SSMS to check the column size, it shows nvarchar(50)

so what’s the different between sp_help and SSMS design view?

Finding

I checked sp_help , it is get the max length field from sys.all_columns. In Microsoft document website, it mentioned maximum length (in bytes) of the column.

But we know nchar and nvarchar, the storage size is two times n bytes (n is the actual number of characters ). so this explained, sp_help is not real character size, it is the storage size in bytes. But SSMS design view is using real character size.

For details you can check Microsoft article nchar and nvarchar and sys.all_columns

SSIS : Script component error – Output0Buffer does not exist

Before we start to talk about the error, let me introduce a Script component editor property SynchronousInputID first. In Microsoft document, described SynchronousInputID as below:

The SynchronousInputID property has a non-zero value only in transformations with synchronous outputs. If the value of this property is zero, it means that the output is asynchronous. For a synchronous output, where rows are passed through to the selected output or outputs without adding any new rows, this property should contain the ID of the component’s input.

Which means if output data only one line, then the output columns are just part of the Input Columns, and there is no need for Output0Buffer. If your output data has many rows, then you need to create new Output0Buffer.

In SSIS, SynchronousInputID default value is zero. If you created a new Script component and have change this default value, but in C# script you are try to use Output0Buffer.AddRow(); to create multiple rows output, then you will get error message:

Output0Buffer does not exist

If you got this error, just follow below steps to change the SynchronousInputID value to non.

clicking on the Script Transformation Editor -> clicking the catagory Inputs and Outputs -> clicking on Output 0 -> changing SynchronousInputId to None in the right hand panel

SSIS: Extract and import JSON file to SQL Server via Script Component as Transformation

Currently there are lots of JSON format stored data. It’s important to be able to read the JSON data stored in files, load the data into SQL Server, and analyze it. This blog describes how to import JSON files into SQL Server via SSIS Script Component as Transformation.

I have JSON format file looks like below

{“id”:”037-002″,”accountAlias”:”Primary”,”homeCurrency”:”USD”,”marginUsed”:”0.0548″}
{“id”:”055-001″,”accountAlias”:”Primary”,”homeCurrency”:”USD”,”marginUsed”:”10.000″}

need import this format json file to database table, table schema looks like below

CREATE TABLE import_raw_accounts(
id varchar(60) NULL,
account_alias varchar(60) NULL,
home_currency varchar(60) NULL,
division varchar(3) NULL,
userid varchar(60) NULL,
marginUsed varchar(60) NULL
)

Create Script Component as Source

Step 1 : Drag and drop Script Component in the SSIS toolbox to the data flow region. Once you drop the Script component, a new pop up window called Select Script Content Type opened. Then selecting the Source option

Step 2: Within the Input and Outputs tab, Go to Output Columns to add output columns.

Step 3 : Within the Script tab, please click on the Edit Script.. button to write the actual C# Script

Once you click on the Edit Script, it will open the main.cs class file to write the C# code. Please write your custom code inside the CreateNewOutputRows function

Step 4 : Before start extract Json data, we need to create a new class for imported JSON object, which will be the same JSON data structure.

public class Accounts
{
public string Id { get; set; }
public string accountAlias { get; set; }
public string homeCurrency { get; set; }
public string division { get; set; }
public string userid { get; set; }
public string marginUsed { get; set; }
}

Step 5: Add JSON deserialize code to method CreateNewOutputRows

using (StreamReader r = new StreamReader(Variables.JsonFileName))
{
while (!r.EndOfStream)
{
string json = r.ReadLine();
var item = JsonConvert.DeserializeObject(json);
if (item != null)
{
OutputAccountsBuffer.AddRow();
OutputAccountsBuffer.id = item.Id;
OutputAccountsBuffer.accountalias = item.accountAlias;
OutputAccountsBuffer.homecurrency = item.homeCurrency;
OutputAccountsBuffer.marginUsed = item.marginUsed;
var acc = item.Id.Split(‘-‘);
if (acc.Length >= 1)
{
OutputAccountsBuffer.division = acc[1];
OutputAccountsBuffer.userid = acc[2];
}
}
}
}

Final Step: add destination source and mapping output data to table import_raw_accounts.

SSIS Expression Task

According to Microsoft document

The Expression Task creates and evaluates expressions that set variable values at runtime, using the Expression Builder

It’s a pretty useful task which can help you avoid having to write any .net code. In this blog, I will you show you how to use Expression Task to do conditional check.

I have one SSIS package need to export data to flat file. But need to determine if the exported row count is grater than zero or not. If it’s grater than zero then I need to update the row count to database table. If not then need to delete empty file.

I am using Expression Task to do this. The expression in my Expression Task looks like this.

Using conditional logic to compare the row_count then determine which action need to run.

Left side Precedence Constraint Editor

Right side Precedence Constraint Editor