Image credits: https://docs.microsoft.com/

How to use Azure Data Factory custom activities

Charith Ekanayake
4 min read · Jun 5, 2021


Azure Data Factory gives you the freedom to create a custom activity with your own data movement or transformation logic, which can then be orchestrated inside a Data Factory pipeline.

The custom activity that I am going to explain consists of the following steps:

  • Capture the metadata of the latest delivered file to an SFTP directory.
  • Write the metadata back to an Azure SQL table. Subsequently, this data can be looked up by a Lookup activity and files can be copied based on the same metadata using a Copy activity.

What do you need to run this custom activity securely?

  • Azure Key Vault
  • Azure Batch Account, Azure Storage Account
  • Linked services from Azure Data Factory to the Azure Batch account and the Azure Storage account

And, essentially, the Azure Data Factory service itself 😊

This custom activity was written as a workaround for the off-the-shelf Get Metadata activity, which cannot capture the metadata (Name, LastModifiedDate, etc.) of just the latest delivered file on the SFTP host.

The custom activity is a C# console app built on .NET. Refer to the GitHub repository for the project files.

Code Snippet

public static DataTable SFTPMetadata(
    string connectionHost, string connectionUsername,
    string connectionPassword, string remoteDirectory)
{
    // Requires the SSH.NET (Renci.SshNet) package, System.Data and System.Linq.
    using (var sftp = new SftpClient(connectionHost, connectionUsername, connectionPassword))
    {
        sftp.Connect();
        // Skip hidden entries ("." and "..") and take the most recently accessed file.
        var file = sftp.ListDirectory(remoteDirectory)
            .Where(y => !y.Name.StartsWith("."))
            .OrderBy(x => x.LastAccessTime)
            .Last();

        var table = new DataTable();
        table.Columns.Add("FileType", typeof(string));
        table.Columns.Add("Source", typeof(string));
        table.Columns.Add("FileName", typeof(string));
        table.Columns.Add("DateTimeUTC", typeof(DateTime));
        // fileName, FileExt, and FileCat are class-level fields populated
        // from the activity's extended properties (see the table further below).
        fileName = file.Name;
        table.Rows.Add(FileExt, FileCat, fileName, DateTime.UtcNow);
        return table;
    }
}
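
The snippet above only builds the DataTable; the second step, writing it back to Azure SQL, lives in the repository. A minimal sketch of how that write could look with SqlBulkCopy (the method name WriteMetadata is my own; the connection string and table name arrive via the extended properties shown further below):

using System.Data;
using System.Data.SqlClient;

public static void WriteMetadata(DataTable table, string connectionString, string tableName)
{
    // Column names and types in the DataTable must match the target Azure SQL table.
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (var bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = tableName;
            bulkCopy.WriteToServer(table);
        }
    }
}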

a. Azure Storage Account formalities

Create a private container inside the storage account and upload the files from the net5.0 folder in the Release directory (the published build output of the console app).
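
If you prefer to script the upload instead of using the portal, a minimal sketch with the Azure.Storage.Blobs SDK could look like this (the container name customactivity and the local path are placeholders of my own):

using System.IO;
using Azure.Storage.Blobs;

// Create a private container and upload the published build output.
var container = new BlobContainerClient("<storage-connection-string>", "customactivity");
container.CreateIfNotExists(); // no public access argument, so the container stays private

foreach (var file in Directory.GetFiles(@"bin\Release\net5.0"))
{
    // Blobs are uploaded flat; the custom activity expects the exe and its
    // dependencies side by side in the container.
    container.GetBlobClient(Path.GetFileName(file)).Upload(file, overwrite: true);
}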

b. Azure Batch Account formalities

Create a Windows Server VM pool inside the Batch account so that the custom workload can be executed on that pool.
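
Pool creation is usually done from the portal, but for completeness here is a minimal sketch with the Microsoft.Azure.Batch SDK; the pool id, VM size, and image are placeholder choices of my own, not prescribed by the custom activity:

using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Auth;

var credentials = new BatchSharedKeyCredentials(
    "https://<account>.<region>.batch.azure.com", "<account>", "<key>");

using (BatchClient batchClient = BatchClient.Open(credentials))
{
    // Windows Server image so the .NET console app can run on the nodes.
    var image = new ImageReference(
        publisher: "MicrosoftWindowsServer",
        offer: "WindowsServer",
        sku: "2019-datacenter-core",
        version: "latest");

    CloudPool pool = batchClient.PoolOperations.CreatePool(
        poolId: "adf-custom-activity-pool",
        virtualMachineSize: "Standard_D2s_v3",
        virtualMachineConfiguration: new VirtualMachineConfiguration(
            image, nodeAgentSkuId: "batch.node.windows amd64"),
        targetDedicatedComputeNodes: 1);

    pool.Commit();
}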

The Data Factory pipeline structure would look like the one below: two Web activities to retrieve the passwords stored in Azure Key Vault, followed by the custom activity. Azure Key Vault stores the passwords securely, and ADFv2 retrieves them at runtime.

Web Activity Parameter Properties

Property         Value
Secure Output    True
URL              [Your secret URI value]?api-version=7.0
Method           GET
Authentication   Managed Identity
Resource         https://vault.azure.net

Custom Activity

In the Azure Batch tab, select the linked service created from ADFv2 to the Azure Batch account from the drop-down.

In the Settings tab, pick the blob storage linked service and fill in the name of the storage container to which the files were previously uploaded.

Extended Properties

Command
CustomActivities.exe ADF_SFTP_MetadataReader

Property                   Value
outPutTableName            FileReader_MetaData
outPutDbConnectionString   @concat('Data Source=[YourAzureSQLServer].database.windows.net;Initial Catalog=[Your DBName];User ID=[sqlauthUsername];Password=',activity('Web2').output.value,';Connection Timeout=500')
connectionHost             [yoursftphost].com
sftpUserName               [sftpusername]
passWord                   @activity('Web1').output.value
remoteDirectory            [/remote/sftpdir/]
FileCat                    [yourpreferredcat]
FileExt                    []

That should be it. I hope this article serves as a quick-start guide to creating a custom .NET activity in ADFv2.

Special Notes

  1. activity.json is a dynamic file created at runtime based on the extended properties you enter in the ADFv2 custom activity. Omit this file when you upload files to the storage container (see the sketch after this list for how the app reads it).
  2. For debugging your code locally, you can update activity.json and store it inside the Debug folder.
  3. The C# project can be extended to handle multiple use cases in the form of new classes, since the app accepts a class name as a parameter.
  4. You can have [n] number of extended properties based on your requirements.
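
For reference, this is roughly how the console app can pick up those extended properties from activity.json at runtime. A minimal sketch, assuming Newtonsoft.Json; the property names match the table above:

using System.IO;
using Newtonsoft.Json.Linq;

// Azure Batch drops activity.json into the task's working directory;
// the extended properties sit under typeProperties.extendedProperties.
JObject activity = JObject.Parse(File.ReadAllText("activity.json"));
JToken props = activity["typeProperties"]["extendedProperties"];

string connectionHost = (string)props["connectionHost"];
string sftpUserName = (string)props["sftpUserName"];
string password = (string)props["passWord"];
string remoteDirectory = (string)props["remoteDirectory"];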

References

Securing the passwords using Azure Key Vault in Data Factory

Key Vault Overview

Use Azure Key Vault secrets in pipeline activities — Azure Data Factory | Microsoft Docs
