SQL Server: SSIS steps to load CSV from Azure Blob to Azure SQL

I need to connect to a CSV file in an Azure Blob (the source), load the data into an Azure SQL Server table, and then move the CSV file to a different (archive) Azure Blob.

Without Azure, I would create a flat file connection to a local file and then use the Source Assistant to create a Data Flow task.


Any ideas for the other half of the problem: moving the files from the Source blob to an Archive blob after processing them?

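As far as I know, there is no built-in task that does this. Based on my testing, you can use a Script Task and write code (VB or C#) to work with the blobs directly. Here are my detailed steps for reference: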

1) In the Data Flow, use an Azure Blob Source and an OLE DB Destination to load the CSV file from Azure Blob into the Azure SQL database.


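Assuming the source blob and the archive blob are in the same container, add three variables (SourceBlobUrl, ContainerSasToken, ArchiveBlobUrl) and list them as ReadOnlyVariables in the Script Task Editor. For details, see a tutorial on using variables in a Script Task.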
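The code in this answer builds each request URL by concatenating a blob URL and the SAS token, so the token has to be stored with its leading "?". A minimal sketch of a more forgiving way to join the two, assuming a hypothetical helper named SasUrl.Append that is not part of the original answer:

```csharp
using System;

static class SasUrl
{
    // Hypothetical helper: appends a SAS token to a blob URL.
    // Works whether or not the token was stored with a leading "?".
    public static string Append(string blobUrl, string sasToken)
    {
        if (string.IsNullOrEmpty(sasToken))
        {
            return blobUrl;
        }

        // Drop any leading "?" so the separator is added exactly once.
        string token = sasToken.TrimStart('?');

        // Use "&" if the URL already carries a query string, "?" otherwise.
        string separator = blobUrl.Contains("?") ? "&" : "?";
        return blobUrl + separator + token;
    }
}
```

In the Script Task this could stand in for the plain `sourceBlobUrl + sasToken` concatenation, e.g. `SasUrl.Append(sourceBlobUrl, sasToken)`.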

Edit the Main method under ScriptMain.cs as follows:

// Requires "using System.Net;" and "using System.Net.Http;" at the top of ScriptMain.cs,
// and a project reference to the System.Net.Http assembly.
public void Main()
{
    // The SAS token is appended directly to the blob URLs, so it must start with "?".
    string sasToken = Dts.Variables["ContainerSasToken"].Value.ToString();
    string sourceBlobUrl = Dts.Variables["SourceBlobUrl"].Value.ToString();
    string archiveBlobUrl = Dts.Variables["ArchiveBlobUrl"].Value.ToString();

    try
    {
        using (HttpClient client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("x-ms-copy-source", sourceBlobUrl + sasToken);
            // Copy the source blob to the archive blob.
            Dts.Log($"start copying blob from [{sourceBlobUrl}] to [{archiveBlobUrl}]...", 0, new byte[0]);
            // Block on each call: with "async void Main" the method returns to SSIS
            // at the first await, before TaskResult has been set.
            HttpResponseMessage response = client.PutAsync(archiveBlobUrl + sasToken, null).GetAwaiter().GetResult();
            if (response.StatusCode == HttpStatusCode.Accepted || response.StatusCode == HttpStatusCode.Created)
            {
                client.DefaultRequestHeaders.Clear();
                Dts.Log($"start deleting blob [{sourceBlobUrl}]...", 0, new byte[0]);
                // Delete the source blob.
                HttpResponseMessage result = client.DeleteAsync(sourceBlobUrl + sasToken).GetAwaiter().GetResult();
                if (result.StatusCode == HttpStatusCode.Accepted || result.StatusCode == HttpStatusCode.Created)
                {
                    Dts.TaskResult = (int)ScriptResults.Success;
                    return;
                }
            }
            // Either the copy or the delete returned an unexpected status code.
            Dts.TaskResult = (int)ScriptResults.Failure;
        }
    }
    catch (Exception ex)
    {
        Dts.Events.FireError(-1, "Script Task - Move source blob to an archive blob", ex.Message + "\n" + ex.StackTrace, String.Empty, 0);
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}

Result:



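Following Bruce's suggestion, I decided to use the Azure Storage client library. That differs from the solution Bruce proposed, so I am posting my working code for anyone who wants to take this approach.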

I found two good references for this:

http://microsoft-ssis.blogspot.com/2015/10/azure-file-system-task-for-ssis.html

http://cc.davelozinski.com/code/csharp-azure-blob-storage-manager-class

First, in the Script Task, add a reference to the Microsoft.WindowsAzure.Storage assembly (C:\Program Files\Microsoft SDKs\Azure\.NET SDK\v2.9\ToolsRef).

Second, add the namespace references:

    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

Finally, here is the complete Main method for moving a file between Azure storage accounts:

    public void Main()
    {
        // Get parameter values
        string blobFile = Dts.Variables["$Package::NewClientFile"].Value.ToString();
        string containerName = Dts.Variables["$Project::ClientContainer"].Value.ToString();

        // get connections
        string connStrSource = Dts.Connections["azureadhocstorage"].AcquireConnection(Dts.Transaction).ToString();
        string connStrTarget = Dts.Connections["azurearchivestorage"].AcquireConnection(Dts.Transaction).ToString();

        try
        {
            // Retrieve storage accounts from connection string.
            CloudStorageAccount storageAcctSource = CloudStorageAccount.Parse(connStrSource);
            CloudStorageAccount storageAcctTarget = CloudStorageAccount.Parse(connStrTarget);

            // Create the blob clients
            CloudBlobClient blobClientSource = storageAcctSource.CreateCloudBlobClient();
            CloudBlobClient blobClientTarget = storageAcctTarget.CreateCloudBlobClient();

            // Get references to the source and target containers (same container name in both accounts)
            CloudBlobContainer containerSource = blobClientSource.GetContainerReference(containerName);
            CloudBlobContainer containerTarget = blobClientTarget.GetContainerReference(containerName);

            // get blockblob (the files) references
            CloudBlockBlob blobBlockSource = containerSource.GetBlockBlobReference(blobFile);
            CloudBlockBlob blobBlockTarget = containerTarget.GetBlockBlobReference(blobFile);

            // copy the source to the target, waiting for it to finish (it is Asynchronous between separate accounts)
            blobBlockTarget.StartCopy(blobBlockSource);
            while (blobBlockTarget.CopyState.Status == CopyStatus.Pending)
            {
                // not done copying yet, so go to sleep
                System.Threading.Thread.Sleep(100);
                // refresh the copy status
                blobBlockTarget.FetchAttributes();
            }

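            // delete the source only if the copy actually succeeded (it may have failed or been aborted)
            if (blobBlockTarget.CopyState.Status != CopyStatus.Success)
            {
                throw new Exception("Copy of " + blobFile + " ended with status " + blobBlockTarget.CopyState.Status);
            }
            blobBlockSource.Delete();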

            // Show success in log
            bool fireAgain = true;
            Dts.Events.FireInformation(0,"Move Block Blob", containerName +":" + blobFile +" was moved successfully", string.Empty, 0, ref fireAgain);

            // Close Script Task with Success
            Dts.TaskResult = (int)ScriptResults.Success;
        }
        catch (Exception ex)
        {
            // Show Failure in log
            Dts.Events.FireError(0,"Move Block Blob", ex.Message, string.Empty, 0);

            // Close Script Task with Failure
            Dts.TaskResult = (int)ScriptResults.Failure;
        }
    }
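The wait loop above keeps polling for as long as the copy stays Pending, so a stalled copy would hang the package. A minimal sketch of a bounded wait, assuming a hypothetical helper named Poll.Until (not part of the storage library or of the code above); the timeout and interval values in the usage comment are arbitrary placeholders:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class Poll
{
    // Hypothetical helper: calls isDone() repeatedly, sleeping `interval`
    // between calls, until it returns true or `timeout` has elapsed.
    // Returns true if isDone() reported completion, false on timeout.
    public static bool Until(Func<bool> isDone, TimeSpan timeout, TimeSpan interval)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!isDone())
        {
            if (clock.Elapsed >= timeout)
            {
                return false;
            }
            Thread.Sleep(interval);
        }
        return true;
    }
}

// Possible use inside Main, in place of the while loop:
//
//   bool finished = Poll.Until(
//       () =>
//       {
//           blobBlockTarget.FetchAttributes();   // refresh the copy status
//           return blobBlockTarget.CopyState.Status != CopyStatus.Pending;
//       },
//       TimeSpan.FromMinutes(5),
//       TimeSpan.FromMilliseconds(100));
//   if (!finished) throw new TimeoutException("Blob copy did not finish in time");
```

Returning a bool rather than throwing leaves it to the caller to decide whether a timeout should fail the task.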