SSIS steps to load CSV from Azure blob to Azure SQL
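I need to connect to a CSV file in Azure Blob storage (the source), load its data into an Azure SQL Server table, and then move the CSV file to another (archive) Azure blob.
Without Azure, I would create a Flat File connection to the local file and then use the Source Assistant to create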
Any ideas for the other half of the problem: moving the files from the Source blob to an Archive blob after processing them?
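As far as I know, there is no built-in task that does this. Based on my testing, you could use a Script Task and write code (VB or C#) that works with the blobs directly. Here are my detailed steps for reference:
1) Load the CSV file from Azure Blob into the Azure SQL database using an Azure Blob Source and an OLE DB Destination in a Data Flow.
2) After the CSV data has been loaded into the SQL table successfully, use a Script Task to move the source blob to the archive blob.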
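I call the Blob service REST API operations Copy Blob and Delete Blob with a container SAS token. You can use Microsoft Azure Storage Explorer and follow this official tutorial to generate a SAS token for your blob container.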
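The two REST calls can be sketched as plain `HttpRequestMessage` objects, which could then be sent with `HttpClient.SendAsync`. This is a minimal sketch, assuming the SAS token is a query string such as `?sv=...&sig=...` (with or without the leading `?`); the names `BlobRestRequests`, `WithSas`, `CopyBlob` and `DeleteBlob` are hypothetical. Copy Blob is a `PUT` on the destination URL with the source named in the `x-ms-copy-source` header; Delete Blob is a `DELETE` on the source URL.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: builds the Copy Blob and Delete Blob requests without sending them.
public static class BlobRestRequests
{
    // Appends a SAS query string to a blob URL, accepting a token with or without a leading '?'.
    public static string WithSas(string blobUrl, string sasToken)
    {
        string query = (sasToken ?? string.Empty).TrimStart('?');
        if (query.Length == 0) return blobUrl;
        return blobUrl + (blobUrl.Contains("?") ? "&" : "?") + query;
    }

    // Copy Blob: PUT on the destination; the source (readable through the SAS) goes in x-ms-copy-source.
    public static HttpRequestMessage CopyBlob(string sourceBlobUrl, string archiveBlobUrl, string sasToken)
    {
        var request = new HttpRequestMessage(HttpMethod.Put, WithSas(archiveBlobUrl, sasToken));
        request.Headers.Add("x-ms-copy-source", WithSas(sourceBlobUrl, sasToken));
        // Copy Blob has no request body; empty content makes Content-Length: 0 explicit.
        request.Content = new ByteArrayContent(new byte[0]);
        return request;
    }

    // Delete Blob: DELETE on the source blob.
    public static HttpRequestMessage DeleteBlob(string sourceBlobUrl, string sasToken)
    {
        return new HttpRequestMessage(HttpMethod.Delete, WithSas(sourceBlobUrl, sasToken));
    }
}
```

Putting the `x-ms-copy-source` header on the individual request, rather than on `HttpClient.DefaultRequestHeaders`, means there is nothing to clear before the delete call.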
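Assuming the source blob and the archive blob are in the same container, add three variables, ContainerSasToken, SourceBlobUrl, and ArchiveBlobUrl, and list them in the Script Task's ReadOnlyVariables.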
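Since both blobs live in the same container, one hypothetical way to fill ArchiveBlobUrl is to derive it from SourceBlobUrl by putting the blob under a virtual "archive" folder. This is only a sketch: the `ArchiveUrls.ToArchiveUrl` name and the folder layout are assumptions, and it expects a URL of the form `https://<account>.blob.core.windows.net/<container>/<blob name>` with no query string (the SAS is kept in its own variable).

```csharp
using System;

// Hypothetical helper: derives an archive blob URL in the same container by
// prefixing the blob name with a virtual folder, e.g.
// https://acct.blob.core.windows.net/data/in.csv -> https://acct.blob.core.windows.net/data/archive/in.csv
public static class ArchiveUrls
{
    public static string ToArchiveUrl(string sourceBlobUrl, string archiveFolder)
    {
        // Skip past "scheme://host" to find the "/<container>/" segment.
        int schemeEnd = sourceBlobUrl.IndexOf("://", StringComparison.Ordinal);
        if (schemeEnd < 0) throw new ArgumentException("Not an absolute URL", nameof(sourceBlobUrl));
        int containerStart = sourceBlobUrl.IndexOf('/', schemeEnd + 3);
        int nameStart = containerStart < 0 ? -1 : sourceBlobUrl.IndexOf('/', containerStart + 1);
        if (nameStart < 0) throw new ArgumentException("URL has no blob name", nameof(sourceBlobUrl));

        // Insert "<archiveFolder>/" right after "/<container>/"; blob names may themselves contain '/'.
        return sourceBlobUrl.Substring(0, nameStart + 1)
            + archiveFolder.Trim('/') + "/"
            + sourceBlobUrl.Substring(nameStart + 1);
    }
}
```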
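Click the Edit Script button in the Script Task Editor to launch the VSTA development environment, where you write your custom script. Here is the code: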
```csharp
// Requires at the top of ScriptMain.cs:
//   using System.Net;
//   using System.Net.Http;
// plus a reference to the System.Net.Http assembly.
public void Main()
{
    // The SAS token is expected to start with "?" so it can be appended to the blob URLs
    string sasToken = Dts.Variables["ContainerSasToken"].Value.ToString();
    string sourceBlobUrl = Dts.Variables["SourceBlobUrl"].Value.ToString();
    string archiveBlobUrl = Dts.Variables["ArchiveBlobUrl"].Value.ToString();

    try
    {
        using (HttpClient client = new HttpClient())
        {
            // Copy Blob: PUT on the archive URL, naming the source blob in x-ms-copy-source
            client.DefaultRequestHeaders.Add("x-ms-copy-source", sourceBlobUrl + sasToken);
            Dts.Log($"start copying blob from [{sourceBlobUrl}] to [{archiveBlobUrl}]...", 0, new byte[0]);

            // Block until the call completes: an async void Main would return to SSIS
            // (and the task would be reported as finished) before the copy and delete ran
            HttpResponseMessage response = client.PutAsync(archiveBlobUrl + sasToken, null).GetAwaiter().GetResult();
            if (response.StatusCode == HttpStatusCode.Accepted || response.StatusCode == HttpStatusCode.Created)
            {
                // Copy Blob also returns 202 while x-ms-copy-status is still "pending";
                // within one account the copy normally completes immediately
                client.DefaultRequestHeaders.Clear();
                Dts.Log($"start deleting blob [{sourceBlobUrl}]...", 0, new byte[0]);

                // Delete Blob: returns 202 Accepted on success
                HttpResponseMessage result = client.DeleteAsync(sourceBlobUrl + sasToken).GetAwaiter().GetResult();
                if (result.StatusCode == HttpStatusCode.Accepted || result.StatusCode == HttpStatusCode.Created)
                {
                    Dts.TaskResult = (int)ScriptResults.Success;
                    return;
                }
            }
        }
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
    catch (Exception ex)
    {
        Dts.Events.FireError(-1, "Script Task - Move source blob to an archive blob",
            ex.Message + Environment.NewLine + ex.StackTrace, String.Empty, 0);
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}
```
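Copy Blob answers 202 Accepted both when the copy has finished and when it is still running; the `x-ms-copy-status` response header (`success` or `pending`) tells the two apart, and deleting the source while the copy is pending risks a failed copy. A stricter check before the delete call could look like the sketch below; the `CopyBlobResult.IsCopyComplete` name is hypothetical, and a `pending` result would need polling (for example with Get Blob Properties) rather than an immediate delete.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;

// Hypothetical helper: decides from a Copy Blob response whether the source can be deleted.
public static class CopyBlobResult
{
    // True only when the service accepted the copy AND reports it as already finished.
    public static bool IsCopyComplete(HttpResponseMessage response)
    {
        if (response.StatusCode != HttpStatusCode.Accepted) return false;

        IEnumerable<string> values;
        if (!response.Headers.TryGetValues("x-ms-copy-status", out values)) return false;

        return string.Equals(values.FirstOrDefault(), "success", StringComparison.OrdinalIgnoreCase);
    }
}
```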
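Alternatively, you could use the Microsoft Azure Storage Client Library for .NET to access the blobs. In that case you need to load an assembly that is not in the GAC into the SSIS Script Task; for more details, see this official blog.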
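Following Bruce's suggestion, I decided to use the Azure Storage Client Library. Because this differs from the solution in Bruce's code, I'm posting my working code for anyone who wants to take this approach.
I found two good references for this: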
http://microsoft-ssis.blogspot.com/2015/10/azure-file-system-task-for-ssis.html
http://cc.davelozinski.com/code/csharp-azure-blob-storage-manager-class
First, in the Script Task, add a reference to
Second, add the namespace imports:
```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
```
Finally, here is the complete Main method for moving files between Azure storage accounts:
```csharp
public void Main()
{
    // Get parameter values
    string blobFile = Dts.Variables["$Package::NewClientFile"].Value.ToString();
    string containerName = Dts.Variables["$Project::ClientContainer"].Value.ToString();

    // Get the connection strings from the two connection managers
    string connStrSource = Dts.Connections["azureadhocstorage"].AcquireConnection(Dts.Transaction).ToString();
    string connStrTarget = Dts.Connections["azurearchivestorage"].AcquireConnection(Dts.Transaction).ToString();

    try
    {
        // Retrieve storage accounts from the connection strings
        CloudStorageAccount storageAcctSource = CloudStorageAccount.Parse(connStrSource);
        CloudStorageAccount storageAcctTarget = CloudStorageAccount.Parse(connStrTarget);

        // Create the blob clients
        CloudBlobClient blobClientSource = storageAcctSource.CreateCloudBlobClient();
        CloudBlobClient blobClientTarget = storageAcctTarget.CreateCloudBlobClient();

        // Get references to the source and target containers (same name in both accounts)
        CloudBlobContainer containerSource = blobClientSource.GetContainerReference(containerName);
        CloudBlobContainer containerTarget = blobClientTarget.GetContainerReference(containerName);

        // Get block blob (file) references
        CloudBlockBlob blobBlockSource = containerSource.GetBlockBlobReference(blobFile);
        CloudBlockBlob blobBlockTarget = containerTarget.GetBlockBlobReference(blobFile);

        // Copy the source to the target and wait for it to finish (the copy is asynchronous
        // between separate accounts). The target service can only read a private source
        // blob in another account if the source is public or its URI carries a SAS.
        blobBlockTarget.StartCopy(blobBlockSource);
        while (blobBlockTarget.CopyState.Status == CopyStatus.Pending)
        {
            // Not done copying yet, so sleep briefly
            System.Threading.Thread.Sleep(100);
            // Refresh the copy status
            blobBlockTarget.FetchAttributes();
        }

        // Only delete the source if the copy actually succeeded
        if (blobBlockTarget.CopyState.Status != CopyStatus.Success)
        {
            throw new InvalidOperationException("Copy of " + containerName + ":" + blobFile +
                " ended with status " + blobBlockTarget.CopyState.Status);
        }

        // Delete the source
        blobBlockSource.Delete();

        // Show success in log
        bool fireAgain = true;
        Dts.Events.FireInformation(0, "Move Block Blob", containerName + ":" + blobFile + " was moved successfully", string.Empty, 0, ref fireAgain);
        // Close Script Task with Success
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    catch (Exception ex)
    {
        // Show Failure in log
        Dts.Events.FireError(0, "Move Block Blob", ex.Message, string.Empty, 0);
        // Close Script Task with Failure
        Dts.TaskResult = (int)ScriptResults.Failure;
    }
}
```
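The `while (... == CopyStatus.Pending)` loop above has no upper bound, so a stuck copy would hang the package. A bounded variant can be factored into a small generic helper; this is a sketch, and the `Polling.WaitWhile` name and its parameters are hypothetical rather than part of the storage library.

```csharp
using System;
using System.Threading;

// Hypothetical helper: bounded version of the "while pending, sleep and refresh" loop.
public static class Polling
{
    // Calls poll() until isPending(result) is false or the timeout elapses,
    // sleeping 'interval' between calls; returns the last polled value.
    public static T WaitWhile<T>(Func<T> poll, Func<T, bool> isPending, TimeSpan interval, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        T current = poll();
        while (isPending(current) && DateTime.UtcNow < deadline)
        {
            Thread.Sleep(interval);
            current = poll();
        }
        return current;
    }
}
```

In the Main method above, the loop could become `Polling.WaitWhile(() => { blobBlockTarget.FetchAttributes(); return blobBlockTarget.CopyState.Status; }, s => s == CopyStatus.Pending, TimeSpan.FromMilliseconds(100), TimeSpan.FromMinutes(10))`, where the 10-minute limit is an arbitrary choice. A result still equal to `CopyStatus.Pending` means the wait timed out, so the source blob should be left in place.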