Azure Data Factory (ADF) is a fully managed data integration service in Azure that lets you iteratively build, orchestrate, and monitor your Extract-Transform-Load (ETL) workflows. In a typical data integration process, you will need to periodically clean up files from on-premises or cloud storage once they become out of date. For example, you may have a staging area or landing zone, an intermediate storage area used for data processing during your ETL process. The data staging area sits between the data source stores and the data destination store. Given that the data in staging areas is transient by nature, you need to periodically clean it up after the ETL process has completed.
We are excited to share the ADF built-in Delete activity, which can be part of your ETL workflow to delete undesired files without writing code. You can use ADF to delete folders or files from Azure Blob Storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, File System, FTP server, SFTP server, and Amazon S3.
To get started, you can find the Delete activity under the “Move & Transform” section in the ADF authoring UI.
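For illustration, here is a minimal sketch of what a Delete activity can look like in pipeline JSON. The dataset name StagingFolderDataset, the linked service LogStorageLinkedService, and the log path are hypothetical placeholders, and the // annotations are for readability only (actual pipeline definitions must be plain JSON):

```json
{
    "name": "CleanUpStagingFolder",
    "type": "Delete",
    "typeProperties": {
        // Hypothetical dataset pointing at the staging folder to clean up
        "dataset": {
            "referenceName": "StagingFolderDataset",
            "type": "DatasetReference"
        },
        // Also delete files in subfolders
        "recursive": true,
        // Record the names of deleted files and folders in a CSV log file
        "enableLogging": true,
        "logStorageSettings": {
            "linkedServiceName": {
                "referenceName": "LogStorageLinkedService",
                "type": "LinkedServiceReference"
            },
            "path": "staging/delete-logs"
        }
    }
}
```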
1. You can choose to delete individual files or an entire folder. The names of the deleted files and folders can be logged in a CSV file.
2. The file or folder name to be deleted can be parameterized, giving you the flexibility to control the behavior of the Delete activity in your data integration flow (see the dataset sketch after this list).
3. You can delete only expired files rather than all the files in a folder. For example, you may want to delete only the files that were last modified more than 30 days ago.
4. You can start from the ADF template gallery to quickly deploy common use cases involving the Delete activity.
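As a rough sketch of points 2 and 3 combined, the hypothetical Azure Blob dataset below takes the folder path as a parameter and uses the modifiedDatetimeEnd property so that a Delete activity referencing it removes only files last modified more than 30 days ago. Property names follow the Azure Blob dataset schema; verify the details against the documentation for your connector (// annotations again for readability only):

```json
{
    "name": "ExpiredStagingFiles",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "StagingBlobLinkedService",
            "type": "LinkedServiceReference"
        },
        // Hypothetical parameter: the calling pipeline supplies the folder to clean
        "parameters": {
            "FolderToClean": { "type": "String" }
        },
        "typeProperties": {
            "folderPath": {
                "value": "@dataset().FolderToClean",
                "type": "Expression"
            },
            // Match only files last modified before 30 days ago
            "modifiedDatetimeEnd": {
                "value": "@adddays(utcnow(), -30)",
                "type": "Expression"
            }
        }
    }
}
```

A Delete activity can then reference this dataset and pass the folder from a pipeline parameter, for example "FolderToClean": "@pipeline().parameters.StagingFolder", so a single pipeline can clean up different staging folders on a schedule.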
You are encouraged to give these additions a try and provide us with feedback. We hope you find them helpful in your scenarios. Please post your questions on the Azure Data Factory forum or share your thoughts with us on the Data Factory feedback site.
Source: Azure