Saisree
Company Background
Global Freight Forwarders is a logistics company responsible for managing international shipping operations across multiple regions. The company generates daily logistics events such as shipment status updates, carrier movements, and delivery confirmations in JSON format.
Problem Statement
The organization required an efficient and scalable way to ingest logistics data incrementally into Microsoft Fabric without reprocessing historical files. Full reloads were costly, slow, and prone to duplication.
Challenges:
• Full data reloads consumed excessive compute resources
• Historical files were reprocessed unnecessarily
• Data duplication occurred frequently
• Pipeline performance degraded as data volume grew
Objectives
• Implement watermark-based incremental ingestion
• Avoid reprocessing historical files
• Improve pipeline performance and reliability
• Maintain data consistency with Delta tables
Design
• Source JSON files in Lakehouse Files
• Watermark control table tracks last processed timestamp
• Lookup activity reads the current watermark value
• Copy Data activity filters and loads only new records
• Notebook activity updates the watermark after a successful load
• Bronze_ShippingLogs Delta table stores the ingested data

Execution
Watermark Table Setup
What was implemented:
A Watermark Control Table was created using Spark SQL to store the last successfully processed timestamp for each data source (a sketch follows below).
Purpose:
To enable controlled and reliable incremental data ingestion.
Highlights:
- Stores Table Name and Watermark Value columns
- Initialized with a baseline timestamp
- Supports tracking across multiple source entities
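
A minimal sketch of this setup in a Fabric notebook, assuming illustrative names (Watermark_Control, TableName, WatermarkValue), since the exact schema is not shown:

# Runs in a Fabric notebook, where `spark` (the SparkSession) is preconfigured.
# Table and column names are illustrative assumptions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS Watermark_Control (
        TableName      STRING,    -- source entity, e.g. 'ShippingLogs'
        WatermarkValue TIMESTAMP  -- last successfully processed timestamp
    ) USING DELTA
""")

# Initialize with a baseline timestamp so the first run picks up all records.
spark.sql("""
    INSERT INTO Watermark_Control
    VALUES ('ShippingLogs', TIMESTAMP '1900-01-01 00:00:00')
""")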
Lookup & Copy Activity Configuration
What was implemented:
A Lookup activity was configured to fetch the current watermark value, followed by a Copy Data activity to perform incremental loading.
Configuration details:
- Lookup retrieves the latest watermark from the control table
- Copy activity applies the filter: LogTimestamp > WatermarkValue
- Incremental records are loaded into the Bronze_ShippingLogs table
Result:
Only newly generated records are processed during each pipeline run.
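
For illustration, the equivalent lookup-and-filter logic expressed as PySpark (the pipeline itself performs this with Lookup and Copy Data activities; the file path and column names are assumptions):

from pyspark.sql.functions import col, lit

# 1. Look up the current watermark for the Shipping Logs source.
watermark = spark.sql(
    "SELECT WatermarkValue FROM Watermark_Control WHERE TableName = 'ShippingLogs'"
).first()["WatermarkValue"]

# 2. Read the source JSON files, keeping only records newer than the watermark.
new_records = (
    spark.read.json("Files/shipping_logs/")            # assumed Lakehouse Files path
         .filter(col("LogTimestamp") > lit(watermark)) # LogTimestamp > WatermarkValue
)

# 3. Append the incremental slice to the Bronze Delta table.
new_records.write.format("delta").mode("append").saveAsTable("Bronze_ShippingLogs")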
Incremental Load Validation
What was implemented:
The Bronze_ShippingLogs table was queried post-execution to validate the incremental load behavior.
Validation outcome:
- Only new records were ingested
- No duplication from previous runs
- Data consistency and integrity preserved (example checks below)
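
Example validation queries (ShipmentId is an assumed natural-key column; adjust to the actual schema):

# Confirm the ingested range: earliest/latest timestamps and row count.
spark.sql("""
    SELECT MIN(LogTimestamp) AS earliest,
           MAX(LogTimestamp) AS latest,
           COUNT(*)          AS row_count
    FROM Bronze_ShippingLogs
""").show()

# Duplicate check: any (ShipmentId, LogTimestamp) pair appearing more than once
# would indicate records reprocessed from a previous run.
spark.sql("""
    SELECT ShipmentId, LogTimestamp, COUNT(*) AS occurrences
    FROM Bronze_ShippingLogs
    GROUP BY ShipmentId, LogTimestamp
    HAVING COUNT(*) > 1
""").show()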
Watermark Update Notebook Setup
What was implemented:
A Notebook activity was added to update the watermark table after a successful data load.
Configuration details:
- Workspace: Project Workspace – Group B
- Notebook: Update_Watermark
- Triggered only after Copy activity success
Result:
Watermark values are updated automatically after each successful run.
Pipeline Timestamp Parameterization
What was implemented:
The pipeline trigger time was passed as a parameter to the notebook.
Parameter details:
- Name: pipelinetimestamp_va
- Type: String
- Value: @pipeline().TriggerTime
Result:
Notebook receives the exact execution timestamp for accurate watermark updates.
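
In a Fabric notebook the value is typically received through a parameters cell, whose default is overridden by the pipeline at run time; a minimal sketch:

# Cell tagged as the notebook's "parameters" cell.
# The pipeline overrides this default with @pipeline().TriggerTime.
pipelinetimestamp_va = "1900-01-01T00:00:00Z"  # placeholder default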
Notebook Watermark Update Logic
What was implemented:
Spark SQL logic was developed inside the notebook to update the watermark table.
Key aspects:
- Uses parameterized SQL for security (see the sketch below)
- Updates only the relevant Shipping Logs entry
- Atomic operation ensures data consistency
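
A sketch of that update, assuming the control-table names used earlier and Spark 3.4+ named-parameter support, so the timestamp is bound as a value rather than concatenated into the SQL string:

# Parameterized Spark SQL update; a Delta UPDATE commits as a single transaction.
spark.sql(
    """
    UPDATE Watermark_Control
    SET    WatermarkValue = CAST(:ts AS TIMESTAMP)
    WHERE  TableName = 'ShippingLogs'   -- update only the Shipping Logs entry
    """,
    args={"ts": pipelinetimestamp_va},
)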
Successful End-to-End Pipeline Execution
What was implemented:
The complete pipeline was executed and monitored for successful completion.
Outcome:
- End-to-end incremental ingestion validated
- Lookup, Copy, and Notebook activities executed successfully
- Pipeline Run ID captured for monitoring and audit
