

Equifax is where you can power your possible. If you want to achieve your true potential, chart new paths, develop new skills, collaborate with bright minds, and make a meaningful impact, we want to hear from you.
Synopsis of the role
As a Lead Data Engineer, you will serve as the technical backbone of our data ecosystem, spearheading the design and implementation of high-performance data architectures using Azure Databricks and PySpark. You will be responsible for orchestrating complex, scalable ETL/ELT pipelines within Azure Data Factory, ensuring seamless data integration. By leveraging your mastery of SQL and distributed computing, you will optimize large-scale datasets to drive advanced analytics and business intelligence initiatives.
What you’ll do
What experience you need
What could set you apart
Databricks SQL & Serverless: Experience migrating traditional SQL workloads to Databricks SQL Warehouses to reduce latency and overhead.
Delta Live Tables (DLT): Proven ability to implement declarative data pipelines that handle task orchestration, monitoring, and quality constraints automatically.
Liquid Clustering: Mastery of the latest replacement for Z-Ordering to optimize data layout and query performance without manual partition management.
Terraform/Bicep: Experience deploying entire Azure environments (Resource Groups, Storage Accounts, Databricks Workspaces) using IaC to ensure environment parity across Dev, QA, and Production.
Unit Testing for Spark: Experience using frameworks like pytest or chispa to validate PySpark logic, ensuring a robust CI/CD cycle rather than "testing in production."
Unity Catalog Migration: Experience leading a migration from legacy system to Unity Catalog, including managing Identity Federation and Cross-Workspace sharing.
Fine-Grained Security: Implementation of Row-Level Security (RLS) and Column-Level Masking directly within Databricks to comply with strict privacy regulations (GDPR/CCPA).
Structured Streaming: Building production-grade, low-latency pipelines that process data in real-time or near-real-time from Azure Event Hubs or IoT Hubs.
Change Data Capture (CDC): Implementing efficient CDC patterns (using tools like Debezium or ADF's built-in CDC) to sync on-premise relational databases with the Delta Lake in near-real-time.
DBU & Cost Management: A track record of implementing Databricks Cluster Policies and tagging strategies to monitor and reduce DBU (Databricks Unit) consumption.
Optimization of ADF Triggers: Knowledge of when to use "Tumbling Window" vs. "Schedule" triggers and optimizing Integration Runtimes to minimize execution costs.
#India
We offer a hybrid work setting, comprehensive compensation and healthcare packages, attractive paid time off, and organizational growth potential through our online learning platform with guided career tracks.
Are you ready to power your possible? Apply today, and get started on a path toward an exciting new career at Equifax, where you can make a difference!
Primary Location:
IND-Bangalore-Equifax-Analytics
Function:
Function - Data and Analytics
Schedule:
Full time