Sharad Sharma

About

Pipelines at scale,
reliability by design

A Senior Data Engineer with 9+ years of hands-on experience building highly scalable, distributed big data applications. I design enterprise-grade platforms end to end — from Bronze-Silver-Gold pipeline architectures to real-time ETL monitoring — with deep expertise in PySpark, Airflow, and AWS (EMR, Redshift, S3).

Previously, I built the Data Science Enablement Platform (DEEP) at Optum/UnitedHealth Group, processing 5+ TB daily across a 100 TB cluster with 750+ nodes. At TCS, I created the ExaLogs analytics platform — an org-wide monitoring dashboard that won the Applause Award.

High-Scale Pipelines

34 Airflow DAGs managing 45B+ records per month across global markets

Cost Optimization

71% infrastructure cost reduction through AWS-native migration

Performance

54% Redshift query latency reduction via archival framework

Experience

Where I've made an impact

Data Engineer II

Amazon India (Fintech)

Gurugram, Haryana

Project: Fintech TDW • Team Size: 11

Apr 2022 — Present

Own the tax data platform for AWS-Global (15B+ records/month, $12B+ revenue, 15 seller-of-record countries) and US-Retail (30B+ records/month, $45B+ revenue, ~1B records/day at peak)
Designed a Bronze → Silver → Gold pipeline architecture with backfill support and data completeness scoring; built the full stack independently from ingestion through reporting
Manage 34 Airflow DAGs — 18 for Global (including 8 reconciliation DAGs with business rules) and 16 for US-Retail; handle revenue/tax reconciliation across OFA (AP, AR, GL), ATARs, and TAR
Coordinate across multiple AWS accounts to set up the internal connections needed for cross-account data processing
Migrated legacy reporting to native AWS (Airflow + EMR + Redshift), improving monthly SLA from Day 8 to Day 5 and cutting infra costs by 71%
Built an archival framework that reduced Redshift query latency by 54%; delivered 9 QuickSight dashboards and real-time ETL monitoring, saving compliance teams ~1,200 hours/year
Enforced security best practices: Secrets Manager credential rotation, least-privilege IAM policies, restricted prod console access, and CloudTrail logging for all manual interventions

PySpark • Airflow • EMR • Redshift • S3 • Athena • Glue • Lambda • CloudWatch • QuickSight • Lake Formation • CDK

Senior Data Engineer

To The New Pvt Ltd

Noida, Uttar Pradesh

Jan 2022 — Apr 2022

Worked with an auction client on trade-analysis workflows across multiple technology stacks
Created the initial onboarding documentation as the first To The New engineer assigned to this client
Independently drove requirements gathering, planning, and a 6-month delivery roadmap, working directly with the customer

Airflow • DBT • SQL • Spark SQL

Associate Data Analyst (Data Engineer)

Optum Global Solutions (UnitedHealth Group)

Noida, Uttar Pradesh

Project: Data Science Enablement Platform (DEEP) • Team Size: 24

Aug 2020 — Jan 2022

Built data pipelines to ingest data from multiple sources (DB2, Oracle, SQL Server, PostgreSQL, CSV/Excel) into the Big Data platform (HDFS/Hive)
Developed a complete end-to-end automation framework for Analytics, Data Science, Hive, Python, and Java processes — scheduled execution with QC reports and automated mailers
Managed 50+ processes and 30+ ingestion jobs on scheduled frequencies; built a single-screen monitoring framework with daily summary reports
Applied Hive optimization techniques for faster query execution across the platform
Built a Prod-to-NonProd data transfer pipeline with quality-check reports and automated notifications
Created trigger-based scripts for on-demand job execution using Shell Scripting
Cluster specs: 100 TB allocated, 750+ data nodes, 120 V-Cores, 2 edge nodes, Oozie scheduler. Daily ingestion: 5+ TB

Hive • Sqoop • Spark • Python • R • SAS • Shell Scripting • Oozie • Java

System Engineer

Tata Consultancy Services Limited

Thane, Maharashtra

Project: TCS Analytics (ExaLogs) • Team Size: 15 • Role: Hadoop Developer

Sep 2016 — Apr 2020

Wrote and maintained Hive UDFs and UDAFs for complex business-logic transformations
Worked across the full Hadoop ecosystem: Sqoop, Hive, Pig, Phoenix, HBase, HDFS, MapReduce, and Flume
Developed an analytics application with Angular Charts and D3.js on the frontend and Java + PostgreSQL on the backend — adopted organization-wide for production deployment tracking
Led a team of 5 associates working on similar technologies
Implemented Oozie as the automation/scheduling tool for Hadoop — first adoption within the team
Created flowcharts, solution documents, and communication frameworks for stakeholder updates
Used JUnit for unit testing, with the Cobertura plugin for code coverage reporting

Hadoop • Hive • Sqoop • Pig • Phoenix • HBase • MapReduce • Flume • Angular • D3.js • Java • Spring • PostgreSQL • jQuery • Shell • JUnit

Projects

Key projects

Amazon Fintech 2022 — Present

Fintech TDW

Tax Data Warehouse powering revenue reconciliation across AWS-Global and US-Retail markets.

45B+ Records/Month

71% Cost Reduction

34 Airflow DAGs

PySpark • Airflow • EMR • Redshift • QuickSight

Optum / UHG 2020 — 2022

DEEP — Data Science Enablement Platform

End-to-end automation framework for analytics and data science workflows on a 100 TB Hadoop cluster.

5+ TB Daily Ingestion

750+ Data Nodes

50+ Pipelines

Hive • Sqoop • Spark • Python • Oozie

TCS 2016 — 2020

ExaLogs — TCS Analytics Platform

Full-stack analytics dashboard adopted organization-wide for production deployment tracking. Won the Applause Award.

Org-wide Adoption

15 Team Size

5 Direct Reports

Hadoop • Angular • D3.js • Java • PostgreSQL

Technical expertise

Languages

Big Data & Processing

AWS Cloud

Orchestration & Modeling

Databases & Tools

Frontend & Visualization

B.Tech, Computer Science

Certifications & Awards

Get in touch
