Data Engineer II @ Amazon • Gurgaon, India

Sharad Sharma

I build large-scale data platforms that power billion-dollar decisions. I currently own tax data pipelines at Amazon Fintech that process 45B+ records/month across US and Global markets.

9+ Years Experience
45B+ Records/Month
$57B+ Revenue Supported

Pipelines at scale, reliability by design

I am a Senior Data Engineer with 9+ years of hands-on experience building highly scalable, distributed big data applications. I design enterprise-grade platforms end to end, from Bronze → Silver → Gold pipeline architectures to real-time ETL monitoring, with deep expertise in PySpark, Airflow, and AWS (EMR, Redshift, S3).

Previously, I built the Data Science Enablement Platform (DEEP) at Optum/UnitedHealth Group, processing 5+ TB daily across a 100 TB cluster with 750+ nodes. At TCS, I created the ExaLogs analytics platform — an org-wide monitoring dashboard that won the Applause Award.

High-Scale Pipelines

34 Airflow DAGs managing 45B+ records across global markets

Cost Optimization

71% infrastructure cost reduction through AWS-native migration

Performance

54% Redshift query latency reduction via archival framework

Where I've made an impact

Data Engineer II

Amazon India (Fintech)

Gurugram, Haryana

Project: Fintech TDW • Team Size: 11

Apr 2022 — Present
  • Own the tax data platform for AWS-Global (15B+ records/month, $12B+ revenue, 15 seller-of-record countries) and US-Retail (30B+ records/month, $45B+ revenue, ~1B records/day at peak)
  • Designed a Bronze → Silver → Gold pipeline architecture with backfill support and data completeness scoring; built the full stack independently from ingestion through reporting
  • Manage 34 Airflow DAGs — 18 for Global (including 8 reconciliation DAGs with business rules) and 16 for US-Retail; handle revenue/tax reconciliation across OFA (AP, AR, GL), ATARs, and TAR
  • Coordinate with the owners of multiple AWS accounts to set up the internal connections needed for cross-account data processing
  • Migrated legacy reporting to native AWS (Airflow + EMR + Redshift), improving monthly SLA from Day 8 to Day 5 and cutting infra costs by 71%
  • Built an archival framework that reduced Redshift query latency by 54%; delivered 9 QuickSight dashboards and real-time ETL monitoring, saving compliance teams ~1,200 hours/year
  • Enforced security best practices: Secrets Manager credential rotation, least-privilege IAM policies, restricted prod console access, and CloudTrail logging for all manual interventions
PySpark • Airflow • EMR • Redshift • S3 • Athena • Glue • Lambda • CloudWatch • QuickSight • Lake Formation • CDK

Senior Data Engineer

To The New Pvt Ltd

Noida, Uttar Pradesh

Jan 2022 — Apr 2022
  • Worked with an auction client on trade-analysis workflows across multiple technology stacks
  • Created the initial onboarding documentation as the first To The New engineer assigned to this client
  • Independently drove requirements gathering, planning, and a 6-month delivery roadmap, working directly with the customer
Airflow • DBT • SQL • Spark SQL

Associate Data Analyst (Data Engineer)

Optum Global Solutions (UnitedHealth Group)

Noida, Uttar Pradesh

Project: Data Science Enablement Platform (DEEP) • Team Size: 24

Aug 2020 — Jan 2022
  • Built data pipelines to ingest data from multiple sources (DB2, Oracle, SQL Server, PostgreSQL, CSV/Excel) into the Big Data platform (HDFS/Hive)
  • Developed an end-to-end automation framework for Analytics, Data Science, Hive, Python, and Java processes, covering scheduled execution, QC reports, and automated mailers
  • Managed 50+ processes and 30+ ingestion jobs on scheduled frequencies; built a single-screen monitoring framework with daily summary reports
  • Applied Hive optimization techniques for faster query execution across the platform
  • Built a Prod-to-NonProd data transfer pipeline with quality-check reports and automated notifications
  • Wrote trigger-based shell scripts for on-demand job execution
  • Cluster specs: 100 TB allocated, 750+ data nodes, 120 V-Cores, 2 edge nodes, Oozie scheduler. Daily ingestion: 5+ TB
Hive • Sqoop • Spark • Python • R • SAS • Shell Scripting • Oozie • Java

System Engineer

Tata Consultancy Services Limited

Thane, Maharashtra

Project: TCS Analytics (ExaLogs) • Team Size: 15 • Role: Hadoop Developer

Sep 2016 — Apr 2020
  • Wrote and maintained Hive UDFs and UDAFs for complex business-logic transformations
  • Worked across the full Hadoop ecosystem: Sqoop, Hive, Pig, Phoenix, HBase, HDFS, MapReduce, and Flume
  • Developed an analytics application with Angular Charts and D3.js on the frontend and Java + PostgreSQL on the backend — adopted organization-wide for production deployment tracking
  • Led a team of 5 associates working on the same technology stack
  • Introduced Oozie as the scheduling and automation tool for Hadoop jobs, its first adoption within the team
  • Created flowcharts, solution documents, and communication frameworks for stakeholder updates
  • Used JUnit for unit testing, with the Cobertura plugin for code coverage reporting
Hadoop • Hive • Sqoop • Pig • Phoenix • HBase • MapReduce • Flume • Angular • D3.js • Java • Spring • PostgreSQL • jQuery • Shell • JUnit

Technical expertise

Languages

Python • SQL • Java • Scala • Shell Scripting • PySpark

Big Data & Processing

Apache Spark • Hadoop • Hive • Sqoop • MapReduce

AWS Cloud

EMR • S3 • Redshift • Glue • Athena • Lambda • CloudWatch • QuickSight • Lake Formation • CDK • IAM

Orchestration & Modeling

Apache Airflow • DBT • Oozie • Dimensional Modeling • SCD • Data Warehouse Design

Databases & Tools

PostgreSQL • Oracle • HBase • Git • CI/CD

Frontend & Visualization

Angular • D3.js • HTML/CSS • jQuery

B.Tech, Computer Science

G.L. Bajaj Institute of Technology and Management

Greater Noida, UP — 2016 • 78%

Certifications & Awards

  • Lean Six Sigma Green Belt
  • Apache Spark with Scala — Udemy
  • Apache Airflow (Hands-On) — Udemy
  • Developer of the Month — Optum/UHG
  • Applause Award — TCS

Get in touch

Happy to chat about data engineering, pipeline architecture, or opportunities.

Gurgaon, India