Ref: 7136

Data Engineer

USA, New York

Job description

The Data Engineer builds and maintains the platform that makes data accessible for decision-making. The role focuses on making it simple for end users to answer three key questions: What happened in the past? What is happening now? What will happen in the future?

Job Functions (including, but not limited to, the following):
  • Build and support end-user experiences for experimentation, data discovery, and business intelligence reporting.
  • Operate and manage the data platform efficiently, consistently, and reliably.
  • Build tools that other teams can use to promote consistency and reliability across the platform.
  • Organize data in the data lake in highly optimized formats for fast query processing, and maintain the security and quality of the datasets.
  • Build robust, fault-tolerant ETL pipelines that move data between systems while maintaining data integrity, using Azure tools and cloud-based ETL development processes.

The Data Engineer should have experience in one or more of the following areas:
  • Excellent SQL coding skills, including query performance optimization.
  • Cloud infrastructure (Azure, Google Cloud, Kubernetes, Terraform).
  • Working with the internals of a distributed compute engine (Spark, Presto, or Flink/Beam) or transformation frameworks such as dbt.
  • Security products and methods (Apache Ranger, Apache Knox, OAuth, IAM, Kerberos).
  • Connecting to varied data sources, both on-premises and cloud-based.
  • Deploying and scaling ML solutions using open-source frameworks (MLflow, TFX, H2O, etc.).

Qualifications and technical skills:
  • Bachelor's degree in Engineering or Computer Science, or equivalent work experience
  • SQL: clustering, mirroring, replication, scripting, stored procedures, functions, performance tuning, and troubleshooting
  • Azure: Azure Data Factory, Azure SQL Database, Azure Synapse Analytics, Azure Databricks, Azure Analysis Services, Azure Data Lake Storage
  • Reporting Tools: Power BI, SSRS
  • Languages and formats: T-SQL, Python, PySpark, PowerShell, JSON
  • Data modeling: tabular models, star schema, columnar modeling