The Data Engineer builds and maintains the platform that delivers accessible data to support decision-making. The Data Engineer is focused on making it simple for end users to answer three key questions: What happened in the past? What is happening now? What will happen in the future?
Job Functions (including, but not limited to, the following):
- Build and support end-user experiences for experimentation, data discovery, and business intelligence reporting.
- Operate and manage the data platform efficiently, consistently, and reliably.
- Build tools that other teams can use, promoting consistency and reliability across the platform.
- Organize data in the data lake in highly optimized formats for fast query processing, and maintain the security and quality of the datasets.
- Build fault-tolerant data pipelines (ETL) that move data between systems while preserving data integrity, using Azure tools and cloud-based ETL development processes.
The Data Engineer should have experience in one or more of the following areas:
- Excellent SQL skills, including performance optimization of data queries.
- Cloud infrastructure (Azure, Google Cloud, Kubernetes, Terraform).
- Working with the internals of distributed compute engines (Spark, Presto, or Flink/Beam) and transformation tools such as dbt.
- Security products and methods (Apache Ranger, Apache Knox, OAuth, IAM, Kerberos).
- Connecting to varied data sources, both on-premises and in the cloud.
- Deploying and scaling ML solutions using open-source frameworks (MLflow, TFX, H2O, etc.).
- Bachelor's degree in Engineering or Computer Science, or equivalent work experience.
- SQL: clustering, mirroring, replication, scripting, stored procedures, functions, performance tuning, and troubleshooting
- Azure: Azure Data Factory, Azure SQL Database, Azure Synapse, Databricks, Azure Analysis Services, Azure Data Lake Storage
- Reporting Tools: Power BI, SSRS
- Languages: T-SQL, Python, PowerShell, JSON, PySpark
- Data modelling: tabular models, star schema, columnar modelling