Senior Data Engineer
- Python
- Docker
- Cloud
- GCP
Skills Required:
Data Engineer:
- Strong programming skills in languages such as Python and Java
- Must have hands-on experience with at least one cloud platform (GCP preferred)
- Must have: Experience working with Docker
- Must have: Environment management (e.g. venv, pip, Poetry)
- Must have: Experience with orchestrators such as Vertex AI Pipelines and Airflow
- Must have: Data engineering and feature engineering techniques
- Proficient in at least one of Apache Spark, Apache Beam, or Apache Flink
- Must have: Advanced SQL knowledge
- Must be familiar with streaming concepts such as windowing, late arrivals, and triggers
- Should have hands-on experience with distributed computing
- Should have working experience in data architecture design
- Should understand the available storage and compute options and when to choose each
- Should have a good understanding of cluster and pipeline optimisation strategies
- Should have exposure to GCP tools for developing end-to-end data pipelines across a range of scenarios, including ingesting data from traditional databases and integrating API-based data sources
- Should have a business mindset for understanding data and how it will be used for BI and analytics
- Should have working experience with CI/CD pipelines, deployment methodologies, and infrastructure as code (e.g. Terraform)
- Good to have: Hands-on experience with Kubernetes
- Good to have: Experience with vector databases such as Qdrant
Experience working with GCP tools such as:
Storage: Cloud SQL, Cloud Storage, Cloud Bigtable, BigQuery, Cloud Spanner, Cloud Datastore, vector databases
Ingest: Pub/Sub, Cloud Functions, App Engine, Kubernetes Engine, Kafka, microservices
Scheduling: Cloud Composer, Airflow
Processing: Cloud Dataproc, Cloud Dataflow, Apache Spark, Apache Flink
CI/CD: Bitbucket + Jenkins / GitLab; Infrastructure as code: Terraform