Description

Hi! I'm a Data Engineer with 7 years of experience and a passion for designing and building robust data pipelines. I have helped build several data stacks from the ground up, and I would be happy to put that experience to work designing yours.

I have extensive knowledge of technologies such as Kafka, Spark, and Akka, and a proven track record of delivering quality work on time. My main programming languages are Scala, Python, and SQL. I have experience working on both AWS and GCP, and I am also proficient with Ansible and Terraform, so I can deploy your components anywhere.

I'm available for freelance assignments and eager to work with you to achieve your data-related goals. Let's get in touch!

Languages

French
Native or bilingual
English
Full professional proficiency
Italian
Limited professional proficiency
German
Basic knowledge

Workplace preferences

Willing to work on site

Paris (up to 50 km)

Ministère des Armées
Data Engineer
PUBLIC SECTOR & LOCAL GOVERNMENT
March 2023 - Present (3 years and 3 months)
Paris, France
Classified
DataDome
Data Engineer
TELECOMMUNICATIONS
August 2021 - April 2022 (8 months)
Paris, France
DataDome is a real-time bot detection solution built on a Flink engine that my
team maintained. I helped build Scala and Python components that fed and
enabled the rule sets used to decide whether a fingerprint belonged to a bot or a real
human.
Because it provided protection across multiple time zones, the engine had to keep latency
low even during peak activity or DDoS attacks. The key challenges were measuring how
consistent performance was and avoiding regressions when adding features.
Scala Flink Python Ansible
Adot
Data Engineer
TELECOMMUNICATIONS
September 2016 - August 2021 (4 years and 11 months)
Île-de-France, France
Following a major business deal, I successfully integrated a new data source, which
initially posed challenges due to its scattered and decentralized nature. The data was
sourced from various CRM tools across different parts of the organization, making it
difficult for data analysts to extract meaningful insights.
To address this, I designed and implemented a robust data processing system using
Spark and Airflow. This system aggregated, cleansed, and assembled the data into
Parquet files stored in an S3 repository. With the capability to ingest several terabytes
of data daily, it utilized a cluster of approximately 300 cores.
The processed data was made accessible through a Presto SQL server running on a
dozen nodes, resulting in a high-performance database. This database served as the
foundation for over 100 daily analyses, catering to both clients and internal projects.
Additionally, data scientists utilized this resource to create and train their machine
learning models, while the production pipeline extracted subsets of data for running ad
campaigns.
I played a vital role in maintaining and expanding the streaming data pipeline, which
processed more than 100 terabytes of data per day. This pipeline consumed data from
various topics at rates of up to 200,000 messages per second, running more than 30
jobs that handled different data formats, from Spark SQL batches to Kafka events
processed with Akka and Spark Streaming.
In addition, I provided technical support to data scientists. One notable project involved
improving a legacy ML component responsible for predicting tags on ad events from
the production database. Working with the data scientists, we addressed the challenges
posed by increasing throughput and an aging training dataset. Together, we
updated the model using TensorFlow neural networks, and I built a Scala
service using Akka Streams to handle input streams and distribute the workload across
a pool of TensorFlow servers. This enhancement enabled the component to handle the
entire input stream, processing up to 10,000 events per minute with the same number
of nodes, resulting in improved tag predictions and confidence scores. Furthermore, we
ensured the training dataset was regularly updated.
Additionally, I had the opportunity to mentor and onboard junior team members,
assisting them in acquiring essential skills and best practices within the field.
Spark Kafka Airflow Scala Python SQL AWS AWS S3 AWS EC2 Apache Parquet


Computer science & business intelligence master's student, Business Intelligence (Informatique décisionnelle)
Polytech'Nantes
2016
Preparatory engineering degree (Licence préparatoire ingénieur), Mathematics and Computer Science
Université de Rennes I
2013

Data Engineer

AI Engineer

Gaëtan Bervet

Data Engineer
