Gaëtan Bervet

Data Engineer

€600/day
Paris, FR
3-7 years of experience

Average response time: 1h

About Gaëtan

Hi! I'm a Data Engineer with 7 years of experience and a passion for designing and building robust data pipelines. I have helped build several data stacks from the ground up, and I would be happy to draw on that experience to design yours.
I have extensive knowledge of technologies such as Kafka, Spark, and Akka, and a proven track record of delivering quality work on time. My main programming languages are Scala, Python, and SQL. I have worked on both AWS and GCP, and I am also proficient with Ansible and Terraform for deploying your components anywhere.
I'm available for freelance assignments and eager to work with you to achieve your data-related goals. Let's get in touch!
  • French

    Native or bilingual

  • English

    Full professional proficiency

  • Italian

    Limited working proficiency

  • German

    Basic knowledge

Open to working on site
Paris (up to 50 km)

Experience

  • Ministère des Armées
    Data Engineer
    PUBLIC SECTOR & LOCAL GOVERNMENT
    March 2023 - Present (3 years and 3 months)
    Paris, France
    Classified
  • DataDome
    Data Engineer
    TELECOMMUNICATIONS
    August 2021 - April 2022 (8 months)
    Paris, France
    DataDome is a real-time bot detection solution built on a Flink engine that my
    team maintained. I helped build the Scala and Python components that fed and
    enabled the rule sets used to decide whether a fingerprint belonged to a bot or a
    real human.
    Because protection spanned multiple time zones, the engine had to keep latency low
    even during peak activity or DDoS attacks. The key challenges were to measure how
    consistent performance was and to avoid degrading it when adding features.
    Scala Flink Python Ansible
  • Adot
    Data Engineer
    TELECOMMUNICATIONS
    September 2016 - August 2021 (4 years and 11 months)
    Île-de-France, France
    Following a major business deal, I successfully integrated a new data source, which
    initially posed challenges due to its scattered and decentralized nature. The data was
    sourced from various CRM tools across different parts of the organization, making it
    difficult for data analysts to extract meaningful insights.
    To address this, I designed and implemented a robust data processing system using
    Spark and Airflow. This system aggregated, cleansed, and assembled the data into
    Parquet files stored in an S3 repository. With the capability to ingest several terabytes
    of data daily, it utilized a cluster of approximately 300 cores.
    The processed data was made accessible through a Presto SQL server running on a
    dozen nodes, resulting in a high-performance database. This database served as the
    foundation for over 100 daily analyses, catering to both clients and internal projects.
    Additionally, data scientists utilized this resource to create and train their machine
    learning models, while the production pipeline extracted subsets of data for running ad
    campaigns.
    I played a vital role in maintaining and expanding the streaming data pipeline, which
    processed more than 100 terabytes of data per day. This pipeline consumed data from
    various topics at speeds of up to 200,000 messages per second, running more than 30
    jobs that handled different data formats, including SparkSQL batches and Kafka events
    using Akka and Spark Streaming.
    In addition, I provided technical support to data scientists. One notable project involved
    improving a legacy ML component responsible for predicting tags on ad events from
    the production database. We addressed challenges posed by increasing throughput
    and an aging training dataset by collaborating with data scientists. Together, we
    updated the model using TensorFlow neural networks, and I established a Scala
    service using Akka Stream to handle input streams and distribute the workload across
    a pool of TensorFlow servers. This enhancement enabled the component to handle the
    entire input stream, processing up to 10,000 events per minute with the same number
    of nodes, resulting in improved tag predictions and confidence scores. Furthermore, we
    ensured the training dataset was regularly updated.
    Additionally, I had the opportunity to mentor and onboard junior team members,
    assisting them in acquiring essential skills and best practices within the field.
    Spark Kafka Airflow Scala Python SQL AWS AWS S3 AWS EC2 Apache Parquet

Education

  • Master's student in computer science & business intelligence, Business Intelligence
    Polytech'Nantes
    2016
  • Preparatory bachelor's degree for engineering school, Mathematics and Computer Science
    Université de Rennes I
    2013

Skills (19)

Categories