Furcy Pin

Data Engineer / Architect - Spark and SQL expert

€1,000/day
1 project
Paris, FR
8-15 years

Average response time: 1h

About Furcy

I have more than 10 years of experience in Data Engineering, including 8 years with Apache Spark.

I love sharing my knowledge, working on open source software, and writing blog articles.
Medium: medium.com/@pin.furcy

Languages
  • French

    Bilingual or native

  • English

    Full professional proficiency

Accepts on-site work
Paris (up to 20 km)

Experience

  • Younited Credit
    Lead Data Engineer
    BANKING & INSURANCE
    June 2018 - September 2022 (4 years and 3 months)
    Paris, France
    - Led a central team of 4 Data Engineers in charge of the Data Platform, Data Lake, Data Warehouse and Customer Data Platform. We built and maintained all data pipelines, from data ingestion to data modeling, across all of the company's functional areas.

    - Set up a new architecture based on engineering best practices, using CI/CD deployment, Apache Airflow, and a dbt-like tool for PySpark that I developed in-house, with automatic non-regression testing. After two years on the new architecture, our 4-person team had developed more than 1,500 Spark jobs running every night. In 2021 the team split: I then led a 3-person team in charge of the Data Platform and tooling, while the other team focused on data ingestion and transformation.

    - Developed an internal data quality tool and dashboard similar to Great Expectations.

    - Migrated data storage from Azure Blob Storage to Azure Data Lake Storage Gen2, and reorganized the platform architecture to add pre-production and sandbox environments and to strengthen the platform's security.

    - Supervised the migration of our production Airflow deployment to Cloud Composer.

    - Founded and ran a Data Architecture Committee to discuss and spread best practices among Data Engineers, Data Analysts, DataSecOps, Data Scientists and ML Engineers.

    - Active member of the Security Community: raised the alarm about Log4Shell and led its remediation on the data side. I also wrote an anonymization specification, together with the DPO, for GDPR compliance.
    Tech Lead, Data Engineer, SQL, BigQuery, PySpark, Python, dbt, Google Cloud, Microsoft Azure, IT Security, Terraform, DevOps
  • Criteo
    Software Engineer R&D
    HIGH TECH
    December 2017 - June 2018 (6 months)
    Paris, France
    - Member of the team that maintained and optimized a distributed implementation of the Louvain algorithm for graph community detection, using Spark (Scala). The algorithm ran twice a day on a graph with more than a billion nodes and edges; each run took several hours and required very careful optimization.

    - Maintained an in-house ingestion tool that performed data ingestion from S3 to HDFS. Found that it was losing up to 10% of the incoming data, and fixed it.

    - Implemented monitoring by adding support for Prometheus endpoints to the company's in-house scheduler, and set up alerting.

    - Contributed to two internal Scala training programs (3 days each) and taught courses during them.

    - Coached the team in charge of HDFS (DevOps engineers and SREs) as they learned Scala and built a tool with the Play framework to automate cluster backups.
    Scala, Spark, Hadoop
  • Flaminem
    CTO & Cofounder
    HIGH TECH
    April 2013 - October 2017 (4 years and 6 months)
    - Hired and managed a team of 5 developers, and trained them in Hadoop, Spark and Scala.

    - Responsible for technical recruitment and technology watch.

    - Installed and maintained a Hadoop cluster in a secure environment (VPN), along with additional services such as Spark and Presto, deployed with Ansible.

    - Developed a data ingestion and cleaning tool on MapReduce and later ported it to Spark; it was successfully used to ingest 5 TB of first-party data from a large global insurance company.

    - Developed 15+ Hive UDFs (including UDAFs and UDTFs), used in production.

    - Creator and main developer of Flamy, an open-source tool for organizing, validating, testing and running Hive queries, and for easing the administration of Hive databases: https://github.com/flaminem/flamy
    (Used for quality control of more than a hundred tables holding over 100 TB of data; used in production to regularly run several complex workflows of 10+ steps lasting several hours; and used for database migration and refactoring.)

    - Implemented and optimized a new connected-components algorithm in Spark that runs in O(log(d)) rounds. It processed a 300M-node graph in ~10 minutes on ~100 CPUs, where Spark's GraphX implementation took a few hours, and it was an order of magnitude faster than Hash-to-All and Hash-to-Min on specific graphs.

    - Handled several Hadoop cluster migrations.
    Tech Lead, Scala, Spark, Hadoop, Hive, Terraform, Ansible
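
The Flaminem entry benchmarks a new connected-components algorithm against Hash-to-Min. The profile does not describe that new algorithm, so the sketch below covers only the Hash-to-Min baseline. It simulates the algorithm's map-reduce rounds on a single machine in Python, which is an illustrative assumption rather than the author's Spark code. The function name `hash_to_min` is hypothetical.

```python
from collections import defaultdict


def hash_to_min(edges):
    """Hypothetical helper: simulate Hash-to-Min rounds for connected components.

    Single-machine sketch of the baseline, not a distributed Spark job.
    Returns a dict mapping each node to the smallest node id in its component.
    """
    # Initial state: each node's cluster is itself plus its direct neighbours.
    clusters = defaultdict(set)
    for u, v in edges:
        clusters[u].update((u, v))
        clusters[v].update((u, v))

    while True:
        inbox = defaultdict(set)
        for c in clusters.values():
            c_min = min(c)
            # Send the whole cluster to its smallest member...
            inbox[c_min] |= c
            # ...and send that smallest id to every member of the cluster.
            for u in c:
                inbox[u].add(c_min)
        # Stop once a round leaves every cluster unchanged.
        if inbox == clusters:
            break
        clusters = inbox

    # At the fixed point, each cluster's minimum is its component's minimum.
    return {v: min(c) for v, c in clusters.items()}
```

At the fixed point, the smallest node of each component holds the whole component, and every other node holds only that smallest id. Node ids must therefore be mutually comparable. In a distributed version, each loop iteration would plausibly map to one shuffle of messages, but that mapping is an assumption of this sketch.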

Education

  • Master of Science in Computer Science
    Ecole Normale Supérieure
    2011
    Master's degree, Computer Science

Skills

Categories