Career Profile
I am a solution-oriented data professional with strong analytical and numerical skills. I have ample experience analyzing large datasets and working with Big Data frameworks, broad knowledge of IT and data engineering, and a solid background in machine learning and econometrics. I am a networker who is equally capable of working independently: critical, innovative, and detail-oriented while keeping the larger picture in view. My reports are clear and comprehensive. I am results-focused, take responsibility, and keep the interests of all stakeholders in mind.
Experience
I am the technical lead of a team of specialized data engineers building a Kimball-style data lakehouse for Shell’s HR department. Shell runs Databricks on Azure; I implement the Databricks workflows with dbt and PySpark, test them using dbt, pytest and Great Expectations, and handle CI/CD with GitHub Actions.
I migrated legacy SQL Server solutions to the cloud, making extensive use of Python, PySpark, Spark SQL, Databricks, Airflow, Kubernetes, Snowflake, SQL Server, Teradata and related technologies. I also developed an integration testing framework for Nike’s custom data pipeline library.
I designed the data science capability for APG’s conversation platform, which will serve conversations with nearly 5 million participants. Working on an Azure platform, I used Databricks, PySpark, MLflow, BentoML and Docker.
I was responsible for developing data pipelines, which I implemented on an AWS Kubernetes (k8s) cluster, and I designed and built a data monitoring system on top of Elasticsearch.
Extracting source data from a MySQL database, I built machine-learning models and produced predictions with bootstrap prediction intervals.
I led the development of an automated workflow system for satellite remote sensing. Liaising between two development teams, I designed and implemented solutions using Spark, YARN, Docker, Linux, Airflow, Elasticsearch, PostgreSQL, Kafka, NiFi, Dask, Python and Pandas.
I worked with Python, R, time series modeling (ARIMA/ETS/Prophet/LSTM neural networks), neural networks/elastic net/MARS/Poisson regression, clustering techniques and cross-validation. I made intensive use of Python libraries such as Pandas, NumPy, scikit-learn, Keras/TensorFlow and rpy2.
I analyzed large datasets of financial transactions using Oracle, Hadoop, PySpark, Python, Kafka, Hive and SAS.
I guided a team of data scientists in applying advanced methods such as random forests, experimental design and calibration, implementing these methods in SAS.