Osmin Larreynaga

Data Engineer

I'm a data engineer who builds scalable data pipelines and turns raw data into actionable insights. I enjoy solving complex problems with modern tools and collaborative teams.

"Bridging the gap between data and decision-making."

My Skills

Here are some of the technologies I specialize in.

Python
SQL
BigQuery
GCP
Terraform
Airflow
GitHub Actions
JavaScript
Docker

Featured Projects

Here are some of the projects I've worked on.

Real-time Analytics Dashboard
A dashboard for visualizing streaming data, providing real-time insights into user engagement and system performance. Built with a scalable architecture to handle high-throughput data streams.
Python, Spark Streaming, Kafka, React

ETL Pipeline for E-commerce
Developed a robust ETL pipeline to process daily sales data from multiple sources into a centralized data warehouse, enabling business intelligence and reporting.
Airflow, BigQuery, SQL, GCP

Customer Churn Prediction
A machine learning model to predict customer churn, allowing for proactive retention strategies. The model was deployed as a service for easy integration with existing systems.
Python, scikit-learn, Flask, Docker

Articles & Insights

A collection of my own articles and recommended readings that have influenced my work.

My First Year as a Data Engineer at a Startup
My Article
John Doe on Medium

A reflection on the challenges and triumphs of building data infrastructure from the ground up in a fast-paced environment.

The Unreasonable Effectiveness of Recurrent Neural Networks
Andrej Karpathy Blog

A deep dive into the power of RNNs, showcasing their ability to generate text, from Shakespeare to source code.

Building a Cost-Effective Data Lake on GCP
My Article
John Doe on Medium

Practical tips and architectural patterns for leveraging Google Cloud Storage and BigQuery to create an affordable data lake.

Attention Is All You Need
Vaswani et al.

The seminal paper introducing the Transformer architecture, which has become the foundation for modern NLP models like BERT and GPT.

The Illustrated Transformer
Jay Alammar

A visual and intuitive explanation of the Transformer model, making complex concepts accessible to a broader audience.

Designing Data-Intensive Applications
Martin Kleppmann

A foundational book covering the principles of data systems, from databases and stream processors to distributed systems.

Contact Me

Have a project in mind or just want to say hi? Feel free to send me a message.

Or find me on: