Standard Effects Analysis in Within-Subject Study Designs with SPSS and Python

On a related note, for questionnaire analysis, check out my repo under https://github.com/MohamedKari/questionnaire-analysis. Problem setup: “The purpose of a repeated measures designs is to determine the effect of different experimental conditions on a dependent variable” (Rutherford 2001, p. 61) from measures taken on the same subject under these different conditions. Let’s assume we follow a within-subject design, measuring the dependent variable “Duration” (or the mean of Duration across multiple repetitions) across 7 different participants under the two independent variables “Distance” and “Size”.
Read more
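As a taste of the Python side of such an analysis, here is a minimal sketch using statsmodels’ AnovaRM; the factor levels and the generated durations below are made up purely for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical fully crossed within-subject data: 7 participants,
# each measured once under every Distance x Size condition.
rng = np.random.default_rng(0)
rows = [
    {"Participant": p, "Distance": d, "Size": s,
     "Duration": rng.normal(loc=2.0, scale=0.3)}
    for p in range(1, 8)
    for d in ["near", "mid", "far"]
    for s in ["small", "large"]
]
df = pd.DataFrame(rows)

# Two-way repeated-measures ANOVA on Duration with Distance and Size
# as within-subject factors (aggregate_func="mean" would average
# multiple repetitions per cell first).
res = AnovaRM(df, depvar="Duration", subject="Participant",
              within=["Distance", "Size"]).fit()
print(res)
```

Running this prints the F-statistics and p-values for the two main effects and their interaction.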

A minimum viable CI/CD chain for deploying docker-composed applications to a single remote Docker host

A minimum viable setup for deploying a set of docker-composed containers to a single Docker host in your preferred cloud through a GitHub Actions workflow. This post extends the previous one on secure APIs.
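Stripped of the GitHub Actions scaffolding, the deploy step boils down to pointing the local Docker CLI at the remote daemon over SSH; the host and user below are placeholders, and in the actual workflow they would come from repository secrets.

```python
import os
import subprocess

# Hypothetical remote Docker host; docker and docker-compose respect
# the DOCKER_HOST variable, including the ssh:// scheme.
env = {**os.environ,
       "DOCKER_HOST": "ssh://deploy@my-docker-host.example.com"}

# Build and (re)start the composed services on the remote host.
subprocess.run(["docker-compose", "up", "--build", "-d"],
               env=env, check=True)
```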

Primitives of MLOps Infrastructure

Most “frameworks for bringing ML into production” only cover part of the picture. Let’s try to take a step back and make sense of the different framework-agnostic facets of MLOps, namely Continuous Code Integration and Delivery, Continuous Data Provisioning, Continuous Training, and Continuous Delivery.

Securing a containerized Flask API with Let's Encrypt Certificates

Using Certbot, Nginx, and Flask, each running in a Docker container spun up through Docker Compose, this post shows how to serve an API over HTTPS conveniently with Let’s Encrypt certificates. Template repo available under https://github.com/MohamedKari/secure-flask-container-template.
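On the Flask side, a minimal sketch could look as follows, assuming TLS terminates at the nginx container and requests are proxied to the API over the internal Compose network; the route and port are illustrative.

```python
from flask import Flask, jsonify
from werkzeug.middleware.proxy_fix import ProxyFix

app = Flask(__name__)

# Trust the X-Forwarded-* headers set by the nginx reverse proxy so
# that redirects and generated URLs use the external https:// scheme.
app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1)

@app.route("/health")
def health():
    return jsonify(status="ok")

if __name__ == "__main__":
    # Bind to all interfaces so the nginx container can reach the API.
    app.run(host="0.0.0.0", port=5000)
```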

Running an X Server with Indirect GLX Rendering on macOS for Containerized Applications with GUIs

Intro: For my latest research, I am looking into visual SLAM (e.g. ORB-SLAM2). Since VSLAM libraries are designed to run efficiently on embedded systems, they are generally programmed in C/C++ and built with just Linux in mind (even though, for example, ROS in version 2 also aims for compatibility with macOS). For a MacBook user, this makes things “interesting”. Of course, Docker makes it easy to run libraries built for Linux. However, once the software also comprises graphical UIs, e.
Read more
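To give a flavor of where this ends up, here is a hedged sketch using the Docker SDK for Python: assuming XQuartz is running on the macOS host with “Allow connections from network clients” enabled and `xhost +localhost` issued, a container can render its GUI through the host’s X server via indirect GLX. The image name is a placeholder.

```python
import docker

client = docker.from_env()

container = client.containers.run(
    "my-orbslam2-image:latest",  # hypothetical GUI-capable image
    environment={
        # Route X11 traffic to the XQuartz server on the macOS host.
        "DISPLAY": "host.docker.internal:0",
        # Force OpenGL calls through the X server (indirect GLX).
        "LIBGL_ALWAYS_INDIRECT": "1",
    },
    detach=True,
)
print(container.logs().decode())
```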

Native Spark on Kubernetes

In this post, I evaluate the different alternatives for deploying Spark on Kubernetes. From this, I outline the workflow for building and running Spark applications as well as Spark-cluster-backed Jupyter notebooks, both running PySpark in custom containers. I show how to include conda-managed Python dependencies in the image and how to deploy a notebook server running in Spark’s client mode to the Kubernetes cluster. Workloads use AWS S3 as the data source and sink and are observable through the Spark History Server.
Read more
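As a sketch of what such a setup boils down to: a PySpark session whose master points at the Kubernetes API server and whose executors run the custom container image. The API-server address, image, and bucket names are placeholders, and the image is assumed to bundle the hadoop-aws/s3a dependencies alongside the conda environment.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://kubernetes.example.com:6443")
    .appName("pyspark-on-k8s")
    # Custom image with the conda-managed Python dependencies baked in.
    .config("spark.kubernetes.container.image",
            "my-registry/pyspark-conda:latest")
    .config("spark.executor.instances", "3")
    # Use the S3A connector for reading from and writing to AWS S3.
    .config("spark.hadoop.fs.s3a.impl",
            "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)

# S3 as source and sink (placeholder bucket and column).
df = spark.read.parquet("s3a://my-bucket/input/")
df.groupBy("some_column").count().write.parquet("s3a://my-bucket/output/")
```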

Reproducible ML Models using Docker

Reproducing ML models can be a pain. And I am not even talking about managing model reproducibility across different datasets, features, hyperparameters, architectures, setups, and non-deterministic optimization, or about model reproducibility in a production-ready setup with constantly evolving input data. No, what I am talking about is getting a model that was developed and published by a different researcher to run on your own machine. Sometimes, or more like most times, this can be a nerve-racking endeavor.
Read more

Remote Docker Hosts in the Cloud for Machine Learning Workflows

The problem of developing ML models on a MacBook: In a recent blog post, I argued why I think it is a good idea to develop ML models inside Docker containers. In short: reproducibility. However, if you don’t have access to a CUDA-enabled GPU, developing or even just replicating state-of-the-art deep-learning research can be close to impossible, Docker or not. All ML researchers and engineers working on a MacBook have probably been exposed to this complication.
Read more

Out-of-the-box Storage Infrastructure Alternatives for Scaled Machine Learning

The problem of storing large volumes of unstructured data: Data preprocessing is a vital part of machine learning workflows. However, the story starts even earlier: even before versioning or labelling data, we have to store the data we want to learn from. This quickly becomes a non-trivial task in deep-learning problems, where we often operate on non-tabular data such as images, resulting in terabyte-scale datasets like the Waymo dataset.
Read more