Applying Domain-Driven Design to Machine Learning Codebases

🧱 Project Structure Overview

This document outlines a domain-driven, modular folder structure designed for scalable, maintainable data pipeline and ML projects. We borrow concepts from Domain-Driven Design (DDD) and apply them to a typical data science project structure.

📂 Directory Structure

domains/
├── models/
│   ├── model_a/
│   │   ├── config.py
│   │   ├── datasets.py
│   │   ├── features.py
│   │   ├── model.py
│   │   └── aggregator.py
│   ├── model_b/
│   └── model_c/
│
├── engine/
│   ├── execution_logic/
│   └── decision_systems/
│
└── data/
    ├── data_source_a/
    ├── data_source_b/
    └── data_source_c/
        ├── config.py
        └── datasets.py
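To make the per-domain layout concrete, here is a minimal sketch of what `domains/data/data_source_c/config.py` and `datasets.py` might contain, collapsed into one file. It assumes the source is a CSV file; `DataSourceCConfig`, `load_data_source_c`, and the `id` column are hypothetical names chosen for illustration.

```python
"""Sketch of domains/data/data_source_c/, collapsed into one file.

Assumption: config.py holds a small typed config for this source and
datasets.py holds its loader. All names here are hypothetical.
"""
from __future__ import annotations

import csv
from dataclasses import dataclass
from pathlib import Path


# --- domains/data/data_source_c/config.py ---
@dataclass(frozen=True)
class DataSourceCConfig:
    """Settings that only make sense for data source C."""
    path: Path
    id_column: str = "id"


# --- domains/data/data_source_c/datasets.py ---
def load_data_source_c(config: DataSourceCConfig) -> list[dict[str, str]]:
    """Read data source C as a list of rows, dropping rows with an empty ID."""
    with config.path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    return [row for row in rows if row.get(config.id_column)]
```

Because the config and loader sit next to each other, everything about data source C can be changed without touching other domains.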

📦 `shared/`

shared/
├── config/
│   └── models/
│       └── model_config_schema.py
│
├── features/
│   └── static_attributes.py
│
├── transforms/
│   └── geo_transforms.py
│
├── utils/
│   └── ... (generic helpers: math, time, validation, etc.)
│
├── data_loader.py
└── config_builder.py

`shared/` contains truly cross-domain code: reusable components, shared schemas, and general-purpose logic.
If a module has domain-specific meaning, it belongs in the appropriate `domains/` folder instead.
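As an illustration of "truly cross-domain", here is a sketch of what `shared/transforms/geo_transforms.py` could hold: a haversine distance helper with no business meaning, so any domain can import it. The function name and the mean Earth radius constant are assumptions for this sketch, not taken from the original project.

```python
"""Hypothetical contents of shared/transforms/geo_transforms.py."""
import math

# Mean Earth radius in km; an approximation that is adequate for a sketch.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

By contrast, a function such as a hypothetical "distance to model A's nearest service area" encodes domain meaning and would live under `domains/`, even if it calls `haversine_km` internally.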

⚙️ `tasks/`

tasks/
└── ... (entry-point task functions for orchestration)

`tasks/` holds the top-level pipeline functions. These may be executed via orchestration platforms (e.g., Airflow, Databricks), a CLI, or other external triggers.
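A thin task might look like the sketch below. The two stand-in functions are toy placeholders for code that would live under `domains/`; `train_model_a`, `main`, and the `--source` flag are hypothetical names. The task only wires domain calls together, and `main` gives a CLI or orchestrator a single entry point.

```python
"""Sketch of a thin task in tasks/, using hypothetical domain stand-ins."""
from __future__ import annotations

import argparse
from typing import Sequence


# --- stand-ins for domain code (hypothetical, would live under domains/) ---
def load_training_rows(source: str) -> list[dict]:
    # A real loader would read `source`; fixed rows keep the sketch runnable.
    return [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}]


def fit_ratio_model(rows: list[dict]) -> float:
    """Toy 'model': the mean of y / x across rows."""
    return sum(r["y"] / r["x"] for r in rows) / len(rows)


# --- tasks/train_model_a.py ---
def train_model_a(source: str) -> float:
    """Assemble domain logic; no business rules are implemented here."""
    rows = load_training_rows(source)
    return fit_ratio_model(rows)


def main(argv: Sequence[str] | None = None) -> float:
    """CLI entry point; an orchestrator could call train_model_a directly."""
    parser = argparse.ArgumentParser(description="Train model A.")
    parser.add_argument("--source", default="data_source_a")
    args = parser.parse_args(argv)
    return train_model_a(args.source)
```

Keeping the task this small means swapping Airflow for a CLI (or vice versa) changes only the trigger, not the domain code.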

📜 `configs/`

configs/
├── shared/
│   └── test_config.yml
│
└── domain/
    ├── models/
    │   └── model_a/
    │       └── config.yml
    │
    ├── engine/
    │   └── decision_systems/
    │       └── config.yml
    │
    └── data/
        └── ... (YAML configs for data processing jobs)

Configuration files are stored separately from code. They are typically parsed by domain config classes and contain things like dataset paths, model parameters, and job settings.
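One way `shared/config_builder.py` might combine a shared YAML config with a domain one is a recursive merge, followed by parsing into a domain config class. This sketch assumes the YAML files have already been loaded into dicts (for example with PyYAML's `yaml.safe_load`); `deep_merge`, `ModelAJobConfig`, and the keys shown are hypothetical.

```python
"""Sketch of config composition, assuming YAML is already loaded into dicts."""
from __future__ import annotations

import copy
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new dict where `override` wins, merging nested dicts recursively."""
    merged = copy.deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


@dataclass(frozen=True)
class ModelAJobConfig:
    """Domain config class that turns the merged mapping into typed fields."""
    dataset_path: str
    learning_rate: float
    epochs: int

    @classmethod
    def from_dict(cls, raw: Mapping[str, Any]) -> ModelAJobConfig:
        params = raw["model"]
        return cls(
            dataset_path=str(raw["dataset_path"]),
            learning_rate=float(params["learning_rate"]),
            epochs=int(params["epochs"]),
        )
```

For example, merging `{"model": {"learning_rate": 0.01, "epochs": 10}, ...}` from a shared file with `{"model": {"epochs": 50}, ...}` from `configs/domain/models/model_a/config.yml` keeps the shared learning rate and takes the domain's epoch count, without mutating either input.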

🧠 Design Principles

| Principle | Description |
| --- | --- |
| Domain-Driven | Group logic by business/domain responsibility, not by technical layer |
| Cohesion Over Reuse | Keep related logic together; avoid premature abstraction |
| Shared for Stability | Use `shared/` only for stable, cross-domain components |
| Config Is Composed | Domain config models can use shared schemas to reduce duplication |
| Extensibility First | Domains can extend or override shared logic when needed |
| Tasks Are Thin | Orchestration functions should assemble domain logic, not implement it |
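The "Config Is Composed" and "Extensibility First" principles can be sketched with plain dataclasses. This assumes `shared/config/models/model_config_schema.py` defines a reusable hyperparameter schema; `ModelConfigSchema`, its fields, and the domain config classes are hypothetical.

```python
"""Sketch of composed and extended configs using a shared schema."""
from dataclasses import dataclass, field


# --- shared/config/models/model_config_schema.py ---
@dataclass(frozen=True)
class ModelConfigSchema:
    """Hyperparameters common to many models (hypothetical fields)."""
    learning_rate: float = 0.01
    epochs: int = 10
    seed: int = 42

    def __post_init__(self) -> None:
        # Validation lives once, in the shared schema.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


# --- domains/models/model_a/config.py ---
@dataclass(frozen=True)
class ModelAConfig:
    # Compose the shared schema instead of copying its fields.
    training: ModelConfigSchema = field(default_factory=ModelConfigSchema)
    # Domain-specific settings sit alongside it.
    target_column: str = "target"


# --- domains/models/model_b/config.py ---
@dataclass(frozen=True)
class ModelBConfig:
    # Extend by overriding a shared default without editing shared/.
    training: ModelConfigSchema = field(
        default_factory=lambda: ModelConfigSchema(epochs=100)
    )
```

Composition keeps the shared schema stable: model B changes its defaults locally, and the validation in `ModelConfigSchema` still applies to both domains.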

✅ Best Practices

- Keep domain-specific components self-contained.
- Use shared config schemas (e.g., model hyperparameters) to avoid duplication.
- Avoid coupling domains through shared logic unless it is stable and intentional.
- Compose and extend rather than duplicate logic when variations are needed.
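The last practice, composing rather than duplicating, can be as simple as calling the shared function and adding to its output. This sketch assumes `shared/features/static_attributes.py` exposes a plain function; `static_attributes`, `model_a_features`, and the record fields are hypothetical.

```python
"""Sketch of extending shared features in a domain without copying them."""
from __future__ import annotations

from typing import Any, Mapping


# --- shared/features/static_attributes.py ---
def static_attributes(record: Mapping[str, Any]) -> dict[str, Any]:
    """Stable attributes that any domain can reuse."""
    return {"region": record["region"], "size": float(record["size"])}


# --- domains/models/model_a/features.py ---
def model_a_features(record: Mapping[str, Any]) -> dict[str, Any]:
    """Reuse the shared features, then add model A's variation on top."""
    features = static_attributes(record)
    # Guard against zero units so the ratio is always defined.
    features["size_per_unit"] = features["size"] / max(int(record["units"]), 1)
    return features
```

The shared function is never edited for model A's needs, so other domains that depend on it are unaffected.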
