Published Jun 3, 2026 ⦁ 5 min read
How to Structure AI Projects for Data Engineering

Data engineering teams are adopting AI tools quickly, and that makes structured, repeatable workflows more important. This article describes one approach to structuring AI projects, built on persistent instructions stored in Claude.md files and a three-layer framework of playbooks, agents, and tools. Whether you're a data engineer or an AI enthusiast, the goal is a workflow the AI can follow consistently without being re-briefed in every session.

Introduction: The Challenge of Structuring AI-Driven Workflows

With large language models (LLMs) and AI agents, day-to-day coding practices are changing. Tools like Claude and ChatGPT shift part of the work from writing scripts by hand to describing tasks in natural language and maintaining persistent instruction files. For data engineers, this means not just adopting these tools but organizing projects so the tools can work effectively.

This article breaks down the framework outlined in the Kahan Data Solutions video cited at the end, which takes a structured approach built around Claude.md files. You'll learn how to create persistent instructions, design a project architecture, and keep AI-assisted work consistent across sessions.

The Foundation: Persistent Instructions and Claude.md Files

At the heart of this methodology is the Claude.md file - a foundational component for structuring AI projects. Here's what you need to know:

What Is a Claude.md File?

A Claude.md file is a markdown file that serves as a persistent instruction set for AI agents. In Claude Code, the file is conventionally named CLAUDE.md and is loaded into context automatically at the start of a session. Unlike traditional prompts, which need to be restated in every session, this file provides a baseline of information that the AI agent can reference throughout its work. Think of it as an operational manual for your project: it tells the AI its role, the project structure, and the expectations every time it is engaged.

Why Are Persistent Instructions Important?

Persistent instructions solve a critical problem in working with AI agents: the need for repetitive re-prompting. By defining a clear framework upfront, you can:

  • Save time by avoiding redundant instructions.
  • Ensure consistency across sessions.
  • Create a centralized guide for the AI to follow, much like onboarding a new team member.

For example, instead of reminding the AI to activate a virtual environment or create a feature branch every time, you write these instructions once in the Claude.md file. This makes it more likely that best practices are followed in every session and reduces the chance of a forgotten step.

The APT Framework: Agents, Playbooks, and Tools

The video introduces an architecture for AI projects called the APT Framework (Agents, Playbooks, Tools), which organizes workflows into three interconnected layers:

1. Playbooks: Instruction Manuals

  • Playbooks are detailed guides that outline step-by-step processes for specific tasks. These can be likened to Standard Operating Procedures (SOPs) in traditional workflows.
  • Example: A playbook for building a dbt (data build tool) project might include instructions on project design, generating test data, and deploying the project.
  • Purpose: To provide clear, reusable documentation that both human developers and AI agents can follow.

2. Agents: The Decision-Makers

  • Agents, like Claude or other LLMs, act as intermediaries. They interpret the playbooks, understand the objectives, and decide how to accomplish tasks.
  • Role: They direct traffic between the playbooks (instructions) and tools (execution scripts), ensuring the project adheres to the defined structure.

3. Tools: The Executors

  • Tools are the actual scripts, functions, or commands that perform the tasks. These might include Python scripts, SQL files, or API calls.
  • Example: A bash script for replicating data from a source system to Snowflake may live in this layer.

How These Layers Work Together

By integrating these three layers, the APT Framework creates a cohesive workflow:

  • Playbooks define "what" needs to be done.
  • Agents decide "how" to do it.
  • Tools execute the task as instructed by the agents.
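
The division of labor can be sketched in a few lines of Python. This is a toy illustration, not the video's implementation: the Playbook class, the run_agent function, and the tool names are all hypothetical, and the "agent" here is a plain lookup standing in for an LLM that would read the playbook and decide which tools to call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Playbook:
    """The 'what': an ordered list of step names for one task."""
    name: str
    steps: List[str]


# Tools layer: plain functions that do the work (hypothetical placeholders).
def extract_orders() -> str:
    return "extracted orders"


def load_to_warehouse() -> str:
    return "loaded to warehouse"


TOOLS: Dict[str, Callable[[], str]] = {
    "extract": extract_orders,
    "load": load_to_warehouse,
}


def run_agent(playbook: Playbook, tools: Dict[str, Callable[[], str]]) -> List[str]:
    """The 'how': a stand-in for an LLM agent that maps each step to a tool."""
    results = []
    for step in playbook.steps:
        if step not in tools:
            raise KeyError(f"No tool registered for step '{step}'")
        results.append(tools[step]())
    return results


replication = Playbook(name="replicate_orders", steps=["extract", "load"])
print(run_agent(replication, TOOLS))
```

In a real project the playbook would be a markdown document and the agent would interpret it, but the separation is the same: instructions in one place, execution in another, and a decision-maker in between.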

Practical Implementation: Building a Structured AI Workflow

Step 1: Define the Project Structure

The first step is to outline the architecture and create a Claude.md file. This file should include:

  • Project overview and objectives.
  • Role definitions for the AI agent.
  • Best practices (e.g., branch creation, virtual environment activation).
  • File structure and conventions.
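
As a starting point, a skeleton with these four sections could be generated with a short script. The sketch below is only one possible layout: the section headings and placeholder text are assumptions for illustration, and you should check your tool's documentation for the exact filename and casing it expects before writing the result to disk.

```python
from typing import Dict

# Section headings mirror the four items listed above; the placeholder
# text under each heading is illustrative only.
SECTIONS: Dict[str, str] = {
    "Project Overview": "Describe the project and its objectives.",
    "Agent Role": "State what the AI agent is responsible for.",
    "Best Practices": (
        "- Create a feature branch before making changes.\n"
        "- Activate the virtual environment before running code."
    ),
    "File Structure and Conventions": "List key directories and naming rules.",
}


def render_instructions(sections: Dict[str, str]) -> str:
    """Build the markdown text for a persistent instruction file."""
    parts = []
    for heading, body in sections.items():
        parts.append(f"## {heading}\n\n{body}\n")
    return "\n".join(parts)


print(render_instructions(SECTIONS))
```

The rendered text can then be saved as the project's instruction file and edited by hand as the project takes shape.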

Step 2: Create Playbooks

Develop detailed playbooks for each process within your project. For instance:

  • Data Transformation Playbook: Steps for transforming raw data into a usable format.
  • Testing Playbook: Instructions for generating and validating test data.

Step 3: Equip with Tools

Populate the tools layer with the scripts and functions needed to execute the playbooks. Ensure these scripts are well-documented and align with the conventions outlined in the Claude.md file.
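
One lightweight way to check that tools are documented is to scan them for missing docstrings. The sketch below assumes the tools are Python functions collected in a dictionary. The function names and the TOOLS registry are hypothetical, and a real project might apply the same idea to SQL files or shell scripts by checking for a header comment instead.

```python
import inspect
from typing import Callable, Dict, List


def replicate_table(table: str) -> str:
    """Copy one source table into the warehouse (placeholder logic)."""
    return f"replicated {table}"


def run_tests(project_dir: str) -> str:
    return f"tested {project_dir}"  # intentionally left without a docstring


def undocumented(tools: Dict[str, Callable]) -> List[str]:
    """Return the names of tools that lack a docstring."""
    return [name for name, fn in tools.items() if not inspect.getdoc(fn)]


TOOLS = {"replicate_table": replicate_table, "run_tests": run_tests}
print(undocumented(TOOLS))
```

A check like this could run before each commit so that new tools do not enter the project without a description the agent can read.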

Step 4: Optimize and Iterate

As the project evolves, refine the Claude.md file and playbooks to reflect changes and improvements. Treat the file as a living document that grows with your workflow.

Key Considerations for Success

Balancing Context Length

Keep the Claude.md file concise. Its contents take up part of the model's context window in every session, so focus on essential instructions and streamline where possible.
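
A rough size check can help keep the file from growing unnoticed. The sketch below is a heuristic, not a real tokenizer: the figure of about four characters per token is a common rule of thumb for English text, and the 2,000-token budget is an arbitrary assumption you should adjust for your model and project.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; real tokenizers will give different counts."""
    return int(len(text) / chars_per_token)


def check_budget(text: str, max_tokens: int = 2000) -> bool:
    """Return True if the estimated size is within the chosen budget."""
    return estimate_tokens(text) <= max_tokens


sample = "## Best Practices\n- Create a feature branch before making changes.\n"
print(estimate_tokens(sample), check_budget(sample))
```

To apply it to a real file, read the file's text and pass it to check_budget in place of the sample string.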

Utilizing Sample Projects

To expedite setup, consider starting with a sample project. Analyze its structure, conventions, and documentation to create a robust foundation for your workflow.

Continuous Updates

Regularly update your framework to reflect best practices and resolve inaccuracies. This ensures the workflow remains efficient and aligned with project goals.

The Broader Implications: Why This Matters

The transition to AI-driven workflows marks a significant shift in how data engineers and AI specialists approach their work. Implementing structured frameworks like the APT Framework provides several benefits:

  • Consistency: Ensures that all team members (human or AI) follow the same guidelines.
  • Scalability: Simplifies the onboarding process for new AI tools or team members.
  • Efficiency: Reduces redundancy and streamlines project management.

By adopting these practices, data engineering teams can get more reliable results from AI tools as those tools become part of everyday work.

Key Takeaways

  • Persistent Instructions Save Time: Use Claude.md files to eliminate repetitive prompts and ensure consistent workflows.
  • The APT Framework Provides Structure: Organize projects into three layers - Playbooks (instructions), Agents (decision-makers), and Tools (execution scripts).
  • Documentation Is Essential: Treat your AI workflows like a well-documented software project to ensure accuracy and compliance with best practices.
  • Optimize Context Length: Keep Claude.md files concise to maximize AI efficiency.
  • Iterate and Improve: Continuously refine your framework to adapt to evolving project needs and emerging best practices.
  • Sample Projects Accelerate Learning: Use example projects to jumpstart your framework design and align with proven conventions.

Conclusion: Building the Future of AI Workflows

As AI tools like Claude and ChatGPT become more common in data engineering, structured workflows matter more. Frameworks such as the APT architecture, combined with persistent instructions in Claude.md files, help teams get consistent results from AI in their projects and make those projects easier to scale.

With these insights, you're equipped to take your AI projects to the next level. Embrace the challenge, iterate on your frameworks, and lead the charge in shaping the future of AI-powered workflows in data engineering.

Source: "How to Structure an AI Project for Data Engineering" - Kahan Data Solutions, YouTube, Apr 2, 2026 - https://www.youtube.com/watch?v=EhNrAtnuMuM