<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[NeoForge Labs]]></title><description><![CDATA[NeoForge Labs]]></description><link>https://blog.neoforgelabs.tech</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1761042286610/bc7899d6-1259-4ffc-bbd3-b02ee001a6ac.png</url><title>NeoForge Labs</title><link>https://blog.neoforgelabs.tech</link></image><generator>RSS for Node</generator><lastBuildDate>Thu, 23 Apr 2026 13:03:18 GMT</lastBuildDate><atom:link href="https://blog.neoforgelabs.tech/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Part 3: Counterfactual Reasoning with Causal DAGs]]></title><description><![CDATA[In Parts 1 and 2, you learned why causality matters and how to build causal DAGs. Today, we're climbing to Level 3 of Pearl's Ladder: Counterfactual Reasoning.
This is the most powerful form of causal AI—reasoning about alternate realities and answer...]]></description><link>https://blog.neoforgelabs.tech/part-3-counterfactual-reasoning-with-causal-dags</link><guid isPermaLink="true">https://blog.neoforgelabs.tech/part-3-counterfactual-reasoning-with-causal-dags</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Developer]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Kelyn Njeri]]></dc:creator><pubDate>Fri, 16 Jan 2026 04:00:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768256359613/366919ff-4e0c-49dc-96e1-c062e4e3ae83.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<p>In Parts 1 and 2, you learned why causality matters and how to build causal DAGs. Today, we're climbing to Level 3 of Pearl's Ladder: <strong>Counterfactual Reasoning</strong>.</p>
<p>This is the most powerful form of causal AI—reasoning about alternate realities and answering "what if" questions that standard ML can't touch.</p>
<p>By the end of this article, you'll:</p>
<ul>
<li><p>Understand what counterfactuals are and why they're powerful</p>
</li>
<li><p>Implement counterfactual inference with your DAG</p>
</li>
<li><p>Generate personalized explanations for individual cases</p>
</li>
<li><p>Estimate individual treatment effects</p>
</li>
<li><p>Build "what if" scenario analysis tools</p>
</li>
</ul>
<p>Let's reason about alternate realities.</p>
<h2 id="heading-what-are-counterfactuals"><strong>What Are Counterfactuals?</strong></h2>
<h3 id="heading-the-three-worlds-of-causality"><strong>The Three Worlds of Causality</strong></h3>
<p><strong>Factual World (What happened):</strong></p>
<ul>
<li><p>"I watered this plant heavily"</p>
</li>
<li><p>"The plant developed root rot"</p>
</li>
<li><p>This is observation—what we actually saw</p>
</li>
</ul>
<p><strong>Interventional World (What would happen):</strong></p>
<ul>
<li><p>"If I water the next plant moderately, what happens?"</p>
</li>
<li><p>"Disease probability drops to 20%"</p>
</li>
<li><p>This is prediction—what we expect in the future</p>
</li>
</ul>
<p><strong>Counterfactual World (What would have happened):</strong></p>
<ul>
<li><p>"Would THIS plant be healthy if I had watered it differently?"</p>
</li>
<li><p>"85% probability it would be healthy"</p>
</li>
<li><p>This is retrospection—alternate history for a specific instance</p>
</li>
</ul>
<h3 id="heading-why-counterfactuals-are-special"><strong>Why Counterfactuals Are Special</strong></h3>
<p>Counterfactuals require THREE pieces of information:</p>
<p><strong>1. The causal mechanism</strong> (from your DAG)</p>
<ul>
<li><p>How do variables causally relate?</p>
</li>
<li><p>Watering → Moisture → Pathogen → Disease</p>
</li>
</ul>
<p><strong>2. The specific instance</strong> (observed data)</p>
<ul>
<li><p>This plant had: high watering, high moisture, disease</p>
</li>
<li><p>Its vigor: 0.7, environmental stress: 0.4</p>
</li>
</ul>
<p><strong>3. The alternate action</strong> (the intervention)</p>
<ul>
<li><p>What if watering had been moderate instead?</p>
</li>
<li><p>How would moisture, pathogen, disease differ?</p>
</li>
</ul>
<p>Standard ML only has #2. Intervention (Level 2) has #1 and #3. Only counterfactuals combine all three.</p>
<h3 id="heading-the-counterfactual-formula"><strong>The Counterfactual Formula</strong></h3>
<p><strong>Notation:</strong> P(Y_x' | X=x, Y=y)</p>
<p><strong>Read as:</strong> "Probability of outcome Y under intervention x', given we observed X=x and Y=y in reality"</p>
<p><strong>Example:</strong></p>
<ul>
<li><p>P(Healthy | do(Watering=moderate), Watering=heavy, Diseased)</p>
</li>
<li><p>"Would plant be healthy with moderate watering, given we observed heavy watering and disease?"</p>
</li>
</ul>
<p>This is fundamentally different from:</p>
<ul>
<li><p>P(Healthy | Watering=moderate) — observational (correlation)</p>
</li>
<li><p>P(Healthy | do(Watering=moderate)) — interventional (average effect)</p>
</li>
</ul>
<p>Counterfactuals condition on BOTH the alternate intervention AND the factual observation.</p>
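<p>The three query types can be made concrete with a toy simulation. The sketch below uses a deliberately simple structural causal model with made-up numbers (not this series' plant DAG): a confounder <code>stress</code> drives both watering and moisture, so the observational and interventional probabilities disagree, while the counterfactual reuses one specific plant's recovered noise:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SCM (hypothetical numbers, for illustration only):
#   stress  ~ U(0, 1)                      (confounder)
#   water   = 1 if stress > 0.5 else 0     (stressed plants get watered heavily)
#   moist   = 6 + 10*stress + 5*water + u_m,  with u_m ~ N(0, 1)
#   disease = 1 if moist > 14 else 0

def simulate(n, do_water=None):
    stress = rng.uniform(0, 1, n)
    water = (stress > 0.5).astype(int) if do_water is None else np.full(n, do_water)
    u_m = rng.normal(0, 1, n)
    moist = 6 + 10 * stress + 5 * water + u_m
    return water, (moist > 14).astype(int)

# Rung 1 (observational): P(disease | water=1), confounded by stress
water, disease = simulate(100_000)
p_obs = disease[water == 1].mean()

# Rung 2 (interventional): P(disease | do(water=1)), averaged over all stress levels
_, disease_do = simulate(100_000, do_water=1)
p_do = disease_do.mean()

# Rung 3 (counterfactual): one specific plant, same noise, different action
stress_i, u_i = 0.9, 0.3                      # abduction: latents for THIS plant
moist_cf = 6 + 10 * stress_i + 5 * 0 + u_i    # action: do(water=0), keep u_i
disease_cf = int(moist_cf > 14)               # prediction: still diseased?

print(f"P(disease | water=1)     ~ {p_obs:.2f}")   # inflated by confounding
print(f"P(disease | do(water=1)) ~ {p_do:.2f}")    # true causal effect
print(f"This plant under do(water=0): disease = {disease_cf}")
```

<p>Because stressed plants are both watered heavily and prone to disease, rung 1 overstates the effect of watering; rung 2 removes the confounding; rung 3 answers the question for one individual plant, which here stays diseased even without heavy watering.</p>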
<hr />
<h2 id="heading-implementing-counterfactual-inference"><strong>Implementing Counterfactual Inference</strong></h2>
<h3 id="heading-the-three-steps-of-counterfactual-analysis"><strong>The Three Steps of Counterfactual Analysis</strong></h3>

<p><strong>Step 1: Abduction</strong> — Infer latent variables from observations</p>
<p><strong>Step 2: Action</strong> — Modify the model according to the intervention</p>
<p><strong>Step 3: Prediction</strong> — Compute the counterfactual outcome</p>
<p>Let's implement this with our plant disease DAG:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> dowhy <span class="hljs-keyword">import</span> CausalModel

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">CounterfactualEngine</span>:</span>
    <span class="hljs-string">"""
    Counterfactual reasoning engine for plant disease diagnosis.
    """</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, causal_dag, data</span>):</span>
        self.dag = causal_dag
        self.data = data
        self.model = self._build_model()

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_build_model</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-string">"""Build the causal model from DAG."""</span>
        <span class="hljs-keyword">return</span> CausalModel(
            data=self.data,
            treatment=<span class="hljs-string">'leaf_moisture_hours'</span>,
            outcome=<span class="hljs-string">'symptom_severity'</span>,
            graph=self.dag,
            common_causes=[<span class="hljs-string">'environmental_stress'</span>, <span class="hljs-string">'watering_practice'</span>],
            effect_modifiers=[<span class="hljs-string">'plant_vigor'</span>]
        )

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">abduction</span>(<span class="hljs-params">self, sample_idx</span>):</span>
        <span class="hljs-string">"""
        Step 1: Infer latent variables from observed data.
        Given what we observed, what are the unobserved factors?
        """</span>
        observed = self.data.iloc[sample_idx]

        <span class="hljs-comment"># Infer noise terms (unobserved confounders)</span>
        <span class="hljs-comment"># These capture the instance-specific factors</span>
        noise_terms = {
            <span class="hljs-string">'u_moisture'</span>: self._infer_moisture_noise(observed),
            <span class="hljs-string">'u_pathogen'</span>: self._infer_pathogen_noise(observed),
            <span class="hljs-string">'u_disease'</span>: self._infer_disease_noise(observed),
            <span class="hljs-string">'u_severity'</span>: self._infer_severity_noise(observed)
        }

        <span class="hljs-keyword">return</span> observed, noise_terms

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_infer_moisture_noise</span>(<span class="hljs-params">self, obs</span>):</span>
        <span class="hljs-string">"""Infer moisture noise from observation."""</span>
        <span class="hljs-comment"># Expected moisture given inputs</span>
        expected = <span class="hljs-number">5.0</span> + obs[<span class="hljs-string">'environmental_stress'</span>] * <span class="hljs-number">10</span>
        <span class="hljs-keyword">if</span> obs[<span class="hljs-string">'watering_practice'</span>] == <span class="hljs-number">0</span>:
            expected -= <span class="hljs-number">3</span>
        <span class="hljs-keyword">elif</span> obs[<span class="hljs-string">'watering_practice'</span>] == <span class="hljs-number">2</span>:
            expected += <span class="hljs-number">5</span>

        <span class="hljs-comment"># Noise is observed - expected</span>
        <span class="hljs-keyword">return</span> obs[<span class="hljs-string">'leaf_moisture_hours'</span>] - expected

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_infer_pathogen_noise</span>(<span class="hljs-params">self, obs</span>):</span>
        <span class="hljs-string">"""Infer pathogen growth noise."""</span>
        expected = (obs[<span class="hljs-string">'leaf_moisture_hours'</span>] / <span class="hljs-number">24</span>) ** <span class="hljs-number">1.5</span>
        <span class="hljs-keyword">return</span> obs[<span class="hljs-string">'pathogen_growth'</span>] - expected

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_infer_disease_noise</span>(<span class="hljs-params">self, obs</span>):</span>
        <span class="hljs-string">"""Infer disease threshold noise."""</span>
        <span class="hljs-comment"># Deterministic threshold model: no separate noise term, so record the observed indicator</span>
        <span class="hljs-keyword">return</span> obs[<span class="hljs-string">'disease_present'</span>]

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_infer_severity_noise</span>(<span class="hljs-params">self, obs</span>):</span>
        <span class="hljs-string">"""Infer symptom severity noise."""</span>
        <span class="hljs-keyword">if</span> obs[<span class="hljs-string">'disease_present'</span>] == <span class="hljs-number">0</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-number">0</span>
        expected = obs[<span class="hljs-string">'disease_present'</span>] * (<span class="hljs-number">1</span> - obs[<span class="hljs-string">'plant_vigor'</span>] * <span class="hljs-number">0.5</span>)
        <span class="hljs-keyword">return</span> obs[<span class="hljs-string">'symptom_severity'</span>] - expected

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">action</span>(<span class="hljs-params">self, observed, noise_terms, intervention</span>):</span>
        <span class="hljs-string">"""
        Step 2: Modify model according to intervention.
        Set the treatment variable to counterfactual value.
        """</span>
        <span class="hljs-comment"># Create counterfactual data point</span>
        cf_data = observed.copy()

        <span class="hljs-comment"># Apply intervention (break incoming edges)</span>
        <span class="hljs-keyword">for</span> var, value <span class="hljs-keyword">in</span> intervention.items():
            cf_data[var] = value

        <span class="hljs-keyword">return</span> cf_data, noise_terms

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">prediction</span>(<span class="hljs-params">self, cf_data, noise_terms</span>):</span>
        <span class="hljs-string">"""
        Step 3: Compute counterfactual outcome.
        Propagate intervention through causal graph.
        """</span>
        <span class="hljs-comment"># Re-compute downstream variables with intervention</span>

        <span class="hljs-comment"># Leaf moisture (intervened, so use counterfactual value)</span>
        cf_moisture = cf_data[<span class="hljs-string">'leaf_moisture_hours'</span>]

        <span class="hljs-comment"># Pathogen growth (function of new moisture + same noise)</span>
        cf_pathogen = (cf_moisture / <span class="hljs-number">24</span>) ** <span class="hljs-number">1.5</span> + noise_terms[<span class="hljs-string">'u_pathogen'</span>]
        cf_pathogen = np.clip(cf_pathogen, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>)

        <span class="hljs-comment"># Disease (function of new pathogen + same noise threshold)</span>
        cf_disease = <span class="hljs-number">1</span> <span class="hljs-keyword">if</span> cf_pathogen &gt; <span class="hljs-number">0.6</span> <span class="hljs-keyword">else</span> <span class="hljs-number">0</span>

        <span class="hljs-comment"># Symptom severity (function of new disease + same plant vigor + same noise)</span>
        cf_severity = cf_disease * (<span class="hljs-number">1</span> - cf_data[<span class="hljs-string">'plant_vigor'</span>] * <span class="hljs-number">0.5</span>) + noise_terms[<span class="hljs-string">'u_severity'</span>]
        cf_severity = np.clip(cf_severity, <span class="hljs-number">0</span>, <span class="hljs-number">1</span>)

        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">'leaf_moisture_hours'</span>: cf_moisture,
            <span class="hljs-string">'pathogen_growth'</span>: cf_pathogen,
            <span class="hljs-string">'disease_present'</span>: cf_disease,
            <span class="hljs-string">'symptom_severity'</span>: cf_severity
        }

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">counterfactual</span>(<span class="hljs-params">self, sample_idx, intervention</span>):</span>
        <span class="hljs-string">"""
        Complete counterfactual analysis.

        Args:
            sample_idx: Index of observed instance
            intervention: Dict of {variable: counterfactual_value}

        Returns:
            Dict with factual, counterfactual, and effect
        """</span>
        <span class="hljs-comment"># Step 1: Abduction</span>
        observed, noise_terms = self.abduction(sample_idx)

        <span class="hljs-comment"># Step 2: Action</span>
        cf_data, noise_terms = self.action(observed, noise_terms, intervention)

        <span class="hljs-comment"># Step 3: Prediction</span>
        cf_outcome = self.prediction(cf_data, noise_terms)

        <span class="hljs-keyword">return</span> {
            <span class="hljs-string">'factual'</span>: {
                <span class="hljs-string">'leaf_moisture_hours'</span>: observed[<span class="hljs-string">'leaf_moisture_hours'</span>],
                <span class="hljs-string">'pathogen_growth'</span>: observed[<span class="hljs-string">'pathogen_growth'</span>],
                <span class="hljs-string">'disease_present'</span>: observed[<span class="hljs-string">'disease_present'</span>],
                <span class="hljs-string">'symptom_severity'</span>: observed[<span class="hljs-string">'symptom_severity'</span>]
            },
            <span class="hljs-string">'counterfactual'</span>: cf_outcome,
            <span class="hljs-string">'individual_effect'</span>: {
                <span class="hljs-string">'disease_change'</span>: cf_outcome[<span class="hljs-string">'disease_present'</span>] - observed[<span class="hljs-string">'disease_present'</span>],
                <span class="hljs-string">'severity_change'</span>: cf_outcome[<span class="hljs-string">'symptom_severity'</span>] - observed[<span class="hljs-string">'symptom_severity'</span>]
            },
            <span class="hljs-string">'explanation'</span>: self._generate_explanation(observed, cf_outcome)
        }

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_generate_explanation</span>(<span class="hljs-params">self, factual, counterfactual</span>):</span>
        <span class="hljs-string">"""Generate natural language explanation of counterfactual."""</span>
        explanation = []

        <span class="hljs-comment"># Compare factual vs counterfactual</span>
        <span class="hljs-keyword">if</span> factual[<span class="hljs-string">'disease_present'</span>] == <span class="hljs-number">1</span> <span class="hljs-keyword">and</span> counterfactual[<span class="hljs-string">'disease_present'</span>] == <span class="hljs-number">0</span>:
            explanation.append(
                <span class="hljs-string">f"With moderate watering (reducing moisture from <span class="hljs-subst">{factual[<span class="hljs-string">'leaf_moisture_hours'</span>]:<span class="hljs-number">.1</span>f}</span> to "</span>
                <span class="hljs-string">f"<span class="hljs-subst">{counterfactual[<span class="hljs-string">'leaf_moisture_hours'</span>]:<span class="hljs-number">.1</span>f}</span> hours), this plant would have avoided disease."</span>
            )
        <span class="hljs-keyword">elif</span> factual[<span class="hljs-string">'disease_present'</span>] == <span class="hljs-number">0</span> <span class="hljs-keyword">and</span> counterfactual[<span class="hljs-string">'disease_present'</span>] == <span class="hljs-number">1</span>:
            explanation.append(
                <span class="hljs-string">f"If watering had been excessive (increasing moisture to "</span>
                <span class="hljs-string">f"<span class="hljs-subst">{counterfactual[<span class="hljs-string">'leaf_moisture_hours'</span>]:<span class="hljs-number">.1</span>f}</span> hours), this plant would have developed disease."</span>
            )
        <span class="hljs-keyword">else</span>:
            explanation.append(
                <span class="hljs-string">f"Disease status would remain unchanged, but symptom severity would change from "</span>
                <span class="hljs-string">f"<span class="hljs-subst">{factual[<span class="hljs-string">'symptom_severity'</span>]:<span class="hljs-number">.2</span>f}</span> to <span class="hljs-subst">{counterfactual[<span class="hljs-string">'symptom_severity'</span>]:<span class="hljs-number">.2</span>f}</span>."</span>
            )

        <span class="hljs-comment"># Add mechanism</span>
        explanation.append(
            <span class="hljs-string">f"Mechanism: Moisture affects pathogen growth (<span class="hljs-subst">{factual[<span class="hljs-string">'pathogen_growth'</span>]:<span class="hljs-number">.2</span>f}</span> → "</span>
            <span class="hljs-string">f"<span class="hljs-subst">{counterfactual[<span class="hljs-string">'pathogen_growth'</span>]:<span class="hljs-number">.2</span>f}</span>), which determines disease presence."</span>
        )

        <span class="hljs-keyword">return</span> <span class="hljs-string">" "</span>.join(explanation)


<span class="hljs-comment"># Usage Example</span>
<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    <span class="hljs-comment"># Load data and DAG (from Part 2)</span>
    <span class="hljs-keyword">from</span> part2_causal_dag <span class="hljs-keyword">import</span> generate_causal_data, causal_graph

    data = generate_causal_data(n_samples=<span class="hljs-number">1000</span>)
    cf_engine = CounterfactualEngine(causal_graph, data)

    <span class="hljs-comment"># Find a diseased plant</span>
    diseased_idx = data[data[<span class="hljs-string">'disease_present'</span>] == <span class="hljs-number">1</span>].index[<span class="hljs-number">0</span>]

    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"COUNTERFACTUAL ANALYSIS"</span>)
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)

    print(<span class="hljs-string">f"\nAnalyzing Plant #<span class="hljs-subst">{diseased_idx}</span>"</span>)
    print(<span class="hljs-string">f"Factual: Watering = <span class="hljs-subst">{data.loc[diseased_idx, <span class="hljs-string">'watering_practice'</span>]}</span>"</span>)
    print(<span class="hljs-string">f"         Moisture = <span class="hljs-subst">{data.loc[diseased_idx, <span class="hljs-string">'leaf_moisture_hours'</span>]:<span class="hljs-number">.1</span>f}</span> hours"</span>)
    print(<span class="hljs-string">f"         Disease = <span class="hljs-subst">{bool(data.loc[diseased_idx, <span class="hljs-string">'disease_present'</span>])}</span>"</span>)
    print(<span class="hljs-string">f"         Severity = <span class="hljs-subst">{data.loc[diseased_idx, <span class="hljs-string">'symptom_severity'</span>]:<span class="hljs-number">.2</span>f}</span>"</span>)

    <span class="hljs-comment"># Counterfactual: What if watering was optimal?</span>
    intervention = {<span class="hljs-string">'leaf_moisture_hours'</span>: <span class="hljs-number">6.0</span>}  <span class="hljs-comment"># Optimal moisture</span>

    result = cf_engine.counterfactual(diseased_idx, intervention)

    print(<span class="hljs-string">f"\nCounterfactual: Watering = optimal"</span>)
    print(<span class="hljs-string">f"                Moisture = <span class="hljs-subst">{result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'leaf_moisture_hours'</span>]:<span class="hljs-number">.1</span>f}</span> hours"</span>)
    print(<span class="hljs-string">f"                Disease = <span class="hljs-subst">{bool(result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'disease_present'</span>])}</span>"</span>)
    print(<span class="hljs-string">f"                Severity = <span class="hljs-subst">{result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'symptom_severity'</span>]:<span class="hljs-number">.2</span>f}</span>"</span>)

    print(<span class="hljs-string">f"\nIndividual Treatment Effect:"</span>)
    print(<span class="hljs-string">f"  Disease change: <span class="hljs-subst">{result[<span class="hljs-string">'individual_effect'</span>][<span class="hljs-string">'disease_change'</span>]}</span>"</span>)
    print(<span class="hljs-string">f"  Severity change: <span class="hljs-subst">{result[<span class="hljs-string">'individual_effect'</span>][<span class="hljs-string">'severity_change'</span>]:<span class="hljs-number">.2</span>f}</span>"</span>)

    print(<span class="hljs-string">f"\nExplanation:"</span>)
    print(<span class="hljs-string">f"  <span class="hljs-subst">{result[<span class="hljs-string">'explanation'</span>]}</span>"</span>)
</code></pre>
<h3 id="heading-output-example"><strong>Output Example</strong></h3>
<pre><code class="lang-plaintext">============================================================
COUNTERFACTUAL ANALYSIS
============================================================

Analyzing Plant #42
Factual: Watering = 2 (overwatered)
         Moisture = 18.3 hours
         Disease = True
         Severity = 0.73

Counterfactual: Watering = optimal
                Moisture = 6.0 hours
                Disease = False
                Severity = 0.00

Individual Treatment Effect:
  Disease change: -1
  Severity change: -0.73

Explanation:
  With moderate watering (reducing moisture from 18.3 to 6.0 hours), 
  this plant would have avoided disease. Mechanism: Moisture affects 
  pathogen growth (0.82 → 0.31), which determines disease presence.
</code></pre>
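<p>The individual treatment effect in the output above generalizes beyond one plant: the same abduction, action, and prediction steps can be vectorized to score every unit at once. Here is a minimal sketch on a toy linear model (hypothetical equations, not the engine above), where abduction recovers each plant's noise exactly:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Toy structural model (hypothetical): moisture = 8 + 6*heavy + u,
# and disease occurs when moisture exceeds 12.
heavy = rng.integers(0, 2, n)        # factual treatment per plant (0/1)
u = rng.normal(0, 2, n)              # latent, plant-specific noise
moist = 8 + 6 * heavy + u
disease = (moist > 12).astype(int)   # factual outcome

# Abduction: with a known linear equation, the noise is recovered directly
u_hat = moist - 8 - 6 * heavy

# Action + prediction: replay each plant under both arms with ITS own noise
y_heavy = ((8 + 6 * 1 + u_hat) > 12).astype(int)   # do(heavy=1)
y_light = ((8 + 6 * 0 + u_hat) > 12).astype(int)   # do(heavy=0)

ite = y_heavy - y_light   # per-plant effect of heavy watering on disease
print("ITE per plant:", ite)
print("Average effect:", ite.mean())
```

<p>Averaging <code>ite</code> over a population recovers the average treatment effect, while the per-plant values expose heterogeneity: plants whose noise already pushes moisture past the threshold are diseased under either action, so heavy watering changes nothing for them.</p>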
<hr />
<h2 id="heading-applications-of-counterfactual-reasoning"><strong>Applications of Counterfactual Reasoning</strong></h2>
<h3 id="heading-1-personalized-recommendations"><strong>1. Personalized Recommendations</strong></h3>
<p><strong>Standard ML:</strong> "Plants with disease X should receive treatment Y" (average effect)</p>
<p><strong>Counterfactual AI:</strong> "THIS plant would benefit most from intervention Z" (personalized)</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">recommend_intervention</span>(<span class="hljs-params">cf_engine, plant_idx</span>):</span>
    <span class="hljs-string">"""
    Find optimal intervention for specific plant.
    """</span>
    <span class="hljs-comment"># Test multiple interventions</span>
    interventions = {
        <span class="hljs-string">'reduce_watering'</span>: {<span class="hljs-string">'leaf_moisture_hours'</span>: <span class="hljs-number">5.0</span>},
        <span class="hljs-string">'moderate_watering'</span>: {<span class="hljs-string">'leaf_moisture_hours'</span>: <span class="hljs-number">8.0</span>},
        <span class="hljs-string">'increase_watering'</span>: {<span class="hljs-string">'leaf_moisture_hours'</span>: <span class="hljs-number">12.0</span>}
    }

    results = {}
    <span class="hljs-keyword">for</span> name, intervention <span class="hljs-keyword">in</span> interventions.items():
        result = cf_engine.counterfactual(plant_idx, intervention)
        results[name] = result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'symptom_severity'</span>]

    <span class="hljs-comment"># Find best intervention</span>
    best = min(results.items(), key=<span class="hljs-keyword">lambda</span> x: x[<span class="hljs-number">1</span>])

    <span class="hljs-keyword">return</span> {
        <span class="hljs-string">'recommendation'</span>: best[<span class="hljs-number">0</span>],
        <span class="hljs-string">'expected_severity'</span>: best[<span class="hljs-number">1</span>],
        <span class="hljs-string">'all_options'</span>: results
    }

<span class="hljs-comment"># Example usage</span>
plant_idx = <span class="hljs-number">42</span>
recommendation = recommend_intervention(cf_engine, plant_idx)

print(<span class="hljs-string">f"Optimal intervention for Plant #<span class="hljs-subst">{plant_idx}</span>:"</span>)
print(<span class="hljs-string">f"  <span class="hljs-subst">{recommendation[<span class="hljs-string">'recommendation'</span>]}</span>"</span>)
print(<span class="hljs-string">f"  Expected severity: <span class="hljs-subst">{recommendation[<span class="hljs-string">'expected_severity'</span>]:<span class="hljs-number">.2</span>f}</span>"</span>)
print(<span class="hljs-string">f"\nAll options:"</span>)
<span class="hljs-keyword">for</span> intervention, severity <span class="hljs-keyword">in</span> recommendation[<span class="hljs-string">'all_options'</span>].items():
    print(<span class="hljs-string">f"  <span class="hljs-subst">{intervention}</span>: <span class="hljs-subst">{severity:<span class="hljs-number">.2</span>f}</span>"</span>)
</code></pre>
<h3 id="heading-2-explanation-generation"><strong>2. Explanation Generation</strong></h3>
<p><strong>Why did this plant get diseased?</strong></p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">explain_disease</span>(<span class="hljs-params">cf_engine, diseased_idx, healthy_idx</span>):</span>
    <span class="hljs-string">"""
    Explain why one plant got diseased and another didn't.
    """</span>
    diseased = cf_engine.data.iloc[diseased_idx]
    healthy = cf_engine.data.iloc[healthy_idx]

    <span class="hljs-comment"># Compare key differences</span>
    differences = []

    <span class="hljs-keyword">if</span> diseased[<span class="hljs-string">'watering_practice'</span>] != healthy[<span class="hljs-string">'watering_practice'</span>]:
        differences.append(
            <span class="hljs-string">f"Watering: Plant #<span class="hljs-subst">{diseased_idx}</span> was watered differently "</span>
            <span class="hljs-string">f"(<span class="hljs-subst">{diseased[<span class="hljs-string">'watering_practice'</span>]}</span> vs <span class="hljs-subst">{healthy[<span class="hljs-string">'watering_practice'</span>]}</span>)"</span>
        )

    <span class="hljs-keyword">if</span> abs(diseased[<span class="hljs-string">'plant_vigor'</span>] - healthy[<span class="hljs-string">'plant_vigor'</span>]) &gt; <span class="hljs-number">0.2</span>:
        differences.append(
            <span class="hljs-string">f"Vigor: Plant #<span class="hljs-subst">{diseased_idx}</span> had <span class="hljs-subst">{<span class="hljs-string">'lower'</span> <span class="hljs-keyword">if</span> diseased[<span class="hljs-string">'plant_vigor'</span>] &lt; healthy[<span class="hljs-string">'plant_vigor'</span>] <span class="hljs-keyword">else</span> <span class="hljs-string">'higher'</span>}</span> vigor "</span>
            <span class="hljs-string">f"(<span class="hljs-subst">{diseased[<span class="hljs-string">'plant_vigor'</span>]:<span class="hljs-number">.2</span>f}</span> vs <span class="hljs-subst">{healthy[<span class="hljs-string">'plant_vigor'</span>]:<span class="hljs-number">.2</span>f}</span>)"</span>
        )

    <span class="hljs-comment"># Counterfactual: Would diseased plant be healthy with healthy plant's watering?</span>
    intervention = {<span class="hljs-string">'leaf_moisture_hours'</span>: healthy[<span class="hljs-string">'leaf_moisture_hours'</span>]}
    cf_result = cf_engine.counterfactual(diseased_idx, intervention)

    <span class="hljs-keyword">if</span> cf_result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'disease_present'</span>] == <span class="hljs-number">0</span>:
        differences.append(
            <span class="hljs-string">f"CRITICAL: If Plant #<span class="hljs-subst">{diseased_idx}</span> had received the same watering as "</span>
            <span class="hljs-string">f"Plant #<span class="hljs-subst">{healthy_idx}</span>, it would have remained healthy."</span>
        )

    <span class="hljs-keyword">return</span> {
        <span class="hljs-string">'differences'</span>: differences,
        <span class="hljs-string">'counterfactual'</span>: cf_result,
        <span class="hljs-string">'root_cause'</span>: <span class="hljs-string">'watering_practice'</span> <span class="hljs-keyword">if</span> cf_result[<span class="hljs-string">'individual_effect'</span>][<span class="hljs-string">'disease_change'</span>] &lt; <span class="hljs-number">0</span> <span class="hljs-keyword">else</span> <span class="hljs-string">'plant_vigor'</span>
    }

<span class="hljs-comment"># Usage</span>
diseased_plant = data[data[<span class="hljs-string">'disease_present'</span>] == <span class="hljs-number">1</span>].index[<span class="hljs-number">0</span>]
healthy_plant = data[data[<span class="hljs-string">'disease_present'</span>] == <span class="hljs-number">0</span>].index[<span class="hljs-number">0</span>]

explanation = explain_disease(cf_engine, diseased_plant, healthy_plant)

print(<span class="hljs-string">f"Why did Plant #<span class="hljs-subst">{diseased_plant}</span> get diseased?"</span>)
<span class="hljs-keyword">for</span> diff <span class="hljs-keyword">in</span> explanation[<span class="hljs-string">'differences'</span>]:
    print(<span class="hljs-string">f"  • <span class="hljs-subst">{diff}</span>"</span>)
print(<span class="hljs-string">f"\nRoot cause: <span class="hljs-subst">{explanation[<span class="hljs-string">'root_cause'</span>]}</span>"</span>)
</code></pre>
<h3 id="heading-3-regret-analysis"><strong>3. Regret Analysis</strong></h3>
<p><strong>What should I have done differently?</strong></p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">regret_analysis</span>(<span class="hljs-params">cf_engine, sample_idx</span>):</span>
    <span class="hljs-string">"""
    Analyze what optimal action would have been.
    """</span>
    actual = cf_engine.data.iloc[sample_idx]

    <span class="hljs-comment"># Test all possible watering practices</span>
    watering_options = [<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>]  <span class="hljs-comment"># under, optimal, over</span>

    results = {}
    <span class="hljs-keyword">for</span> watering <span class="hljs-keyword">in</span> watering_options:
        <span class="hljs-comment"># Compute expected moisture for this watering</span>
        expected_moisture = <span class="hljs-number">5.0</span> + actual[<span class="hljs-string">'environmental_stress'</span>] * <span class="hljs-number">10</span>
        <span class="hljs-keyword">if</span> watering == <span class="hljs-number">0</span>:
            expected_moisture -= <span class="hljs-number">3</span>
        <span class="hljs-keyword">elif</span> watering == <span class="hljs-number">2</span>:
            expected_moisture += <span class="hljs-number">5</span>

        intervention = {<span class="hljs-string">'leaf_moisture_hours'</span>: max(<span class="hljs-number">0</span>, min(<span class="hljs-number">24</span>, expected_moisture))}
        cf_result = cf_engine.counterfactual(sample_idx, intervention)

        results[watering] = {
            <span class="hljs-string">'disease'</span>: cf_result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'disease_present'</span>],
            <span class="hljs-string">'severity'</span>: cf_result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'symptom_severity'</span>]
        }

    <span class="hljs-comment"># Find optimal action</span>
    optimal = min(results.items(), key=<span class="hljs-keyword">lambda</span> x: (x[<span class="hljs-number">1</span>][<span class="hljs-string">'disease'</span>], x[<span class="hljs-number">1</span>][<span class="hljs-string">'severity'</span>]))

    actual_watering = int(actual[<span class="hljs-string">'watering_practice'</span>])  <span class="hljs-comment"># cast: mixed-dtype rows upcast ints to float, which would break list indexing below</span>
    regret = {
        <span class="hljs-string">'optimal_action'</span>: optimal[<span class="hljs-number">0</span>],
        <span class="hljs-string">'actual_action'</span>: actual_watering,
        <span class="hljs-string">'regret'</span>: results[actual_watering][<span class="hljs-string">'severity'</span>] - optimal[<span class="hljs-number">1</span>][<span class="hljs-string">'severity'</span>]
    }

    <span class="hljs-keyword">return</span> regret

<span class="hljs-comment"># Usage</span>
plant_idx = <span class="hljs-number">42</span>
regret = regret_analysis(cf_engine, plant_idx)

print(<span class="hljs-string">f"Regret Analysis for Plant #<span class="hljs-subst">{plant_idx}</span>:"</span>)
print(<span class="hljs-string">f"  Actual action: <span class="hljs-subst">{[<span class="hljs-string">'under'</span>, <span class="hljs-string">'optimal'</span>, <span class="hljs-string">'over'</span>][regret[<span class="hljs-string">'actual_action'</span>]]}</span> watering"</span>)
print(<span class="hljs-string">f"  Optimal action: <span class="hljs-subst">{[<span class="hljs-string">'under'</span>, <span class="hljs-string">'optimal'</span>, <span class="hljs-string">'over'</span>][regret[<span class="hljs-string">'optimal_action'</span>]]}</span> watering"</span>)
print(<span class="hljs-string">f"  Regret: <span class="hljs-subst">{regret[<span class="hljs-string">'regret'</span>]:<span class="hljs-number">.2</span>f}</span> severity points"</span>)

<span class="hljs-keyword">if</span> regret[<span class="hljs-string">'regret'</span>] &gt; <span class="hljs-number">0.1</span>:
    print(<span class="hljs-string">f"  ⚠️  Significant regret! Better watering would have reduced severity substantially."</span>)
<span class="hljs-keyword">else</span>:
    print(<span class="hljs-string">f"  ✓ Action was near-optimal."</span>)
</code></pre>
<h3 id="heading-4-policy-evaluation"><strong>4. Policy Evaluation</strong></h3>
<p><strong>Was our intervention strategy effective?</strong></p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">evaluate_policy</span>(<span class="hljs-params">cf_engine, treated_indices, control_indices</span>):</span>
    <span class="hljs-string">"""
    Evaluate treatment effect using counterfactual reasoning.
    """</span>
    <span class="hljs-comment"># For treated group: What if they hadn't been treated?</span>
    treated_effects = []
    <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> treated_indices:
        <span class="hljs-comment"># Assume treatment was reducing moisture</span>
        cf_result = cf_engine.counterfactual(
            idx, 
            {<span class="hljs-string">'leaf_moisture_hours'</span>: cf_engine.data.loc[idx, <span class="hljs-string">'leaf_moisture_hours'</span>] + <span class="hljs-number">5.0</span>}
        )
        treated_effects.append(cf_result[<span class="hljs-string">'individual_effect'</span>][<span class="hljs-string">'severity_change'</span>])

    <span class="hljs-comment"># For control group: What if they had been treated?</span>
    control_effects = []
    <span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> control_indices:
        cf_result = cf_engine.counterfactual(
            idx,
            {<span class="hljs-string">'leaf_moisture_hours'</span>: max(<span class="hljs-number">0</span>, cf_engine.data.loc[idx, <span class="hljs-string">'leaf_moisture_hours'</span>] - <span class="hljs-number">5.0</span>)}
        )
        control_effects.append(-cf_result[<span class="hljs-string">'individual_effect'</span>][<span class="hljs-string">'severity_change'</span>])

    <span class="hljs-comment"># Overall treatment effect</span>
    ate = np.mean(treated_effects + control_effects)

    <span class="hljs-keyword">return</span> {
        <span class="hljs-string">'average_treatment_effect'</span>: ate,
        <span class="hljs-string">'treated_effect'</span>: np.mean(treated_effects),
        <span class="hljs-string">'control_effect'</span>: np.mean(control_effects),
        <span class="hljs-string">'heterogeneity'</span>: np.std(treated_effects + control_effects)
    }
</code></pre>
<hr />
<h2 id="heading-individual-treatment-effects-ite"><strong>Individual Treatment Effects (ITE)</strong></h2>
<h3 id="heading-beyond-average-treatment-effects"><strong>Beyond Average Treatment Effects</strong></h3>
<p><strong>Average Treatment Effect (ATE):</strong> What's the effect on average?</p>
<ul>
<li>"Reducing watering decreases disease by 15% on average"</li>
</ul>
<p><strong>Individual Treatment Effect (ITE):</strong> What's the effect for THIS individual?</p>
<ul>
<li><p>"For Plant #42, reducing watering would decrease disease by 85%"</p>
</li>
<li><p>"For Plant #17, reducing watering would have no effect"</p>
</li>
</ul>
<h3 id="heading-why-ite-matters"><strong>Why ITE Matters</strong></h3>
<p><strong>Precision medicine/agriculture:</strong></p>
<ul>
<li><p>Not everyone responds the same way</p>
</li>
<li><p>Treatment X might help person A but harm person B</p>
</li>
<li><p>Counterfactuals let us estimate personalized effects</p>
</li>
</ul>
<h3 id="heading-computing-ite"><strong>Computing ITE</strong></h3>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">compute_ite</span>(<span class="hljs-params">cf_engine, sample_idx, treatment_var, treatment_value</span>):</span>
    <span class="hljs-string">"""
    Compute Individual Treatment Effect.

    ITE = Y_1 - Y_0
    where Y_1 is outcome under treatment, Y_0 is outcome under control
    """</span>
    <span class="hljs-comment"># Factual outcome (what actually happened)</span>
    factual = cf_engine.data.iloc[sample_idx]

    <span class="hljs-comment"># Counterfactual outcome (what would happen under treatment)</span>
    intervention = {treatment_var: treatment_value}
    cf_result = cf_engine.counterfactual(sample_idx, intervention)

    ite = cf_result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'symptom_severity'</span>] - factual[<span class="hljs-string">'symptom_severity'</span>]

    <span class="hljs-keyword">return</span> {
        <span class="hljs-string">'ite'</span>: ite,
        <span class="hljs-string">'factual_outcome'</span>: factual[<span class="hljs-string">'symptom_severity'</span>],
        <span class="hljs-string">'counterfactual_outcome'</span>: cf_result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'symptom_severity'</span>],
        <span class="hljs-string">'would_benefit'</span>: ite &lt; <span class="hljs-number">-0.1</span>,  <span class="hljs-comment"># severity reduction of at least 0.1 (~10% on the 0-1 scale)</span>
        <span class="hljs-string">'confidence'</span>: <span class="hljs-string">'high'</span> <span class="hljs-keyword">if</span> abs(cf_result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'pathogen_growth'</span>] - factual[<span class="hljs-string">'pathogen_growth'</span>]) &gt; <span class="hljs-number">0.2</span> <span class="hljs-keyword">else</span> <span class="hljs-string">'low'</span>
    }

<span class="hljs-comment"># Usage: Estimate ITE for multiple plants</span>
ite_results = []
<span class="hljs-keyword">for</span> idx <span class="hljs-keyword">in</span> range(<span class="hljs-number">100</span>):
    ite = compute_ite(cf_engine, idx, <span class="hljs-string">'leaf_moisture_hours'</span>, <span class="hljs-number">6.0</span>)
    ite_results.append({
        <span class="hljs-string">'plant_idx'</span>: idx,
        <span class="hljs-string">'ite'</span>: ite[<span class="hljs-string">'ite'</span>],
        <span class="hljs-string">'would_benefit'</span>: ite[<span class="hljs-string">'would_benefit'</span>]
    })

ite_df = pd.DataFrame(ite_results)

print(<span class="hljs-string">"Individual Treatment Effect Distribution:"</span>)
print(<span class="hljs-string">f"  Mean ITE: <span class="hljs-subst">{ite_df[<span class="hljs-string">'ite'</span>].mean():<span class="hljs-number">.3</span>f}</span>"</span>)
print(<span class="hljs-string">f"  Std ITE: <span class="hljs-subst">{ite_df[<span class="hljs-string">'ite'</span>].std():<span class="hljs-number">.3</span>f}</span>"</span>)
print(<span class="hljs-string">f"  % who would benefit: <span class="hljs-subst">{ite_df[<span class="hljs-string">'would_benefit'</span>].mean():<span class="hljs-number">.1</span>%}</span>"</span>)

<span class="hljs-comment"># Identify who benefits most</span>
top_beneficiaries = ite_df.nsmallest(<span class="hljs-number">10</span>, <span class="hljs-string">'ite'</span>)
print(<span class="hljs-string">f"\nTop 10 beneficiaries from treatment:"</span>)
print(top_beneficiaries)
</code></pre>
<hr />
<h2 id="heading-counterfactual-fairness"><strong>Counterfactual Fairness</strong></h2>
<h3 id="heading-ensuring-fair-ai-decisions"><strong>Ensuring Fair AI Decisions</strong></h3>
<p><strong>The problem:</strong> ML models can discriminate based on protected attributes</p>
<p><strong>Counterfactual fairness:</strong> "Would the decision be the same if the person had a different protected attribute?"</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_counterfactual_fairness</span>(<span class="hljs-params">cf_engine, sample_idx, protected_attr, alt_value</span>):</span>
    <span class="hljs-string">"""
    Check if decision would change with different protected attribute.
    """</span>
    <span class="hljs-comment"># Factual decision</span>
    factual = cf_engine.data.iloc[sample_idx]
    factual_decision = <span class="hljs-string">"treat"</span> <span class="hljs-keyword">if</span> factual[<span class="hljs-string">'symptom_severity'</span>] &gt; <span class="hljs-number">0.5</span> <span class="hljs-keyword">else</span> <span class="hljs-string">"monitor"</span>

    <span class="hljs-comment"># Counterfactual decision (with different protected attribute)</span>
    intervention = {protected_attr: alt_value}
    cf_result = cf_engine.counterfactual(sample_idx, intervention)
    cf_decision = <span class="hljs-string">"treat"</span> <span class="hljs-keyword">if</span> cf_result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'symptom_severity'</span>] &gt; <span class="hljs-number">0.5</span> <span class="hljs-keyword">else</span> <span class="hljs-string">"monitor"</span>

    is_fair = factual_decision == cf_decision

    <span class="hljs-keyword">return</span> {
        <span class="hljs-string">'is_fair'</span>: is_fair,
        <span class="hljs-string">'factual_decision'</span>: factual_decision,
        <span class="hljs-string">'counterfactual_decision'</span>: cf_decision,
        <span class="hljs-string">'protected_attr'</span>: protected_attr,
        <span class="hljs-string">'explanation'</span>: <span class="hljs-string">f"Decision <span class="hljs-subst">{<span class="hljs-string">'would'</span> <span class="hljs-keyword">if</span> is_fair <span class="hljs-keyword">else</span> <span class="hljs-string">'would NOT'</span>}</span> remain the same"</span>
    }
</code></pre>
<hr />
<h2 id="heading-practical-tips-for-counterfactual-reasoning"><strong>Practical Tips for Counterfactual Reasoning</strong></h2>
<h3 id="heading-1-validate-structural-equations"><strong>1. Validate Structural Equations</strong></h3>
<p>Your counterfactuals are only as good as your causal model:</p>
<ul>
<li><p>Test on known interventions</p>
</li>
<li><p>Compare to randomized trials when available</p>
</li>
<li><p>Check if counterfactual predictions match observed data</p>
</li>
</ul>
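<p>A quick way to put the third check into practice: fit each structural equation on a training split and score its predictions on held-out rows before trusting any counterfactual built from it. The sketch below is self-contained and uses synthetic data with an assumed sigmoid moisture-to-growth link, not the engine from this article:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Synthetic stand-in for field observations (the sigmoid link is an assumption)
moisture = rng.uniform(0, 24, n)
growth_obs = 1 / (1 + np.exp(-(moisture - 7.0))) + rng.normal(0, 0.05, n)

# Fit a candidate structural equation on a training split...
train_mask = np.arange(n) < 1500
hold_mask = ~train_mask
coef = np.polyfit(moisture[train_mask], growth_obs[train_mask], deg=3)

# ...then score it on held-out rows before trusting counterfactuals built from it
pred = np.polyval(coef, moisture[hold_mask])
rmse = np.sqrt(np.mean((pred - growth_obs[hold_mask]) ** 2))
print(f"held-out RMSE: {rmse:.3f}")
```

<p>If the held-out error is large relative to the outcome's scale, counterfactuals computed from that equation will inherit the error.</p>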
<h3 id="heading-2-handle-uncertainty"><strong>2. Handle Uncertainty</strong></h3>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">counterfactual_with_uncertainty</span>(<span class="hljs-params">cf_engine, sample_idx, intervention, n_samples=<span class="hljs-number">100</span></span>):</span>
    <span class="hljs-string">"""
    Compute counterfactual with uncertainty via bootstrapping.
    """</span>
    results = []

    <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(n_samples):
        <span class="hljs-comment"># Resample exogenous noise each draw (assumes cf_engine.counterfactual is stochastic; a deterministic engine would return identical draws)</span>
        cf_result = cf_engine.counterfactual(sample_idx, intervention)
        results.append(cf_result[<span class="hljs-string">'counterfactual'</span>][<span class="hljs-string">'symptom_severity'</span>])

    <span class="hljs-keyword">return</span> {
        <span class="hljs-string">'mean'</span>: np.mean(results),
        <span class="hljs-string">'std'</span>: np.std(results),
        <span class="hljs-string">'ci_lower'</span>: np.percentile(results, <span class="hljs-number">2.5</span>),
        <span class="hljs-string">'ci_upper'</span>: np.percentile(results, <span class="hljs-number">97.5</span>)
    }
</code></pre>
<h3 id="heading-3-combine-with-domain-knowledge"><strong>3. Combine with Domain Knowledge</strong></h3>
<p>The most powerful counterfactuals come from:</p>
<ul>
<li><p>Causal structure (DAG)</p>
</li>
<li><p>Domain expertise (mechanisms)</p>
</li>
<li><p>Data (observations)</p>
</li>
</ul>
<p>Don't rely on any one alone.</p>
<hr />
<h2 id="heading-youve-mastered-counterfactuals"><strong>You've Mastered Counterfactuals</strong></h2>
<p>Congratulations! You now understand:</p>
<p>✅ What counterfactuals are and why they're powerful<br />✅ The three-step process: Abduction → Action → Prediction<br />✅ How to implement counterfactual inference<br />✅ Individual Treatment Effects (ITE)<br />✅ Applications: personalization, explanation, regret analysis<br />✅ Counterfactual fairness</p>
<p><strong>This is Level 3 reasoning.</strong> Most AI can't do this.</p>
<hr />
<h2 id="heading-whats-next-intervention-design"><strong>What's Next: Intervention Design</strong></h2>
<p>In <strong>Part 4</strong> (Wednesday, Jan 22), we move from analysis to action:</p>
<ul>
<li><p>How do we use counterfactuals to design optimal interventions?</p>
</li>
<li><p>What's the best treatment for each individual?</p>
</li>
<li><p>How do we optimize for multiple objectives?</p>
</li>
<li><p>How do we account for costs and constraints?</p>
</li>
</ul>
<p>We'll build a complete intervention recommendation engine that combines everything from Parts 1-3.</p>
<hr />
<h2 id="heading-your-homework"><strong>Your Homework</strong></h2>
<p><strong>1. Implement counterfactual engine</strong></p>
<ul>
<li><p>Use the code from this article</p>
</li>
<li><p>Test on your plant disease data</p>
</li>
<li><p>Generate counterfactual explanations</p>
</li>
</ul>
<p><strong>2. Experiment with interventions</strong></p>
<ul>
<li><p>Try different intervention values</p>
</li>
<li><p>Compare factual vs counterfactual outcomes</p>
</li>
<li><p>Find cases with high regret</p>
</li>
</ul>
<p><strong>3. Think about your domain</strong></p>
<ul>
<li><p>What counterfactual questions would be valuable?</p>
</li>
<li><p>What interventions do you want to optimize?</p>
</li>
<li><p>What constraints matter in practice?</p>
</li>
</ul>
<p><strong>4. Challenge yourself</strong></p>
<ul>
<li><p>Can you extend the engine to multiple treatments?</p>
</li>
<li><p>How would you handle continuous outcomes?</p>
</li>
<li><p>What about time-series counterfactuals?</p>
</li>
</ul>
<p>Bring these to Part 4. We're building the intervention engine.</p>
<hr />
<p><strong>Series Navigation:</strong></p>
<ul>
<li><p><a target="_blank" href="https://hashnode.com/preview/6965608ba83646fb5b9d1077">← Part 2: Building Causal DAGs</a></p>
</li>
<li><p><strong>Part 3: Counterfactual Reasoning</strong> ← You are here</p>
</li>
<li><p>Part 4: Intervention Design → (Jan 22)</p>
</li>
<li><p>Part 5: Distributed Systems (Jan 24)</p>
</li>
</ul>
<p><strong>Code &amp; Resources:</strong></p>
<ul>
<li><p><a target="_blank" href="https://github.com/cod3smith/plant-disease-causal">GitHub Repository</a></p>
</li>
<li><p><a target="_blank" href="https://github.com/yourusername/plant-disease-causal/tree/main/part3">Counterfactual Examples</a></p>
</li>
</ul>
<hr />
<p><em>Part of the NeoForge Labs research series on production-grade causal AI.</em></p>
<p><strong>Questions?</strong> I read every comment.</p>
]]></content:encoded></item><item><title><![CDATA[Part 2: Building Your First Causal DAG]]></title><description><![CDATA[In Part 1, you learned why causality matters. Correlation tells you what happens, but causation tells you why and what to do about it.
Today, we're building your first causal Directed Acyclic Graph (DAG)—the foundation of causal reasoning.
By the end...]]></description><link>https://blog.neoforgelabs.tech/part-2-building-your-first-causal-dag</link><guid isPermaLink="true">https://blog.neoforgelabs.tech/part-2-building-your-first-causal-dag</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Data Science]]></category><category><![CDATA[Developer]]></category><dc:creator><![CDATA[Kelyn Njeri]]></dc:creator><pubDate>Wed, 14 Jan 2026 04:00:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768253217725/ffb009f8-f2ca-4dbf-8f3d-c10eb343f454.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a target="_blank" href="https://blog.neoforgelabs.tech/why-causality-matters-for-ai">Part 1</a>, you learned why causality matters. Correlation tells you <em>what</em> happens, but causation tells you <em>why</em> and <em>what to do about it</em>.</p>
<p>Today, we're building your first causal Directed Acyclic Graph (DAG)—the foundation of causal reasoning.</p>
<p>By the end of this article, you'll:</p>
<ul>
<li><p>Understand what DAGs are and why they're powerful</p>
</li>
<li><p>Build a complete causal model for plant disease detection</p>
</li>
<li><p>Learn to identify confounders, mediators, and colliders</p>
</li>
<li><p>Know how to validate your causal assumptions</p>
</li>
<li><p>Have working code to implement your DAG in Python</p>
</li>
</ul>
<p>No more theory. Let's build something real.</p>
<p><strong>What you'll need:</strong></p>
<ul>
<li><p>Python 3.12+</p>
</li>
<li><p>Basic understanding of probability</p>
</li>
<li><p>Curiosity about how things actually work</p>
</li>
</ul>
<p>Let's go.</p>
<hr />
<h2 id="heading-what-is-a-causal-dag"><strong>What Is a Causal DAG?</strong></h2>
<h3 id="heading-graphs-as-causal-models"><strong>Graphs as Causal Models</strong></h3>
<p>A <strong>Directed Acyclic Graph (DAG)</strong> is a visual representation of causal relationships:</p>
<p><img src="https://cdn-images-1.medium.com/max/1600/1*T5VqBGZNoyIABTkfrynEuw.png" alt /></p>
<p><strong>Three key components:</strong></p>
<p><strong>1. Nodes (variables):</strong> Things that can change</p>
<ul>
<li><p>Environmental temperature</p>
</li>
<li><p>Soil moisture</p>
</li>
<li><p>Plant health</p>
</li>
<li><p>Disease presence</p>
</li>
</ul>
<p><strong>2. Directed edges (arrows):</strong> Causal relationships</p>
<ul>
<li><p>A → B means "A causes B"</p>
</li>
<li><p>Direction matters: Temperature → Disease ≠ Disease → Temperature</p>
</li>
</ul>
<p><strong>3. Acyclic (no loops):</strong> No circular causation</p>
<ul>
<li><p>Can't have: A → B → C → A</p>
</li>
<li><p>Time flows forward, causes precede effects</p>
</li>
</ul>
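<p>The acyclicity constraint is easy to enforce mechanically. A minimal sketch using <code>networkx</code> (an assumed dependency, with illustrative node names):</p>

```python
import networkx as nx

# Arrows point from cause to effect
G = nx.DiGraph()
G.add_edges_from([
    ("Temperature", "Moisture"),
    ("Moisture", "Pathogen_Growth"),
    ("Pathogen_Growth", "Disease"),
])

print(nx.is_directed_acyclic_graph(G))  # True: a valid causal DAG

# A back-edge would create the forbidden loop Moisture -> ... -> Moisture
G.add_edge("Disease", "Moisture")
print(nx.is_directed_acyclic_graph(G))  # False: no longer a DAG
```

<p>Checking this programmatically catches accidental cycles before they silently corrupt downstream causal queries.</p>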
<h3 id="heading-why-graphs"><strong>Why Graphs?</strong></h3>
<p><strong>Compact representation of causal knowledge:</strong></p>
<p>Instead of writing:</p>
<pre><code class="lang-plaintext">If temperature is high AND humidity is high THEN moisture increases
If moisture is high AND air circulation is low THEN pathogen growth increases
If pathogen growth is high THEN disease risk increases
...
</code></pre>
<p>We draw:</p>
<p><img src="https://cdn-images-1.medium.com/max/1600/1*ps98UTS5tdguPOzP-GeRGg.png" alt /></p>
<p>DAG Knowledge Representation</p>
<p><strong>The graph encodes:</strong></p>
<ul>
<li><p>Direct causes (arrows)</p>
</li>
<li><p>Indirect causes (paths)</p>
</li>
<li><p>Independence relationships (absence of arrows)</p>
</li>
<li><p>Causal mechanisms (structure)</p>
</li>
</ul>
<h3 id="heading-reading-the-graph"><strong>Reading the Graph</strong></h3>
<p>From the DAG above, we can read:</p>
<p><strong>Direct effects:</strong></p>
<ul>
<li><p>Temperature directly causes Moisture</p>
</li>
<li><p>Pathogen Growth directly causes Disease</p>
</li>
</ul>
<p><strong>Indirect effects:</strong></p>
<ul>
<li><p>Temperature indirectly affects Disease (via Moisture → Pathogen Growth)</p>
</li>
<li><p>Humidity indirectly affects Disease (via same path)</p>
</li>
</ul>
<p><strong>Independence (no arrow):</strong></p>
<ul>
<li><p>Temperature does NOT directly cause Disease</p>
<ul>
<li>It only affects it through the moisture mechanism</li>
</ul>
</li>
<li><p>Air Circulation does NOT affect Moisture</p>
<ul>
<li>It only affects pathogen growth</li>
</ul>
</li>
</ul>
<p><strong>This is powerful</strong>: The structure tells us what variables are related and HOW.</p>
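<p>These readings can also be queried programmatically. A sketch with <code>networkx</code> (assumed dependency), encoding the example DAG above:</p>

```python
import networkx as nx

G = nx.DiGraph([
    ("Temperature", "Moisture"),
    ("Humidity", "Moisture"),
    ("Moisture", "Pathogen_Growth"),
    ("Air_Circulation", "Pathogen_Growth"),
    ("Pathogen_Growth", "Disease"),
])

# Direct effects: the children of a node
direct = set(G.successors("Temperature"))            # {'Moisture'}

# Indirect effects: everything reachable downstream
downstream = nx.descendants(G, "Temperature")        # includes 'Disease'

# The causal path(s) that carry an indirect effect
paths = list(nx.all_simple_paths(G, "Temperature", "Disease"))
```

<p>Absence of an edge is just as informative: <code>"Disease" not in G.successors("Temperature")</code> encodes that temperature has no direct effect on disease.</p>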
<h3 id="heading-the-dag-answers-intervention-questions"><strong>The DAG Answers Intervention Questions</strong></h3>
<p>Want to know: "What happens if I reduce humidity?"</p>
<p><strong>Follow the arrows:</strong></p>
<ol>
<li><p>Humidity ↓</p>
</li>
<li><p>→ Moisture ↓</p>
</li>
<li><p>→ Pathogen Growth ↓</p>
</li>
<li><p>→ Disease ↓</p>
</li>
</ol>
<p>The causal path tells us the effect of our intervention.</p>
<p>Want to know: "What happens if I increase temperature?"</p>
<p><strong>Check the paths:</strong></p>
<ul>
<li><p>Temperature ↑ → Moisture ↓ (surface water evaporates faster)</p>
</li>
<li><p>Moisture → Pathogen Growth (depends on pathogen type!)</p>
<ul>
<li><p>For fungi that need moisture: Growth ↓</p>
</li>
<li><p>For heat-loving pathogens: Growth ↑</p>
</li>
</ul>
</li>
</ul>
<p>The DAG shows us we need domain knowledge to complete the model.</p>
<h3 id="heading-what-dags-cannot-do"><strong>What DAGs Cannot Do</strong></h3>
<p>Important limitations:</p>
<p><strong>1. DAGs don't learn from data alone</strong></p>
<ul>
<li><p>You need domain knowledge to draw arrows</p>
</li>
<li><p>Data can validate or reject your DAG</p>
</li>
<li><p>But structure comes from understanding mechanisms</p>
</li>
</ul>
<p><strong>2. DAGs assume causality is stable</strong></p>
<ul>
<li><p>Same causes → same effects</p>
</li>
<li><p>Holds across contexts (mostly)</p>
</li>
<li><p>May break with extreme distribution shift</p>
</li>
</ul>
<p><strong>3. DAGs can be wrong</strong></p>
<ul>
<li><p>Missing arrows = missed confounders</p>
</li>
<li><p>Wrong direction = incorrect causal reasoning</p>
</li>
<li><p>Validation is crucial</p>
</li>
</ul>
<p><strong>But</strong>: A good DAG, grounded in domain expertise, is far more reliable than pure correlation.</p>
<hr />
<h2 id="heading-building-the-plant-disease-dag"><strong>Building the Plant Disease DAG</strong></h2>
<h3 id="heading-step-by-step-from-mechanism-to-graph"><strong>Step-by-Step: From Mechanism to Graph</strong></h3>
<p>Let's build our plant disease causal model systematically.</p>
<h4 id="heading-step-1-identify-root-causes-exogenous-variables"><strong>Step 1: Identify Root Causes (Exogenous Variables)</strong></h4>
<p>What are the fundamental inputs we can observe or control?</p>
<p><strong>1. Environmental Stress</strong></p>
<ul>
<li><p>Temperature extremes</p>
</li>
<li><p>Humidity levels</p>
</li>
<li><p>Light availability</p>
</li>
<li><p>Composite measure of environmental conditions</p>
</li>
</ul>
<p><strong>2. Watering Practice</strong></p>
<ul>
<li><p>Frequency</p>
</li>
<li><p>Amount</p>
</li>
<li><p>Under/optimal/overwatered</p>
</li>
<li><p>Farmer-controlled variable</p>
</li>
</ul>
<p><strong>3. Plant Vigor</strong></p>
<ul>
<li><p>Overall plant health</p>
</li>
<li><p>Genetic factors</p>
</li>
<li><p>Age and maturity</p>
</li>
<li><p>Baseline resilience</p>
</li>
</ul>
<p>These are <strong>exogenous</strong> (external) variables—they're not caused by other variables in our model.</p>
<p><img src="https://cdn-images-1.medium.com/max/1600/1*pZq420unVGdjWatDSQcfHA.png" alt /></p>
<h4 id="heading-step-2-identify-the-causal-mechanism"><strong>Step 2: Identify the Causal Mechanism</strong></h4>
<p><strong>Question:</strong> How do root causes lead to disease?</p>
<p><strong>Domain knowledge from plant pathology:</strong></p>
<p><strong>1. High environmental stress + watering → Leaf Moisture</strong></p>
<ul>
<li><p>Hot weather increases evaporation</p>
</li>
<li><p>Watering increases surface water</p>
</li>
<li><p>Combined: creates conditions for pathogens</p>
</li>
</ul>
<p><strong>2. Leaf Moisture → Pathogen Growth</strong></p>
<ul>
<li><p>Fungi need moisture to germinate</p>
</li>
<li><p>Bacteria need water film to enter plant</p>
</li>
<li><p>Critical threshold: ~6-8 hours of leaf wetness</p>
</li>
</ul>
<p><strong>3. Pathogen Growth → Disease</strong></p>
<ul>
<li><p>Sufficient pathogen load → infection</p>
</li>
<li><p>Varies by plant immunity</p>
</li>
</ul>
<p><strong>4. Disease + Plant Vigor → Symptom Severity</strong></p>
<ul>
<li><p>Same disease manifests differently</p>
</li>
<li><p>Healthy plants show mild symptoms</p>
</li>
<li><p>Weak plants show severe symptoms</p>
</li>
</ul>
<p><strong>5. Symptom Severity → Observable Symptoms</strong></p>
<ul>
<li><p>What we actually see</p>
</li>
<li><p>Yellowing, spots, wilting, etc.</p>
</li>
</ul>
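<p>The mechanism above can be written down as structural equations. The sketch below is illustrative: the coefficients, watering offsets, and noise scales are assumptions chosen for the toy example, not fitted values, and the sigmoid centers pathogen growth near the 6-8 hour wetness threshold mentioned above:</p>

```python
import numpy as np

rng = np.random.default_rng(42)

def leaf_moisture(stress, watering):
    """Hours of leaf wetness from environmental stress and watering (0=under, 1=optimal, 2=over)."""
    offset = np.where(watering == 0, -3.0, np.where(watering == 2, 5.0, 0.0))
    base = 5.0 + 10.0 * stress + offset
    return np.clip(base + rng.normal(0, 0.5, np.shape(stress)), 0, 24)

def pathogen_growth(moisture):
    """Sigmoid response centered near the ~7 h wetness threshold."""
    return 1.0 / (1.0 + np.exp(-(moisture - 7.0)))

def disease(growth):
    """Infection occurs once pathogen load crosses a (noisy) threshold."""
    return (growth + rng.normal(0, 0.1, np.shape(growth)) > 0.6).astype(int)
```

<p>Each function reads off one arrow (or set of arrows into a node) from the DAG, which is exactly what makes interventions computable: overriding an input severs the incoming arrows for that variable.</p>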
<h4 id="heading-step-3-draw-the-arrows"><strong>Step 3: Draw the Arrows</strong></h4>
<p>Now we connect the dots:</p>
<p><img src="https://cdn-images-1.medium.com/max/1600/1*07XCjbhSY1VtRUF9zjCyNg.png" alt /></p>
<p><strong>Node Legend:</strong></p>
<ul>
<li><p><strong>Cyan</strong>: Exogenous (controllable or observable inputs)</p>
</li>
<li><p><strong>Purple</strong>: Intermediate mechanisms</p>
</li>
<li><p><strong>Pink</strong>: Latent (unobserved) variable</p>
</li>
</ul>
<h4 id="heading-step-4-validate-the-structure"><strong>Step 4: Validate the Structure</strong></h4>
<p>For each arrow, ask: "Does X <strong>directly cause</strong> Y?"</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Arrow</td><td>Justification</td><td>Validated?</td></tr>
</thead>
<tbody>
<tr>
<td>Environmental Stress → Leaf Moisture</td><td>Temperature/humidity affect surface water</td><td>✅</td></tr>
<tr>
<td>Watering → Leaf Moisture</td><td>Direct causal mechanism</td><td>✅</td></tr>
<tr>
<td>Leaf Moisture → Pathogen Growth</td><td>Pathogens need water</td><td>✅</td></tr>
<tr>
<td>Pathogen Growth → Disease</td><td>Sufficient load → infection</td><td>✅</td></tr>
<tr>
<td>Disease → Symptom Severity</td><td>Disease causes symptoms</td><td>✅</td></tr>
<tr>
<td>Plant Vigor → Symptom Severity</td><td>Vigor moderates expression</td><td>✅</td></tr>
<tr>
<td>Symptom Severity → Observable Symptoms</td><td>What we measure</td><td>✅</td></tr>
</tbody>
</table>
</div><p><strong>Missing arrows (intentionally):</strong></p>
<ul>
<li><p>Environmental Stress → Disease?</p>
<ul>
<li>NO direct arrow: stress affects disease ONLY through moisture</li>
</ul>
</li>
<li><p>Watering → Disease?</p>
<ul>
<li>NO direct arrow: watering affects disease ONLY through moisture</li>
</ul>
</li>
<li><p>Plant Vigor → Disease?</p>
<ul>
<li>NO direct arrow: vigor affects symptom severity, not disease presence</li>
</ul>
</li>
</ul>
<p>This is important! The absence of arrows encodes causal assumptions.</p>
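<p>Those missing arrows are testable: the DAG implies that watering is independent of downstream disease variables once leaf moisture is known. A self-contained sketch (synthetic data, assumed coefficients) that checks one such implication with a partial correlation:</p>

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# Simulate data consistent with the DAG: watering affects pathogen growth
# ONLY through leaf moisture (coefficients are illustrative assumptions)
watering = rng.choice([0.0, 1.0, 2.0], n)
stress = rng.beta(2, 5, n)
moisture = 3.0 * watering + 8.0 * stress + rng.normal(0, 0.5, n)
growth = 0.4 * moisture + rng.normal(0, 0.3, n)

def residuals(y, x):
    """Remove the best linear fit of x from y."""
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

r_marginal = np.corrcoef(watering, growth)[0, 1]                 # clearly nonzero
r_partial = np.corrcoef(residuals(watering, moisture),
                        residuals(growth, moisture))[0, 1]       # near zero, as the DAG implies
```

<p>If the partial correlation came out far from zero on real data, that would be evidence the missing Watering → Pathogen Growth arrow is wrong.</p>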
<h4 id="heading-step-5-name-the-causal-roles"><strong>Step 5: Name the Causal Roles</strong></h4>
<p>Understanding special node types:</p>
<p><strong>Confounders:</strong></p>
<ul>
<li><p>Variables that affect both treatment and outcome</p>
</li>
<li><p>Example: If Environmental Stress affected both Watering AND Disease directly</p>
</li>
<li><p>Our model: No confounders (by design for simplicity)</p>
</li>
</ul>
<p><strong>Mediators:</strong></p>
<ul>
<li><p>Variables on the causal path</p>
</li>
<li><p>Example: Leaf Moisture mediates Environmental Stress → Disease</p>
</li>
<li><p>Pathogen Growth mediates Leaf Moisture → Disease</p>
</li>
</ul>
<p><strong>Colliders:</strong></p>
<ul>
<li><p>Variables caused by multiple parents</p>
</li>
<li><p>Example: Symptom Severity is a collider (caused by Disease AND Plant Vigor)</p>
</li>
<li><p>Special property: conditioning on colliders can create spurious associations!</p>
</li>
</ul>
<p><strong>Effect Modifiers:</strong></p>
<ul>
<li><p>Variables that change the magnitude of effects</p>
</li>
<li><p>Example: Plant Vigor modifies how Disease translates to Symptoms</p>
</li>
<li><p>High vigor → mild symptoms even with disease</p>
</li>
</ul>
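<p>Collider bias is worth seeing once in numbers. In the sketch below (synthetic data, assumed effect sizes), Disease and Plant Vigor are generated independently, yet restricting attention to symptomatic plants, i.e. conditioning on the collider Symptom Severity, manufactures an association between them:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Independent causes (effect sizes are illustrative assumptions)
disease = rng.binomial(1, 0.3, n).astype(float)
vigor = rng.normal(0.7, 0.15, n)

# Symptom severity: the collider, caused by both parents
severity = 0.6 * disease - 0.4 * vigor + rng.normal(0, 0.1, n)

# Unconditionally, disease and vigor are (nearly) uncorrelated
r_all = np.corrcoef(disease, vigor)[0, 1]

# Condition on the collider: keep only visibly symptomatic plants
mask = severity > 0.0
r_cond = np.corrcoef(disease[mask], vigor[mask])[0, 1]
```

<p>Among symptomatic plants, the disease-free ones must have low vigor to show symptoms at all, so disease and vigor appear correlated in the conditioned sample. This is why blindly "controlling for" every variable can backfire.</p>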
<h3 id="heading-our-final-dag"><strong>Our Final DAG</strong></h3>
<p>8 nodes, 7 edges, complete causal story:</p>
<p><strong>Root Causes</strong> → <strong>Mechanisms</strong> → <strong>Observable Outcomes</strong></p>
<p>This is our working model. In Part 3, we'll use it for counterfactual reasoning. In Part 4, we'll design interventions.</p>
<p>But first, let's implement it in code.</p>
<hr />
<h2 id="heading-implementing-the-dag-in-python"><strong>Implementing the DAG in Python</strong></h2>
<h3 id="heading-coding-your-dag-with-dowhy"><strong>Coding Your DAG with DoWhy</strong></h3>
<p>Let's make this concrete with Python code.</p>
<p><strong>Install dependencies:</strong></p>
<pre><code class="lang-bash">pip install dowhy pandas numpy networkx matplotlib
</code></pre>
<p><strong>Define the DAG:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> dowhy <span class="hljs-keyword">import</span> CausalModel
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np

<span class="hljs-comment"># Define the causal graph</span>
causal_graph = <span class="hljs-string">"""
digraph {
    Environmental_Stress -&gt; Leaf_Moisture;
    Watering_Practice -&gt; Leaf_Moisture;
    Leaf_Moisture -&gt; Pathogen_Growth;
    Pathogen_Growth -&gt; Disease_Present;
    Disease_Present -&gt; Symptom_Severity;
    Plant_Vigor -&gt; Symptom_Severity;
    Symptom_Severity -&gt; Observable_Symptoms;
}
"""</span>

<span class="hljs-comment"># Create sample data (we'll use synthetic for now)</span>
np.random.seed(<span class="hljs-number">42</span>)
n_samples = <span class="hljs-number">1000</span>

data = pd.DataFrame({
    <span class="hljs-string">'environmental_stress'</span>: np.random.beta(<span class="hljs-number">2</span>, <span class="hljs-number">5</span>, n_samples),  <span class="hljs-comment"># 0-1 scale</span>
    <span class="hljs-string">'watering_practice'</span>: np.random.choice([<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>], n_samples),  <span class="hljs-comment"># 0=under, 1=optimal, 2=over</span>
    <span class="hljs-string">'plant_vigor'</span>: np.random.beta(<span class="hljs-number">8</span>, <span class="hljs-number">2</span>, n_samples),  <span class="hljs-comment"># Usually healthy</span>
    <span class="hljs-string">'leaf_moisture_hours'</span>: np.zeros(n_samples),  <span class="hljs-comment"># We'll compute</span>
    <span class="hljs-string">'pathogen_growth'</span>: np.zeros(n_samples),
    <span class="hljs-string">'disease_present'</span>: np.zeros(n_samples),
    <span class="hljs-string">'symptom_severity'</span>: np.zeros(n_samples),
})

<span class="hljs-comment"># Generate data according to causal structure</span>
<span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_samples):
    <span class="hljs-comment"># Leaf moisture depends on environmental stress and watering</span>
    base_moisture = <span class="hljs-number">5.0</span>  <span class="hljs-comment"># baseline</span>
    stress_effect = data.loc[i, <span class="hljs-string">'environmental_stress'</span>] * <span class="hljs-number">10</span>
    watering_effect = [<span class="hljs-number">-3</span>, <span class="hljs-number">0</span>, <span class="hljs-number">5</span>][data.loc[i, <span class="hljs-string">'watering_practice'</span>]]

    data.loc[i, <span class="hljs-string">'leaf_moisture_hours'</span>] = np.clip(
        base_moisture + stress_effect + watering_effect + np.random.normal(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>),
        <span class="hljs-number">0</span>, <span class="hljs-number">24</span>
    )

    <span class="hljs-comment"># Pathogen growth depends on leaf moisture</span>
    moisture = data.loc[i, <span class="hljs-string">'leaf_moisture_hours'</span>]
    data.loc[i, <span class="hljs-string">'pathogen_growth'</span>] = np.clip(
        (moisture / <span class="hljs-number">24</span>) ** <span class="hljs-number">1.5</span> + np.random.normal(<span class="hljs-number">0</span>, <span class="hljs-number">0.1</span>),
        <span class="hljs-number">0</span>, <span class="hljs-number">1</span>
    )

    <span class="hljs-comment"># Disease depends on pathogen growth</span>
    pathogen = data.loc[i, <span class="hljs-string">'pathogen_growth'</span>]
    data.loc[i, <span class="hljs-string">'disease_present'</span>] = <span class="hljs-number">1</span> <span class="hljs-keyword">if</span> pathogen &gt; <span class="hljs-number">0.6</span> <span class="hljs-keyword">else</span> <span class="hljs-number">0</span>

    <span class="hljs-comment"># Symptom severity depends on disease and plant vigor</span>
    disease = data.loc[i, <span class="hljs-string">'disease_present'</span>]
    vigor = data.loc[i, <span class="hljs-string">'plant_vigor'</span>]
    data.loc[i, <span class="hljs-string">'symptom_severity'</span>] = np.clip(
        disease * (<span class="hljs-number">1</span> - vigor * <span class="hljs-number">0.5</span>) + np.random.normal(<span class="hljs-number">0</span>, <span class="hljs-number">0.1</span>),
        <span class="hljs-number">0</span>, <span class="hljs-number">1</span>
    )

print(data.head())
print(<span class="hljs-string">f"\nDisease prevalence: <span class="hljs-subst">{data[<span class="hljs-string">'disease_present'</span>].mean():<span class="hljs-number">.2</span>%}</span>"</span>)
</code></pre>
<p><strong>Create the causal model:</strong></p>
<pre><code class="lang-python">model = CausalModel(
    data=data,
    treatment=<span class="hljs-string">'leaf_moisture_hours'</span>,
    outcome=<span class="hljs-string">'symptom_severity'</span>,
    graph=causal_graph,
    common_causes=[<span class="hljs-string">'environmental_stress'</span>, <span class="hljs-string">'watering_practice'</span>],
    effect_modifiers=[<span class="hljs-string">'plant_vigor'</span>]
)

<span class="hljs-comment"># Visualize the graph</span>
model.view_model()

<span class="hljs-comment"># Identify the causal effect</span>
identified_estimand = model.identify_effect(proceed_when_unidentifiable=<span class="hljs-literal">True</span>)
print(identified_estimand)
</code></pre>
<p><strong>What this code does:</strong></p>
<ol>
<li><p><strong>Defines causal structure</strong> (the DAG)</p>
</li>
<li><p><strong>Generates synthetic data</strong> following causal equations</p>
</li>
<li><p><strong>Creates DoWhy model</strong> linking data to structure</p>
</li>
<li><p><strong>Identifies causal effect</strong> of leaf moisture on symptoms</p>
</li>
</ol>
<p><strong>The structural equations:</strong></p>
<pre><code class="lang-python"><span class="hljs-comment"># These are the causal mechanisms encoded in the data generation:</span>

Leaf_Moisture = f(Environmental_Stress, Watering_Practice, noise)
Pathogen_Growth = g(Leaf_Moisture, noise)
Disease = h(Pathogen_Growth, noise)
Symptom_Severity = j(Disease, Plant_Vigor, noise)
</code></pre>
<p>Each function represents a causal mechanism. The DAG shows which variables go into which functions.</p>
<h3 id="heading-querying-the-model"><strong>Querying the Model</strong></h3>
<p>Now we can ask causal questions:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Estimate causal effect</span>
estimate = model.estimate_effect(
    identified_estimand,
    method_name=<span class="hljs-string">"backdoor.linear_regression"</span>
)

print(<span class="hljs-string">f"Causal effect of leaf moisture on symptom severity: <span class="hljs-subst">{estimate.value:<span class="hljs-number">.4</span>f}</span>"</span>)
ci = np.ravel(estimate.get_confidence_intervals())  <span class="hljs-comment"># flatten; the returned shape varies by estimator</span>
print(<span class="hljs-string">f"95% Confidence Interval: [<span class="hljs-subst">{ci[<span class="hljs-number">0</span>]:<span class="hljs-number">.4</span>f}</span>, <span class="hljs-subst">{ci[<span class="hljs-number">1</span>]:<span class="hljs-number">.4</span>f}</span>]"</span>)
</code></pre>
<p>This tells us: <strong>For every additional hour of leaf moisture, symptom severity increases by X.</strong></p>
<p>That's a causal claim, not correlation!</p>
<p><strong>Example output:</strong></p>
<pre><code class="lang-plaintext">Causal effect of leaf moisture on symptom severity: 0.0234
95% Confidence Interval: [0.0198, 0.0271]

Interpretation: Each additional hour of leaf moisture causes 
a 0.0234-point increase in symptom severity on the 0-1 scale 
(statistically significant).
</code></pre>
<p>We'll do much more with this in Part 3 (counterfactuals) and Part 4 (interventions).</p>
<hr />
<h2 id="heading-common-dag-patterns-amp-pitfalls"><strong>Common DAG Patterns &amp; Pitfalls</strong></h2>
<h3 id="heading-causal-patterns-you-need-to-know"><strong>Causal Patterns You Need to Know</strong></h3>
<p>Understanding these patterns will help you build better DAGs.</p>
<h4 id="heading-pattern-1-confounding"><strong>Pattern 1: Confounding</strong></h4>
<p><img src="https://cdn-images-1.medium.com/max/1600/1*jyMZQbFxoP34ClUcBtjBVg.png" alt /></p>
<p><strong>Problem:</strong> Confounder causes both treatment and outcome, creating spurious association.</p>
<p><strong>Example:</strong></p>
<ul>
<li><p>Season (C) → Watering frequency (T)</p>
</li>
<li><p>Season (C) → Disease prevalence (O)</p>
</li>
</ul>
<p>You observe: More watering correlates with more disease.<br />Reality: It's because summer has both more watering AND more disease.</p>
<p><strong>Solution:</strong> Control for confounders in analysis (we'll cover this in Part 4).</p>
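<p>A quick simulation makes the trap concrete. Below, season drives both watering and disease while watering has <em>no</em> effect on disease at all; the naive correlation is clearly positive anyway, and it vanishes once you hold season fixed. (The base rates and watering frequencies are made up for illustration.)</p>

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

summer = rng.random(n) < 0.5
# Season -> Watering and Season -> Disease; there is NO Watering -> Disease arrow
watering = rng.normal(loc=np.where(summer, 5.0, 2.0))   # waterings per week
disease = rng.random(n) < np.where(summer, 0.30, 0.10)  # disease probability by season

naive = np.corrcoef(watering, disease)[0, 1]
within_summer = np.corrcoef(watering[summer], disease[summer])[0, 1]

print(f"Naive correlation:  {naive:+.3f}")          # spuriously positive
print(f"Within summer only: {within_summer:+.3f}")  # roughly zero
```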
<h4 id="heading-pattern-2-mediation"><strong>Pattern 2: Mediation</strong></h4>
<p><img src="https://cdn-images-1.medium.com/max/1600/1*sZm4hOJIKH6Tb8LCpDGoEQ.png" alt /></p>
<p><strong>Definition:</strong> Mediator sits on the causal path between treatment and outcome.</p>
<p><strong>Example in our DAG:</strong></p>
<ul>
<li>Watering → Leaf Moisture → Pathogen Growth → Disease</li>
</ul>
<p>Leaf Moisture and Pathogen Growth are mediators.</p>
<p><strong>Why it matters:</strong></p>
<ul>
<li><p><strong>Total effect:</strong> Watering → Disease (full causal path)</p>
</li>
<li><p><strong>Direct effect:</strong> None (Watering doesn't cause Disease except through mediators)</p>
</li>
<li><p><strong>Mediated effect:</strong> Watering → Moisture → Pathogen → Disease</p>
</li>
</ul>
<p><strong>Intervention implications:</strong></p>
<ul>
<li><p>You can intervene at any point in the chain</p>
</li>
<li><p>Earlier intervention (reduce watering) prevents the entire cascade</p>
</li>
<li><p>Later intervention (fungicide for pathogen) only stops downstream effects</p>
</li>
</ul>
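<p>To put numbers on the "intervene early" point, we can rerun the toy mechanisms from the synthetic data-generation code above while forcing watering via <code>do(watering)</code>. This is a sketch against our made-up equations, not real agronomy; <code>disease_rate</code> is a helper written for this demo.</p>

```python
import numpy as np

def disease_rate(watering, n=50_000, seed=0):
    """Disease prevalence under do(watering), using the article's toy mechanisms."""
    rng = np.random.default_rng(seed)
    stress = rng.beta(2, 5, n)
    watering_effect = {0: -3, 1: 0, 2: 5}[watering]  # under / optimal / over
    moisture = np.clip(5.0 + stress * 10 + watering_effect + rng.normal(0, 1, n), 0, 24)
    pathogen = np.clip((moisture / 24) ** 1.5 + rng.normal(0, 0.1, n), 0, 1)
    return (pathogen > 0.6).mean()

for w, name in [(0, "under"), (1, "optimal"), (2, "over")]:
    print(f"do(watering={name}): disease rate = {disease_rate(w):.1%}")
```

Forcing under-watering dries the leaves, which starves the pathogen, which prevents disease: the entire downstream cascade responds to the upstream lever.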
<h4 id="heading-pattern-3-collider-bias-the-trap"><strong>Pattern 3: Collider Bias (The Trap)</strong></h4>
<p><img src="https://cdn-images-1.medium.com/max/1600/1*My9KmnFlIqv_fWuCyBVxig.png" alt /></p>
<p><strong>Critical property:</strong> A and B are independent, but become correlated if you condition on C!</p>
<p><strong>Example:</strong></p>
<ul>
<li><p>Disease (A) → Symptom Severity (C)</p>
</li>
<li><p>Plant Vigor (B) → Symptom Severity (C)</p>
</li>
</ul>
<p>Symptom Severity is a collider.</p>
<p><strong>The trap:</strong> If you only analyze plants with severe symptoms (conditioning on collider), you'll find:</p>
<ul>
<li><p>Plants with low vigor tend to have disease</p>
</li>
<li><p>Plants with high vigor tend NOT to have disease</p>
</li>
</ul>
<p><strong>But this is spurious!</strong> You selected on the outcome.</p>
<p>In reality:</p>
<ul>
<li><p>Disease and Plant Vigor are independent (no arrow between them)</p>
</li>
<li><p>They only appear related when you filter by severe symptoms</p>
</li>
</ul>
<p><strong>Real-world example:</strong></p>
<p>Imagine you're studying what makes successful entrepreneurs. You only survey people who became billionaires (conditioning on outcome).</p>
<p>You find: High risk-taking OR exceptional luck leads to billions.</p>
<p>Among billionaires:</p>
<ul>
<li><p>High risk-takers had average luck</p>
</li>
<li><p>Low risk-takers had exceptional luck</p>
</li>
</ul>
<p><strong>Spurious correlation!</strong> Risk-taking and luck appear negatively correlated, but only because you conditioned on success (the collider).</p>
<p><strong>How to avoid:</strong> Don't condition on colliders unless you have a good reason.</p>
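<p>You can watch this happen in a few lines of simulation. Risk-taking and luck are generated completely independently; conditioning on the collider (becoming a billionaire) manufactures a strong negative correlation out of nothing. The threshold of 2.0 is arbitrary:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

risk_taking = rng.normal(size=n)   # independent by construction
luck = rng.normal(size=n)          # independent by construction
billionaire = (risk_taking + luck) > 2.0  # collider: needs some of both

corr_all = np.corrcoef(risk_taking, luck)[0, 1]
corr_rich = np.corrcoef(risk_taking[billionaire], luck[billionaire])[0, 1]

print(f"Full population:   {corr_all:+.3f}")   # near zero
print(f"Billionaires only: {corr_rich:+.3f}")  # strongly negative
```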
<h4 id="heading-pattern-4-selection-bias"><strong>Pattern 4: Selection Bias</strong></h4>
<p>Similar to collider bias, but about sample selection:</p>
<p><img src="https://cdn-images-1.medium.com/max/1600/1*imE2N5g4heBo7et3uSvTig.png" alt /></p>
<p><strong>Example:</strong> You train your model only on:</p>
<ul>
<li><p>Plants brought to clinic (Selection)</p>
</li>
<li><p>Which happens when: Disease is visible OR plant is expensive</p>
</li>
</ul>
<p>Now disease and plant value appear correlated—but only because of selection!</p>
<p><strong>Solution:</strong> Be aware of how your sample was selected.</p>
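<p>The same machinery demonstrates the clinic example. Disease and plant value are independent in the full population, but restricting to plants that reach the clinic (visibly diseased OR expensive) induces a correlation between them. The 20%/30% base rates below are arbitrary:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

diseased = rng.random(n) < 0.20    # independent of value
expensive = rng.random(n) < 0.30   # independent of disease
at_clinic = diseased | expensive   # selection: visible disease OR a costly plant

corr_population = np.corrcoef(diseased, expensive)[0, 1]
corr_clinic = np.corrcoef(diseased[at_clinic], expensive[at_clinic])[0, 1]

print(f"Full population: {corr_population:+.3f}")  # near zero
print(f"Clinic sample:   {corr_clinic:+.3f}")      # spuriously negative
```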
<h3 id="heading-dag-validation-checklist"><strong>DAG Validation Checklist</strong></h3>
<p>Before trusting your DAG:</p>
<ul>
<li><p>[ ] Every arrow represents a direct causal effect</p>
</li>
<li><p>[ ] Missing arrows represent true independence</p>
</li>
<li><p>[ ] No feedback loops (check: acyclic?)</p>
</li>
<li><p>[ ] Domain experts reviewed the structure</p>
</li>
<li><p>[ ] Edge cases considered (extreme values)</p>
</li>
<li><p>[ ] Alternative DAGs ruled out</p>
</li>
<li><p>[ ] Testable implications identified</p>
</li>
<li><p>[ ] Data validates independence claims</p>
</li>
</ul>
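<p>The acyclicity item is the easiest to automate. A short check with <code>networkx</code>, using our DAG's edge list: a topological order exists only for a true DAG, and it doubles as a causal ordering you can sanity-check against temporal order.</p>

```python
import networkx as nx

edges = [
    ("Environmental_Stress", "Leaf_Moisture"),
    ("Watering_Practice", "Leaf_Moisture"),
    ("Leaf_Moisture", "Pathogen_Growth"),
    ("Pathogen_Growth", "Disease_Present"),
    ("Disease_Present", "Symptom_Severity"),
    ("Plant_Vigor", "Symptom_Severity"),
]
g = nx.DiGraph(edges)

is_dag = nx.is_directed_acyclic_graph(g)
print("Acyclic:", is_dag)
if is_dag:
    # Causes appear before their effects in this ordering
    print("Causal ordering:", list(nx.topological_sort(g)))
```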
<hr />
<h2 id="heading-testing-your-dag"><strong>Testing Your DAG</strong></h2>
<h3 id="heading-how-to-know-if-your-dag-is-right"><strong>How to Know If Your DAG Is Right</strong></h3>
<p>A DAG makes testable predictions about independence relationships.</p>
<h4 id="heading-d-separation-reading-independence-from-structure"><strong>D-Separation: Reading Independence from Structure</strong></h4>
<p>Two variables are <strong>d-separated</strong> if all paths between them are blocked.</p>
<p><strong>Example from our DAG:</strong></p>
<p>Question: Are Environmental Stress and Plant Vigor independent?</p>
<p>Answer: <strong>Yes, they're independent.</strong> The only path between them runs through Symptom Severity, a collider, so the path is blocked.</p>
<p><strong>Testable prediction:</strong> In our data, Environmental Stress and Plant Vigor should be uncorrelated.</p>
<p>If we find correlation, our DAG is wrong!</p>
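<p>You can also check the structural claim before touching data: two variables are marginally d-connected only if one is an ancestor of the other or they share a common ancestor, since paths through a collider (like the one via Symptom Severity) are blocked. A sketch using <code>networkx</code>, with a helper written for this demo:</p>

```python
import networkx as nx

edges = [
    ("Environmental_Stress", "Leaf_Moisture"),
    ("Watering_Practice", "Leaf_Moisture"),
    ("Leaf_Moisture", "Pathogen_Growth"),
    ("Pathogen_Growth", "Disease_Present"),
    ("Disease_Present", "Symptom_Severity"),
    ("Plant_Vigor", "Symptom_Severity"),
]
g = nx.DiGraph(edges)

def marginally_d_connected(g, x, y):
    """Marginal dependence requires an ancestral link: one node is an
    ancestor of the other, or they share a common ancestor."""
    anc_x = nx.ancestors(g, x) | {x}
    anc_y = nx.ancestors(g, y) | {y}
    return not anc_x.isdisjoint(anc_y)

print(marginally_d_connected(g, "Environmental_Stress", "Plant_Vigor"))      # False
print(marginally_d_connected(g, "Environmental_Stress", "Disease_Present"))  # True
```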
<pre><code class="lang-python"><span class="hljs-comment"># Test independence</span>
<span class="hljs-keyword">from</span> scipy.stats <span class="hljs-keyword">import</span> pearsonr

correlation, p_value = pearsonr(
    data[<span class="hljs-string">'environmental_stress'</span>], 
    data[<span class="hljs-string">'plant_vigor'</span>]
)

print(<span class="hljs-string">f"Correlation: <span class="hljs-subst">{correlation:<span class="hljs-number">.4</span>f}</span>"</span>)
print(<span class="hljs-string">f"P-value: <span class="hljs-subst">{p_value:<span class="hljs-number">.4</span>f}</span>"</span>)

<span class="hljs-keyword">if</span> p_value &gt; <span class="hljs-number">0.05</span>:
    print(<span class="hljs-string">"✓ Independent (DAG validated)"</span>)
<span class="hljs-keyword">else</span>:
    print(<span class="hljs-string">"✗ Dependent (DAG may be wrong)"</span>)
</code></pre>
<h4 id="heading-conditional-independence-tests"><strong>Conditional Independence Tests</strong></h4>
<p>A subtler case: two variables may be dependent marginally, yet independent once you condition on a third.</p>
<p><strong>Example:</strong></p>
<p>Are Watering Practice and Disease independent given Leaf Moisture?</p>
<p><strong>Claim:</strong> Watering affects Disease ONLY through Moisture.</p>
<p><strong>Test:</strong> Given Moisture, Watering and Disease should be independent.</p>
<p>In notation: <code>Watering ⊥ Disease | Moisture</code></p>
<p><strong>Python test:</strong></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> scipy.stats <span class="hljs-keyword">import</span> chi2_contingency

<span class="hljs-comment"># Group by leaf moisture levels</span>
data[<span class="hljs-string">'moisture_level'</span>] = pd.cut(
    data[<span class="hljs-string">'leaf_moisture_hours'</span>], 
    bins=<span class="hljs-number">3</span>, 
    labels=[<span class="hljs-string">'low'</span>, <span class="hljs-string">'med'</span>, <span class="hljs-string">'high'</span>]
)

<span class="hljs-comment"># Within each moisture level, test independence</span>
print(<span class="hljs-string">"Testing: Watering ⊥ Disease | Moisture\n"</span>)

<span class="hljs-keyword">for</span> level <span class="hljs-keyword">in</span> [<span class="hljs-string">'low'</span>, <span class="hljs-string">'med'</span>, <span class="hljs-string">'high'</span>]:
    subset = data[data[<span class="hljs-string">'moisture_level'</span>] == level]

    <span class="hljs-comment"># Contingency table: watering vs disease</span>
    table = pd.crosstab(
        subset[<span class="hljs-string">'watering_practice'</span>], 
        subset[<span class="hljs-string">'disease_present'</span>]
    )

    <span class="hljs-comment"># Chi-square test</span>
    chi2, p_value, dof, expected = chi2_contingency(table)

    print(<span class="hljs-string">f"Moisture <span class="hljs-subst">{level}</span>: p-value = <span class="hljs-subst">{p_value:<span class="hljs-number">.4</span>f}</span>"</span>)
    <span class="hljs-keyword">if</span> p_value &gt; <span class="hljs-number">0.05</span>:
        print(<span class="hljs-string">"  ✓ Independent (DAG validated)"</span>)
    <span class="hljs-keyword">else</span>:
        print(<span class="hljs-string">"  ✗ Dependent (DAG may be wrong)"</span>)
    print()
</code></pre>
<p><strong>Expected output:</strong></p>
<pre><code class="lang-plaintext">Testing: Watering ⊥ Disease | Moisture

Moisture low: p-value = 0.3421
  ✓ Independent (DAG validated)

Moisture med: p-value = 0.5634
  ✓ Independent (DAG validated)

Moisture high: p-value = 0.4523
  ✓ Independent (DAG validated)
</code></pre>
<p>If independence holds, our DAG structure is supported by data.</p>
<h4 id="heading-what-if-tests-fail"><strong>What If Tests Fail?</strong></h4>
<p>If your DAG fails independence tests:</p>
<p><strong>1. Missing arrow:</strong> Add direct causal link</p>
<ul>
<li>Example: Watering → Disease (direct effect we missed)</li>
</ul>
<p><strong>2. Wrong direction:</strong> Reverse an arrow</p>
<ul>
<li>Example: Maybe Disease → Moisture (sick plants retain water?)</li>
</ul>
<p><strong>3. Missing confounder:</strong> Add common cause</p>
<ul>
<li>Example: Season → both Watering AND Disease</li>
</ul>
<p><strong>4. Wrong assumptions:</strong> Reconsider causal mechanism</p>
<ul>
<li>Example: Different disease types have different causal paths</li>
</ul>
<p><strong>Iterate:</strong> Build DAG → Test → Revise → Repeat</p>
<p>This is the scientific method applied to causal structure!</p>
<h3 id="heading-advanced-validation-falsification-tests"><strong>Advanced Validation: Falsification Tests</strong></h3>
<pre><code class="lang-python"><span class="hljs-comment"># DoWhy includes built-in refutation tests</span>
<span class="hljs-keyword">from</span> dowhy <span class="hljs-keyword">import</span> CausalModel

<span class="hljs-comment"># Refute by adding random common cause</span>
refutation = model.refute_estimate(
    identified_estimand,
    estimate,
    method_name=<span class="hljs-string">"random_common_cause"</span>
)
print(refutation)

<span class="hljs-comment"># Expected: Effect should remain stable</span>
<span class="hljs-comment"># If effect changes dramatically, DAG may be missing confounders</span>
</code></pre>
<hr />
<h2 id="heading-practical-tips-for-dag-construction"><strong>Practical Tips for DAG Construction</strong></h2>
<h3 id="heading-start-simple-iterate"><strong>Start Simple, Iterate</strong></h3>
<p><strong>Don't try to model everything at once:</strong></p>
<ol>
<li><p><strong>Start with 3-5 key variables</strong></p>
<ul>
<li><p>Treatment of interest</p>
</li>
<li><p>Outcome of interest</p>
</li>
<li><p>1-3 confounders</p>
</li>
</ul>
</li>
<li><p><strong>Add complexity gradually</strong></p>
<ul>
<li><p>Mediators</p>
</li>
<li><p>Effect modifiers</p>
</li>
<li><p>Additional confounders</p>
</li>
</ul>
</li>
<li><p><strong>Test at each step</strong></p>
<ul>
<li><p>Validate new arrows</p>
</li>
<li><p>Check independence claims</p>
</li>
<li><p>Ensure model still makes sense</p>
</li>
</ul>
</li>
</ol>
<h3 id="heading-use-domain-expertise"><strong>Use Domain Expertise</strong></h3>
<p><strong>Best practices:</strong></p>
<ul>
<li><p><strong>Interview domain experts:</strong> "What causes X?" "Does Y affect Z directly?"</p>
</li>
<li><p><strong>Review literature:</strong> What causal mechanisms are established?</p>
</li>
<li><p><strong>Start with consensus:</strong> Build on well-known relationships</p>
</li>
<li><p><strong>Document assumptions:</strong> Write down why each arrow exists</p>
</li>
<li><p><strong>Invite criticism:</strong> Ask skeptics "What am I missing?"</p>
</li>
</ul>
<h3 id="heading-common-mistakes-to-avoid"><strong>Common Mistakes to Avoid</strong></h3>
<p><strong>1. Arrows everywhere</strong></p>
<ul>
<li><p>Don't connect everything</p>
</li>
<li><p>Missing arrows are meaningful (independence claims)</p>
</li>
</ul>
<p><strong>2. Correlation → Arrow</strong></p>
<ul>
<li><p>Just because X and Y correlate doesn't mean X → Y</p>
</li>
<li><p>Check for confounders first</p>
</li>
</ul>
<p><strong>3. Forgetting time</strong></p>
<ul>
<li><p>Causes must precede effects</p>
</li>
<li><p>Check temporal ordering</p>
</li>
</ul>
<p><strong>4. Ignoring mechanisms</strong></p>
<ul>
<li><p>Ask "HOW does X cause Y?"</p>
</li>
<li><p>If you can't explain it, maybe it's not causal</p>
</li>
</ul>
<p><strong>5. No validation</strong></p>
<ul>
<li><p>Always test your DAG</p>
</li>
<li><p>Data should support structure</p>
</li>
</ul>
<hr />
<h2 id="heading-complete-working-example"><strong>Complete Working Example</strong></h2>
<p>Here's a complete, runnable script you can use as a template:</p>
<pre><code class="lang-python"><span class="hljs-string">"""
Complete Causal DAG Implementation
Plant Disease Detection Example
"""</span>

<span class="hljs-keyword">from</span> dowhy <span class="hljs-keyword">import</span> CausalModel
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">from</span> scipy.stats <span class="hljs-keyword">import</span> chi2_contingency, pearsonr

<span class="hljs-comment"># Set random seed for reproducibility</span>
np.random.seed(<span class="hljs-number">42</span>)

<span class="hljs-comment"># Define causal graph</span>
causal_graph = <span class="hljs-string">"""
digraph {
    Environmental_Stress [label="Environmental Stress"];
    Watering_Practice [label="Watering Practice"];
    Plant_Vigor [label="Plant Vigor"];
    Leaf_Moisture [label="Leaf Moisture"];
    Pathogen_Growth [label="Pathogen Growth"];
    Disease_Present [label="Disease Present"];
    Symptom_Severity [label="Symptom Severity"];

    Environmental_Stress -&gt; Leaf_Moisture;
    Watering_Practice -&gt; Leaf_Moisture;
    Leaf_Moisture -&gt; Pathogen_Growth;
    Pathogen_Growth -&gt; Disease_Present;
    Disease_Present -&gt; Symptom_Severity;
    Plant_Vigor -&gt; Symptom_Severity;
}
"""</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_causal_data</span>(<span class="hljs-params">n_samples=<span class="hljs-number">1000</span></span>):</span>
    <span class="hljs-string">"""Generate data following the causal DAG structure."""</span>

    data = pd.DataFrame({
        <span class="hljs-string">'environmental_stress'</span>: np.random.beta(<span class="hljs-number">2</span>, <span class="hljs-number">5</span>, n_samples),
        <span class="hljs-string">'watering_practice'</span>: np.random.choice([<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, <span class="hljs-number">2</span>], n_samples),
        <span class="hljs-string">'plant_vigor'</span>: np.random.beta(<span class="hljs-number">8</span>, <span class="hljs-number">2</span>, n_samples),
        <span class="hljs-string">'leaf_moisture_hours'</span>: np.zeros(n_samples),
        <span class="hljs-string">'pathogen_growth'</span>: np.zeros(n_samples),
        <span class="hljs-string">'disease_present'</span>: np.zeros(n_samples),
        <span class="hljs-string">'symptom_severity'</span>: np.zeros(n_samples),
    })

    <span class="hljs-keyword">for</span> i <span class="hljs-keyword">in</span> range(n_samples):
        <span class="hljs-comment"># Causal mechanism 1: Environmental Stress + Watering → Leaf Moisture</span>
        base_moisture = <span class="hljs-number">5.0</span>
        stress_effect = data.loc[i, <span class="hljs-string">'environmental_stress'</span>] * <span class="hljs-number">10</span>
        watering_effect = [<span class="hljs-number">-3</span>, <span class="hljs-number">0</span>, <span class="hljs-number">5</span>][data.loc[i, <span class="hljs-string">'watering_practice'</span>]]

        data.loc[i, <span class="hljs-string">'leaf_moisture_hours'</span>] = np.clip(
            base_moisture + stress_effect + watering_effect + np.random.normal(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>),
            <span class="hljs-number">0</span>, <span class="hljs-number">24</span>
        )

        <span class="hljs-comment"># Causal mechanism 2: Leaf Moisture → Pathogen Growth</span>
        moisture = data.loc[i, <span class="hljs-string">'leaf_moisture_hours'</span>]
        data.loc[i, <span class="hljs-string">'pathogen_growth'</span>] = np.clip(
            (moisture / <span class="hljs-number">24</span>) ** <span class="hljs-number">1.5</span> + np.random.normal(<span class="hljs-number">0</span>, <span class="hljs-number">0.1</span>),
            <span class="hljs-number">0</span>, <span class="hljs-number">1</span>
        )

        <span class="hljs-comment"># Causal mechanism 3: Pathogen Growth → Disease</span>
        pathogen = data.loc[i, <span class="hljs-string">'pathogen_growth'</span>]
        data.loc[i, <span class="hljs-string">'disease_present'</span>] = <span class="hljs-number">1</span> <span class="hljs-keyword">if</span> pathogen &gt; <span class="hljs-number">0.6</span> <span class="hljs-keyword">else</span> <span class="hljs-number">0</span>

        <span class="hljs-comment"># Causal mechanism 4: Disease + Plant Vigor → Symptom Severity</span>
        disease = data.loc[i, <span class="hljs-string">'disease_present'</span>]
        vigor = data.loc[i, <span class="hljs-string">'plant_vigor'</span>]
        data.loc[i, <span class="hljs-string">'symptom_severity'</span>] = np.clip(
            disease * (<span class="hljs-number">1</span> - vigor * <span class="hljs-number">0.5</span>) + np.random.normal(<span class="hljs-number">0</span>, <span class="hljs-number">0.1</span>),
            <span class="hljs-number">0</span>, <span class="hljs-number">1</span>
        )

    <span class="hljs-keyword">return</span> data

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">validate_dag</span>(<span class="hljs-params">data</span>):</span>
    <span class="hljs-string">"""Run validation tests on the DAG structure."""</span>

    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"DAG VALIDATION TESTS"</span>)
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)

    <span class="hljs-comment"># Test 1: Environmental Stress ⊥ Plant Vigor</span>
    print(<span class="hljs-string">"\n1. Testing: Environmental_Stress ⊥ Plant_Vigor"</span>)
    corr, p_val = pearsonr(data[<span class="hljs-string">'environmental_stress'</span>], data[<span class="hljs-string">'plant_vigor'</span>])
    print(<span class="hljs-string">f"   Correlation: <span class="hljs-subst">{corr:<span class="hljs-number">.4</span>f}</span>, P-value: <span class="hljs-subst">{p_val:<span class="hljs-number">.4</span>f}</span>"</span>)
    <span class="hljs-keyword">if</span> p_val &gt; <span class="hljs-number">0.05</span>:
        print(<span class="hljs-string">"   ✓ Independent (as expected)"</span>)
    <span class="hljs-keyword">else</span>:
        print(<span class="hljs-string">"   ✗ Dependent (DAG may be wrong!)"</span>)

    <span class="hljs-comment"># Test 2: Watering ⊥ Disease | Leaf Moisture</span>
    print(<span class="hljs-string">"\n2. Testing: Watering ⊥ Disease | Leaf_Moisture"</span>)
    data[<span class="hljs-string">'moisture_level'</span>] = pd.cut(
        data[<span class="hljs-string">'leaf_moisture_hours'</span>], 
        bins=<span class="hljs-number">3</span>, 
        labels=[<span class="hljs-string">'low'</span>, <span class="hljs-string">'med'</span>, <span class="hljs-string">'high'</span>]
    )

    independence_holds = <span class="hljs-literal">True</span>
    <span class="hljs-keyword">for</span> level <span class="hljs-keyword">in</span> [<span class="hljs-string">'low'</span>, <span class="hljs-string">'med'</span>, <span class="hljs-string">'high'</span>]:
        subset = data[data[<span class="hljs-string">'moisture_level'</span>] == level]
        <span class="hljs-keyword">if</span> len(subset) &lt; <span class="hljs-number">10</span>:
            <span class="hljs-keyword">continue</span>

        table = pd.crosstab(subset[<span class="hljs-string">'watering_practice'</span>], subset[<span class="hljs-string">'disease_present'</span>])
        chi2, p_val, dof, expected = chi2_contingency(table)

        print(<span class="hljs-string">f"   Moisture <span class="hljs-subst">{level}</span>: p-value = <span class="hljs-subst">{p_val:<span class="hljs-number">.4</span>f}</span>"</span>, end=<span class="hljs-string">""</span>)
        <span class="hljs-keyword">if</span> p_val &gt; <span class="hljs-number">0.05</span>:
            print(<span class="hljs-string">" ✓"</span>)
        <span class="hljs-keyword">else</span>:
            print(<span class="hljs-string">" ✗"</span>)
            independence_holds = <span class="hljs-literal">False</span>

    <span class="hljs-keyword">if</span> independence_holds:
        print(<span class="hljs-string">"   ✓ Conditional independence holds"</span>)
    <span class="hljs-keyword">else</span>:
        print(<span class="hljs-string">"   ✗ Conditional independence violated"</span>)

    print(<span class="hljs-string">"\n"</span> + <span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">estimate_causal_effect</span>(<span class="hljs-params">data, causal_graph</span>):</span>
    <span class="hljs-string">"""Estimate causal effect using DoWhy."""</span>

    print(<span class="hljs-string">"\n"</span> + <span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"CAUSAL EFFECT ESTIMATION"</span>)
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)

    <span class="hljs-comment"># Create causal model</span>
    model = CausalModel(
        data=data,
        treatment=<span class="hljs-string">'leaf_moisture_hours'</span>,
        outcome=<span class="hljs-string">'symptom_severity'</span>,
        graph=causal_graph,
        common_causes=[<span class="hljs-string">'environmental_stress'</span>, <span class="hljs-string">'watering_practice'</span>],
        effect_modifiers=[<span class="hljs-string">'plant_vigor'</span>]
    )

    <span class="hljs-comment"># Identify causal effect</span>
    identified_estimand = model.identify_effect(proceed_when_unidentifiable=<span class="hljs-literal">True</span>)
    print(<span class="hljs-string">"\nIdentified Estimand:"</span>)
    print(identified_estimand)

    <span class="hljs-comment"># Estimate effect</span>
    estimate = model.estimate_effect(
        identified_estimand,
        method_name=<span class="hljs-string">"backdoor.linear_regression"</span>
    )

    print(<span class="hljs-string">f"\nCausal Effect: <span class="hljs-subst">{estimate.value:<span class="hljs-number">.4</span>f}</span>"</span>)
    print(<span class="hljs-string">f"Interpretation: Each additional hour of leaf moisture"</span>)
    print(<span class="hljs-string">f"causes a <span class="hljs-subst">{estimate.value:<span class="hljs-number">.4</span>f}</span> increase in symptom severity"</span>)

    <span class="hljs-comment"># Refutation test</span>
    print(<span class="hljs-string">"\nRefutation Test (Random Common Cause):"</span>)
    refutation = model.refute_estimate(
        identified_estimand,
        estimate,
        method_name=<span class="hljs-string">"random_common_cause"</span>
    )
    print(refutation)

    <span class="hljs-keyword">return</span> model, estimate

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    <span class="hljs-string">"""Run complete DAG analysis."""</span>

    print(<span class="hljs-string">"\n"</span> + <span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"BUILDING CAUSAL DAG: PLANT DISEASE DETECTION"</span>)
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)

    <span class="hljs-comment"># Generate data</span>
    print(<span class="hljs-string">"\nGenerating synthetic data (n=1000)..."</span>)
    data = generate_causal_data(n_samples=<span class="hljs-number">1000</span>)

    print(<span class="hljs-string">f"\nData Summary:"</span>)
    print(<span class="hljs-string">f"Disease prevalence: <span class="hljs-subst">{data[<span class="hljs-string">'disease_present'</span>].mean():<span class="hljs-number">.2</span>%}</span>"</span>)
    print(<span class="hljs-string">f"Mean symptom severity: <span class="hljs-subst">{data[<span class="hljs-string">'symptom_severity'</span>].mean():<span class="hljs-number">.3</span>f}</span>"</span>)
    print(<span class="hljs-string">f"Mean leaf moisture: <span class="hljs-subst">{data[<span class="hljs-string">'leaf_moisture_hours'</span>].mean():<span class="hljs-number">.2</span>f}</span> hours"</span>)

    <span class="hljs-comment"># Validate DAG</span>
    validate_dag(data)

    <span class="hljs-comment"># Estimate causal effect</span>
    model, estimate = estimate_causal_effect(data, causal_graph)

    print(<span class="hljs-string">"\n"</span> + <span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"ANALYSIS COMPLETE"</span>)
    print(<span class="hljs-string">"="</span> * <span class="hljs-number">60</span>)
    print(<span class="hljs-string">"\nNext Steps:"</span>)
    print(<span class="hljs-string">"1. Part 3: Use this DAG for counterfactual reasoning"</span>)
    print(<span class="hljs-string">"2. Part 4: Design interventions based on causal effects"</span>)
    print(<span class="hljs-string">"3. Part 5: Scale to production systems"</span>)

    <span class="hljs-keyword">return</span> data, model, estimate

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    data, model, estimate = main()
</code></pre>
<p><strong>Save this as</strong> <code>causal_dag.py</code> and run:</p>
<pre><code class="lang-bash">python causal_dag.py
</code></pre>
<hr />
<h2 id="heading-youve-built-a-causal-model"><strong>You've Built a Causal Model</strong></h2>
<p>Congratulations! You now have:</p>
<p>✅ A complete causal DAG for plant disease<br />✅ Understanding of confounders, mediators, colliders<br />✅ Working Python implementation with DoWhy<br />✅ Methods to validate your causal structure<br />✅ Template code you can adapt to any domain</p>
<p><strong>This is the foundation.</strong> Everything we do next builds on this DAG.</p>
<hr />
<h2 id="heading-whats-next-counterfactual-reasoning"><strong>What's Next: Counterfactual Reasoning</strong></h2>
<p>In <strong>Part 3</strong> (Friday, Jan 16), we'll use this DAG to answer questions like:</p>
<ul>
<li><p>"This plant has disease. <strong>Would it be healthy if I had watered less?</strong>"</p>
</li>
<li><p>"I applied intervention X. <strong>What would have happened without it?</strong>"</p>
</li>
<li><p>"<strong>Why</strong> did this specific plant get diseased when that one didn't?"</p>
</li>
</ul>
<p>These are <strong>counterfactual</strong> questions—the most powerful form of causal reasoning.</p>
<p>We'll implement:</p>
<ul>
<li><p>Counterfactual inference algorithms</p>
</li>
<li><p>"What if" scenario analysis</p>
</li>
<li><p>Personalized explanation generation</p>
</li>
<li><p>Individual treatment effect estimation</p>
</li>
</ul>
<hr />
<h2 id="heading-your-homework-before-part-3"><strong>Your Homework Before Part 3</strong></h2>
<p><strong>1. Run the code</strong> in this article</p>
<ul>
<li><p>Generate the data</p>
</li>
<li><p>Build the DAG</p>
</li>
<li><p>Validate the structure</p>
</li>
<li><p>Estimate causal effects</p>
</li>
</ul>
<p><strong>2. Modify the DAG</strong></p>
<ul>
<li><p>Add a new variable (e.g., "Soil Quality")</p>
</li>
<li><p>Add corresponding arrows</p>
</li>
<li><p>Update the data generation</p>
</li>
<li><p>Test if it still validates</p>
</li>
</ul>
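<p>If you're unsure where to start, recall that DoWhy accepts the graph as a DOT string, so adding "Soil Quality" is mostly a matter of new edges. The base graph below is a simplified stand-in for illustration, not the article's exact definition:</p>
<pre><code class="lang-python"># Simplified base graph (illustrative); extend it with the new variable.
causal_graph = """
digraph {
    watering_practice -> leaf_moisture_hours;
    environmental_stress -> leaf_moisture_hours;
    environmental_stress -> symptom_severity;
    leaf_moisture_hours -> symptom_severity;

    soil_quality -> plant_vigor;       // new variable...
    soil_quality -> symptom_severity;  // ...and its direct effects
    plant_vigor -> symptom_severity;
}
"""
</code></pre>
<p>Remember to generate data for the new variable too, then rerun the validation tests to see whether the implied independencies still hold.</p>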
<p><strong>3. Apply to your domain</strong></p>
<ul>
<li><p>Think about a problem you're working on</p>
</li>
<li><p>Identify 5-7 key variables</p>
</li>
<li><p>Draw a DAG on paper</p>
</li>
<li><p>What causal questions would you want to answer?</p>
</li>
</ul>
<p><strong>4. Prepare questions</strong></p>
<ul>
<li><p>What's unclear about DAG construction?</p>
</li>
<li><p>What validation tests are you curious about?</p>
</li>
<li><p>What challenges do you foresee for your domain?</p>
</li>
</ul>
<p>Bring these to Part 3. We're going deeper.</p>
<hr />
<p><strong>Series Navigation:</strong></p>
<ul>
<li><p><a target="_blank" href="https://blog.neoforgelabs.tech/why-causality-matters-for-ai">← Part 1: Why Causality Matters</a></p>
</li>
<li><p><strong>Part 2: Building Your First Causal DAG</strong> ← You are here</p>
</li>
<li><p><a target="_blank" href="https://blog.neoforgelabs.tech/part-3-counterfactual-reasoning-with-causal-dags">Part 3: Counterfactual Reasoning</a> (Jan 16)</p>
</li>
<li><p>Part 4: Intervention Design (Jan 21)</p>
</li>
<li><p>Part 5: Distributed Systems (Jan 23)</p>
</li>
</ul>
<p><strong>Code &amp; Resources:</strong></p>
<ul>
<li><p><a target="_blank" href="https://github.com/cod3smith/plant-disease-causal">GitHub Repository</a></p>
</li>
<li><p><a target="_blank" href="https://microsoft.github.io/dowhy/">DoWhy Documentation</a></p>
</li>
<li><p><a target="_blank" href="http://dagitty.net/">DAGitty (Interactive DAG Tool)</a></p>
</li>
</ul>
<hr />
<p><em>This is part of my research at NeoForge Labs on causal AI systems. Follow along as we build production-grade causal reasoning from scratch.</em></p>
<p><strong>Questions?</strong> Drop them in the comments below. I read and respond to everything.</p>
<p><strong>Found this useful?</strong> Share it with someone who's struggling with production ML failures. Let's build better AI together.</p>
]]></content:encoded></item><item><title><![CDATA[Part 1: Why Causality Matters for AI]]></title><description><![CDATA[Your AI model achieves 95% accuracy predicting plant diseases from images. Impressive, right?
You deploy it to farmers. It works… until it doesn't. When farmers follow its recommendations, nothing happens. Sometimes, things get worse. The model saw p...]]></description><link>https://blog.neoforgelabs.tech/part-1-why-causality-matters-for-ai</link><guid isPermaLink="true">https://blog.neoforgelabs.tech/part-1-why-causality-matters-for-ai</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[Programming Blogs]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Kelyn Njeri]]></dc:creator><pubDate>Mon, 12 Jan 2026 04:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1768049889796/645dac51-c859-44ea-9508-2729538a558d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Your AI model achieves 95% accuracy predicting plant diseases from images. Impressive, right?</p>
<p>You deploy it to farmers. It works… until it doesn't. When farmers follow its recommendations, nothing happens. Sometimes, things get worse. The model saw patterns, learned correlations, but understood nothing about why diseases occur or what actually causes them. This is the correlation trap, and it's everywhere in modern AI.</p>
<p>Today, we're going to explore why the future of AI isn't just about bigger models or more data. It's about causality: understanding the mechanisms that generate our data, not just the patterns within it. By the end of the series, you'll build a causal reasoning system that doesn't just predict plant diseases, it explains why they occur and recommends interventions that actually work.</p>
<p>Let's start with the fundamental question: What's the difference between correlation and causation, and why should you care?</p>
<hr />
<h2 id="heading-the-pattern-recognition-machine">The Pattern Recognition Machine</h2>
<p>Modern machine learning is fundamentally a pattern-matching engine. Given data about X and Y, it learns:</p>
<p><strong>P(Y | X)</strong> - "What is the probability of Y given that we observe X?"</p>
<p>This works brilliantly for:</p>
<ul>
<li><p><strong>Image Classification:</strong> "Given these pixels, is this a cat?"</p>
</li>
<li><p><strong>Recommendation systems:</strong> "Given this user's history, what will they like?"</p>
</li>
<li><p><strong>Spam detection:</strong> "Given this email's features, is it spam?"</p>
</li>
</ul>
<p>But here's the problem: <strong>observing X is not the same as changing X.</strong></p>
<h3 id="heading-the-classic-trap-ice-cream-and-drowning">The Classic Trap: Ice Cream and Drowning</h3>
<p>Imagine you're building a public safety AI. Your model discovers a strong correlation:</p>
<p>When ice cream sales go up, drowning deaths also go up.</p>
<p>Should you ban ice cream to prevent drowning?</p>
<p>Obviously not. The real causal structure is:</p>
<ul>
<li><p>Hot weather increases ice cream sales</p>
</li>
<li><p>Hot weather increases the number of people who go swimming, which leads to more drowning deaths</p>
</li>
</ul>
<p>In this case, hot weather is a <strong>confounder</strong>: it causes both variables. Ice cream sales and drowning deaths are correlated but not causally related. Your ML model sees the correlation but has no idea about the mechanism.</p>
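<p>This trap is easy to reproduce. The sketch below (NumPy, invented numbers) generates both variables from temperature alone, so there is no causal link between them, then shows the correlation appear marginally and vanish once the confounder is controlled for:</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(42)
n = 5000

temperature = rng.normal(25, 7, n)                   # confounder: hot weather
ice_cream = 2.0 * temperature + rng.normal(0, 5, n)  # caused by weather only
drownings = 0.5 * temperature + rng.normal(0, 5, n)  # caused by weather only

# Marginally, sales and drownings look strongly related...
print(np.corrcoef(ice_cream, drownings)[0, 1])       # roughly 0.5

# ...but after removing the confounder's contribution, the link vanishes.
resid_ice = ice_cream - 2.0 * temperature
resid_drown = drownings - 0.5 * temperature
print(np.corrcoef(resid_ice, resid_drown)[0, 1])     # near 0
</code></pre>
<p>A model trained only on <code>ice_cream</code> and <code>drownings</code> would happily learn the spurious relationship; only knowledge of the confounder lets you remove it.</p>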
<h3 id="heading-why-this-breaks-in-production">Why This Breaks in Production</h3>
<p>You might think, "Sure, but that's an obvious example. In practice, we'd catch that." Would you?</p>
<p>Consider our plant disease detector:</p>
<ul>
<li><p>It learns: Yellowing leaves → Nitrogen deficiency</p>
</li>
<li><p>Correlation: 90% accuracy</p>
</li>
</ul>
<p>But what it misses:</p>
<ul>
<li><p>Overwatering → Root rot → Yellowing</p>
</li>
<li><p>Fungal infection → Yellowing</p>
</li>
<li><p>Natural senescence → Yellowing</p>
</li>
</ul>
<p>The model sees "yellowing = nitrogen deficiency" because that's the most common pattern in the training data. But when you apply nitrogen fertilizer to an overwatered plant, you make the problem worse.</p>
<p><strong>Correlation told you what's common. Causation tells you what actually works.</strong></p>
<hr />
<h2 id="heading-pearls-ladder-the-three-levels-of-intelligence">Pearl's Ladder: The Three Levels of Intelligence</h2>
<p>Judea Pearl, the godfather of causal inference, describes three levels of causal reasoning:</p>
<p><img src="https://miro.medium.com/v2/resize:fit:1400/1*xIeo5xDE2ek1b4E-XnzC2A.png" alt /></p>
<p>Let's break these down with our plant disease example:</p>
<h3 id="heading-level-1-association-seeing">Level 1: Association (Seeing)</h3>
<p><strong>Question:</strong> "Given these symptoms, what disease is likely?"<br /><strong>Notation:</strong> P(Disease | Symptoms)<br /><strong>ML Capability:</strong> ✅ Current AI excels here</p>
<p><strong>Example:</strong></p>
<ul>
<li><p>Observation: Plant has brown spots and yellowing leaves</p>
</li>
<li><p>Model predicts: "85% probability of early blight"</p>
</li>
</ul>
<p>This is correlation. The model sees patterns but doesn't understand the mechanisms.</p>
<h3 id="heading-level-2-intervention-doing">Level 2: Intervention (Doing)</h3>
<p><strong>Question:</strong> "What happens if I change watering frequency?"<br /><strong>Notation:</strong> P(Disease | do(Watering = optimal))<br /><strong>ML Capability:</strong> ❌ Most AI fails here</p>
<p>The <strong>do()</strong> operator is crucial. It represents intervention: actively changing a variable rather than merely observing it.</p>
<p><strong>Example:</strong></p>
<ul>
<li><p><strong>Observational:</strong> P(Disease | Watering = high) might show correlation</p>
</li>
<li><p><strong>Interventional:</strong> P(Disease | do(Watering = optimal)) shows causal effect</p>
</li>
</ul>
<p><strong>The difference:</strong></p>
<ul>
<li><p>Observation: Plants that are overwatered tend to be diseased (maybe because sick plants retain water?)</p>
</li>
<li><p>Intervention: If we reduce watering, does disease decrease? (causal effect)</p>
</li>
</ul>
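<p>The gap between these two quantities shows up clearly in a quick simulation. The mechanism below is invented for illustration: frail plants are both overwatered (by worried growers) and more disease-prone, so the observational conditional overstates the causal effect of watering:</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hidden confounder: frailty raises both watering and disease risk.
frail = rng.binomial(1, 0.3, n)
watering = rng.binomial(1, 0.2 + 0.6 * frail)   # growers overwater frail plants
disease = rng.binomial(1, 0.1 + 0.4 * frail + 0.1 * watering)

# Observational: P(Disease | Watering = high), inflated by the confounder.
print(disease[watering == 1].mean())            # about 0.45

# Interventional: P(Disease | do(Watering = high)). Force watering for every
# plant, leave frailty untouched, and replay the same disease mechanism.
disease_do = rng.binomial(1, 0.1 + 0.4 * frail + 0.1)
print(disease_do.mean())                        # about 0.32
</code></pre>
<p>The observational number mixes the effect of watering with the effect of frailty; the interventional number isolates what watering itself does.</p>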
<h3 id="heading-level-3-counterfactuals-imagining">Level 3: Counterfactuals (Imagining)</h3>
<p><strong>Question:</strong> "Would this plant be healthy if I had watered it differently?"<br /><strong>Notation:</strong> P(Healthy | Watered differently, saw disease)<br /><strong>ML Capability:</strong> ❌❌ Almost no AI does this</p>
<p>This is the most powerful level. You're asking about alternate realities:</p>
<p><strong>Example:</strong></p>
<ul>
<li><p>Factual: "I watered heavily, and the plant developed root rot"</p>
</li>
<li><p>Counterfactual: "<strong>If I had watered moderately, would the plant be healthy?</strong>"</p>
</li>
</ul>
<p>This requires understanding:</p>
<ol>
<li><p>The causal mechanism (overwatering → root rot)</p>
</li>
<li><p>The specific instance (this plant, these conditions)</p>
</li>
<li><p>Alternate histories (what would have been different)</p>
</li>
</ol>
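<p>These three ingredients map onto Pearl's abduction-action-prediction recipe, which we'll implement properly in Part 3. A toy structural equation (coefficient and numbers invented) makes the steps concrete:</p>
<pre><code class="lang-python"># Toy mechanism: root_rot = 0.8 * watering + noise,
# where noise captures this particular plant's quirks.

watering_factual = 1.0    # heavy watering
root_rot_factual = 0.9    # what we actually observed

# 1. Abduction: infer this plant's noise term from the factual outcome.
noise = root_rot_factual - 0.8 * watering_factual   # 0.1

# 2. Action: set watering to the counterfactual value.
watering_cf = 0.4         # moderate watering instead

# 3. Prediction: replay the same mechanism with the same noise.
root_rot_cf = 0.8 * watering_cf + noise
print(round(root_rot_cf, 2))   # 0.42: far less root rot in the alternate world
</code></pre>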
<p><strong>Most AI systems operate at Level 1. Human experts operate at Levels 2 and 3. We're going to build AI that does the same.</strong></p>
<hr />
<h2 id="heading-the-problems-with-pure-correlation">The Problems with Pure Correlation</h2>
<p>Let's be concrete about why correlation-based ML fails in practice:</p>
<h3 id="heading-problem-1-distribution-shift">Problem 1: Distribution Shift</h3>
<p>Your model learns from data collected in:</p>
<ul>
<li><p>Season: Summer</p>
</li>
<li><p>Location: Greenhouse A</p>
</li>
<li><p>Conditions: Controlled environment</p>
</li>
</ul>
<p>You deploy to:</p>
<ul>
<li><p>Season: Winter</p>
</li>
<li><p>Location: Outdoor farm</p>
</li>
<li><p>Conditions: Wild weather variation</p>
</li>
</ul>
<p><strong>What happens?</strong> All the correlations change. Your model has no idea what remains true (causal relationships) vs. what was just a coincidence (spurious correlation).</p>
<h3 id="heading-problem-2-spurious-correlations">Problem 2: Spurious Correlations</h3>
<p>Training data artifact: most diseased plants in your dataset happen to sit near the south wall of the greenhouse, so the model learns to associate the south wall with disease.</p>
<p><strong>Reality:</strong> South wall gets more light → higher temperature → more humidity → disease.</p>
<p>When you tell a farmer, "move your plants away from south-facing walls," you've given useless advice based on spurious correlation.</p>
<p><strong>With causal knowledge:</strong> You'd recommend humidity control, which actually addresses the mechanism.</p>
<h3 id="heading-problem-3-no-intervention-guidance">Problem 3: No Intervention Guidance</h3>
<p>Even when your model correctly identifies disease, it can't answer:</p>
<ul>
<li><p>What should I do about it?</p>
</li>
<li><p>Which intervention will be most effective?</p>
</li>
<li><p>What's the root cause I should address?</p>
</li>
</ul>
<p>It can only tell you: "This looks like early blight" (association).</p>
<p>It cannot tell you: "Reduce watering and improve air circulation" (intervention).</p>
<h3 id="heading-what-we-need-instead">What We Need Instead</h3>
<p>A causal model that:</p>
<ol>
<li><p><strong>Explains mechanisms:</strong> Why does disease occur?</p>
</li>
<li><p><strong>Predicts interventions:</strong> What happens if I change X?</p>
</li>
<li><p><strong>Handles distribution shift:</strong> Which relationships are stable across contexts?</p>
</li>
<li><p><strong>Enables counterfactual reasoning:</strong> What would have happened if…?</p>
</li>
</ol>
<p>This is what we're building in this series.</p>
<hr />
<h2 id="heading-a-different-approach-causal-graphs">A Different Approach: Causal Graphs</h2>
<p>Instead of learning correlations from data, we explicitly model causal relationships:</p>
<p><img src="https://miro.medium.com/v2/resize:fit:1400/1*sO4LcMxgk9iCUliG3qIqaQ.png" alt /></p>
<p>This Directed Acyclic Graph (DAG) represents our causal understanding:</p>
<ul>
<li><p><strong>Arrows show causation</strong>, not just correlation</p>
</li>
<li><p><strong>No arrow means no direct causal effect</strong></p>
</li>
<li><p><strong>Structure encodes domain knowledge</strong></p>
</li>
</ul>
<p>With this graph, we can answer intervention questions:</p>
<p><strong>Q:</strong> "What happens if I reduce watering?"<br /><strong>A:</strong> Follow the causal path: Watering ↓ → Moisture ↓ → Pathogen Growth ↓ → Disease ↓</p>
<p>This is fundamentally different from correlation. We're modeling the <strong>data-generating process</strong>, not just patterns in data.</p>
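<p>Encoding this structure in code takes little more than a parent list per node. The sketch below (node names are illustrative; later parts build the full graph with DoWhy) also checks the defining property of a DAG, acyclicity:</p>
<pre><code class="lang-python"># Each node maps to its direct causes (parents).
parents = {
    "watering": [],
    "humidity": [],
    "leaf_moisture": ["watering", "humidity"],
    "pathogen_growth": ["leaf_moisture"],
    "disease": ["pathogen_growth"],
    "symptoms": ["disease"],
}

def is_acyclic(parents):
    """Kahn-style check: repeatedly remove nodes whose parents are all gone."""
    remaining = dict(parents)
    while remaining:
        ready = [n for n, ps in remaining.items()
                 if all(p not in remaining for p in ps)]
        if not ready:
            return False   # every remaining node still has a parent: a cycle
        for n in ready:
            del remaining[n]
    return True

print(is_acyclic(parents))   # True: a valid causal DAG
</code></pre>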
<h3 id="heading-the-power-of-do">The Power of do()</h3>
<p>The <strong>do()</strong> operator represents intervention:</p>
<ul>
<li><p><strong>P(Disease | Watering = high):</strong> Observation (what we see)</p>
</li>
<li><p><strong>P(Disease | do(Watering = low)):</strong> Intervention (what would happen if we change it)</p>
</li>
</ul>
<p>These are different!</p>
<p>Observation includes confounders. Maybe plants that are naturally disease-prone are also overwatered by worried farmers.</p>
<p>Intervention breaks the confounding. We're asking: independent of everything else, what's the causal effect?</p>
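<p>When the confounder is observed, the interventional quantity can be recovered from purely observational data with the backdoor adjustment: estimate the outcome within each stratum of the confounder, then reweight by the confounder's marginal distribution. The simulation below uses an invented mechanism in which frail plants are both overwatered and more disease-prone:</p>
<pre><code class="lang-python">import numpy as np

rng = np.random.default_rng(1)
n = 200_000
frail = rng.binomial(1, 0.3, n)                 # observed confounder
watering = rng.binomial(1, 0.2 + 0.6 * frail)
disease = rng.binomial(1, 0.1 + 0.4 * frail + 0.1 * watering)

# Naive observational estimate: confounded.
naive = disease[watering == 1].mean()

# Backdoor adjustment: stratify on frailty, then reweight.
adjusted = sum(
    disease[np.logical_and(watering == 1, frail == z)].mean() * (frail == z).mean()
    for z in (0, 1)
)
print(round(naive, 2), round(adjusted, 2))      # roughly 0.45 vs 0.32
</code></pre>
<p>The adjusted number matches the true interventional effect under this mechanism, which is exactly what DoWhy's backdoor estimators automate.</p>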
<h3 id="heading-whats-coming-in-this-series">What's Coming in This Series</h3>
<p>Over the next 5 articles, you'll learn to:</p>
<ol>
<li><p><strong>Part 2:</strong> Build causal DAGs from domain knowledge</p>
</li>
<li><p><strong>Part 3:</strong> Use counterfactual reasoning to predict alternate outcomes</p>
</li>
<li><p><strong>Part 4:</strong> Design interventions based on causal effects</p>
</li>
<li><p><strong>Part 5:</strong> Scale causal inference to production systems</p>
</li>
</ol>
<p>By the end, you'll have built a complete causal diagnostic system for plant diseases, and you'll understand how to apply these principles to any domain.</p>
<hr />
<h2 id="heading-case-study-the-yellowing-leaves-mystery">Case Study: The Yellowing Leaves Mystery</h2>
<p>Let's make this concrete with a real diagnostic scenario.</p>
<h3 id="heading-the-correlation-approach">The Correlation Approach</h3>
<p>Farmer brings you a plant with yellowing leaves.</p>
<p>Your ML model:</p>
<ol>
<li><p>Analyzes image</p>
</li>
<li><p>Matches pattern to training data</p>
</li>
<li><p>Outputs: "80% probability: Nitrogen deficiency"</p>
</li>
</ol>
<p><strong>Recommendation:</strong> Apply nitrogen fertilizer</p>
<h3 id="heading-what-actually-happens">What Actually Happens</h3>
<p>Farmer applies nitrogen. Plant gets worse.</p>
<p><strong>Why?</strong> The actual cause was overwatering leading to root rot. Adding nitrogen to an already-sick plant stressed it further.</p>
<h3 id="heading-the-causal-approach">The Causal Approach</h3>
<p>Instead of just pattern matching, we reason causally:</p>
<p><img src="https://miro.medium.com/v2/resize:fit:1400/1*7_qX_2josISgTpcfCaoLyA.png" alt /></p>
<p><strong>Causal diagnostic process:</strong></p>
<ol>
<li><p><strong>Identify possible causes</strong> (multiple hypotheses)</p>
</li>
<li><p><strong>Check diagnostic indicators</strong> for each cause</p>
</li>
<li><p><strong>Find root cause</strong> via causal mechanism</p>
</li>
<li><p><strong>Recommend intervention</strong> targeting the actual cause</p>
</li>
</ol>
<p><strong>Results:</strong></p>
<ul>
<li><p>Soil moisture: Very high ✓</p>
</li>
<li><p>Soil nitrogen: Normal levels</p>
</li>
<li><p>Leaf spots: None</p>
</li>
<li><p>Affected leaves: Throughout plant</p>
</li>
</ul>
<p><strong>Diagnosis:</strong> Overwatering → Root rot → Nutrient uptake impaired → Yellowing</p>
<p><strong>Intervention:</strong> Reduce watering, improve drainage, let soil dry</p>
<p><strong>Outcome:</strong> Plant recovers</p>
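<p>The diagnostic process above can be sketched as a multi-hypothesis scorer: each candidate root cause predicts a pattern of indicators, and candidates are ranked by how well that pattern matches the measurements. The rules below are illustrative stand-ins, not a real diagnostic model:</p>
<pre><code class="lang-python">observations = {
    "soil_moisture": "high",
    "soil_nitrogen": "normal",
    "leaf_spots": "none",
    "affected_leaves": "throughout",
}

# Expected indicator pattern for each candidate root cause.
hypotheses = {
    "overwatering_root_rot": {"soil_moisture": "high", "leaf_spots": "none",
                              "affected_leaves": "throughout"},
    "nitrogen_deficiency": {"soil_nitrogen": "low", "affected_leaves": "older"},
    "fungal_infection": {"leaf_spots": "present"},
}

def score(expected, observed):
    """Fraction of a hypothesis's expected indicators seen in the observations."""
    hits = sum(1 for k, v in expected.items() if observed.get(k) == v)
    return hits / len(expected)

ranked = sorted(hypotheses, key=lambda h: score(hypotheses[h], observations),
                reverse=True)
print(ranked[0])   # overwatering_root_rot
</code></pre>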
<h3 id="heading-the-difference">The Difference</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Correlation ML</td><td>Causal Reasoning</td></tr>
</thead>
<tbody>
<tr>
<td>Pattern matching</td><td>Mechanism understanding</td></tr>
<tr>
<td>Single prediction</td><td>Multiple hypotheses</td></tr>
<tr>
<td>No "why"</td><td>Explains root cause</td></tr>
<tr>
<td>Generic recommendation</td><td>Targeted intervention</td></tr>
<tr>
<td>Fails on edge cases</td><td>Handles novel scenarios</td></tr>
</tbody>
</table>
</div><p><strong>This is why causality matters.</strong></p>
<hr />
<h2 id="heading-where-were-headed">Where We're Headed</h2>
<p>You've now seen why correlation isn't enough. Pattern matching fails when:</p>
<ul>
<li><p>Distributions shift</p>
</li>
<li><p>Interventions are needed</p>
</li>
<li><p>You need to explain "why"</p>
</li>
</ul>
<p>In <strong>Part 2</strong>, we'll get hands-on. You'll learn to:</p>
<ul>
<li><p>Build your first causal DAG</p>
</li>
<li><p>Encode domain knowledge as a graph structure</p>
</li>
<li><p>Identify confounders, mediators, and colliders</p>
</li>
<li><p>Validate your causal assumptions</p>
</li>
</ul>
<p>We'll continue with our plant disease example, constructing the complete causal graph that maps environmental factors → physiological responses → observable symptoms.</p>
<p>By the end of Part 2, you'll have a working causal model, the foundation for everything that comes after.</p>
<h3 id="heading-your-challenge">Your Challenge</h3>
<p>Before Part 2, think about a problem in your domain:</p>
<ul>
<li><p>What patterns do your ML models learn?</p>
</li>
<li><p>What's the actual causal mechanism?</p>
</li>
<li><p>Where have you seen correlation fail?</p>
</li>
</ul>
<p>Bring these questions to Part 2. We're going to build something better.</p>
<hr />
<p><strong>Series Navigation:</strong></p>
<ul>
<li><p><strong>Part 1: Why Causality Matters</strong> ← You are here</p>
</li>
<li><p><a class="post-section-overview" href="#">Part 2: Building Your First Causal DAG</a> (Jan 15)</p>
</li>
<li><p><a class="post-section-overview" href="https://blog.neoforgelabs.tech/part-3-counterfactual-reasoning-with-causal-dags">Part 3: Counterfactual Reasoning</a> (Jan 17)</p>
</li>
<li><p>Part 4: Intervention Design (Jan 22)</p>
</li>
<li><p>Part 5: Distributed Systems (Jan 24)</p>
</li>
</ul>
<hr />
<p><em>This is part of my research at NeoForge Labs on causal AI systems. Follow along as we build production-grade causal reasoning from scratch.</em></p>
<p><strong>Questions?</strong> Drop them in the comments below.</p>
]]></content:encoded></item></channel></rss>