Methods & Pipelines: Decision Guide

Practical, decision-first guidance for implementing the analyses.

This guide helps you choose the right analysis approach for your data. It also documents the two pipelines developed in this project, one validated and one partially validated, and the validation strategies used to confirm results.

Which Pipeline Should You Use?

graph TD
    A[Start: Biomarker Discovery Goal] --> B{Multiple independent<br/>datasets available?}
    B -->|No — single study| C[Use within-study validation only;<br/>generalizability will be limited]
    B -->|Yes| D{Do you believe biomarkers<br/>are reactive to stress?}
    D -->|Unsure — test both| E[Run Two-Step Classifier<br/>on all samples]
    D -->|Yes — reactive only| F{Do you have control<br/>AND treated groups?}
    F -->|No| E
    F -->|Yes| G{Will Step 1 yield<br/>> 100 stress genes?}
    G -->|Unsure / No| E
    G -->|Yes| H[Try Stepwise Differential<br/>Abundance first]
    H --> I{Does Step 2 yield<br/>convincing results?}
    I -->|No| E
    I -->|Yes| J[Validate with LOSO]
    E --> J

When in doubt: Use the Two-Step Classifier. It is the primary validated approach in this project, does not require control groups, and preserves both innate and reactive biomarkers.

Use Stepwise Differential Abundance only if you have a strong prior belief that biomarkers are reactive (not innate), you have both control and treated groups, and more than 100 stress-responsive genes survive Step 1.
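The decision rules above can be written down as a small function, which is sometimes easier to audit than a flowchart. The following is only a sketch: the `StudyDesign` fields, the `choose_pipeline` name, and the returned labels are hypothetical and not part of either pipeline. It also assumes that an explicit "no" to the reactive-only question falls back to the Two-Step Classifier, which the flowchart does not state directly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyDesign:
    """Answers to the flowchart questions (hypothetical field names).

    None means "unsure" or "not known yet".
    """
    n_independent_datasets: int
    reactive_only: Optional[bool]            # belief that biomarkers are reactive, not innate
    has_control_and_treated: bool
    n_step1_stress_genes: Optional[int]      # stress genes surviving Step 1, if known
    step2_convincing: Optional[bool] = None  # only known after Step 2 has been run


def choose_pipeline(design: StudyDesign) -> str:
    """Return a label for the recommended approach.

    Both multi-study outcomes are meant to be followed by LOSO validation.
    """
    if design.n_independent_datasets < 2:
        return "within-study-only"
    # Assumption: anything other than a firm "yes" falls back to the classifier.
    if design.reactive_only is not True:
        return "two-step-classifier"
    if not design.has_control_and_treated:
        return "two-step-classifier"
    # The guide requires strictly more than 100 stress genes after Step 1.
    if design.n_step1_stress_genes is None or design.n_step1_stress_genes <= 100:
        return "two-step-classifier"
    if design.step2_convincing is False:
        return "two-step-classifier"
    return "stepwise-differential-abundance"
```

For example, `choose_pipeline(StudyDesign(3, True, True, 250))` returns `"stepwise-differential-abundance"`, while the same design with `reactive_only=None` returns `"two-step-classifier"`.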

Pipeline Overview

Criterion	Stepwise Differential Abundance	Two-Step Classifier
Status	⚠️ Partially validated	✅ Validated (6-gene panel)
Requires control groups	Yes	No
Captures innate biomarkers	❌ No	✅ Yes
Handles small gene sets	❌ No (variance-stabilizing transformation, VST, breaks down)	✅ Yes (logistic regression works)
Best for	Reactive biomarkers, large gene sets	Any dataset design
Cross-study validation	LOSO (leave-one-study-out)	LOSO (leave-one-study-out)
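Both pipelines in the table rely on LOSO for cross-study validation. Assuming LOSO here means leave-one-study-out, each fold holds out every sample from one study and trains on the remaining studies. The sketch below shows only the splitting logic. The `loso_splits` helper and the study labels are hypothetical, and the project's actual protocol may differ.

```python
from typing import Iterator, List, Sequence, Tuple


def loso_splits(
    study_labels: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_study, train_indices, test_indices), one fold per study."""
    studies = sorted(set(study_labels))
    if len(studies) < 2:
        raise ValueError("LOSO needs at least two independent studies")
    for held_out in studies:
        # Train on every sample outside the held-out study, test on the rest.
        train = [i for i, s in enumerate(study_labels) if s != held_out]
        test = [i for i, s in enumerate(study_labels) if s == held_out]
        yield held_out, train, test


# Hypothetical study label for each of six samples.
labels = ["study_A", "study_A", "study_B", "study_B", "study_C", "study_C"]
folds = list(loso_splits(labels))
```

Because no sample from the held-out study is ever seen during training, any feature selection or scaling should be fitted inside each fold on the training indices only. Otherwise information from the test study leaks into the model.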

Pipeline Details

4b. Stepwise Differential Abundance — Two-step filtering: control vs. treated → resistant vs. sensitive
4c. Two-Step Classifier — Reproducibility scoring → logistic regression; primary validated approach
4d. Validation & Pitfalls — Avoiding overfitting, LOSO protocol, batch effects

Next: Start with the Two-Step Classifier if you’re implementing a new analysis, or read Validation & Pitfalls to understand cross-study validation requirements.