✨ Welcome to Almond Farm ✨     🌟 Updated daily! 🌟     ★ Sign our guestbook! ★     💾 Bookmark this page! 💾     🔥 Best site on the web 🔥     ✨ get on farming ✨    
🔬 PUBLIC HEALTH — UNLIKELY CONNECTIONS EXPLORER
_

An open-source, multi-agent research platform that surfaces hidden correlations between health outcomes and environmental, occupational, and geographic factors. Integrates public datasets, scientific literature, and structured causal analysis to identify leads that warrant formal investigation.

Python Claude API (claude-sonnet-4-20250514) Streamlit
Active Project

The Unlikely Correlations Engine

Systematic detection of hidden health-environment associations

Motivation

Epidemiological research identified elevated Parkinson's disease rates among populations living near golf courses. Investigation revealed the underlying factor was chronic pesticide exposure from course maintenance. Residential proximity to treated land — not the recreational facility itself — was the relevant variable. The association was detectable in existing public datasets, but required cross-referencing sources that are not typically analyzed together.

Approach

This project applies that same cross-referencing approach systematically. The platform correlates health outcome data with geographic, occupational, environmental, and demographic variables at the US county level. When a statistically significant association is detected, it searches the published literature for supporting or contradicting evidence, evaluates causal plausibility, and generates structured study proposals for further investigation.

Architecture
1

Correlation Detection

Cross-reference health outcomes with geographic, occupational, environmental, and demographic data at the US county level. Compute statistical correlations and identify spatial clusters using geospatial autocorrelation analysis (LISA).

  • CDC Wonder — mortality by ICD-10 code
  • CMS Chronic Conditions — disease prevalence by county
  • EPA — pesticide use, toxic release inventory, Superfund sites
  • USGS — land use, water quality, pesticide application
  • Census ACS — demographic and socioeconomic confounders
  • County Health Rankings — composite health indicators
2

Literature Mining

Search PubMed, Semantic Scholar, and other academic databases for published evidence related to detected correlations. Synthesize findings across studies, identify contradictions, and flag evidence gaps.

  • PubMed / NCBI E-utilities
  • Semantic Scholar API
  • FDA FAERS — adverse event reports
  • WHO Global Health Observatory
3

Causation Analysis

Evaluate detected associations against established causal frameworks. Assess confounders, dose-response relationships, biological plausibility, temporality, and consistency across studies.

  • Cross-study comparison and contradiction analysis
  • Exposure pathway modeling
  • Bradford Hill criteria assessment
4

Study Design

For associations with sufficient supporting evidence, generate structured research proposals including study population, methodology, controls, outcome measures, and timeline.

  • Existing study methodologies as templates
  • NIH and WHO study design frameworks

Learning Path

Week 1 Claude API fundamentals and Streamlit Projects 4#5
Week 2 Public data API integration Projects 7#8
Week 3 Statistical correlation and geospatial analysis Projects 1#2
Week 4 Advanced retrieval, multi-source synthesis Projects 3, #6, #9#10

The 10 Projects

🔬

#1 Unlikely Correlations Engine

active

Multi-agent pipeline for detecting hidden associations between health outcomes and environmental, occupational, or geographic factors. Integrates 20+ public data sources, searches published literature for supporting evidence, evaluates causal plausibility, and generates structured study proposals for further investigation.

Data Sources
  • CDC Wonder / CMS Chronic Conditions — health outcomes by county
  • EPA — pesticide use, toxic releases, Superfund sites, air quality
  • USGS — land use, water quality, pesticide application estimates
  • PubMed / Semantic Scholar — academic literature
  • Census ACS / County Health Rankings — demographic confounders
  • openFDA FAERS — adverse event reports
First Milestone

End-to-end demonstration using the Parkinson's disease / pesticide exposure case. Reproduce the known correlation, retrieve supporting literature, and generate a causation assessment.

Technical Components

Multi-agent architecture, data APIs, geospatial analysis, literature synthesis

👷

#2 Occupational Disease Pattern Miner

planned

Analyze health outcome data by occupation to identify statistically unusual illness rates. Surface occupations with elevated risk for specific disease categories relative to population baselines.

Data Sources
  • NIOSH occupational health datasets
  • BLS occupational data
  • Census County Business Patterns
First Milestone

Ranked table of occupations with the highest deviation from baseline rates for a user-selected disease category, with confidence intervals.

Technical Components

Statistical z-scores, pandas dataframes, Streamlit data display

💊

#3 FDA Adverse Event Anomaly Detector

planned

Analyze FDA FAERS reports to identify unexpected co-occurrences between drugs and health conditions not listed in current labeling.

Data Sources
  • openFDA FAERS API
First Milestone

Input a drug name, return a ranked list of reported conditions sorted by divergence from expected adverse event profile.

Technical Components

REST API integration, statistical anomaly detection, AI-assisted interpretation

📋

#4 Policy Brief Summarizer

planned

Structured summarization of health policy documents, academic papers, and institutional reports. Extracts key findings, recommendations, evidence quality, and identifies gaps requiring further research.

Data Sources
  • User-uploaded PDFs or pasted text
First Milestone

Single-page application accepting document text and returning a structured summary with cited findings and evidence assessment.

Technical Components

Claude API integration, prompt engineering, Streamlit interface design

📚

#5 Multi-Document Research Q&A

planned

Retrieval-augmented generation system for querying across a library of research documents. Synthesizes answers from multiple sources with citations to specific documents and passages.

Data Sources
  • User-uploaded PDF library
First Milestone

Upload three documents, submit a natural language query, receive a synthesized answer with source attribution.

Technical Components

RAG architecture, vector embeddings, document indexing

⚖️

#6 Cross-Study Contradiction Finder

planned

Comparative analysis of multiple studies on the same health topic. Identifies agreements, contradictions, and methodological differences. Assesses weight of evidence accounting for study design, population, and potential sources of bias.

Data Sources
  • User-uploaded PDFs or PubMed IDs
First Milestone

Input two paper abstracts, receive structured comparison of findings, methodology, and identified discrepancies.

Technical Components

Structured prompting, comparative analysis, evidence synthesis

📊

#7 Health System Scorecard Dashboard

planned

Comparative visualization of health system indicators across countries. Covers UHC service coverage index, health expenditure, out-of-pocket costs, workforce density, and related metrics.

Data Sources
  • WHO Global Health Observatory API
  • World Bank Health Nutrition and Population API
First Milestone

Select countries and indicators, generate side-by-side comparison dashboard with trend lines.

Technical Components

Data API integration, interactive charting, dashboard layout

💰

#8 Global Health Funding Tracker

planned

Analysis of global health funding flows from major donors. Identifies trends, gaps, and alignment opportunities across disease areas and geographic regions.

Data Sources
  • IATI (International Aid Transparency Initiative) open data
  • Gates Foundation grant database
First Milestone

Filter grants by disease area and year, generate funding trend analysis with identified gaps.

Technical Components

Data filtering, AI-assisted synthesis, export functionality

🎙️

#9 Stakeholder Interview Synthesizer

planned

Automated extraction of themes, key findings, notable statements, and action items from interview transcripts and stakeholder consultation records.

Data Sources
  • User-uploaded .txt or .docx transcripts
First Milestone

Process a single transcript, extract structured themes and action items with source attribution.

Technical Components

File handling, structured extraction, JSON output formatting

🌍

#10 Health Policy Diffusion Tracker

planned

Analysis of how specific health policies propagate across countries over time. Identifies contextual factors — economic, political, demographic — that correlate with adoption timing.

Data Sources
  • WHO policy databases
  • Academic literature via OpenAlex API
  • World Bank governance indicators
First Milestone

Map the adoption timeline of a specific policy globally and generate a hypothesis about factors correlated with early adoption.

Technical Components

Timeline visualization, multi-source data joins, hypothesis generation

10 projects Python + Streamlit almondfarm.us