Product Offering Memorandum

Platinum Medical QA Training Data

Prepared: February 2026
Firm: Swarm & Bee (S&B)
Products: 13 SKUs — Sample, Specialty, Bundle, Vault, Enterprise
Format: JSONL with 80/10/10 splits
Support: build@swarmandbee.com

Section 01

Product Overview

Swarm & Bee produces CoVe-verified platinum medical question-answer pairs for machine learning model training, fine-tuning, and evaluation. Every pair passes through our Chain-of-Verification (CoVe) pipeline, where each factual claim is independently checked by a 235-billion-parameter model.

Our datasets cover 14 medical specialties, from cardiology and radiology to pharmacology and emergency medicine. They are available as individual specialty packs, bundles, or the complete vault.

Verifier Model	235B
LLM Calls / Claim	3
Raw Reject Rate	79%
Specialties	14
Platinum Pairs	10K+
Disclaimer Rate	94%+

Section 02

Package Contents

Every product ships as a ZIP containing the following files:

[product].jsonl	Full dataset — all pairs for this product
splits/train.jsonl	Training set (80% of pairs)
splits/val.jsonl	Validation set (10% of pairs)
splits/test.jsonl	Test set (10% of pairs)
PRODUCT_OM.txt	Product-specific offering memorandum
LICENSE.txt	Commercial data license
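As a rough sanity check on an unpacked product, the split sizes can be compared against the stated 80/10/10 proportions. The sketch below is illustrative only and is not part of the shipped package. The helper names (`count_lines`, `split_fractions`, `check_package`) and the 2% tolerance are assumptions; the file paths follow the listing above.

```python
from pathlib import Path


def count_lines(path):
    """Count non-empty lines (JSONL stores one JSON object per line)."""
    with open(path, encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())


def split_fractions(counts):
    """Turn {'train': n, 'val': n, 'test': n} into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pairs found in any split")
    return {name: n / total for name, n in counts.items()}


def check_package(root, tolerance=0.02):
    """Return True if the files under root/splits are roughly 80/10/10.

    The tolerance is an assumption: small products cannot split exactly.
    """
    splits = Path(root) / "splits"
    counts = {name: count_lines(splits / f"{name}.jsonl")
              for name in ("train", "val", "test")}
    expected = {"train": 0.8, "val": 0.1, "test": 0.1}
    fractions = split_fractions(counts)
    return all(abs(fractions[k] - expected[k]) <= tolerance for k in expected)
```

Calling `check_package("path/to/unzipped/product")` returns `False` when any split drifts more than the tolerance from its stated share.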

Each pair is a single JSON object with the following fields:

id	Unique identifier (hash-based, deterministic)
question	Clinical scenario or medical question
answer	Evidence-based response with clinical disclaimers
specialty	Medical specialty classification
source	Generation pipeline identifier
cove_status	Verification result (PASS or REWRITE)
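A minimal sketch of how a buyer might validate records against the field list above before training. The function name `validate_record` is hypothetical, and the check assumes `cove_status` is stored as the literal string `PASS` or `REWRITE`; it does not check value types, which this memorandum does not specify.

```python
import json

REQUIRED_FIELDS = ("id", "question", "answer", "specialty", "source", "cove_status")
ALLOWED_STATUS = {"PASS", "REWRITE"}  # per the field table above


def validate_record(line):
    """Parse one JSONL line and return a list of problems (empty if valid)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in record]
    status = record.get("cove_status")
    if status is not None and status not in ALLOWED_STATUS:
        problems.append(f"unexpected cove_status: {status!r}")
    return problems
```

Running this over every line of `[product].jsonl` and collecting the non-empty results gives a quick report of malformed records.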

Section 03

Production Pipeline

Every pair passes through a six-stage pipeline from medical literature to verified training data:

Harvest	10+ medical sources
Generate	Specialty QA models
Audit	Multi-stage quality gate
Verify	CoVe 235B fact-check
Rewrite	235B corrects errors
Deliver	Platinum vault only

Sources: PubMed, PMC full-text, FDA drug labels, ClinicalTrials.gov, Semantic Scholar, medRxiv, Europe PMC, HuggingFace medical datasets, GitHub medical repositories, Reddit medical communities.

Section 04

Verification Methodology

Chain-of-Verification (CoVe) is a verification framework from Meta AI research. Our adaptation for medical fact-checking runs in three steps: for every claim in every answer, three independent LLM calls determine accuracy.

Step 1 — Plan

Extract all factual claims from the answer. For each claim, generate a verification question that can independently confirm or refute it. Drug dosages, contraindications, guidelines, and clinical protocols each become separate verification targets.

Step 2 — Execute

Route each verification question to the 235B parameter model (Qwen3-235B). The verifier has no access to the original answer — it answers from its own knowledge base. This independence is critical for detecting hallucinated content.

Step 3 — Compare

Compare the 235B verification response with the original claim. Three outcomes: PASS (claim verified correct), FLAG (claim questionable — sent to 235B for rewrite with verified facts), FAIL (claim incorrect — permanently rejected).

A single failed claim downgrades the entire pair. FLAG pairs are rewritten by the 235B model using only independently verified facts. The result is platinum-grade training data where every factual claim has been independently verified.
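The plan, execute, and compare steps can be sketched as a small loop. This is an illustration under stated assumptions, not the production pipeline: `plan`, `execute`, and `compare` are caller-supplied stand-ins for the LLM calls, and the pair-level labels `REJECT` and `REWRITE` are hypothetical names for the outcomes described above.

```python
from dataclasses import dataclass

PASS, FLAG, FAIL = "PASS", "FLAG", "FAIL"


@dataclass
class ClaimResult:
    claim: str
    question: str
    evidence: str
    verdict: str


def verify_answer(answer, plan, execute, compare):
    """Run the three steps for every claim in one answer.

    plan(answer)             -> list of (claim, verification_question)
    execute(question)        -> verifier's answer; it never sees `answer`
    compare(claim, evidence) -> PASS, FLAG, or FAIL
    """
    results = []
    for claim, question in plan(answer):
        evidence = execute(question)  # independent of the original answer
        verdict = compare(claim, evidence)
        results.append(ClaimResult(claim, question, evidence, verdict))
    return results


def pair_outcome(results):
    """Aggregate claim verdicts: any FAIL rejects, any FLAG triggers a rewrite."""
    verdicts = {r.verdict for r in results}
    if FAIL in verdicts:
        return "REJECT"
    if FLAG in verdicts:
        return "REWRITE"
    return PASS
```

Passing only the verification question to `execute` is what keeps the verifier blind to the original answer, which is the independence property the Execute step relies on.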

Section 05

Suggested Training Configurations

The datasets are compatible with any Hugging Face-based training framework. Recommended configurations for medical fine-tuning:

Recommended Base Models

Primary	Qwen2.5-7B-Instruct
Alternative	Llama-3.1-8B-Instruct
General	Mistral-7B-Instruct-v0.3

LoRA Configuration (7B Models)

Rank	64
Alpha	128
Target Modules	q, k, v, o, gate, up, down
Dropout	0.05
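The LoRA settings above might be written as keyword arguments like the following. This is a sketch, not an official configuration file: the `_proj` suffixes are an assumption based on how the recommended base models name their attention and MLP projection layers in Hugging Face Transformers, and `task_type` is likewise assumed.

```python
# Keyword arguments mirroring the LoRA settings listed above.
LORA_KWARGS = {
    "r": 64,                 # Rank
    "lora_alpha": 128,       # Alpha
    "lora_dropout": 0.05,    # Dropout
    "target_modules": [      # q, k, v, o, gate, up, down (suffixes assumed)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: causal language-model fine-tuning
}

# Standard LoRA scales the adapter update by lora_alpha / r.
scaling = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]
print(scaling)  # 2.0
```

With the PEFT library installed, these can be passed as `LoraConfig(**LORA_KWARGS)`.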

Training Configuration

Epochs	3
Batch Size	4 (effective 16)
Learning Rate	2e-4
Max Sequence Length	2048
Scheduler	cosine
Warmup	3%
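To make the batch and warmup figures concrete, the arithmetic below derives gradient accumulation and step counts from the settings above. It is a back-of-the-envelope sketch assuming a single GPU and the 80% training split; the function name `training_schedule` is hypothetical, and real trainers may round step counts slightly differently.

```python
import math


def training_schedule(num_pairs, train_fraction=0.8, epochs=3,
                      per_device_batch=4, effective_batch=16,
                      warmup_ratio=0.03):
    """Derive accumulation and step counts for a single-GPU run."""
    if effective_batch % per_device_batch:
        raise ValueError("effective batch must be a multiple of the per-device batch")
    train_pairs = round(num_pairs * train_fraction)
    grad_accum = effective_batch // per_device_batch
    steps_per_epoch = math.ceil(train_pairs / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "train_pairs": train_pairs,
        "grad_accum": grad_accum,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


print(training_schedule(500))
```

For a product of about 500 pairs this gives 400 training pairs, gradient accumulation of 4, 25 optimizer steps per epoch, 75 steps in total, and 3 warmup steps.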

Hardware Requirements

Minimum	1x 24GB VRAM (RTX 4090)
Recommended	1x 48GB+ VRAM (RTX 6000)
Optimal	1x RTX 6000 Pro 96GB (Blackwell)

Expected Results (~500 Platinum Pairs, 7B Base)

Medical QA Accuracy	+4-8% over base
Concept Coverage	50%+ on specialty evals
Disclaimer Compliance	90%+
Training Time	30-90 min (single GPU)

Section 06

Sample Pair

Cardiology — Acute Coronary Syndrome

Q: A 62-year-old male presents with substernal chest pressure radiating to the left arm for 45 minutes. ECG shows ST-elevation in leads II, III, and aVF. Troponin I is 2.4 ng/mL. BP 138/82, HR 88. What is the immediate management approach, and what are the key pharmacotherapy decisions in the first 24 hours?

A: This presentation is consistent with an inferior STEMI. Immediate management includes activating the cardiac catheterization lab for primary PCI (target door-to-balloon time <90 minutes). Administer aspirin 325mg chewable, ticagrelor 180mg loading dose (or clopidogrel 600mg if ticagrelor is contraindicated), and unfractionated heparin 60 U/kg bolus (max 4000 U). Sublingual nitroglycerin for ongoing pain, with caution given inferior MI (monitor for right ventricular involvement and hypotension). Post-PCI: dual antiplatelet therapy for 12 months, high-intensity statin (atorvastatin 80mg), beta-blocker (metoprolol tartrate 25mg BID titrated to HR goal), and ACE inhibitor (lisinopril 2.5–5mg daily) once hemodynamically stable. This response is for educational purposes. All clinical decisions should be made by qualified healthcare providers based on individual patient assessment.

CoVe Verified · 235B · Platinum

Section 07

License

Upon purchase, you receive a non-exclusive, non-transferable, perpetual license.

You may:

Use the dataset for training, fine-tuning, and evaluating machine learning models
Use models trained on the dataset in commercial applications
Create derivative works (trained models) from the dataset
Use the dataset for internal research and development

You may not:

Redistribute, resell, or sublicense the raw dataset
Share purchased data files publicly or with unlicensed parties
Claim authorship of the dataset itself
Use the dataset to build a competing data product

Enterprise licenses permit use by up to 10 individuals within a single organization. Full terms at data.swarmandbee.com/terms.html.

Section 08

Medical Disclaimer

This dataset is intended for machine learning research and model training purposes only. The medical content within is not intended as clinical advice, diagnostic guidance, or treatment recommendations.

While every pair has been verified through our CoVe pipeline, no dataset is guaranteed to be 100% error-free. Users are responsible for validating outputs of any models trained on this data before deploying in clinical or healthcare settings.

Swarm & Bee is not a healthcare provider and does not provide medical advice. Our products are data assets for AI/ML development.
