Swarm & Bee · Platinum Data

Product Offering Memorandum

Platinum Medical QA Training Data

Prepared: February 2026
Firm: Swarm & Bee (S&B)
Products: 13 SKUs (Sample, Specialty, Bundle, Vault, Enterprise)
Format: JSONL with 80/10/10 splits
Support: build@swarmandbee.com

Section 01

Product Overview

Swarm & Bee produces platinum medical question-answer pairs, verified with Chain-of-Verification (CoVe), for machine learning model training, fine-tuning, and evaluation. Every pair passes through our CoVe pipeline, where a 235-billion-parameter model independently checks each factual claim.

Our datasets cover 14 medical specialties, from cardiology and radiology to pharmacology and emergency medicine. They are available as individual specialty packs, as bundles, or as the complete vault.

Verifier model: 235B
LLM calls per claim: 3
Raw reject rate: 79%
Specialties: 14
Platinum pairs: 10K+
Disclaimer rate: 94%+

Section 02

Package Contents

Every product ships as a ZIP containing the following files:

[product].jsonl: Full dataset (all pairs for this product)
splits/train.jsonl: Training set (80% of pairs)
splits/val.jsonl: Validation set (10% of pairs)
splits/test.jsonl: Test set (10% of pairs)
PRODUCT_OM.txt: Product-specific offering memorandum
LICENSE.txt: Commercial data license
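After you unzip a package, a quick check like the one below can confirm the 80/10/10 proportions and make sure no pair appears in more than one split. This is a minimal Python sketch. It assumes the archive was extracted to a local directory with the `splits/` layout above and that every record has an `id` field. The helper names (`parse_jsonl`, `load_splits`, `split_fractions`, `leaked_ids`) are hypothetical and do not ship with the product.

```python
import json
from pathlib import Path

SPLIT_NAMES = ("train", "val", "test")


def parse_jsonl(lines):
    """Parse JSONL text: one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def load_splits(root):
    """Load splits/train.jsonl, splits/val.jsonl and splits/test.jsonl under `root`."""
    splits = {}
    for name in SPLIT_NAMES:
        path = Path(root) / "splits" / f"{name}.jsonl"
        with path.open(encoding="utf-8") as f:
            splits[name] = parse_jsonl(f)
    return splits


def split_fractions(splits):
    """Share of all pairs in each split (the product lists 0.8 / 0.1 / 0.1)."""
    total = sum(len(rows) for rows in splits.values())
    return {name: len(rows) / total for name, rows in splits.items()}


def leaked_ids(splits):
    """IDs that appear in more than one split; this should be an empty set."""
    first_seen, leaked = {}, set()
    for name, rows in splits.items():
        for row in rows:
            if first_seen.setdefault(row["id"], name) != name:
                leaked.add(row["id"])
    return leaked
```

For example, `split_fractions(load_splits("extracted_dir"))` should come out close to 0.8/0.1/0.1, and `leaked_ids` should return an empty set. Here `"extracted_dir"` stands in for wherever the ZIP was unpacked.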

Each pair is a single JSON object with the following fields:

id: Unique identifier (hash-based, deterministic)
question: Clinical scenario or medical question
answer: Evidence-based response with clinical disclaimers
specialty: Medical specialty classification
source: Generation pipeline identifier
cove_status: Verification result (PASS, or REWRITE for pairs rewritten after a FLAG)
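A per-record check against the field list above might look like the sketch below. It rests on a few stated assumptions: every listed field should be present, only PASS and REWRITE count as valid shipped statuses, and question and answer are strings. `validate_pair` and `duplicate_ids` are hypothetical helpers.

```python
REQUIRED_FIELDS = ("id", "question", "answer", "specialty", "source", "cove_status")
SHIPPED_STATUSES = {"PASS", "REWRITE"}  # assumption: FAIL pairs never ship


def validate_pair(record):
    """Return a list of problems with one pair; an empty list means it looks well-formed."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in record]
    status = record.get("cove_status")
    if status is not None and status not in SHIPPED_STATUSES:
        problems.append(f"unexpected cove_status: {status!r}")
    for field in ("question", "answer"):
        value = record.get(field)
        if value is not None and not str(value).strip():
            problems.append(f"empty {field}")
    return problems


def duplicate_ids(records):
    """IDs occurring more than once. Because ids are described as hash-based
    and deterministic, a repeat likely signals a duplicated pair."""
    seen, dupes = set(), set()
    for record in records:
        rid = record.get("id")
        if rid in seen:
            dupes.add(rid)
        seen.add(rid)
    return dupes
```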

Section 03

Production Pipeline

Every pair passes through a six-stage pipeline from medical literature to verified training data:

01 Harvest: 10+ medical sources
02 Generate: specialty QA models
03 Audit: multi-stage quality gate
04 Verify: CoVe 235B fact-check
05 Rewrite: 235B corrects errors
06 Deliver: platinum vault only

Sources: PubMed, PMC full-text, FDA drug labels, ClinicalTrials.gov, Semantic Scholar, medRxiv, Europe PMC, HuggingFace medical datasets, GitHub medical repositories, Reddit medical communities.
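One way to picture the staged design is as a chain of filters and transforms, with a count of surviving pairs recorded after each stage. The sketch below is illustrative only. The stage callables are hypothetical stand-ins for model calls. `raw_pairs_needed` assumes the 79% raw reject rate applies uniformly from raw generation to platinum delivery, which the figures above do not state explicitly.

```python
import math
from typing import Callable, Dict, List, Tuple

Pair = Dict[str, str]
Stage = Callable[[List[Pair]], List[Pair]]  # hypothetical: any list-in, list-out step


def run_pipeline(pairs: List[Pair], stages: List[Tuple[str, Stage]]):
    """Run stages in order and record how many pairs survive each one."""
    counts = [("input", len(pairs))]
    for name, stage in stages:
        pairs = stage(pairs)
        counts.append((name, len(pairs)))
    return pairs, counts


def raw_pairs_needed(target: int, reject_rate: float) -> int:
    """Raw pairs to generate so that `target` survive an overall reject rate."""
    return math.ceil(target / (1.0 - reject_rate))
```

Under that assumption, `raw_pairs_needed(10_000, 0.79)` returns 47620. In other words, roughly 47.6K raw pairs would be needed to yield 10K platinum pairs.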

Section 04

Verification Methodology

Chain-of-Verification (CoVe) is a verification method introduced by Meta AI researchers. We adapt it here into a three-step process for medical fact-checking. For every claim in every answer, three LLM calls determine accuracy.

Step 1 — Plan

Extract all factual claims from the answer. For each claim, generate a verification question that can independently confirm or refute it. Drug dosages, contraindications, guidelines, and clinical protocols each become separate verification targets.

Step 2 — Execute

Route each verification question to the 235B-parameter verifier model (Qwen3-235B). The verifier has no access to the original answer; it answers from its own knowledge. This independence is critical for detecting hallucinated content.
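The Plan and Execute steps can be sketched as a factored loop in which the verifier only ever receives the verification question, never the answer under test. This is a minimal sketch, not Swarm & Bee's implementation. It assumes three hypothetical callables (`extract_claims`, `make_question`, `verifier`) that would wrap the actual model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    claim: str
    question: str
    verifier_answer: str


def plan_and_execute(
    answer: str,
    extract_claims: Callable[[str], List[str]],  # hypothetical Plan call
    make_question: Callable[[str], str],         # hypothetical Plan call
    verifier: Callable[[str], str],              # hypothetical Execute call
) -> List[Check]:
    """Plan: split the answer into claims and write one question per claim.
    Execute: pass only the question text to the verifier, so it cannot copy,
    or be anchored by, the original answer."""
    checks = []
    for claim in extract_claims(answer):
        question = make_question(claim)
        checks.append(Check(claim, question, verifier(question)))
    return checks
```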

Step 3 — Compare

Compare the 235B verification response with the original claim. Each claim receives one of three outcomes: PASS (claim verified correct), FLAG (claim questionable; sent to the 235B model for a rewrite with verified facts), or FAIL (claim incorrect; permanently rejected).

A single failed claim downgrades the entire pair. FLAG pairs are rewritten by the 235B model using only independently verified facts. The result is platinum-grade training data where every factual claim has been independently verified.
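The pair-level decision described above can be expressed as a small aggregation over per-claim verdicts. This sketch makes two assumptions: a failed claim removes the pair from the platinum set, and a pair with any FLAG claim ships with the REWRITE status from the field schema. `Verdict` and `pair_status` are hypothetical names.

```python
from enum import Enum
from typing import Iterable, Optional


class Verdict(Enum):
    PASS = "PASS"  # claim verified correct
    FLAG = "FLAG"  # claim questionable: pair goes to the 235B rewrite step
    FAIL = "FAIL"  # claim incorrect


def pair_status(verdicts: Iterable[Verdict]) -> Optional[str]:
    """Combine per-claim verdicts into a pair-level outcome.

    Returns None if any claim failed (pair rejected), "REWRITE" if at least
    one claim was flagged, and "PASS" if every claim passed.
    """
    verdicts = list(verdicts)
    if Verdict.FAIL in verdicts:
        return None
    if Verdict.FLAG in verdicts:
        return "REWRITE"
    return "PASS"
```

In this sketch an answer with no extractable claims passes vacuously. A stricter pipeline might send such pairs to manual review instead.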

Section 05

Suggested Training Configurations

Compatible with any HuggingFace-based training framework. Recommended configurations for medical fine-tuning:

Recommended Base Models

Primary: Qwen2.5-7B-Instruct
Alternative: Llama-3.1-8B-Instruct
General: Mistral-7B-Instruct-v0.3

LoRA Configuration (7B Models)

Rank: 64
Alpha: 128
Target modules: q, k, v, o, gate, up, down
Dropout: 0.05
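With the Hugging Face `peft` library, the LoRA settings above could be written roughly as follows. Expanding the short module names to `q_proj`, `k_proj`, and so on is an assumption. It is based on the attention and MLP layer names used by the Qwen2.5, Llama 3.1, and Mistral architectures, so check them against the model you load.

```python
from peft import LoraConfig

# Rank 64, alpha 128, dropout 0.05, as listed above.
# Short names q, k, v, o, gate, up, down are expanded (assumption) to the
# projection module names used by Qwen2.5, Llama 3.1 and Mistral.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```

Alpha is set to twice the rank, so the standard LoRA scaling factor alpha/r is 2.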

Training Configuration

Epochs: 3
Batch size: 4 (effective 16)
Learning rate: 2e-4
Max sequence length: 2048
Scheduler: cosine
Warmup: 3%
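Here is a matching sketch using `TrainingArguments` from `transformers`. It assumes the effective batch of 16 comes from gradient accumulation (4 x 4) on a single GPU and that the GPU supports bf16. `output_dir` is a hypothetical path, and argument names should be checked against the installed `transformers` version.

```python
from transformers import TrainingArguments

# Single GPU: 4 sequences per step x 4 accumulation steps = effective batch 16.
training_args = TrainingArguments(
    output_dir="medqa-lora",  # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    bf16=True,  # assumes a bf16-capable GPU
)
# The 2048-token limit is not a TrainingArguments field; apply it when
# tokenizing, e.g. tokenizer(text, truncation=True, max_length=2048).
```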

Hardware Requirements

Minimum: 1x 24GB VRAM (RTX 4090)
Recommended: 1x 48GB+ VRAM (RTX 6000)
Optimal: 1x RTX PRO 6000 Blackwell (96GB)

Expected Results (~500 Platinum Pairs, 7B Base)

Medical QA accuracy: +4-8% over base
Concept coverage: 50%+ on specialty evals
Disclaimer compliance: 90%+
Training time: 30-90 min (single GPU)
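When planning runs, the number of optimizer steps follows from simple arithmetic on the configuration above. This sketch assumes a single GPU and the 80% training split, and it counts a partial final batch as a full step. `training_schedule` is a hypothetical helper, and individual trainers may count steps slightly differently.

```python
import math


def training_schedule(total_pairs, train_frac=0.8, per_device_batch=4,
                      grad_accum=4, epochs=3, warmup_ratio=0.03):
    """Rough optimizer-step counts for the configuration above (single GPU)."""
    train_pairs = round(total_pairs * train_frac)
    effective_batch = per_device_batch * grad_accum
    steps_per_epoch = math.ceil(train_pairs / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {"train_pairs": train_pairs, "steps_per_epoch": steps_per_epoch,
            "total_steps": total_steps, "warmup_steps": warmup_steps}
```

For the ~500-pair case, `training_schedule(500)` gives 400 training pairs, 25 steps per epoch, 75 steps in total, and 3 warmup steps.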

Section 06

Sample Pair

Cardiology — Acute Coronary Syndrome

Q: A 62-year-old male presents with substernal chest pressure radiating to the left arm for 45 minutes. ECG shows ST-elevation in leads II, III, and aVF. Troponin I is 2.4 ng/mL. BP 138/82, HR 88. What is the immediate management approach, and what are the key pharmacotherapy decisions in the first 24 hours?

A: This presentation is consistent with an inferior STEMI. Immediate management includes activating the cardiac catheterization lab for primary PCI (target door-to-balloon time <90 minutes). Administer aspirin 325mg chewable, ticagrelor 180mg loading dose (or clopidogrel 600mg if ticagrelor is contraindicated), and unfractionated heparin 60 U/kg bolus (max 4000 U). Sublingual nitroglycerin for ongoing pain, with caution given inferior MI (monitor for right ventricular involvement and hypotension). Post-PCI: dual antiplatelet therapy for 12 months, high-intensity statin (atorvastatin 80mg), beta-blocker (metoprolol tartrate 25mg BID titrated to HR goal), and ACE inhibitor (lisinopril 2.5–5mg daily) once hemodynamically stable. This response is for educational purposes. All clinical decisions should be made by qualified healthcare providers based on individual patient assessment.
CoVe Verified · 235B · Platinum

Section 07

License

Upon purchase, you receive a non-exclusive, non-transferable, perpetual license:

Enterprise licenses permit use by up to 10 individuals within a single organization. Full terms at data.swarmandbee.com/terms.html.

Section 08

Medical Disclaimer

This dataset is intended for machine learning research and model training purposes only. The medical content within is not intended as clinical advice, diagnostic guidance, or treatment recommendations.

While every pair has been verified through our CoVe pipeline, no dataset is guaranteed to be 100% error-free. Users are responsible for validating the outputs of any model trained on this data before deploying it in clinical or healthcare settings.

Swarm & Bee is not a healthcare provider and does not provide medical advice. Our products are data assets for AI/ML development.
