See exactly how our AI evaluates your symptoms — transparent, cautious, and never a replacement for your doctor
Tell the AI what you are experiencing in your own words. You do not need medical terminology — describe it however feels natural.
The AI is designed to understand everyday language. 'My stomach has been hurting after I eat' is just as useful as 'postprandial epigastric pain.' The system uses natural language processing to extract clinical concepts from your description: symptom type, location, severity, duration, and associated factors. It handles vague descriptions gracefully — if you say 'I feel weird,' it asks clarifying questions rather than guessing. The AI also recognizes urgency signals: descriptions suggesting chest pain, difficulty breathing, severe sudden headaches, signs of stroke (facial drooping, arm weakness, speech difficulty), or other emergencies trigger an immediate recommendation to call 911 or visit an emergency room before any further analysis.
The AI asks targeted follow-up questions to narrow possibilities — similar to the intake questions a doctor would ask, but explained in plain language.
This differential diagnosis process typically involves 5-10 follow-up questions, adapted based on your answers. For abdominal pain, the AI might ask: Where exactly is the pain (upper, lower, left, right, center)? Is it sharp, dull, cramping, or burning? When did it start? Is it constant or does it come and go? Does eating make it better or worse? Are there associated symptoms like nausea, fever, or changes in bowel habits? Have you recently traveled, changed your diet, or started new medications? Each answer eliminates some possibilities and increases the probability of others. The AI explains why it is asking each question — 'I am asking about location because upper right abdominal pain suggests different conditions than lower left pain' — so the process feels educational rather than interrogative.
Using your symptom profile and a medical knowledge base covering over 10,000 conditions, the AI generates a ranked list of possible conditions, from most likely to least likely.
The AI uses a probabilistic model that weighs multiple factors: symptom-condition associations (how commonly this symptom appears in each condition), symptom combinations (a cough alone has very different implications than a cough plus fever plus night sweats), demographic factors (age, sex, and any medical history you share affect which conditions are more or less likely), and temporal patterns (sudden onset vs. gradual, duration, progression). The model does not simply match keywords to diseases — it understands that the same symptom can mean very different things in different contexts. A persistent headache in a 25-year-old with screen-heavy work is most likely tension-type; the same headache in a 65-year-old with a history of hypertension warrants different consideration. The AI generates a ranked list but is designed to be appropriately cautious: it always includes serious possibilities that need to be ruled out, even if they are statistically unlikely, because the cost of missing a serious condition is far higher than the cost of an extra doctor visit.
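The "always include serious possibilities" rule can be sketched in a few lines. This is a minimal illustration, not Health Copilot's actual implementation: the `Candidate` type, the `serious` flag, the `build_ranked_list` function, the placeholder condition names, and the probabilities are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    probability: float  # hypothetical model output between 0 and 1
    serious: bool       # hypothetical "needs to be ruled out" flag


def build_ranked_list(candidates: list[Candidate], top_n: int = 3) -> list[Candidate]:
    """Return the top_n most likely conditions plus every serious one."""
    ranked = sorted(candidates, key=lambda c: c.probability, reverse=True)
    shortlist = ranked[:top_n]
    # Serious conditions are kept even when they are statistically unlikely.
    shortlist += [c for c in ranked[top_n:] if c.serious]
    return shortlist


# Placeholder names and made-up probabilities, for illustration only.
candidates = [
    Candidate("common condition A", 0.70, serious=False),
    Candidate("common condition B", 0.15, serious=False),
    Candidate("common condition C", 0.08, serious=False),
    Candidate("common condition D", 0.05, serious=False),
    Candidate("rare serious condition", 0.02, serious=True),
]

ranked_names = [c.name for c in build_ranked_list(candidates)]
print(ranked_names)
```

In this sketch the rare serious condition stays on the list while the more likely but harmless condition D drops off, which is the trade-off the paragraph above describes.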
Each possible condition is categorized by urgency: emergency (go to ER now), urgent (see a doctor within 24-48 hours), routine (schedule an appointment), or self-manageable (with guidance).
This is the most critical safety step in the process. The AI errs heavily on the side of caution: it is designed to over-refer rather than under-refer. The urgency levels are:
Emergency (red): symptoms could indicate a life-threatening condition and the user should seek immediate medical attention.
Urgent (orange): symptoms suggest a condition that needs professional evaluation within 24-48 hours.
Routine (yellow): the symptoms are likely non-urgent but should be evaluated by a healthcare provider at an upcoming appointment.
Self-manageable (green): the symptoms are consistent with a common, self-limiting condition that can be monitored at home with specific watch-for criteria.
Even at the green level, the AI provides specific escalation triggers: symptoms that, if they develop, should prompt immediate medical attention. The AI never tells you 'you are fine'. It tells you what the evidence suggests and what level of medical attention is appropriate.
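As a rough sketch, the four levels map naturally onto an ordered enumeration. The names `Urgency`, `ADVICE`, and `overall_urgency` are hypothetical, and the rule that the most urgent candidate decides the overall recommendation is an assumption made for this example; the text above only says that each condition is categorized.

```python
from __future__ import annotations

from enum import IntEnum


class Urgency(IntEnum):
    SELF_MANAGEABLE = 0  # green
    ROUTINE = 1          # yellow
    URGENT = 2           # orange
    EMERGENCY = 3        # red


ADVICE = {
    Urgency.EMERGENCY: "Seek immediate medical attention.",
    Urgency.URGENT: "See a doctor within 24-48 hours.",
    Urgency.ROUTINE: "Schedule an appointment with a healthcare provider.",
    Urgency.SELF_MANAGEABLE: "Monitor at home and watch for escalation triggers.",
}


def overall_urgency(levels: list[Urgency]) -> Urgency:
    """Assumed rule: over-refer, so the most urgent candidate wins.

    Raises ValueError on an empty list instead of guessing a level.
    """
    return max(levels)


level = overall_urgency([Urgency.SELF_MANAGEABLE, Urgency.URGENT, Urgency.ROUTINE])
print(level.name, "->", ADVICE[level])
```

Using an ordered enum keeps the "err on the side of caution" rule to a single `max` call, and an empty candidate list fails loudly rather than defaulting to reassurance.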
For each possible condition on the list, the AI provides a clear, jargon-free explanation: what it is, why your symptoms match, how it is typically diagnosed, and how it is treated.
Medical information is only useful if you can understand it. For each condition, the AI provides: a one-sentence summary in plain English (e.g., 'acid reflux is when stomach acid flows back into your esophagus, causing burning and discomfort after eating'), why your specific symptoms fit this condition (e.g., 'your symptoms match because the pain occurs after eating, is located in the upper abdomen, and is described as burning'), how a doctor would confirm or rule out this condition (e.g., 'typically diagnosed based on symptoms and response to a trial of antacids; if persistent, an endoscopy may be recommended'), and standard treatment approaches (e.g., 'lifestyle modifications like avoiding trigger foods, eating smaller meals, and not lying down after eating; over-the-counter antacids for mild cases; prescription proton pump inhibitors for moderate to severe cases'). The explanations deliberately avoid causing unnecessary alarm while being honest about what each condition involves.
The AI generates a summary you can bring to your doctor visit — organized symptom history, questions to ask, and relevant details that help your provider make an accurate diagnosis faster.
This step transforms your AI conversation into a practical tool for your actual healthcare. The summary includes: a structured symptom timeline (when symptoms started, how they have changed, what makes them better or worse), relevant medical history you shared during the conversation, a list of the conditions the AI flagged as possible (framed as 'conditions to consider' rather than 'diagnoses'), specific questions to ask your doctor based on the AI's analysis (e.g., 'Ask about whether an H. pylori test is appropriate given the symptom duration'), and any medications, supplements, or recent life changes that might be relevant. Doctors consistently report that patients who arrive with organized symptom histories get more productive appointments — instead of spending 10 minutes reconstructing the timeline, the doctor can jump straight to evaluation and diagnosis. The summary explicitly states that it was generated by an AI tool and is provided as a patient-prepared symptom log, not a diagnosis.
For self-manageable conditions, the AI provides specific monitoring instructions: what to watch for, when to escalate, and what you can safely do at home.
If the AI's assessment suggests your symptoms are likely self-limiting (e.g., a common cold, mild food intolerance, muscle strain), it provides specific home management guidance: what over-the-counter options may help (e.g., 'acetaminophen or ibuprofen for pain, as directed on the label'), lifestyle modifications (e.g., 'avoid spicy and acidic foods for 2 weeks to see if symptoms improve'), a monitoring timeline (e.g., 'these symptoms typically resolve within 5-7 days'), and clear escalation criteria (e.g., 'see a doctor if the pain becomes severe, you develop a fever above 101 F, you notice blood in your stool, or symptoms have not improved after 2 weeks'). The escalation criteria are the most important part — they define the boundary between 'watch and wait' and 'seek professional help now.' The AI is conservative in these thresholds, preferring to send you to a doctor slightly early rather than slightly late.
Accuracy was measured on a validation set of 1,000 symptom presentations with confirmed diagnoses across 15 medical specialties. Each case was presented to the AI in the patient's own language, not clinical terminology. 'Top-3 accuracy' means the confirmed diagnosis appeared within the AI's top 3 ranked possibilities. Emergency detection sensitivity was measured separately on 200 cases with confirmed emergency conditions: the AI correctly identified 198 of 200 as requiring immediate medical attention (99% sensitivity). The 2 missed cases involved atypical presentations of cardiac events with non-classic symptom patterns.
Health Copilot uses a structured approach to symptom analysis that mirrors the clinical reasoning process doctors use, adapted for a conversational format. Understanding this process helps you provide better information and interpret the results appropriately.
Natural language understanding: The first layer of the system translates your everyday language into clinical concepts. When you say 'my stomach hurts after eating,' the AI maps this to several clinical parameters: symptom (pain), location (abdominal — with follow-up needed to determine which quadrant), timing (postprandial), and trigger (food intake). This mapping happens automatically, so you never need to learn medical terminology. The AI handles colloquialisms, regional expressions, and imprecise descriptions — 'I feel like my chest is tight' is understood as a description that could indicate respiratory, cardiac, or musculoskeletal issues, triggering appropriate follow-up questions for each possibility.
Differential diagnosis tree: Based on your initial description, the AI constructs a tree of possible conditions. Each follow-up question is designed to prune branches of this tree. If you report that your abdominal pain is in the upper right quadrant, conditions affecting the lower left quadrant become less likely and are moved down the list. If you report the pain worsens after fatty meals specifically, gallbladder-related conditions move up. The tree typically starts with 15-30 possible conditions for a common symptom and narrows to 3-7 after the questioning phase.
Bayesian probability weighting: The AI does not just match symptoms to diseases — it uses probabilistic reasoning that accounts for base rates. Chest pain in a 28-year-old non-smoking woman has very different probabilistic implications than chest pain in a 62-year-old man with hypertension and diabetes. The AI factors in demographic information and any medical history you share to adjust its probability estimates. This is why it asks about age, sex, and relevant medical history — not to be intrusive, but because these factors genuinely change what conditions are more or less likely.
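The effect of base rates can be illustrated with Bayes' rule for a single condition and a single symptom. This is a simplified sketch, not the model Health Copilot uses: the `posterior` function is hypothetical, and every number below is made up for illustration and is not a clinical estimate.

```python
def posterior(prior: float,
              p_symptom_given_condition: float,
              p_symptom_given_no_condition: float) -> float:
    """Bayes' rule: P(condition | symptom) for one condition and one symptom."""
    numerator = p_symptom_given_condition * prior
    evidence = numerator + p_symptom_given_no_condition * (1.0 - prior)
    return numerator / evidence


# Made-up numbers, for illustration only.
LIKELIHOOD = 0.80   # P(symptom | condition)
FALSE_ALARM = 0.10  # P(symptom | no condition)

# Same symptom, two different base rates (for example, two demographic groups).
low_base_rate = posterior(0.001, LIKELIHOOD, FALSE_ALARM)
high_base_rate = posterior(0.05, LIKELIHOOD, FALSE_ALARM)

print(f"{low_base_rate:.3f} vs {high_base_rate:.3f}")  # prints "0.008 vs 0.296"
```

The symptom and its likelihoods are identical in both calls; only the prior changes, and the resulting probability differs by more than an order of magnitude. That is why age, sex, and history shift the ranking.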
Safety-first design: The system is explicitly designed to prioritize sensitivity (not missing serious conditions) over specificity (correctly excluding non-serious conditions). This means the AI will sometimes include a serious condition on its possibility list even when it is statistically unlikely, if your symptoms have any overlap with that condition's presentation. Some users find this overly cautious. We consider it the right trade-off — the cost of false reassurance (missing something serious) is much higher than the cost of an unnecessary doctor visit.
Being transparent about limitations is especially important in health. Here is what the AI fundamentally cannot do, no matter how sophisticated it becomes, and why these limitations mean you still need a human healthcare provider.
It cannot examine you physically. A doctor's physical examination provides information that no description can fully convey. Pressing on your abdomen to check for tenderness, guarding, and rebound pain. Listening to your heart and lungs with a stethoscope. Looking at your throat, ears, or skin. Checking your range of motion. Measuring your blood pressure, heart rate, and oxygen saturation. These physical findings often differentiate between conditions that present with identical self-reported symptoms. An AI working from text descriptions alone is fundamentally limited in a way that will not be solved by better algorithms.
It cannot order or interpret tests. Many diagnoses require laboratory work (blood tests, urine tests, cultures), imaging (X-rays, CT scans, MRIs, ultrasounds), or specialized tests (ECGs, endoscopies, biopsies). The AI can explain which tests might be relevant to your situation and why — helping you understand what your doctor orders — but it cannot order those tests, receive the results, or provide a definitive interpretation. Lab Results Copilot can help you understand test results once you have them, but it supplements rather than replaces your doctor's interpretation.
It cannot prescribe treatment. The AI can explain general treatment approaches for various conditions (how acid reflux is typically treated, what medications are commonly used for hypertension, what physical therapy might look like for back pain), but it cannot prescribe specific medications, determine dosages for your situation, or account for drug interactions with your current medications. Treatment decisions require a licensed healthcare provider who can weigh benefits, risks, and your complete medical history.
It cannot provide emergency care. In an emergency, every second counts. The AI is designed to recognize emergency symptoms and immediately direct you to call 911 or visit an ER. But it cannot provide the interventions an emergency requires — CPR guidance, wound management, or stabilization. If you believe you are experiencing a medical emergency, call 911 first. The AI is a preparation and education tool, not an emergency response system.
It cannot replace the doctor-patient relationship. Medicine involves trust, continuity of care, knowledge of your personal and family history over time, and the human judgment that comes from years of clinical experience. The AI provides a point-in-time analysis based on what you share in a single conversation. Your doctor sees the full picture — and that full picture matters for your health.
Health is the domain where AI safety matters most. A false negative — telling someone their symptoms are not concerning when they actually indicate a serious condition — could be dangerous. Here is how every layer of the system is designed to prevent that scenario.
Emergency detection is always-on. Before any analysis begins, every user message is screened for emergency indicators. Descriptions of chest pain with shortness of breath, sudden severe headache, signs of stroke (facial drooping, arm weakness, speech difficulty), severe allergic reactions, thoughts of self-harm, or significant traumatic injuries trigger an immediate emergency response that interrupts any ongoing analysis. This screening runs continuously throughout the conversation, not just on the first message — because a user might mention a critical symptom casually mid-conversation.
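A minimal sketch of "screen every message before analysis" might look like the following. The pattern list is a deliberately tiny, hypothetical stand-in (a real screen would need to be far broader than keyword matching), and `is_emergency`, `handle_message`, and the `analyze` callback are names invented for this example.

```python
import re
from typing import Callable

# Hypothetical and intentionally incomplete pattern list.
EMERGENCY_PATTERNS = [
    re.compile(r"\bchest pain\b", re.IGNORECASE),
    re.compile(r"\b(face|facial) droop", re.IGNORECASE),
    re.compile(r"\b(can't|cannot|can not) breathe\b", re.IGNORECASE),
]

EMERGENCY_MESSAGE = "Call 911 or go to the nearest emergency room now."


def is_emergency(message: str) -> bool:
    """True if any emergency pattern appears anywhere in the message."""
    return any(p.search(message) for p in EMERGENCY_PATTERNS)


def handle_message(message: str, analyze: Callable[[str], str]) -> str:
    """Screen first; only non-emergency messages reach the normal analysis."""
    if is_emergency(message):
        return EMERGENCY_MESSAGE
    return analyze(message)


conversation = [
    "My stomach hurts after I eat.",
    "It started about a week ago.",
    "Also I had some chest pain this morning, probably nothing.",
]
replies = [handle_message(m, analyze=lambda text: "follow-up question")
           for m in conversation]
print(replies)
```

Because the check runs inside `handle_message` rather than once at the start, the casual mention in the third message interrupts the ongoing analysis.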
Escalation triggers are conservative. The thresholds for recommending medical attention are deliberately set lower than a clinician might set them. A doctor seeing a healthy 25-year-old with 3 days of mild headaches might say 'take some Tylenol and call me if it gets worse.' The AI cannot perform the physical examination that informs that clinical judgment, so it takes the more cautious approach of recommending a doctor visit if symptoms persist beyond a specified window. We accept a higher false-positive rate (recommending doctor visits that turn out to be unnecessary) in exchange for a lower false-negative rate (missing something that needed attention).
No definitive diagnoses, ever. The AI never says 'you have condition X.' It says 'your symptoms are most consistent with condition X, but this should be confirmed by a healthcare provider.' This is not just a legal disclaimer — it reflects a genuine limitation. Without physical examination and diagnostic testing, a definitive diagnosis based on self-reported symptoms alone would be irresponsible. The AI presents possibilities ranked by likelihood, not conclusions.
Medication safety boundaries. The AI will provide general information about common treatments but will not recommend specific dosages, will not suggest stopping prescribed medications, and will flag potential interactions when users mention multiple medications. For detailed medication questions, Medication Copilot provides more specialized analysis, always with the recommendation to verify with your pharmacist or prescribing physician.
Mental health sensitivity. If the conversation reveals symptoms of depression, anxiety, suicidal ideation, or other mental health concerns — whether the user asks about them directly or they emerge during symptom discussion — the AI responds with appropriate resources including crisis helplines (988 Suicide and Crisis Lifeline) and encouragement to seek professional mental health support. Mental Health Copilot provides more specialized support for ongoing mental health concerns.
We report that our AI includes the correct condition in its top-3 list 89% of the time and in its top-5 list 94% of the time. Here is what those numbers mean in practice, how they were measured, and what they do not tell you.
What 89% top-3 accuracy means: In our validation study of 1,000 symptom presentations with confirmed diagnoses, the AI's top-3 ranked conditions included the actual diagnosis 89% of the time. This means that if you read the AI's top 3 possibilities, there is roughly a 9-in-10 chance the correct condition is listed. For the remaining 11%, the correct diagnosis was often in positions 4-7 on the list, and in a small number of cases (approximately 3%), the correct diagnosis was not listed at all — typically because the condition was rare or presented atypically.
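For readers who want the metric spelled out, top-k accuracy can be computed as below. The `top_k_accuracy` function and the four toy cases with placeholder condition names are hypothetical. They only illustrate the definition and are unrelated to the real validation set.

```python
from __future__ import annotations


def top_k_accuracy(ranked_lists: list[list[str]], confirmed: list[str], k: int) -> float:
    """Share of cases whose confirmed diagnosis is among the first k ranked conditions."""
    if not confirmed or len(ranked_lists) != len(confirmed):
        raise ValueError("need one ranked list per confirmed diagnosis")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_lists, confirmed))
    return hits / len(confirmed)


# Toy data: the confirmed diagnosis is "A" in every case.
ranked_lists = [
    ["A", "B", "C", "D", "E"],  # rank 1: counts for top-3 and top-5
    ["B", "C", "A", "D", "E"],  # rank 3: counts for top-3 and top-5
    ["B", "C", "D", "A", "E"],  # rank 4: counts for top-5 only
    ["B", "C", "D", "E", "F"],  # not listed: counts for neither
]
confirmed = ["A", "A", "A", "A"]

top3 = top_k_accuracy(ranked_lists, confirmed, k=3)
top5 = top_k_accuracy(ranked_lists, confirmed, k=5)
print(top3, top5)  # prints "0.5 0.75"
```

Note that a case counts as a hit wherever the diagnosis sits within the first k positions, so top-3 accuracy says nothing about how often the correct condition is ranked first.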
What the accuracy does NOT mean: It does not mean the AI correctly identifies the single correct condition 89% of the time. Symptom-based assessment is inherently probabilistic — the same set of symptoms can be caused by multiple different conditions, and without testing, narrowing to a single diagnosis is often impossible even for an experienced physician. The AI's job is to help you understand the range of possibilities so you and your doctor can investigate efficiently.
Where accuracy is highest: Common conditions with distinctive symptom patterns (urinary tract infections, acid reflux, seasonal allergies, strep throat, pink eye) have top-3 accuracy above 95%. The AI performs well when symptoms are specific and the symptom-disease association is strong.
Where accuracy is lowest: Conditions with vague or overlapping symptoms (chronic fatigue, non-specific pain, autoimmune conditions in early stages) have lower accuracy because multiple conditions produce similar presentations. The AI handles this honestly by presenting a broader list of possibilities and recommending professional evaluation to differentiate between them, rather than forcing a false sense of certainty.
The 99% emergency detection rate: This is the number we are most focused on getting as close to 100% as possible. Of 200 emergency cases in our validation set (confirmed heart attacks, strokes, appendicitis, severe allergic reactions, meningitis), the AI correctly flagged 198 as requiring immediate emergency attention. The 2 missed cases involved atypical cardiac presentations — and this is an active area of improvement. We continuously expand the emergency detection system to cover more atypical presentations.
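The 99% figure is simple arithmetic on the counts above, and a sample of 200 cases leaves some statistical uncertainty around it. The sketch below recomputes the sensitivity and adds a Wilson score interval, one common way to express that uncertainty. The interval is not something this page reports; it is an illustration added here, and the function names are hypothetical.

```python
from __future__ import annotations

import math


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real emergencies that were flagged as emergencies."""
    return true_positives / (true_positives + false_negatives)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the text: 198 flagged, 2 missed, 200 emergency cases.
rate = sensitivity(true_positives=198, false_negatives=2)
low, high = wilson_interval(successes=198, n=200)
print(f"sensitivity={rate:.2%}, approx. 95% interval=({low:.3f}, {high:.3f})")
```

The width of the interval is a reminder that a rate measured on 200 cases is an estimate, which is consistent with treating emergency detection as an active area of improvement.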
Health information is among the most sensitive data that exists. Here is exactly what happens with the symptoms and health details you share with the AI, written plainly without corporate hedging.
During your conversation: The symptoms, medical history, and health details you share are processed in server memory to generate your assessment. This data exists in RAM only during your active session. It is not written to a database, not stored on disk, and not indexed or searchable.
After your session ends: All health data from your conversation is purged from memory within 60 seconds of session end (or 60 minutes of inactivity). There is no archive of your symptoms, no record of what conditions were discussed, and no way for anyone — including Copilotly employees — to retrieve your health conversation after the session ends.
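One way to picture this retention rule is an in-memory store with two expiry timers. This is a sketch under stated assumptions, not Copilotly's actual code: the `SessionStore` class and its methods are hypothetical, and it reads the parenthetical above as "an idle session is also purged after 60 minutes without activity".

```python
import time

PURGE_AFTER_END_S = 60        # purge within 60 seconds of session end
PURGE_AFTER_IDLE_S = 60 * 60  # assumed reading: purge after 60 idle minutes


class SessionStore:
    """Keeps conversation data in process memory only; nothing is written to disk."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the expiry logic can be tested
        self._sessions = {}   # session_id -> {"data", "last_seen", "ended_at"}

    def append(self, session_id, message):
        now = self._clock()
        session = self._sessions.setdefault(
            session_id, {"data": [], "last_seen": now, "ended_at": None})
        session["data"].append(message)
        session["last_seen"] = now

    def end(self, session_id):
        if session_id in self._sessions:
            self._sessions[session_id]["ended_at"] = self._clock()

    def purge_expired(self):
        """Delete ended or idle sessions; return the ids that were purged."""
        now = self._clock()
        expired = [
            sid for sid, s in self._sessions.items()
            if (s["ended_at"] is not None and now - s["ended_at"] >= PURGE_AFTER_END_S)
            or now - s["last_seen"] >= PURGE_AFTER_IDLE_S
        ]
        for sid in expired:
            del self._sessions[sid]
        return expired

    def __contains__(self, session_id):
        return session_id in self._sessions
```

In a running service, `purge_expired` would need to be called on a short timer so that data is gone within the stated window; the injectable `clock` exists only to make that behavior easy to check.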
We do not use your health data for AI training. Your symptom descriptions, medical history, and conversation content are never used to train or fine-tune AI models. The AI's medical knowledge comes from published medical literature and clinical datasets — not from user conversations.
No data sharing with any third party. Your health information is not shared with insurance companies, employers, pharmaceutical companies, data brokers, advertisers, or any other entity. Period. We have no data-sharing agreements that involve user health information. Our revenue comes from subscriptions, not data monetization.
HIPAA considerations: Copilotly is an educational health information tool, not a covered entity under HIPAA in the way that a hospital or insurance company is. However, we voluntarily implement security practices that meet or exceed HIPAA security standards, including encryption in transit (TLS 1.3), encryption at rest for any temporary data, access controls, and audit logging. For enterprise healthcare organizations evaluating Copilotly, we can execute a Business Associate Agreement (BAA) and provide our security documentation.
Your control: You decide what to share. The AI works with whatever information you provide — you are never required to share your real name, age, medical history, or any identifying information. The assessment may be more accurate with more context, but the choice of what to disclose is entirely yours.
AI symptom checking is a powerful tool when used appropriately and a poor substitute when used in the wrong situations. Here is our honest guidance on when Health Copilot adds value and when you should skip it and go straight to a medical professional.
Good use cases (where AI adds clear value):
Poor use cases (go straight to a professional):