Data Protection Impact Assessment (DPIA) — Sehtak Platform
Prepared under: Federal Decree-Law No. 45 of 2021 on the Protection of Personal Data (PDPL), Article 21
Controller: Sehtak Health Technologies LLC (DHCC registration pending)
Assessor: [DPO, to be appointed]
Version: 1.0 (Draft, pending DPO sign-off)
Date: [Effective on platform launch]
Review cadence: Annual, or whenever a material processing change occurs
Why a DPIA Is Required
PDPL Article 21 requires a DPIA when processing is likely to result in a high risk to individuals, particularly when it involves:
- Sensitive personal data at scale (Art. 15) — Sehtak handles clinical records on a national scale.
- Automated decision-making or profiling — Sehtak uses an LLM to draft clinical notes.
- Cross-border transfers — not performed for PHI, but assessed to confirm that.
- Systematic monitoring — applicable to telehealth recordings and audit logs.
All four criteria listed above apply to the Sehtak platform. This DPIA assesses the five highest-risk flows.
1. Description of the Processing
Sehtak is a UAE-first healthcare operating system connecting clinics, pharmacies, diagnostic centres, and patients through eight application surfaces that share a common API and database. Core flows:
1. Patient registers (direct, or via UAE Pass SOP3 identity verification).
2. Patient books an appointment (via patient app, facility microsite, or WhatsApp).
3. Clinic staff check the patient in, run an insurance eligibility check, and open an encounter.
4. Doctor records vitals, diagnosis (ICD-10-CM), prescription (DDC + RxNorm), and orders.
5. Encounter is completed. This event fans out to:
   - HIE submission (NABIDH for Dubai, Malaffi for Abu Dhabi, Riayati for Northern Emirates) via asynchronous BullMQ queues.
   - Insurance eClaim submission (DHPO eClaimLink for Dubai, Shafafiya for Abu Dhabi) via a separate queue.
   - Patient notification (WhatsApp first).
6. Pharmacy dispenses against the e-prescription.
7. Invoice generation, payment via Tap Payments, payment link delivered via WhatsApp.
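The fan-out on encounter completion (step 5) amounts to a routing decision keyed on the facility's emirate. The following is a minimal sketch of that routing only, not the platform's actual queue code. The type names, `planFanOut`, and the constant names are hypothetical, and the source does not state an eClaim destination for the Northern Emirates, so the sketch leaves it as `null`.

```typescript
// Hypothetical names throughout; only the emirate-to-system pairs come from the text above.
type Emirate = "DUBAI" | "ABU_DHABI" | "NORTHERN_EMIRATES";
type HieTarget = "NABIDH" | "MALAFFI" | "RIAYATI";
type ClaimTarget = "DHPO_ECLAIMLINK" | "SHAFAFIYA";

interface FanOutPlan {
  hie: HieTarget;
  eclaim: ClaimTarget | null; // null where the document names no eClaim endpoint
  notifyChannel: "WHATSAPP";
}

const HIE_BY_EMIRATE: Record<Emirate, HieTarget> = {
  DUBAI: "NABIDH",
  ABU_DHABI: "MALAFFI",
  NORTHERN_EMIRATES: "RIAYATI",
};

const CLAIM_BY_EMIRATE: Record<Emirate, ClaimTarget | null> = {
  DUBAI: "DHPO_ECLAIMLINK",
  ABU_DHABI: "SHAFAFIYA",
  NORTHERN_EMIRATES: null, // assumption: not specified in this DPIA
};

/** Decides which downstream jobs an "encounter completed" event should enqueue. */
function planFanOut(emirate: Emirate): FanOutPlan {
  return {
    hie: HIE_BY_EMIRATE[emirate],
    eclaim: CLAIM_BY_EMIRATE[emirate],
    notifyChannel: "WHATSAPP",
  };
}
```

In a real deployment each field of the plan would become a separate queued job, so that a failure in one destination does not block the others.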
Data subjects are patients, their family members linked under a guardian, clinical and administrative staff, and facility owners.
Data categories include account identifiers, Emirates ID, clinical records (diagnoses, medications, vitals, lab results, imaging reports, allergies, immunisations), insurance policy and claim data, and payment metadata. See docs/compliance/privacy-policy.md §3.
All PHI processing occurs on UAE-resident infrastructure in AWS me-central-1. See CLAUDE.md §17.
2. Necessity and Proportionality
2.1 Is each processing purpose necessary?
| Processing purpose | Necessary? | Why it cannot be done with less data |
|---|---|---|
| Storing clinical records | Yes | Required by Federal Law No. 2 of 2019 (25-year retention). |
| Sharing to NABIDH | Yes — legally mandated in Dubai | DHA mandates submission for licensed Dubai facilities; opt-out is available. |
| Sharing to Malaffi/Riayati | Yes — when consented | Improves continuity of care across facilities; opt-in. |
| Submitting eClaims | Yes | Insurance reimbursement requires claim data; without it the clinic cannot be paid. |
| Insurance eligibility checks | Yes | Avoids patient out-of-pocket surprises; reduces denied claims. |
| AI ambient scribing | Optional per encounter | Reduces documentation burden; patient can always decline. |
| UAE Pass SOP3 verification | Yes for clinical access | Prevents identity fraud; SOP1 accounts are restricted to booking only. |
| Telehealth recording | Optional per session | Supports medico-legal defence; patient must explicitly consent. |
2.2 Data minimisation
- Sehtak stores only codes and structured fields that are needed for the clinical, regulatory, or billing purpose. Free-text narrative is retained when the doctor records it; this is within the scope of provision of healthcare services.
- AI scribe transcripts are purged at 30 days, the minimum that allows quality review.
- HIE payloads and eClaim XML are purged from the in-transit cache at 24 hours.
- IP addresses on audit logs are truncated when no longer needed for security investigation.
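The purge windows above (30 days for scribe transcripts, 24 hours for HIE payloads and eClaim XML) can be expressed as a small lookup that a scheduled job consults. This is a sketch under assumptions: `PurgeableKind`, `RETENTION_MS`, and `isDueForPurge` are illustrative names, and the choice to count from a creation timestamp is an assumption rather than something the source specifies.

```typescript
// Illustrative names; only the durations are taken from the list above.
type PurgeableKind = "AI_SCRIBE_TRANSCRIPT" | "HIE_PAYLOAD" | "ECLAIM_XML";

const HOUR_MS = 60 * 60 * 1000;

const RETENTION_MS: Record<PurgeableKind, number> = {
  AI_SCRIBE_TRANSCRIPT: 30 * 24 * HOUR_MS, // 30 days
  HIE_PAYLOAD: 24 * HOUR_MS, // 24 hours
  ECLAIM_XML: 24 * HOUR_MS, // 24 hours
};

/** True once the record has existed for at least its retention window. */
function isDueForPurge(kind: PurgeableKind, createdAt: Date, now: Date): boolean {
  return now.getTime() - createdAt.getTime() >= RETENTION_MS[kind];
}
```

Passing `now` explicitly rather than reading the clock inside the function keeps the rule easy to test at the window boundaries.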
2.3 Proportionality
Processing scale is proportionate to the public-interest purpose (providing healthcare). Alternatives considered:
- Federated storage without a central database: rejected — breaks eClaims submission and HIE interoperability.
- Patient-held health records only: rejected — does not meet the clinical or regulatory workflow needs of facilities.
- Third-party cloud LLM for scribing: rejected — see §4.1 (breaches Federal Law No. 2 of 2019).
3. High-Risk Flows — Individual Assessments
Likelihood (L): 1 Rare / 2 Unlikely / 3 Possible / 4 Likely / 5 Almost certain
Severity (S): 1 Negligible / 2 Minor / 3 Moderate / 4 Major / 5 Catastrophic
Risk = L × S (max 25). The Raw column is the score before mitigation; the Residual column is the score after the controls in CLAUDE.md are applied.
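Because the scoring rule is a plain product, the tables in this section can be checked mechanically. The sketch below shows the calculation and one sanity check (a residual score should never exceed the raw score). The names `RiskRow`, `rawRisk`, and `findInconsistentRows` are hypothetical, and the two sample rows are copied from §3.1.

```typescript
// Hypothetical helper names; the 1-5 scales mirror the Likelihood and Severity bands above.
type Score = 1 | 2 | 3 | 4 | 5;

interface RiskRow {
  risk: string;
  likelihood: Score;
  severity: Score;
  residual: number; // score after mitigations, as recorded in the table
}

/** Raw risk is Likelihood x Severity, so it ranges from 1 to 25. */
function rawRisk(likelihood: Score, severity: Score): number {
  return likelihood * severity;
}

/** Returns rows whose recorded residual exceeds the raw score, which would indicate a data-entry error. */
function findInconsistentRows(rows: RiskRow[]): RiskRow[] {
  return rows.filter((r) => r.residual > rawRisk(r.likelihood, r.severity));
}

const sample: RiskRow[] = [
  { risk: "PHI leaves UAE during inference", likelihood: 4, severity: 5, residual: 2 },
  { risk: "Audio intercepted in transit", likelihood: 2, severity: 4, residual: 2 },
];
```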
3.1 AI ambient scribe (clinical audio)
Description. A microphone captures the doctor-patient conversation; expo-audio (mobile) or the Web Audio API (web) streams chunks to the API. The API posts them to the self-hosted Whisper large-v3 service on EC2 g5.xlarge (me-central-1) which returns a rolling transcript. When the doctor requests a SOAP note, the transcript is de-identified and sent to the self-hosted Qwen 2.5 72B Instruct AWQ LLM on EC2 g5.12xlarge (me-central-1). The SOAP draft is returned, the doctor reviews and edits, and the encounter is saved.
| Risk | L | S | Raw | Mitigation | Residual |
|---|---|---|---|---|---|
| PHI leaves UAE during inference | 4 | 5 | 20 | Both Whisper and vLLM run on EC2 inside a private VPC in me-central-1. No cloud inference endpoint is configured (no ANTHROPIC_API_KEY, no OPENAI_API_KEY, no AWS_BEDROCK_*). See CLAUDE.md §12 and rule #19. | 2 |
| Transcript contains direct identifiers | 5 | 4 | 20 | deidentifyTranscript() in packages/ai/src/deidentify.ts strips Emirates ID, name variants (Arabic + English), MRN, phone, DOB before every LLM call. See CLAUDE.md §12. | 4 |
| Model hallucinates clinical content | 4 | 4 | 16 | Every SOAP note requires doctor review and acceptance before it enters the record; rejection is logged. Temperature fixed at 0.1. See CLAUDE.md §12 encounter flow. | 8 |
| Audio intercepted in transit | 2 | 4 | 8 | TLS 1.3 between client and API; VPC-internal HTTPS between API and Whisper/vLLM. | 2 |
| Transcript retained too long | 3 | 3 | 9 | Encounter.aiScribeTranscript is AES-256-GCM encrypted and purged after 30 days via scheduled job. See CLAUDE.md §17. | 3 |
| Consent is not genuine | 3 | 3 | 9 | ConsentRecord with type = TREATMENT or a dedicated AI scribe consent must exist before the scribe can start. Logged in audit trail. | 3 |
| Staff misuse — listening to transcripts for non-clinical reasons | 3 | 4 | 12 | RBAC (staff only see their facility); every access creates an auditLogs row; anomaly detection on bulk access. | 4 |
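Residual risk: acceptable. Every SOAP note is human-reviewed before acceptance. The hallucination residual (8) is the highest in this flow and is accepted because no open-source model has strong Arabic medical benchmark performance as of 2026, so a lower-risk alternative model is not available.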
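To make the de-identification control in the table above more concrete, here is a minimal pattern-based sketch. It is not the contents of packages/ai/src/deidentify.ts, which this DPIA does not reproduce. The function name `redactIdentifiers`, the placeholder tokens, and every regular expression are assumptions for illustration; MRN is omitted because its format is not given in the source.

```typescript
// Illustrative patterns only; real transcripts need broader coverage (spoken digits, Arabic numerals, MRN).
const EMIRATES_ID = /\b784-\d{4}-\d{7}-\d\b/g; // 784-YYYY-NNNNNNN-C
const UAE_MOBILE = /\+971[\s-]?5\d[\s-]?\d{3}[\s-]?\d{4}/g; // assumed +971 5x xxx xxxx layout
const NUMERIC_DATE = /\b\d{1,2}[\/-]\d{1,2}[\/-]\d{4}\b/g; // assumed DD/MM/YYYY style dates

/**
 * Replaces direct identifiers with placeholder tokens.
 * knownNames holds the patient's name variants (Arabic and English) taken from the record.
 */
function redactIdentifiers(text: string, knownNames: string[]): string {
  let out = text
    .replace(EMIRATES_ID, "[EMIRATES_ID]")
    .replace(UAE_MOBILE, "[PHONE]")
    .replace(NUMERIC_DATE, "[DATE]");
  for (const name of knownNames) {
    if (name.trim() === "") continue; // an empty needle would split every character
    // split/join does a literal replacement, so names need no regex escaping
    out = out.split(name).join("[NAME]");
  }
  return out;
}
```

Literal name matching is used instead of word-boundary regexes because JavaScript's `\b` does not treat Arabic letters as word characters.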
3.2 NABIDH submissions (HL7 v2.5.1 with PHI)
Description. On encounter completion, a BullMQ job builds an HL7 v2.5.1 message (ADT^A01/A08, RDE^O11, ORU^R01, ORM^O01, VXU^V04, or MDM^T02) and POSTs it over HTTPS to the DHA NABIDH endpoint, authenticated by the Sheryan Facility ID in MSH-4. See CLAUDE.md §7.
| Risk | L | S | Raw | Mitigation | Residual |
|---|---|---|---|---|---|
| Message sent without patient consent | 3 | 5 | 15 | isNabidhConsentValid() in packages/hie/src/nabidh/consent-gate.ts checks for opt-out record before every send. See CLAUDE.md rule #7. | 3 |
| Wrong patient in message (mismatched Emirates ID) | 2 | 5 | 10 | Emirates ID is the primary index; patient_identifiers table has a unique index on (type, value). Builders validate Emirates ID against the 784-YYYY-NNNNNNN-C format before send. | 2 |
| Sheryan Facility ID swapped between facilities | 1 | 5 | 5 | MSH-4 is set from healthcare_facilities.sheryan_facility_id within the facility-scoped session. Cannot cross facilities due to RBAC. | 2 |
| HIE outage causes clinical disruption | 3 | 3 | 9 | HIE is always async via BullMQ. A failing HIE never prevents encounter completion. See CLAUDE.md rule #8. | 3 |
| Raw HL7 payload leaks via logs | 3 | 4 | 12 | Pino redaction configured for patientId, emiratesId, email, phone, memberID, insuranceNumber. Payload purged from sync_jobs.payload after 24 hours. See CLAUDE.md rule #2 and §17. | 3 |
| Credentials stolen | 2 | 5 | 10 | facility_integrations.credentials is AES-256-GCM encrypted. Never logged. Rotated annually. | 2 |
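Two of the controls in this table, the opt-out check and the Emirates ID format check, are pre-send gates. The sketch below shows how such a gate might be combined. It is not the implementation of `isNabidhConsentValid()`; the record shape, `checkNabidhSend`, and the reason codes are hypothetical. Only the 784-YYYY-NNNNNNN-C layout is taken from the text, and the check digit is not verified here.

```typescript
// Hypothetical shapes; the actual consent schema is not shown in this DPIA.
interface OptOutRecord {
  patientId: string;
  optedOutAt: Date;
  revokedAt: Date | null; // null while the opt-out is still in force
}

type GateResult =
  | { ok: true }
  | { ok: false; reason: "INVALID_EMIRATES_ID" | "PATIENT_OPTED_OUT" };

const EMIRATES_ID_FORMAT = /^784-\d{4}-\d{7}-\d$/; // layout only, no check-digit validation

/** True if the patient has an opt-out that is in force at the given time. */
function hasActiveOptOut(records: OptOutRecord[], patientId: string, at: Date): boolean {
  return records.some(
    (r) =>
      r.patientId === patientId &&
      r.optedOutAt.getTime() <= at.getTime() &&
      (r.revokedAt === null || r.revokedAt.getTime() > at.getTime()),
  );
}

/** Runs both gates before a message is built; the first failure wins. */
function checkNabidhSend(
  patient: { id: string; emiratesId: string },
  optOuts: OptOutRecord[],
  at: Date,
): GateResult {
  if (!EMIRATES_ID_FORMAT.test(patient.emiratesId)) {
    return { ok: false, reason: "INVALID_EMIRATES_ID" };
  }
  if (hasActiveOptOut(optOuts, patient.id, at)) {
    return { ok: false, reason: "PATIENT_OPTED_OUT" };
  }
  return { ok: true };
}
```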
3.3 DHPO eClaims (XML with Emirates ID and diagnoses)
Description. On encounter completion, a second BullMQ job generates a DHPO claim XML containing the Emirates ID (mandatory), ICD-10-CM diagnoses, and CPT/HCPCS/DDC activities, and posts it over SOAP to the DHA eClaimLink endpoint. DDC is synced twice weekly from DHA. See CLAUDE.md §10.
| Risk | L | S | Raw | Mitigation | Residual |
|---|---|---|---|---|---|
| Emirates ID sent to the wrong payer | 2 | 5 | 10 | payer_id is bound to the patient's active insurance_profiles row. Integration tests validate payer routing against DHPO payer codes before submission. | 2 |
| Diagnosis code in wrong coding system | 3 | 4 | 12 | Schema enforces ICD-10-CM only (rule #16). Claim builder rejects non-CM codes at build time. | 3 |
| Missing DHA clinician code causes auto-rejection | 4 | 2 | 8 | clinician_code is required on every ClaimActivity; claim service validates presence before submission. See CLAUDE.md rule #14. | 2 |
| DDC code stale leading to rejected pharmacy claim | 4 | 2 | 8 | Scheduled DDC sync runs twice weekly. medication_catalog.ddc_last_updated tracked and alerted when stale > 7 days. See CLAUDE.md §10. | 2 |
| XML payload retained longer than needed | 3 | 3 | 9 | eclaim_submissions.xml_payload purged after 24 hours. Only the parsed remittance data is retained. | 2 |
| Silent payer error loses reimbursement | 3 | 3 | 9 | Poll remittance weekly; surface in clinic dashboard; max 2 resubmissions per DHA PD-05-2025. | 3 |
| Unverified telehealth encounter submitted for controlled drug | 2 | 5 | 10 | Prescription creation rejects isTelehealth=true AND containsControlled=true per Federal Decree-Law No. 38 of 2024. See CLAUDE.md rule #11. | 1 |
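Several mitigations in this table are simple predicates evaluated before submission. The following sketch collects four of them: clinician code presence, the resubmission cap, DDC staleness, and the telehealth controlled-drug block. All function and constant names are hypothetical, and the sample codes used in any call are placeholders rather than real DHA values.

```typescript
// Hypothetical shapes and names; thresholds are the ones stated in the table above.
interface ClaimActivity {
  code: string;
  clinicianCode: string | null;
}

const MAX_RESUBMISSIONS = 2;
const DDC_STALE_AFTER_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

/** Returns the activity codes that lack a clinician code and would be auto-rejected. */
function activitiesMissingClinicianCode(activities: ClaimActivity[]): string[] {
  return activities
    .filter((a) => a.clinicianCode === null || a.clinicianCode.trim() === "")
    .map((a) => a.code);
}

/** A claim may be resubmitted only while fewer than two resubmissions have been made. */
function canResubmit(priorResubmissions: number): boolean {
  return priorResubmissions < MAX_RESUBMISSIONS;
}

/** The catalogue is stale once more than 7 days have passed since the last DDC update. */
function isDdcStale(lastUpdated: Date, now: Date): boolean {
  return now.getTime() - lastUpdated.getTime() > DDC_STALE_AFTER_MS;
}

/** Controlled drugs cannot be prescribed in a telehealth encounter. */
function canCreatePrescription(isTelehealth: boolean, containsControlled: boolean): boolean {
  return !(isTelehealth && containsControlled);
}
```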
3.4 Facility microsites (patient booking form)
Description. Each facility has a public subdomain at [slug].sehtak.ae (or a custom CNAME) that exposes an unauthenticated booking form. A patient enters name, phone, Emirates ID (optional), reason for visit, and a preferred slot. The form creates an appointment with channel = FACILITY_SITE and sends a WhatsApp confirmation.
| Risk | L | S | Raw | Mitigation | Residual |
|---|---|---|---|---|---|
| Scraping and bot submissions | 5 | 2 | 10 | Rate limit per IP; CAPTCHA on booking form; phone OTP required to confirm. | 4 |
| Phishing of microsite look-alikes | 3 | 4 | 12 | SSL on all subdomains; custom domains require DNS verification; URL displayed clearly in all WhatsApp messages. | 6 |
| Unauthorised booking on someone else's behalf | 4 | 3 | 12 | Confirmation is sent to the phone number entered; OTP must be entered before the booking is accepted. Patient can cancel without logging in. | 3 |
| Form data persisted longer than needed | 3 | 2 | 6 | Unconfirmed drafts purged after 24 hours. Confirmed bookings become appointments and follow the normal retention. | 2 |
| Emirates ID captured without SOP3 | 4 | 3 | 12 | Microsite form does not ask for Emirates ID for first-time bookings; clinical access requires UAE Pass SOP3 at check-in. | 4 |
| Cross-site scripting against the form | 3 | 4 | 12 | Zod validation server-side; Content Security Policy; React escapes by default. | 2 |
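The per-IP rate limit named in the first row can take several forms. The sketch below uses a fixed-window counter held in memory, purely as an illustration: the source does not say which algorithm or store Sehtak uses, and the class name, limit, and window length are assumptions. A multi-instance deployment would need a shared store rather than a local `Map`.

```typescript
// Illustrative fixed-window limiter; algorithm and storage are assumptions, not the platform's design.
interface WindowState {
  windowStartMs: number;
  count: number;
}

class FixedWindowLimiter {
  private readonly hits = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true if the request is allowed and records it against the caller's IP. */
  allow(ip: string, nowMs: number): boolean {
    const state = this.hits.get(ip);
    if (state === undefined || nowMs - state.windowStartMs >= this.windowMs) {
      // First request from this IP, or the previous window has elapsed: start a new window.
      this.hits.set(ip, { windowStartMs: nowMs, count: 1 });
      return true;
    }
    if (state.count >= this.limit) {
      return false;
    }
    state.count += 1;
    return true;
  }
}
```

A fixed window is the simplest option but allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state.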
3.5 Telehealth recordings
Description. Daily.co hosts the video session. When the patient consents, the session is recorded. The recording URL is stored in video_sessions.recording_url.
| Risk | L | S | Raw | Mitigation | Residual |
|---|---|---|---|---|---|
| Recording without consent | 2 | 5 | 10 | video_sessions.recording_consent_given must be true before recording starts; UI gate enforced client-side and verified on API. See UAE telehealth guidance. | 2 |
| Recording accessible via raw URL | 3 | 5 | 15 | Daily.co recordings fetched by Sehtak, re-hosted on S3 me-central-1, and served via 5-minute pre-signed URLs. See CLAUDE.md rule #4. | 3 |
| Recording retained indefinitely | 3 | 3 | 9 | Recording retention configured per facility, default 1 year, with per-patient revocation. Reviewed in the annual DPIA refresh. | 4 |
| Cross-border routing of video stream | 3 | 4 | 12 | Daily.co selects nearest POP for WebRTC media; metadata stays in Sehtak UAE. Recording files are fetched and stored in UAE S3. Noted as a residual — see §6.3. | 6 |
| Unauthorised family member viewing | 2 | 4 | 8 | Token-based join via video_participants.token; tokens single-use and short-lived. | 2 |
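The single-use, short-lived join tokens and the recording consent flag are both server-side checks. The sketch below illustrates the logic with an in-memory store. `JoinTokenStore`, its methods, and `canStartRecording` are hypothetical names, token generation is left to the caller (it should come from a cryptographically secure source), and persistence in `video_participants` is not modelled.

```typescript
// Hypothetical in-memory model; the real tokens live in video_participants.token.
interface JoinToken {
  participantId: string;
  expiresAtMs: number;
  used: boolean;
}

class JoinTokenStore {
  private readonly tokens = new Map<string, JoinToken>();

  /** Registers a caller-supplied token; the caller must generate it securely. */
  register(token: string, participantId: string, nowMs: number, ttlMs: number): void {
    this.tokens.set(token, { participantId, expiresAtMs: nowMs + ttlMs, used: false });
  }

  /** Returns the participant ID on first valid use, otherwise null. */
  redeem(token: string, nowMs: number): string | null {
    const entry = this.tokens.get(token);
    if (entry === undefined || entry.used || nowMs >= entry.expiresAtMs) {
      return null;
    }
    entry.used = true; // single use: any later redeem of the same token fails
    return entry.participantId;
  }
}

/** Recording may start only when consent has been explicitly recorded as true. */
function canStartRecording(session: { recordingConsentGiven: boolean | null }): boolean {
  return session.recordingConsentGiven === true;
}
```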
4. Why Self-Hosted Qwen 2.5 72B on EC2 me-central-1 Is the Only Compliant Choice
This is the most frequently challenged design decision. This section records the legal and technical analysis.
4.1 The legal constraint
Federal Law No. 2 of 2019 on the Use of ICT in the Areas of Health, Article 13, prohibits the transfer of health data outside the UAE without Ministry of Health authorisation. UAE law does not recognise a HIPAA-style de-identification safe harbour: even de-identified clinical text (symptoms, diagnoses, treatment plans with no direct identifiers) falls within the Law's scope because it describes a patient's health.
PDPL Article 22 adds a second gate: personal data may only cross borders if the receiving jurisdiction provides an adequate level of protection or an appropriate safeguard is in place. Both gates must be passed: the 2019 Health Law does not yield to PDPL adequacy findings, so satisfying the PDPL alone does not permit a transfer of health data.
4.2 What was ruled out
| Option | Why rejected |
|---|---|
| Amazon Bedrock in me-central-1 | Bedrock me-central-1 has no runtime APIs. It offers only cross-region inference, which routes prompts out of the UAE. Breaches Federal Law No. 2 of 2019. |
| OpenAI (api.openai.com) | US-hosted inference. No UAE data residency. |
| Anthropic direct API | US/EU-hosted inference. No UAE data residency. |
| Azure OpenAI Service | No UAE region as of 2026. |
| Llama 3.3 70B self-hosted | Meta does not list Arabic as a supported language and explicitly warns against production use in unsupported languages. Sehtak operates bilingual Arabic/English clinically. |
4.3 What was chosen
- Qwen 2.5 72B Instruct AWQ — Arabic is officially listed in the Qwen 2.5 model card among 29+ supported languages.
- vLLM on EC2 g5.12xlarge (4× NVIDIA A10G, 96 GB VRAM) in me-central-1, inside a private VPC.
- Whisper large-v3 via faster-whisper on EC2 g5.xlarge (1× A10G, 24 GB VRAM) in me-central-1, same VPC.
- Endpoints exposed only on internal VPC DNS (llm.internal:8000, whisper.internal:8080). No public routes, no NAT egress on these subnets.
- Weights loaded from S3 me-central-1 into GPU memory at startup.
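The rule in §3.1 that no cloud inference credentials are configured (no ANTHROPIC_API_KEY, no OPENAI_API_KEY, no AWS_BEDROCK_*) lends itself to a startup guard. The sketch below is one possible form of that guard, not a description of an existing check; `findForbiddenInferenceEnv` and the constant names are hypothetical.

```typescript
// Hypothetical startup guard; the variable names checked are the ones listed in section 3.1.
const FORBIDDEN_EXACT = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"];
const FORBIDDEN_PREFIXES = ["AWS_BEDROCK_"];

/** Returns the names of any non-empty environment variables that would enable cloud inference. */
function findForbiddenInferenceEnv(env: Record<string, string | undefined>): string[] {
  return Object.keys(env)
    .filter((name) => {
      const value = env[name];
      if (value === undefined || value === "") return false; // unset or empty is harmless
      return (
        FORBIDDEN_EXACT.includes(name) ||
        FORBIDDEN_PREFIXES.some((prefix) => name.startsWith(prefix))
      );
    })
    .sort();
}
```

A service could call this with its process environment at boot and refuse to start if the returned list is non-empty.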
4.4 Residual legal risk
Low. The processing is end-to-end inside UAE infrastructure. De-identification in packages/ai/src/deidentify.ts runs before every LLM call as a belt-and-braces safeguard. Every AI output is reviewed by a licensed clinician.
5. Risks Identified Beyond the Five Flows
These are noted for completeness and tracked in the risk register, not expanded into full assessments in v1 of this DPIA.
- Family link abuse: a guardian retains access after the minor becomes an adult. Mitigation: family links auto-review on 18th birthday; guardian prompted to reconfirm or revoke.
- Staff off-boarding: a former doctor retains access. Mitigation: facility_staff_links.endDate gate; nightly job revokes expired sessions.
- Desktop app local cache: a Tauri install on a shared workstation leaks patient list. Mitigation: OS-user-scoped encrypted SQLite; auto-lock on 15 min idle; wipe on logout.
- Push token hijack: pushing notification to an old device. Mitigation: devices.lastSeenAt tracked; tokens expire at 90 days inactive.
- Insurance eligibility cache poisoning. Mitigation: eligibility results keyed by (patient, insurance, facility, day) and signed by DHPO response hash.
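Three of the mitigations above reduce to small date or key calculations. The sketch below illustrates them under stated assumptions: all function names are hypothetical, dates are handled in UTC (the platform may well use UAE local time instead), and the cache key shown omits the DHPO response-hash signature mentioned in the list.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const PUSH_TOKEN_MAX_IDLE_MS = 90 * DAY_MS;

/** Date of the 18th birthday in UTC. A 29 February birth date rolls over to 1 March in non-leap years. */
function eighteenthBirthday(dateOfBirth: Date): Date {
  return new Date(
    Date.UTC(dateOfBirth.getUTCFullYear() + 18, dateOfBirth.getUTCMonth(), dateOfBirth.getUTCDate()),
  );
}

/** A guardian link is due for review once the linked minor has turned 18. */
function needsFamilyLinkReview(dateOfBirth: Date, now: Date): boolean {
  return now.getTime() >= eighteenthBirthday(dateOfBirth).getTime();
}

/** A push token expires after 90 days without the device being seen. */
function isPushTokenExpired(lastSeenAt: Date, now: Date): boolean {
  return now.getTime() - lastSeenAt.getTime() >= PUSH_TOKEN_MAX_IDLE_MS;
}

/** Builds the (patient, insurance, facility, day) cache key; the day is the UTC calendar date. */
function eligibilityCacheKey(patientId: string, insuranceId: string, facilityId: string, at: Date): string {
  const day = at.toISOString().slice(0, 10); // YYYY-MM-DD
  return [patientId, insuranceId, facilityId, day].join(":");
}
```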
6. Consultation
The following stakeholders must review and sign off on this DPIA before the platform goes live:
| Stakeholder | Role in this DPIA | Status |
|---|---|---|
| Data Protection Officer | Owner of this document | [To be appointed] |
| Clinical Lead | Validates clinical flows and residual risk acceptance | [To be appointed] |
| CISO / Head of Security | Validates the technical controls referenced in CLAUDE.md §15 | [To be appointed] |
| External UAE healthcare law counsel | Confirms the Federal Law No. 2 of 2019 analysis in §4 | [To be engaged] |
| DHA NABIDH onboarding team | Confirms the consent gate and message formats | [Pending SIT] |
| DOH / Malaffi onboarding team | Confirms Malaffi consent and SD-WAN architecture | [Pending onboarding] |
| ADHICS v2.0 assessor (TASNEEF-RINA or HLB JAFZA) | Confirms ADHICS mapping in adhics-v2-controls.md | [Pending engagement] |
Patient representatives will be consulted through a usability study during Phase 2 (see CLAUDE.md §24).
7. Review Cadence
- Annual review by the DPO, recorded in a new version of this document.
- Ad-hoc review whenever any of the following occurs:
  - A new data category is introduced.
  - A new processor is engaged.
  - A new inference model is deployed.
  - A regulatory change (new DHA circular, MOHAP directive, DOH ADHICS update, PDPL amendment).
  - A material security incident (high or critical severity, per incident_reports.severity).
8. Sign-Off
| Role | Name | Signature | Date |
|---|---|---|---|
| DPO | [To be appointed] | | |
| CEO | [To be appointed] | | |
| CISO | [To be appointed] | | |
This DPIA is a draft and is not yet approved. It will be finalised and signed before the first patient is onboarded.