Data Protection with AI Tools

Handling Sensitive Data Safely

35-minute read

Introduction

One of the highest-risk mistakes nonprofit leaders make with AI is sharing sensitive data (health information, financial data, personally identifiable information) with consumer-grade AI tools. A program director pastes a client case study into ChatGPT to de-identify it; the result still contains identifying details, and, worse, she has already disclosed confidential client data to an AI system she doesn't control. A grant writer asks ChatGPT to summarize confidential program data, unknowingly exposing protected information. These scenarios happen regularly.

This lesson provides practical guidance for protecting sensitive data when using AI tools. We'll explore data sensitivity classifications, safe and unsafe uses of different tool types, de-identification strategies, compliance requirements, and incident response procedures.

Data Sensitivity Levels

Not all data requires the same level of protection. Understanding data sensitivity helps you make appropriate decisions about which tools can safely be used with different types of information.

Highly Sensitive Data

This data requires maximum protection and strict limitations on AI tool use.

  • Health Information: Medical records, diagnoses, treatment information, mental health records, substance use histories (HIPAA protected)
  • Education Records: Student records, academic performance, special education information (FERPA protected)
  • Directly Identifying Information: Names when associated with other data that could identify individuals, social security numbers, passport numbers, driver's license numbers
  • Financial Account Information: Bank account numbers, credit card numbers, routing numbers, tax return information
  • Justice System Information: Criminal records, arrest records, case information for incarcerated individuals

Moderately Sensitive Data

This data requires protection but can be used with some AI tools, particularly with de-identification or appropriate security measures.

  • Demographic Information: Age, race, ethnicity, gender, disability status, immigration status when not directly identifying
  • Contact Information: Email addresses, phone numbers, home addresses without other identifying information
  • Program Participation Data: Whether someone participated in a program, general program outcomes without identifying information
  • Behavioral or Circumstance Data: Information about housing status, employment, family situation when not directly identifying

Lower-Sensitivity Data

This data can generally be safely shared with consumer AI tools without special restrictions.

  • Public Information: Information already publicly available or published
  • Aggregated Data: Statistics, trends, and patterns not traceable to individuals
  • Organizational Data: Information about your organization, funding sources, program types, service areas
  • General Knowledge: Information about nonprofit management, grant writing, fundraising strategy

Safe and Unsafe Data Practices by Tool Type

Different categories of AI tools have different security and privacy characteristics. Understanding these differences helps you make appropriate choices about which tools to use with different data types.

Consumer-Grade AI Tools (ChatGPT, Google Gemini, Copilot)

  • Highly sensitive data: Unsafe
  • Moderately sensitive data: Use with caution
  • Lower-sensitivity data: Safe

Consumer-grade AI tools transmit data to third-party servers, may use submitted data to train future models, and may retain it indefinitely. They're not appropriate for sensitive data. They are fine, however, for lower-sensitivity data and for moderately sensitive data that has been properly de-identified.

Example: Using ChatGPT Safely

Safe: "Help me brainstorm grant writing strategies for health nonprofits" or "Summarize trends in youth homelessness based on published statistics"

Unsafe: Sharing actual client stories that contain identifying information, sharing program participant lists, sharing health histories of actual clients

Enterprise AI Tools (Microsoft 365 Copilot, Google Workspace AI, Claude API with enterprise agreements)

  • Highly sensitive data: Use with caution
  • Moderately sensitive data: Safe
  • Lower-sensitivity data: Safe

Enterprise AI tools typically offer stronger data protection, security certifications (SOC 2, ISO 27001), and commitments not to train on your data. They're appropriate for moderately sensitive data, and for many uses of highly sensitive data provided you have data processing agreements in place. However, some highly sensitive uses may still require additional protections.

Purpose-Built Healthcare or Education AI (designed for HIPAA/FERPA compliance)

  • Highly sensitive data: Safe
  • Moderately sensitive data: Safe
  • Lower-sensitivity data: Safe

AI tools built specifically for healthcare or education and certified as HIPAA or FERPA compliant are appropriate for the data they're designed for. These tools have built-in protections, security certifications, and contractual obligations appropriate for highly sensitive data.
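
One way to put this matrix into practice is to encode it as a simple lookup that staff (or an intake script) can check before data is shared. The Python sketch below is illustrative only: the category names and rulings are simplified from this lesson, not an industry standard.

```python
# Illustrative sketch: the tool-type matrix from this lesson as a lookup.
POLICY = {
    "consumer":      {"high": "unsafe",  "moderate": "caution", "low": "safe"},
    "enterprise":    {"high": "caution", "moderate": "safe",    "low": "safe"},
    "purpose_built": {"high": "safe",    "moderate": "safe",    "low": "safe"},
}

def check_use(tool_category: str, sensitivity: str) -> str:
    """Return 'safe', 'caution', or 'unsafe' for a proposed AI use."""
    try:
        return POLICY[tool_category][sensitivity]
    except KeyError:
        # Unknown combinations should fail closed, not open.
        return "unsafe"

print(check_use("consumer", "high"))        # unsafe
print(check_use("enterprise", "moderate"))  # safe
```

Failing closed on unknown categories is a deliberate choice: a typo in a tool name blocks the use rather than silently allowing it.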

De-Identification Strategies

De-identification—removing or altering information that could identify individuals—allows you to use lower-sensitivity versions of sensitive data with consumer AI tools. However, de-identification is complex. Removing obvious identifying information isn't always sufficient if other data could still reveal identity.

De-Identification Best Practices

Remove Direct Identifiers

Always remove names, addresses, phone numbers, email addresses, dates of birth, social security numbers, and other directly identifying information before sharing data with AI tools.
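
Patterned identifiers can be caught automatically as a first pass. The Python sketch below uses illustrative regular expressions for emails, phone numbers, and US social security numbers; it deliberately leaves the name "Jane" untouched to show why pattern scrubbing supplements manual review rather than replacing it.

```python
import re

# Illustrative patterns only: these catch common machine-readable
# identifiers but miss names and free-text details.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def scrub_direct_identifiers(text: str) -> str:
    """Replace patterned identifiers with labeled placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REMOVED]", text)
    return text

note = "Contact Jane at jane.doe@example.org or 555-867-5309. SSN 123-45-6789."
print(scrub_direct_identifiers(note))
# Contact Jane at [EMAIL REMOVED] or [PHONE REMOVED]. SSN [SSN REMOVED].
# Note that the name "Jane" survives: patterns alone are not enough.
```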

Be Careful with Quasi-Identifiers

Even without direct identifiers, combinations of demographic information can identify individuals. A case study mentioning "a 34-year-old transgender woman with HIV in rural Montana" might re-identify the individual within that community. Remove or generalize quasi-identifiers to reduce this risk.

Aggregate or Generalize

Instead of specific ages, use age ranges. Instead of specific locations, use regions. Instead of specific diagnoses, use diagnosis categories. This reduces re-identification risk while preserving utility for analysis.
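
A minimal sketch of what generalization can look like in code, assuming Python; the ten-year age buckets and the city-to-region mapping are made up for the example.

```python
# Illustrative sketch: generalize quasi-identifiers before sharing.

def age_range(age: int) -> str:
    """Map an exact age to a ten-year range, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

# Hypothetical city-to-region mapping; build one for your service area.
REGION = {"Billings": "Eastern Montana", "Missoula": "Western Montana"}

record = {"age": 34, "city": "Billings", "diagnosis": "HIV"}
generalized = {
    "age_range": age_range(record["age"]),            # '30-39'
    "region": REGION.get(record["city"], "Montana"),  # 'Eastern Montana'
    "diagnosis_category": "chronic condition",        # category, not diagnosis
}
print(generalized)
```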

Suppress Rare Values

If a program has few clients with a particular characteristic (e.g., only one transgender client), that characteristic becomes identifying. Suppress or generalize such rare values.
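
In code, suppression can be as simple as replacing any value that appears fewer than k times. A sketch, with k=5 as an arbitrary example threshold:

```python
from collections import Counter

# Illustrative sketch: suppress values that appear fewer than k times,
# since rare values can identify individuals. Choose a threshold suited
# to your data and risk tolerance.

def suppress_rare(values: list[str], k: int = 5) -> list[str]:
    counts = Counter(values)
    return [v if counts[v] >= k else "[SUPPRESSED]" for v in values]

genders = ["woman"] * 40 + ["man"] * 30 + ["transgender woman"]
print(Counter(suppress_rare(genders)))
# Counter({'woman': 40, 'man': 30, '[SUPPRESSED]': 1})
```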

Verify De-Identification

Don't assume de-identification succeeds. After removing identifiers, review the data to ensure you can't reasonably re-identify individuals. When in doubt, remove more information rather than less.
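
One concrete verification is a basic k-anonymity check: count how many records share each combination of quasi-identifiers, and flag any combination shared by fewer than k records. A sketch, assuming records are plain dictionaries:

```python
from collections import Counter

# Illustrative sketch: a basic k-anonymity check. Any quasi-identifier
# combination shared by fewer than k records is a re-identification risk
# and should be generalized further or suppressed before sharing.

def risky_combinations(records, quasi_ids, k=5):
    combos = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return {combo: n for combo, n in combos.items() if n < k}

records = [
    {"age_range": "30-39", "region": "Eastern Montana", "gender": "woman"},
    {"age_range": "30-39", "region": "Eastern Montana", "gender": "woman"},
    {"age_range": "30-39", "region": "Rural Montana", "gender": "transgender woman"},
]
print(risky_combinations(records, ["age_range", "region", "gender"], k=2))
# {('30-39', 'Rural Montana', 'transgender woman'): 1}
```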

Compliance Requirements

Legal and regulatory frameworks create compliance requirements for data protection with AI tools. Key regulations include:

HIPAA (Health Insurance Portability and Accountability Act)

If your organization handles protected health information, HIPAA creates strict requirements. You cannot share protected health information with AI tools unless you have a Business Associate Agreement (BAA) in place. Most consumer AI tools don't offer BAAs. Enterprise healthcare AI tools typically do.

FERPA (Family Educational Rights and Privacy Act)

If your organization works with education records, FERPA restricts disclosure. You cannot share education records with third-party AI tools without appropriate agreements in place. School-focused AI tools typically have FERPA-compliant data agreements.

GDPR (General Data Protection Regulation)

If your organization serves anyone in the EU or processes EU residents' data, GDPR applies. GDPR creates strict requirements around data processing, consent, and accountability. Feeding EU residents' personal data into consumer AI tools without an appropriate legal basis and data processing agreement violates GDPR.

State Privacy Laws

Several states, including California (CCPA) and Virginia (VCDPA), have enacted privacy laws. These laws require organizations to disclose how they use data and sometimes require consent for data sharing. Compliance obligations vary by law and by how your organization uses data.

Data Protection Policies

Your organization should establish clear data protection policies specific to AI tool use. At a minimum, policies should cover:

  • Approved Tools: Which AI tools are approved for which data sensitivity levels
  • De-Identification Requirements: When data must be de-identified before use, and who verifies it
  • Prohibited Uses: Data that may never be shared with any AI tool, regardless of de-identification
  • Incident Reporting: How staff report suspected disclosures, and to whom

Incident Response for Data Breaches

Despite precautions, accidental data disclosures sometimes occur. Effective incident response minimizes harm and demonstrates accountability.

Incident Response Steps

  1. Identify what happened: Determine exactly what data was disclosed, when, how, and to whom. This requires immediate investigation.
  2. Secure the situation: Delete the data from the AI tool if possible. Change passwords if credentials were exposed. Implement immediate protections to prevent further disclosure.
  3. Assess impact: Determine whether anyone could be harmed by the disclosure. Could exposed information be used to identify or harm affected individuals? Is the information sensitive enough to trigger regulatory notification requirements?
  4. Notify affected parties: If individuals could be harmed, tell them what happened, what data was exposed, and what they should do. Notify relevant regulatory authorities if required (HIPAA breaches, FERPA violations, etc.).
  5. Investigate root cause: Why did the disclosure happen? Was it a policy violation? A training gap? A systems failure? Understanding the root cause prevents recurrence.
  6. Implement corrections: Address the root cause. If it was a training gap, provide additional training. If it was a systems failure, fix the system. If it was a policy gap, revise policies.
  7. Document thoroughly: Keep records of the incident, investigation, notification, and corrective actions; one possible structure for such a record is sketched below. This documentation demonstrates responsible incident management to regulators and stakeholders.
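
As a sketch of the record step 7 calls for, assuming Python: the field names are illustrative, not prescribed by any regulation, and the example date and descriptions are invented.

```python
from dataclasses import dataclass, field
from datetime import date

# Illustrative field names mirroring the steps above; adapt them to
# your own procedures and any regulatory documentation requirements.

@dataclass
class IncidentRecord:
    discovered: date
    data_disclosed: str           # step 1: what, when, how, to whom
    containment_actions: str      # step 2: deletions, password changes
    impact_assessment: str        # step 3: who could be harmed, and how
    notifications: list[str] = field(default_factory=list)  # step 4
    root_cause: str = ""          # step 5
    corrective_actions: str = ""  # step 6

incident = IncidentRecord(
    discovered=date(2025, 3, 4),
    data_disclosed="Client case study pasted into a consumer AI tool",
    containment_actions="Deleted the conversation; disabled chat history",
    impact_assessment="One client potentially identifiable from details",
)
incident.notifications.append("Affected client notified by phone")
print(incident)
```
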
Key Insight

Regulators are generally more forgiving of incidents that organizations discover and manage responsibly than of incidents they learn about from other sources. Transparent, thorough incident response demonstrates accountability and can reduce regulatory penalties.

Creating a Data Protection Culture

Policies only work if staff understand them and internalize data protection as a value. Building a strong data protection culture requires:

  • Training: Staff at every level should understand data sensitivity levels and which tools are approved for which data
  • Leadership Modeling: Leaders should follow the same rules they set for staff
  • Safe Reporting: Staff should be able to report mistakes quickly and without blame, since fast reporting limits harm

Next: Policy Templates and Customization

Learn how to adapt AI governance policy templates for your organization's context and size.
