Data Protection with AI Tools

Handling Sensitive Data Safely

35-minute read

Introduction

One of the highest-risk mistakes nonprofit leaders make with AI is sharing sensitive data (health information, financial data, personally identifiable information) with consumer-grade AI tools. A program director pastes a client case study into ChatGPT to de-identify it; the result still contains identifying details, and, worse, she has already disclosed confidential client data to an AI system she doesn't control. A grant writer asks ChatGPT to summarize confidential program data, unknowingly exposing protected information. These scenarios happen regularly.

This lesson provides practical guidance for protecting sensitive data when using AI tools. We'll explore data sensitivity classifications, safe and unsafe uses of different tool types, de-identification strategies, compliance requirements, and incident response procedures.

Data Sensitivity Levels

Not all data requires the same level of protection. Understanding data sensitivity helps you make appropriate decisions about which tools can safely be used with different types of information.

Highly Sensitive Data

This data requires maximum protection and strict limitations on AI tool use.

  • Health Information: Medical records, diagnoses, treatment information, mental health records, substance use histories (HIPAA protected)
  • Education Records: Student records, academic performance, special education information (FERPA protected)
  • Directly Identifying Information: Names when associated with other data that could identify individuals, social security numbers, passport numbers, driver's license numbers
  • Financial Account Information: Bank account numbers, credit card numbers, routing numbers, tax return information
  • Justice System Information: Criminal records, arrest records, case information for incarcerated individuals

Moderately Sensitive Data

This data requires protection but can be used with some AI tools, particularly with de-identification or appropriate security measures.

  • Demographic Information: Age, race, ethnicity, gender, disability status, immigration status when not directly identifying
  • Contact Information: Email addresses, phone numbers, home addresses without other identifying information
  • Program Participation Data: Whether someone participated in a program, general program outcomes without identifying information
  • Behavioral or Circumstance Data: Information about housing status, employment, family situation when not directly identifying

Lower-Sensitivity Data

This data can generally be safely shared with consumer AI tools without special restrictions.

  • Public Information: Information already publicly available or published
  • Aggregated Data: Statistics, trends, and patterns not traceable to individuals
  • Organizational Data: Information about your organization, funding sources, program types, service areas
  • General Knowledge: Information about nonprofit management, grant writing, fundraising strategy

Safe and Unsafe Data Practices by Tool Type

Different categories of AI tools have different security and privacy characteristics. Understanding these differences helps you make appropriate choices about which tools to use with different data types.

Consumer-Grade AI Tools (ChatGPT, Google Gemini, Copilot)

  • Highly sensitive data: Unsafe
  • Moderately sensitive data: Use with caution
  • Lower-sensitivity data: Safe

Consumer-grade AI tools transmit data to third-party servers, may use submitted data to train future models, and may retain it indefinitely. They're not appropriate for sensitive data. They are fine, however, for lower-sensitivity data and for moderately sensitive data that has been properly de-identified.

Example: Using ChatGPT Safely

Safe: "Help me brainstorm grant writing strategies for health nonprofits" or "Summarize trends in youth homelessness based on published statistics"

Unsafe: Sharing actual client stories that contain identifying information, sharing program participant lists, sharing health histories of actual clients

Enterprise AI Tools (Microsoft 365 Copilot, Google Workspace AI, Claude API with enterprise agreements)

  • Highly sensitive data: Use with caution
  • Moderately sensitive data: Safe
  • Lower-sensitivity data: Safe

Enterprise AI tools typically offer stronger data protection, security certifications (SOC 2, ISO 27001), and commitments not to train on your data. They're appropriate for moderately sensitive data, and for many uses of highly sensitive data provided you have data processing agreements in place. However, some highly sensitive uses may still require additional protections.

Purpose-Built Healthcare or Education AI (designed for HIPAA/FERPA compliance)

  • Highly sensitive data: Safe
  • Moderately sensitive data: Safe
  • Lower-sensitivity data: Safe

AI tools built specifically for healthcare or education and certified as HIPAA or FERPA compliant are appropriate for the data they're designed for. These tools have built-in protections, security certifications, and contractual obligations appropriate for highly sensitive data.
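
One way to put this matrix into practice is to encode it as a simple lookup that staff (or an intake script) can check before data is shared. The Python sketch below is illustrative only: the category names and rulings are simplified from this lesson, not an industry standard.

```python
# Illustrative sketch: the tool-type matrix from this lesson as a lookup.
POLICY = {
    "consumer":      {"high": "unsafe",  "moderate": "caution", "low": "safe"},
    "enterprise":    {"high": "caution", "moderate": "safe",    "low": "safe"},
    "purpose_built": {"high": "safe",    "moderate": "safe",    "low": "safe"},
}

def check_use(tool_category: str, sensitivity: str) -> str:
    """Return 'safe', 'caution', or 'unsafe' for a proposed AI use."""
    try:
        return POLICY[tool_category][sensitivity]
    except KeyError:
        # Unknown combinations should fail closed, not open.
        return "unsafe"

print(check_use("consumer", "high"))        # unsafe
print(check_use("enterprise", "moderate"))  # safe
```

Failing closed on unknown categories is a deliberate choice: a typo in a tool name blocks the use rather than silently allowing it.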

De-Identification Strategies

De-identification—removing or altering information that could identify individuals—allows you to use lower-sensitivity versions of sensitive data with consumer AI tools. However, de-identification is complex. Removing obvious identifying information isn't always sufficient if other data could still reveal identity.

De-Identification Best Practices

Remove Direct Identifiers

Always remove names, addresses, phone numbers, email addresses, dates of birth, social security numbers, and other directly identifying information before sharing data with AI tools.
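
Patterned identifiers can be caught automatically as a first pass. The Python sketch below uses illustrative regular expressions for emails, phone numbers, and US social security numbers; it deliberately leaves the name "Jane" untouched to show why pattern scrubbing supplements manual review rather than replacing it.

```python
import re

# Illustrative patterns only: these catch common machine-readable
# identifiers but miss names and free-text details.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}

def scrub_direct_identifiers(text: str) -> str:
    """Replace patterned identifiers with labeled placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REMOVED]", text)
    return text

note = "Contact Jane at jane.doe@example.org or 555-867-5309. SSN 123-45-6789."
print(scrub_direct_identifiers(note))
# Contact Jane at [EMAIL REMOVED] or [PHONE REMOVED]. SSN [SSN REMOVED].
# Note that the name "Jane" survives: patterns alone are not enough.
```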

Be Careful with Quasi-Identifiers

Even without direct identifiers, combinations of demographic information can identify individuals. A case study mentioning "a 34-year-old transgender woman with HIV in rural Montana" might re-identify the individual within that community. Remove or generalize quasi-identifiers to reduce this risk.

Aggregate or Generalize

Instead of specific ages, use age ranges. Instead of specific locations, use regions. Instead of specific diagnoses, use diagnosis categories. This reduces re-identification risk while preserving utility for analysis.
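
A minimal sketch of what generalization can look like in code, assuming Python; the ten-year age buckets and the city-to-region mapping are made up for the example.

```python
# Illustrative sketch: generalize quasi-identifiers before sharing.

def age_range(age: int) -> str:
    """Map an exact age to a ten-year range, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

# Hypothetical city-to-region mapping; build one for your service area.
REGION = {"Billings": "Eastern Montana", "Missoula": "Western Montana"}

record = {"age": 34, "city": "Billings", "diagnosis": "HIV"}
generalized = {
    "age_range": age_range(record["age"]),            # '30-39'
    "region": REGION.get(record["city"], "Montana"),  # 'Eastern Montana'
    "diagnosis_category": "chronic condition",        # category, not diagnosis
}
print(generalized)
```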

Suppress Rare Values

If a program has few clients with a particular characteristic (e.g., only one transgender client), that characteristic becomes identifying. Suppress or generalize such rare values.
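
In code, suppression can be as simple as replacing any value that appears fewer than k times. A sketch, with k=5 as an arbitrary example threshold:

```python
from collections import Counter

# Illustrative sketch: suppress values that appear fewer than k times,
# since rare values can identify individuals. Choose a threshold suited
# to your data and risk tolerance.

def suppress_rare(values: list[str], k: int = 5) -> list[str]:
    counts = Counter(values)
    return [v if counts[v] >= k else "[SUPPRESSED]" for v in values]

genders = ["woman"] * 40 + ["man"] * 30 + ["transgender woman"]
print(Counter(suppress_rare(genders)))
# Counter({'woman': 40, 'man': 30, '[SUPPRESSED]': 1})
```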

Verify De-Identification

Don't assume de-identification succeeds. After removing identifiers, review the data to ensure you can't reasonably re-identify individuals. When in doubt, remove more information rather than less.
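
One concrete verification is a basic k-anonymity check: count how many records share each combination of quasi-identifiers, and flag any combination shared by fewer than k records. A sketch, assuming records are plain dictionaries:

```python
from collections import Counter

# Illustrative sketch: a basic k-anonymity check. Any quasi-identifier
# combination shared by fewer than k records is a re-identification risk
# and should be generalized further or suppressed before sharing.

def risky_combinations(records, quasi_ids, k=5):
    combos = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return {combo: n for combo, n in combos.items() if n < k}

records = [
    {"age_range": "30-39", "region": "Eastern Montana", "gender": "woman"},
    {"age_range": "30-39", "region": "Eastern Montana", "gender": "woman"},
    {"age_range": "30-39", "region": "Rural Montana", "gender": "transgender woman"},
]
print(risky_combinations(records, ["age_range", "region", "gender"], k=2))
# {('30-39', 'Rural Montana', 'transgender woman'): 1}
```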

Compliance Requirements

Legal and regulatory frameworks create compliance requirements for data protection with AI tools. Key regulations include:

HIPAA (Health Insurance Portability and Accountability Act)

If your organization handles protected health information, HIPAA creates strict requirements. You cannot share protected health information with AI tools unless you have a Business Associate Agreement (BAA) in place. Most consumer AI tools don't offer BAAs. Enterprise healthcare AI tools typically do.

FERPA (Family Educational Rights and Privacy Act)

If your organization works with education records, FERPA restricts disclosure. You cannot share education records with third-party AI tools without appropriate agreements in place. School-focused AI tools typically have FERPA-compliant data agreements.

GDPR (General Data Protection Regulation)

If your organization serves anyone in the EU or processes EU residents' data, GDPR applies. GDPR creates strict requirements around data processing, consent, and accountability. Feeding EU residents' personal data into consumer AI tools without an appropriate legal basis and data processing agreement violates GDPR.

State Privacy Laws

Several states, including California (CCPA) and Virginia (VCDPA), have enacted privacy laws. These laws require organizations to disclose how they use data and sometimes require consent for data sharing. Compliance obligations vary by law and by how your organization uses data.

Data Protection Policies

Your organization should establish clear data protection policies specific to AI tool use. At a minimum, policies should cover:

  • Approved Tools: Which AI tools are approved for which data sensitivity levels
  • De-Identification Requirements: When data must be de-identified before use, and who verifies it
  • Prohibited Uses: Data that may never be shared with any AI tool, regardless of de-identification
  • Incident Reporting: How staff report suspected disclosures, and to whom

Incident Response for Data Breaches

Despite precautions, accidental data disclosures sometimes occur. Effective incident response minimizes harm and demonstrates accountability.

Incident Response Steps

  1. Identify what happened: Determine exactly what data was disclosed, when, how, and to whom. This requires immediate investigation.
  2. Secure the situation: Delete the data from the AI tool if possible. Change passwords if credentials were exposed. Implement immediate protections to prevent further disclosure.
  3. Assess impact: Determine whether anyone could be harmed by the disclosure. Could exposed information be used to identify or harm affected individuals? Is the information sensitive enough to trigger regulatory notification requirements?
  4. Notify affected parties: If individuals could be harmed, tell them what happened, what data was exposed, and what they should do. Notify relevant regulatory authorities if required (HIPAA breaches, FERPA violations, etc.).
  5. Investigate root cause: Why did the disclosure happen? Was it a policy violation? A training gap? A systems failure? Understanding the root cause prevents recurrence.
  6. Implement corrections: Address the root cause. If it was a training gap, provide additional training. If it was a systems failure, fix the system. If it was a policy gap, revise policies.
  7. Document thoroughly: Keep records of the incident, investigation, notification, and corrective actions; one possible structure for such a record is sketched below. This documentation demonstrates responsible incident management to regulators and stakeholders.
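
As a sketch of the record step 7 calls for, assuming Python: the field names are illustrative, not prescribed by any regulation, and the example date and descriptions are invented.

```python
from dataclasses import dataclass, field
from datetime import date

# Illustrative field names mirroring the steps above; adapt them to
# your own procedures and any regulatory documentation requirements.

@dataclass
class IncidentRecord:
    discovered: date
    data_disclosed: str           # step 1: what, when, how, to whom
    containment_actions: str      # step 2: deletions, password changes
    impact_assessment: str        # step 3: who could be harmed, and how
    notifications: list[str] = field(default_factory=list)  # step 4
    root_cause: str = ""          # step 5
    corrective_actions: str = ""  # step 6

incident = IncidentRecord(
    discovered=date(2025, 3, 4),
    data_disclosed="Client case study pasted into a consumer AI tool",
    containment_actions="Deleted the conversation; disabled chat history",
    impact_assessment="One client potentially identifiable from details",
)
incident.notifications.append("Affected client notified by phone")
print(incident)
```
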
Key Insight

Regulators are generally more forgiving of incidents that organizations discover and manage responsibly than of incidents they learn about from other sources. Transparent, thorough incident response demonstrates accountability and can reduce regulatory penalties.

Creating a Data Protection Culture

Policies only work if staff understand them and internalize data protection as a value. Building a strong data protection culture requires:

  • Training: Staff at every level should understand data sensitivity levels and which tools are approved for which data
  • Leadership Modeling: Leaders should follow the same rules they set for staff
  • Safe Reporting: Staff should be able to report mistakes quickly and without blame, since fast reporting limits harm

Next: Policy Templates and Customization

Learn how to adapt AI governance policy templates for your organization's context and size.
