Introduction to Data Loss Prevention (DLP)
What is DLP?
Data Loss Prevention (DLP) in Speedscale is a comprehensive feature that enables organizations to automatically discover, mask, and manage Personally Identifiable Information (PII) and sensitive data in their API traffic. DLP ensures compliance with regulations like GDPR, HIPAA, and PCI DSS while maintaining the ability to test with realistic data.
Understanding PII
Personally Identifiable Information (PII) refers to any data that can be used to identify, contact, or locate a specific individual. The protection of PII is critical for privacy, security, and regulatory compliance. To learn more about PII and its importance:
- NIST Guidelines on Protecting PII - National Institute of Standards and Technology's comprehensive guide on protecting PII
- What is PII? (FTC) - Federal Trade Commission's overview of PII and consumer privacy
- OWASP Privacy Risks - Open Web Application Security Project's information on privacy violations and risks
Three-Phase Workflow
DLP operates through a three-phase workflow:
- Discovery and Rule Creation Phase: Identify PII in test environments and create rules to mask it
- Production Redaction Phase: Apply those rules so PII is masked in production traffic
- Test Data Generation Phase: Generate realistic test data to replace redacted tokens for testing purposes
Key Concepts
- PII Discovery: Automatically identifies sensitive data patterns in captured traffic
- Data Redaction: Replaces PII with REDACTED-prefixed tokens before data reaches Speedscale cloud
- Test Data Generation: Creates realistic but safe test data to replace redacted tokens
- DLP Rules: Configurable rules that define how data should be redacted and transformed
Why Use DLP?
Compliance Requirements
Many organizations must comply with strict data protection regulations:
- GDPR (General Data Protection Regulation): Protects the personal data of individuals in the EU
  - Official GDPR Text - Complete text of the General Data Protection Regulation
  - GDPR.eu Guide - Comprehensive guide to understanding GDPR requirements
- HIPAA (Health Insurance Portability and Accountability Act): Protects health information
  - HHS HIPAA Information - U.S. Department of Health and Human Services HIPAA resources
  - HIPAA Compliance Guide - HIPAA Security Rule compliance information
- PCI DSS (Payment Card Industry Data Security Standard): Protects credit card data
  - PCI Security Standards Council - Official PCI DSS standards and resources
  - PCI DSS Quick Reference Guide - PCI DSS requirements and guidance
- Other Regulations: Various industry-specific and regional requirements
  - CCPA (California Consumer Privacy Act) - California's comprehensive privacy law
  - PIPEDA (Canada) - Canada's federal privacy law
DLP helps keep sensitive data from reaching Speedscale's cloud storage, which supports your compliance efforts without manual scrubbing. For security and compliance information, see the Security documentation.
If PII is accidentally sent to Speedscale's cloud storage, Enterprise customers can request data deletion. The SLA for data deletion depends on your support level. Contact your Speedscale account representative or support team for assistance with data deletion requests.
Security Benefits
- Prevents Data Exposure: Sensitive data is redacted before leaving your infrastructure
- Reduces Risk: Lowers the risk of accidentally exposing PII in logs, snapshots, or analytics
- Maintains Privacy: Ensures customer and user data remains private throughout the testing process
Testing with Realistic Data
DLP enables you to:
- Capture production-like traffic patterns without exposing real PII
- Generate realistic test data that maintains data relationships and formats
- Test applications with data that behaves like production data but contains no sensitive information
- Maintain test data quality while ensuring security
Key Benefits
Automated PII Discovery
Speedscale's DLP engine automatically discovers more than 30 types of sensitive data patterns, including:
- Contact information (emails, phone numbers)
- Identity information (SSNs, UUIDs)
- Financial information (credit card numbers)
- Authentication tokens (JWTs)
- Location data (coordinates)
- Network information (IP addresses)
- And many more
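To make pattern-based discovery concrete, here is a minimal Python sketch. It is not Speedscale's detection engine: the `PATTERNS` table, `luhn_valid`, and `discover_pii` are hypothetical names, and the regular expressions are simplified assumptions covering only four of the categories listed above. Credit card candidates are additionally checked with the Luhn checksum to cut down on false positives from arbitrary digit runs.

```python
import re

# Hypothetical detectors for illustration; not Speedscale's actual pattern set.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def discover_pii(text: str) -> list[tuple[str, str]]:
    """Return (pii_type, matched_value) pairs found in `text`."""
    findings = []
    for pii_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group().strip()
            # Digit runs that fail the checksum are unlikely to be card numbers.
            if pii_type == "credit_card" and not luhn_valid(value):
                continue
            findings.append((pii_type, value))
    return findings
```

Calling `discover_pii` on a captured request body returns a list of findings that could feed a review step like the recommendations described later in the workflow.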
Real-Time Data Redaction
DLP rules are applied in real-time as traffic flows through your forwarders:
- Data is redacted before it reaches Speedscale cloud
- Original data never leaves your infrastructure
- Redaction happens at the network level using eBPF technology
- No application code changes required
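The idea of redacting before data leaves your infrastructure can be sketched in a few lines of Python. This is an illustration only, not how the forwarder is implemented: `LOCAL_KEY`, `redact_token`, and `redact` are hypothetical, only the `REDACTED` prefix comes from this page, and the rest of the token layout is an assumption. The sketch derives the token suffix from a keyed hash so the same input always produces the same token while the original value is not recoverable from it.

```python
import hashlib
import hmac
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Assumption: a secret that stays inside your infrastructure.
LOCAL_KEY = b"example-key-kept-on-premises"


def redact_token(value: str, kind: str) -> str:
    """Build a REDACTED-prefixed token; the suffix layout is hypothetical."""
    digest = hmac.new(LOCAL_KEY, value.encode(), hashlib.sha256).hexdigest()[:12]
    return f"REDACTED-{kind}-{digest}"


def redact(payload: str) -> str:
    """Replace every email address in `payload` before it is forwarded."""
    return EMAIL.sub(lambda m: redact_token(m.group(), "EMAIL"), payload)
```

Because the token is deterministic, two fields that held the same email address still hold matching tokens after redaction, which is one way data relationships could survive the process.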
Maintained Test Data Quality
When generating test data:
- Data formats match original patterns
- Data relationships are preserved
- Test scenarios remain realistic
- No degradation in test effectiveness
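The two properties above, matching formats and preserved relationships, can be illustrated with a small Python sketch. It assumes tokens shaped like `REDACTED-<KIND>-<id>`, which is a guess at the layout rather than a documented format, and `synthetic_value` and `fill_test_data` are hypothetical helpers. Each token seeds its own random generator, so a token that appears twice is replaced by the same synthetic value both times.

```python
import hashlib
import random
import re

# Assumption: tokens look like REDACTED-<KIND>-<id>; the real layout may differ.
TOKEN = re.compile(r"REDACTED-(EMAIL|SSN)-[0-9a-f]+")


def synthetic_value(token: str, kind: str) -> str:
    """Derive a fake value from the token so repeats map to the same output."""
    seed = int(hashlib.sha256(token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    if kind == "EMAIL":
        return f"user{rng.randrange(10**6):06d}@example.test"
    # SSN-shaped value; area number 000 is not assigned to real SSNs.
    return f"000-{rng.randrange(1, 100):02d}-{rng.randrange(1, 10000):04d}"


def fill_test_data(payload: str) -> str:
    """Swap every recognized token in `payload` for a format-matching value."""
    return TOKEN.sub(lambda m: synthetic_value(m.group(), m.group(1)), payload)
```

Seeding from the token keeps the mapping stable across runs without storing a lookup table, at the cost of not being able to choose specific replacement values.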
Performance Optimization
DLP rules can be optimized for performance:
- Narrow filter criteria reduce processing overhead
- Targeted redaction minimizes impact
- Efficient pattern matching algorithms
- Configurable performance vs. security trade-offs
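Why narrow filter criteria reduce overhead can be shown with a short Python sketch. The `Rule` class and `apply_rules` function are hypothetical and do not reflect Speedscale's rule format; the point is only that cheap checks on host and path run first, so the more expensive pattern match is attempted on a small share of traffic.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule shape, for illustration only."""
    host: str
    path_prefix: str
    pattern: re.Pattern
    replacement: str

    def applies_to(self, host: str, path: str) -> bool:
        # Cheap string comparisons run first, so most traffic skips the regex.
        return host == self.host and path.startswith(self.path_prefix)


def apply_rules(rules: list[Rule], host: str, path: str, body: str) -> str:
    """Run each rule's pattern only on requests that pass its filter."""
    for rule in rules:
        if rule.applies_to(host, path):
            body = rule.pattern.sub(rule.replacement, body)
    return body
```

A rule scoped to one service and one path prefix leaves unrelated requests untouched, which is the trade-off the list above describes: broader filters catch more, narrower filters cost less.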
DLP Workflow Overview
DLP follows a three-phase workflow designed to protect data while enabling effective testing:
Phase 1: Discovery and Rule Creation (Test Environment)
In your test environment:
- Set Up Speedscale: Install Speedscale's eBPF collector to capture traffic (see Installation Guide and CLI Installation)
- Capture Traffic: Collect representative traffic from your test environment
- Create Snapshot: Generate a snapshot containing the captured traffic
- Review PII Discovery: Examine recommendations identifying discovered PII
- Create DLP Rules: Accept recommendations and create DLP rules for production use
This phase happens in a controlled test environment where you can safely analyze traffic patterns and identify sensitive data.
Phase 2: Production Redaction (Production Environment)
In your production environment:
- Apply DLP Rules: Assign DLP rules to forwarders in your Infrastructure configuration
- Verify Redaction: Confirm that PII is being replaced with REDACTED-prefixed tokens
- Monitor Traffic: Ensure redaction is working correctly without impacting performance
- Validate Compliance: Verify that no PII reaches Speedscale cloud storage
This phase protects your production data in real-time, ensuring sensitive information never leaves your infrastructure.
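One way to spot-check redaction is to scan a captured JSON payload for values that still look like PII. The Python sketch below is a hypothetical helper, not a Speedscale tool: `find_leaks` and its JSON-path output are assumptions, and it only looks for email-shaped strings.

```python
import json
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_leaks(node, path="$"):
    """Yield JSON paths whose string values still look like an email address."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_leaks(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_leaks(value, f"{path}[{index}]")
    elif isinstance(node, str) and EMAIL.search(node):
        yield path


captured = json.loads(
    '{"user": {"email": "REDACTED-EMAIL-1a2b", "backup": "bob@example.com"}}'
)
leaks = list(find_leaks(captured))
```

Here `leaks` contains the path of the `backup` field, which suggests a rule is missing for it; an empty list would mean no email-shaped values were found in that payload.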
Phase 3: Test Data Generation (Testing Environment)
For testing purposes:
- Create Snapshot from Production: Capture traffic from production (now containing REDACTED tokens)
- Generate Test Data Recommendations: The system analyzes REDACTED tokens and suggests appropriate test data types
- Apply Test Data Recommendations: Select and apply recommendations to replace REDACTED tokens with test data
- Use Snapshot for Testing: Use the snapshot with realistic test data for comprehensive testing
This phase enables you to test with realistic data patterns while maintaining security and compliance.
Additional Resources
Learn More About Data Protection
- NIST Cybersecurity Framework - Framework for improving critical infrastructure cybersecurity
- ISO/IEC 27001 - International standard for information security management
- Cloud Security Alliance - Best practices for secure cloud computing
- SANS Data Protection Resources - Security training and research on data protection
Next Steps
Now that you understand what DLP is and why it's valuable, you can proceed to:
- Discovering PII in Test Environment - Start the DLP workflow by discovering PII in your test environment
Related Documentation
- Creating Snapshots - Learn about creating and managing snapshots
- Capturing Traffic - Traffic capture concepts
- Cluster Inspector - Infrastructure and forwarder configuration
- Traffic Transforms Documentation - Deep dive into transform chains