
Introduction to Data Loss Prevention (DLP)

What is DLP?

Data Loss Prevention (DLP) in Speedscale enables organizations to automatically discover, mask, and manage Personally Identifiable Information (PII) and other sensitive data in their API traffic. DLP helps you comply with regulations like GDPR, HIPAA, and PCI DSS while still testing with realistic data.

Understanding PII

Personally Identifiable Information (PII) refers to any data that can be used to identify, contact, or locate a specific individual. The protection of PII is critical for privacy, security, and regulatory compliance. To learn more about PII and its importance:

Two-Phase Workflow

DLP operates through a two-phase workflow:

  1. Discovery and Redaction Phase: Identify PII in test environments and create rules to mask it in production
  2. Test Data Generation Phase: Generate realistic test data to replace redacted tokens for testing purposes

Key Concepts

  • PII Discovery: Automatically identifies sensitive data patterns in captured traffic
  • Data Redaction: Replaces PII with REDACTED- prefixed tokens before data reaches Speedscale cloud
  • Test Data Generation: Creates realistic but safe test data to replace redacted tokens
  • DLP Rules: Configurable rules that define how data should be redacted and transformed

Why Use DLP?

Compliance Requirements

Many organizations must comply with strict data protection regulations such as GDPR, HIPAA, and PCI DSS.

DLP helps ensure that sensitive data never reaches Speedscale's cloud storage, which supports your compliance efforts. For security and compliance information, see the Security documentation.

Enterprise Data Deletion

If PII is accidentally sent to Speedscale's cloud storage, Enterprise customers can request data deletion. The SLA for data deletion depends on your support level. Contact your Speedscale account representative or support team for assistance with data deletion requests.

Security Benefits

  • Prevents Data Exposure: Sensitive data is redacted before leaving your infrastructure
  • Reduces Risk: Minimizes the chance of accidentally exposing PII in logs, snapshots, or analytics
  • Maintains Privacy: Ensures customer and user data remains private throughout the testing process

Testing with Realistic Data

DLP enables you to:

  • Capture production-like traffic patterns without exposing real PII
  • Generate realistic test data that maintains data relationships and formats
  • Test applications with data that behaves like production data but contains no sensitive information
  • Maintain test data quality while ensuring security

Key Benefits

Automated PII Discovery

Speedscale's DLP engine automatically discovers more than 30 types of sensitive data patterns, including:

  • Contact information (emails, phone numbers)
  • Identity information (SSNs, UUIDs)
  • Financial information (credit card numbers)
  • Authentication tokens (JWTs)
  • Location data (coordinates)
  • Network information (IP addresses)
  • And many more
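To make pattern-based discovery concrete, here is a minimal Python sketch that scans a decoded JSON payload for a few of the categories listed above. The regular expressions, the category names, and the `classify` and `discover_pii` functions are illustrative assumptions, not Speedscale's detection engine, which covers many more types.

```python
import re

# Illustrative patterns only (assumption); production detectors are far more thorough.
PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "ipv4": re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$"),
    # JWT headers are base64url-encoded JSON, so they start with "eyJ" ('{"').
    "jwt": re.compile(r"^eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "credit_card": re.compile(r"^\d{13,19}$"),
}


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to cut false positives on card-like numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(value: str):
    """Return the first matching PII type for a string, or None."""
    for kind, pattern in PATTERNS.items():
        m = pattern.match(value)
        if not m:
            continue
        if kind == "ipv4" and not all(int(g) <= 255 for g in m.groups()):
            continue
        if kind == "credit_card" and not luhn_ok(value):
            continue
        return kind
    return None


def discover_pii(node, path="$"):
    """Walk decoded JSON and yield (json_path, pii_type) for matching string leaves."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from discover_pii(child, f"{path}.{key}")
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from discover_pii(child, f"{path}[{i}]")
    elif isinstance(node, str):
        kind = classify(node)
        if kind:
            yield path, kind
```

Running `discover_pii` over `{"user": {"email": "jane@example.com"}, "note": "hello"}` reports only `$.user.email` as an email; the free-text note is ignored. Extra checks such as the Luhn test and the IPv4 octet range keep look-alike values from being flagged.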

Real-Time Data Redaction

DLP rules are applied in real-time as traffic flows through your forwarders:

  • Data is redacted before it reaches Speedscale cloud
  • Original data never leaves your infrastructure
  • Redaction happens at the network level using eBPF technology
  • No application code changes required
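Conceptually, redaction walks each payload and swaps sensitive values for tokens before anything is sent upstream. The Python sketch below is a hypothetical illustration: the `redact` function, the `SENSITIVE_FIELDS` set, the local secret, and the token layout (`REDACTED-` followed by a truncated HMAC-SHA256 digest) are all assumptions, not Speedscale's token format or its eBPF-based implementation.

```python
import hashlib
import hmac

# Hypothetical secret that stays inside your infrastructure (assumption).
# A keyed hash resists dictionary attacks on low-entropy values such as SSNs.
SECRET_KEY = b"replace-with-a-locally-managed-secret"

# Hypothetical rule: field names whose string values should be redacted.
SENSITIVE_FIELDS = {"email", "ssn", "phone"}


def token_for(value: str) -> str:
    """Deterministic token: identical inputs always map to identical tokens."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "REDACTED-" + digest[:12]


def redact(node):
    """Return a copy of decoded JSON with sensitive string fields tokenized."""
    if isinstance(node, dict):
        return {
            k: token_for(v) if k in SENSITIVE_FIELDS and isinstance(v, str) else redact(v)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [redact(item) for item in node]
    return node  # numbers, booleans, None pass through unchanged


if __name__ == "__main__":
    order = {
        "order": 42,
        "customer": {"email": "jane@example.com", "ssn": "123-45-6789"},
        "billing": {"email": "jane@example.com"},
    }
    print(redact(order))
```

Because the token is deterministic, both occurrences of the same email become the same token, so the link between `customer` and `billing` survives redaction even though the address itself does not.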

Maintained Test Data Quality

When generating test data:

  • Data formats match original patterns
  • Data relationships are preserved
  • Test scenarios remain realistic
  • No degradation in test effectiveness
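One way to meet these goals is to map each distinct redacted token to a synthetic value of the right shape and to reuse that value whenever the token repeats. The Python sketch below assumes the data type for each token is already known (for example, from a test data recommendation); the `SyntheticDataGenerator` class, its token pattern, and its generators are hypothetical and only illustrate the idea.

```python
import random
import re

TOKEN = re.compile(r"^REDACTED-\S+$")  # assumed token shape (hypothetical)


class SyntheticDataGenerator:
    """Replace REDACTED- tokens with format-preserving fake values.

    The same token always maps to the same fake value, so records that
    shared a real value in production still share a value in test data.
    """

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)  # seeded for reproducible snapshots
        self.cache = {}

    def _fake(self, kind: str) -> str:
        r = self.rng
        if kind == "email":
            return f"user{r.randint(1000, 9999)}@example.com"
        if kind == "ssn":
            # Area numbers 900-999 are not assigned as SSNs.
            return f"9{r.randint(0, 99):02d}-{r.randint(1, 99):02d}-{r.randint(1, 9999):04d}"
        if kind == "phone":
            # 555-0100 through 555-0199 are reserved for fictional use.
            return f"555-01{r.randint(0, 99):02d}"
        raise ValueError(f"no generator for {kind!r}")

    def replace(self, value, kind: str):
        """Swap a token for a cached fake value; leave other values untouched."""
        if not (isinstance(value, str) and TOKEN.match(value)):
            return value
        if value not in self.cache:
            self.cache[value] = self._fake(kind)
        return self.cache[value]
```

Caching per token is what preserves relationships: if an order and an invoice carried the same redacted email, both receive the same synthetic address in the test snapshot.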

Performance Optimization

DLP rules can be optimized for performance:

  • Narrow filter criteria reduce processing overhead
  • Targeted redaction minimizes impact
  • Efficient pattern matching algorithms
  • Configurable performance vs. security trade-offs
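Narrow filters pay off because traffic that fails a cheap host or path check is never inspected. The sketch below models a DLP rule as a small Python dataclass; the `DlpRule` class, its fields (`hosts`, `path_prefixes`, `fields`), and `fields_to_redact` are hypothetical and do not reflect Speedscale's rule schema.

```python
from dataclasses import dataclass, field


@dataclass
class DlpRule:
    """Hypothetical rule: where it applies and which fields it redacts."""
    name: str
    hosts: set = field(default_factory=set)  # empty set means all hosts
    path_prefixes: tuple = ()                # empty tuple means all paths
    fields: set = field(default_factory=set)

    def applies_to(self, host: str, path: str) -> bool:
        # Cheap string checks run before any payload inspection.
        if self.hosts and host not in self.hosts:
            return False
        if self.path_prefixes and not path.startswith(self.path_prefixes):
            return False
        return True


def fields_to_redact(rules, host: str, path: str) -> set:
    """Union of sensitive fields from every rule whose filter matches."""
    selected = set()
    for rule in rules:
        if rule.applies_to(host, path):
            selected |= rule.fields
    return selected


if __name__ == "__main__":
    rules = [
        DlpRule("billing", hosts={"payments.internal"},
                path_prefixes=("/v1/charges",), fields={"card_number"}),
        DlpRule("global-email", fields={"email"}),
    ]
    print(fields_to_redact(rules, "payments.internal", "/v1/charges/123"))
    print(fields_to_redact(rules, "catalog.internal", "/v1/items"))
```

The trade-off is visible here: the scoped `billing` rule only costs anything on charge requests, while the unscoped email rule inspects every request in exchange for broader coverage.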

DLP Workflow Overview

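Expanded in detail, the workflow splits its first phase into discovery (in a test environment) and redaction (in production), giving three phases designed to protect data while enabling effective testing: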

Phase 1: Discovery and Rule Creation (Test Environment)

In your test environment:

  1. Set Up Speedscale: Install Speedscale's eBPF collector to capture traffic (see the Installation Guide and CLI Installation)
  2. Capture Traffic: Collect representative traffic from your test environment
  3. Create Snapshot: Generate a snapshot containing the captured traffic
  4. Review PII Discovery: Examine recommendations identifying discovered PII
  5. Create DLP Rules: Accept recommendations and create DLP rules for production use

This phase happens in a controlled test environment where you can safely analyze traffic patterns and identify sensitive data.
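Reviewing discovery results amounts to aggregating per-request findings across a snapshot and deciding which ones should become rules. This Python sketch assumes findings arrive as `(field_name, pii_type)` pairs, such as those a pattern scanner would produce; the `recommend` and `accept` functions and the `min_hits` threshold are hypothetical stand-ins for Speedscale's recommendation review, not its actual logic.

```python
from collections import Counter


def recommend(findings, min_hits=2):
    """Group findings by (field, type) and keep those seen at least min_hits times.

    findings: iterable of (field_name, pii_type) tuples from one snapshot.
    Returns recommendations sorted by descending frequency, then field name.
    """
    counts = Counter(findings)
    recs = [
        {"field": f, "type": t, "hits": n}
        for (f, t), n in counts.items()
        if n >= min_hits
    ]
    return sorted(recs, key=lambda r: (-r["hits"], r["field"]))


def accept(recommendations):
    """Turn accepted recommendations into the set of field names to redact."""
    return {r["field"] for r in recommendations}
```

A threshold like `min_hits` keeps one-off matches (for example, a number in a free-text note that happens to look like an SSN) out of the automatic list, leaving them for manual review.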

Phase 2: Production Redaction (Production Environment)

In your production environment:

  1. Apply DLP Rules: Assign DLP rules to forwarders in your Infrastructure configuration
  2. Verify Redaction: Confirm that PII is being replaced with REDACTED- prefixed tokens
  3. Monitor Traffic: Ensure redaction is working correctly without impacting performance
  4. Validate Compliance: Verify that no PII reaches Speedscale cloud storage

This phase protects your production data in real-time, ensuring sensitive information never leaves your infrastructure.
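A simple way to spot-check redaction is to scan traffic that has already passed through the forwarder and flag any value that still looks like PII. The Python check below is a hypothetical audit helper built on two illustrative patterns; `find_leaks` and `LEAK_PATTERNS` are not part of Speedscale, and an empty result only shows that these particular patterns found nothing.

```python
import re

# Illustrative detectors (assumption); extend with the types your rules cover.
LEAK_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_leaks(payload: str):
    """Return (pii_type, matched_text) pairs for values that escaped redaction.

    REDACTED- tokens do not match these patterns, so properly redacted
    payloads produce an empty list.
    """
    leaks = []
    for kind, pattern in LEAK_PATTERNS.items():
        for m in pattern.finditer(payload):
            leaks.append((kind, m.group(0)))
    return leaks
```

For example, `find_leaks('{"email": "REDACTED-1a2b3c"}')` returns an empty list, while a payload that still contains a raw address returns that address tagged as `email`.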

Phase 3: Test Data Generation (Testing Environment)

For testing purposes:

  1. Create Snapshot from Production: Capture traffic from production (now containing REDACTED tokens)
  2. Generate Test Data Recommendations: System analyzes REDACTED tokens and suggests appropriate test data types
  3. Apply Test Data Recommendations: Select and apply recommendations to replace REDACTED tokens with test data
  4. Use Snapshot for Testing: Use the snapshot with realistic test data for comprehensive testing

This phase enables you to test with realistic data patterns while maintaining security and compliance.
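Because a redacted value no longer reveals its original format, a test data recommendation has to rely on context such as the field name. The Python sketch below illustrates that idea with a hypothetical keyword table; `suggest_types`, `HINTS`, and the type names are assumptions, and Speedscale's actual recommendation logic may use other signals.

```python
import re

TOKEN = re.compile(r"^REDACTED-\S+$")  # assumed token shape (hypothetical)

# Hypothetical field-name hints mapped to suggested test data types, checked in order.
HINTS = [
    ("email", "email"),
    ("ssn", "ssn"),
    ("phone", "phone"),
    ("card", "credit_card"),
    ("ip", "ip_address"),
]


def _words(field_name: str) -> set:
    """Split snake_case, kebab-case, and camelCase names into lowercase words."""
    snake = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", field_name)
    return {w for w in re.split(r"[^a-z0-9]+", snake.lower()) if w}


def suggest_types(record: dict) -> dict:
    """Map each field that holds a REDACTED- token to a suggested data type."""
    suggestions = {}
    for name, value in record.items():
        if not (isinstance(value, str) and TOKEN.match(value)):
            continue  # only redacted values need test data
        words = _words(name)
        suggestions[name] = next(
            (kind for hint, kind in HINTS if hint in words), "string"
        )
    return suggestions
```

Matching whole words rather than substrings avoids obvious mistakes such as treating `zip_code` as an IP address; anything without a recognized hint falls back to a generic `string` suggestion for manual review.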

Additional Resources

Learn More About Data Protection

Next Steps

Now that you understand what DLP is and why it's valuable, you can proceed to: