Apriori Algorithm Theory

The Apriori algorithm is a classic data mining technique for discovering frequent itemsets and association rules in transactional databases. It's widely used in market basket analysis, recommendation systems, and pattern discovery.

Problem Statement

Given a database of transactions, where each transaction contains a set of items:

  • Find frequent itemsets: sets of items that appear together frequently
  • Generate association rules: patterns like "if customers buy {A, B}, they likely buy {C}"

Key Concepts

1. Support

Support measures how frequently an itemset appears in the database:

Support(X) = (Transactions containing X) / (Total transactions)

Example: If {milk, bread} appears in 60 out of 100 transactions:

Support({milk, bread}) = 60/100 = 0.6 (60%)
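
As a minimal sketch of this calculation, assuming transactions are represented as Python frozensets (the helper name and toy data are illustrative, not from any particular library):

from typing import FrozenSet, List

def support(itemset: FrozenSet[str], baskets: List[FrozenSet[str]]) -> float:
    """Fraction of baskets containing every item in `itemset`."""
    hits = sum(1 for t in baskets if itemset <= t)  # <= is the subset test
    return hits / len(baskets)

# Mirrors the example above: {milk, bread} appears in 60 of 100 baskets
baskets = [frozenset({"milk", "bread"})] * 60 + [frozenset({"eggs"})] * 40
print(support(frozenset({"milk", "bread"}), baskets))  # 0.6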

2. Confidence

Confidence measures the reliability of an association rule:

Confidence(X => Y) = Support(X ∪ Y) / Support(X)

Example: For rule {milk} => {bread}:

Confidence = P(bread | milk) = Support({milk, bread}) / Support({milk})

If 60 of the 100 transactions contain {milk, bread} and 80 contain {milk}:

Confidence = 60/80 = 0.75 (75%)
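
The same calculation as a self-contained sketch (names and toy data are illustrative):

from typing import FrozenSet, List

def confidence(X: FrozenSet[str], Y: FrozenSet[str],
               baskets: List[FrozenSet[str]]) -> float:
    """Estimate P(Y | X) as Support(X ∪ Y) / Support(X)."""
    n_x = sum(1 for t in baskets if X <= t)
    n_xy = sum(1 for t in baskets if (X | Y) <= t)
    return n_xy / n_x

# 60 baskets with {milk, bread}, 20 with milk only, 20 with neither item
baskets = ([frozenset({"milk", "bread"})] * 60
           + [frozenset({"milk"})] * 20
           + [frozenset({"eggs"})] * 20)
print(confidence(frozenset({"milk"}), frozenset({"bread"}), baskets))  # 0.75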

3. Lift

Lift measures how much more often X and Y occur together than would be expected if the two were independent:

Lift(X => Y) = Confidence(X => Y) / Support(Y)
  • Lift > 1.0: Positive correlation (bought together)
  • Lift = 1.0: Independent (no relationship)
  • Lift < 1.0: Negative correlation (substitutes)

Example: For rule {milk} => {bread}, with Support({bread}) = 0.70:

Lift = 0.75 / 0.70 ≈ 1.07

Customers who buy milk are about 7% more likely to buy bread than the average customer.
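
A small sketch tying the three metrics together; the toy data below is constructed so that Support({bread}) = 0.70, matching the example (names are illustrative):

from typing import FrozenSet, List

def lift(X: FrozenSet[str], Y: FrozenSet[str],
         baskets: List[FrozenSet[str]]) -> float:
    """Confidence(X => Y) / Support(Y); > 1 indicates positive association."""
    n = len(baskets)
    n_x = sum(1 for t in baskets if X <= t)
    n_y = sum(1 for t in baskets if Y <= t)
    n_xy = sum(1 for t in baskets if (X | Y) <= t)
    return (n_xy / n_x) / (n_y / n)

# 60 baskets {milk, bread}, 20 {milk}, 10 {bread}, 10 {eggs}
baskets = ([frozenset({"milk", "bread"})] * 60 + [frozenset({"milk"})] * 20
           + [frozenset({"bread"})] * 10 + [frozenset({"eggs"})] * 10)
print(round(lift(frozenset({"milk"}), frozenset({"bread"}), baskets), 2))  # 1.07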

The Apriori Algorithm

Core Principle: Apriori Property

If an itemset is frequent, all of its subsets must also be frequent.

Contrapositive: If an itemset is infrequent, all of its supersets must also be infrequent.

This enables efficient pruning of the search space.
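
A minimal sketch of the pruning test this property licenses (the function name and the `frequent_prev` container are illustrative assumptions):

from itertools import combinations
from typing import FrozenSet, Set

def has_infrequent_subset(candidate: FrozenSet[str],
                          frequent_prev: Set[FrozenSet[str]]) -> bool:
    """True if any (k-1)-subset of `candidate` is infrequent,
    so the candidate can be discarded without ever counting it."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))

# {milk, cola} was infrequent, so {milk, bread, cola} is pruned immediately
frequent_2 = {frozenset({"milk", "bread"})}
print(has_infrequent_subset(frozenset({"milk", "bread", "cola"}), frequent_2))  # True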

Algorithm Steps

1. Find all frequent 1-itemsets (individual items)
   - Scan database, count item occurrences
   - Keep items with support >= min_support

2. For k = 2, 3, 4, ...:
   a. Generate candidate k-itemsets from frequent (k-1)-itemsets
      - Join step: Combine pairs of frequent (k-1)-itemsets that share k-2 items
      - Prune step: Remove candidates with infrequent (k-1)-subsets

   b. Scan database to count candidate support

   c. Keep candidates with support >= min_support

   d. If no frequent k-itemsets found, stop

3. Generate association rules from frequent itemsets:
   - For each frequent itemset I with |I| >= 2:
     - For each non-empty proper subset A of I:
       - Generate rule A => (I \ A)
       - Keep rules with confidence >= min_confidence
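
The steps above translate fairly directly into code. The following is a compact sketch, not an optimized implementation; it assumes frozenset transactions, and the names apriori and association_rules are illustrative:

from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

def apriori(baskets: List[FrozenSet[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent itemset mapped to its support."""
    n = len(baskets)

    def sup(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in baskets if itemset <= t) / n

    # Step 1: frequent 1-itemsets
    frequent: Dict[FrozenSet[str], float] = {}
    for item in {i for t in baskets for i in t}:
        s = frozenset({item})
        v = sup(s)
        if v >= min_support:
            frequent[s] = v

    result = dict(frequent)
    k = 2
    while frequent:  # Step 2: grow itemsets until none survive
        prev = set(frequent)
        # Join: unions of two frequent (k-1)-itemsets that form a k-itemset
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # Count support and keep the survivors
        frequent = {}
        for c in candidates:
            v = sup(c)
            if v >= min_support:
                frequent[c] = v
        result.update(frequent)
        k += 1
    return result

def association_rules(freq: Dict[FrozenSet[str], float], min_confidence: float
                      ) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """Step 3: rules A => (I - A) from each frequent I with |I| >= 2."""
    out = []
    for itemset, sup_i in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):  # non-empty proper subsets
            for a in combinations(itemset, r):
                A = frozenset(a)
                conf = sup_i / freq[A]  # Support(I) / Support(A)
                if conf >= min_confidence:
                    out.append((A, itemset - A, conf))
    return out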

Example Execution

Transactions:

T1: {milk, bread, butter}
T2: {milk, bread}
T3: {bread, butter}
T4: {milk, butter}

Step 1: Frequent 1-itemsets (min_support = 50%)

{milk}:   3/4 = 75% ✓
{bread}:  3/4 = 75% ✓
{butter}: 3/4 = 75% ✓

Step 2: Generate candidate 2-itemsets

Candidates: {milk, bread}, {milk, butter}, {bread, butter}

Step 3: Count support

{milk, bread}:   2/4 = 50% ✓
{milk, butter}:  2/4 = 50% ✓
{bread, butter}: 2/4 = 50% ✓

Step 4: Generate candidate 3-itemsets

Candidate: {milk, bread, butter}
Support: 1/4 = 25% ✗ (below threshold)

Frequent itemsets: {milk}, {bread}, {butter}, {milk, bread}, {milk, butter}, {bread, butter}

Association rules (min_confidence = 60%):

{milk} => {bread}    Conf: 2/3 = 67% ✓
{bread} => {milk}    Conf: 2/3 = 67% ✓
{milk} => {butter}   Conf: 2/3 = 67% ✓
{butter} => {milk}   Conf: 2/3 = 67% ✓
{bread} => {butter}  Conf: 2/3 = 67% ✓
{butter} => {bread}  Conf: 2/3 = 67% ✓
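
Running the apriori and association_rules sketches from the previous section on these four transactions reproduces this result (assuming both functions are in scope):

baskets = [
    frozenset({"milk", "bread", "butter"}),  # T1
    frozenset({"milk", "bread"}),            # T2
    frozenset({"bread", "butter"}),          # T3
    frozenset({"milk", "butter"}),           # T4
]

freq = apriori(baskets, min_support=0.5)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{s:.0%}")       # the six frequent itemsets above

for A, B, conf in association_rules(freq, min_confidence=0.6):
    print(sorted(A), "=>", sorted(B), f"{conf:.0%}")  # six rules, each at 67%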

Complexity Analysis

Time Complexity

Worst case: O(2^n · |D| · |T|)

  • n = number of unique items
  • |D| = number of transactions
  • |T| = average transaction size

In practice: Much better due to pruning

  • Typical: O(n^k · |D|) where k is max frequent itemset size (usually < 5)

Space Complexity

O(n + |F|)

  • n = unique items
  • |F| = number of frequent itemsets (exponential worst case, but usually small)

Parameters

Minimum Support

Higher support (e.g., 50%):

  • Pros: Find common, reliable patterns
  • Cons: Miss rare but important associations

Lower support (e.g., 10%):

  • Pros: Discover niche patterns
  • Cons: Many spurious associations, slower

Rule of thumb: Start with 10-30% for exploratory analysis

Minimum Confidence

Higher confidence (e.g., 80%):

  • Pros: High-quality, actionable rules
  • Cons: Miss weaker but still meaningful patterns

Lower confidence (e.g., 50%):

  • Pros: More exploratory insights
  • Cons: Less reliable rules

Rule of thumb: 60-70% for actionable business insights

Strengths

  1. Simplicity: Easy to understand and implement
  2. Completeness: Finds all frequent itemsets (no false negatives)
  3. Pruning: Apriori property enables efficient search
  4. Interpretability: Rules are human-readable

Limitations

  1. Multiple database scans: One scan per itemset size
  2. Candidate generation: Exponential in worst case
  3. Low support problem: Misses rare but important patterns
  4. Binary transactions: Doesn't handle quantities or sequences

Improvements and Variants

  1. FP-Growth: Avoids candidate generation using FP-tree (2x-10x faster)
  2. Eclat: Vertical data format (item-TID lists)
  3. AprioriTID: Reduces database scans
  4. Weighted Apriori: Assigns weights to items
  5. Multi-level Apriori: Handles concept hierarchies (e.g., "dairy" → "milk")

Applications

1. Market Basket Analysis

  • Cross-selling: "Customers who bought X also bought Y"
  • Product placement: Put related items near each other
  • Promotions: Bundle frequently bought items

2. Recommendation Systems

  • Collaborative filtering: Users who liked X also liked Y
  • Content discovery: Articles often read together

3. Medical Diagnosis

  • Symptom patterns: Patients with X often have Y
  • Drug interactions: Medications prescribed together

4. Web Mining

  • Clickstream analysis: Pages visited together
  • Session patterns: User navigation paths

5. Bioinformatics

  • Gene co-expression: Genes activated together
  • Protein interactions: Proteins that interact

Best Practices

  1. Data preprocessing:
    • Remove duplicates
    • Filter out noise (very rare items)
    • Group similar items (e.g., "2% milk" and "whole milk" → "milk")

  2. Parameter tuning:
    • Start with balanced parameters (support = 20-30%, confidence = 60-70%)
    • Raise support if too many rules are generated
    • Lower confidence to explore weaker patterns

  3. Rule filtering (see the sketch after this list):
    • Focus on high-lift rules (> 1.2)
    • Remove obvious rules (e.g., "butter => milk" if everyone buys milk)
    • Check rule support (avoid rare but high-confidence spurious rules)

  4. Validation:
    • Test rules on holdout data
    • A/B test recommendations
    • Monitor business metrics (sales lift, conversion rate)
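
A minimal sketch of the filtering step from item 3, assuming mined rules are collected as dicts with support, confidence, and lift fields (an illustrative structure, not a library format):

mined_rules = [
    {"rule": "{diapers} => {beer}", "support": 0.08, "confidence": 0.75, "lift": 2.5},
    {"rule": "{butter} => {milk}", "support": 0.40, "confidence": 0.90, "lift": 1.01},
]

# Keep rules that are both reasonably common and genuinely associated
actionable = [r for r in mined_rules
              if r["lift"] > 1.2 and r["support"] >= 0.05]
print([r["rule"] for r in actionable])  # keeps the diapers rule, drops the near-independent one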

Common Pitfalls

  1. Support too low: Millions of spurious rules
  2. Support too high: Miss important niche patterns
  3. Ignoring lift: High confidence ≠ useful (e.g., everyone buys bread)
  4. Confusing correlation with causation: Apriori finds associations, not causes

Example Use Case: Grocery Store

Goal: Increase basket size through cross-selling

Data: 10,000 transactions, 500 unique items

Parameters: support=5%, confidence=60%

Results:

Rule: {diapers} => {beer}
  Support: 8% (800 transactions)
  Confidence: 75%
  Lift: 2.5

Interpretation:

  • 8% of all transactions contain both diapers and beer
  • 75% of diaper buyers also buy beer
  • Diaper buyers are 2.5x more likely to buy beer than average
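
A quick arithmetic check of how these three numbers fit together (all values taken from the results above):

support_xy = 800 / 10_000            # 8% of baskets contain both items
confidence = 0.75                    # P(beer | diapers)
p_diapers = support_xy / confidence  # about 10.7% of baskets contain diapers
p_beer = confidence / 2.5            # lift = confidence / P(beer), so P(beer) = 30%
print(f"{p_diapers:.1%} of baskets have diapers; baseline beer rate {p_beer:.0%}")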

Action:

  • Place beer near diapers
  • Offer "diaper + beer" bundle discount
  • Target diaper buyers with beer promotions

Expected Result: 10-20% increase in beer sales among diaper buyers

Mathematical Foundations

Set Theory

Frequent itemset mining is fundamentally about:

  • Power set: All 2^n possible itemsets from n items
  • Subset lattice: Hierarchical structure of itemsets
  • Anti-monotonicity: Apriori property (subset frequency ≥ superset frequency)

Probability

Association rules encode conditional probabilities:

  • Support: P(X)
  • Confidence: P(Y|X) = P(X ∩ Y) / P(X)
  • Lift: P(Y|X) / P(Y)

Information Theory

  • Mutual information: Measures dependence between itemsets
  • Entropy: Quantifies uncertainty in item distributions

Further Reading

  1. Original Apriori Paper: Agrawal & Srikant (1994) - "Fast Algorithms for Mining Association Rules"
  2. FP-Growth: Han et al. (2000) - "Mining Frequent Patterns without Candidate Generation"
  3. Market Basket Analysis: Berry & Linoff (2004) - "Data Mining Techniques"
  4. Advanced Topics: Tan et al. (2006) - "Introduction to Data Mining"