What is Semantic Clustering?

Semantic clustering groups content by topic similarity, not just keywords. If you publish articles about "marathon training," "5K preparation," and "running nutrition," ILLIXIS recognizes these all relate to running and clusters them together.

This creates precise retargeting audiences. Instead of targeting "everyone who visited my blog," you target "readers interested in marathon training" or "readers who viewed nutrition content."

How It Works

Automated Nightly Analysis

Every night at 2:30 AM, ILLIXIS runs automatic clustering:

  1. Finds unassigned content - Published articles not yet in any cluster
  2. Extracts topics - Identifies themes using AI
  3. Calculates similarity - Compares topic overlap using Jaccard similarity (threshold: 0.3)
  4. Assigns to clusters - Adds content to existing clusters OR creates new ones
  5. Syncs audiences - Updates GA4, Meta, and LinkedIn Custom Audiences

You don't configure this. It happens automatically after publishing.
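
The assignment step (steps 3 and 4 above) can be sketched roughly as follows. This is a minimal illustration, not ILLIXIS's actual code: the `Cluster` class, the `assign_content` function, the "join the best-scoring cluster" rule, and the way a cluster's topic set grows are all assumptions made for the example. Topic extraction and audience sync are left out.

```python
from dataclasses import dataclass, field

THRESHOLD = 0.3  # similarity threshold stated in this article


@dataclass
class Cluster:
    """Hypothetical cluster record: a topic set plus member slugs."""
    topics: set[str]
    members: list[str] = field(default_factory=list)


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign_content(slug: str, topics: set[str], clusters: list[Cluster]) -> Cluster:
    """Add content to the best-matching cluster, or start a new one."""
    best = max(clusters, key=lambda c: jaccard(topics, c.topics), default=None)
    if best is not None and jaccard(topics, best.topics) > THRESHOLD:
        best.members.append(slug)
        # Assumption: the cluster's topic set grows as content joins.
        best.topics |= topics
        return best
    new_cluster = Cluster(topics=set(topics), members=[slug])
    clusters.append(new_cluster)
    return new_cluster


clusters: list[Cluster] = []
assign_content("marathon-plan", {"marathon training", "endurance"}, clusters)
assign_content("long-runs", {"marathon training", "endurance", "pacing"}, clusters)
assign_content("email-tips", {"email segmentation"}, clusters)
print(len(clusters))  # 2
```

The second article scores 2/3 against the first cluster and joins it. The third shares no topics with it, so it starts a cluster of its own.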

Topic-Based Clustering Algorithm

ILLIXIS uses topic overlap, not word embeddings:

  • Content A topics: ["marathon training", "running shoes", "endurance"]
  • Cluster B topics: ["marathon training", "race preparation", "running gear"]
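  • Jaccard similarity: intersection / union = 1/5 = 0.2 (with exact topic matching, only "marathon training" appears in both lists, and the two lists contain 5 distinct topics)

If similarity > 0.3, content joins the cluster. Otherwise, a new cluster is created. In this example, 0.2 falls below the threshold, so Content A would not join Cluster B.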

This is simpler and faster than vector-based clustering, with no ML library dependencies.
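
A minimal sketch of the Jaccard calculation, assuming topics are compared as exact strings (ILLIXIS may normalize or match topics differently, and the function name is illustrative). With exact matching, the two example lists above share one topic out of five distinct topics.

```python
def jaccard_similarity(a: list[str], b: list[str]) -> float:
    """Jaccard similarity of two topic lists, compared as exact strings."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty topic lists share nothing
    return len(set_a & set_b) / len(union)


content_a = ["marathon training", "running shoes", "endurance"]
cluster_b = ["marathon training", "race preparation", "running gear"]

score = jaccard_similarity(content_a, cluster_b)
print(score)        # 0.2 (1 shared topic out of 5 distinct topics)
print(score > 0.3)  # False
```

Because the calculation needs only set operations, it runs with the Python standard library alone, which matches the "no ML library dependencies" point above.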

Finding Your Clusters

Go to Advertising in your main navigation. The dashboard shows all auto-created clusters:

  • Cluster name - Generated from dominant topics (e.g., "Marathon Training & Running")
  • Content count - Number of articles in this cluster
  • GA4 audience status - Synced, syncing, or not created
  • Meta audience status - Synced, syncing, or error
  • LinkedIn audience status - Synced, syncing, or not created
  • Last synced - When audience was last updated

Viewing Cluster Details

Click any cluster to see:

  • All content in this cluster - Titles, slugs, publication dates
  • Common topics - Top 10 themes shared across content
  • Sample titles - First 3 articles for quick context
  • Audience details - GA4/Meta/LinkedIn audience IDs and sync status

Use this to verify clustering accuracy. If unrelated content appears together, those articles may need better topic extraction (ensure they have clear themes).

Cluster Naming

ILLIXIS generates names from the most frequent topics:

  • Single dominant topic: "Sustainable Swimwear"
  • Multiple topics: "Marathon Training & Nutrition"
  • Fallback: "Content from [Date]" (if no clear topics)

Names are auto-generated. You cannot rename clusters manually (they represent semantic themes, not user-defined categories).
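
The three naming cases above could be approximated like this. It is a sketch under stated assumptions: the article does not define "dominant," so the rule used here (the top topic is the only topic, or appears at least twice as often as the runner-up) is hypothetical, as are the function name, the title-casing, and the ISO date format in the fallback.

```python
from collections import Counter
from datetime import date


def cluster_name(topic_lists: list[list[str]], created: date) -> str:
    """Build a display name from the most frequent topics in a cluster."""
    counts = Counter(t for topics in topic_lists for t in topics)
    if not counts:
        # Fallback when no topics were extracted.
        return f"Content from {created.isoformat()}"
    ranked = counts.most_common(2)
    top, top_count = ranked[0]
    # Assumption: a topic is "dominant" when it is the only topic or
    # appears at least twice as often as the runner-up.
    if len(ranked) == 1 or top_count >= 2 * ranked[1][1]:
        return top.title()
    return f"{top.title()} & {ranked[1][0].title()}"


today = date(2025, 1, 15)
print(cluster_name([["sustainable swimwear"],
                    ["sustainable swimwear", "recycled fabric"]], today))
# Sustainable Swimwear
print(cluster_name([["marathon training", "nutrition"],
                    ["marathon training", "nutrition"]], today))
# Marathon Training & Nutrition
print(cluster_name([], today))
# Content from 2025-01-15
```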

Audience Creation

Each cluster becomes one audience on each platform:

Google Analytics 4

  • Audience name: Topic: [Cluster Name] (e.g., "Topic: Marathon Training")
  • Membership duration: 30 days
  • Filter rules: Page path keywords from common topics
  • Sync method: GA4 Management API
  • Google Ads sync: Automatic (if GA4 + Google Ads are linked)

Meta (Facebook/Instagram)

  • Custom Audience name: [ILLIXIS] [Cluster Name] (e.g., "[ILLIXIS] Marathon Training")
  • Retention: 30 days
  • Source: Website visitors via Meta Pixel
  • Sync method: Meta Custom Audiences API
  • Status tracking: Syncing, synced, or error

LinkedIn

  • DMP Segment name: [ILLIXIS] [Cluster Name]
  • Retention: 30 days
  • Source: Website visitors via LinkedIn Insight Tag
  • Sync method: LinkedIn Matched Audiences API
  • Use case: B2B retargeting with professional audience data

Minimum Cluster Size

Clusters require at least 1 content item to create an audience. Single-article clusters are allowed (useful for high-value content like pillar posts).

Maximum cluster size: 50 articles (to avoid regex size limits in GA4 filter rules).

Topic Similarity Threshold

Content joins a cluster if similarity > 0.3 (30% topic overlap).

Lower threshold = larger clusters (more diverse content). Higher threshold = smaller clusters (more focused content).

This is not configurable. The 0.3 default balances audience size with relevance.

When Clusters Update

Clusters update automatically when:

  • New content is published - Nightly batch job processes unassigned articles
  • Topics are extracted - Content must have extracted topics to be clustered
  • Cluster membership changes - Common topics recalculate from all content in cluster

Audiences do NOT update automatically when clusters change. GA4 audiences cannot be modified after creation (a change requires archiving the audience and recreating it). Meta and LinkedIn audiences can be updated, but batch updates are expensive.

For now, clusters are flagged for re-sync when content changes, but re-syncing audiences is manual.

Automation Schedule

ILLIXIS runs several automated processes to keep your clusters and audiences up to date:

| Process | Schedule | Description |
|---------|----------|-------------|
| Cluster Recalculation | Weekly, Thursdays 4:00 AM UTC | Full semantic analysis of all content, recalculates cluster memberships |
| New Content Assignment | Within 24 hours of publishing | Newly published content is added to appropriate clusters |
| Topic Label Regeneration | Monthly | Cluster names and common topics are regenerated based on current content |
| Internal Linking Suggestions | After each cluster refresh | Linking recommendations update when cluster membership changes |

What This Means for You

  • Thursday mornings: Expect to see updated cluster memberships and possibly new clusters if content themes have shifted
  • After publishing: Your new content will appear in relevant clusters by the next day
  • Monthly: Cluster names may change slightly as topic labels regenerate to better reflect current content
  • Linking suggestions: Always current after any cluster refresh, helping you build stronger topic authority

You don't need to trigger any of these manually. They run automatically in the background.
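
To work out when the next weekly recalculation falls, the schedule in the table can be expressed in code. This is only a sketch for planning purposes: the function name is hypothetical and nothing here reflects how ILLIXIS actually schedules its jobs. In cron syntax the same schedule would be `0 4 * * 4`, assuming the scheduler runs in UTC.

```python
from datetime import datetime, timedelta, timezone

THURSDAY = 3  # datetime.weekday(): Monday is 0


def next_weekly_recalculation(now: datetime) -> datetime:
    """Return the next Thursday 04:00 UTC strictly after `now`.

    `now` should be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=4, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(THURSDAY - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's run has already passed
    return candidate


wednesday_noon = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(next_weekly_recalculation(wednesday_noon))  # 2025-01-02 04:00:00+00:00
```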

Manual Trigger

If you publish content and want immediate clustering (without waiting for 2:30 AM):

  1. Go to the Advertising dashboard
  2. Click the "Sync Audiences" button
  3. ILLIXIS runs the clustering algorithm now
  4. Results appear in the dashboard (clusters created/updated, audiences synced)

This is useful after publishing a batch of articles.

Filter Rules

Each cluster creates a GA4 filter based on content URLs:

Keyword-Based (Default)

Uses top 10 keywords from common topics:

  • Cluster topics: ["marathon training", "running shoes", "race preparation"]
  • Filter: page_path_keywords: ["marathon-training", "running-shoes", "race-preparation"]
  • Matches: Any URL containing those keywords (OR logic)
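
The keyword filter above can be illustrated with a small sketch. The `slugify` and `build_path_pattern` helpers are hypothetical, and combining the keywords into a single regular expression is an assumption suggested by the "regex size limits" note under Minimum Cluster Size. The real GA4 filter is built by ILLIXIS through the GA4 API.

```python
import re


def slugify(topic: str) -> str:
    """Lowercase a topic and join its words with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")


def build_path_pattern(topics: list[str], limit: int = 10) -> re.Pattern:
    """Compile one regex that matches a path containing any topic keyword."""
    keywords = [s for s in (slugify(t) for t in topics[:limit]) if s]
    if not keywords:
        raise ValueError("at least one non-empty topic is required")
    # Alternation gives OR logic: any single keyword is enough to match.
    return re.compile("|".join(re.escape(k) for k in keywords))


pattern = build_path_pattern(["marathon training", "running shoes", "race preparation"])
print(bool(pattern.search("/blog/best-running-shoes-2024")))  # True
print(bool(pattern.search("/blog/email-segmentation")))       # False
```

Note that a URL only matches if the hyphenated keyword appears in its path, so article slugs that reflect their topics produce more reliable audiences.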

Title Keywords (Supplemental)

If a cluster has few topics, ILLIXIS extracts keywords from content titles:

  • Find words appearing in 2+ titles
  • Exclude stopwords (the, a, and, etc.)
  • Add to filter rules (up to 10 keywords total)
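
A rough sketch of the title-keyword steps above. The stopword list, the function name, and the tie-breaking order (most frequent first, then alphabetical) are assumptions made for the example.

```python
import re
from collections import Counter

# Illustrative stopword list; the real list is not documented here.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "for", "in", "on", "with", "your"}


def title_keywords(titles: list[str], existing: list[str], limit: int = 10) -> list[str]:
    """Top up `existing` keywords with words found in two or more titles."""
    counts = Counter()
    for title in titles:
        words = set(re.findall(r"[a-z0-9]+", title.lower())) - STOPWORDS
        counts.update(words)  # each word counts once per title
    keywords = list(existing)
    # Most frequent first; ties broken alphabetically for a stable result.
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if len(keywords) >= limit:
            break
        if n >= 2 and word not in keywords:
            keywords.append(word)
    return keywords


titles = [
    "Marathon Training for Beginners",
    "Marathon Pacing Guide",
    "Training Plans for Your First Race",
]
print(title_keywords(titles, existing=["race-preparation"]))
# ['race-preparation', 'marathon', 'training']
```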

Fallback: Cluster Name

If no topics or title keywords exist:

  • Filter: page_path_contains: [cluster-slug]
  • Example: page_path_contains: "marathon-training"

This is the weakest match (requires exact slug in URL).

Syncing to Google Ads

GA4 audiences appear in Google Ads automatically if:

  1. GA4 is linked to Google Ads - Done in GA4 Admin > Product Links
  2. Audience has minimum size - Google Ads requires 1,000+ members (for Search campaigns)
  3. Audience is active - Status = "Active" in GA4

ILLIXIS creates the GA4 audience. Google handles the Ads sync.

Note: The 1,000-member minimum applies to Search campaigns. Display campaigns accept smaller audiences (Google documents a minimum of 100 active users in the last 30 days).

Syncing to Meta

Meta Custom Audiences sync if:

  1. Meta connection is active - Go to Connectors, verify Meta Ads is authorized
  2. Meta Pixel is configured - Pixel ID saved in Meta connection settings
  3. Access token is valid - Tokens expire every 60 days (re-authorize if expired)

ILLIXIS creates Custom Audiences using the Meta Marketing API. Pixel events populate the audience with matching users.

Meta audience minimum: 100 members before ads can target it.
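
Since tokens expire after 60 days (item 3 in the checklist above), a reminder check can be sketched as follows. The function name and the 7-day warning window are assumptions. ILLIXIS reports token errors in Connectors, so this is only useful if you track authorization dates yourself.

```python
from datetime import datetime, timedelta, timezone

TOKEN_LIFETIME = timedelta(days=60)  # lifetime stated in this article


def needs_reauthorization(issued_at: datetime, now: datetime, warn_days: int = 7) -> bool:
    """True when the token has expired or will expire within `warn_days`."""
    expires_at = issued_at + TOKEN_LIFETIME
    return now >= expires_at - timedelta(days=warn_days)


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(needs_reauthorization(issued, datetime(2025, 2, 20, tzinfo=timezone.utc)))  # False
print(needs_reauthorization(issued, datetime(2025, 2, 25, tzinfo=timezone.utc)))  # True
```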

Syncing to LinkedIn

LinkedIn Matched Audiences sync if:

  1. LinkedIn connection is active - Go to Connectors, verify LinkedIn Ads is authorized
  2. LinkedIn Insight Tag is installed - Tag ID saved in LinkedIn connection settings
  3. Access token is valid - Tokens expire every 60 days (re-authorize if expired)

ILLIXIS creates DMP Segments using the LinkedIn Marketing API. Insight Tag events populate the segment.

LinkedIn audience minimum: 300 members before ads can target it.
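
The minimum sizes quoted in the three syncing sections can be collected into one lookup to see where a cluster's audience is already targetable. The dictionary keys and function name are illustrative, and the audience sizes have to be read from each platform yourself.

```python
# Minimum audience sizes as stated in this article.
MIN_AUDIENCE_SIZE = {
    "google_ads_search": 1000,
    "meta": 100,
    "linkedin": 300,
}


def targetable_platforms(audience_sizes: dict[str, int]) -> list[str]:
    """Return platforms whose audience has reached the stated minimum."""
    return [
        platform
        for platform, minimum in MIN_AUDIENCE_SIZE.items()
        if audience_sizes.get(platform, 0) >= minimum
    ]


sizes = {"google_ads_search": 450, "meta": 450, "linkedin": 450}
print(targetable_platforms(sizes))  # ['meta', 'linkedin']
```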

Troubleshooting

"No clusters created"

Cause: Content has no extracted topics.

Fix: Ensure articles are published and topics are extracted. Topics are auto-generated during content creation or content inventory sync.

To check, open any article in Content Hub → View Details and scroll to the "Extracted Topics" field.

"Cluster exists but no audience"

Cause: GA4/Meta/LinkedIn not connected, or sync failed.

Fix:

  1. Go to Connectors
  2. Verify GA4, Meta, or LinkedIn is authorized
  3. Check error messages in Advertising cluster details
  4. Click "Sync Audiences" to retry

"Audience synced to GA4 but not Google Ads"

Cause: GA4 and Google Ads are not linked in GA4 Admin.

Fix:

  1. Log into GA4 Admin
  2. Go to Product Links > Google Ads Links
  3. Link your Google Ads account
  4. Wait 24-48 hours for sync

ILLIXIS creates the GA4 audience correctly. The GA4-to-Ads link is external.

"Meta audience shows 0 size"

Cause: Meta Pixel not installed, or no visitors match the filter.

Fix:

  1. Verify Meta Pixel is installed on your site (use Meta Pixel Helper extension)
  2. Confirm Pixel ID matches the one in Connectors
  3. Wait 24 hours for Pixel to populate audience
  4. Check that page URLs match cluster filter rules

"LinkedIn audience creation failed"

Cause: Missing ad account, expired token, or insufficient permissions.

Fix:

  1. Go to Connectors > LinkedIn Ads
  2. Re-authorize with all required scopes: rw_ads, r_ads_reporting, rw_dmp_segments
  3. Verify ad account ID is saved
  4. Click "Sync Audiences" in Advertising

Performance

Clustering performance scales with content volume:

  • 100 articles: ~30 seconds
  • 500 articles: ~2 minutes
  • 1,000+ articles: ~5 minutes

The nightly batch job has a 5-minute timeout. If you have 1,000+ articles, clustering may time out. Contact support to increase the limit.

Visualizing Clusters

The Advertising dashboard shows:

  • Bubble chart - Cluster size and relationships
  • Cluster list - Sortable table with stats
  • Topic distribution - Top 10 topics across all clusters

Use this to identify:

  • Content gaps - Topics with only 1-2 articles (expand these)
  • Dominant themes - Topics appearing in many clusters (your core expertise)
  • Outliers - Content in "Miscellaneous" cluster (may need better topics)

Related Features

  • Content Hub - Source of clustered content
  • Advertising - Uses clusters for retargeting campaigns
  • GA4 Integration - Audiences sync here first
  • Meta Ads Integration - Custom Audiences for Facebook/Instagram
  • LinkedIn Ads Integration - DMP Segments for B2B retargeting

Best Practices

Write Clear Topics

Content must have clear topics for accurate clustering. If ILLIXIS generates generic topics like "content," "marketing," or "business," the clustering will be weak.

Good topics:

  • "Email segmentation strategies"
  • "Abandoned cart recovery"
  • "Product photography tips"

Bad topics:

  • "Content"
  • "Marketing"
  • "Tips"

Fix this by improving your content's theme clarity (clear headings, focused topics, specific keywords).

Monitor Cluster Quality

Check clusters monthly:

  1. View top 3 clusters (largest by content count)
  2. Read sample titles - Are they related?
  3. If unrelated content appears, those articles may need topic re-extraction

You cannot manually move content between clusters. The algorithm must recognize semantic similarity.

Use Clusters for Campaign Planning

Don't just retarget randomly. Plan campaigns by cluster:

  • Cluster: "Marathon Training" → Ad: "Join our 12-week marathon program"
  • Cluster: "Running Nutrition" → Ad: "Fuel your runs with our meal plans"
  • Cluster: "Injury Prevention" → Ad: "Download our free runner's stretching guide"

Match ad creative to cluster themes for higher CTR.

Let the Algorithm Work

Don't over-analyze individual clustering decisions. The algorithm optimizes for:

  1. Relevance - Similar content groups together
  2. Scalability - Handles 1,000+ articles efficiently
  3. Automation - Zero manual configuration

If a few articles seem misclustered, that's okay. Audience targeting is probabilistic. A 95% accuracy rate is sufficient.

FAQ

Q: Can I manually create clusters? A: No. Clusters are auto-generated based on topic similarity. Manual categorization defeats the purpose of semantic analysis.

Q: Can I rename clusters? A: No. Cluster names are generated from dominant topics. Renaming would disconnect the name from the content.

Q: Can I merge clusters? A: No. Merging is unnecessary. If two clusters should be one, improve topic extraction on those articles so the algorithm recognizes similarity.

Q: What happens to old content? A: Content published before clustering was enabled gets processed on the next nightly run. It's not excluded.

Q: Does this work with draft content? A: No. Only published content is clustered. Drafts are ignored.

Q: Can I exclude content from clustering? A: Not currently. All published content with topics gets clustered. If you don't want content in retargeting audiences, don't publish it.

Q: Does this affect my site's performance? A: No. Clustering runs as a background task. Your site is unaffected.

Q: What if I have 10,000 articles? A: Clustering is O(n*m) where n = unprocessed content, m = existing clusters. For large catalogs (1,000+ articles), contact support to optimize the batch job.

Q: Can I see clustering logs? A: Yes, if you have admin access. Each cluster creation and update is logged with similarity scores.

Q: Does this replace Google Analytics audiences? A: No. This creates GA4 audiences automatically. You can still create manual audiences in GA4 for specific use cases.

Q: What if I delete a cluster? A: Don't delete clusters manually. They'll be recreated on the next nightly run (content still exists). Instead, delete or unpublish the content.

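Q: How often do audiences update? A: Clusters update nightly. Platform audiences are created when a cluster is first synced. After that, clusters whose content changes are flagged for re-sync, and re-syncing is currently manual (click "Sync Audiences" in the Advertising dashboard). Membership within each audience is updated by the platform itself, based on Pixel/Tag events.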

Support

If clustering isn't working:

  1. Check Advertising dashboard for error messages
  2. Verify Connectors (GA4, Meta, LinkedIn) are authorized
  3. Confirm content has extracted topics (Content Hub > View Details)
  4. Review system logs (if admin access)
  5. Contact support with cluster ID and tenant ID
