
Moderation Agents: Custom AI-Powered Comment Moderation

Learn how to create, test, and manage custom Moderation Agents that automatically moderate brand-specific content on Facebook. Define your own moderation rules, train agents with real examples, and protect your community from the content that matters most to your brand.

Written by Filip Strycko
Updated over 3 weeks ago

What Are Moderation Agents?

Moderation Agents are custom AI-powered tools that let you define specific moderation rules beyond TrollWall's standard detection. While TrollWall automatically handles hate speech, spam, and other violations, Moderation Agents allow you to moderate content specific to your brand, community, or industry.

Key capabilities:

  • Define custom moderation topics with precise criteria

  • Train agents using real examples from your community

  • Let agents automatically account for context, typos, and misspellings

  • Test agent performance before activation

  • Modify rules as your needs evolve

Current availability: Moderation Agents work exclusively with Facebook accounts.

Need access? If you don't see "Moderation Agents" in your menu, contact your sales account manager or use the in-app chat (bottom-right corner).


Creating a New Moderation Agent

Step 1: Start Creation Process

  1. Navigate to Moderation Agents in the side panel

  2. Click "Create" to begin setup

  3. The agent will guide you through configuration via conversation

Step 2: Define Your Moderation Topic

The agent asks questions to understand what you want to moderate. Be as specific and detailed as possible. For example, instead of "negative comments," describe exactly what should be hidden, such as comments promoting competitor discount codes or accusing your brand of fraud.

The agent will also ask about edge cases and clarifications to better understand the scope of your moderation needs. Answer these follow-up questions to help the agent be as precise as possible.

Once you've provided sufficient context, click "Next" to proceed to comment sorting.

Step 3: Sort Comment Examples

The agent presents real comments from your social accounts—some matching your moderation criteria, others borderline cases. Your task is to classify each comment as Allowed or Hidden.

Requirements:

  • Minimum 10 examples marked as Allowed

  • Minimum 10 examples marked as Hidden

Important: These examples teach the agent the boundaries of your moderation policy. Take time to classify them accurately.
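
If it helps to picture what this step produces, here is a minimal sketch, assuming the examples form a simple list of labeled comments. It is purely illustrative and not TrollWall's internal data model; the LabeledComment class and MIN_EXAMPLES_PER_LABEL constant are hypothetical names.

```python
# Conceptual sketch only: the labeled examples collected in Step 3 and the
# 10 Allowed / 10 Hidden minimum. All names here are hypothetical.
from dataclasses import dataclass

MIN_EXAMPLES_PER_LABEL = 10  # minimum Allowed and minimum Hidden examples


@dataclass
class LabeledComment:
    text: str
    label: str  # "Allowed" or "Hidden"


def has_enough_examples(examples: list[LabeledComment]) -> bool:
    """Check the 10 Allowed / 10 Hidden minimum described above."""
    allowed = sum(1 for e in examples if e.label == "Allowed")
    hidden = sum(1 for e in examples if e.label == "Hidden")
    return allowed >= MIN_EXAMPLES_PER_LABEL and hidden >= MIN_EXAMPLES_PER_LABEL
```

The key point the sketch captures: every example carries exactly one of the two labels, and the agent only has enough signal to learn your boundary once both labels are well represented.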

Step 4: Assign to Social Accounts

  1. Select which Facebook account(s) should use this agent

  2. Click "Save" to complete setup

Your agent is now created and ready to use.


Testing a Moderation Agent

Before activating your agent, test its performance to ensure it behaves as expected.

Running Tests

  1. Go to Moderation Agents and click "Test" next to your agent

  2. Add test samples—example comments you want to evaluate

  3. For each sample, specify whether it should be Hidden or Allowed

  4. Click "Run Test" to see results

Testing Tip: Use new examples that weren't part of your agent's training. Testing with training examples will always show correct classification, which doesn't validate actual performance.

Understanding Results

After testing, you'll see:

  • Whether each sample was classified correctly (matches your expected result)

  • The agent's reasoning explaining its decision

Use the reasoning to:

  • Understand how the agent interprets different content

  • Identify gaps in your moderation rules

  • Refine your agent's configuration through additional training
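
As a way to think about what a test run checks, here is a minimal sketch, assuming each test sample carries an expected label and the agent returns a label plus its reasoning. The classify callable and the TestSample and TestResult types are hypothetical illustrations, not TrollWall's API.

```python
# Conceptual sketch only: compare expected labels with the agent's output and
# surface the reasoning for any mismatches. All names are hypothetical.
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestSample:
    text: str
    expected: str  # "Allowed" or "Hidden"


@dataclass
class TestResult:
    sample: TestSample
    actual: str
    reasoning: str


def run_test(samples: list[TestSample],
             classify: Callable[[str], tuple[str, str]]) -> list[TestResult]:
    """Classify each sample and keep the agent's label and reasoning."""
    return [TestResult(s, *classify(s.text)) for s in samples]


def review(results: list[TestResult]) -> None:
    """Print mismatches so the reasoning can guide rule refinements."""
    for r in results:
        if r.actual != r.sample.expected:
            print(f"MISMATCH: {r.sample.text!r} expected {r.sample.expected}, "
                  f"got {r.actual}; reasoning: {r.reasoning}")
```

Reviewing mismatches this way mirrors how the reasoning shown in the Test section points to gaps or ambiguities in your moderation rules.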


Managing Moderation Agents

Agent Overview

The Moderation Agents screen displays all your agents with these options:

  • Create – Set up a new agent

  • Settings – Edit agent configuration

  • Test – Validate agent performance

  • Start/Pause – Control when the agent moderates

  • Delete – Remove an agent permanently

Editing Agent Configuration

Click "Settings" next to any agent to modify its setup:

Moderation Rules Tab

  • View current moderation rules generated from your configuration

  • Continue the conversation with your agent to refine rules

  • Add clarifications about recent developments or edge cases

Comments Tab

  • Review all example comments classified as Allowed or Hidden

  • Includes examples from initial setup plus any corrections made via the Train function in the Comments section

  • Change classification or remove examples as needed

Social Accounts Tab

  • Select which Facebook accounts use this agent

  • Modify account assignments without recreating the agent

Critical: After making any changes, click "Save" to retrain the agent with updated information. Changes take effect only after retraining completes.

Starting and Pausing Agents

  • Start – Activate the agent to begin moderating comments on assigned accounts

  • Pause – Temporarily disable moderation without deleting the agent

Remember: Paused agents don't moderate new comments, but their configuration remains saved for future use.


Best Practices

Creating Effective Agents

  • Provide detailed descriptions of what should be moderated during the initial setup conversation

  • Use real examples from your community when training the agent

  • Include both obvious violations and borderline cases to help the agent understand boundaries

  • Test thoroughly using the Test section before activating your agent

  • One topic per agent – Create separate agents for different moderation topics. Agents perform best when specialized on a single, specific topic rather than handling multiple unrelated issues

Important: If you need to moderate multiple topics, create a separate agent for each topic rather than combining them into one.

Maintaining Agent Performance

  • Correct classifications in the Comments section using "Hide + Train" when you find misclassified comments

  • Add new examples when you encounter edge cases not covered by your initial training

  • Test after major changes to verify your configuration updates work as expected

Troubleshooting Common Issues

Agent hides too many or too few comments:

  1. Make corrections directly – For any wrongly classified comment, hide or unhide it and select which agent should learn from this correction

  2. Refine your moderation description – Make your topic definition more specific in the agent settings

  3. Analyze reasoning – Add misclassified comments as test samples in the Test section to see the agent's reasoning. Use this information to adjust your configuration

  4. Continue the conversation – Go to Settings and talk with your agent to clarify boundaries and explain specific scenarios

Tip: When the agent makes mistakes, the reasoning provided in test results often reveals what needs clarification in your moderation rules.
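
To make the correction loop concrete, here is a minimal sketch, assuming a correction is simply a new labeled example that triggers retraining. The Agent class and retrain method are hypothetical stand-ins for what happens when you hide or unhide a comment and select an agent to learn from it; they are not TrollWall code.

```python
# Conceptual sketch only: a wrongly classified comment is relabeled, added to
# the agent's examples, and the agent is retrained. Names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    examples: list[tuple[str, str]] = field(default_factory=list)  # (text, label)

    def retrain(self) -> None:
        # Stands in for retraining after changes are saved.
        print(f"Retraining {self.name} on {len(self.examples)} examples")


def correct_classification(agent: Agent, comment_text: str, correct_label: str) -> None:
    """Add the corrected example and retrain, mirroring a manual correction."""
    agent.examples.append((comment_text, correct_label))
    agent.retrain()
```

The takeaway: each correction becomes part of the agent's example set, so consistent corrections gradually sharpen the boundary the agent applies.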


Getting Help

If you need assistance with Moderation Agents:

  • Use the in-app chat (bottom-right corner) for immediate support

  • Contact your TrollWall account manager for strategic guidance

  • Provide specific examples and test results when reporting issues

For technical issues, include:

  • Screenshots of agent settings

  • Examples of misclassified comments

  • Test results showing unexpected behavior
