A new venture, Moonbounce, has emerged on the digital safety landscape, securing $12 million in funding to advance its innovative approach to content moderation. Co-led by Amplify Partners and StepStone Group, this significant investment underscores a growing industry recognition that traditional methods of policing online content are no longer adequate, especially with the rapid proliferation of generative artificial intelligence. At the helm of Moonbounce is Brett Levenson, a former business integrity leader at Facebook, whose experiences navigating the complex aftermath of the Cambridge Analytica scandal profoundly shaped his understanding of digital platforms’ inherent vulnerabilities and the urgent need for a more robust, scalable solution.
The Genesis of a Solution: Lessons from Social Media’s Front Lines
Levenson’s tenure at Facebook, beginning in 2019, placed him squarely within an organization grappling with immense public scrutiny over data privacy, misinformation, and harmful content. The Cambridge Analytica controversy, which exposed the misuse of user data for political profiling, served as a stark wake-up call for the entire social media industry, highlighting systemic failures in platform governance and content oversight. Levenson initially believed that technological enhancements alone could remedy Facebook’s pervasive content moderation issues. However, he quickly confronted a reality far more intricate than mere technical shortcomings.
The operational core of content moderation at the time often involved human reviewers, tasked with the unenviable job of sifting through vast quantities of flagged material. These individuals were frequently expected to internalize and apply a dense, multi-page policy document, often hastily machine-translated into their native languages, which introduced potential ambiguities. With a mere 30 seconds allotted per piece of content, reviewers had to make critical decisions: whether content violated the rules and, if so, the appropriate response: blocking the content, banning the user, or limiting its spread. Levenson observed that these rapid assessments yielded enforcement decisions that were, in his words, "slightly better than 50% accurate," barely better than chance. This reactive, often delayed, and imprecise system meant that harmful content could persist online for extended periods, amplifying its negative impact before any corrective measures were taken. The sheer scale of user-generated content, combined with the limits of human review, created an almost insurmountable challenge for platforms striving for a safe online environment.
The Evolving Landscape of Digital Harm
The internet’s early days presented a relatively straightforward moderation challenge, primarily focused on spam and basic inappropriate content, often handled by volunteer moderators or forum administrators. As social media platforms exploded in popularity during the 2000s and 2010s, the volume and complexity of user-generated content escalated exponentially. This period saw a rise in sophisticated adversarial tactics, including coordinated disinformation campaigns, hate speech, cyberbullying, and the rapid spread of viral harmful content. Platforms struggled to keep pace, leading to a reactive cycle of crisis management rather than proactive prevention. The "human cost" of moderation also became a significant concern, with reports detailing the psychological toll on content reviewers exposed to graphic and disturbing material daily.
The advent of generative artificial intelligence, particularly large language models (LLMs) and advanced image generators, has introduced an entirely new dimension to this long-standing problem. While offering remarkable creative and functional possibilities, these technologies can also generate harmful content at unprecedented scale and sophistication. Incidents such as AI chatbots providing self-harm guidance to vulnerable teenagers or generating non-consensual deepfake imagery have underscored the urgent need for robust safety protocols. The ability of AI to mimic human communication and creativity makes detecting and mitigating these new forms of harm far more challenging than filtering keywords or identifying simple visual patterns. This technological leap has magnified the existing weaknesses in content moderation, pushing the industry toward a critical inflection point where traditional approaches are simply unsustainable.
Moonbounce’s Innovative Approach: Policy as Code in Action
Levenson’s deep-seated frustration with the limitations of existing moderation frameworks led him to conceptualize "policy as code"—a paradigm shift from static, often ambiguous policy documents to dynamic, executable logic. This vision forms the bedrock of Moonbounce, a company engineered to provide an essential safety layer wherever digital content originates, whether from human users or advanced AI systems.
Moonbounce’s operational model leverages a proprietary large language model, specifically trained to interpret and operationalize a customer’s unique policy documents. This allows the system to evaluate content in real-time, often within an impressive 300 milliseconds, and trigger predefined actions. Depending on a customer’s preferences, these actions can range from subtly slowing down the distribution of potentially risky content for human review, to immediately blocking high-risk material. This swift, automated enforcement significantly reduces the window of opportunity for harmful content to spread, moving platforms from a reactive cleanup model to a proactive prevention strategy.
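To make that model concrete, here is a minimal sketch of what such a real-time enforcement loop could look like. Every name in it (the classify_content stub, the thresholds, the Action tiers) is a hypothetical illustration of the pattern described above, not Moonbounce's actual API; a production system would call a policy-tuned model where the stub sits.

```python
import time
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    ALLOW = "allow"   # content passes through unchanged
    SLOW = "slow"     # throttle distribution pending human review
    BLOCK = "block"   # stop high-risk content immediately

@dataclass
class PolicyDecision:
    action: Action
    risk_score: float   # model-estimated likelihood of a policy violation
    latency_ms: float   # time taken; the article cites a ~300 ms real-time budget

def classify_content(text: str) -> float:
    """Stand-in for a call to a policy-tuned LLM returning a risk score in [0, 1]."""
    return 0.9 if "self-harm" in text.lower() else 0.1  # demo heuristic only

def enforce(text: str, review_at: float = 0.5, block_at: float = 0.8) -> PolicyDecision:
    """Map one piece of content to a tiered action using customer-chosen thresholds."""
    start = time.perf_counter()
    score = classify_content(text)
    if score >= block_at:
        action = Action.BLOCK
    elif score >= review_at:
        action = Action.SLOW
    else:
        action = Action.ALLOW
    return PolicyDecision(action, score, (time.perf_counter() - start) * 1000)
```

The tiered thresholds mirror the description above: each customer decides whether borderline content is merely slowed for human review or blocked outright.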
The concept of "policy as code" is transformative because it imbues moderation rules with the same precision and consistency as software algorithms. Instead of human interpretation, which is prone to variability and cognitive load, policies become machine-readable directives that can be executed with speed and accuracy across massive datasets. This approach enables a level of consistency and scalability that is virtually impossible with human-centric or basic keyword-based systems. Moonbounce currently serves three primary verticals: platforms dealing with high volumes of user-generated content, such as dating applications; AI companies developing character-based or companion applications; and providers of AI image generation tools. The company’s technology is already supporting over 40 million daily reviews and catering to more than 100 million daily active users across various platforms, including AI companion startup Channel AI, image and video generation firm Civitai, and character roleplay platforms Dippy AI and Moescape.
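The "policy as code" idea itself can be illustrated with an equally hedged sketch: policy clauses become executable conditions evaluated identically every time, rather than prose a rushed reviewer must interpret. The rule names and conditions below are invented for illustration; in practice each condition would be backed by a model classifier rather than a substring check.

```python
from typing import Callable, NamedTuple

class PolicyRule(NamedTuple):
    """One clause of a policy document, expressed as executable logic."""
    name: str
    applies: Callable[[str], bool]  # machine-checkable condition
    action: str                     # enforcement outcome when the condition holds

# Hypothetical clauses a dating platform might encode.
RULES = [
    PolicyRule("no_off_platform_solicitation",
               lambda t: "send me your number" in t.lower(), "slow"),
    PolicyRule("no_harassment",
               lambda t: "you are worthless" in t.lower(), "block"),
]

def evaluate(text: str) -> str:
    """Apply every clause the same way on every item, at machine speed."""
    for rule in RULES:
        if rule.applies(text):
            return rule.action
    return "allow"
```

The payoff is exactly the consistency the paragraph describes: the same clause produces the same outcome on item one and item forty million.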
Safety as a Differentiator: Market and Industry Impact
In an increasingly competitive digital landscape, Moonbounce posits that robust safety features can transition from being a burdensome compliance necessity to a compelling product benefit. Levenson articulates a vision where safety is not an afterthought but an integral component of a product’s value proposition, a strategic differentiator that enhances user trust and engagement. This perspective is gaining traction within the industry, as evidenced by major platforms like Tinder, whose head of trust and safety recently detailed how LLM-powered services have achieved a tenfold improvement in detection accuracy for their platform. Such enhancements not only protect users but also bolster a brand’s reputation and foster a healthier digital ecosystem.
The growing market for specialized trust and safety solutions reflects a broader societal demand for safer online environments. Investors like Lenny Pruss, General Partner at Amplify Partners, emphasize the critical need for real-time, objective guardrails in an era dominated by AI-mediated applications. Pruss notes that while content moderation has always challenged online platforms, the ubiquity of LLMs amplifies this challenge exponentially. The investment in Moonbounce reflects a belief that such objective, programmatic safety mechanisms are essential for the foundational integrity of future digital interactions. This shift highlights a proactive industry response to mounting legal and reputational pressures. AI companies, in particular, face heightened scrutiny and potential liability following incidents involving their technologies generating harmful or illicit content. Consequently, many are now actively seeking external expertise to bolster their safety infrastructure, recognizing that in-house solutions may not suffice. Moonbounce’s position as a third-party intermediary offers a distinct advantage, allowing it to focus solely on policy enforcement without being overwhelmed by the conversational context that often bogs down internal chatbot systems.
Beyond Blocking: Towards Nuanced Intervention
Looking ahead, Moonbounce is developing capabilities that transcend simple content blocking or removal. One such innovation is "iterative steering," a sophisticated approach designed to address complex and sensitive interactions, particularly those involving vulnerable users. This initiative comes in response to tragic incidents, such as the 2024 case in which a 14-year-old reportedly became obsessed with an AI chatbot, leading to a lawsuit against Character.AI.
Instead of an immediate, blunt refusal or termination of conversation when potentially harmful topics arise, iterative steering aims to dynamically intercept and redirect the dialogue. This involves modifying user prompts in real-time to guide the chatbot toward delivering a more actively supportive and constructive response. For instance, if a user expresses suicidal ideation, the system would intervene to steer the chatbot away from any potentially harmful or unhelpful responses, instead guiding it to offer empathetic listening combined with helpful resources or crisis intervention suggestions. This nuanced approach moves beyond binary judgment calls, seeking to foster a more positive and therapeutic interaction while still upholding safety policies. It represents a significant step towards creating AI systems that are not just safe, but also genuinely beneficial and supportive, especially in delicate situations.
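A rough sketch of that interception pattern, under assumed mechanics rather than Moonbounce's actual implementation: the safety layer sits between the user and the chatbot, and when it detects a high-risk message it rewrites the prompt with steering guidance instead of refusing outright. The marker list, the is_high_risk stub, and the wording of the instruction are all illustrative.

```python
from typing import Callable

CRISIS_MARKERS = ("want to die", "kill myself", "hurt myself")  # illustrative only

STEERING_INSTRUCTION = (
    "The user may be in crisis. Respond with empathy, acknowledge their feelings, "
    "stay supportive rather than refusing, and gently share crisis resources."
)

def is_high_risk(message: str) -> bool:
    """Stand-in for a trained risk classifier."""
    lowered = message.lower()
    return any(marker in lowered for marker in CRISIS_MARKERS)

def steer(user_message: str, call_chatbot: Callable[[str], str]) -> str:
    """Intercept the message and, if risky, prepend guidance before the model sees it."""
    if is_high_risk(user_message):
        # Redirect rather than terminate: the chatbot still answers, but under
        # instructions pushing it toward a supportive, resource-bearing reply.
        return call_chatbot(f"{STEERING_INSTRUCTION}\n\nUser: {user_message}")
    return call_chatbot(user_message)
```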
The Future of Digital Governance and Moonbounce’s Role
The emergence of specialized third-party solutions like Moonbounce signals a maturing phase in digital governance. By offering independent, expert-driven safety infrastructure, these companies can help standardize best practices across various platforms and AI applications, fostering a more secure and trustworthy digital landscape. This independence also potentially mitigates concerns about proprietary restrictions or biases that might arise if a major platform developed and controlled the leading moderation technology.
Levenson, alongside co-founder Ash Bhardwaj, a former Apple colleague with extensive experience in large-scale cloud and AI infrastructure, envisions a future where robust safety mechanisms are universally accessible and adaptable. While acknowledging the potential for acquisition by a tech giant like Meta—a full-circle moment given his past—Levenson expresses a clear desire to prevent the technology from being restricted or monopolized. His primary concern is that a proprietary acquisition could limit the broader public benefit of Moonbounce’s innovations, hindering their widespread adoption across the diverse ecosystem of AI-driven applications. This stance reflects a broader aspiration within the burgeoning AI safety sector: to democratize access to advanced moderation tools, ensuring that all platforms, regardless of their size or resources, can contribute to building a safer, more responsible digital future. As generative AI continues its inexorable march into every facet of digital life, the imperative for effective, scalable, and ethically sound content moderation will only grow, making solutions like Moonbounce critically important for the sustained health and trustworthiness of our online world.