Silly Hacks 2026

doomscRL

Training RL agents to optimize TikTok scrolling patterns for maximal brainrot, using Meta's TRIBEv2 brain-response model.

Demo · GitHub

✺✺✺

The premise

Classic neuroscience experiment setup: monkey with recording electrode watching a stimulus screen, with juice reward mechanism — except we're doing this to ourselves, voluntarily, for free

Social media companies spend billions engineering recommendation algorithms that optimize for doomscroll-time.

But...

What if we want to optimize for doomscroll-intensity instead?

We can leverage reinforcement learning to train an agent that optimizes the scrolling pattern with the explicit objective of maximizing predicted cortical activation across brain-surface vertices during a TikTok session.

Bye bye manual scrolling.

No thanks i use ai

✺✺✺

How?

1

TikTok Videos

878 videos selected from the TikTok-10M dataset

Clustered into 9 groups by similarity
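The clustering step can be sketched with a toy k-means over video embeddings. This is a minimal stand-in, not the project's actual pipeline: the real embeddings would come from the extracted V-JEPA 2 features, and the 16-dim random vectors here are placeholders.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Toy k-means: cluster the rows of X into k groups."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        # Recompute each center as its cluster mean (skip empty clusters).
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Stand-in for 878 video embeddings (the real ones come from the feature extractor).
X = np.random.default_rng(1).normal(size=(878, 16))
labels, centers = kmeans(X, k=9)
```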

2

Brain Model

TRIBE v2 FmriEncoder predicts cortical response.

20,484 surface vertices

3

RL Scrolling Agent

PPO learns when to scroll and what video cluster to select next

Reward = avg activation + delta

We simulate how each moment in a TikTok session would activate the brain with a pretrained FmriEncoder, derive a heuristic for dopamine usage to use as a reward signal, and then train an RL agent to discover the scrolling behavior that produces the highest overall activation.
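The loop above can be sketched as a tiny gym-style environment. Everything here is a hypothetical stand-in: `DoomscrollEnv` is not the repo's class, and the random activations replace the real FmriEncoder prediction; only the action structure (scroll or stay, plus a cluster choice) and the reward shape follow the write-up.

```python
import numpy as np

class DoomscrollEnv:
    """Toy scrolling environment: each step the agent either stays on the
    current video or scrolls, and when it scrolls it also picks which of
    the 9 clusters to sample the next video from."""

    N_CLUSTERS = 9
    N_VERTICES = 20484  # fsaverage5 cortical surface vertices

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.prev_act = None

    def reset(self):
        self.prev_act = np.zeros(self.N_VERTICES)
        return self.prev_act

    def step(self, action):
        scroll, cluster = action  # scroll: 0/1, cluster: 0..8
        # Placeholder for the FmriEncoder prediction on the current window.
        act = self.rng.normal(size=self.N_VERTICES)
        # Dopamine-style reward: mean |activation| plus change since last step.
        reward = np.abs(act).mean() + np.linalg.norm(act - self.prev_act) / self.N_VERTICES
        self.prev_act = act
        return act, reward, False, {}

env = DoomscrollEnv()
obs = env.reset()
obs, r, done, _ = env.step((1, 3))
```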

Gem reward visualization
Coal reward visualization

Architecture

Full RL data flow from raw video, through the FmriEncoder transformer, to brain activation predictions, reward computation, and PPO agent actions: scrolling and next video selection.

Feature Extraction (Preprocessing)

  • V-JEPA 2 — 8 layer groups × 1,280-dim visual features
  • Wav2Vec-BERT 2.0 — 9 layer groups × 1,024-dim audio features
  • Concatenated per-modality, projected into shared 1,152-dim space
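Shape-wise, the fusion step looks roughly like the sketch below. The dimensions come from the bullets above; how TRIBE actually tokenizes and fuses the layer groups is an assumption here, and the random projection matrices stand in for learned ones.

```python
import numpy as np

rng = np.random.default_rng(0)
D_SHARED = 1152

# Extracted features for one timestep (dims from the write-up).
video_feat = rng.normal(size=(8, 1280))   # V-JEPA 2: 8 layer groups x 1280
audio_feat = rng.normal(size=(9, 1024))   # Wav2Vec-BERT 2.0: 9 groups x 1024

# Concatenate layer groups per modality, then project into the shared space.
W_video = rng.normal(size=(8 * 1280, D_SHARED)) * 0.01
W_audio = rng.normal(size=(9 * 1024, D_SHARED)) * 0.01
video_tok = video_feat.reshape(-1) @ W_video
audio_tok = audio_feat.reshape(-1) @ W_audio

tokens = np.stack([video_tok, audio_tok])  # one 1,152-dim token per modality
```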

Brain Model (runs during RL)

  • FmriEncoder: 8 transformer blocks with RoPE, ScaleNorm, scaled residuals
  • Low-rank projection head → 2,048 dims before final prediction
  • Subject-averaged readout layer → 20,484 cortical surface vertices (fsaverage5)
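The output path can be sketched as two linear maps: a low-rank head into 2,048 dims, then a readout onto the 20,484 fsaverage5 vertices. Weights here are random placeholders for the trained parameters, and the exact factorization is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, RANK, D_HEAD, N_VERTICES = 1152, 64, 2048, 20484

h = rng.normal(size=(D_MODEL,))  # transformer output for one timestep

# Low-rank projection head: factor a 1152 -> 2048 map through rank 64
# (the rank value is illustrative, not from the write-up).
A = rng.normal(size=(D_MODEL, RANK)).astype(np.float32) * 0.05
B = rng.normal(size=(RANK, D_HEAD)).astype(np.float32) * 0.05
z = (h @ A) @ B  # -> 2,048 dims

# Subject-averaged linear readout onto the cortical surface.
W_read = rng.normal(size=(D_HEAD, N_VERTICES)).astype(np.float32) * 0.01
activation = z @ W_read  # -> 20,484 predicted vertex responses
```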

Reward Signal

  • DopamineReward: α × mean(|activation|) + β × ‖Δactivation‖
  • CortisolReward variant: region-weighted activation (anterior-ventral 2.2×)
  • Optional switch penalty + minimum dwell time enforcement
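The reward terms above translate directly into code. These lowercase functions are illustrative sketches of the `DopamineReward` and `CortisolReward` ideas, not the repo's implementations; the switch-penalty wiring is one plausible reading of the bullets.

```python
import numpy as np

def dopamine_reward(act, prev_act, alpha=1.0, beta=0.1,
                    switched=False, switch_penalty=0.0,
                    dwell=0, min_dwell=0):
    """alpha * mean(|activation|) + beta * ||delta activation||,
    minus an optional penalty when the agent scrolls before min_dwell."""
    r = alpha * np.abs(act).mean() + beta * np.linalg.norm(act - prev_act)
    if switched and dwell < min_dwell:
        r -= switch_penalty
    return r

def cortisol_reward(act, region_mask, weight=2.2):
    """Variant: up-weight anterior-ventral vertices 2.2x before averaging."""
    w = np.where(region_mask, weight, 1.0)
    return np.abs(w * act).mean()

act = np.ones(20484)
prev = np.zeros(20484)
r = dopamine_reward(act, prev, alpha=1.0, beta=0.1)
```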

Trained variants

We explore three agent configurations with different RL setups and reward functions.

a · ScrollSelect

select_baseline

Baseline select policy — PPO + MLP, no penalties, dopamine reward

b · ScrollSelect

select_recurrent_lstm

LSTM policy with brain-region observations, giving it richer context

c · ScrollSelect

select_cortisol

Cortisol reward: weights anterior-ventral regions 2.2× for stress-like activation

Training ran on 5× RTX PRO 4500 instances for about 12 hours, at a total cost of about $30.

Training runs on RunPod GPU instances

✺✺✺

Training Results

They actually learn! Kinda. Here are the metrics from training the agents on 878 TikTok videos, each for 500K timesteps.

They basically learned to reward-hack the environment by scrolling as fast as possible (2 Hz, the brain model's sampling frequency), which conveniently seems to fry the brain the most. They learn this behavior even with explicit penalties for fast scrolling. Technically speaking, though, this is what we asked for.

Training metrics showing the agents overfitting


✺✺✺

Demo


✺✺✺

What have we done?

how it feels to manually scroll

how it feels when ai scrolls for you

We have created the optimal doomscroller, applying literal state-of-the-art technology to the problem of maximizing dopamine consumption. Counter-intuitively, this could save you time and energy. Possibly. Use the model responsibly.

Pure brainrot
View on GitHub