    A Composable Message Parser for Chatbots

    Chatbot code often starts as a handful of if statements:

    if (message.text.startsWith('/start')) {
      // ...
    } else if (message.text.startsWith('/help')) {
      // ...
    } else if (message.replyToBot) {
      // ...
    }

    That works until the bot grows permissions, group chats, replies, mentions, admin-only commands, feature flags, and fallback behavior. The problem is not that if is bad. The problem is that command detection, authorization, side effects, and event publishing become tangled in one place.

    A cleaner approach is to treat message handling as a small pipeline:

    • predicates decide whether a parser should run;
    • parsers turn a message into domain events;
    • the dispatcher publishes events and stops when a parser says the message is fully handled.

    This keeps each rule small enough to test in isolation.

    Core Types

    Here is a compact TypeScript version:

    type ChatType = 'private' | 'group';
    type Role = 'guest' | 'member' | 'admin' | 'owner';
    
    type Message = {
      id: string;
      text: string;
      chat: { type: ChatType };
      from: { id: string; role: Role };
      replyToBot?: boolean;
      mentions?: string[];
    };
    
    type BotEvent =
      | { type: 'ACCESS_DENIED'; userId: string }
      | { type: 'ASK_QUESTION'; userId: string; text: string }
      | { type: 'SHOW_HELP'; userId: string };
    
    type ParseCommand = {
      event: BotEvent;
      stop?: boolean;
    };
    
    type Predicate = (message: Message) => boolean;
    type Parser = {
      name: string;
      when: Predicate;
      parse: (message: Message) => Iterable<ParseCommand>;
    };

    The important split is when versus parse.

    when is a pure yes/no decision. parse produces one or more commands. Each command carries an event and an optional stop flag; when stop is set, the dispatcher stops consuming commands, so the rest of this parser and all later parsers are skipped.
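
    As a rough illustration of a parser that produces more than one command, here is a sketch built around a hypothetical parser, parseHelpWithQuestion, which is not part of the routing table shown later. It assumes a message of the form "/help <question>"; the types are repeated (slightly trimmed) so the snippet stands alone.

```typescript
// Types repeated from the article so this sketch is self-contained.
type ChatType = 'private' | 'group';
type Role = 'guest' | 'member' | 'admin' | 'owner';

type Message = {
  id: string;
  text: string;
  chat: { type: ChatType };
  from: { id: string; role: Role };
};

type BotEvent =
  | { type: 'ACCESS_DENIED'; userId: string }
  | { type: 'ASK_QUESTION'; userId: string; text: string }
  | { type: 'SHOW_HELP'; userId: string };

type ParseCommand = { event: BotEvent; stop?: boolean };

// Hypothetical parser: "/help <question>" shows help and forwards the question.
function* parseHelpWithQuestion(message: Message): Generator<ParseCommand> {
  // No stop flag: the consumer keeps pulling commands.
  yield { event: { type: 'SHOW_HELP', userId: message.from.id } };

  // stop: true marks the message as fully handled.
  yield {
    event: {
      type: 'ASK_QUESTION',
      userId: message.from.id,
      text: message.text.replace(/^\/help\s*/, '').trim(),
    },
    stop: true,
  };
}

const helpMessage: Message = {
  id: 'm1',
  text: '/help how do queues work',
  chat: { type: 'private' },
  from: { id: 'u1', role: 'member' },
};

// Two commands come out: SHOW_HELP without stop, then ASK_QUESTION with stop.
const helpCommands = Array.from(parseHelpWithQuestion(helpMessage));
```

    Only the last command sets stop, so a consumer that honors the flag would publish both events before it quits.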

    Processing a Message

    The processing function is a generator. It does not publish anything by itself; it only yields commands.

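    function* processMessage(
      parsers: Parser[],
      message: Message,
    ): Generator<ParseCommand> {
      for (const parser of parsers) {
        try {
          // Skip parsers whose predicate rejects the message.
          if (!parser.when(message)) {
            continue;
          }
    
          // Delegate to the parser and forward each command to the caller.
          yield* parser.parse(message);
        } catch (error) {
          // A throwing predicate or parser is logged and skipped;
          // later parsers still get a chance to run.
          console.error(`Parser failed: ${parser.name}`, error);
        }
      }
    }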
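
    The try/catch gives each parser its own failure boundary. The sketch below illustrates that under some assumptions: the parsers broken-predicate and partial are invented for the demonstration, and the types are trimmed copies of the ones above.

```typescript
// Trimmed copies of the article's types, repeated for self-containment.
type Message = { id: string; text: string; from: { id: string } };

type BotEvent =
  | { type: 'ACCESS_DENIED'; userId: string }
  | { type: 'ASK_QUESTION'; userId: string; text: string }
  | { type: 'SHOW_HELP'; userId: string };

type ParseCommand = { event: BotEvent; stop?: boolean };

type Parser = {
  name: string;
  when: (message: Message) => boolean;
  parse: (message: Message) => Iterable<ParseCommand>;
};

// Same generator as in the article.
function* processMessage(
  parsers: Parser[],
  message: Message,
): Generator<ParseCommand> {
  for (const parser of parsers) {
    try {
      if (!parser.when(message)) {
        continue;
      }
      yield* parser.parse(message);
    } catch (error) {
      console.error(`Parser failed: ${parser.name}`, error);
    }
  }
}

// Hypothetical parser that yields one command and then fails.
function* parsePartial(message: Message): Generator<ParseCommand> {
  yield {
    event: { type: 'ASK_QUESTION', userId: message.from.id, text: message.text },
  };
  throw new Error('lookup failed after the first command');
}

function* parseHelp(message: Message): Generator<ParseCommand> {
  yield { event: { type: 'SHOW_HELP', userId: message.from.id }, stop: true };
}

const flakyParsers: Parser[] = [
  {
    name: 'broken-predicate',
    // Hypothetical predicate that throws instead of answering.
    when: () => {
      throw new Error('predicate blew up');
    },
    parse: parseHelp,
  },
  { name: 'partial', when: () => true, parse: parsePartial },
  { name: 'help', when: () => true, parse: parseHelp },
];

const sampleMessage: Message = { id: 'm1', text: 'hello', from: { id: 'u1' } };

// Collect every command's event type; Array.from ignores the stop flag.
// Both failures are logged through console.error and then skipped.
const seenTypes = Array.from(
  processMessage(flakyParsers, sampleMessage),
  (command) => command.event.type,
);
```

    One consequence worth knowing: the command yielded before the failure has already reached the caller, so a publisher would have sent it. If partial output is unacceptable, a parser can build its commands in an array first and yield them only once nothing can throw.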

    Publishing stays at the edge:

    type Publish = (event: BotEvent) => Promise<void>;
    
    const createMessageHandler =
      (parsers: Parser[], publish: Publish) => async (message: Message) => {
        for (const command of processMessage(parsers, message)) {
          await publish(command.event);
    
          if (command.stop) {
            break;
          }
        }
      };

    This boundary is useful. You can test parsing without a network, database, queue, or bot API client.

    Predicate Combinators

    Predicates should be boring. Small predicates are easy to compose:

    const and =
      (...predicates: Predicate[]): Predicate =>
      (message) =>
        predicates.every((predicate) => predicate(message));
    
    const or =
      (...predicates: Predicate[]): Predicate =>
      (message) =>
        predicates.some((predicate) => predicate(message));
    
    const not =
      (predicate: Predicate): Predicate =>
      (message) =>
        !predicate(message);

    Then define domain predicates:

    const inPrivateChat: Predicate = (message) => message.chat.type === 'private';
    
    const repliesToBot: Predicate = (message) => message.replyToBot === true;
    
    const mentionsBot =
      (botName: string): Predicate =>
      (message) =>
        message.mentions?.includes(botName) ?? false;
    
    const startsWithBotName =
      (botName: string): Predicate =>
      (message) =>
        message.text.trim().toLowerCase().startsWith(botName.toLowerCase());
    
    const hasCommonRights: Predicate = (message) =>
      ['member', 'admin', 'owner'].includes(message.from.role);
    
    const hasAdminRights: Predicate = (message) =>
      ['admin', 'owner'].includes(message.from.role);

    Now “addressed to the bot” is readable:

    const addressedToBot = or(
      inPrivateChat,
      repliesToBot,
      mentionsBot('botname'),
      startsWithBotName('botname'),
    );
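
    The same combinators cover rules such as admin-only commands in group chats. The following is a minimal sketch, assuming a hypothetical hasCommand predicate and an invented /ban command; neither is used elsewhere in this article.

```typescript
// Trimmed copies of the article's types, repeated for self-containment.
type ChatType = 'private' | 'group';
type Role = 'guest' | 'member' | 'admin' | 'owner';

type Message = {
  id: string;
  text: string;
  chat: { type: ChatType };
  from: { id: string; role: Role };
};

type Predicate = (message: Message) => boolean;

const and =
  (...predicates: Predicate[]): Predicate =>
  (message) =>
    predicates.every((predicate) => predicate(message));

const not =
  (predicate: Predicate): Predicate =>
  (message) =>
    !predicate(message);

const inPrivateChat: Predicate = (message) => message.chat.type === 'private';

const hasAdminRights: Predicate = (message) =>
  ['admin', 'owner'].includes(message.from.role);

// Hypothetical predicate: true when the first word of the text is the command.
const hasCommand =
  (command: string): Predicate =>
  (message) =>
    message.text.trim().split(/\s+/)[0] === command;

// '/ban' is an invented command used only for illustration.
const adminBanInGroup = and(
  not(inPrivateChat),
  hasAdminRights,
  hasCommand('/ban'),
);

const makeMessage = (text: string, role: Role, chatType: ChatType): Message => ({
  id: 'm1',
  text,
  chat: { type: chatType },
  from: { id: 'u1', role },
});

const banByAdmin = adminBanInGroup(makeMessage('/ban u42', 'admin', 'group'));
const banByMember = adminBanInGroup(makeMessage('/ban u42', 'member', 'group'));
const banInPrivate = adminBanInGroup(makeMessage('/ban u42', 'admin', 'private'));
```

    Only banByAdmin is true: the member fails hasAdminRights, and the private chat fails not(inPrivateChat).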

    Cache Expensive Predicates

    Sometimes a predicate is not just a field check. It may look up a user, inspect a thread, or parse a command. If several parsers ask the same question about the same message object, cache the answer:

    const withCache = (predicate: Predicate): Predicate => {
      const cache = new WeakMap<Message, boolean>();
    
      return (message) => {
        if (cache.has(message)) {
          return cache.get(message)!;
        }
    
        const result = predicate(message);
        cache.set(message, result);
        return result;
      };
    };
    
    const addressedToBot$ = withCache(addressedToBot);
    const hasCommonRights$ = withCache(hasCommonRights);

    WeakMap is a good fit because the key is the message object itself. Once the message object is no longer reachable, its cache entry can be garbage-collected too.

    This is not a cross-request cache. It only helps when the same object flows through one processing pass.
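
    To make the identity-based behavior concrete, here is a small sketch that counts evaluations. The Message type is trimmed down, and expensiveCheck is a stand-in for a genuinely slow predicate; both are assumptions for the demonstration.

```typescript
// Trimmed Message type; only what the sketch needs.
type Message = { id: string; text: string };
type Predicate = (message: Message) => boolean;

// Same helper as in the article.
const withCache = (predicate: Predicate): Predicate => {
  const cache = new WeakMap<Message, boolean>();

  return (message) => {
    if (cache.has(message)) {
      return cache.get(message)!;
    }

    const result = predicate(message);
    cache.set(message, result);
    return result;
  };
};

// Stand-in for an expensive predicate; the counter records real evaluations.
let evaluations = 0;
const expensiveCheck: Predicate = (message) => {
  evaluations += 1;
  return message.text.includes('?');
};
const expensiveCheck$ = withCache(expensiveCheck);

const originalMessage: Message = { id: 'm1', text: 'is this cached?' };
// Same content, but a different object.
const lookalikeMessage: Message = { id: 'm1', text: 'is this cached?' };

expensiveCheck$(originalMessage);
expensiveCheck$(originalMessage); // served from the cache
const afterSameObject = evaluations; // 1

expensiveCheck$(lookalikeMessage); // cache miss: the key is object identity
const afterLookalike = evaluations; // 2
```

    The cache keys on identity, not content. Code that copies the message between parsers (for example with object spread) silently loses the cached answers.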

    Parser Examples

    Parsers can be tiny generator functions:

    function* parseAccessDenied(message: Message): Generator<ParseCommand> {
      yield {
        event: { type: 'ACCESS_DENIED', userId: message.from.id },
        stop: true,
      };
    }
    
    function* parseHelp(message: Message): Generator<ParseCommand> {
      yield {
        event: { type: 'SHOW_HELP', userId: message.from.id },
        stop: true,
      };
    }
    
    function* parseQuestion(message: Message): Generator<ParseCommand> {
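      // Trim first so leading whitespace does not hide the bot name
      // (startsWithBotName trims before matching as well).
      const text = message.text.trim().replace(/^botname/i, '').trim();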
    
      yield {
        event: { type: 'ASK_QUESTION', userId: message.from.id, text },
        stop: true,
      };
    }

    Then the parser list becomes a routing table:

    const parsers: Parser[] = [
      {
        name: 'access-denied',
        when: and(addressedToBot$, not(hasCommonRights$)),
        parse: parseAccessDenied,
      },
      {
        name: 'help',
        when: and(addressedToBot$, hasCommonRights$, (message) =>
          message.text.includes('/help'),
        ),
        parse: parseHelp,
      },
      {
        name: 'question',
        when: and(addressedToBot$, hasCommonRights$),
        parse: parseQuestion,
      },
    ];

    Order matters. Parsers run from top to bottom, and the first command with stop: true ends processing. Put narrow, high-priority parsers first. Put broad fallback parsers last.
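
    A quick way to see the effect of ordering is to route the same message through two differently ordered lists. This sketch uses simplified predicates and a hypothetical helper, routeTypes, that mirrors the handler's loop but returns event types instead of publishing.

```typescript
// Trimmed copies of the article's types, repeated for self-containment.
type Message = { id: string; text: string; from: { id: string } };

type BotEvent =
  | { type: 'ACCESS_DENIED'; userId: string }
  | { type: 'ASK_QUESTION'; userId: string; text: string }
  | { type: 'SHOW_HELP'; userId: string };

type ParseCommand = { event: BotEvent; stop?: boolean };

type Parser = {
  name: string;
  when: (message: Message) => boolean;
  parse: (message: Message) => Iterable<ParseCommand>;
};

function* processMessage(
  parsers: Parser[],
  message: Message,
): Generator<ParseCommand> {
  for (const parser of parsers) {
    try {
      if (!parser.when(message)) {
        continue;
      }
      yield* parser.parse(message);
    } catch (error) {
      console.error(`Parser failed: ${parser.name}`, error);
    }
  }
}

// Hypothetical helper: same loop as the handler, minus the publishing.
const routeTypes = (parsers: Parser[], message: Message): string[] => {
  const types: string[] = [];
  for (const command of processMessage(parsers, message)) {
    types.push(command.event.type);
    if (command.stop) {
      break;
    }
  }
  return types;
};

function* parseHelp(message: Message): Generator<ParseCommand> {
  yield { event: { type: 'SHOW_HELP', userId: message.from.id }, stop: true };
}

function* parseQuestion(message: Message): Generator<ParseCommand> {
  yield {
    event: { type: 'ASK_QUESTION', userId: message.from.id, text: message.text },
    stop: true,
  };
}

// Narrow rule: only messages that contain /help.
const helpParser: Parser = {
  name: 'help',
  when: (message) => message.text.includes('/help'),
  parse: parseHelp,
};

// Broad fallback: matches everything.
const questionParser: Parser = {
  name: 'question',
  when: () => true,
  parse: parseQuestion,
};

const helpRequest: Message = { id: 'm1', text: '/help', from: { id: 'u1' } };

const narrowFirst = routeTypes([helpParser, questionParser], helpRequest);
const broadFirst = routeTypes([questionParser, helpParser], helpRequest);
```

    With the narrow parser first, the result is SHOW_HELP. With the fallback first, the help request is swallowed as ASK_QUESTION and the help parser never runs.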

    Tests

    The predicate layer gives you cheap tests:

    import { expect, test, vi } from 'vitest';
    
    const message: Message = {
      id: 'm1',
      text: 'botname explain queues',
      chat: { type: 'group' },
      from: { id: 'u1', role: 'member' },
    };
    
    test('or stops after the first true predicate', () => {
      const first = vi.fn(() => true);
      const second = vi.fn(() => true);
    
      expect(or(first, second)(message)).toBe(true);
      expect(first).toHaveBeenCalledWith(message);
      expect(second).not.toHaveBeenCalled();
    });
    
    test('handler publishes exactly one ASK_QUESTION event', async () => {
      const publish = vi.fn(async () => undefined);
      const handleMessage = createMessageHandler(parsers, publish);
    
      await handleMessage(message);
    
      // One event only: the question parser sets stop, so nothing else is published.
      expect(publish).toHaveBeenCalledTimes(1);
      expect(publish).toHaveBeenCalledWith({
        type: 'ASK_QUESTION',
        userId: 'u1',
        text: 'explain queues',
      });
    });

    The first test checks the combinator. The second checks the public behavior: given a message, which event is published?

    Practical Rules

    Keep predicates pure when possible. A predicate named hasAdminRights should not publish metrics, mutate the message, or write to storage.

    Keep random behavior out of predicates. If you want a bot to answer 5% of ambient group messages, inject the random decision at the edge and make it testable. Hidden randomness inside predicates makes routing hard to reproduce.
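
    One possible shape for this, sketched under assumptions: the edge decides once and records the outcome on the message, and the predicate only reads it. The ambientSample field and the markAmbientSample helper are hypothetical and not part of the Message type defined earlier.

```typescript
// Trimmed Message type; ambientSample is a hypothetical field for this sketch.
type Message = { id: string; text: string; ambientSample?: boolean };
type Predicate = (message: Message) => boolean;

// Edge-level decision. The random source is injectable, so tests can pin it.
const markAmbientSample = (
  message: Message,
  rate: number,
  random: () => number = Math.random,
): Message => ({
  ...message,
  ambientSample: random() < rate,
});

// The predicate stays pure: it reads a field and nothing else.
const isAmbientSample: Predicate = (message) => message.ambientSample === true;

const incoming: Message = { id: 'm1', text: 'just chatting' };

// Deterministic "random" sources for illustration.
const picked = markAmbientSample(incoming, 0.05, () => 0.01);
const skipped = markAmbientSample(incoming, 0.05, () => 0.5);

const pickedResult = isAmbientSample(picked); // true, because 0.01 < 0.05
const skippedResult = isAmbientSample(skipped); // false, because 0.5 >= 0.05
```

    Because the decision is made before routing, replaying a stored message with its recorded flag reproduces the same route.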

    Keep parser events domain-level. A parser should emit “ask a question” or “access denied”, not “call this Telegram endpoint”. Infrastructure belongs behind the publisher.
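
    As an illustration of where the infrastructure could live, the sketch below builds a publisher from a pure event-to-reply mapping plus a transport function. SendText, toOutgoing, createPublisher, and the reply texts are all invented for this example; a real bot would plug in its own API client.

```typescript
type BotEvent =
  | { type: 'ACCESS_DENIED'; userId: string }
  | { type: 'ASK_QUESTION'; userId: string; text: string }
  | { type: 'SHOW_HELP'; userId: string };

type Publish = (event: BotEvent) => Promise<void>;

// Hypothetical transport: stands in for whatever chat API client the bot uses.
type SendText = (userId: string, text: string) => Promise<void>;

type Outgoing = { userId: string; text: string };

// Pure mapping from a domain event to an outgoing reply. The texts are invented.
const toOutgoing = (event: BotEvent): Outgoing => {
  switch (event.type) {
    case 'ACCESS_DENIED':
      return { userId: event.userId, text: 'Sorry, you are not allowed to do that.' };
    case 'SHOW_HELP':
      return { userId: event.userId, text: 'Ask me a question, or send /help.' };
    case 'ASK_QUESTION':
      return { userId: event.userId, text: `Looking into: ${event.text}` };
  }
};

// The publisher is the only place that touches the transport.
const createPublisher =
  (sendText: SendText): Publish =>
  async (event) => {
    const outgoing = toOutgoing(event);
    await sendText(outgoing.userId, outgoing.text);
  };

// In-memory transport for local experiments and tests.
const sent: Outgoing[] = [];
const publishToMemory = createPublisher(async (userId, text) => {
  sent.push({ userId, text });
});
```

    Parsers never see SendText, so swapping the chat platform means replacing one function and leaving the routing table untouched.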

    This pattern is deliberately small. It gives you most of the benefit of a rules engine without turning the bot into a framework.