AI · Jan 20, 2024 · 10 min read

Building AI Bots with Persistent Memory for Discord & Telegram

Learn how to create intelligent chatbots that remember context across conversations using vector databases and LLMs.

Creating AI bots that remember conversations is a game-changer for user engagement. In this post, I'll walk you through how I built persistent memory into Discord and Telegram bots using vector databases and large language models.

The Problem with Stateless Bots

Most chatbots treat every message as a fresh start. They have no memory of previous interactions, which leads to frustrating experiences:

  • No context retention - Users have to repeat information constantly
  • Generic responses - The bot can't personalize based on history
  • Broken conversations - Multi-turn dialogues feel disconnected

The Solution: Vector Databases + LLMs

By combining vector databases (like Pinecone, Weaviate, or Chroma) with large language models, we can create bots that:

  • Remember user preferences and past conversations
  • Provide contextually relevant responses
  • Build genuine relationships with users over time
  • Reference previous discussions naturally

Architecture Overview

Here's how the system works:

```
User Message → Embedding Generation → Vector Search → Context Retrieval → LLM Response
                                           ↓
                              Vector Database (Long-term Memory)
```

Key Components

  1. Embedding Model - Converts text to vector representations
  2. Vector Database - Stores and indexes embeddings for fast similarity search
  3. LLM - Generates responses using retrieved context
  4. Memory Manager - Handles storage and retrieval logic
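To make the "similarity search" step concrete: under the hood, a vector database compares embeddings with a distance metric, most commonly cosine similarity. A minimal sketch of that core operation:

```typescript
// Cosine similarity between two embedding vectors: 1 means same
// direction (very similar text), 0 means unrelated, -1 means opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A vector database performs essentially this comparison, but over millions of stored vectors using approximate nearest-neighbor indexes rather than a linear scan.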

Implementation

1. Setting Up the Vector Store

```typescript
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index('bot-memory');
const openai = new OpenAI();

// Create embedding from text
async function createEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return response.data[0].embedding;
}
```

2. Storing Conversations

```typescript
interface MemoryEntry {
  userId: string;
  message: string;
  role: 'user' | 'assistant';
  timestamp: number;
}

async function storeMemory(entry: MemoryEntry): Promise<void> {
  const embedding = await createEmbedding(entry.message);

  await index.upsert([{
    id: `${entry.userId}-${entry.timestamp}`,
    values: embedding,
    metadata: {
      userId: entry.userId,
      message: entry.message,
      role: entry.role,
      timestamp: entry.timestamp,
    },
  }]);
}
```

3. Retrieving Relevant Context

```typescript
async function getRelevantMemories(
  userId: string,
  query: string,
  limit: number = 5
): Promise<string[]> {
  const queryEmbedding = await createEmbedding(query);

  const results = await index.query({
    vector: queryEmbedding,
    topK: limit,
    filter: { userId: { $eq: userId } },
    includeMetadata: true,
  });

  // Sort matches newest-first before formatting them for the prompt
  return results.matches
    .sort((a, b) =>
      Number(b.metadata?.timestamp ?? 0) - Number(a.metadata?.timestamp ?? 0)
    )
    .map(match => `[${match.metadata?.role}]: ${match.metadata?.message}`);
}
```

4. Generating Contextual Responses

```typescript
async function generateResponse(
  userId: string,
  userMessage: string
): Promise<string> {
  // Store the user's message
  await storeMemory({
    userId,
    message: userMessage,
    role: 'user',
    timestamp: Date.now(),
  });

  // Retrieve relevant past conversations
  const memories = await getRelevantMemories(userId, userMessage);

  // Build the prompt with context
  const systemPrompt = `You are a helpful assistant with memory of past conversations.

Here are relevant past interactions with this user:
${memories.join('\n')}

Use this context to provide personalized, contextually aware responses.`;

  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userMessage },
    ],
  });

  const assistantMessage = response.choices[0].message.content ?? '';

  // Store the assistant's response
  await storeMemory({
    userId,
    message: assistantMessage,
    role: 'assistant',
    timestamp: Date.now(),
  });

  return assistantMessage;
}
```

Discord Integration

```typescript
import { Client, GatewayIntentBits } from 'discord.js';

const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent,
  ],
});

client.on('messageCreate', async (message) => {
  if (message.author.bot) return;
  if (!message.content.startsWith('!ask')) return;

  const query = message.content.slice(5).trim();
  const userId = message.author.id;

  const response = await generateResponse(userId, query);
  await message.reply(response);
});

client.login(process.env.DISCORD_TOKEN);
```
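The same `generateResponse` helper plugs into Telegram with minimal glue. Here's one way to sketch it against the raw Telegram Bot API using Node 18+'s built-in `fetch` (a framework like Telegraf or grammY works just as well); `pollTelegram`, `extractQuery`, `Responder`, and the `TELEGRAM_BOT_TOKEN` variable name are my own choices here, not part of the Bot API:

```typescript
// Long-polling sketch against the raw Telegram Bot API.
// Assumes TELEGRAM_BOT_TOKEN is set and generateResponse comes from above.
type Responder = (userId: string, message: string) => Promise<string>;

const TG_API = `https://api.telegram.org/bot${process.env.TELEGRAM_BOT_TOKEN}`;

// Strip the /ask command (and optional @BotName suffix) from a message.
function extractQuery(text: string): string {
  return text.replace(/^\/ask(@\w+)?\s*/, '').trim();
}

async function pollTelegram(generateResponse: Responder): Promise<void> {
  let offset = 0;
  while (true) {
    // getUpdates with a timeout implements long polling
    const res = await fetch(`${TG_API}/getUpdates?timeout=30&offset=${offset}`);
    const { result } = (await res.json()) as { result: any[] };

    for (const update of result) {
      offset = update.update_id + 1; // acknowledge this update
      const msg = update.message;
      if (!msg?.text?.startsWith('/ask')) continue;

      const reply = await generateResponse(String(msg.from.id), extractQuery(msg.text));
      await fetch(`${TG_API}/sendMessage`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ chat_id: msg.chat.id, text: reply }),
      });
    }
  }
}

// pollTelegram(generateResponse); // start the bot
```

Because memory is keyed by `userId`, the Pinecone side needs no changes at all; if you want a user's Discord and Telegram histories merged, you'd need a mapping between the two platform IDs.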

Results

After implementing this system:

  • User engagement increased 3x - Users had longer, more meaningful conversations
  • Conversation length doubled - The average session went from 3 to 6+ messages
  • User satisfaction improved - Feedback indicated users felt "heard" and "remembered"

Best Practices

  1. Limit context window - Don't overwhelm the LLM with too much history
  2. Implement forgetting - Allow users to clear their conversation history
  3. Handle rate limits - Queue embedding operations to avoid API throttling
  4. Monitor costs - Vector operations and LLM calls can add up quickly
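For the first practice, limiting the context window, a crude but effective approach is to cap the retrieved memories at a character budget before building the prompt. A minimal sketch (`trimToBudget` is a hypothetical helper; ~4 characters per token is a rough rule of thumb for English text):

```typescript
// Keep memories (already ordered by relevance or recency) until a rough
// token budget is exhausted; ~4 characters per token for English text.
function trimToBudget(memories: string[], maxTokens = 1000): string[] {
  const charBudget = maxTokens * 4;
  const kept: string[] = [];
  let used = 0;

  for (const memory of memories) {
    if (used + memory.length > charBudget) break;
    kept.push(memory);
    used += memory.length;
  }
  return kept;
}
```

In production, a real tokenizer (e.g. tiktoken) gives exact counts, but this keeps prompt size bounded without another dependency.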

Conclusion

Building AI bots with persistent memory transforms the user experience from transactional to relational. The combination of vector databases and LLMs provides a powerful foundation for creating truly intelligent conversational agents.

Want to build something similar? Get in touch!