AI · Jan 20, 2024 · 10 min read

Building AI Bots with Persistent Memory for Discord & Telegram

Learn how to create intelligent chatbots that remember context across conversations using vector databases and LLMs.

Creating AI bots that remember conversations is a game-changer for user engagement. In this post, I'll walk you through how I built persistent memory into Discord and Telegram bots using vector databases and large language models.

The Problem with Stateless Bots

Most chatbots treat every message as a fresh start. They have no memory of previous interactions, which leads to frustrating experiences:

  • No context retention - Users have to repeat information constantly
  • Generic responses - The bot can't personalize based on history
  • Broken conversations - Multi-turn dialogues feel disconnected

The Solution: Vector Databases + LLMs

By combining vector databases (like Pinecone, Weaviate, or Chroma) with large language models, we can create bots that:

  • Remember user preferences and past conversations
  • Provide contextually relevant responses
  • Build genuine relationships with users over time
  • Reference previous discussions naturally

Architecture Overview

Here's how the system works:

```
User Message → Embedding Generation → Vector Search → Context Retrieval → LLM Response
                                           ↓
                              Vector Database (Long-term Memory)
```

Key Components

  1. Embedding Model - Converts text to vector representations
  2. Vector Database - Stores and indexes embeddings for fast similarity search
  3. LLM - Generates responses using retrieved context
  4. Memory Manager - Handles storage and retrieval logic
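To make the "similarity search" step concrete: under the hood, a vector database compares embeddings with a distance metric, most commonly cosine similarity. A minimal sketch of that core operation:

```typescript
// Cosine similarity between two embedding vectors: 1 means same
// direction (very similar text), 0 means unrelated, -1 means opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A vector database performs essentially this comparison, but over millions of stored vectors using approximate nearest-neighbor indexes rather than a linear scan.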

Implementation

1. Setting Up the Vector Store

```typescript
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index('bot-memory');
const openai = new OpenAI();

// Create embedding from text
async function createEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return response.data[0].embedding;
}
```

2. Storing Conversations

```typescript
interface MemoryEntry {
  userId: string;
  message: string;
  role: 'user' | 'assistant';
  timestamp: number;
}

async function storeMemory(entry: MemoryEntry): Promise<void> {
  const embedding = await createEmbedding(entry.message);

  await index.upsert([{
    id: `${entry.userId}-${entry.timestamp}`,
    values: embedding,
    metadata: {
      userId: entry.userId,
      message: entry.message,
      role: entry.role,
      timestamp: entry.timestamp,
    },
  }]);
}
```

3. Retrieving Relevant Context

```typescript
async function getRelevantMemories(
  userId: string,
  query: string,
  limit: number = 5
): Promise<string[]> {
  const queryEmbedding = await createEmbedding(query);

  const results = await index.query({
    vector: queryEmbedding,
    topK: limit,
    filter: { userId: { $eq: userId } },
    includeMetadata: true,
  });

  // Sort matches newest-first before formatting them for the prompt
  return results.matches
    .sort((a, b) =>
      Number(b.metadata?.timestamp ?? 0) - Number(a.metadata?.timestamp ?? 0)
    )
    .map(match => `[${match.metadata?.role}]: ${match.metadata?.message}`);
}
```

4. Generating Contextual Responses

```typescript
async function generateResponse(
  userId: string,
  userMessage: string
): Promise<string> {
  // Store the user's message
  await storeMemory({
    userId,
    message: userMessage,
    role: 'user',
    timestamp: Date.now(),
  });

  // Retrieve relevant past conversations
  const memories = await getRelevantMemories(userId, userMessage);

  // Build the prompt with context
  const systemPrompt = `You are a helpful assistant with memory of past conversations.

Here are relevant past interactions with this user:
${memories.join('\n')}

Use this context to provide personalized, contextually aware responses.`;

  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userMessage },
    ],
  });

  const assistantMessage = response.choices[0].message.content ?? '';

  // Store the assistant's response
  await storeMemory({
    userId,
    message: assistantMessage,
    role: 'assistant',
    timestamp: Date.now(),
  });

  return assistantMessage;
}
```

Discord Integration

```typescript
import { Client, GatewayIntentBits } from 'discord.js';

const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent,
  ],
});

client.on('messageCreate', async (message) => {
  if (message.author.bot) return;
  if (!message.content.startsWith('!ask')) return;

  const query = message.content.slice(5).trim();
  const userId = message.author.id;

  const response = await generateResponse(userId, query);
  await message.reply(response);
});

client.login(process.env.DISCORD_TOKEN);
```
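The same `generateResponse` helper plugs into Telegram with minimal glue. Here's one way to sketch it against the raw Telegram Bot API using Node 18+'s built-in `fetch` (a framework like Telegraf or grammY works just as well); `pollTelegram`, `extractQuery`, `Responder`, and the `TELEGRAM_BOT_TOKEN` variable name are my own choices here, not part of the Bot API:

```typescript
// Long-polling sketch against the raw Telegram Bot API.
// Assumes TELEGRAM_BOT_TOKEN is set and generateResponse comes from above.
type Responder = (userId: string, message: string) => Promise<string>;

const TG_API = `https://api.telegram.org/bot${process.env.TELEGRAM_BOT_TOKEN}`;

// Strip the /ask command (and optional @BotName suffix) from a message.
function extractQuery(text: string): string {
  return text.replace(/^\/ask(@\w+)?\s*/, '').trim();
}

async function pollTelegram(generateResponse: Responder): Promise<void> {
  let offset = 0;
  while (true) {
    // getUpdates with a timeout implements long polling
    const res = await fetch(`${TG_API}/getUpdates?timeout=30&offset=${offset}`);
    const { result } = (await res.json()) as { result: any[] };

    for (const update of result) {
      offset = update.update_id + 1; // acknowledge this update
      const msg = update.message;
      if (!msg?.text?.startsWith('/ask')) continue;

      const reply = await generateResponse(String(msg.from.id), extractQuery(msg.text));
      await fetch(`${TG_API}/sendMessage`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ chat_id: msg.chat.id, text: reply }),
      });
    }
  }
}

// pollTelegram(generateResponse); // start the bot
```

Because memory is keyed by `userId`, the Pinecone side needs no changes at all; if you want a user's Discord and Telegram histories merged, you'd need a mapping between the two platform IDs.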

Results

After implementing this system:

  • User engagement increased 3x - Users had longer, more meaningful conversations
  • Conversation length doubled - The average session went from 3 to 6+ messages
  • User satisfaction improved - Feedback indicated users felt "heard" and "remembered"

Best Practices

  1. Limit context window - Don't overwhelm the LLM with too much history
  2. Implement forgetting - Allow users to clear their conversation history
  3. Handle rate limits - Queue embedding operations to avoid API throttling
  4. Monitor costs - Vector operations and LLM calls can add up quickly
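For the first practice, limiting the context window, a crude but effective approach is to cap the retrieved memories at a character budget before building the prompt. A minimal sketch (`trimToBudget` is a hypothetical helper; ~4 characters per token is a rough rule of thumb for English text):

```typescript
// Keep memories (already ordered by relevance or recency) until a rough
// token budget is exhausted; ~4 characters per token for English text.
function trimToBudget(memories: string[], maxTokens = 1000): string[] {
  const charBudget = maxTokens * 4;
  const kept: string[] = [];
  let used = 0;

  for (const memory of memories) {
    if (used + memory.length > charBudget) break;
    kept.push(memory);
    used += memory.length;
  }
  return kept;
}
```

In production, a real tokenizer (e.g. tiktoken) gives exact counts, but this keeps prompt size bounded without another dependency.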

Conclusion

Building AI bots with persistent memory transforms the user experience from transactional to relational. The combination of vector databases and LLMs provides a powerful foundation for creating truly intelligent conversational agents.

Want to build something similar? Get in touch!