Building AI Bots with Persistent Memory for Discord & Telegram
Learn how to create intelligent chatbots that remember context across conversations using vector databases and LLMs.
Creating AI bots that remember conversations is a game-changer for user engagement. In this post, I'll walk you through how I built persistent memory into Discord and Telegram bots using vector databases and large language models.
The Problem with Stateless Bots
Most chatbots treat every message as a fresh start. They have no memory of previous interactions, which leads to frustrating experiences:
- No context retention - Users have to repeat information constantly
- Generic responses - The bot can't personalize based on history
- Broken conversations - Multi-turn dialogues feel disconnected
The Solution: Vector Databases + LLMs
By combining vector databases (like Pinecone, Weaviate, or Chroma) with large language models, we can create bots that:
- Remember user preferences and past conversations
- Provide contextually relevant responses
- Build genuine relationships with users over time
- Reference previous discussions naturally
Architecture Overview
Here's how the system works:
```
User Message → Embedding Generation → Vector Search → Context Retrieval → LLM Response
                                           ↓
                                    Vector Database
                                  (Long-term Memory)
```
Key Components
- Embedding Model - Converts text to vector representations
- Vector Database - Stores and indexes embeddings for fast similarity search
- LLM - Generates responses using retrieved context
- Memory Manager - Handles storage and retrieval logic
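Before wiring up real services, the store-and-search loop is easy to picture with a toy in-memory version. The sketch below is illustrative only: hand-rolled cosine similarity over tiny hard-coded vectors stands in for the embedding model and the vector database.

```typescript
// Toy stand-in for the embedding model + vector database pair.
// Vectors here are hand-made 3-d arrays; a real system would call an
// embedding API and a hosted index instead.
type Entry = { text: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

class ToyMemory {
  private entries: Entry[] = [];

  store(text: string, vector: number[]): void {
    this.entries.push({ text, vector });
  }

  // Return the stored texts most similar to the query vector, best first.
  search(vector: number[], topK: number): string[] {
    return [...this.entries]
      .sort((a, b) => cosine(b.vector, vector) - cosine(a.vector, vector))
      .slice(0, topK)
      .map((e) => e.text);
  }
}
```

The real components below do exactly this, with OpenAI embeddings replacing the fake vectors and Pinecone replacing the array scan.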
Implementation
1. Setting Up the Vector Store
```typescript
import { Pinecone } from '@pinecone-database/pinecone';
import OpenAI from 'openai';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index('bot-memory');
const openai = new OpenAI();

// Create an embedding vector from text
async function createEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return response.data[0].embedding;
}
```

2. Storing Conversations
```typescript
interface MemoryEntry {
  userId: string;
  message: string;
  role: 'user' | 'assistant';
  timestamp: number;
}

async function storeMemory(entry: MemoryEntry): Promise<void> {
  const embedding = await createEmbedding(entry.message);
  await index.upsert([{
    id: `${entry.userId}-${entry.timestamp}`,
    values: embedding,
    metadata: {
      userId: entry.userId,
      message: entry.message,
      role: entry.role,
      timestamp: entry.timestamp,
    },
  }]);
}
```

3. Retrieving Relevant Context
```typescript
async function getRelevantMemories(
  userId: string,
  query: string,
  limit: number = 5
): Promise<string[]> {
  const queryEmbedding = await createEmbedding(query);
  const results = await index.query({
    vector: queryEmbedding,
    topK: limit,
    filter: { userId: { $eq: userId } },
    includeMetadata: true,
  });
  // Similarity hits are re-ordered chronologically (newest first) for the prompt
  return results.matches
    .sort((a, b) => Number(b.metadata?.timestamp ?? 0) - Number(a.metadata?.timestamp ?? 0))
    .map(match => `[${match.metadata?.role}]: ${match.metadata?.message}`);
}
```

4. Generating Contextual Responses
```typescript
async function generateResponse(
  userId: string,
  userMessage: string
): Promise<string> {
  // Store the user's message
  await storeMemory({
    userId,
    message: userMessage,
    role: 'user',
    timestamp: Date.now(),
  });

  // Retrieve relevant past conversations
  const memories = await getRelevantMemories(userId, userMessage);

  // Build the prompt with context
  const systemPrompt = `You are a helpful assistant with memory of past conversations.
Here are relevant past interactions with this user:

${memories.join('\n')}

Use this context to provide personalized, contextually aware responses.`;

  const response = await openai.chat.completions.create({
    model: 'gpt-4-turbo-preview',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userMessage },
    ],
  });

  const assistantMessage = response.choices[0].message.content ?? '';

  // Store the assistant's response
  await storeMemory({
    userId,
    message: assistantMessage,
    role: 'assistant',
    timestamp: Date.now(),
  });

  return assistantMessage;
}
```

Discord Integration
```typescript
import { Client, GatewayIntentBits } from 'discord.js';

const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent,
  ],
});

client.on('messageCreate', async (message) => {
  if (message.author.bot) return;
  if (!message.content.startsWith('!ask')) return;

  const query = message.content.slice(5).trim();
  const userId = message.author.id;

  const response = await generateResponse(userId, query);
  await message.reply(response);
});

client.login(process.env.DISCORD_TOKEN);
```

Results
After implementing this system:
- User engagement increased 3x - Users had longer, more meaningful conversations
- Conversation length doubled - The average session went from 3 to 6+ messages
- User satisfaction improved - Feedback indicated users felt "heard" and "remembered"
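Telegram Integration
The same pipeline plugs into Telegram with only the transport layer changing. The sketch below factors the handler into a pure function so it can run without a live bot; the commented-out wiring assumes the Telegraf library and a `TELEGRAM_TOKEN` environment variable, both assumptions rather than part of the setup above.

```typescript
// The handler is factored out of the bot framework so it is testable on its own.
type Responder = (userId: string, text: string) => Promise<string>;
type Reply = (text: string) => Promise<void>;

async function handleTelegramText(
  userId: string,
  text: string,
  respond: Responder, // generateResponse from section 4 goes here
  reply: Reply,
): Promise<void> {
  const response = await respond(userId, text);
  await reply(response);
}

// Wiring sketch, assuming the Telegraf library (hypothetical dependency):
//
//   import { Telegraf } from 'telegraf';
//   import { message } from 'telegraf/filters';
//
//   const bot = new Telegraf(process.env.TELEGRAM_TOKEN!);
//   bot.on(message('text'), async (ctx) => {
//     await handleTelegramText(
//       String(ctx.from.id),   // Telegram user IDs are numeric; memory keys are strings
//       ctx.message.text,
//       generateResponse,
//       async (t) => { await ctx.reply(t); },
//     );
//   });
//   bot.launch();
```

Because memory is keyed only by `userId`, a user who talks to both the Discord and Telegram bots gets two separate histories unless you map their platform IDs to a shared identity.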
Best Practices
- Limit context window - Don't overwhelm the LLM with too much history
- Implement forgetting - Allow users to clear their conversation history
- Handle rate limits - Queue embedding operations to avoid API throttling
- Monitor costs - Vector operations and LLM calls can add up quickly
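As a concrete example of the first practice, retrieved memories can be trimmed to a fixed budget before they go into the system prompt. This sketch counts characters for simplicity; a real implementation would count tokens with a tokenizer library instead.

```typescript
// Keep only as many memories as fit a character budget, preserving order.
// Assumes `memories` arrives most-relevant first, as getRelevantMemories-style
// retrieval would produce; stops at the first entry that would overflow.
function trimMemories(memories: string[], maxChars: number = 2000): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const memory of memories) {
    if (used + memory.length > maxChars) break;
    kept.push(memory);
    used += memory.length;
  }
  return kept;
}
```

Calling `trimMemories(memories)` between retrieval and prompt construction caps the context cost per request regardless of how much history a user has accumulated.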
Conclusion
Building AI bots with persistent memory transforms the user experience from transactional to relational. The combination of vector databases and LLMs provides a powerful foundation for creating truly intelligent conversational agents.
Want to build something similar? Get in touch!