# Voice AI Agent

A modern voice-enabled AI agent built with Next.js that lets you have natural conversations with AI using speech recognition and text-to-speech capabilities.
## Features
- Voice Interaction: Speak to the AI and get audio responses
- Real-time Speech Recognition: Convert speech to text using browser APIs
- Text-to-Speech: AI responses are played back as audio
- Tool Integration: Powered by Composio for advanced AI capabilities
- Modern UI: Clean interface built with Tailwind CSS and Radix UI
- Responsive Design: Works seamlessly across desktop and mobile devices
## Tech Stack
- Framework: Next.js 15.3.4 with App Router
- Language: TypeScript
- Styling: Tailwind CSS
- UI Components: Radix UI + shadcn/ui
- AI: OpenAI GPT models via LangChain
- Tools: Composio Core for AI agent capabilities
- Speech: Web Speech API for recognition and synthesis (see the sketch after this list)
- State Management: Zustand
- Animations: Framer Motion
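
Speech capture and playback happen entirely in the browser via the Web Speech API. The snippet below is a minimal sketch of the two halves of that API that the project's hooks wrap; the function names (`listenOnce`, `speak`) are illustrative, not the repo's actual exports.

```typescript
// Minimal client-side sketch of the Web Speech API (runs in the browser only).
// Names are illustrative; the real hooks add debouncing and state handling.

export function listenOnce(onResult: (transcript: string) => void): void {
  // Chrome still ships the recognizer under a webkit prefix, hence the casts.
  const SpeechRecognitionImpl =
    (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
  if (!SpeechRecognitionImpl) {
    console.warn("Speech recognition is not supported in this browser.");
    return;
  }
  const recognition = new SpeechRecognitionImpl();
  recognition.lang = "en-US";
  recognition.interimResults = false;
  recognition.onresult = (event: any) => {
    // The final transcript is the first alternative of the last result.
    const transcript = event.results[event.results.length - 1][0].transcript;
    onResult(transcript);
  };
  recognition.start();
}

export function speak(text: string): void {
  // Text-to-speech via the synthesis half of the Web Speech API.
  const utterance = new SpeechSynthesisUtterance(text);
  window.speechSynthesis.speak(utterance);
}
```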
## Project Structure

```
voice-ai-agent/
├── app/                          # Next.js app directory
│   ├── api/                      # API routes
│   │   ├── chat/                 # Chat endpoint
│   │   └── tts/                  # Text-to-speech endpoint
│   ├── globals.css               # Global styles
│   ├── layout.tsx                # Root layout
│   └── page.tsx                  # Home page
├── components/                   # React components
│   ├── ui/                       # Base UI components
│   ├── chat-header.tsx           # Chat header component
│   ├── chat-input.tsx            # Message input component
│   ├── chat-interface.tsx        # Main chat interface
│   ├── chat-messages.tsx         # Messages display
│   └── settings-modal.tsx        # Settings modal
├── hooks/                        # Custom React hooks
│   ├── use-audio.ts              # Audio playback logic
│   ├── use-chat.ts               # Chat state management
│   ├── use-mounted.ts            # Mount detection
│   └── use-speech-recognition.ts # Speech recognition
├── lib/                          # Utility libraries
│   ├── validators/               # Input validation
│   ├── alias-store.ts            # State management
│   ├── constants.ts              # App constants
│   ├── error-handler.ts          # Error handling
│   └── utils.ts                  # Helper functions
└── public/                       # Static assets
```
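
The `app/api/chat` and `app/api/tts` routes are the project's only server-side surface: chat produces the assistant's reply, and TTS turns that reply into audio. Below is a hedged sketch of how a client might call them in sequence; the payload fields (`message`, `reply`, `text`) are assumptions for illustration, not the project's actual request contract.

```typescript
// Illustrative only: the request/response shapes are assumed,
// not taken from this repo's validators.
export async function askAndSpeak(message: string): Promise<void> {
  // 1. Send the user's text to the chat endpoint.
  const chatRes = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  const { reply } = await chatRes.json();

  // 2. Ask the TTS endpoint to synthesize the reply, then play it back.
  const ttsRes = await fetch("/api/tts", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: reply }),
  });
  const audioBlob = await ttsRes.blob();
  const audio = new Audio(URL.createObjectURL(audioBlob));
  await audio.play();
}
```

Routing synthesis through `/api/tts` rather than relying only on the browser's built-in voices presumably keeps the OpenAI key on the server and lets the backend choose the voice.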
## Getting Started

### Prerequisites
- Node.js 18+
- npm or yarn
- OpenAI API key
- Composio API key (optional)
### Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd voice-ai-agent
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Set up environment variables (see the note after these steps on how the keys are consumed):

   ```bash
   cp .env.example .env.local
   ```

   Add your API keys to `.env.local`:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   COMPOSIO_API_KEY=your_composio_api_key_here
   ```

4. Run the development server:

   ```bash
   npm run dev
   ```

5. Open http://localhost:3000 in your browser.
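
Both keys are read on the server only and are never shipped to the browser. As a rough illustration (not the repo's actual handler, which wires up LangChain, Composio, and the shared error handler), a route can guard on the key like this:

```typescript
// app/api/chat/route.ts (sketch): shows only how the .env.local keys
// are consumed server-side; the real handler invokes the model via LangChain.
import { NextResponse } from "next/server";

export async function POST(req: Request) {
  if (!process.env.OPENAI_API_KEY) {
    return NextResponse.json(
      { error: "OPENAI_API_KEY is missing. Add it to .env.local and restart the dev server." },
      { status: 500 }
    );
  }
  const body = await req.json(); // validated by lib/validators in the real route
  // ... invoke the model and return its reply ...
  return NextResponse.json({ reply: "placeholder", received: body });
}
```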
## Usage
- Text Chat: Type messages in the input field and press Enter
- Voice Chat: Click the microphone button to start voice input
- Audio Responses: Toggle audio playback in the settings
- Settings: Access configuration options via the settings modal
## Development

### Available Scripts

```bash
npm run dev     # Start development server with Turbopack
npm run build   # Build for production
npm run start   # Start production server
npm run lint    # Run ESLint
```
### Key Components
- ChatInterface: Main component orchestrating the chat experience (a rough composition sketch follows this list)
- useChat: Manages chat state and API communication
- useAudio: Handles text-to-speech functionality
- useSpeechRecognition: Manages voice input with debouncing
- API Routes: Handle chat processing and TTS generation
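
Roughly, those pieces compose as sketched below. The hook names match the files under `hooks/`, but the signatures, option names, and message shape are hypothetical, shown only to illustrate the data flow.

```tsx
"use client";
// Hypothetical composition sketch: hook names match the files in hooks/,
// but the signatures and return shapes below are invented for illustration.
import { useChat } from "@/hooks/use-chat";
import { useAudio } from "@/hooks/use-audio";
import { useSpeechRecognition } from "@/hooks/use-speech-recognition";

export function ChatInterfaceSketch() {
  const { play } = useAudio(); // plays audio returned by /api/tts
  const { messages, sendMessage } = useChat({
    onAssistantReply: (text: string) => play(text), // speak each reply
  });
  const { listening, startListening } = useSpeechRecognition({
    onTranscript: (text: string) => sendMessage(text), // debounced voice input
  });

  return (
    <div>
      <ul>
        {messages.map((m: { id: string; content: string }) => (
          <li key={m.id}>{m.content}</li>
        ))}
      </ul>
      <button onClick={startListening}>
        {listening ? "Listening…" : "Speak"}
      </button>
    </div>
  );
}
```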
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## License
This project is licensed under the Apache License.