Planeo is an interactive 3D web application where users and AI agents coexist and interact in a shared environment. It showcases real-time multi-user communication, AI-driven agents with vision and speech capabilities, and a dynamic physics-based world.
- 3D Environment: Interactive 3D space built with React Three Fiber.
- Real-time Multi-user Interaction: See other users' movements (represented as eyeballs) in real-time using Server-Sent Events (SSE).
- AI Agents with Synchronized Actions & Speech: AI agents (configurable; the defaults are "AI-1" and "AI-2") perceive their surroundings, generate chat messages, and perform actions such as moving or turning. Their actions are synchronized with audio playback of their speech, and their visual perspective is updated at ~10 FPS. (Details, Vision Details, Interaction Flow)
- Chat Functionality: View messages from AI agents in a shared chat window. (Details)
- Text-to-Speech (TTS): Chat messages from AI agents can be spoken aloud. Currently, this uses a test audio track for development. A full Google Cloud TTS integration is prototyped. (Details, src/lib/audioService.ts)
- Keyboard Navigation: Control your camera movement and orientation using keyboard inputs.
- Physics-based World: Interact with objects like falling cubes in an environment governed by physics. (Details)
- Randomized Cube Art: Falling cubes display random artwork from a local collection on one face. (Details)
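The real-time features above are delivered over Server-Sent Events. As a rough illustration of the SSE wire format only (not Planeo's actual event code), the sketch below serializes and parses a single frame. The `eyeUpdate` event name, the `EyeUpdate` payload shape, and both function names are hypothetical.

```typescript
// Hypothetical payload: a user's camera ("eyeball") position.
// This shape is an assumption, not taken from the Planeo source.
type EyeUpdate = { id: string; position: [number, number, number] };

// Serialize one SSE frame: an "event:" line, a single "data:" line
// (compact JSON contains no raw newlines), and a blank line that ends the frame.
function formatSseFrame(event: string, payload: unknown): string {
  const json = JSON.stringify(payload);
  return `event: ${event}\ndata: ${json}\n\n`;
}

// Parse a single frame back into its event name and JSON payload.
function parseSseFrame(frame: string): { event: string; data: unknown } {
  let event = "message"; // default event type when no "event:" line is present
  const dataLines: string[] = [];
  for (const line of frame.split("\n")) {
    // A single leading space after the colon is not part of the value.
    if (line.startsWith("event:")) event = line.slice(6).replace(/^ /, "");
    else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
  }
  return { event, data: JSON.parse(dataLines.join("\n")) };
}

const update: EyeUpdate = { id: "user-1", position: [0, 1.6, 5] };
const frame = formatSseFrame("eyeUpdate", update);
const parsed = parseSseFrame(frame);
```

In a browser, the built-in `EventSource` API does the parsing for you; the parser here only makes the frame layout visible.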
Important: Browser autoplay policies block audio until the user interacts with the page, so you must click on the screen to start the simulation. Without that click, audio playback (such as AI agent speech) will not work. An overlay prompts for the click when the page loads.
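A click-to-start gate of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not Planeo's overlay code: the function name `unlockAudioOnFirstClick` and the two structural types are hypothetical, chosen so the example does not depend on DOM typings. In a browser, a Web Audio `AudioContext` and an overlay element (or `window`) should fit these shapes.

```typescript
// Minimal structural stand-ins for an AudioContext and a clickable element (assumptions).
type AudioLike = { state: string; resume(): Promise<void> };
type ClickTarget = {
  addEventListener(type: "click", listener: () => void, options?: { once?: boolean }): void;
};

// On the first click, resume the audio context if it is suspended,
// then call onReady so the simulation can start.
function unlockAudioOnFirstClick(
  target: ClickTarget,
  audio: AudioLike,
  onReady: () => void
): void {
  target.addEventListener(
    "click",
    () => {
      const resumed = audio.state === "suspended" ? audio.resume() : Promise.resolve();
      resumed.then(onReady);
    },
    { once: true } // only the first interaction is needed
  );
}
```

Registering the listener with `once: true` avoids having to remove it manually after the overlay is dismissed.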
Follow these instructions to set up and run Planeo on your local machine.
- Node.js (v22 or higher recommended)
- npm (comes with Node.js) or yarn
- Clone the repository:
    git clone https://github.com/rgilks/planeo.git
    cd planeo
- Install dependencies:
    npm install
    # or
    # yarn install
- Set up environment variables: Copy the example environment file to create your local configuration:
    cp .env.example .env.local
  Now, edit .env.local and provide the necessary values:
  - NEXT_PUBLIC_APP_URL: The public URL of your application (e.g., http://localhost:3000 for local development).
    - Used by: src/app/actions/generateMessage.ts for SSE event posting.
  - GOOGLE_AI_API_KEY: Your API key for Google Generative AI (e.g., Gemini).
    - Used by: src/lib/googleAI.ts for AI text and vision model interactions.
  - AI_AGENTS_CONFIG (Optional): JSON string to define custom AI agents. If not set, defaults to two agents ("AI-1", "AI-2").
    - Example: AI_AGENTS_CONFIG='[{"id":"custom-ai-1","displayName":"Custom AI Alpha"},{"id":"custom-ai-2","displayName":"Custom AI Beta"}]'
    - Used by: src/domain/aiAgent.ts, src/app/api/events/route.ts.
    - See docs/ai-agents.md for more details.
  - TOTAL_AGENTS (Optional): The maximum number of AI agents allowed in the environment. Defaults to 0 if not set on the server-side, influencing AI agent initialization.
    - Used by: src/lib/env.ts, potentially affecting src/app/api/events/sseStore.ts.
  - NUMBER_OF_BOXES (Optional): The number of interactive cubes to spawn in the environment. Defaults to 5 if not set.
    - Used by: src/lib/env.ts, src/app/api/events/sseStore.ts.
  - NEXT_PUBLIC_TTS_ENABLED (Optional): Set to "false" to disable Text-to-Speech functionality. Defaults to true (enabled).
    - Used by: src/components/ChatMessage.tsx, src/app/actions/tts.ts.
  - GOOGLE_APP_CREDS_JSON (Optional, for full TTS): JSON string containing Google Cloud service account credentials for Text-to-Speech API. Required if you intend to use the full Google Cloud TTS feature (currently prototyped).
    - Used by: src/app/actions/tts.ts.
    - See docs/text-to-speech.md for setup.
- Run the development server:
    npm run dev
    # or
    # yarn dev
  Open http://localhost:3000 in your browser.
- Build for production:
    npm run build
    npm run start
    # or
    # yarn build
    # yarn start
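To make the optional environment variables from the setup steps more concrete, here is a minimal sketch of how their documented defaults could be applied. It is not the contents of src/lib/env.ts: the project lists Zod for schema validation, while this example uses plain TypeScript to stay dependency-free. All function and type names are hypothetical, and using the agent id as the default display name is an assumption.

```typescript
type AgentConfig = { id: string; displayName: string };
type Env = Record<string, string | undefined>;

// Default agents named in the setup steps. Reusing the id as displayName is an assumption.
const DEFAULT_AGENTS: AgentConfig[] = [
  { id: "AI-1", displayName: "AI-1" },
  { id: "AI-2", displayName: "AI-2" },
];

// Parse AI_AGENTS_CONFIG, falling back to the two default agents when unset.
function parseAgents(raw: string | undefined): AgentConfig[] {
  if (!raw) return DEFAULT_AGENTS;
  const parsed: unknown = JSON.parse(raw); // throws on malformed JSON
  if (!Array.isArray(parsed)) throw new Error("AI_AGENTS_CONFIG must be a JSON array");
  return parsed.map((entry: unknown, i: number) => {
    const e = entry as Partial<AgentConfig> | null;
    if (!e || typeof e.id !== "string" || typeof e.displayName !== "string") {
      throw new Error(`AI_AGENTS_CONFIG[${i}] needs string "id" and "displayName"`);
    }
    return { id: e.id, displayName: e.displayName };
  });
}

// Parse a non-negative integer, using the fallback when the variable is unset or blank.
function parseCount(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`Expected a non-negative integer, got "${raw}"`);
  }
  return n;
}

// Apply the documented defaults: TOTAL_AGENTS 0, NUMBER_OF_BOXES 5, TTS enabled.
function readConfig(env: Env) {
  return {
    agents: parseAgents(env["AI_AGENTS_CONFIG"]),
    totalAgents: parseCount(env["TOTAL_AGENTS"], 0),
    numberOfBoxes: parseCount(env["NUMBER_OF_BOXES"], 5),
    ttsEnabled: env["NEXT_PUBLIC_TTS_ENABLED"] !== "false",
  };
}

const config = readConfig({
  AI_AGENTS_CONFIG: '[{"id":"custom-ai-1","displayName":"Custom AI Alpha"}]',
  NUMBER_OF_BOXES: "8",
});
```

Passing the environment in as a plain object, rather than reading `process.env` inside the functions, keeps the sketch easy to test. In a Node.js process you could call `readConfig(process.env)`.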
- Next.js (React Framework)
- React Three Fiber (for 3D graphics)
- Drei (helpers for React Three Fiber)
- Rapier (physics engine via react-three-rapier)
- Zustand (state management)
- Google Generative AI (for AI agent logic)
- Server-Sent Events (SSE for real-time communication)
- TypeScript
- Zod (schema validation)
More detailed technical documentation for various aspects of the project can be found in the docs/ folder:
- docs/ai-agents.md: Details on AI agent behavior, configuration, and capabilities.
- docs/ai-agent-vision.md: Describes how AI agents perceive and display their environment.
- docs/chat.md: Overview of the chat system.
- docs/physics.md: Explanation of the physics simulation for objects in the 3D scene.
- docs/real-time-camera-movement.md: Covers how camera/user movements are handled and synchronized.
- docs/sse-event-handling.md: Describes the Server-Sent Events (SSE) mechanism for real-time updates.
- docs/text-to-speech.md: Information on the text-to-speech functionality (currently using test audio, with details on the planned full integration).
- docs/ai-interaction-flow.md: Details the synchronized flow of AI actions, chat, and audio playback.
- docs/cube-art-textures.md: Details on how artwork is displayed on interactive cubes.
The following are areas for future development:
- Full Text-to-Speech Integration: Completing the switch from test audio to dynamic Google Cloud TTS.
- Enhanced AI Capabilities: More complex AI behaviors, memory, and interaction models.
- User-to-User Chat: Allowing human users to chat directly with each other.
- Expanded World Interactions: More ways for users and AI to interact with the 3D environment and its objects.
- Persistent User Accounts/Profiles.
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature/your-feature-name).
- Make your changes.
- Commit your changes (git commit -m 'Add some feature').
- Push to the branch (git push origin feature/your-feature-name).
- Open a Pull Request.
Please ensure your code adheres to the project's linting rules (npm run lint) and all checks pass (npm run check).
This project is licensed under the MIT License - see the LICENSE file for details.
- REPLICATE_API_TOKEN: Your Replicate API token.
- TOTAL_AGENTS: The maximum number of agents allowed in the environment (e.g., 10).
