AI coding assistants like Copilot can build individual code components, but they cannot yet assemble an entire data pipeline from start to finish. What's missing is the ability to 'talk' directly with pipeline components: ask questions about their behavior, troubleshoot issues through dialogue, and iteratively improve the pipeline by requesting features (e.g., handling missing values, validating formats) through natural conversation.
Think of a data pipeline like a factory assembly line: it's made up of many smaller workstations (sub-functions), each handling a specific task using different tools and technologies. Just as a car assembly line has stations for painting, welding, and installing parts, a data pipeline has components for extracting, transforming, validating, and loading data.

AI coding assistants like Cursor and GitHub Copilot are great for building individual pieces, but they can't yet handle the entire pipeline from start to finish. It's like having a smart assistant who can help you build each workstation perfectly but can't design and coordinate the entire factory layout. Current capabilities include writing individual functions and components and helping with specific coding tasks. However, designing complete end-to-end pipelines and managing complex inter-system dependencies remain challenging.

The biggest breakthrough comes from being able to "talk" directly with your pipeline components. Instead of just writing code, you can now have conversations with your data processing functions: ask your pipeline components questions about their behavior, understand what's happening at each stage, and troubleshoot issues through dialogue rather than just reading logs.

MCP combined with an LLM creates a powerful combination that lets you send test data and get instant feedback. You can ask questions like "Here's some sample data - show me what happens when it goes through your validation function" or "What would the output look like if I changed this parameter?" You can deep-dive into specific tasks by asking "Why did this transformation fail on these records?" or "What edge cases should I consider for this data type?" The system also lets you iteratively improve your pipeline by requesting features like "Add a feature to handle missing values in this column" or "Update this function to also validate email formats."
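To make that kind of dialogue possible, a pipeline component needs to return structured feedback rather than a bare pass/fail, so an MCP tool can hand the result back to the LLM and answer questions like "why did this record fail?". Here is a minimal sketch of such a validator covering the two requested features (missing-value handling and email format checks); the function name, field names, and regex are illustrative assumptions, not part of any real pipeline or the MCP specification.

```python
import re

# Simple email shape check: something@something.tld (illustrative, not RFC-complete)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_records(records, required_fields=("name", "email")):
    """Run sample records through validation and return a structured
    report, so a conversational client can explain per record why it
    was accepted or rejected instead of just reading logs."""
    report = {"passed": [], "failed": []}
    for i, rec in enumerate(records):
        problems = []
        # Missing-value handling: flag empty or absent required fields
        for field in required_fields:
            value = rec.get(field)
            if value is None or value == "":
                problems.append(f"missing value in '{field}'")
        # Email format validation on non-empty values
        email = rec.get("email")
        if email and not EMAIL_RE.match(email):
            problems.append("invalid email format")
        if problems:
            report["failed"].append({"index": i, "record": rec, "problems": problems})
        else:
            report["passed"].append({"index": i, "record": rec})
    return report

# "Here's some sample data - show me what happens in your validation function"
sample = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "", "email": "not-an-email"},
]
report = validate_records(sample)
```

An MCP server could expose `validate_records` as a tool, and because the report names each problem per record, the LLM can answer follow-up questions about specific failures directly from the tool output.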
Instead of building data pipelines in isolation, you can now collaborate with AI throughout the development process - from writing individual components to testing, debugging, and enhancing the entire system through natural conversation.