DataFlow Release Notes v202410
Algorithm Templates
- Provides a variety of pre-defined algorithm templates covering data processing, data augmentation, and data generation.
- Supports user-defined algorithm templates, which can be added, deleted, searched, and modified.
- Supports running data processing jobs based on algorithm templates.

Job Management
- Monitor and manage job running status, including deleting jobs.
- View the data processing results of each operator in a job, including the number of data items each operator processed.
- View how each operator in a job processed data samples, comparing samples before and after processing.
- Monitor Pipeline running status in real time and view logs.

Running Jobs
- Provides a series of data processing operations, such as removing invalid data, format conversion, and data screening.
- Includes more than 50 text data processing operators, of types such as Mapper, Filter, and Deduplicator.
- Displays operators and examples in the UI, so users can define and run data workflows directly from the UI.
- The Pipeline engine supports running multiple jobs in parallel.

Product Integration
- DataFlow is integrated with CSGHub, providing a unified user login interface, complete dataset management, and a data processing workflow.
- Supports dataset version management: all datasets in a specified version can be processed centrally to generate new versions for large model fine-tuning, pre-training, or RAG.
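
The three operator types named under Running Jobs (Mapper, Filter, Deduplicator), together with the per-operator item counts mentioned under Job Management, can be illustrated with a minimal Python sketch. This is not DataFlow's API: every function name, the `min_chars` parameter, and the pipeline layout below are hypothetical and chosen only to show the general idea of each operator type.

```python
from typing import List, Tuple


# All names below are hypothetical illustrations, not DataFlow's actual API.
def whitespace_mapper(text: str) -> str:
    """Mapper: transform each sample (here, collapse runs of whitespace)."""
    return " ".join(text.split())


def min_length_filter(text: str, min_chars: int = 5) -> bool:
    """Filter: return True if the sample should be kept."""
    return len(text) >= min_chars


def exact_deduplicator(samples: List[str]) -> List[str]:
    """Deduplicator: drop exact repeats, keeping the first occurrence."""
    seen = set()
    unique = []
    for sample in samples:
        if sample not in seen:
            seen.add(sample)
            unique.append(sample)
    return unique


def run_pipeline(samples: List[str]) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Apply the three operators in order and record how many items
    remain after each one."""
    counts = []

    mapped = [whitespace_mapper(s) for s in samples]
    counts.append(("mapper", len(mapped)))

    filtered = [s for s in mapped if min_length_filter(s)]
    counts.append(("filter", len(filtered)))

    deduped = exact_deduplicator(filtered)
    counts.append(("deduplicator", len(deduped)))

    return deduped, counts


if __name__ == "__main__":
    raw = ["hello   world", "hello world", "hi", "  data  flow "]
    result, per_operator = run_pipeline(raw)
    print(result)
    print(per_operator)
```

In this sketch the mapper rewrites every sample, the filter removes samples that fail a condition, and the deduplicator removes repeats, so comparing the counts between steps shows how much data each operator removed.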