
DataFlow Release Notes v202410

Algorithm Templates

  • Provides a variety of pre-defined algorithm templates covering data processing, data augmentation, data generation, and more.
  • Supports user-defined algorithm templates, which can be added, deleted, searched, and modified.
  • Supports running data processing jobs based on algorithm templates.

Job Management

  • Monitors and manages job running status, including deleting jobs.
  • Displays the processing results of each operator in a job, including the number of data items each operator processed.
  • Shows data samples as processed by each operator in a job, with a before-and-after comparison of the effect.
  • Monitors Pipeline running status in real time and provides access to logs.

Running Jobs

  • Provides a range of data processing operations (e.g., removing invalid data, format conversion, and data filtering).
  • Includes more than 50 text data processing operators of types such as Mapper, Filter, and Deduplicator.
  • Displays operators and examples in the UI, so users can define and run data workflows directly from the UI.
  • Provides a Pipeline engine that supports running multiple jobs in parallel.

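To illustrate what the Mapper, Filter, and Deduplicator operator types listed above do, and how per-operator item counts like those described under Job Management can be derived, here is a minimal Python sketch. It is not DataFlow's API: the class layouts, the `run_pipeline` helper, and the operator names `normalize`, `non_empty`, and `exact_dedup` are hypothetical, and this Deduplicator only removes exact duplicates by hashing the text.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch, not DataFlow's API. A sample is a dict with a "text" field.
Sample = Dict[str, str]


@dataclass
class Mapper:
    """Transforms the text of every sample; the sample count is unchanged."""
    name: str
    fn: Callable[[str], str]

    def process(self, samples: List[Sample]) -> List[Sample]:
        return [{**s, "text": self.fn(s["text"])} for s in samples]


@dataclass
class Filter:
    """Keeps only the samples whose text satisfies the predicate."""
    name: str
    keep: Callable[[str], bool]

    def process(self, samples: List[Sample]) -> List[Sample]:
        return [s for s in samples if self.keep(s["text"])]


@dataclass
class Deduplicator:
    """Drops exact duplicates (same text hash), keeping the first occurrence."""
    name: str

    def process(self, samples: List[Sample]) -> List[Sample]:
        seen = set()
        unique = []
        for s in samples:
            digest = hashlib.sha256(s["text"].encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                unique.append(s)
        return unique


def run_pipeline(samples, operators) -> Tuple[List[Sample], Dict[str, Tuple[int, int]]]:
    """Applies operators in order and records (items in, items out) per operator."""
    stats = {}
    for op in operators:
        before = len(samples)
        samples = op.process(samples)
        stats[op.name] = (before, len(samples))
    return samples, stats


# Usage with hypothetical operator names.
data = [
    {"text": "  Hello World  "},
    {"text": "hello world"},
    {"text": ""},
    {"text": "Data processing"},
]
ops = [
    Mapper("normalize", lambda t: t.strip().lower()),
    Filter("non_empty", lambda t: len(t) > 0),
    Deduplicator("exact_dedup"),
]
result, stats = run_pipeline(data, ops)
print(stats)  # {'normalize': (4, 4), 'non_empty': (4, 3), 'exact_dedup': (3, 2)}
```

Recording input and output sizes at each step is one simple way to produce per-operator counts. The sketch runs operators sequentially on an in-memory list and says nothing about how DataFlow's Pipeline engine actually schedules operators or runs jobs in parallel.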
Product Integration

  • Integrates DataFlow with CSGHub, providing a unified user login and a complete workflow for dataset management and data processing.
  • Supports dataset version management: all data in a specified dataset version can be processed centrally to produce a new version suitable for large model fine-tuning, pre-training, or RAG.