REQUIREMENTS ARE BECOMING INCREASINGLY IMPORTANT
In certain industries, particularly research centers, there is a pressing demand for engines that automatically transfer data between different storage systems.
- Organize data movement between different storage repositories
- Maintain direct user access to the different tiers of data repositories, whether active or archived
- Integrate with workload managers to improve the efficiency of HPC/supercomputing resources
- Provide a public API and SDK to facilitate integration with specific search applications
- Include end-user interfaces (HTML GUI and bash client) so users can manage their own data movements
Key Benefits of Automated Workflow
Researchers can transfer data to local parallel file systems for processing on the fastest HPC systems. They can just as easily transfer data generated on supercomputers to cloud object storage, making it accessible to researchers around the globe.
Tasks can be defined with a policy, such as Copy or Move, which allows organizations to better control their data.
- Monitoring and reporting
- Data Integrity
POLICY-BASED WORKFLOW MANAGER
Robust task management capabilities give users complete control and oversight over their data movement workflows. Users can create and manage tasks for migrating data between storage tiers, handle large volumes of data efficiently, and automate their workflows. By applying filters, users ensure that only the relevant files are included in a data movement task, which streamlines the transfer and minimizes the risk of errors or data loss. The platform is highly customizable: users can create filters for specific file types, folders, or directories, as well as custom rules based on metadata or other criteria.
The Workflow Manager incorporates a scheduler that processes tasks initiated from different inputs. Execution can be automated, manual, or scheduled as needed.
The filtering module lets users manage and organize their data movement workflows by filtering files against specific criteria. Users can choose between basic and advanced filtering, with the advanced option providing an embedded scripting editor for added flexibility. Filters can automatically include or exclude files based on criteria such as file size, creation date, modification date, and file type; for more complex needs, filters can combine multiple criteria.
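The combinable filters described above can be sketched as composable predicates. This is a minimal illustration in Python; the function names and criteria are illustrative, not Nodeum's actual filtering API.

```python
import os
import time
from typing import Callable, List

# A file filter is simply a predicate over a path.
Predicate = Callable[[str], bool]

def min_size(size_bytes: int) -> Predicate:
    """Include files of at least `size_bytes`."""
    return lambda path: os.path.getsize(path) >= size_bytes

def extension(ext: str) -> Predicate:
    """Include files with the given extension, e.g. '.mxf'."""
    return lambda path: path.lower().endswith(ext.lower())

def modified_within(days: float) -> Predicate:
    """Include files modified within the last `days` days."""
    cutoff = time.time() - days * 86400
    return lambda path: os.path.getmtime(path) >= cutoff

def all_of(*preds: Predicate) -> Predicate:
    """Combine several criteria: every predicate must match."""
    return lambda path: all(p(path) for p in preds)

def select(paths: List[str], include: Predicate) -> List[str]:
    """Apply a filter to a candidate file list."""
    return [p for p in paths if include(p)]
```

Combining `min_size` and `extension` with `all_of`, for example, keeps only large media files in a movement task.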
The priority management features in Nodeum ensure that higher-priority tasks do not monopolize all available resources, leaving other workflows waiting indefinitely.
Nodeum utilizes Quality of Service (QoS) techniques to prioritize data movement operations based on their priority level. This approach guarantees that critical demands receive the necessary resources while still providing fair access to non-critical demands. By assigning different priority levels to various tasks, Nodeum ensures that resources are allocated accordingly.
Additionally, Nodeum employs Fair Queuing techniques to distribute resources among different requests, ensuring that each request receives an equal share of the remaining resources. This method prevents any single data movement workflow from monopolizing all available resources, allowing all remaining workflows to have fair access.
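The interplay of priority levels and fair queuing can be sketched as a weighted round-robin scheduler: each priority class gets a number of slots per round proportional to its weight, so critical tasks are favored without starving the rest. The class names and weights below are assumptions for illustration, not Nodeum internals.

```python
from collections import deque
from typing import Deque, Dict, List

class FairScheduler:
    """Weighted fair queuing sketch across priority classes."""

    def __init__(self, weights: Dict[str, int]):
        # Higher weight => more task slots per scheduling round.
        self.weights = weights
        self.queues: Dict[str, Deque[str]] = {c: deque() for c in weights}

    def submit(self, priority_class: str, task: str) -> None:
        self.queues[priority_class].append(task)

    def next_round(self) -> List[str]:
        """One round: each class drains up to `weight` tasks, so no
        single class can monopolize resources while others wait."""
        picked: List[str] = []
        for cls, weight in self.weights.items():
            for _ in range(weight):
                if self.queues[cls]:
                    picked.append(self.queues[cls].popleft())
        return picked
```

With weights {critical: 2, normal: 1}, a round serves two critical tasks and one normal task, giving critical demands priority while guaranteeing progress for everything else.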
In summary, the priority management and QoS techniques integrated into Nodeum guarantee efficient resource allocation and workflow management, eliminating resource contention and ensuring fair access to resources for all workflows.
Nodeum also prevents overloading during task movement by monitoring storage usage before transferring any files, and it halts processing smoothly if an issue occurs with the source or destination storage. If either becomes unreachable, the task is marked as "stopped by system", the task logs are updated with the root cause and error code, and the file log records the status of each file that was not processed correctly, along with the reason.
Monitoring and reporting capabilities
The platform offers a centralized dashboard for monitoring task progress and the real-time status of data, and for tracking data movement workflows over time. It also provides comprehensive reporting and analytics, giving users valuable insight into workflow performance and helping them identify areas for optimization.
OPTIMIZED FILE MANAGEMENT PROCESSING
The solution is designed to optimize file management processing for different types of files.
Three concepts help explain how:
- The solution streamlines the processing order.
- It utilizes multi-threaded file batching, where each movement task consists of three stages: preparation, execution, and finalization. Tasks are executed in parallel using a batch system that creates jobs. Each job contains a list of files; the number of files per job is capped either by a maximum file count or by a maximum total capacity.
- It includes specific queuing systems that enable the execution of different types of actions required during data movement processing.
The advantage is that no time is wasted in the discovery phase: files begin to be copied as soon as the first batch of data is ready, while the discovery process continues to run in parallel with the copying.
Non-cryptographic hashes and cryptographic hashes
Data integrity is a critical aspect of any data management solution. Ensuring the accuracy, completeness, and unchanged nature of stored data is of utmost importance. This not only fosters user confidence but also shields organizations from potential liabilities.
The workflow manager ensures integrity through two main mechanisms: non-cryptographic hashes (such as xxHash) and cryptographic hashes (such as MD5).
The MD5 hash algorithm is a commonly used function for validating data integrity. An MD5 checksum is a 32-digit hexadecimal number representing the hash of a file's contents. Because the MD5 calculation is an industry standard, integrity can be verified on any system.
xxHash is an extremely fast hash algorithm, so checksums are quicker to generate. It is highly portable, and hashes are identical across all platforms (little/big endian). It is increasingly popular in video devices; Nodeum uses the xxHash64be algorithm, which is compatible with other products and software in the industry.
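Computing such a checksum is straightforward with Python's standard library. The sketch below streams a file through MD5 in chunks so that arbitrarily large files can be verified without loading them into memory; xxHash would follow the same pattern via a third-party binding and is omitted here to keep the example standard-library-only.

```python
import hashlib

def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 and return the 32-digit hex digest.
    The same calculation yields the same result on any system, so the
    checksum can be recomputed at the destination to verify integrity."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

A transfer is validated by comparing the digest computed at the source with one computed at the destination; any mismatch indicates corruption in flight or at rest.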
HOOKS & CALLBACKS
Nodeum's hook feature enables users to execute custom scripts or commands during specific events within each Data Movement process. These scripts or commands can automate additional tasks, integrate with other systems, or perform specific actions based on the event.
For instance, users can configure a hook to run a custom script after a data movement task is completed. This script can then perform additional actions like sending email notifications, updating metadata databases, or triggering events.
Hooks can be created through the Nodeum Console, the ND Client, or the RESTful API interface. The hook configuration allows users to define the triggering event, the script or command to run, and any parameters or variables to pass to it.
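A post-task hook script might look like the following sketch. The environment variable names (`TASK_NAME`, `TASK_STATUS`, `TASK_FILE_COUNT`) are hypothetical placeholders for whatever parameters the hook configuration passes in; they are not Nodeum's documented interface.

```python
#!/usr/bin/env python3
"""Hypothetical post-task hook: builds a notification payload from
parameters passed by the workflow manager (variable names assumed)."""
import json
import os
import sys

def build_notification(env: dict) -> dict:
    # Collect task details handed to the hook (illustrative names).
    return {
        "task": env.get("TASK_NAME", "unknown"),
        "status": env.get("TASK_STATUS", "unknown"),
        "files_processed": int(env.get("TASK_FILE_COUNT", 0)),
    }

if __name__ == "__main__":
    payload = build_notification(dict(os.environ))
    # Stand-in for sending an email or calling a webhook:
    json.dump(payload, sys.stdout)
```

In practice the final step would post the payload to a notification service or metadata database, as the examples above describe.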
By leveraging Nodeum hooks, users can significantly enhance the automation and integration capabilities of the solution, enabling them to customize workflows and extend the platform's functionality to meet their specific requirements.
" The multifunctional Data Mover NODEUM® simplifies the data transfer between cloud object stores with a Swift interface and fast parallel file systems with a POSIX interface."