              _                 __    _                
      _____  (_)___ ___________/ /_  (_)   _____  _____
     / ___/ / / __ `/ ___/ ___/ __ \/ / | / / _ \/ ___/
    / /__  / / /_/ / /  / /__/ / / / /| |/ /  __/ /    
    \___/_/ /\__,_/_/   \___/_/ /_/_/ |___/\___/_/     
       /___/                                           


DESCRIPTION

cjarchiver is a Python script that can be used to compress 
a directory including all its files and subdirectories.

-----------------------------------------------------------

PREREQUISITES

To use cjarchiver on the sciCORE clusters, first load the 
cjarchiver module:

ml cjarchiver

This will load the default version. If you need a specific
version you can search for it with:

ml spider cjarchiver

To archive a directory, the current working directory should
be at the same level as the target directory (that is, the
directory that contains it). In addition, the target directory
must contain a metadata file in JSON format (referred to below
as ARCHIVE_METADATA.json). The user should create this file
using the following template:

{
        "name": "NAME OF INVESTIGATOR",
        "email": "EMAIL OF INVESTIGATOR",
        "pi_name": "NAME OF PI",
        "pi_email": "EMAIL OF PI",
        "project": "INSERT PROJECT NAME HERE",
        "project_start_date": "YYYY-MM-DD",
        "project_end_date": "YYYY-MM-DD",
        "description": "INSERT PROJECT DESCRIPTION HERE MULTILINE IS NOT OK",
        "collaborators":[
                { "name": "COLLABORATOR NAME",
                  "email": "COLLABORATOR EMAIL"
                },
                { "name": "COLLABORATOR NAME",
                  "email": "COLLABORATOR EMAIL"
                }
        ],
        "comments": "ADDITIONAL COMMENTS (E.G. LEGAL REQUIREMENTS REGARDING DURATION OF DATA PRESERVATION, ETC...)"
}
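
Because a malformed metadata file is easy to produce by hand, 
it can help to check it before archiving. The following is a 
minimal sketch, not part of cjarchiver: the function name 
check_metadata is hypothetical, the list of keys is copied 
from the template above, and whether cjarchiver itself 
requires every one of them is an assumption.

```python
import json
from datetime import datetime

# Keys copied from the template above. Whether cjarchiver enforces
# all of them is an assumption of this sketch.
REQUIRED_KEYS = [
    "name", "email", "pi_name", "pi_email", "project",
    "project_start_date", "project_end_date", "description",
    "collaborators", "comments",
]


def check_metadata(text):
    """Return a list of problems found in the metadata JSON text."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return ["not valid JSON: %s" % err]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = ["missing key: %s" % k for k in REQUIRED_KEYS if k not in data]

    # Both dates are expected in the YYYY-MM-DD form shown in the template.
    for key in ("project_start_date", "project_end_date"):
        if key in data:
            try:
                datetime.strptime(str(data[key]), "%Y-%m-%d")
            except ValueError:
                problems.append("%s is not YYYY-MM-DD" % key)

    # The template states that a multiline description is not accepted.
    if "\n" in str(data.get("description", "")):
        problems.append("description must be a single line")

    collaborators = data.get("collaborators", [])
    if not isinstance(collaborators, list):
        problems.append("collaborators must be a list")
    else:
        for i, person in enumerate(collaborators):
            if not isinstance(person, dict) or not {"name", "email"} <= set(person):
                problems.append("collaborator %d needs name and email" % i)
    return problems
```

An empty list means that no problem was found. For a target 
directory called old_data, the check could be run as:

    check_metadata(open("old_data/ARCHIVE_METADATA.json").read())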

-----------------------------------------------------------

USAGE

To execute cjarchiver:

cjarchiver <target_directory> [options]

If successful, this will generate five files with the name 
format <username>_YYYYMMDDThhmmss_<targetfoldername> and the 
following extensions:

.log      - with the outputs of the script.
.json     - copy of the ARCHIVE_METADATA.json file.
.manifest - with the full list of archived files including 
            permissions, ownership, size, date, and path.
.md5sum   - with the full list of archived files and their 
            corresponding path and MD5 checksum.
.tar.bz2  - compressed archive of the target directory.

The manifest and md5sum files are also automatically copied 
inside of the target directory and, therefore, included in 
the .tar.bz2 file.
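
When cjarchiver is called from another script, it can be 
useful to know which file names to look for. The sketch below 
builds one name per listed extension from the name format 
described above. The function expected_outputs is 
hypothetical, and how cjarchiver obtains the user name and 
the timestamp (for example, local time or UTC) is not stated 
here, so both are passed in explicitly.

```python
import os
from datetime import datetime

# Extensions listed in the USAGE section.
EXTENSIONS = [".log", ".json", ".manifest", ".md5sum", ".tar.bz2"]


def expected_outputs(username, target_directory, when):
    """Build <username>_YYYYMMDDThhmmss_<targetfoldername> plus each extension."""
    # Keep only the final path component, so "old_data/" becomes "old_data".
    folder = os.path.basename(os.path.normpath(target_directory))
    stamp = when.strftime("%Y%m%dT%H%M%S")
    base = "%s_%s_%s" % (username, stamp, folder)
    return [base + ext for ext in EXTENSIONS]


names = expected_outputs("jdoe", "old_data/", datetime(2024, 3, 5, 14, 7, 9))
print(names[0])  # jdoe_20240305T140709_old_data.log
```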

After creating these files, cjarchiver renames the target 
directory to <targetdirectory>.toberemoved/

As its name indicates, <targetdirectory>.toberemoved/ can be 
deleted. Before deleting it, we strongly recommend checking 
that the .tar.bz2 file has been created.
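
One way to go beyond checking that the archive exists is to 
compare its contents with the .md5sum file. The following is 
a minimal sketch, not part of cjarchiver. It assumes that 
each line of the .md5sum file has the usual md5sum layout 
(checksum, whitespace, path) and that the listed paths match 
the member names in the archive apart from an optional 
leading "./". Neither assumption is confirmed by this 
README. The function name verify_archive is hypothetical.

```python
import hashlib
import tarfile


def _normalise(path):
    # Treat "./a/b" and "a/b" as the same name.
    return path[2:] if path.startswith("./") else path


def verify_archive(archive_path, md5sum_path):
    """Return the listed paths that are missing from the archive
    or whose MD5 checksum differs from the listing."""
    expected = {}
    with open(md5sum_path) as listing:
        for line in listing:
            line = line.rstrip("\n")
            if line:
                # Assumed layout: "<checksum>  <path>"
                digest, path = line.split(None, 1)
                expected[_normalise(path)] = digest.lower()

    actual = {}
    with tarfile.open(archive_path, "r:bz2") as archive:
        for member in archive:
            if member.isfile():
                md5 = hashlib.md5()
                source = archive.extractfile(member)
                # Read in 1 MiB chunks to keep memory use low.
                for chunk in iter(lambda: source.read(1 << 20), b""):
                    md5.update(chunk)
                actual[_normalise(member.name)] = md5.hexdigest()

    return sorted(p for p, d in expected.items() if actual.get(p) != d)
```

An empty result means that every listed file was found in the 
archive with a matching checksum. Reading a large .tar.bz2 
file this way decompresses the whole archive, so it can take 
a while.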

-----------------------------------------------------------

OPTIONS

-h,  --help:               Shows a help message and exits.
-x <subdirectory>,   
--exclude <subdirectory>:  Excludes a subdirectory from 
                           archiving. Only the names of 
                           first-level subdirectories are 
                           accepted, not full paths. The option 
                           can be repeated to exclude additional 
                           subdirectories.

-----------------------------------------------------------

EXAMPLES

Archive directory "old_data":
    cjarchiver old_data

Archive directory "old_data" but exclude "old_data/bad_exp":
    cjarchiver old_data -x bad_exp

Archive directory "old_data" but exclude "old_data/bad_exp"
and "old_data/bad_data":
    cjarchiver old_data -x bad_exp -x bad_data

-----------------------------------------------------------

KNOWN ISSUES

cjarchiver uses the find command to create the manifest and 
the md5sum files. find is known to fail occasionally when it 
accesses remote directories over NFS. We recommend running 
cjarchiver locally (i.e. directly on the machine where the 
target data is located).