Data download

Many examples in the rest of the tutorials will use data from the Dartbrains course and other sample data. This tutorial describes how to download these data if you want to be able to run the code yourself.

Install datalad

In a Terminal window (or however you prefer to interact with your shell), install the software required to download data through datalad. Below we install git-annex through Homebrew and datalad through pip.

> brew install git-annex

==> Caveats
==> git-annex
To start git-annex now and restart at login:
  brew services start git-annex
Or, if you don't want/need a background service you can just run:
  /opt/homebrew/opt/git-annex/bin/git-annex assistant --autostart

> pip install datalad
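
Before moving on, it can help to confirm that both tools are visible from your environment. The following is a minimal sketch using only the Python standard library. The helper name find_tools is made up for illustration, and it only checks that the executables are on your PATH, not that they work correctly.

```python
import shutil


def find_tools(names=("git-annex", "datalad")):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print where each tool was found, or a hint that it is missing
    for name, location in find_tools().items():
        print(f"{name}: {location or 'not found on PATH'}")
```

If either tool is reported as missing, revisit the install commands above before continuing.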

Create a data directory and navigate into it. This is where datalad will store the data. Then install the Localizer dataset.

> mkdir data
> cd data
> datalad install https://gin.g-node.org/ljchang/Localizer

install(ok): /Users/zenkavi/Documents/RangelLab/IntroTofMRI/data/Localizer (dataset)

Note that this does not download any data yet. It creates the dataset structure and records the metadata that datalad needs to know where to pull the data from when you request them. You can confirm that nothing has been downloaded as follows:

> cd Localizer 
> datalad status --annex all

1794 annex'd files (0.0 B/42.1 GB present/total size)
nothing to save, working tree clean

Datalad through Python

You can also interact with datalad directly through Python. Make sure you have installed the packages listed in requirements.txt.

The Python equivalent of the datalad install step above, here using dl.clone, looks like this:

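import os
import glob
import datalad.api as dl
import pandas as pd

# Local path where the dataset will be installed; change this to a location on your machine
localizer_path = '/Users/zenkavi/Documents/RangelLab/IntroTofMRI/data/Localizer'

# Clone the dataset structure and metadata (no file content is downloaded yet)
dl.clone(source='https://gin.g-node.org/ljchang/Localizer', path=localizer_path)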
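
Calling dl.clone a second time on a path that already holds the dataset is likely to fail, and the hard-coded path above only exists on one machine. The sketch below is one possible way around both issues, not part of the course code. The helper names default_localizer_path and clone_if_missing are hypothetical, and the data/Localizer layout simply mirrors the shell example.

```python
from pathlib import Path


def default_localizer_path(base_dir=None):
    """Return <base_dir>/data/Localizer, defaulting to the current directory."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / "data" / "Localizer"


def clone_if_missing(source, path):
    """Clone `source` into `path` unless a dataset is already installed there."""
    # Imported here so the path helper above works even without datalad
    import datalad.api as dl

    ds = dl.Dataset(str(path))
    if ds.is_installed():
        return ds
    return dl.clone(source=source, path=str(path))
```

A call such as clone_if_missing('https://gin.g-node.org/ljchang/Localizer', default_localizer_path()) would then return a dataset object whether or not the clone had already been made.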

Download data for the course

Confirm that no data has been downloaded yet:

ds = dl.Dataset(localizer_path)
results = ds.status(annex='all')
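
The results list returned by ds.status can be reduced to the same kind of present/total summary that the command line prints. The sketch below assumes each record for an annexed file is a dictionary with 'key', 'bytesize', and 'has_content' entries, which is how datalad's status documentation describes the annex report. Treat those key names as an assumption and inspect one record from your own results before relying on them. The function name summarize_annex is made up for illustration.

```python
def summarize_annex(results):
    """Return (n_files, present_bytes, total_bytes) for annexed files in status results."""
    n_files = 0
    present = 0
    total = 0
    for rec in results:
        # Assumption: only annexed files carry an annex 'key' entry
        if "key" not in rec:
            continue
        size = int(rec.get("bytesize", 0) or 0)
        n_files += 1
        total += size
        # Count the bytes only when the file content is available locally
        if rec.get("has_content"):
            present += size
    return n_files, present, total
```

Right after cloning, summarize_annex(results) would be expected to report zero present bytes, since no file content has been fetched yet.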

The data needed for the course are:

  • sub-S01’s raw data

  • experimental metadata

  • preprocessed data for the first 20 subjects, including the fmriprep QC reports

# Raw data for sub-S01
result = ds.get(os.path.join(localizer_path, 'sub-S01'))

# Experimental metadata: top-level JSON and TSV files and the phenotype directory
result = ds.get(glob.glob(os.path.join(localizer_path, '*.json')))
result = ds.get(glob.glob(os.path.join(localizer_path, '*.tsv')))
result = ds.get(glob.glob(os.path.join(localizer_path, 'phenotype')))

# Preprocessed fmriprep outputs: sort the matches and fetch the first 20 entries
file_list = glob.glob(os.path.join(localizer_path, '*', 'fmriprep', 'sub*'))
file_list.sort()
for f in file_list[:20]:
    result = ds.get(f)
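
The loop above fetches the first 20 matches of the 'sub*' pattern. If the fmriprep directory holds more than one entry per subject, for example a directory and an HTML report, then 20 matches cover fewer than 20 subjects. That layout is an assumption and has not been checked against the dataset here. The sketch below groups paths by subject ID first. The function name paths_for_first_subjects and the 'sub-<ID>' naming pattern are illustrative assumptions.

```python
import os
import re


def paths_for_first_subjects(paths, n_subjects=20):
    """Keep every path whose basename belongs to one of the first n subject IDs."""
    # Assumption: basenames look like 'sub-S01' or 'sub-S01.html'
    pattern = re.compile(r"^(sub-[A-Za-z0-9]+)")
    by_subject = {}
    for p in paths:
        match = pattern.match(os.path.basename(p))
        if match:
            by_subject.setdefault(match.group(1), []).append(p)
    # Take the first n subject IDs in sorted order, then all of their paths
    selected = []
    for subject in sorted(by_subject)[:n_subjects]:
        selected.extend(sorted(by_subject[subject]))
    return selected
```

With this helper, the loop could iterate over paths_for_first_subjects(file_list, 20) instead of file_list[:20]. Print the selection before fetching to check that it matches what you expect.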