Data download
Many examples in the rest of the tutorials will use data from the Dartbrains course and other sample data. This tutorial describes how to download these data if you want to be able to run the code yourself.
Install datalad
In a Terminal window (or however you prefer to interact with your shell), install the software required to download data through datalad. Below we install git-annex (which datalad relies on to manage file content) through Homebrew and datalad through pip:
> brew install git-annex
==> Caveats
==> git-annex
To start git-annex now and restart at login:
brew services start git-annex
Or, if you don't want/need a background service you can just run:
/opt/homebrew/opt/git-annex/bin/git-annex assistant --autostart
> pip install datalad
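Before moving on, you can check from Python that both tools are reachable. This is a minimal sketch using only the standard library. The helper name check_tools is hypothetical. It only checks that a git-annex executable is on your PATH and that the datalad package is importable. It does not check whether their versions are compatible.

```python
import importlib.util
import shutil


def check_tools():
    """Hypothetical helper: report whether git-annex and datalad are available.

    git-annex is looked up as an executable on PATH;
    datalad is looked up as an importable Python package.
    """
    return {
        "git-annex": shutil.which("git-annex") is not None,
        "datalad": importlib.util.find_spec("datalad") is not None,
    }


if __name__ == "__main__":
    # Print one line per tool so a missing dependency stands out
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```

If either entry prints MISSING, rerun the corresponding install command above before continuing.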
Create a data directory and navigate into it. This is where datalad will store the data:
> mkdir data
> cd data
> datalad install https://gin.g-node.org/ljchang/Localizer
install(ok): /Users/zenkavi/Documents/RangelLab/IntroTofMRI/data/Localizer (dataset)
Note that this does not actually download any data. It creates the dataset structure and records the metadata datalad needs to know where to pull each file from when you ask for it. You can confirm that nothing has been downloaded as follows:
> cd Localizer
> datalad status --annex all
1794 annex'd files (0.0 B/42.1 GB present/total size)
nothing to save, working tree clean
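The "0.0 B present" figure reflects how git-annex stores files. In its default (locked) mode on macOS and Linux, each annexed file in the working tree is a symbolic link into .git/annex/objects, and the link target only exists once the content has been fetched. The sketch below assumes that symlink mode (it does not apply to unlocked or adjusted branches, for example on Windows) and classifies files by whether their content is present. The names content_state and summarize are hypothetical.

```python
import os


def content_state(path):
    """Classify a path in a git-annex working tree (symlink mode assumed).

    'present' -> symlink whose target exists (content downloaded)
    'missing' -> dangling symlink (content not downloaded yet)
    'regular' -> not a symlink at all (e.g. a small file kept directly in git)
    """
    if os.path.islink(path):
        # os.path.exists follows the link, so it is False for a dangling one
        return "present" if os.path.exists(path) else "missing"
    return "regular"


def summarize(root):
    """Count content states for every file under root, skipping .git."""
    counts = {"present": 0, "missing": 0, "regular": 0}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk does not descend into it
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            counts[content_state(os.path.join(dirpath, name))] += 1
    return counts
```

Right after the install step, summarize on the Localizer folder should report no "present" files.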
Datalad through Python
You can also interact with datalad directly through Python. Make sure you have installed the packages listed in requirements.txt. In Python, the equivalent of the datalad install step above is a call to dl.clone:
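import os
import glob
import datalad.api as dl
import pandas as pd
# Adjust this to where you want the dataset stored on your machine
localizer_path = '/Users/zenkavi/Documents/RangelLab/IntroTofMRI/data/Localizer'
# Creates the dataset structure and metadata only; no file content is downloaded yet
dl.clone(source='https://gin.g-node.org/ljchang/Localizer', path=localizer_path)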
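If you rerun this cell, or already ran datalad install from the shell, the target folder already holds the dataset. One way to avoid cloning again is to guard the call. This minimal sketch assumes that a cloned dataset can be recognized by a .git entry at the top of localizer_path. The helper name needs_clone is hypothetical. If you prefer to rely on the library, datalad's Dataset(path).is_installed() is the more direct check.

```python
import os


def needs_clone(path):
    """Hypothetical helper: True if `path` does not look like a cloned dataset yet.

    Assumes a cloned dataset has a `.git` entry (a directory, or a file in
    the case of subdatasets) at its top level.
    """
    return not os.path.exists(os.path.join(path, ".git"))


# Usage with the names defined in the cell above (not executed here):
#     if needs_clone(localizer_path):
#         dl.clone(source='https://gin.g-node.org/ljchang/Localizer', path=localizer_path)
```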
Download data for the course
Confirm that no data has been downloaded yet:
ds = dl.Dataset(localizer_path)
# annex='all' adds git-annex information, including whether each file's content is present
results = ds.status(annex='all')
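From Python, ds.status returns a list of result records (dictionaries), one per path. The sketch below totals present versus total annexed bytes, similar to the summary line printed by datalad status --annex all in the shell section. It assumes each annexed-file record carries a 'bytesize' key and, with annex='all', a 'has_content' flag. Treat those field names as an assumption to check against your datalad version. summarize_annex is a hypothetical helper.

```python
def summarize_annex(results):
    """Hypothetical helper: count annexed files and bytes present vs total.

    Assumes annexed-file records include 'bytesize' (int) and, when
    status(annex='all') was used, 'has_content' (bool). Records without
    'bytesize' (directories, files kept in git) are ignored.
    """
    annexed = [r for r in results if r.get("type") == "file" and "bytesize" in r]
    total = sum(r["bytesize"] for r in annexed)
    present = sum(r["bytesize"] for r in annexed if r.get("has_content"))
    return {"files": len(annexed), "present_bytes": present, "total_bytes": total}


# Usage with `results` from the cell above:
#     summarize_annex(results)
```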
The data for the course consists of:
- sub-S01’s raw data
- the experimental metadata
- preprocessed data for the first 20 subjects, including the fMRIPrep QC reports
# sub-S01's raw data
result = ds.get(os.path.join(localizer_path, 'sub-S01'))
# Experimental metadata: top-level .json and .tsv files and the phenotype folder
result = ds.get(glob.glob(os.path.join(localizer_path, '*.json')))
result = ds.get(glob.glob(os.path.join(localizer_path, '*.tsv')))
result = ds.get(glob.glob(os.path.join(localizer_path, 'phenotype')))
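# Preprocessed data for the first 20 subjects. 'sub*' also matches the per-subject
# fmriprep .html QC reports, so keep only the subject folders before slicing
subject_dirs = sorted(d for d in glob.glob(os.path.join(localizer_path, '*', 'fmriprep', 'sub*')) if os.path.isdir(d))
for d in subject_dirs[:20]:
    result = ds.get(d)
    if os.path.lexists(d + '.html'):  # this subject's QC report, if present
        result = ds.get(d + '.html')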
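Instead of calling ds.get once per path, you can collect every path first and request them in a single call. This also lets you print or check the list before downloading anything. The sketch below only builds the list with the standard library. course_paths and its n_subjects parameter are hypothetical names. The sketch assumes the same folder layout the globs above rely on: subject folders, plus optional sub-*.html reports, under */fmriprep. The resulting list can then be passed to ds.get in one go, for example ds.get(course_paths(localizer_path)).

```python
import glob
import os


def course_paths(localizer_path, n_subjects=20):
    """Hypothetical helper: list the paths the course needs, without fetching."""
    paths = [os.path.join(localizer_path, "sub-S01")]  # sub-S01's raw data
    # Experimental metadata at the top level of the dataset
    paths += sorted(glob.glob(os.path.join(localizer_path, "*.json")))
    paths += sorted(glob.glob(os.path.join(localizer_path, "*.tsv")))
    paths += glob.glob(os.path.join(localizer_path, "phenotype"))
    # Preprocessed subject folders only (so .html reports do not count toward
    # n_subjects), each followed by its QC report when one exists
    subject_dirs = sorted(
        d for d in glob.glob(os.path.join(localizer_path, "*", "fmriprep", "sub*"))
        if os.path.isdir(d)
    )
    for d in subject_dirs[:n_subjects]:
        paths.append(d)
        report = d + ".html"
        # lexists is True even for a not-yet-downloaded (dangling) annex symlink
        if os.path.lexists(report):
            paths.append(report)
    return paths
```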