Advanced Folder Structures
In the previous notebook, we saw how to access images stored in a single folder. In this notebook, we will show two more organized examples of folder structures: an Omero-like structure and a structure suited to machine learning.
from skimage.io import imread, imsave
import matplotlib.pyplot as plt
from pathlib import Path
import numpy as np
Omero-like Structure
Omero is an image data management platform that handles image data and metadata, allowing users to remotely view, organize, analyze and share images. Below is a screenshot of an Omero server.
On the left side, the folder structure can be observed. Omero operates with two levels of hierarchy: images can be put inside directories called Datasets, and Datasets can be put inside directories called Projects. Further differentiation of files is made via metadata, by means of image tags and image key-value pairs.
Mimicking that structure, our local Project directory called “Project2_Omero_like” contains two folders (Datasets): Control and Group1, each containing images and other files.
Project2_Omero_like
|
├─ Control
| ├─ Readme.txt
| ├─ A9 p5d.tif
| ⁞
|
├─ Group1
| ├─ Readme.txt
| ├─ 17P1_POS0006_D_1UL.tif
| ⁞
|
└─ Readme.txt
Opening multiple images from folders
We start by providing the path to the highest level folder.
data_folder2 = '../../../data/Folder_Structures/Project2_Omero_like'
data_path = Path(data_folder2)
Since we have a two-level hierarchy of directories here, we need two for loops, one per level. The first for loop iterates over the top level, i.e., it yields the paths to the folders and files inside the “Project” folder.
for path in data_path.iterdir():
    print(path)
..\..\..\data\Folder_Structures\Project2_Omero_like\Control
..\..\..\data\Folder_Structures\Project2_Omero_like\Group1
..\..\..\data\Folder_Structures\Project2_Omero_like\Readme.txt
To access the lower levels, we need two things:
1. Check if the path leads to another folder
2. If yes, iterate over this folder
We can do that by putting an if condition inside the first for loop, followed by a second for loop that runs only if the condition is met.
Note: the sorted function sorts paths alphabetically. A more sophisticated option is natsort, which sorts numbers embedded in file names in natural order.
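To illustrate the difference between alphabetical and natural ordering without installing natsort, here is a minimal sketch that uses only the standard library. The helper natural_key is hypothetical (it is not part of natsort or of this notebook), and the file names are taken from the Control folder.

```python
import re
from pathlib import Path

def natural_key(path):
    # Hypothetical helper: split the file name into digit and non-digit
    # chunks so that numeric chunks are compared as integers (5 < 10).
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r'(\d+)', path.name)]

paths = [Path('A9 p10d.tif'), Path('A9 p5d.tif'),
         Path('A9 p7d.tif'), Path('A9 p9d.tif')]

# Plain sorted compares character by character, so 'p10d' comes before 'p5d'
alphabetical = [p.name for p in sorted(paths)]
# With the natural key, numbers are ordered by value
natural = [p.name for p in sorted(paths, key=natural_key)]

print(alphabetical)
print(natural)
```

With the natural key, 'A9 p10d.tif' moves to the end of the list instead of the beginning.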
# First for loop: iterates over the Project folder
for path in data_path.iterdir():
    print('Project folder path: \n', path)
    # Check if path leads to another folder
    if path.is_dir():
        # If the condition is met, iterate over the new path
        for file_path in sorted(path.iterdir()):
            print('Dataset folder path: ', file_path)
Project folder path:
..\..\..\data\Folder_Structures\Project2_Omero_like\Control
Dataset folder path: ..\..\..\data\Folder_Structures\Project2_Omero_like\Control\A9 p10d.tif
Dataset folder path: ..\..\..\data\Folder_Structures\Project2_Omero_like\Control\A9 p5d.tif
Dataset folder path: ..\..\..\data\Folder_Structures\Project2_Omero_like\Control\A9 p7d.tif
Dataset folder path: ..\..\..\data\Folder_Structures\Project2_Omero_like\Control\A9 p9d.tif
Dataset folder path: ..\..\..\data\Folder_Structures\Project2_Omero_like\Control\Readme.txt
Project folder path:
..\..\..\data\Folder_Structures\Project2_Omero_like\Group1
Dataset folder path: ..\..\..\data\Folder_Structures\Project2_Omero_like\Group1\17P1_POS0006_D_1UL.tif
Dataset folder path: ..\..\..\data\Folder_Structures\Project2_Omero_like\Group1\17P1_POS0007_D_1UL.tif
Dataset folder path: ..\..\..\data\Folder_Structures\Project2_Omero_like\Group1\17P1_POS0011_D_1UL.tif
Dataset folder path: ..\..\..\data\Folder_Structures\Project2_Omero_like\Group1\17P1_POS0013_D_1UL.tif
Dataset folder path: ..\..\..\data\Folder_Structures\Project2_Omero_like\Group1\17P1_POS0014_D_1UL.tif
Dataset folder path: ..\..\..\data\Folder_Structures\Project2_Omero_like\Group1\Readme.txt
Project folder path:
..\..\..\data\Folder_Structures\Project2_Omero_like\Readme.txt
As usual, to filter out files that are not images, we can add extra conditions and use .glob to match specific file formats. We also store paths from different folders in separate lists by checking the path stem (the final path component, without its suffix).
Example: if a variable file_path contains Path('../../../data/Folder_Structures/Project2_Omero_like/Readme.txt'), then file_path.stem yields 'Readme' and file_path.suffix yields '.txt'.
Check this out below:
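# Forward slashes are accepted on every operating system; backslashes only act as separators on Windows
file_path = Path('../../../data/Folder_Structures/Project2_Omero_like/Readme.txt')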
file_path.stem
'Readme'
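The other path components can be inspected in the same way. The sketch below is only an illustration: it uses PurePosixPath so that it behaves identically on every operating system and does not need the file to exist, and the file path is assumed from the folder tree shown earlier.

```python
from pathlib import PurePosixPath

# Assumed example path, based on the folder tree of Project2_Omero_like
file_path = PurePosixPath('../../../data/Folder_Structures/Project2_Omero_like/Control/A9 p5d.tif')

parts = {
    'name': file_path.name,                # final component, with suffix
    'stem': file_path.stem,                # final component, without suffix
    'suffix': file_path.suffix,            # file extension, including the dot
    'parent_name': file_path.parent.name,  # name of the containing folder
}
print(parts)
```

The parent_name entry is 'Control', which is the kind of information used to sort files by the folder they belong to.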
Now back to the folder iteration: we collect the file paths that lead to .tif files and store them in two lists, depending on whether they are in the 'Control' folder or in the 'Group1' folder.
image_path_list_control = []
image_path_list_group1 = []
for path in data_path.iterdir():
    # Check if path leads to another folder
    if path.is_dir():
        for file_path in sorted(path.glob('*.tif')):
            # Check if current folder name is 'Control'
            if path.stem == 'Control':
                # Store file path in control list
                image_path_list_control.append(file_path)
            # Check if current folder name is 'Group1'
            elif path.stem == 'Group1':
                # Store file path in group1 list
                image_path_list_group1.append(file_path)
image_path_list_control
[WindowsPath('../../../data/Folder_Structures/Project2_Omero_like/Control/A9 p10d.tif'),
WindowsPath('../../../data/Folder_Structures/Project2_Omero_like/Control/A9 p5d.tif'),
WindowsPath('../../../data/Folder_Structures/Project2_Omero_like/Control/A9 p7d.tif'),
WindowsPath('../../../data/Folder_Structures/Project2_Omero_like/Control/A9 p9d.tif')]
image_path_list_group1
[WindowsPath('../../../data/Folder_Structures/Project2_Omero_like/Group1/17P1_POS0006_D_1UL.tif'),
WindowsPath('../../../data/Folder_Structures/Project2_Omero_like/Group1/17P1_POS0007_D_1UL.tif'),
WindowsPath('../../../data/Folder_Structures/Project2_Omero_like/Group1/17P1_POS0011_D_1UL.tif'),
WindowsPath('../../../data/Folder_Structures/Project2_Omero_like/Group1/17P1_POS0013_D_1UL.tif'),
WindowsPath('../../../data/Folder_Structures/Project2_Omero_like/Group1/17P1_POS0014_D_1UL.tif')]
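If a Project contains many Datasets, writing one list and one condition per folder becomes tedious. One possible generalization, sketched here under assumptions, is a dictionary keyed by the Dataset folder name. The function collect_image_paths is a hypothetical helper, and the demo runs on a temporary directory filled with empty placeholder files that mimic part of the structure above, not on the real data.

```python
import tempfile
from pathlib import Path

def collect_image_paths(project_path, pattern='*.tif'):
    """Hypothetical helper: map each Dataset folder name to its sorted image paths."""
    paths_by_dataset = {}
    for dataset_path in sorted(project_path.iterdir()):
        # Skip files stored directly in the Project folder (e.g. Readme.txt)
        if dataset_path.is_dir():
            paths_by_dataset[dataset_path.name] = sorted(dataset_path.glob(pattern))
    return paths_by_dataset

# Placeholder content: empty files, only the names matter for this demo
layout = {
    'Control': ['A9 p5d.tif', 'A9 p7d.tif', 'Readme.txt'],
    'Group1': ['17P1_POS0006_D_1UL.tif', 'Readme.txt'],
}

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / 'Project2_Omero_like'
    for dataset, names in layout.items():
        (project / dataset).mkdir(parents=True)
        for name in names:
            (project / dataset / name).touch()
    (project / 'Readme.txt').touch()

    image_paths = collect_image_paths(project)
    # Keep only the file names, since the temporary folder is deleted afterwards
    summary = {dataset: [p.name for p in paths]
               for dataset, paths in image_paths.items()}

print(summary)
```

The explicit two-list version above is easier to read for two folders; the dictionary version avoids editing the loop whenever a Dataset is added.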
Machine Learning Style Folder Structure
With machine learning or deep learning, we typically have a folder with intensity images and another folder with labeled images or masks with the same file name. Images in both folders must be read in pairs, so storing their paths in an ordered list is important.
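The pairing requirement can be checked in code before any image is read. The following is a minimal sketch, assuming both folders hold .tif files with identical names. The function pair_by_name and the folder names are placeholders for illustration, and the demo uses empty files in a temporary directory.

```python
import tempfile
from pathlib import Path

def pair_by_name(raw_dir, label_dir, pattern='*.tif'):
    """Hypothetical helper: return (raw, label) path pairs with matching file names."""
    raw_paths = sorted(raw_dir.glob(pattern))
    label_paths = sorted(label_dir.glob(pattern))
    # Sorting alone is not enough: make sure both folders hold the same names
    if [p.name for p in raw_paths] != [p.name for p in label_paths]:
        raise ValueError('Raw and label folders do not contain the same file names')
    return list(zip(raw_paths, label_paths))

with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / 'Raw_Images'
    label_dir = Path(tmp) / 'Label_Images'
    for folder in (raw_dir, label_dir):
        folder.mkdir()
        # Files are created out of order on purpose
        for name in ('image_02.tif', 'image_01.tif'):
            (folder / name).touch()

    pairs = pair_by_name(raw_dir, label_dir)
    pair_names = [(raw.name, label.name) for raw, label in pairs]

print(pair_names)
```

Raising an error on a mismatch is one design choice among several; silently pairing by position would hide a missing or misnamed label image.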
There are a few variations to this initial structure. One of them could be having another level on top separating these images into train, test and validation groups.
Another possibility is having a third folder with manual annotations for the labeled objects. This last structure is the one we replicated below, in our local folder called “Project3_Machine_Learning_style”.
Project3_Machine_Learning_style
|
├─ Annotations
| ├─ image_01.tif
| ├─ image_02.tif
| ⁞
|
├─ Label_Images
| ├─ image_01.tif
| ├─ image_02.tif
| ⁞
|
├─ Raw_Images
| ├─ image_01.tif
| ├─ image_02.tif
| ⁞
└─ Readme.md
data_folder3 = '../../../data/Folder_Structures/Project3_Machine_Learning_style'
data_path = Path(data_folder3)
data_path
WindowsPath('../../../data/Folder_Structures/Project3_Machine_Learning_style')
Exercise
Iterate over Project3_Machine_Learning_style and store each type of image path (raw, label and annotation) in a different list.
In another cell, display the first image from each list.