Big image data formats

Prerequisites

Before starting this lesson, you should be familiar with:

Learning Objectives

After completing this lesson, learners should be able to:
  • Understand the concepts of lazy-loading, chunking and scale pyramids

  • Know some file formats that implement chunking and scale pyramids

Motivation

Modern microscopy frequently generates image data in the GB-TB range. Such data cannot be opened naively. First, the data may not fit into the working memory (RAM) of your computer. Second, loading all of it into memory would take a long time. It is therefore important to know the dedicated concepts and implementations that enable swift interaction with such big image data.

Concept map

graph TD
    BIG("Big image data") -->|saved as| RP("Scale pyramids")
    RP -->|saved as| C("Chunks on disk")
    C -->|lazy-load| PC("Computer memory")



Figure


Top left: For high-resolution or large image data, the viewer window often has fewer pixels than the data. Ideally only a subset of the pixels would be loaded, but this is hard to do in practice, so typically all data must be loaded, which is expensive. Storing one or more downsampled versions of the data solves this issue. Bottom left: When zooming in, the data still contains more pixels than the viewer currently needs. Chunked data storage makes it possible to restrict loading to a smaller region in data space. Right: Examples of chunks and scale pyramids.
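The two ideas in the figure can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of any particular viewer: the function names, the downsampling factor of 2, the image size and the chunk size are all assumptions made for the example. It shows how a viewer might pick a pyramid level that roughly matches the number of screen pixels, and how it might work out which chunks overlap the visible region.

```python
import math

def pyramid_shapes(shape, n_levels, factor=2):
    """Shapes of each pyramid level; level 0 is the full resolution."""
    shapes = []
    for level in range(n_levels):
        scale = factor ** level
        shapes.append(tuple(math.ceil(s / scale) for s in shape))
    return shapes

def pick_level(data_extent, viewer_pixels, n_levels, factor=2):
    """Coarsest level that still offers at least one data pixel per viewer pixel.

    data_extent: full-resolution pixels across the visible region (one axis).
    viewer_pixels: pixels across the viewer window (same axis).
    """
    level = 0
    while level + 1 < n_levels and data_extent / factor ** (level + 1) >= viewer_pixels:
        level += 1
    return level

def chunks_for_region(start, stop, chunk_shape):
    """Per-axis chunk index ranges overlapping the half-open region [start, stop)."""
    return [range(a // c, (b - 1) // c + 1) for a, b, c in zip(start, stop, chunk_shape)]

# Hypothetical 8192 x 8192 image with 4 pyramid levels, shown in a 1000 px wide viewer.
shapes = pyramid_shapes((8192, 8192), n_levels=4)
overview_level = pick_level(8192, 1000, n_levels=4)  # whole image visible
zoomed_level = pick_level(1500, 1000, n_levels=4)    # only 1500 data pixels visible

# Chunks (128 x 128 pixels each) needed for the region y in [100, 300), x in [200, 500).
chunk_ranges = chunks_for_region((100, 200), (300, 500), (128, 128))
n_chunks = math.prod(len(r) for r in chunk_ranges)

print(shapes, overview_level, zoomed_level, n_chunks)
```

In this example the overview uses the coarsest level, the zoomed-in view uses full resolution, and only a handful of chunks are touched instead of the whole image.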



Activities


ImageJ GUI

Please note that no specific images are named here yet, because we do not yet know where to host big image data. Until we find a solution, please use appropriate image data of your own for your course.

Open a big TIFF image

  • Check big TIFF image file size on disk
    • Compare to your computer’s RAM
  • Open Fiji
    • [ Plugins › Utilities › Monitor Memory… ]
  • Load TIFF as a whole: [ File › Open… ]
    • Observe that this is slow, because it needs to load the whole image.
    • Observe how the memory fills up.
    • Slice in x,y,z: [ Image › Stacks › Orthogonal Views ]
      • Observe that this is relatively fast, because all data is in memory already.
  • Lazy load: [ Plugins › Bio-Formats › Bio-Formats Importer ]
    • Use virtual stack
      • This enables plane-wise lazy-loading.
    • Opening is faster, because only the requested plane is loaded.
      • Changing z-slices takes some time, because the corresponding chunk (i.e. plane) must be lazy-loaded first.
      • Slice in x,y,z: [ Image › Stacks › Orthogonal Views ]
        • This now takes longer, because the orthogonal views need pixels from every plane, so all data must be loaded.
        • Watch the memory monitor while this is running.
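The behaviour observed in this activity can be imitated with a small Python sketch. It is not how Fiji or Bio-Formats implement virtual stacks. It is a simplified stand-in that assumes a raw, uncompressed 8-bit stack written to a temporary file, and the class name `LazyPlaneStack` is hypothetical. It shows that showing one z-plane needs a single read, while an orthogonal (x, z) slice has to touch every plane.

```python
import os
import tempfile

class LazyPlaneStack:
    """Reads single z-planes of a raw 8-bit stack from disk on demand."""

    def __init__(self, path, nz, ny, nx):
        self.path, self.nz, self.ny, self.nx = path, nz, ny, nx
        self.planes_read = 0  # counts disk reads, for illustration only

    def plane(self, z):
        """Load only plane z (ny * nx bytes), not the whole stack."""
        with open(self.path, "rb") as f:
            f.seek(z * self.ny * self.nx)
            data = f.read(self.ny * self.nx)
        self.planes_read += 1
        return data

    def xz_slice(self, y):
        """An orthogonal (x, z) slice needs one row from every plane."""
        rows = []
        for z in range(self.nz):
            p = self.plane(z)
            rows.append(p[y * self.nx:(y + 1) * self.nx])
        return rows

# Write a small synthetic stack: every pixel of plane z has the value z.
nz, ny, nx = 10, 64, 64
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".raw")
with tmp:
    for z in range(nz):
        tmp.write(bytes([z]) * (ny * nx))
path = tmp.name

stack = LazyPlaneStack(path, nz, ny, nx)
one_plane = stack.plane(3)
reads_for_plane = stack.planes_read
xz = stack.xz_slice(y=5)
reads_for_xz = stack.planes_read - reads_for_plane
os.remove(path)

print("reads for one z-plane:", reads_for_plane)
print("reads for one xz-slice:", reads_for_xz)
```

With plane-wise chunks, the cost of an orthogonal view grows with the number of planes, which is why it feels slow on a virtual stack.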

Open a big XML/HDF5 image

  • Check XML/HDF5 image file size on disk.
    • Compare to your computer’s RAM.
  • Open Fiji
    • [ Plugins › Utilities › Monitor Memory… ]
  • Lazy load XML/HDF5 using [ Plugins › BigDataViewer › Open XML/HDF5 ]
    • This is fast, because of lazy-loading and resolution pyramids.
    • Zoom in (it will fetch higher resolution levels).
    • Inspect the data at different angles [ Left button drag ]
      • This is fast, because of 3D chunking.
  • Lazy load XML/HDF5 using [ Plugins › Bio-Formats › Bio-Formats Importer ]
    • Use virtual stack
      • This enables plane-wise lazy-loading.
    • Choose a resolution level.
      • Inspect the data (this is now the same situation as with a TIFF image with plane-wise chunking, see above).
  • HDFView
    • Inspect the .h5 file to see the resolution pyramids and chunking.
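The difference between plane-wise and 3D chunking seen in the two activities can be made concrete with some arithmetic. The sketch below is only an illustration: the image size and the chunk shapes are hypothetical values, not the ones BigDataViewer or your file actually use, and it assumes that a chunk is always read in full.

```python
import math

def chunks_touched(image_shape, chunk_shape, slice_axis):
    """Chunks intersected by a one-pixel-thick slice perpendicular to slice_axis."""
    n = 1
    for axis, (size, chunk) in enumerate(zip(image_shape, chunk_shape)):
        if axis != slice_axis:
            n *= math.ceil(size / chunk)
    return n

def voxels_loaded(image_shape, chunk_shape, slice_axis):
    """Voxels read from disk for that slice, assuming whole chunks are read."""
    return chunks_touched(image_shape, chunk_shape, slice_axis) * math.prod(chunk_shape)

image = (512, 2048, 2048)        # hypothetical (z, y, x) image
plane_chunks = (1, 2048, 2048)   # one chunk per z-plane, as in a plane-wise TIFF
cubic_chunks = (64, 64, 64)      # hypothetical 3D chunks

for name, chunks in [("plane-wise", plane_chunks), ("3D", cubic_chunks)]:
    xy = voxels_loaded(image, chunks, slice_axis=0)  # slice perpendicular to z
    xz = voxels_loaded(image, chunks, slice_axis=1)  # slice perpendicular to y
    print(f"{name}: xy slice loads {xy} voxels, xz slice loads {xz} voxels")
```

Under these assumptions, plane-wise chunks are ideal for xy slices but force the entire image to be read for an xz slice, whereas 3D chunks keep the cost moderate in every orientation. This is what makes viewing at arbitrary angles practical.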



Assessment

Fill in the blanks

  1. Opening data piece-wise on demand is also called ___ .
  2. Storing data piece-wise is also called ___ .
  3. To enable fast inspection of spatial data at different scales (like on Google Maps), one can use ___ .

Solution

  1. lazy-loading
  2. chunking
  3. resolution pyramids

Explanations




Follow-up material

Recommended follow-up modules:

Learn more: