API Reference
mydia.Videos
An instance of this class acts as a reader for videos.
class mydia.Videos(target_size=None, to_gray=False, num_frames=None, mode='auto', normalize=False, data_format='channels_last', random_state=17)
Class to read in videos and store them as numpy arrays.
The videos are stored as a 5-dimensional tensor whose shape depends on data_format.
Parameters:
- target_size (tuple[int, int]) – A tuple of the form (width, height) giving the size to which the frames of the video are resized, defaults to None. The dimensions of the frames are not altered if this parameter is not set.
- to_gray (bool) – Convert the video to grayscale, defaults to False.
- num_frames (int) – The (exact) number of frames to extract from the video, defaults to None. Frames are extracted based on the value of mode. If not set, all the frames of the video are kept.
- mode (str) – The method used for frame extraction if num_frames is set. It can be one of "auto", "random", "first", "last" or "middle".
  - "auto": N frames will be extracted at equal intervals.
  - "random": N frames will be randomly extracted (no repetition). Use random_state to ensure reproducibility.
  - "first", "last" and "middle": N contiguous frames will be extracted from the beginning, end and middle of the video respectively.
- normalize (bool) – Rescales each video to the range [0, 1] by subtracting the minimum pixel value and dividing by the difference between the maximum and the minimum pixel values, defaults to False.
- data_format (str) – Video data format, either "channels_last" or "channels_first".
  - "channels_last": The tensor will have shape (<videos>, <frames>, <height>, <width>, <channels>)
  - "channels_first": The tensor will have shape (<videos>, <channels>, <frames>, <height>, <width>)
  channels will be 3 for videos in RGB format, or 1 for videos in grayscale.
- random_state (int) – Integer that seeds the (numpy) random number generator, defaults to 17. Used only when mode is set to "random".
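The built-in values of mode can be read as simple index-selection rules. The following is a minimal sketch of those rules as described in the parameter list, not mydia's actual implementation: the helper name select_indices is hypothetical, the floor-division step used for "auto" is only one possible reading of "equal intervals", and sorting the randomly drawn indices is an assumption.

```python
import numpy as np


def select_indices(total_frames, num_frames, mode="auto", random_state=17):
    """Hypothetical helper: return the frame indices each mode would pick."""
    if num_frames > total_frames:
        # Mirrors the IndexError documented for read()
        raise IndexError("num_frames exceeds the number of available frames")

    if mode == "auto":
        # Assumption: a constant integer step starting at frame 0
        step = total_frames // num_frames
        return [i * step for i in range(num_frames)]
    if mode == "random":
        # Seeded draw without repetition; sorting is an assumption
        rng = np.random.RandomState(random_state)
        picked = rng.choice(total_frames, size=num_frames, replace=False)
        return sorted(int(i) for i in picked)
    if mode == "first":
        return list(range(num_frames))
    if mode == "last":
        return list(range(total_frames - num_frames, total_frames))
    if mode == "middle":
        start = (total_frames - num_frames) // 2
        return list(range(start, start + num_frames))
    raise ValueError("unknown mode: {!r}".format(mode))
```

Under these assumptions, select_indices(10, 4, "middle") picks frames 3 to 6, and calling the function twice with mode="random" and the same random_state returns the same indices.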
Example
    from mydia import Videos

    # target_size is given as (width, height)
    reader = Videos(
        target_size=(720, 480),
        to_gray=False,
        num_frames=128,
        data_format="channels_first",
    )
    # read() accepts a single path or a list of paths
    video = reader.read("./path/to/video")
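The normalize parameter is described as a min-max rescaling of each video. A minimal sketch of that arithmetic with plain numpy might look like the following. The function name min_max_normalize is hypothetical, and the guard for a constant video (where the maximum equals the minimum) is an added assumption, since the reference does not say how that case is handled.

```python
import numpy as np


def min_max_normalize(video):
    """Hypothetical helper: rescale one video so its values span 0 to 1."""
    video = video.astype(np.float64)
    low, high = video.min(), video.max()
    if high == low:
        # Assumption: avoid dividing by zero for a constant video
        return np.zeros_like(video)
    # Subtract the minimum, divide by (maximum - minimum)
    return (video - low) / (high - low)
```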
Note
You can also pass a callable to mode for custom frame extraction. The callable should return a list of integers denoting the indices of the frames to be extracted. It should take 4 (non-keyword) arguments:
- total_frames: The total number of frames in the video
- num_frames: The number of frames that you want to extract
- fps: The frame rate of the video
- random_state: Integer to seed the random number generator
The callable is free to use or ignore any of these arguments when generating the frame indices. Detailed examples are provided in the documentation.
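As an illustration of the callable described in this note, here is a small sketch of a custom selector that takes the four positional arguments and returns exactly num_frames indices. The function name first_half_evenly and its selection rule are hypothetical; fps and random_state are accepted but unused.

```python
def first_half_evenly(total_frames, num_frames, fps, random_state):
    """Hypothetical selector: evenly spaced frames from the first half."""
    half = total_frames // 2
    if num_frames > half:
        raise IndexError("not enough frames in the first half of the video")
    # Constant step so that exactly num_frames indices are produced
    step = half // num_frames
    return [i * step for i in range(num_frames)]
```

Following the note, such a function would be passed as mode=first_half_evenly together with a matching num_frames when constructing the reader.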
Warning
If you pass a callable to mode, make sure that the number of frame indices it returns is equal to the value of num_frames. Otherwise, different videos may yield different numbers of frames, and they cannot be stacked into a single tensor.
read(paths, verbose=1, workers=0)
Function to read videos.
Parameters:
- paths (str or list[str]) – The path, or a list of paths, of the video(s) to be read.
- verbose (int) – If set to 0, the progress bar will be disabled.
- workers (int) – The number of processes (CPUs) to use for reading the videos. This uses the multiprocessing module from the Python standard library. Its value can range from 0 to max_workers, where the latter can be determined by calling multiprocessing.cpu_count() on your machine. Defaults to 0, which means that multiprocessing will not be used.
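Since workers is bounded by the number of CPUs, a caller may want to clamp a requested value before passing it to read. The sketch below uses only the standard library; the helper name clamp_workers is hypothetical.

```python
import multiprocessing


def clamp_workers(requested):
    """Hypothetical helper: keep workers between 0 and the CPU count."""
    max_workers = multiprocessing.cpu_count()
    return max(0, min(requested, max_workers))
```

A value of 0 is left unchanged, which per the description above means that multiprocessing is not used.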
Returns: A 5-dimensional tensor whose shape depends on the value of data_format.
- For "channels_last": The tensor will have shape (<videos>, <frames>, <height>, <width>, <channels>)
- For "channels_first": The tensor will have shape (<videos>, <channels>, <frames>, <height>, <width>)
Return type: numpy.ndarray
Raises:
- ValueError – If paths is neither a string nor a list of strings.
- IndexError – If num_frames is set to a value greater than the total number of frames available in the video.
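The two layouts listed under Returns differ only in the order of their axes, so a tensor in one layout can be converted to the other with a transpose. The sketch below shows the axis permutations with plain numpy; the helper names are hypothetical and not part of mydia.

```python
import numpy as np


def to_channels_first(batch):
    """(videos, frames, height, width, channels) -> (videos, channels, frames, height, width)"""
    return np.transpose(batch, (0, 4, 1, 2, 3))


def to_channels_last(batch):
    """(videos, channels, frames, height, width) -> (videos, frames, height, width, channels)"""
    return np.transpose(batch, (0, 2, 3, 4, 1))
```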
Important
If multiple videos are to be read, each video must have the same dimensions (frames, height, width); otherwise they cannot be stacked into a single tensor. Use the parameters target_size and num_frames to ensure this.
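The stacking constraint can be seen with plain numpy arrays standing in for videos. This sketch uses made-up shapes and does not call mydia: stacking succeeds when the per-video shapes match and raises ValueError when the frame counts differ.

```python
import numpy as np

# Stand-ins for three videos in (frames, height, width, channels) layout
clip_a = np.zeros((16, 48, 72, 3))
clip_b = np.zeros((16, 48, 72, 3))
clip_c = np.zeros((20, 48, 72, 3))  # different number of frames

# Same shape: the clips stack into one 5-dimensional tensor
batch = np.stack([clip_a, clip_b])

# Different shapes: numpy refuses to stack them
try:
    np.stack([clip_a, clip_c])
    stacked = True
except ValueError:
    stacked = False
```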
mydia.make_grid
This function converts a video into a grid of frames. It is inspired by a similar utility provided in torchvision.
mydia.make_grid(video, num_col=3, padding=5)
Converts a video into a grid of frames.
Parameters:
- video (numpy.ndarray) – A 4-dimensional video tensor (a single video).
- num_col (int) – The number of columns in the grid, defaults to 3.
- padding (int) – Amount of padding (in pixels), defaults to 5.
Returns: A grid of frames (numpy array) of shape (height, width, 3) if the video is in RGB format, or (height, width) if the video is in grayscale.
Return type: numpy.ndarray
Raises:
- ValueError – If the dimension of the video tensor is invalid.
Example
    import matplotlib.pyplot as plt
    from mydia import Videos, make_grid

    reader = Videos(target_size=(720, 480), to_gray=True)
    # read() returns a batch of videos; video[0] is the single video
    video = reader.read("./path/to/video")
    grid = make_grid(video[0], num_col=6, padding=8)
    plt.imshow(grid, cmap="gray")
Note
The input to this function should be a single video tensor, with any data_format. However, the grid of frames produced as the output will always be "channels_last".
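To make the idea of a frame grid concrete, here is a rough sketch of tiling frames with padding using numpy. It is not mydia's implementation: the function name tile_frames is hypothetical, the input is assumed to be in "channels_last" layout, and the padding layout (a border around every frame, filled with zeros) is an assumption.

```python
import math

import numpy as np


def tile_frames(video, num_col=3, padding=5):
    """Hypothetical sketch: tile (frames, height, width, channels) into a grid."""
    if video.ndim != 4:
        raise ValueError("expected a 4-dimensional video tensor")
    frames, height, width, channels = video.shape
    num_row = math.ceil(frames / num_col)

    # Blank canvas with room for every cell plus padding around each one
    grid = np.zeros(
        (
            num_row * height + (num_row + 1) * padding,
            num_col * width + (num_col + 1) * padding,
            channels,
        ),
        dtype=video.dtype,
    )
    for index in range(frames):
        row, col = divmod(index, num_col)
        top = padding + row * (height + padding)
        left = padding + col * (width + padding)
        grid[top:top + height, left:left + width] = video[index]

    # Grayscale input yields a 2-dimensional (height, width) grid
    return grid[..., 0] if channels == 1 else grid
```

With 7 frames and num_col=3, the sketch produces three rows, the last of which holds a single frame followed by empty cells.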