jactorch.transforms.vision.transforms#

Classes

CenterCrop

Crops the given image at the center.

ColorJitter

Compose

Crop

DenormalizeCoordinates

FiveCrop

Crop the given image into four corners and the central crop.

Grayscale

Convert image to grayscale.

HFlip

Lambda

LinearTransformation

Transform a tensor image with a square transformation matrix and a mean_vector computed offline.

Normalize

Normalize a tensor image with mean and standard deviation.

NormalizeCoordinates

Pad

Pad the given image on all sides with the given "pad" value.

PadMultipleOf

RandomApply

RandomChoice

RandomCrop

Crop the given image at a random location.

RandomGrayscale

Randomly convert image to grayscale with a probability of p (default 0.1).

RandomHorizontalFlip

Horizontally flip the given image randomly with a given probability.

RandomOrder

RandomResizedCrop

Crop a random portion of image and resize it to a given size.

RandomRotation

Rotate the image by angle.

RandomVerticalFlip

Vertically flip the given image randomly with a given probability.

Resize

Resize the input image to the given size.

ResizeMultipleOf

Rotate

TenCrop

Crop the given image into four corners and the central crop plus the flipped version of these (horizontal flipping is used by default).

ToPILImage

Convert a tensor or an ndarray to a PIL Image.

ToTensor

Convert a PIL Image or ndarray to tensor and scale the values accordingly.

TransformBase

TransformDataTypes

TransformFunctionBase

TransformFunctionBaseImageOnly

TransformGuide

VFlip

Class CenterCrop

class CenterCrop[source]#

Bases: TransformBase

Crops the given image at the center. If the image is a torch Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions. If the image is smaller than the output size along any edge, the image is padded with 0 and then center cropped.

Parameters:

size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
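The size rules and the centered crop window described above can be sketched in plain Python. This is an illustration, not the library's implementation: `normalize_size` and `center_crop_box` are hypothetical helper names, the rounding of the offsets is an assumption, and the sketch only covers images at least as large as the crop (the zero-padding case is left out).

```python
def normalize_size(size):
    """Turn an int or a sequence of length 1 or 2 into an (h, w) pair."""
    if isinstance(size, int):
        return (size, size)
    if len(size) == 1:
        return (size[0], size[0])
    if len(size) == 2:
        return (size[0], size[1])
    raise ValueError("size must be an int or a sequence of length 1 or 2")


def center_crop_box(image_h, image_w, size):
    """Return (top, left, crop_h, crop_w) of a centered crop window."""
    crop_h, crop_w = normalize_size(size)
    if crop_h > image_h or crop_w > image_w:
        raise ValueError("this sketch does not cover the padding case")
    # Assumed rounding: split the leftover rows/columns evenly.
    top = int(round((image_h - crop_h) / 2.0))
    left = int(round((image_w - crop_w) / 2.0))
    return (top, left, crop_h, crop_w)


print(center_crop_box(10, 8, 4))       # (3, 2, 4, 4)
print(center_crop_box(10, 8, (6, 2)))  # (2, 3, 6, 2)
```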

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(size, tg=None)[source]#
__new__(**kwargs)#
call_feed_dict(feed_dict)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class ColorJitter

class ColorJitter[source]#

Bases: TransformFunctionBaseImageOnly

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(brightness=0, contrast=0, saturation=0, hue=0, tg=None)[source]#
__new__(**kwargs)#
call_bbox(img, bbox)#
call_coor(img, coor)#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class Compose

class Compose[source]#

Bases: Compose

__call__(feed_dict=None, **kwargs)[source]#

Call self as a function.

__init__(transforms)[source]#
__new__(**kwargs)#
ezcall(image=None, coor=None, bbox=None)#

Class Crop

class Crop[source]#

Bases: TransformFunctionBase

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(x, y, w, h, tg=None)[source]#
__new__(**kwargs)#
call_bbox(img, bbox)[source]#
call_coor(img, coor)[source]#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class DenormalizeCoordinates

class DenormalizeCoordinates[source]#

Bases: TransformFunctionBase

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(tg=None)#
__new__(**kwargs)#
call_bbox(img, bbox)[source]#
call_coor(img, coor)[source]#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class FiveCrop

class FiveCrop[source]#

Bases: TransformFunctionBase

Crop the given image into four corners and the central crop. If the image is a torch Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

Note

This transform returns a tuple of images and there may be a mismatch in the number of inputs and targets your Dataset returns. See below for an example of how to deal with this.

Parameters:

size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop of size (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).
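As a rough illustration of which regions the five crops cover, the windows can be computed from the image and crop sizes alone. The helper name `five_crop_boxes`, the (top, left, h, w) box layout, and the rounding of the center offset are assumptions of this sketch; it accepts an int or an (h, w) pair only.

```python
def five_crop_boxes(image_h, image_w, size):
    """Return (top, left, h, w) boxes: four corners, then the center."""
    crop_h, crop_w = (size, size) if isinstance(size, int) else tuple(size)
    if crop_h > image_h or crop_w > image_w:
        raise ValueError("requested crop is larger than the image")
    bottom = image_h - crop_h  # top offset of the bottom crops
    right = image_w - crop_w   # left offset of the right crops
    center = (int(round(bottom / 2.0)), int(round(right / 2.0)), crop_h, crop_w)
    return [
        (0, 0, crop_h, crop_w),           # top-left
        (0, right, crop_h, crop_w),       # top-right
        (bottom, 0, crop_h, crop_w),      # bottom-left
        (bottom, right, crop_h, crop_w),  # bottom-right
        center,
    ]


boxes = five_crop_boxes(10, 10, 4)
print(len(boxes))  # 5
print(boxes[-1])   # (3, 3, 4, 4)
```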

Example

>>> transform = Compose([
...     FiveCrop(size),  # this is a tuple of PIL Images
...     Lambda(lambda crops: torch.stack([PILToTensor()(crop) for crop in crops]))  # returns a 4D tensor
... ])
>>> # In your test loop you can do the following:
>>> input, target = batch  # input is a 5D tensor, target is 2D
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w))  # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1)  # average over crops

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(size, tg=None)[source]#
__new__(**kwargs)#
call_bbox(img, bbox)#
call_coor(img, coor)#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class Grayscale

class Grayscale[source]#

Bases: TransformFunctionBaseImageOnly

Convert image to grayscale. If the image is a torch Tensor, it is expected to have […, 3, H, W] shape, where … means an arbitrary number of leading dimensions.

Parameters:

num_output_channels (int) – (1 or 3) number of channels desired for output image

Returns:

Grayscale version of the input.

  • If num_output_channels == 1 : returned image is single channel

  • If num_output_channels == 3 : returned image is 3 channel with r == g == b

Return type:

PIL Image
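The effect of num_output_channels can be shown on a single pixel. This is a sketch under stated assumptions: the weights are the ITU-R 601-2 luma coefficients, which this page does not specify, and `to_grayscale` is a hypothetical helper that works on one (r, g, b) tuple rather than on an image.

```python
def to_grayscale(pixel, num_output_channels=1):
    """Convert one (r, g, b) pixel to grayscale with 1 or 3 output channels."""
    if num_output_channels not in (1, 3):
        raise ValueError("num_output_channels should be either 1 or 3")
    r, g, b = pixel
    # Assumed weights: ITU-R 601-2 luma transform.
    luma = int(round((299 * r + 587 * g + 114 * b) / 1000))
    # With 3 output channels the same value is repeated, so r == g == b.
    return (luma,) * num_output_channels


print(to_grayscale((255, 0, 0)))          # (76,)
print(to_grayscale((255, 255, 255), 3))   # (255, 255, 255)
```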

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(num_output_channels=1, tg=None)[source]#
__new__(**kwargs)#
call_bbox(img, bbox)#
call_coor(img, coor)#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class HFlip

class HFlip[source]#

Bases: TransformFunctionBase

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(tg=None)#
__new__(**kwargs)#
call_bbox(img, bbox)[source]#
call_coor(img, coor)[source]#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class Lambda

class Lambda[source]#

Bases: Lambda

__call__(feed_dict=None, **kwargs)[source]#

Call self as a function.

__init__(lambd)[source]#
__new__(**kwargs)#
ezcall(image=None, coor=None, bbox=None)#

Class LinearTransformation

class LinearTransformation[source]#

Bases: TransformFunctionBaseImageOnly

Transform a tensor image with a square transformation matrix and a mean_vector computed offline. This transform does not support PIL Image. Given transformation_matrix and mean_vector, will flatten the torch.*Tensor and subtract mean_vector from it which is then followed by computing the dot product with the transformation matrix and then reshaping the tensor to its original shape.

Applications:

Whitening transformation: suppose X is a zero-centered data matrix with one flattened sample per row. Compute the [D x D] data covariance matrix with torch.mm(X.t(), X), perform SVD on this matrix, and pass the result as transformation_matrix.

Parameters:
  • transformation_matrix (Tensor) – tensor [D x D], D = C x H x W

  • mean_vector (Tensor) – tensor [D], D = C x H x W
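The flatten, subtract, multiply, and reshape steps described above can be sketched with nested lists. This is not the library's code: `flatten`, `unflatten`, and `linear_transform` are hypothetical helpers, and the image is a [C][H][W] nested list instead of a tensor.

```python
def flatten(image):
    """Flatten a [C][H][W] nested list into a list of length D = C x H x W."""
    return [value for channel in image for row in channel for value in row]


def unflatten(flat, c, h, w):
    """Reshape a flat list of length c * h * w back to [C][H][W]."""
    return [
        [flat[(k * h + i) * w:(k * h + i) * w + w] for i in range(h)]
        for k in range(c)
    ]


def linear_transform(image, transformation_matrix, mean_vector):
    """Compute (flat - mean_vector) times the [D x D] matrix, then reshape."""
    c, h, w = len(image), len(image[0]), len(image[0][0])
    flat = flatten(image)
    d = len(flat)
    if len(mean_vector) != d or len(transformation_matrix) != d:
        raise ValueError("matrix and mean_vector must match D = C x H x W")
    centered = [x - m for x, m in zip(flat, mean_vector)]
    # Row vector times matrix: out[j] = sum_i centered[i] * M[i][j].
    out = [
        sum(centered[i] * transformation_matrix[i][j] for i in range(d))
        for j in range(d)
    ]
    return unflatten(out, c, h, w)


identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
image = [[[1, 2], [3, 4]]]  # C=1, H=2, W=2
print(linear_transform(image, identity, [1, 1, 1, 1]))  # [[[0, 1], [2, 3]]]
```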

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(transformation_matrix, tg=None)[source]#
__new__(**kwargs)#
call_bbox(img, bbox)#
call_coor(img, coor)#
call_feed_dict(feed_dict)#
call_image(tensor)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class Normalize

class Normalize[source]#

Bases: TransformFunctionBaseImageOnly

Normalize a tensor image with mean and standard deviation. This transform does not support PIL Image. Given mean: (mean[1], ..., mean[n]) and std: (std[1], ..., std[n]) for n channels, this transform normalizes each channel of the input torch.*Tensor, i.e., output[channel] = (input[channel] - mean[channel]) / std[channel].

Note

This transform acts out of place, i.e., it does not mutate the input tensor.

Parameters:
  • mean (sequence) – Sequence of means for each channel.

  • std (sequence) – Sequence of standard deviations for each channel.

  • inplace (bool, optional) – Whether to perform this operation in place.
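The per-channel formula can be checked with a small out-of-place sketch. It operates on a [C][H][W] nested list rather than a tensor, and `normalize` here is a hypothetical stand-in, not the class documented on this page.

```python
def normalize(image, mean, std):
    """Return a new [C][H][W] nested list; the input is left untouched."""
    if len(mean) != len(image) or len(std) != len(image):
        raise ValueError("mean and std need one entry per channel")
    if any(s == 0 for s in std):
        raise ValueError("std must be non-zero")
    # output[channel] = (input[channel] - mean[channel]) / std[channel]
    return [
        [[(value - m) / s for value in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


image = [[[0.0, 1.0]], [[0.5, 0.5]]]  # 2 channels, 1 x 2 pixels each
print(normalize(image, mean=[0.5, 0.5], std=[0.5, 0.25]))
# [[[-1.0, 1.0]], [[0.0, 0.0]]]
```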

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(mean, std, tg=None)[source]#
__new__(**kwargs)#
call_bbox(img, bbox)#
call_coor(img, coor)#
call_feed_dict(feed_dict)#
call_image(tensor)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class NormalizeCoordinates

class NormalizeCoordinates[source]#

Bases: TransformFunctionBase

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(tg=None)#
__new__(**kwargs)#
call_bbox(img, bbox)[source]#
call_coor(img, coor)[source]#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class Pad

class Pad[source]#

Bases: TransformFunctionBase

Pad the given image on all sides with the given “pad” value. If the image is a torch Tensor, it is expected to have […, H, W] shape, where … means at most 2 leading dimensions for mode reflect and symmetric, at most 3 leading dimensions for mode edge, and an arbitrary number of leading dimensions for mode constant.

Parameters:
  • padding (int or sequence) –

    Padding on each border. If a single int is provided this is used to pad all borders. If sequence of length 2 is provided this is the padding on left/right and top/bottom respectively. If a sequence of length 4 is provided this is the padding for the left, top, right and bottom borders respectively.

    Note

    In torchscript mode padding as single int is not supported, use a sequence of length 1: [padding, ].

  • fill (number or tuple) – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant. Only number is supported for torch Tensor. Only int or tuple value is supported for PIL Image.

  • padding_mode (str) –

    Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.

    • constant: pads with a constant value, this value is specified with fill

    • edge: pads with the last value at the edge of the image. If the input is a 5D torch Tensor, the last 3 dimensions will be padded instead of the last 2

    • reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2]

    • symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]
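The padding rules and the four modes can be reproduced on a one-dimensional list, which makes the reflect and symmetric examples above easy to verify. `expand_padding` and `pad_1d` are hypothetical helpers written for this illustration; they are not part of the documented API.

```python
def expand_padding(padding):
    """Expand padding into a (left, top, right, bottom) tuple."""
    if isinstance(padding, int):
        return (padding, padding, padding, padding)
    if len(padding) == 1:
        return (padding[0],) * 4
    if len(padding) == 2:  # (left/right, top/bottom)
        return (padding[0], padding[1], padding[0], padding[1])
    if len(padding) == 4:
        return tuple(padding)
    raise ValueError("padding must be an int or a sequence of length 1, 2 or 4")


def pad_1d(values, pad, mode="constant", fill=0):
    """Pad a list with `pad` elements on both sides using the given mode."""
    values = list(values)
    n = len(values)
    if mode == "constant":
        return [fill] * pad + values + [fill] * pad
    if mode == "edge":
        return [values[0]] * pad + values + [values[-1]] * pad
    if mode == "reflect":  # mirror without repeating the edge value
        if pad >= n:
            raise ValueError("reflect padding must be smaller than the input")
        left = [values[i] for i in range(pad, 0, -1)]
        right = [values[n - 1 - i] for i in range(1, pad + 1)]
        return left + values + right
    if mode == "symmetric":  # mirror, repeating the edge value
        if pad > n:
            raise ValueError("symmetric padding must not exceed the input")
        left = [values[i] for i in range(pad - 1, -1, -1)]
        right = [values[n - 1 - i] for i in range(pad)]
        return left + values + right
    raise ValueError("unknown padding mode: %r" % (mode,))


print(pad_1d([1, 2, 3, 4], 2, "reflect"))    # [3, 2, 1, 2, 3, 4, 3, 2]
print(pad_1d([1, 2, 3, 4], 2, "symmetric"))  # [2, 1, 1, 2, 3, 4, 4, 3]
print(expand_padding((1, 2)))                # (1, 2, 1, 2)
```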

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(padding, mode='constant', fill=0, tg=None)[source]#
__new__(**kwargs)#
call_bbox(img, bbox)[source]#
call_coor(img, coor)[source]#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class PadMultipleOf

class PadMultipleOf[source]#

Bases: TransformBase

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(multiple, residual=0, mode='constant', fill=0, tg=None)[source]#
__new__(**kwargs)#
call_feed_dict(feed_dict)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class RandomApply

class RandomApply[source]#

Bases: RandomApply

__call__(feed_dict=None, **kwargs)[source]#

Call self as a function.

__init__(transforms, p=0.5)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

ezcall(image=None, coor=None, bbox=None)#
forward(img)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Class RandomChoice

class RandomChoice[source]#

Bases: RandomChoice

__call__(feed_dict=None, **kwargs)[source]#

Call self as a function.

__init__(transforms, p=None)[source]#
__new__(**kwargs)#

Class RandomCrop

class RandomCrop[source]#

Bases: TransformBase

Crop the given image at a random location. If the image is a torch Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions, but if non-constant padding is used, the input is expected to have at most 2 leading dimensions.

Parameters:
  • size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

  • padding (int or sequence, optional) –

    Optional padding on each border of the image. Default is None. If a single int is provided this is used to pad all borders. If sequence of length 2 is provided this is the padding on left/right and top/bottom respectively. If a sequence of length 4 is provided this is the padding for the left, top, right and bottom borders respectively.

    Note

    In torchscript mode padding as single int is not supported, use a sequence of length 1: [padding, ].

  • pad_if_needed (boolean) – Whether to pad the image if it is smaller than the desired size, to avoid raising an exception. Since cropping is done after padding, the padding appears to be applied at a random offset.

  • fill (number or tuple) – Pixel fill value for constant fill. Default is 0. If a tuple of length 3, it is used to fill R, G, B channels respectively. This value is only used when the padding_mode is constant. Only number is supported for torch Tensor. Only int or tuple value is supported for PIL Image.

  • padding_mode (str) –

    Type of padding. Should be: constant, edge, reflect or symmetric. Default is constant.

    • constant: pads with a constant value, this value is specified with fill

    • edge: pads with the last value at the edge of the image. If the input is a 5D torch Tensor, the last 3 dimensions will be padded instead of the last 2

    • reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2]

    • symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]
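How the crop location is drawn, and why padding can look as if it were applied at a random offset, can be sketched as follows. Assumptions of this sketch: the helper name `random_crop_box` is hypothetical, offsets are drawn uniformly, and pad_if_needed pads both sides of a too-small edge by the missing amount (modeled on the usual torchvision behaviour, which this page does not spell out).

```python
import random


def random_crop_box(image_h, image_w, size, pad_if_needed=False, rng=random):
    """Return ((top, left, crop_h, crop_w), (padded_h, padded_w))."""
    crop_h, crop_w = (size, size) if isinstance(size, int) else tuple(size)
    pad_h = pad_w = 0
    if pad_if_needed:
        # Assumed rule: pad both sides of an edge by the missing amount.
        pad_h = max(crop_h - image_h, 0)
        pad_w = max(crop_w - image_w, 0)
    padded_h = image_h + 2 * pad_h
    padded_w = image_w + 2 * pad_w
    if crop_h > padded_h or crop_w > padded_w:
        raise ValueError("crop is larger than the image; try pad_if_needed=True")
    # The crop is placed uniformly at random inside the (padded) image, so the
    # original pixels end up at a random offset within the output.
    top = rng.randint(0, padded_h - crop_h)
    left = rng.randint(0, padded_w - crop_w)
    return (top, left, crop_h, crop_w), (padded_h, padded_w)


box, padded = random_crop_box(3, 3, 5, pad_if_needed=True, rng=random.Random(0))
print(padded)  # (7, 7)
```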

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(size, padding=0, pad_if_needed=False, tg=None)[source]#
__new__(**kwargs)#
call_feed_dict(feed_dict)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class RandomGrayscale

class RandomGrayscale[source]#

Bases: TransformFunctionBaseImageOnly

Randomly convert image to grayscale with a probability of p (default 0.1). If the image is a torch Tensor, it is expected to have […, 3, H, W] shape, where … means an arbitrary number of leading dimensions.

Parameters:

p (float) – probability that image should be converted to grayscale.

Returns:

Grayscale version of the input image with probability p and unchanged with probability (1-p).

  • If input image is 1 channel: grayscale version is 1 channel

  • If input image is 3 channel: grayscale version is 3 channel with r == g == b

Return type:

PIL Image or Tensor

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(p=0.1, tg=None)[source]#
__new__(**kwargs)#
call_bbox(img, bbox)#
call_coor(img, coor)#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class RandomHorizontalFlip

class RandomHorizontalFlip[source]#

Bases: TransformBase

Horizontally flip the given image randomly with a given probability. If the image is a torch Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

Parameters:

p (float) – probability of the image being flipped. Default value is 0.5
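A minimal sketch of the probabilistic flip, assuming the image is an [H][W] nested list and that a single uniform draw decides whether to flip. `random_hflip` is a hypothetical helper, and handling of coordinates or bounding boxes is not modeled here.

```python
import random


def random_hflip(image, p=0.5, rng=random):
    """Flip an [H][W] nested-list image left-to-right with probability p."""
    if rng.random() < p:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]


image = [[1, 2, 3],
         [4, 5, 6]]
print(random_hflip(image, p=1.0))  # [[3, 2, 1], [6, 5, 4]]
print(random_hflip(image, p=0.0))  # [[1, 2, 3], [4, 5, 6]]
```

A vertical flip follows the same pattern with the order of the rows reversed instead of the order of the values within each row.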

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(p=0.5, tg=None)[source]#
__new__(**kwargs)#
call_feed_dict(feed_dict)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class RandomOrder

class RandomOrder[source]#

Bases: RandomOrder

__call__(feed_dict=None, **kwargs)[source]#

Call self as a function.

__init__(transforms)#
__new__(**kwargs)#
ezcall(image=None, coor=None, bbox=None)#

Class RandomResizedCrop

class RandomResizedCrop[source]#

Bases: TransformBase

Crop a random portion of image and resize it to a given size.

If the image is a torch Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

A crop of the original image is made: the crop has a random area (H * W) and a random aspect ratio. This crop is finally resized to the given size. This is popularly used to train the Inception networks.

Parameters:
  • size (int or sequence) –

    expected output size of the crop, for each edge. If size is an int instead of sequence like (h, w), a square output size (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

    Note

    In torchscript mode size as single int is not supported, use a sequence of length 1: [size, ].

  • scale (tuple of float) – Specifies the lower and upper bounds for the random area of the crop, before resizing. The scale is defined with respect to the area of the original image.

  • ratio (tuple of float) – lower and upper bounds for the random aspect ratio of the crop, before resizing.

  • interpolation (InterpolationMode) – Desired interpolation enum defined by torchvision.transforms.InterpolationMode. Default is InterpolationMode.BILINEAR. If input is Tensor, only InterpolationMode.NEAREST, InterpolationMode.NEAREST_EXACT, InterpolationMode.BILINEAR and InterpolationMode.BICUBIC are supported. The corresponding Pillow integer constants, e.g. PIL.Image.BILINEAR are accepted as well.

  • antialias (bool, optional) –

    Whether to apply antialiasing. It only affects tensors with bilinear or bicubic modes and it is ignored otherwise: on PIL images, antialiasing is always applied on bilinear or bicubic modes; on other modes (for PIL images and tensors), antialiasing makes no sense and this parameter is ignored. Possible values are:

    • True (default): will apply antialiasing for bilinear or bicubic modes. Other modes aren’t affected. This is probably what you want to use.

    • False: will not apply antialiasing for tensors on any mode. PIL images are still antialiased on bilinear or bicubic modes, because PIL doesn’t support disabling antialiasing.

    • None: equivalent to False for tensors and True for PIL images. This value exists for legacy reasons and you probably don’t want to use it unless you really know what you are doing.

    The default value changed from None to True in v0.17, for the PIL and Tensor backends to be consistent.
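The roles of scale and ratio can be illustrated by sampling the crop window only, leaving out the final resize. The sketch assumes a log-uniform draw for the aspect ratio, a fixed number of retries, and a whole-image fallback; these details and the name `sample_resized_crop_box` are assumptions, not statements about the library's implementation.

```python
import math
import random


def sample_resized_crop_box(image_h, image_w, scale=(0.08, 1.0),
                            ratio=(3.0 / 4.0, 4.0 / 3.0), rng=random, attempts=10):
    """Sample a (top, left, crop_h, crop_w) window before resizing."""
    area = image_h * image_w
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        # Random area as a fraction of the original image area.
        target_area = area * rng.uniform(scale[0], scale[1])
        # Random aspect ratio (width / height), drawn log-uniformly (assumed).
        aspect = math.exp(rng.uniform(log_ratio[0], log_ratio[1]))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= image_w and 0 < crop_h <= image_h:
            top = rng.randint(0, image_h - crop_h)
            left = rng.randint(0, image_w - crop_w)
            return (top, left, crop_h, crop_w)
    # Fallback chosen for this sketch: keep the whole image.
    return (0, 0, image_h, image_w)


top, left, crop_h, crop_w = sample_resized_crop_box(224, 320, rng=random.Random(0))
print(0 <= top and top + crop_h <= 224 and 0 <= left and left + crop_w <= 320)  # True
```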

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(size, scale=(0.08, 1.0), ratio=(3. / 4., 4. / 3.), interpolation=Image.BILINEAR, tg=None)[source]#
__new__(**kwargs)#
call_feed_dict(feed_dict)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class RandomRotation

class RandomRotation[source]#

Bases: TransformBase

Rotate the image by angle. If the image is a torch Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

Parameters:
  • degrees (sequence or number) – Range of degrees to select from. If degrees is a number instead of sequence like (min, max), the range of degrees will be (-degrees, +degrees).

  • interpolation (InterpolationMode) – Desired interpolation enum defined by torchvision.transforms.InterpolationMode. Default is InterpolationMode.NEAREST. If input is Tensor, only InterpolationMode.NEAREST, InterpolationMode.BILINEAR are supported. The corresponding Pillow integer constants, e.g. PIL.Image.BILINEAR are accepted as well.

  • expand (bool, optional) – Optional expansion flag. If true, expands the output to make it large enough to hold the entire rotated image. If false or omitted, make the output image the same size as the input image. Note that the expand flag assumes rotation around the center and no translation.

  • center (sequence, optional) – Optional center of rotation, (x, y). Origin is the upper left corner. Default is the center of the image.

  • fill (sequence or number) – Pixel fill value for the area outside the rotated image. Default is 0. If given a number, the value is used for all bands respectively.
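The way degrees is interpreted can be sketched as a small range normalizer followed by a random draw. `normalize_degrees` and `sample_angle` are hypothetical helpers, and the uniform draw and the rejection of negative numbers are assumptions of this sketch.

```python
import random


def normalize_degrees(degrees):
    """Turn a number d into (-d, +d); pass a (min, max) pair through."""
    if isinstance(degrees, (int, float)):
        if degrees < 0:
            raise ValueError("a single number must be non-negative")
        return (-float(degrees), float(degrees))
    low, high = degrees
    return (float(low), float(high))


def sample_angle(degrees, rng=random):
    """Draw a rotation angle from the normalized range (assumed uniform)."""
    low, high = normalize_degrees(degrees)
    return rng.uniform(low, high)


print(normalize_degrees(30))        # (-30.0, 30.0)
print(normalize_degrees((0, 90)))   # (0.0, 90.0)
```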

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(degrees, resample=False, crop=False, expand=False, center=None, translate=None, tg=None)[source]#
__new__(**kwargs)#
call_feed_dict(feed_dict)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class RandomVerticalFlip

class RandomVerticalFlip[source]#

Bases: TransformBase

Vertically flip the given image randomly with a given probability. If the image is a torch Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

Parameters:

p (float) – probability of the image being flipped. Default value is 0.5

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(p=0.5, tg=None)[source]#
__new__(**kwargs)#
call_feed_dict(feed_dict)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class Resize

class Resize[source]#

Bases: TransformFunctionBase

Resize the input image to the given size. If the image is a torch Tensor, it is expected to have […, H, W] shape, where … means a maximum of two leading dimensions.

Parameters:
  • size (sequence or int) –

    Desired output size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, the smaller edge of the image will be matched to this number, i.e., if height > width, then the image will be rescaled to (size * height / width, size).

    Note

    In torchscript mode size as single int is not supported, use a sequence of length 1: [size, ].

  • interpolation (InterpolationMode) – Desired interpolation enum defined by torchvision.transforms.InterpolationMode. Default is InterpolationMode.BILINEAR. If input is Tensor, only InterpolationMode.NEAREST, InterpolationMode.NEAREST_EXACT, InterpolationMode.BILINEAR and InterpolationMode.BICUBIC are supported. The corresponding Pillow integer constants, e.g. PIL.Image.BILINEAR are accepted as well.

  • max_size (int, optional) – The maximum allowed for the longer edge of the resized image. If the longer edge of the image is greater than max_size after being resized according to size, size will be overruled so that the longer edge is equal to max_size. As a result, the smaller edge may be shorter than size. This is only supported if size is an int (or a sequence of length 1 in torchscript mode).

  • antialias (bool, optional) –

    Whether to apply antialiasing. It only affects tensors with bilinear or bicubic modes and it is ignored otherwise: on PIL images, antialiasing is always applied on bilinear or bicubic modes; on other modes (for PIL images and tensors), antialiasing makes no sense and this parameter is ignored. Possible values are:

    • True (default): will apply antialiasing for bilinear or bicubic modes. Other modes aren’t affected. This is probably what you want to use.

    • False: will not apply antialiasing for tensors on any mode. PIL images are still antialiased on bilinear or bicubic modes, because PIL doesn’t support disabling antialiasing.

    • None: equivalent to False for tensors and True for PIL images. This value exists for legacy reasons and you probably don’t want to use it unless you really know what you are doing.

    The default value changed from None to True in v0.17, for the PIL and Tensor backends to be consistent.
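The interaction between size and max_size is easiest to see as a computation of the output (height, width). The following is a sketch under assumptions: `resize_output_size` is a hypothetical helper, and truncating the scaled edge with int() is a choice made for this illustration.

```python
def resize_output_size(image_h, image_w, size, max_size=None):
    """Return the (height, width) an image would be resized to."""
    if not isinstance(size, int):
        if len(size) == 2:
            if max_size is not None:
                raise ValueError("max_size is only supported for a single size")
            return (size[0], size[1])
        size = size[0]  # sequence of length 1 behaves like an int
    short, long_edge = min(image_h, image_w), max(image_h, image_w)
    # Match the smaller edge to `size`, keeping the aspect ratio.
    new_short, new_long = size, int(size * long_edge / short)
    if max_size is not None and new_long > max_size:
        # The longer edge wins: shrink both so it equals max_size.
        new_short, new_long = int(max_size * new_short / new_long), max_size
    if image_w <= image_h:
        return (new_long, new_short)
    return (new_short, new_long)


print(resize_output_size(400, 200, 100))                # (200, 100)
print(resize_output_size(200, 400, 100))                # (100, 200)
print(resize_output_size(400, 200, 100, max_size=150))  # (150, 75)
```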

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(size, interpolation=Image.BILINEAR, tg=None)[source]#
__new__(**kwargs)#
call_bbox(img, bbox)[source]#
call_coor(img, coor)[source]#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class ResizeMultipleOf

class ResizeMultipleOf[source]#

Bases: TransformBase

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(multiple, residual=0, interpolation=Image.NEAREST, tg=None)[source]#
__new__(**kwargs)#
call_feed_dict(feed_dict)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class Rotate

class Rotate[source]#

Bases: TransformBase

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(angle, resample=False, crop=False, expand=False, center=None, translate=None, tg=None)[source]#
__new__(**kwargs)#
call_feed_dict(feed_dict)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class TenCrop

class TenCrop[source]#

Bases: TransformFunctionBase

Crop the given image into four corners and the central crop plus the flipped version of these (horizontal flipping is used by default). If the image is a torch Tensor, it is expected to have […, H, W] shape, where … means an arbitrary number of leading dimensions.

Note

This transform returns a tuple of images and there may be a mismatch in the number of inputs and targets your Dataset returns. See below for an example of how to deal with this.

Parameters:
  • size (sequence or int) – Desired output size of the crop. If size is an int instead of sequence like (h, w), a square crop (size, size) is made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

  • vertical_flip (bool) – Use vertical flipping instead of horizontal

Example

>>> transform = Compose([
...     TenCrop(size),  # this is a tuple of PIL Images
...     Lambda(lambda crops: torch.stack([PILToTensor()(crop) for crop in crops]))  # returns a 4D tensor
... ])
>>> # In your test loop you can do the following:
>>> input, target = batch  # input is a 5D tensor, target is 2D
>>> bs, ncrops, c, h, w = input.size()
>>> result = model(input.view(-1, c, h, w))  # fuse batch size and ncrops
>>> result_avg = result.view(bs, ncrops, -1).mean(1)  # average over crops

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(size, tg=None)[source]#
__new__(**kwargs)#
call_bbox(img, bbox)#
call_coor(img, coor)#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class ToPILImage

class ToPILImage[source]#

Bases: TransformFunctionBaseImageOnly

Convert a tensor or an ndarray to a PIL Image.

This transform does not support torchscript.

Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL Image while adjusting the value range depending on the mode.

Parameters:

mode (PIL.Image mode) –

color space and pixel depth of input data (optional). If mode is None (default) there are some assumptions made about the input data:

  • If the input has 4 channels, the mode is assumed to be RGBA.

  • If the input has 3 channels, the mode is assumed to be RGB.

  • If the input has 2 channels, the mode is assumed to be LA.

  • If the input has 1 channel, the mode is determined by the data type (e.g. int, float, short).

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(mode=None, tg=None)[source]#
__new__(**kwargs)#
call_bbox(img, bbox)#
call_coor(img, coor)#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class ToTensor

class ToTensor[source]#

Bases: TransformFunctionBase

Convert a PIL Image or ndarray to tensor and scale the values accordingly.

This transform does not support torchscript.

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8.

In the other cases, tensors are returned without scaling.

Note

Because the input image is scaled to [0.0, 1.0], this transformation should not be used when transforming target image masks. See the references for implementing the transforms for image masks.
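The layout change and value scaling can be sketched on nested lists, which also shows why integer mask labels would be distorted by this transform. `to_tensor_like` is a hypothetical helper that assumes 8-bit input values; it returns nested lists, not a torch tensor.

```python
def to_tensor_like(image_hwc):
    """Convert [H][W][C] values in 0..255 to [C][H][W] floats in [0.0, 1.0]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[i][j][c] / 255.0 for j in range(width)] for i in range(height)]
        for c in range(channels)
    ]


pixels = [[[0, 255, 51]]]  # a 1 x 1 image with 3 channels
print(to_tensor_like(pixels))  # [[[0.0]], [[1.0]], [[0.2]]]
```

A mask pixel with class label 3 would become 3 / 255 under the same scaling, which is why the note above advises against using this transform on target masks.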

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(tg=None)#
__new__(**kwargs)#
call_bbox(img, bbox)[source]#
call_coor(img, coor)[source]#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class TransformBase

class TransformBase[source]#

Bases: object

__call__(feed_dict=None, **kwargs)[source]#

Call self as a function.

__init__(tg=None)[source]#
__new__(**kwargs)#
call_feed_dict(feed_dict)[source]#
ezcall(image=None, coor=None, bbox=None)[source]#

Class TransformDataTypes

class TransformDataTypes[source]#

Bases: JacEnum

__new__(value)#
classmethod assert_valid(value)#

Assert that the value is a valid choice.

classmethod choice_names()#

Return the list of names of all possible choices.

classmethod choice_objs()#

Return the list of objects of all possible choices.

classmethod choice_values()#

Return the list of values of all possible choices.

classmethod from_string(value)#

Parameters:

value (str | JacEnum)

Return type:

JacEnum

classmethod is_valid(value)#

Check if the value is a valid choice.

classmethod type_name()#

Return the type name of the enum.

BBOX = 'bbox'#
COOR = 'coor'#
IMAGE = 'image'#

Class TransformFunctionBase

class TransformFunctionBase[source]#

Bases: TransformBase

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(tg=None)#
__new__(**kwargs)#
call_bbox(img, bbox)[source]#
call_coor(img, coor)[source]#
call_feed_dict(feed_dict)[source]#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#

Class TransformFunctionBaseImageOnly

class TransformFunctionBaseImageOnly[source]#

Bases: TransformFunctionBase

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(tg=None)#
__new__(**kwargs)#
call_bbox(img, bbox)[source]#
call_coor(img, coor)[source]#
call_feed_dict(feed_dict)#
call_image(img)#
ezcall(image=None, coor=None, bbox=None)#

Class TransformGuide

class TransformGuide[source]#

Bases: object

__init__(transform_guide)[source]#
__new__(**kwargs)#
as_default()[source]#
gen(feed_dict)[source]#
items()[source]#
keys()[source]#

Class VFlip

class VFlip[source]#

Bases: TransformFunctionBase

__call__(feed_dict=None, **kwargs)#

Call self as a function.

__init__(tg=None)#
__new__(**kwargs)#
call_bbox(img, bbox)[source]#
call_coor(img, coor)[source]#
call_feed_dict(feed_dict)#
call_image(img)[source]#
ezcall(image=None, coor=None, bbox=None)#