IMAGE SEGMENTATION
Check out the first part of the computer vision blog post. Welcome to Part 2!
Segmentation is an image processing technique used in scientific image analysis. It provides information about the regions of interest in an image. Image segmentation organizes the data within images or videos into meaningful categories. Because it works on individual pixels, it describes those regions at pixel-level precision.
What is an image segmentation task? An image is represented as an array of numbers, and we want a computer to understand what is in it. The first problem is classification, which is the problem posed by the ImageNet competition. But what if we also want to find where the object is? For example, with a surveillance camera we may want the computer to recognize where a person is walking and where all the cars are. This is a much more complicated problem.
For classification, you need to predict only one number: the label of the class. For detection, where you are trying to find out where the object is, you predict one number for the class and four numbers for a box around the object. The situation becomes even more complex when there are several objects, because you want to detect every one of them.
The most complicated version of this problem is image segmentation, where the task is to predict a class for every pixel in the image. It is a conversion from one array of numbers to another array of numbers, like another image in which every pixel denotes a class.
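The difference between the three tasks is easiest to see in the shape of their outputs. The snippet below is only an illustrative sketch using NumPy: the image size, the class index, and the (x_min, y_min, x_max, y_max) box layout are assumptions made for the example, not part of any particular dataset or library.

```python
import numpy as np

# Hypothetical 4x6 RGB image: an array of numbers (height, width, channels).
image = np.zeros((4, 6, 3), dtype=np.uint8)

# Classification: one number (a class index) for the whole image.
class_label = 2

# Detection: one class index plus four box numbers per object.
# The (x_min, y_min, x_max, y_max) layout is an assumed convention.
detection = {"label": 2, "box": (1, 1, 5, 3)}

# Segmentation: one class index per pixel, so the output is another
# array with the same height and width as the input image.
segmentation = np.zeros(image.shape[:2], dtype=np.int64)
segmentation[1, 2:4] = 2   # object pixels follow the object's shape,
segmentation[2, 1:5] = 2   # not a rectangle

print(segmentation.shape)
print(segmentation)
```

Classification returns a single value, detection returns five values per object, and segmentation returns as many values as the image has pixels.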
Types of image segmentation
1. Semantic segmentation
2. Instance segmentation
Semantic Segmentation
Semantic segmentation is a type of image segmentation where all pixels corresponding to a class are given the same pixel value. It differs from the three related tasks of image classification, object detection, and object tracking because it links each pixel in the image with a class label. These labels can cover many kinds of objects. It is essentially image classification at the pixel level.
Semantic segmentation labels every pixel in the image where an object is located. How many pixels there are depends on the resolution of the image, and each of them receives a label. Because the labeled regions follow the lines and edges of the object, the output gives a more detailed picture than object detection or tracking. It does not create bounding boxes but takes the exact shape of the object.
For example, pixels that belong to a box would be defined under the same
“Box” category.
Because it classifies every pixel, semantic segmentation recovers the exact shape of each object and assigns it to a category. There is one limitation: we cannot count how many objects of a single category are in the image, because they all share the same label. To overcome this limitation, we have instance segmentation.
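As a rough illustration of per-pixel classification, the sketch below turns a stack of per-pixel class scores into a label map by picking the highest-scoring class at each pixel. The scores and class names are made up for the example. In practice they would come from a trained model.

```python
import numpy as np

# Hypothetical per-pixel class scores, shape (num_classes, height, width).
class_names = ["background", "box"]
scores = np.array([
    [[0.9, 0.2, 0.8, 0.1],
     [0.7, 0.3, 0.9, 0.4]],   # scores for "background"
    [[0.1, 0.8, 0.2, 0.9],
     [0.3, 0.7, 0.1, 0.6]],   # scores for "box"
])

# Semantic segmentation: each pixel gets the class with the highest score.
label_map = scores.argmax(axis=0)

# Every "box" pixel has the value 1, even though the pixels form two
# separate columns, so the label map alone cannot tell how many boxes exist.
box_pixels = int((label_map == 1).sum())

print(label_map)
print(box_pixels)
```

The label map says which pixels are "box" but not which box each pixel belongs to, which is exactly the counting limitation described above.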
Applications of semantic segmentation:
· Autonomous driving (brake lights, localizing pedestrians, other vehicles, etc.)
· GeoSensing – land usage (forests, crops, roads, buildings, etc.)
· Facial segmentation
· Precision agriculture
· Virtual fitting rooms
· Medical imaging and diagnostics (locating tumors, planning a surgery, studying anatomy, measuring tissue volumes, virtual surgery simulation, etc.)
Instance Segmentation
Instance segmentation is another variation of image segmentation where all
pixels corresponding to each object share a unique pixel value.
Instance segmentation also labels the pixels of the image. The difference is that semantic segmentation only checks which category a pixel belongs to: for the human category, it assigns the same human label wherever it detects a human. Instance segmentation goes further and assigns the pixels of each object a unique ID and a unique label.
So it will detect one human and give that person a single label, person one, and label the other human person two. It differentiates the objects based on their edges and gives each one a different label and a unique ID.
So we can count every object in the image. In an image containing a bottle, cubes, and a cup, for example, we can tell that there are three cubes, one bottle, and one cup. This is the difference between semantic segmentation and instance segmentation.
For example, all pixels in the green cup share the same pixel value. This type of approach can be very useful when you need parameters for each object in your image.
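One simple way to see how unique IDs make counting possible is connected-component labeling on a binary mask of a single category. This is only a sketch of the idea, not how trained instance segmentation models work, and it only separates objects that do not touch each other. The function name `label_instances` and the example mask are hypothetical.

```python
from collections import deque

def label_instances(mask):
    """Give each 4-connected group of truthy pixels its own integer ID."""
    height, width = len(mask), len(mask[0])
    ids = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or ids[r][c]:
                continue  # background, or already assigned to an object
            next_id += 1
            ids[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over neighbouring pixels
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not ids[ny][nx]):
                        ids[ny][nx] = next_id
                        queue.append((ny, nx))
    return ids, next_id

# Semantic mask for one category ("cube"): 1 = cube pixel, 0 = anything else.
cube_mask = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
]
instance_ids, count = label_instances(cube_mask)

for row in instance_ids:
    print(row)
print("cubes found:", count)
```

In the output, all pixels of one object share one ID, so counting the objects reduces to counting the distinct IDs.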
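Architectures associated with instance segmentation:
· U-Net is a convolutional neural network developed for biomedical image segmentation. It predicts a class for each pixel, and its output is often post-processed to separate individual objects.
· Mask R-CNN (Mask Region-Based Convolutional Neural Network) predicts a separate mask for each detected object.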
How to segment images?
The approach you take for segmentation completely depends upon the
complexity of the image.
Segmenting low-complexity images
For a low-complexity image (single), the object can be separated from the background by applying simple histogram-based thresholding. The histogram can be used to find an appropriate threshold value that separates the object from the background.
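One common histogram-based way to pick the threshold is Otsu's method, which chooses the intensity that best separates the histogram into two groups. The source does not name a specific method, so treat the following as one possible sketch. It assumes an 8-bit grayscale image, and the function name `otsu_threshold` and the synthetic image are made up for the example.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the intensity that maximizes between-class variance."""
    hist, edges = np.histogram(image, bins=bins, range=(0, bins))
    prob = hist / hist.sum()
    levels = edges[:-1]                  # left edge of each histogram bin
    weight_bg = np.cumsum(prob)          # fraction of pixels <= level
    weight_fg = 1.0 - weight_bg          # fraction of pixels > level
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]
    numerator = (total_mean * weight_bg - cum_mean) ** 2
    denominator = weight_bg * weight_fg
    variance = np.zeros_like(numerator)
    valid = denominator > 1e-12          # skip thresholds with an empty class
    variance[valid] = numerator[valid] / denominator[valid]
    return levels[np.argmax(variance)]

# Synthetic image: dark background (20) with one bright square object (200).
image = np.full((8, 8), 20, dtype=np.uint8)
image[2:6, 2:6] = 200

threshold = otsu_threshold(image)
mask = image > threshold                 # True where the object is

print("threshold:", threshold)
print("object pixels:", int(mask.sum()))
```

This works well when the object and background intensities form two clearly separated peaks in the histogram. When they overlap, a single threshold is no longer enough.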
Segmenting medium-complexity images
As the complexity of the image (1-10) increases, machine learning-based approaches become more efficient. Similar pixel values in the object and the background make it very difficult for histogram-based thresholding to separate them, so features are extracted for each pixel instead. Traditional machine learning algorithms such as random forests or support vector machines often yield excellent results on these features even with limited training data, which makes them trainable on any workstation.
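The general pattern is to compute a few features per pixel, train a classifier on labeled pixels from one image, and apply it to new images. The sketch below shows that pattern with NumPy only. To avoid extra dependencies, a nearest-centroid classifier stands in for the random forest or SVM, and the three features (intensity, 3x3 local mean, gradient magnitude) are assumptions chosen for illustration. All function names are hypothetical.

```python
import numpy as np

def pixel_features(image):
    """Per-pixel features: intensity, 3x3 local mean, gradient magnitude."""
    img = image.astype(float)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    local_mean = sum(
        padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
    ) / 9.0
    grad_y, grad_x = np.gradient(img)
    grad_mag = np.hypot(grad_x, grad_y)
    return np.stack([img, local_mean, grad_mag], axis=-1).reshape(-1, 3)

def fit_centroids(features, labels):
    """Mean feature vector per class (stand-in for training a random forest)."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == k].mean(axis=0) for k in classes])
    return classes, centroids

def predict(features, classes, centroids):
    """Assign each pixel to the class whose centroid is nearest."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# One labeled training image: object (100) on the right half.
train = np.zeros((6, 6))
train[:, 3:] = 100
train_labels = (train > 50).astype(int).ravel()

classes, centroids = fit_centroids(pixel_features(train), train_labels)

# A new image with a different layout and a slightly different intensity.
test_image = np.zeros((6, 6))
test_image[3:, :] = 90
predicted_mask = predict(pixel_features(test_image), classes, centroids)
predicted_mask = predicted_mask.reshape(test_image.shape)

print(predicted_mask)
```

With a real random forest or SVM, `fit_centroids` and `predict` would be replaced by the library's training and prediction calls, while the feature extraction step stays the same.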
Segmenting high-complexity images
As the complexity of images (hundreds) increases, segmentation becomes a very challenging task that often cannot be achieved using traditional machine learning approaches. Deep learning has proven very successful at segmenting challenging images, but it requires hundreds or thousands of labeled images and may take a long time to train. Once trained, these models can be used in production to segment future images.
U-Net is an architecture that arranges convolutional filters in a contraction path, where the input image is progressively scaled down, and an expansion path, where the scaled-down information is upscaled back to the original image size.
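The shape of the U-Net data flow can be sketched without any learned filters. The snippet below only tracks how the image size changes along the contraction and expansion paths. It uses 2x2 max pooling and nearest-neighbour upsampling as assumed stand-ins for the real layers, and it is not a trainable network.

```python
import numpy as np

def downsample(x):
    """2x2 max pooling: halves height and width (one contraction step)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour upsampling: doubles height and width (one expansion step)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

image = np.arange(64, dtype=float).reshape(8, 8)
depth = 2  # number of scaling steps, chosen arbitrarily for the example

x = image
shapes = [x.shape]

# Contraction path: the input is progressively scaled down.
for _ in range(depth):
    x = downsample(x)
    shapes.append(x.shape)

# Expansion path: the scaled-down information is upscaled back.
for _ in range(depth):
    x = upsample(x)
    shapes.append(x.shape)

print(shapes)
```

The final array has the same height and width as the input, which is what lets the network output one prediction per input pixel.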

