First Post

Posted by Remi on Sun 24 September 2017

Image Features Extraction Package

package doc: https://rempic.github.io/Image-Features-Extraction/

Tutorial

This Python package allows fast extraction and classification of features from a set of images. The resulting data frame can be used as a training and testing set for a machine learning classifier.

This package was originally developed to extract measurements of single cell nuclei from microscopy images (see figure above). It can, however, be used to extract features from any set of images for a variety of applications. Shown below is a map of Boston used for city density and demographic models.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

import image_features_extraction.Images as fe


IMGS = fe.Images('../images/REMI')

for i in IMGS:
    print(i.file_name())
../images/REMI/REMI_8BIT.tif
../images/REMI/REMI_POINTS.tif
/Users/remi/anaconda/envs/remi_insight/lib/python3.5/site-packages/skimage/filters/thresholding.py:271: UserWarning: threshold_otsu is expected to work correctly only for grayscale images; image shape (772, 772, 3) looks like an RGB image
  warn(msg.format(image.shape))
../images/REMI/REMI_RGB.tif
In [2]:
IMG_POINTS = IMGS.item(1)
IMG_8BIT = IMGS.item(0)

#IMG_POINTS.set_image_intensity(IMG_8BIT)

fig, ax = plt.subplots(figsize=(20, 20))
ax.imshow(IMG_POINTS.get_image_segmentation(overlap_image=IMG_8BIT))
Out[2]:
In [3]:
vor = IMG_POINTS.Voronoi()

fig = plt.figure(figsize=(20,20))

plt.imshow(vor.get_voronoi_map(), cmap=plt.get_cmap('pink'))
Out[3]:
In [7]:
IMG_MIX = IMG_8BIT.get_image()/7 + vor.get_voronoi_map() + IMG_POINTS.get_image()/30

fig = plt.figure(figsize=(20,20))

plt.imshow(IMG_MIX, cmap=plt.get_cmap('pink'))
Out[7]:
In [10]:
features = vor.features(
    ['area', 'perimeter', 'centroid', 'minor_axis_length', 'major_axis_length',
     'eccentricity', 'extent', 'bbox_area', 'convex_area', 'equivalent_diameter',
     'euler_number', 'orientation', 'solidity'],
    prefix='remi_')

features.set_class_name('class')
features.set_class_value('REMI')

 
df = features.get_dataframe(include_class=True)
df
Out[10]:
id remi_area remi_perimeter remi_centroid remi_minor_axis_length remi_major_axis_length remi_eccentricity remi_extent remi_bbox_area remi_convex_area remi_equivalent_diameter remi_euler_number remi_orientation remi_solidity class
0 7 5017 315.320851 (102.314929241, 321.456049432) 53.285483 124.467392 0.903728 0.574027 595984 5135 79.923981 1 -1.170126 0.977020 REMI
1 8 4559 319.404112 (106.25948673, 458.345909191) 44.698597 134.193471 0.942895 0.453091 595984 4669 76.188576 1 1.066200 0.976440 REMI
2 6 3369 268.024387 (106.202137133, 416.063817156) 47.031187 99.470718 0.881162 0.597129 595984 3472 65.494611 1 1.191270 0.970334 REMI
3 9 4404 300.859956 (118.32493188, 279.348546776) 54.456838 113.061069 0.876359 0.448930 595984 4508 74.882221 1 -1.000866 0.976930 REMI
4 5 3382 249.012193 (110.14163217, 371.157303371) 50.513739 90.240628 0.828650 0.783596 595984 3481 65.620851 1 -1.438716 0.971560 REMI
5 10 4965 311.629509 (130.288016113, 494.767371601) 56.710644 118.580001 0.878225 0.442869 595984 5061 79.508706 1 0.913723 0.981031 REMI
6 11 3687 284.516811 (146.102522376, 250.030105777) 39.545521 123.602267 0.947437 0.402863 595984 3778 68.515941 1 -0.747492 0.975913 REMI
7 17 6345 368.351334 (160.676280536, 542.002679275) 63.933439 137.235393 0.884855 0.460617 595984 6465 89.881616 1 0.468657 0.981439 REMI
8 18 6400 364.617316 (164.06765625, 202.62140625) 59.199771 145.479623 0.913460 0.491212 595984 6507 90.270333 1 -0.537414 0.983556 REMI
9 12 2161 185.597980 (178.270245257, 365.258676539) 45.425786 62.659068 0.688783 0.832435 595984 2224 52.454463 1 -1.427274 0.971673 REMI
10 22 6630 382.048773 (203.160784314, 569.146757164) 57.310172 154.898618 0.929038 0.523201 595984 6767 91.878061 1 0.467105 0.979755 REMI
11 14 2135 185.254834 (179.778922717, 404.433255269) 46.410827 60.880669 0.647195 0.804143 595984 2196 52.137956 1 1.467360 0.972222 REMI
12 23 6146 378.184812 (209.301008786, 185.169053043) 57.428309 148.882027 0.922612 0.515647 595984 6297 88.460897 1 -0.354007 0.976020 REMI
13 15 2214 184.610173 (187.255194219, 323.86269196) 50.707903 56.819696 0.451175 0.719298 595984 2278 53.093807 1 -1.351082 0.971905 REMI
14 16 1980 172.610173 (190.641414141, 446.88030303) 49.668804 52.223298 0.308929 0.718954 595984 2030 50.209703 1 -0.752913 0.975369 REMI
15 19 1590 155.053824 (208.147798742, 280.552201258) 40.259068 51.163389 0.617116 0.739535 595984 1632 44.993898 1 0.677179 0.974265 REMI
16 20 1411 151.539105 (216.309000709, 483.535081502) 40.169383 45.745391 0.478463 0.652636 595984 1463 42.385623 1 -0.712045 0.964457 REMI
17 24 2433 190.811183 (231.675709001, 360.057131114) 53.709472 58.970166 0.412869 0.806698 595984 2500 55.657810 1 1.012723 0.973200 REMI
18 25 2171 180.740115 (231.05481345, 407.736066329) 48.309506 58.782339 0.569723 0.807664 595984 2216 52.575689 1 1.359703 0.979693 REMI
19 26 1883 171.296465 (235.445034519, 314.212426978) 46.990257 52.074882 0.430984 0.726466 595984 1939 48.964375 1 1.536153 0.971119 REMI
20 32 7803 386.894444 (260.585672177, 585.842240164) 78.835101 134.726483 0.810926 0.687368 595984 7959 99.674912 1 0.162532 0.980400 REMI
21 27 1816 168.710678 (239.348017621, 446.533039648) 42.460700 55.420353 0.642653 0.715524 595984 1869 48.085372 1 1.312223 0.971643 REMI
22 30 4734 331.752309 (252.136459654, 171.511406844) 45.937047 137.824876 0.942821 0.633990 595984 4870 77.637079 1 -0.131222 0.972074 REMI
23 28 1510 155.154329 (248.775496689, 275.822516556) 41.397597 47.798598 0.499899 0.733722 595984 1552 43.847368 1 -0.621955 0.972938 REMI
24 29 1423 151.752309 (254.441321152, 484.28601546) 41.794962 45.173235 0.379443 0.593659 595984 1469 42.565477 1 0.502687 0.968686 REMI
25 31 983 128.811183 (266.14242116, 245.489318413) 32.004738 41.549062 0.637696 0.632561 595984 1027 35.377881 1 1.210774 0.957157 REMI
26 33 1264 151.740115 (276.652689873, 508.600474684) 37.326609 48.542685 0.639316 0.642276 595984 1315 40.117014 1 1.353374 0.961217 REMI
27 34 1638 162.325902 (271.571428571, 375.664224664) 40.268472 53.803813 0.663212 0.746922 595984 1693 45.668002 1 0.113053 0.967513 REMI
28 35 1817 164.568542 (277.433681893, 327.026417171) 46.356782 50.783251 0.408326 0.788628 595984 1861 48.098610 1 0.369776 0.976357 REMI
29 36 1831 165.740115 (281.099945385, 419.177498635) 48.264502 48.954304 0.167281 0.795050 595984 1889 48.283554 1 1.528857 0.969296 REMI
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
129 136 359 74.870058 (523.944289694, 390.690807799) 16.844737 27.702384 0.793890 0.754202 595984 372 21.379733 1 0.157310 0.965054 REMI
130 139 487 90.882251 (530.780287474, 429.106776181) 23.271800 28.402330 0.573276 0.507292 595984 513 24.901158 1 -1.073237 0.949318 REMI
131 141 644 104.911688 (540.352484472, 412.729813665) 22.905307 37.793386 0.795414 0.609848 595984 677 28.635053 1 -0.862601 0.951256 REMI
132 142 582 97.355339 (537.295532646, 475.283505155) 25.744282 30.349715 0.529589 0.692857 595984 609 27.221782 1 1.058730 0.955665 REMI
133 147 875 121.740115 (544.010285714, 450.56) 33.801741 36.005557 0.344484 0.639620 595984 916 33.377906 1 0.385109 0.955240 REMI
134 144 678 102.183766 (540.315634218, 300.396755162) 27.494940 32.855965 0.547459 0.730603 595984 704 29.381225 1 0.142922 0.963068 REMI
135 140 581 94.568542 (540.944922547, 345.611015491) 25.002068 30.660996 0.578848 0.624731 595984 610 27.198386 1 0.823560 0.952459 REMI
136 143 438 81.698485 (540.116438356, 277.296803653) 22.089257 26.813238 0.566853 0.730000 595984 455 23.615226 1 -1.039348 0.962637 REMI
137 146 450 81.213203 (542.34, 367.995555556) 23.410854 25.129394 0.363453 0.815217 595984 467 23.936537 1 0.653222 0.963597 REMI
138 145 425 79.556349 (541.68, 390.063529412) 20.958509 26.582574 0.615124 0.839921 595984 436 23.262132 1 -0.666803 0.974771 REMI
139 148 4947 361.989899 (577.488578937, 557.063068526) 43.870035 155.188920 0.959212 0.407362 595984 5078 79.364451 1 -0.493103 0.974202 REMI
140 150 8062 383.504617 (588.337013148, 204.868518978) 75.787732 141.853993 0.845317 0.600924 595984 8190 101.315632 1 0.578247 0.984371 REMI
141 153 925 116.325902 (561.92972973, 327.732972973) 31.438696 38.826925 0.586825 0.741186 595984 950 34.318313 1 -1.333271 0.973684 REMI
142 151 476 85.698485 (561.949579832, 383.550420168) 22.756774 27.593644 0.565555 0.734568 595984 498 24.618327 1 1.525077 0.955823 REMI
143 155 531 94.627417 (563.645951036, 304.293785311) 24.630995 29.314437 0.542224 0.732414 595984 563 26.001735 1 -1.078984 0.943162 REMI
144 152 4206 333.801082 (599.39063243, 526.529957204) 40.351119 144.574651 0.960261 0.344134 595984 4295 73.179543 1 -0.659752 0.979278 REMI
145 154 652 101.497475 (564.855828221, 361.171779141) 24.650229 34.863496 0.707164 0.657258 595984 684 28.812362 1 -0.641924 0.953216 REMI
146 156 630 104.811183 (565.128571429, 405.015873016) 23.809112 35.848631 0.747593 0.561497 595984 663 28.322092 1 0.929507 0.950226 REMI
147 158 1748 264.717821 (591.360411899, 257.957665904) 20.387373 123.303669 0.986236 0.220791 595984 1821 47.176506 1 0.813957 0.959912 REMI
148 160 765 114.604076 (576.938562092, 421.047058824) 25.557045 40.856473 0.780198 0.597656 595984 800 31.209426 1 1.062704 0.956250 REMI
149 159 690 111.254834 (568.279710145, 445.471014493) 29.968076 32.520761 0.388364 0.616071 595984 722 29.640096 1 -0.384336 0.955679 REMI
150 161 4103 306.801082 (620.462344626, 496.452839386) 43.492688 128.705341 0.941173 0.377183 595984 4190 72.277949 1 -0.878144 0.979236 REMI
151 163 740 109.154329 (584.267567568, 350.686486486) 25.286671 38.462705 0.753513 0.645161 595984 768 30.695232 1 -0.362611 0.963542 REMI
152 162 4002 304.339141 (621.774362819, 266.011244378) 47.989397 117.702244 0.913108 0.404733 595984 4103 71.382804 1 0.819045 0.975384 REMI
153 164 899 115.840620 (589.906562848, 385.59621802) 29.419182 39.861597 0.674764 0.775000 595984 933 33.832563 1 -0.071340 0.963558 REMI
154 165 3984 296.149278 (637.5562249, 303.424949799) 44.924160 120.539852 0.927955 0.503030 595984 4091 71.222092 1 1.173731 0.973845 REMI
155 166 3948 316.859956 (641.641084093, 467.087639311) 39.489102 135.854186 0.956823 0.379397 595984 4001 70.899575 1 -1.050165 0.986753 REMI
156 168 4210 321.800036 (649.627553444, 431.752969121) 47.394800 123.897039 0.923942 0.469866 595984 4338 73.214332 1 -1.104397 0.970493 REMI
157 167 2800 253.403066 (637.225714286, 340.142857143) 37.354790 100.122458 0.927795 0.555556 595984 2904 59.708213 1 1.263229 0.964187 REMI
158 169 3728 260.568542 (644.581813305, 381.561963519) 59.919822 85.869324 0.716290 0.725998 595984 3831 68.895842 1 1.562278 0.973114 REMI

159 rows × 15 columns
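
Several of the columns above are derived from more basic measurements. The sketch below recomputes three of them for row 0 of the table. It assumes the column names follow the scikit-image `regionprops` definitions, which is an inference from the names rather than something stated in this post. The helper names are illustrative.

```python
import math

def equivalent_diameter(area):
    """Diameter of the circle that has the same area as the region."""
    return math.sqrt(4.0 * area / math.pi)

def eccentricity(minor_axis_length, major_axis_length):
    """Eccentricity of an ellipse with the given axis lengths (0 = circle)."""
    ratio = minor_axis_length / major_axis_length
    return math.sqrt(1.0 - ratio ** 2)

def solidity(area, convex_area):
    """Fraction of the convex hull that the region fills."""
    return area / convex_area

# Values copied from row 0 of the table above.
row = {"area": 5017, "convex_area": 5135,
       "minor_axis_length": 53.285483, "major_axis_length": 124.467392}

diam = equivalent_diameter(row["area"])
ecc = eccentricity(row["minor_axis_length"], row["major_axis_length"])
sol = solidity(row["area"], row["convex_area"])
print(round(diam, 4), round(ecc, 4), round(sol, 4))
```

The three results agree with the remi_equivalent_diameter, remi_eccentricity and remi_solidity entries of that row to four decimal places.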

In [108]:
import numpy as np

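# numeric_only=True skips the tuple-valued centroid column and the string class column
np.log(df.mean(numeric_only=True)).plot(kind='barh', figsize=(10,10))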
Out[108]:
In [22]:
i1
Out[22]:
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ..., 
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

Features extraction for spatial classification of images

The image below shows a possible workflow for image feature extraction: two sets of images with different classification labels are used to produce two data sets for training and testing a classifier.
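
The last step of that workflow can be sketched in plain Python. The rows below are made-up stand-ins for two extracted feature tables, and `label_rows` and `train_test_split` are hypothetical helpers, not functions of the package.

```python
import random

def label_rows(rows, class_value):
    """Return copies of the feature rows with a 'class' column added."""
    return [dict(row, **{"class": class_value}) for row in rows]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly and split them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical feature rows for two image sets with different labels.
set_a = label_rows([{"area": a} for a in (10, 12, 14, 16)], "A")
set_b = label_rows([{"area": a} for a in (50, 55, 60, 65)], "B")

train, test = train_test_split(set_a + set_b, test_fraction=0.25)
print(len(train), len(test))
```

Shuffling before the split keeps both labels likely to appear in each part; for small or unbalanced sets a per-class split would be a safer choice.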

An example of Collection-object and Iterator implementation
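
The cells in this post use `IMGS.item(i)` for indexed access and `for IMG in IMGS` for iteration. A minimal sketch of that collection-and-iterator pattern is shown here; `ImageItem` and `ImageCollection` are hypothetical classes written for illustration and are not the package's source.

```python
class ImageItem:
    """Hypothetical stand-in for a single image in the collection."""

    def __init__(self, path):
        self._path = path

    def file_name(self):
        return self._path


class ImageCollection:
    """Hypothetical collection offering indexed access and iteration."""

    def __init__(self, paths):
        # Sort so that item(i) is stable regardless of listing order.
        self._items = [ImageItem(p) for p in sorted(paths)]

    def __len__(self):
        return len(self._items)

    def item(self, index):
        return self._items[index]

    def __iter__(self):
        # Delegate to the list iterator so every for-loop starts fresh.
        return iter(self._items)


imgs = ImageCollection(["b.tif", "a.tif"])
names = [img.file_name() for img in imgs]
print(names, imgs.item(1).file_name())
```

Returning a new iterator from `__iter__` lets the same collection be looped over several times, which is how `IMGS` is used throughout this post.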

The object 'Image' includes the function Voronoi(), which returns a Voronoi object from my package Voronoi_Features. The Voronoi object can be used to measure the Voronoi tiles associated with each image region, and it provides more than 30 measurements. Below is an example of Voronoi diagrams from the image shown above.
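
To make the idea of a Voronoi map concrete, here is a brute-force sketch that labels every pixel with its nearest seed point. How Voronoi_Features actually builds its map is not described in this post, so the algorithm, the function names and the use of region centroids as seeds are all assumptions.

```python
def voronoi_map(height, width, seeds):
    """Label every pixel with the index of its nearest seed point.

    seeds is a list of (row, col) tuples; ties go to the lower index.
    """
    labels = []
    for r in range(height):
        row = []
        for c in range(width):
            distances = [(r - sr) ** 2 + (c - sc) ** 2 for sr, sc in seeds]
            row.append(distances.index(min(distances)))
        labels.append(row)
    return labels

def tile_areas(labels, n_seeds):
    """Count the pixels that fall in each Voronoi tile."""
    areas = [0] * n_seeds
    for row in labels:
        for label in row:
            areas[label] += 1
    return areas

seeds = [(1, 1), (1, 6)]          # e.g. two region centroids
vmap = voronoi_map(4, 8, seeds)
areas = tile_areas(vmap, len(seeds))
print(areas)
```

Once every pixel carries a tile label, the tiles can be measured like any other labeled region (area, perimeter, eccentricity and so on).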

Image features extraction for city density and demographic analysis modelling

Create the Images root object and load the images contained in the folder

In [10]:
%matplotlib inline
import matplotlib.pyplot as plt

import image_features_extraction.Images as fe


IMGS = fe.Images('../images/CITY')

IMG = IMGS.item(0)


print(IMG.file_name())


fig, ax = plt.subplots(figsize=(20, 20))

ax.imshow(IMGS.item(0).get_image_segmentation())
../images/CITY/Boston_Center.tif
Out[10]:
In [7]:
features = IMG.features(['label', 'area','perimeter', 'centroid', 'moments'])

df2 = features.get_dataframe()

df2.head()
Out[7]:
id label area perimeter centroid_x centroid_y moments
0 0 44 4 4.000000 2.500000 122.500000 [[4.0, 2.0, 2.0, 2.0], [2.0, 1.0, 1.0, 1.0], [...
1 1 45 6 5.207107 4.333333 3.833333 [[6.0, 8.0, 14.0, 26.0], [5.0, 8.0, 14.0, 26.0...
2 2 46 64 36.556349 7.718750 34.015625 [[64.0, 302.0, 1862.0, 13058.0], [385.0, 1857....
3 3 47 29 23.520815 6.517241 146.689655 [[29.0, 102.0, 476.0, 2580.0], [78.0, 305.0, 1...
4 4 48 165 62.355339 10.121212 460.951515 [[165.0, 1175.0, 10225.0, 99551.0], [1807.0, 1...
In [8]:
# SHOW THE FOUND CENTROIDS

fig, ax = plt.subplots(figsize=(20, 20))

plt.plot(df2.centroid_x,df2.centroid_y,'.r' )
Out[8]:
[]
In [9]:
h = plt.hist(df2.area,100)
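
The `moments` and centroid columns of `df2` are related: the centroid is the ratio of the first-order moments to the zeroth-order moment, shifted by the corner of the bounding box. The sketch below checks this for the region in row 1. Which moment entry belongs to which axis is inferred from the printed numbers, and the bounding-box corner (3, 3) is taken from the `bbox` column printed later in this post, so treat both as assumptions.

```python
def centroid_from_moments(moments, bbox_origin):
    """Centroid in image coordinates from raw moments measured in the bounding box.

    Assumes moments[0][0] is the area, moments[0][1] the first-order
    moment along the first image axis and moments[1][0] the one along
    the second axis, as the printed values suggest.
    """
    m00 = moments[0][0]
    local_first = moments[0][1] / m00
    local_second = moments[1][0] / m00
    return bbox_origin[0] + local_first, bbox_origin[1] + local_second

# Leading entries of the moments printed for the region in row 1 (area 6).
moments_row1 = [[6.0, 8.0], [5.0, 8.0]]

# Assumed top-left corner of that region's bounding box.
cx, cy = centroid_from_moments(moments_row1, (3, 3))
print(round(cx, 6), round(cy, 6))
```

The result matches the centroid_x and centroid_y values listed for that row (4.333333 and 3.833333).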

Image features extraction for cellular spatial analysis

Images show cell nuclei

In [31]:
%matplotlib inline
import matplotlib.pyplot as plt

import image_features_extraction.Images as fe

    
IMGS = fe.Images('../images/CA/1')


# the iterator at work ...
for IMG in IMGS:
    print(IMG.file_name())
    
../images/CA/1/ORG_8bit.tif
../images/CA/1/ORG_bin.tif
In [32]:
fig, ax = plt.subplots(figsize=(20, 20))

ax.imshow(IMGS.item(1).get_image_segmentation())
Out[32]:

An example of measurement and visualization of a property, e.g., area

In [33]:
IMG = IMGS.item(1)


REGS = IMG.regions()


areas = REGS.prop_values('area')


plt.plot(areas)
plt.ylabel('region area (px^2)')
Out[33]:
In [22]:
h = plt.hist(df2.area,100)
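
For readers who want to see what a region area is without the package, the sketch below labels the connected foreground regions of a small binary mask and counts their pixels. It uses 4-connectivity and a breadth-first flood fill for simplicity; the package may use a different connectivity rule, and `region_areas` is a hypothetical helper.

```python
from collections import deque

def region_areas(mask):
    """Return the pixel count of each 4-connected foreground region.

    mask is a list of rows containing 0 (background) or 1 (foreground).
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    areas = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas

mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
areas_found = region_areas(mask)
print(areas_found)
```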

VORONOI FEATURES

In [34]:
vor = IMG.Voronoi()
In [35]:
vor = IMG.Voronoi()
IMG_VOR = vor.get_voronoi_map()
fig = plt.figure(figsize=(20,20))
plt.imshow(IMG_VOR, cmap=plt.get_cmap('jet'))
Out[35]:
In [36]:
i1 = IMGS.item(0).get_image_segmentation()
i2 = vor.get_voronoi_map()
In [69]:
i3 = i1[:,:,0] + i2/1000
fig = plt.figure(figsize=(20,20))
plt.imshow(i3, cmap=plt.get_cmap('Reds'))
Out[69]:

Features from the image only

In [15]:
features1 = IMG.features(['area','perimeter','centroid','bbox', 'eccentricity'])
features1.get_dataframe().head()
Out[15]:
id area perimeter centroid_x centroid_y bbox eccentricity
0 0 4 4.000000 2.500000 122.500000 (2, 122, 4, 124) 0.000000
1 1 6 5.207107 4.333333 3.833333 (3, 3, 6, 6) 0.738294
2 2 64 36.556349 7.718750 34.015625 (3, 28, 14, 39) 0.410105
3 3 29 23.520815 6.517241 146.689655 (3, 144, 11, 151) 0.736301
4 4 165 62.355339 10.121212 460.951515 (3, 450, 19, 471) 0.718935

Features from the voronoi diagram only

In [14]:
features2 = vor.features(['area','perimeter','centroid','bbox', 'eccentricity'])
features2.get_dataframe().head()
Out[14]:
id voro_area voro_perimeter voro_centroid voro_bbox voro_eccentricity
0 24 314 71.112698 (13.9203821656, 407.257961783) (2, 395, 25, 416) 0.502220
1 33 365 78.526912 (18.2, 481.273972603) (2, 473, 32, 491) 0.861947
2 71 343 94.911688 (17.8717201166, 723.320699708) (3, 706, 30, 740) 0.955651
3 32 161 50.662951 (15.7701863354, 450.565217391) (5, 445, 24, 460) 0.738073
4 46 160 50.591883 (15.8625, 516.75) (5, 511, 24, 524) 0.782348

Merge features from the image + the voronoi diagram

In [18]:
features3 = features1.merge(features2, how_in='inner')
features3.get_dataframe().head()
Out[18]:
id area perimeter centroid_x centroid_y bbox eccentricity voro_area voro_perimeter voro_centroid voro_bbox voro_eccentricity
0 8 147 95.041631 18.843537 151.149660 (5, 146, 34, 157) 0.967212 257 67.355339 (22.2762645914, 152.482490272) (12, 143, 36, 162) 0.799861
1 15 485 279.260931 25.649485 170.092784 (8, 155, 40, 188) 0.618654 447 80.325902 (29.0604026846, 169.451901566) (17, 157, 42, 185) 0.558628
2 17 114 69.562446 20.061404 747.701754 (8, 739, 33, 753) 0.960308 73 31.798990 (20.1369863014, 748.931506849) (14, 744, 26, 754) 0.530465
3 18 106 48.556349 17.990566 119.075472 (9, 114, 28, 125) 0.810733 151 48.763456 (18.2185430464, 117.688741722) (10, 109, 25, 124) 0.756768
4 21 2 0.000000 9.500000 395.000000 (9, 395, 11, 396) 1.000000 63 33.349242 (10.0158730159, 392.698412698) (6, 387, 15, 400) 0.742086
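
The effect of an inner merge can be illustrated with plain dictionaries: only ids present in both tables survive, and the second table's columns receive a prefix. This is a sketch of the concept only; what `merge(..., how_in='inner')` does internally is assumed from its name, and `inner_merge` is a hypothetical helper.

```python
def inner_merge(left, right, prefix="voro_"):
    """Join two {id: {feature: value}} tables on the ids present in both."""
    merged = {}
    for region_id in sorted(left.keys() & right.keys()):
        row = dict(left[region_id])
        for name, value in right[region_id].items():
            row[prefix + name] = value
        merged[region_id] = row
    return merged

# Area values copied from the tables above; in this toy input,
# id 24 exists only on the Voronoi side.
image_features = {8: {"area": 147}, 15: {"area": 485}}
voronoi_features = {8: {"area": 257}, 15: {"area": 447}, 24: {"area": 314}}

merged = inner_merge(image_features, voronoi_features)
print(merged)
```

Id 24 is dropped because it has no match on the image side, which is the behaviour that distinguishes an inner join from an outer one.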

Add class name and value

In [23]:
features3.set_class_name('class')
features3.set_class_value('test_class_val')

features3.get_dataframe(include_class=True).head()
Out[23]:
id area perimeter centroid_x centroid_y bbox eccentricity voro_area voro_perimeter voro_centroid voro_bbox voro_eccentricity class
0 8 147 95.041631 18.843537 151.149660 (5, 146, 34, 157) 0.967212 257 67.355339 (22.2762645914, 152.482490272) (12, 143, 36, 162) 0.799861 test_class_val
1 15 485 279.260931 25.649485 170.092784 (8, 155, 40, 188) 0.618654 447 80.325902 (29.0604026846, 169.451901566) (17, 157, 42, 185) 0.558628 test_class_val
2 17 114 69.562446 20.061404 747.701754 (8, 739, 33, 753) 0.960308 73 31.798990 (20.1369863014, 748.931506849) (14, 744, 26, 754) 0.530465 test_class_val
3 18 106 48.556349 17.990566 119.075472 (9, 114, 28, 125) 0.810733 151 48.763456 (18.2185430464, 117.688741722) (10, 109, 25, 124) 0.756768 test_class_val
4 21 2 0.000000 9.500000 395.000000 (9, 395, 11, 396) 1.000000 63 33.349242 (10.0158730159, 392.698412698) (6, 387, 15, 400) 0.742086 test_class_val

To measure intensity from image regions

The example below shows how to associate a grayscale image with a binary one for intensity measurement. Internally, the package uses a very simple segmentation algorithm based on Otsu thresholding. The goal of the package is not to segment images but to measure their segmented features. The correct way to use this package is to provide pre-segmented binary images as input and, if intensity measurements are needed, to associate the original grayscale image with them.
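
For reference, Otsu thresholding picks the gray level that best separates the pixels into two classes by maximizing the between-class variance. The following is a minimal pure-Python sketch of that technique on a flattened toy image. It is not the package's code, and `otsu_threshold` is a name chosen for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0        # number of pixels with value <= t
    sum0 = 0      # sum of those pixel values
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue      # one class is empty, nothing to score
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor (counts, not weights).
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Flattened toy image: a dark background and a bright object.
pixels = [10] * 50 + [12] * 10 + [200] * 30 + [210] * 10
t = otsu_threshold(pixels)
binary = [1 if value > t else 0 for value in pixels]
print(t, sum(binary))
```

Every threshold between the two intensity groups scores the same, so the sketch returns the lowest of them; any of these values yields the same binary image.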

In [8]:
IMG = IMGS.item(1)

IMG.set_image_intensity(IMGS.item(0))

features = IMG.features(['label', 'area','perimeter', 'centroid', 'moments','mean_intensity'])

df = features.get_dataframe()

df.head()
Out[8]:
id label area perimeter centroid_x centroid_y moments mean_intensity
0 0 22 64 28.278175 5.468750 584.375000 [[64.0, 286.0, 1630.0, 10366.0], [280.0, 1223.... 170.078125
1 1 23 86 33.556349 6.418605 621.546512 [[86.0, 466.0, 3268.0, 25726.0], [391.0, 2067.... 139.127907
2 2 24 100 35.556349 5.720000 1290.330000 [[100.0, 472.0, 2988.0, 21442.0], [533.0, 2238... 99.360000
3 3 25 50 24.142136 5.600000 23.040000 [[50.0, 180.0, 846.0, 4458.0], [202.0, 699.0, ... 181.940000
4 4 26 80 31.556349 7.325000 99.462500 [[80.0, 426.0, 2894.0, 21846.0], [357.0, 1969.... 157.675000
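
Conceptually, a mean intensity value is the average of the grayscale pixels that lie under one labeled region of the binary image. The sketch below shows that computation on two tiny arrays. The data is made up and `mean_intensity_by_label` is a hypothetical helper, not part of the package.

```python
def mean_intensity_by_label(labels, gray):
    """Average the grayscale values under each non-zero label."""
    sums, counts = {}, {}
    for label_row, gray_row in zip(labels, gray):
        for label, value in zip(label_row, gray_row):
            if label == 0:        # 0 marks background
                continue
            sums[label] = sums.get(label, 0) + value
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

# Labeled regions from a binary image and the matching grayscale image.
labels = [[1, 1, 0],
          [0, 2, 2]]
gray = [[100, 120, 7],
        [9, 200, 220]]
means = mean_intensity_by_label(labels, gray)
print(means)
```

The two images must have the same shape and be aligned pixel for pixel, which is why the grayscale image is attached to the binary one before the features are requested.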

Plot area vs mean intensity

In [9]:
plt.plot(df.area, df.mean_intensity, '.b')
plt.xlabel('area')
plt.ylabel('mean_intensity')
Out[9]:

An example of how to save measured features

This package includes the class Features as a data management layer, which separates the business logic from the data layer and allows the data layer to scale easily.

In [10]:
import image_features_extraction.Images as fe

    
IMGS = fe.Images('../images/EDGE')

storage_name = '../images/DB1.csv'
class_value = 1

for IMG in IMGS:
    print(IMG.file_name())
    
    REGS = IMG.regions()
    
    FEATURES = REGS.features(['area','perimeter', 'extent', 'equivalent_diameter', 'eccentricity'], class_value=class_value)
    
    FEATURES.save(storage_name, type_storage='file', do_append=True)
    
    
    
../images/EDGE/ca_1.tif
../images/EDGE/ca_2.tif
../images/EDGE/ca_3.tif
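
The append-to-one-file pattern used in the loop above can be sketched with the standard `csv` module. This shows the general idea only: how `FEATURES.save` writes its file is not shown in this post, and `append_rows` and the sample rows are hypothetical.

```python
import csv
import os
import tempfile

def append_rows(path, fieldnames, rows):
    """Append feature rows to a CSV file, writing the header only once."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)

# Two hypothetical batches, as if produced by two images of the same class.
fields = ["area", "perimeter", "class"]
with tempfile.TemporaryDirectory() as folder:
    storage = os.path.join(folder, "DB1.csv")
    append_rows(storage, fields, [{"area": 4, "perimeter": 4.0, "class": 1}])
    append_rows(storage, fields, [{"area": 6, "perimeter": 5.2, "class": 1}])
    with open(storage, newline="") as handle:
        saved = list(csv.DictReader(handle))
print(len(saved), saved[0]["area"])
```

Writing the header only when the file is new or empty keeps repeated appends from scattering header lines through the data.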

Pytest: unit tests

In [11]:
!py.test
============================= test session starts ==============================
platform darwin -- Python 3.5.3, pytest-3.1.3, py-1.4.34, pluggy-0.4.0
rootdir: /Users/remi/Google Drive/INSIGHT PRJ/PRJ/Image-Features-Extraction, inifile:
collected 0 items 

========================= no tests ran in 0.01 seconds =========================
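
The session above reports `collected 0 items`, meaning pytest found no tests under the root directory. By default pytest collects functions whose names start with `test` from files named `test_*.py` or `*_test.py`. A hypothetical test file might look like the sketch below. The file name and the `bbox_extent` helper are illustrative and not part of the package; the input values are taken from the `area` and `bbox` columns shown earlier.

```python
# test_region_features.py  (hypothetical file name)

def bbox_extent(area, bbox):
    """Ratio of region area to the area of its bounding box.

    bbox is (min_row, min_col, max_row, max_col) with exclusive maxima.
    """
    min_row, min_col, max_row, max_col = bbox
    return area / ((max_row - min_row) * (max_col - min_col))

def test_extent_of_full_box_is_one():
    # A 2 x 2 region that fills its bounding box completely.
    assert bbox_extent(4, (2, 122, 4, 124)) == 1.0

def test_extent_is_a_fraction():
    # A 6-pixel region inside a 3 x 3 bounding box.
    assert 0.0 < bbox_extent(6, (3, 3, 6, 6)) <= 1.0
```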