Image Features Extraction Package¶

Package documentation: https://rempic.github.io/Image-Features-Extraction/¶

Tutorial¶

This Python package allows fast extraction of features from a set of images and labels them with a class value. The resulting data frame can be used as a training and test set for a machine learning classifier.

This package was originally developed to extract measurements of single cell nuclei from microscopy images (see figure above), but it can be used to extract features from any set of images for a variety of applications. Further below, it is applied to a map of Boston for city density and demographic models.
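To make the idea concrete, here is a minimal sketch of turning one binary image into rows of features plus a class column, written with NumPy only. It is not the package's code: the property names in the tables below and the scikit-image warning in the first cell suggest the package builds on scikit-image region properties, whereas this sketch labels 4-connected regions with a simple flood fill and measures only area and centroid. The functions `label_regions` and `feature_rows` are hypothetical.

```python
from collections import deque

import numpy as np


def label_regions(binary):
    """Label 4-connected foreground regions of a 2-D binary array (0 = background)."""
    labels = np.zeros(binary.shape, dtype=int)
    n_regions = 0
    n_rows, n_cols = binary.shape
    for r in range(n_rows):
        for c in range(n_cols):
            if binary[r, c] and labels[r, c] == 0:
                n_regions += 1
                labels[r, c] = n_regions
                queue = deque([(r, c)])
                # Breadth-first flood fill from this seed pixel
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = n_regions
                            queue.append((ny, nx))
    return labels, n_regions


def feature_rows(binary, class_value):
    """One dict per region: id, area, centroid (row, col) and the class value."""
    labels, n_regions = label_regions(binary)
    rows = []
    for region_id in range(1, n_regions + 1):
        ys, xs = np.nonzero(labels == region_id)
        rows.append({
            "id": region_id,
            "area": int(ys.size),
            "centroid_row": float(ys.mean()),
            "centroid_col": float(xs.mean()),
            "class": class_value,
        })
    return rows


# Toy binary image with two regions
image = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 1],
                  [0, 0, 0, 1]])
rows = feature_rows(image, "REMI")
for row in rows:
    print(row)
```

Each dict corresponds to one row of the data frames shown in this notebook; collecting rows from many images, each tagged with its class value, gives a table ready for a classifier.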

In [1]:

%matplotlib inline
import matplotlib.pyplot as plt

import image_features_extraction.Images as fe


IMGS = fe.Images('../images/REMI')

for i in IMGS:
    print(i.file_name())

../images/REMI/REMI_8BIT.tif
../images/REMI/REMI_POINTS.tif

/Users/remi/anaconda/envs/remi_insight/lib/python3.5/site-packages/skimage/filters/thresholding.py:271: UserWarning: threshold_otsu is expected to work correctly only for grayscale images; image shape (772, 772, 3) looks like an RGB image
  warn(msg.format(image.shape))

../images/REMI/REMI_RGB.tif

In [2]:

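# item(0) is REMI_8BIT.tif and item(1) is REMI_POINTS.tif (see the listing above)
IMG_POINTS = IMGS.item(1)
IMG_8BIT = IMGS.item(0)

#IMG_POINTS.set_image_intensity(IMG_8BIT)

# Segmentation of the points image, drawn with the 8-bit image passed as overlap_image
fig, ax = plt.subplots(figsize=(20, 20))
ax.imshow(IMG_POINTS.get_image_segmentation(overlap_image=IMG_8BIT))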

Out[2]:

In [3]:

vor = IMG_POINTS.Voronoi()

fig = plt.figure(figsize=(20,20))

plt.imshow(vor.get_voronoi_map(), cmap=plt.get_cmap('pink'))

Out[3]:

In [7]:

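# Blend the 8-bit image, the Voronoi map and the points image; the divisors only rescale each layer for display
IMG_MIX = IMG_8BIT.get_image()/7 + vor.get_voronoi_map() + IMG_POINTS.get_image()/30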

fig = plt.figure(figsize=(20,20))

plt.imshow(IMG_MIX, cmap=plt.get_cmap('pink'))

Out[7]:

In [10]:

features = vor.features(['area', 'perimeter', 'centroid', 'minor_axis_length',
                         'major_axis_length', 'eccentricity', 'extent', 'bbox_area',
                         'convex_area', 'equivalent_diameter', 'euler_number',
                         'orientation', 'solidity'],
                        prefix='remi_')

features.set_class_name('class')
features.set_class_value('REMI')

 
df = features.get_dataframe(include_class=True)
df

Out[10]:

	id	remi_area	remi_perimeter	remi_centroid	remi_minor_axis_length	remi_major_axis_length	remi_eccentricity	remi_extent	remi_bbox_area	remi_convex_area	remi_equivalent_diameter	remi_euler_number	remi_orientation	remi_solidity	class
0	7	5017	315.320851	(102.314929241, 321.456049432)	53.285483	124.467392	0.903728	0.574027	595984	5135	79.923981	1	-1.170126	0.977020	REMI
1	8	4559	319.404112	(106.25948673, 458.345909191)	44.698597	134.193471	0.942895	0.453091	595984	4669	76.188576	1	1.066200	0.976440	REMI
2	6	3369	268.024387	(106.202137133, 416.063817156)	47.031187	99.470718	0.881162	0.597129	595984	3472	65.494611	1	1.191270	0.970334	REMI
3	9	4404	300.859956	(118.32493188, 279.348546776)	54.456838	113.061069	0.876359	0.448930	595984	4508	74.882221	1	-1.000866	0.976930	REMI
4	5	3382	249.012193	(110.14163217, 371.157303371)	50.513739	90.240628	0.828650	0.783596	595984	3481	65.620851	1	-1.438716	0.971560	REMI
5	10	4965	311.629509	(130.288016113, 494.767371601)	56.710644	118.580001	0.878225	0.442869	595984	5061	79.508706	1	0.913723	0.981031	REMI
6	11	3687	284.516811	(146.102522376, 250.030105777)	39.545521	123.602267	0.947437	0.402863	595984	3778	68.515941	1	-0.747492	0.975913	REMI
7	17	6345	368.351334	(160.676280536, 542.002679275)	63.933439	137.235393	0.884855	0.460617	595984	6465	89.881616	1	0.468657	0.981439	REMI
8	18	6400	364.617316	(164.06765625, 202.62140625)	59.199771	145.479623	0.913460	0.491212	595984	6507	90.270333	1	-0.537414	0.983556	REMI
9	12	2161	185.597980	(178.270245257, 365.258676539)	45.425786	62.659068	0.688783	0.832435	595984	2224	52.454463	1	-1.427274	0.971673	REMI
10	22	6630	382.048773	(203.160784314, 569.146757164)	57.310172	154.898618	0.929038	0.523201	595984	6767	91.878061	1	0.467105	0.979755	REMI
11	14	2135	185.254834	(179.778922717, 404.433255269)	46.410827	60.880669	0.647195	0.804143	595984	2196	52.137956	1	1.467360	0.972222	REMI
12	23	6146	378.184812	(209.301008786, 185.169053043)	57.428309	148.882027	0.922612	0.515647	595984	6297	88.460897	1	-0.354007	0.976020	REMI
13	15	2214	184.610173	(187.255194219, 323.86269196)	50.707903	56.819696	0.451175	0.719298	595984	2278	53.093807	1	-1.351082	0.971905	REMI
14	16	1980	172.610173	(190.641414141, 446.88030303)	49.668804	52.223298	0.308929	0.718954	595984	2030	50.209703	1	-0.752913	0.975369	REMI
15	19	1590	155.053824	(208.147798742, 280.552201258)	40.259068	51.163389	0.617116	0.739535	595984	1632	44.993898	1	0.677179	0.974265	REMI
16	20	1411	151.539105	(216.309000709, 483.535081502)	40.169383	45.745391	0.478463	0.652636	595984	1463	42.385623	1	-0.712045	0.964457	REMI
17	24	2433	190.811183	(231.675709001, 360.057131114)	53.709472	58.970166	0.412869	0.806698	595984	2500	55.657810	1	1.012723	0.973200	REMI
18	25	2171	180.740115	(231.05481345, 407.736066329)	48.309506	58.782339	0.569723	0.807664	595984	2216	52.575689	1	1.359703	0.979693	REMI
19	26	1883	171.296465	(235.445034519, 314.212426978)	46.990257	52.074882	0.430984	0.726466	595984	1939	48.964375	1	1.536153	0.971119	REMI
20	32	7803	386.894444	(260.585672177, 585.842240164)	78.835101	134.726483	0.810926	0.687368	595984	7959	99.674912	1	0.162532	0.980400	REMI
21	27	1816	168.710678	(239.348017621, 446.533039648)	42.460700	55.420353	0.642653	0.715524	595984	1869	48.085372	1	1.312223	0.971643	REMI
22	30	4734	331.752309	(252.136459654, 171.511406844)	45.937047	137.824876	0.942821	0.633990	595984	4870	77.637079	1	-0.131222	0.972074	REMI
23	28	1510	155.154329	(248.775496689, 275.822516556)	41.397597	47.798598	0.499899	0.733722	595984	1552	43.847368	1	-0.621955	0.972938	REMI
24	29	1423	151.752309	(254.441321152, 484.28601546)	41.794962	45.173235	0.379443	0.593659	595984	1469	42.565477	1	0.502687	0.968686	REMI
25	31	983	128.811183	(266.14242116, 245.489318413)	32.004738	41.549062	0.637696	0.632561	595984	1027	35.377881	1	1.210774	0.957157	REMI
26	33	1264	151.740115	(276.652689873, 508.600474684)	37.326609	48.542685	0.639316	0.642276	595984	1315	40.117014	1	1.353374	0.961217	REMI
27	34	1638	162.325902	(271.571428571, 375.664224664)	40.268472	53.803813	0.663212	0.746922	595984	1693	45.668002	1	0.113053	0.967513	REMI
28	35	1817	164.568542	(277.433681893, 327.026417171)	46.356782	50.783251	0.408326	0.788628	595984	1861	48.098610	1	0.369776	0.976357	REMI
29	36	1831	165.740115	(281.099945385, 419.177498635)	48.264502	48.954304	0.167281	0.795050	595984	1889	48.283554	1	1.528857	0.969296	REMI
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
129	136	359	74.870058	(523.944289694, 390.690807799)	16.844737	27.702384	0.793890	0.754202	595984	372	21.379733	1	0.157310	0.965054	REMI
130	139	487	90.882251	(530.780287474, 429.106776181)	23.271800	28.402330	0.573276	0.507292	595984	513	24.901158	1	-1.073237	0.949318	REMI
131	141	644	104.911688	(540.352484472, 412.729813665)	22.905307	37.793386	0.795414	0.609848	595984	677	28.635053	1	-0.862601	0.951256	REMI
132	142	582	97.355339	(537.295532646, 475.283505155)	25.744282	30.349715	0.529589	0.692857	595984	609	27.221782	1	1.058730	0.955665	REMI
133	147	875	121.740115	(544.010285714, 450.56)	33.801741	36.005557	0.344484	0.639620	595984	916	33.377906	1	0.385109	0.955240	REMI
134	144	678	102.183766	(540.315634218, 300.396755162)	27.494940	32.855965	0.547459	0.730603	595984	704	29.381225	1	0.142922	0.963068	REMI
135	140	581	94.568542	(540.944922547, 345.611015491)	25.002068	30.660996	0.578848	0.624731	595984	610	27.198386	1	0.823560	0.952459	REMI
136	143	438	81.698485	(540.116438356, 277.296803653)	22.089257	26.813238	0.566853	0.730000	595984	455	23.615226	1	-1.039348	0.962637	REMI
137	146	450	81.213203	(542.34, 367.995555556)	23.410854	25.129394	0.363453	0.815217	595984	467	23.936537	1	0.653222	0.963597	REMI
138	145	425	79.556349	(541.68, 390.063529412)	20.958509	26.582574	0.615124	0.839921	595984	436	23.262132	1	-0.666803	0.974771	REMI
139	148	4947	361.989899	(577.488578937, 557.063068526)	43.870035	155.188920	0.959212	0.407362	595984	5078	79.364451	1	-0.493103	0.974202	REMI
140	150	8062	383.504617	(588.337013148, 204.868518978)	75.787732	141.853993	0.845317	0.600924	595984	8190	101.315632	1	0.578247	0.984371	REMI
141	153	925	116.325902	(561.92972973, 327.732972973)	31.438696	38.826925	0.586825	0.741186	595984	950	34.318313	1	-1.333271	0.973684	REMI
142	151	476	85.698485	(561.949579832, 383.550420168)	22.756774	27.593644	0.565555	0.734568	595984	498	24.618327	1	1.525077	0.955823	REMI
143	155	531	94.627417	(563.645951036, 304.293785311)	24.630995	29.314437	0.542224	0.732414	595984	563	26.001735	1	-1.078984	0.943162	REMI
144	152	4206	333.801082	(599.39063243, 526.529957204)	40.351119	144.574651	0.960261	0.344134	595984	4295	73.179543	1	-0.659752	0.979278	REMI
145	154	652	101.497475	(564.855828221, 361.171779141)	24.650229	34.863496	0.707164	0.657258	595984	684	28.812362	1	-0.641924	0.953216	REMI
146	156	630	104.811183	(565.128571429, 405.015873016)	23.809112	35.848631	0.747593	0.561497	595984	663	28.322092	1	0.929507	0.950226	REMI
147	158	1748	264.717821	(591.360411899, 257.957665904)	20.387373	123.303669	0.986236	0.220791	595984	1821	47.176506	1	0.813957	0.959912	REMI
148	160	765	114.604076	(576.938562092, 421.047058824)	25.557045	40.856473	0.780198	0.597656	595984	800	31.209426	1	1.062704	0.956250	REMI
149	159	690	111.254834	(568.279710145, 445.471014493)	29.968076	32.520761	0.388364	0.616071	595984	722	29.640096	1	-0.384336	0.955679	REMI
150	161	4103	306.801082	(620.462344626, 496.452839386)	43.492688	128.705341	0.941173	0.377183	595984	4190	72.277949	1	-0.878144	0.979236	REMI
151	163	740	109.154329	(584.267567568, 350.686486486)	25.286671	38.462705	0.753513	0.645161	595984	768	30.695232	1	-0.362611	0.963542	REMI
152	162	4002	304.339141	(621.774362819, 266.011244378)	47.989397	117.702244	0.913108	0.404733	595984	4103	71.382804	1	0.819045	0.975384	REMI
153	164	899	115.840620	(589.906562848, 385.59621802)	29.419182	39.861597	0.674764	0.775000	595984	933	33.832563	1	-0.071340	0.963558	REMI
154	165	3984	296.149278	(637.5562249, 303.424949799)	44.924160	120.539852	0.927955	0.503030	595984	4091	71.222092	1	1.173731	0.973845	REMI
155	166	3948	316.859956	(641.641084093, 467.087639311)	39.489102	135.854186	0.956823	0.379397	595984	4001	70.899575	1	-1.050165	0.986753	REMI
156	168	4210	321.800036	(649.627553444, 431.752969121)	47.394800	123.897039	0.923942	0.469866	595984	4338	73.214332	1	-1.104397	0.970493	REMI
157	167	2800	253.403066	(637.225714286, 340.142857143)	37.354790	100.122458	0.927795	0.555556	595984	2904	59.708213	1	1.263229	0.964187	REMI
158	169	3728	260.568542	(644.581813305, 381.561963519)	59.919822	85.869324	0.716290	0.725998	595984	3831	68.895842	1	1.562278	0.973114	REMI

159 rows × 15 columns

In [108]:

import numpy as np

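# numeric_only=True skips the 'class' (text) and 'remi_centroid' (tuple) columns;
# recent pandas versions raise a TypeError without it
np.log(df.mean(numeric_only=True)).plot(kind='barh', figsize=(10, 10))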

Out[108]:


Features extraction for spatial classification of images¶

The image below shows a possible workflow for image feature extraction: two sets of images with different class labels are used to produce two data sets for training and testing a classifier.
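A rough sketch of the final step of that workflow, assuming each image set has already been turned into a numeric feature matrix: stack the two matrices, attach their class values, shuffle, and split. The helper `train_test_split_rows` is hypothetical; in practice scikit-learn's `train_test_split` (optionally with `stratify=y`) does the same job.

```python
import numpy as np


def train_test_split_rows(features_a, features_b, class_a, class_b,
                          test_fraction=0.25, seed=0):
    """Stack two labelled feature sets, shuffle them, and split into train/test."""
    X = np.vstack([features_a, features_b])
    y = np.concatenate([np.full(len(features_a), class_a),
                        np.full(len(features_b), class_b)])
    # Shuffle rows and labels with the same permutation so they stay aligned
    order = np.random.default_rng(seed).permutation(len(X))
    X, y = X[order], y[order]
    n_test = int(round(test_fraction * len(X)))
    return X[n_test:], X[:n_test], y[n_test:], y[:n_test]


# Toy data: 6 rows from image set A (all ones) and 2 rows from set B (all zeros)
X_train, X_test, y_train, y_test = train_test_split_rows(
    np.ones((6, 2)), np.zeros((2, 2)), class_a=0, class_b=1)
print(X_train.shape, X_test.shape)
```

The split is not stratified, so with small or unbalanced sets one class can be under-represented in the test set.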

An example of Collection-object and Iterator implementation¶

The object 'Image' includes the function Voronoi(), which returns the Voronoi object of my package Voronoi_Features. The Voronoi object can be used to measure the Voronoi tessellation cells of each image region and provides more than 30 measurements. Below is an example of Voronoi diagrams from the image shown above.
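As a sketch of what a Voronoi map is, assuming each cell is seeded at a region centroid: every pixel is assigned to its nearest seed, so each Voronoi cell is the set of pixels closer to one region than to any other. This brute-force NumPy version is for illustration only and is not how Voronoi_Features is implemented (its internals are not described here); `voronoi_map` and `cell_areas` are hypothetical names. For real images, `scipy.spatial.Voronoi` or a distance-transform approach scales much better.

```python
import numpy as np


def voronoi_map(shape, seeds):
    """Label each pixel with the 1-based index of its nearest seed (row, col)."""
    rows, cols = np.indices(shape)
    seeds = np.asarray(seeds, dtype=float)
    # Squared distances from every pixel to every seed, shape (n_seeds, H, W)
    d2 = ((rows[None] - seeds[:, 0, None, None]) ** 2
          + (cols[None] - seeds[:, 1, None, None]) ** 2)
    # Ties go to the seed with the lower index (np.argmin returns the first minimum)
    return np.argmin(d2, axis=0) + 1


def cell_areas(vmap):
    """Pixel count of each Voronoi cell, keyed by cell label."""
    labels, counts = np.unique(vmap, return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))


vmap = voronoi_map((2, 4), seeds=[(0, 0), (0, 3)])
print(vmap)
print(cell_areas(vmap))
```

Cell area is only the simplest measurement; perimeter, eccentricity and the other properties can be computed on each labelled cell in the same way as on the original regions.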

Image features extraction for city density and demographic analysis modelling¶

Create the Images root object and load the images contained in the folder.
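The `Images` root object behaves as a collection that can be indexed with `item()` and walked with a `for` loop. A minimal sketch of that Collection/Iterator pattern follows; `ImageCollection` is a hypothetical name, it yields file paths rather than image objects, and it assumes alphabetical file order, which is consistent with the listings in this notebook (for example `REMI_8BIT.tif` is `item(0)` and `REMI_POINTS.tif` is `item(1)`).

```python
import tempfile
from pathlib import Path


class ImageCollection:
    """Hypothetical folder-backed collection supporting item() and iteration."""

    def __init__(self, folder, suffixes=(".tif", ".tiff")):
        self.folder = Path(folder)
        # Keep only image files, sorted by file name
        self._paths = sorted(p for p in self.folder.iterdir()
                             if p.is_file() and p.suffix.lower() in suffixes)

    def __len__(self):
        return len(self._paths)

    def item(self, index):
        """Return the index-th image path."""
        return self._paths[index]

    def __iter__(self):
        # A generator lets `for img in collection:` walk the items in order
        for index in range(len(self)):
            yield self.item(index)


# Usage with a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    for name in ("REMI_POINTS.tif", "REMI_8BIT.tif", "notes.txt"):
        (Path(tmp) / name).touch()
    images = ImageCollection(tmp)
    names = [path.name for path in images]
    first = images.item(0).name
    count = len(images)

print(names)
```

A fuller version would wrap each path in an image object that loads pixels lazily, so iterating over a large folder does not read every file up front.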

In [10]:

%matplotlib inline
import matplotlib.pyplot as plt

import image_features_extraction.Images as fe


IMGS = fe.Images('../images/CITY')

IMG = IMGS.item(0)


print(IMG.file_name())


fig, ax = plt.subplots(figsize=(20, 20))

ax.imshow(IMGS.item(0).get_image_segmentation())

../images/CITY/Boston_Center.tif

Out[10]:

In [7]:

features = IMG.features(['label', 'area','perimeter', 'centroid', 'moments'])

df2 = features.get_dataframe()

df2.head()

Out[7]:

	id	label	area	perimeter	centroid_x	centroid_y	moments
0	0	44	4	4.000000	2.500000	122.500000	[[4.0, 2.0, 2.0, 2.0], [2.0, 1.0, 1.0, 1.0], [...
1	1	45	6	5.207107	4.333333	3.833333	[[6.0, 8.0, 14.0, 26.0], [5.0, 8.0, 14.0, 26.0...
2	2	46	64	36.556349	7.718750	34.015625	[[64.0, 302.0, 1862.0, 13058.0], [385.0, 1857....
3	3	47	29	23.520815	6.517241	146.689655	[[29.0, 102.0, 476.0, 2580.0], [78.0, 305.0, 1...
4	4	48	165	62.355339	10.121212	460.951515	[[165.0, 1175.0, 10225.0, 99551.0], [1807.0, 1...

In [8]:

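# Show the detected region centroids.
# centroid_x holds the row and centroid_y the column (compare with the bbox values
# further below), so plot (y, x) and flip the y-axis to match the image orientation.
fig, ax = plt.subplots(figsize=(20, 20))

ax.plot(df2.centroid_y, df2.centroid_x, '.r')
ax.invert_yaxis()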

Out[8]:


In [9]:

h = plt.hist(df2.area,100)

Image features extraction for cellular spatial analysis¶

The images show cell nuclei.


In [31]:

%matplotlib inline
import matplotlib.pyplot as plt

import image_features_extraction.Images as fe

    
IMGS = fe.Images('../images/CA/1')


# the iterator at work ...
for IMG in IMGS:
    print(IMG.file_name())

../images/CA/1/ORG_8bit.tif
../images/CA/1/ORG_bin.tif

In [32]:

fig, ax = plt.subplots(figsize=(20, 20))

ax.imshow(IMGS.item(1).get_image_segmentation())

Out[32]:

An example of measurement and visualization of a property, e.g., area¶

In [33]:

IMG = IMGS.item(1)


REGS = IMG.regions()


areas = REGS.prop_values('area')


plt.plot(areas)
plt.ylabel('region area (px^2)')

Out[33]:

In [22]:

# Histogram of the region areas measured above (df2 belongs to the city example)
h = plt.hist(areas, bins=100)

VORONOI FEATURES¶

In [34]:

vor = IMG.Voronoi()

In [35]:

# vor was computed in the previous cell
IMG_VOR = vor.get_voronoi_map()
fig = plt.figure(figsize=(20,20))
plt.imshow(IMG_VOR, cmap=plt.get_cmap('jet'))

Out[35]:

In [36]:

i1 = IMGS.item(0).get_image_segmentation()
i2 = vor.get_voronoi_map()

In [69]:

# Overlay: first channel of the segmentation plus the Voronoi map scaled down by 1000
i3 = i1[:,:,0] + i2/1000
fig = plt.figure(figsize=(20, 20))
plt.imshow(i3, cmap=plt.get_cmap('Reds'))

Out[69]:

Features from the image only¶

In [15]:

features1 = IMG.features(['area','perimeter','centroid','bbox', 'eccentricity'])
features1.get_dataframe().head()

Out[15]:

	id	area	perimeter	centroid_x	centroid_y	bbox	eccentricity
0	0	4	4.000000	2.500000	122.500000	(2, 122, 4, 124)	0.000000
1	1	6	5.207107	4.333333	3.833333	(3, 3, 6, 6)	0.738294
2	2	64	36.556349	7.718750	34.015625	(3, 28, 14, 39)	0.410105
3	3	29	23.520815	6.517241	146.689655	(3, 144, 11, 151)	0.736301
4	4	165	62.355339	10.121212	460.951515	(3, 450, 19, 471)	0.718935

Features from the Voronoi diagram only¶

In [14]:

features2 = vor.features(['area','perimeter','centroid','bbox', 'eccentricity'])
features2.get_dataframe().head()

Out[14]:

	id	voro_area	voro_perimeter	voro_centroid	voro_bbox	voro_eccentricity
0	24	314	71.112698	(13.9203821656, 407.257961783)	(2, 395, 25, 416)	0.502220
1	33	365	78.526912	(18.2, 481.273972603)	(2, 473, 32, 491)	0.861947
2	71	343	94.911688	(17.8717201166, 723.320699708)	(3, 706, 30, 740)	0.955651
3	32	161	50.662951	(15.7701863354, 450.565217391)	(5, 445, 24, 460)	0.738073
4	46	160	50.591883	(15.8625, 516.75)	(5, 511, 24, 524)	0.782348

Merge features from the image and the Voronoi diagram¶

In [18]:

features3 = features1.merge(features2, how_in='inner')
features3.get_dataframe().head()

Out[18]:

	id	area	perimeter	centroid_x	centroid_y	bbox	eccentricity	voro_area	voro_perimeter	voro_centroid	voro_bbox	voro_eccentricity
0	8	147	95.041631	18.843537	151.149660	(5, 146, 34, 157)	0.967212	257	67.355339	(22.2762645914, 152.482490272)	(12, 143, 36, 162)	0.799861
1	15	485	279.260931	25.649485	170.092784	(8, 155, 40, 188)	0.618654	447	80.325902	(29.0604026846, 169.451901566)	(17, 157, 42, 185)	0.558628
2	17	114	69.562446	20.061404	747.701754	(8, 739, 33, 753)	0.960308	73	31.798990	(20.1369863014, 748.931506849)	(14, 744, 26, 754)	0.530465
3	18	106	48.556349	17.990566	119.075472	(9, 114, 28, 125)	0.810733	151	48.763456	(18.2185430464, 117.688741722)	(10, 109, 25, 124)	0.756768
4	21	2	0.000000	9.500000	395.000000	(9, 395, 11, 396)	1.000000	63	33.349242	(10.0158730159, 392.698412698)	(6, 387, 15, 400)	0.742086
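The output suggests that `merge(..., how_in='inner')` joins the two tables on `id` and keeps only the ids present in both, much like pandas `merge(how='inner')`. A plain-Python sketch of that semantics, using a hypothetical `inner_merge` helper and toy values:

```python
def inner_merge(left, right, key="id"):
    """Inner join of two lists of row dicts on `key` (keys in `right` assumed unique)."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Left columns first, then the right columns (the key is kept once)
            merged.append({**row, **{k: v for k, v in match.items() if k != key}})
    return merged


# Hypothetical toy rows: only id 2 appears in both tables
image_rows = [{"id": 1, "area": 10}, {"id": 2, "area": 20}]
voronoi_rows = [{"id": 2, "voro_area": 50}, {"id": 3, "voro_area": 60}]
print(inner_merge(image_rows, voronoi_rows))
```

The `voro_` prefix on the Voronoi columns is what keeps the two sets of column names from colliding when they are combined into one row.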

Add class name and value¶

In [23]:

features3.set_class_name('class')
features3.set_class_value('test_class_val')

features3.get_dataframe(include_class=True).head()

Out[23]:

	id	area	perimeter	centroid_x	centroid_y	bbox	eccentricity	voro_area	voro_perimeter	voro_centroid	voro_bbox	voro_eccentricity	class
0	8	147	95.041631	18.843537	151.149660	(5, 146, 34, 157)	0.967212	257	67.355339	(22.2762645914, 152.482490272)	(12, 143, 36, 162)	0.799861	test_class_val
1	15	485	279.260931	25.649485	170.092784	(8, 155, 40, 188)	0.618654	447	80.325902	(29.0604026846, 169.451901566)	(17, 157, 42, 185)	0.558628	test_class_val
2	17	114	69.562446	20.061404	747.701754	(8, 739, 33, 753)	0.960308	73	31.798990	(20.1369863014, 748.931506849)	(14, 744, 26, 754)	0.530465	test_class_val
3	18	106	48.556349	17.990566	119.075472	(9, 114, 28, 125)	0.810733	151	48.763456	(18.2185430464, 117.688741722)	(10, 109, 25, 124)	0.756768	test_class_val
4	21	2	0.000000	9.500000	395.000000	(9, 395, 11, 396)	1.000000	63	33.349242	(10.0158730159, 392.698412698)	(6, 387, 15, 400)	0.742086	test_class_val

Measuring intensity from image regions¶

The example below shows how to associate a grayscale image with a binary one for intensity measurement. Internally, the package uses a very simple segmentation algorithm based on Otsu thresholding to produce binary images. The goal of the package is not to segment images but to measure the features of already segmented images. The correct way to use it is to provide pre-segmented binary images as input and, if intensity measurements are needed, to associate the original grayscale image with them.
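A NumPy-only sketch of the two steps described above: Otsu's method picks the threshold that maximises the between-class variance of the grayscale histogram, and intensity is then averaged per region of the binary mask using a separate grayscale image. The package's own thresholding appears to come from scikit-image (see the `threshold_otsu` warning in the first cell); `otsu_threshold` and `mean_intensity` here are hypothetical helpers, and the toy example treats all foreground pixels as a single region.

```python
import numpy as np


def otsu_threshold(gray):
    """Otsu threshold t for a uint8 image; pixels > t are foreground."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean up to t
    mu_total = mu[-1]
    # Between-class variance for every candidate t; 0/0 at the ends becomes 0
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.argmax(np.nan_to_num(sigma_b2)))


def mean_intensity(labels, intensity):
    """Mean of `intensity` over each non-zero label of `labels`."""
    flat = labels.ravel()
    sums = np.bincount(flat, weights=intensity.ravel().astype(float))
    counts = np.bincount(flat)
    return {int(lab): float(sums[lab] / counts[lab])
            for lab in np.nonzero(counts)[0] if lab > 0}


# Toy data: segment one grayscale image, measure intensity from another
gray = np.array([[10, 10, 200],
                 [10, 200, 200]], dtype=np.uint8)
intensity = np.array([[0, 0, 90],
                      [0, 120, 150]])
t = otsu_threshold(gray)
binary = gray > t
labels = binary.astype(int)    # all foreground pixels treated as one region
print(t, mean_intensity(labels, intensity))
```

Keeping segmentation and intensity measurement separate is what lets a pre-segmented binary image be paired with the original grayscale image, as recommended above.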

In [8]:

IMG = IMGS.item(1)

IMG.set_image_intensity(IMGS.item(0))

features = IMG.features(['label', 'area','perimeter', 'centroid', 'moments','mean_intensity'])

df = features.get_dataframe()

df.head()

Out[8]:

	id	label	area	perimeter	centroid_x	centroid_y	moments	mean_intensity
0	0	22	64	28.278175	5.468750	584.375000	[[64.0, 286.0, 1630.0, 10366.0], [280.0, 1223....	170.078125
1	1	23	86	33.556349	6.418605	621.546512	[[86.0, 466.0, 3268.0, 25726.0], [391.0, 2067....	139.127907
2	2	24	100	35.556349	5.720000	1290.330000	[[100.0, 472.0, 2988.0, 21442.0], [533.0, 2238...	99.360000
3	3	25	50	24.142136	5.600000	23.040000	[[50.0, 180.0, 846.0, 4458.0], [202.0, 699.0, ...	181.940000
4	4	26	80	31.556349	7.325000	99.462500	[[80.0, 426.0, 2894.0, 21846.0], [357.0, 1969....	157.675000

Plot area vs mean intensity¶

In [9]:

plt.plot(df.area, df.mean_intensity, '.b')
plt.xlabel('area')
plt.ylabel('mean_intensity')

Out[9]:

An example of how to save measured features¶

The package includes the Features class as a data-management layer. It separates the business logic from the data layer, which makes the data layer easy to extend or scale.
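A minimal sketch of that separation, assuming a pluggable storage object: the code that measures features only calls `storage.save(rows)`, and the CSV back end decides how rows are written (here, appending and writing the header only for a new or empty file). `CsvStorage` and `save_features` are hypothetical; the package's own `save(..., type_storage='file', do_append=True)` may work differently.

```python
import csv
import tempfile
from pathlib import Path


class CsvStorage:
    """Hypothetical file back end that writes feature rows to a CSV file."""

    def __init__(self, path):
        self.path = Path(path)

    def save(self, rows, do_append=True):
        rows = list(rows)
        if not rows:
            return
        # Assumes every row has the same columns as the first one
        fieldnames = list(rows[0])
        has_content = self.path.exists() and self.path.stat().st_size > 0
        write_header = not (do_append and has_content)
        with self.path.open("a" if do_append else "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            if write_header:
                writer.writeheader()
            writer.writerows(rows)


def save_features(rows, storage):
    """Business-layer code: it only knows the storage interface, not the file format."""
    storage.save(rows, do_append=True)


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "DB1.csv"
    storage = CsvStorage(db)
    save_features([{"area": 10, "class": 1}], storage)   # e.g. first image
    save_features([{"area": 12, "class": 1}], storage)   # e.g. second image
    lines = db.read_text().splitlines()

print(lines)
```

Swapping in a database back end would only require another class with the same `save` method; `save_features` stays unchanged.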

In [10]:

import image_features_extraction.Images as fe

    
IMGS = fe.Images('../images/EDGE')

storage_name = '../images/DB1.csv'
class_value = 1

# Every image in the folder gets the same class value, and all rows are
# appended to the same CSV file
for IMG in IMGS:
    print(IMG.file_name())

    REGS = IMG.regions()

    FEATURES = REGS.features(['area', 'perimeter', 'extent', 'equivalent_diameter', 'eccentricity'], class_value=class_value)

    FEATURES.save(storage_name, type_storage='file', do_append=True)

../images/EDGE/ca_1.tif
../images/EDGE/ca_2.tif
../images/EDGE/ca_3.tif

Pytest: unit tests¶

In [11]:

!py.test

============================= test session starts ==============================
platform darwin -- Python 3.5.3, pytest-3.1.3, py-1.4.34, pluggy-0.4.0
rootdir: /Users/remi/Google Drive/INSIGHT PRJ/PRJ/Image-Features-Extraction, inifile:
collected 0 items 

========================= no tests ran in 0.01 seconds =========================
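`collected 0 items` means pytest found no tests. By default it collects files named `test_*.py` or `*_test.py` and, inside them, functions whose names start with `test`. A minimal hypothetical test module (for example `test_regions.py`) that would be collected is sketched below; it tests a small stand-alone NumPy helper, `region_area`, rather than the package's API.

```python
import numpy as np


def region_area(mask):
    """Hypothetical helper under test: number of foreground pixels in a mask."""
    return int(np.count_nonzero(mask))


def test_region_area_counts_foreground_pixels():
    mask = np.zeros((5, 5), dtype=bool)
    mask[1:3, 1:4] = True  # a 2 x 3 block of foreground pixels
    assert region_area(mask) == 6


def test_region_area_of_empty_mask_is_zero():
    assert region_area(np.zeros((3, 3), dtype=bool)) == 0
```

Saved under a matching file name in the project directory, running `!py.test` again should report two collected tests.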
