Link

Mockup / Illustrative

Descriptive Statistics - Housing Data

Preview of Dataset

Available at https://raw.githubusercontent.com/ageron/handson-ml2/master/datasets/housing/housing.tgz

Preview of First 10 Rows

Full dataset has 20640 rows, 20640 columns
longitudelatitudehousing_median_agetotal_roomstotal_bedroomspopulationhouseholdsmedian_incomemedian_house_valueocean_proximity
0-122.2337.8841.0880.0129.0322.0126.08.3252452600.0NEAR BAY
1-122.2237.8621.07099.01106.02401.01138.08.3014358500.0NEAR BAY
2-122.2437.8552.01467.0190.0496.0177.07.2574352100.0NEAR BAY
3-122.2537.8552.01274.0235.0558.0219.05.6431341300.0NEAR BAY
4-122.2537.8552.01627.0280.0565.0259.03.8462342200.0NEAR BAY
5-122.2537.8552.0919.0213.0413.0193.04.0368269700.0NEAR BAY
6-122.2537.8452.02535.0489.01094.0514.03.6591299200.0NEAR BAY
7-122.2537.8452.03104.0687.01157.0647.03.1200241400.0NEAR BAY
8-122.2637.8442.02555.0665.01206.0595.02.0804226700.0NEAR BAY
9-122.2537.8452.03549.0707.01551.0714.03.6912261100.0NEAR BAY
longitudelatitudehousing_median_agetotal_roomstotal_bedroomspopulationhouseholdsmedian_incomemedian_house_value
count20640.00000020640.00000020640.00000020640.00000020433.00000020640.00000020640.00000020640.00000020640.000000
mean-119.56970435.63186128.6394862635.763081537.8705531425.476744499.5396803.870671206855.816909
std2.0035322.13595212.5855582181.615252421.3850701132.462122382.3297531.899822115395.615874
min-124.35000032.5400001.0000002.0000001.0000003.0000001.0000000.49990014999.000000
25%-121.80000033.93000018.0000001447.750000296.000000787.000000280.0000002.563400119600.000000
50%-118.49000034.26000029.0000002127.000000435.0000001166.000000409.0000003.534800179700.000000
75%-118.01000037.71000037.0000003148.000000647.0000001725.000000605.0000004.743250264725.000000
max-114.31000041.95000052.00000039320.0000006445.00000035682.0000006082.00000015.000100500001.000000
<1H OCEAN     9136
INLAND        6551
NEAR OCEAN    2658
NEAR BAY      2290
ISLAND           5
Name: ocean_proximity, dtype: int64



Saving figure attribute_histogram_plots

png

The implementation of test_set_check() above works fine in both Python 2 and Python 3. In earlier releases, the following implementation was proposed, which supported any hash function, but was much slower and did not support Python 2:

If you want an implementation that supports any hash function and is compatible with both Python 2 and Python 3, here is one:

Median Income

<matplotlib.axes._subplots.AxesSubplot at 0x133e082b0>

png

Geographic Distribution

png

Correlation with median_house_value

median_house_value    1.000000
median_income         0.687160
total_rooms           0.135097
housing_median_age    0.114110
households            0.064506
total_bedrooms        0.047689
population           -0.026920
longitude            -0.047432
latitude             -0.142724
Name: median_house_value, dtype: float64

Scatterplot Matrix

Saving figure scatter_matrix_plot

png

Median House Value vs. Median Income

png

Scatterplot Rooms Per Household vs. Median House Value

png