- TL;DR Summary
- Python code is producing data in 1D arrays instead of the 2D arrays I need for linear regression.
Python:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
import pandas as pd
# Load CSV and columns
df = pd.read_csv(r"C:\Housing.csv")  # raw string so the backslash is not read as an escape
Y = df['price']
X = df['lotsize']
# Split the data into training/testing sets
X_train = X[:-250]
X_test = X[-250:]
# Split the targets into training/testing sets
Y_train = Y[:-250]
Y_test = Y[-250:]
# Plot outputs
plt.scatter(X_test, Y_test, color='black')
plt.title('Test Data')
plt.xlabel('Size')
plt.ylabel('Price')
plt.xticks(())
plt.yticks(())
plt.show()
Python:
regr = linear_model.LinearRegression()
X_train = X[:-250]
X_test = X[-250:]
# Split the targets into training/testing sets
Y_train = Y[:-250]
Y_test = Y[-250:]
# Train the model using the training sets
regr.fit(X_train, Y_train)
X_test.reshape(-1,1)
Y_test.reshape(-1,1)
ValueError: Expected 2D array, got 1D array instead:
array=[ 5850. 4000. 3060. 6650. 6360. 4160. 3880. 4160. 4800. 5500.
7200. 3000. 1700. 2880. 3600. 3185. 3300. 5200. 3450. 3986.
4785. 4510. 4000. 3934. 4960. 3000. 3800. 4960. 3000. 4500.
3500. 3500. 4000. 4500. 6360. 4500. 4032. 5170. 5400. 3150.
3745. 4520. 4640. 8580. 2000. 2160. 3040. 3090. 4960. 3350.
5300. 4100. 9166. 4040. 3630. 3620. 2400. 7260. 4400. 2400.
4120. 4750. 4280. 4820. 5500. 5500. 5040. 6000. 2500. 4095.
4095. 3150. 1836. 2475. 3210. 3180. 1650. 3180. 3180. 6360.
4240. 3240. 3650. 3240. 3780. 6480. 5850. 3150. 3000. 3090.
6060. 5900. 7420. 8500. 8050. 6800. 8250. 8250. 3500. 2835.
4500. 3300. 4320. 3500. 4992. 4600. 3720. 3680. 3000. 3750.
5076. 4500. 5000. 4260. 6540. 3700. 3760. 4000. 4300. 6840.
4400. 10500. 4400. 4840. 4120. 4260. 5960. 8800. 4560. 4600.
4840. 3850. 4900. 3850. 3760. 6000. 4370. 7700. 2990. 3750.
3000. 2650. 4500. 4500. 4500. 4500. 2175. 4500. 4800. 4600.
3450. 3000. 3600. 3600. 3750. 2610. 2953. 2747. 1905. 3968.
3162. 6000. 2910. 2135. 3120. 4075. 3410. 2800. 2684. 3100.
3630. 1950. 2430. 4320. 3036. 3630. 5400. 3420. 3180. 3660.
4410. 3990. 4340. 3510. 3420. 3420. 5495. 3480. 7424. 3460.
3630. 3630. 3480. 3460. 3180. 3635. 3960. 4350. 3930. 3570.
3600. 2520. 3480. 3180. 3290. 4000. 2325. 4350. 3540. 3960.
2640. 2700. 2700. 3180. 3500. 3630. 6000. 3150. 3792. 3510.
3120. 3000. 4200. 2817. 3240. 2800. 3816. 3185. 6321. 3650.
4700. 6615. 3850. 3970. 3000. 4352. 3630. 3600. 3000. 3000.
2787. 3000. 4770. 3649. 3970. 2910. 3480. 6615. 3500. 3450.
3450. 3520. 6930. 4600. 4360. 3450. 4410. 4600. 3640. 6000.
5400. 3640. 3640. 4040. 3640. 3640. 5640. 3600. 3600. 4632.
3640. 4900. 4510. 4100. 3640. 5680. 6300. 4000. 3960. 5960.
5830. 4500. 4100. 6750. 9000. 2550. 7152. 6450. 3360. 3264.
4000. 4000. 3069. 4040. 4040. 3185. 5900. 3120. 5450. 4040.
4080. 8080. 4040. 4080. 5800. 5885. 9667. 3420. 5800. 7600.
5400. 4995. 3000. 5500. 6450. 6210. 5000. 5000. 5828. 5200.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
# Plot outputs
plt.plot(X_test, regr.predict(X_test))
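For reference, here is a minimal sketch of the two usual ways to hand scikit-learn a 2D feature array, as the error message suggests. It assumes the problem is that `df['lotsize']` is a 1D pandas Series and that the result of `reshape` is never assigned back. The small DataFrame below is a made-up stand-in for Housing.csv (hypothetical values, not the real data), and the hold-out size is 2 rows instead of 250.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model

# Hypothetical stand-in for Housing.csv: made-up values, not the real data.
df = pd.DataFrame({
    "lotsize": [3000.0, 4000.0, 5000.0, 6000.0, 7000.0, 8000.0],
    "price": [40000.0, 50000.0, 60000.0, 70000.0, 80000.0, 90000.0],
})

# Option 1: double brackets select a one-column DataFrame, which is already 2D.
X = df[["lotsize"]]
Y = df["price"]  # a 1D target is accepted by LinearRegression

# Hold out the last 2 rows for testing (the post holds out 250).
X_train, X_test = X[:-2], X[-2:]
Y_train, Y_test = Y[:-2], Y[-2:]

regr = linear_model.LinearRegression()
regr.fit(X_train, Y_train)
predictions = regr.predict(X_test)

# Option 2: convert the 1D Series to a NumPy array and reshape it.
# reshape returns a new array, so the result has to be assigned.
X_alt = df["lotsize"].to_numpy().reshape(-1, 1)

print(X.shape, X_alt.shape, predictions.shape)
```

Either way, the 2D array has to exist before `regr.fit` is called, and the same 2D form has to be passed to `regr.predict`.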