November 29, 2023

Box Plots:

  • Logan Passengers: The median number of passengers at Logan International Airport is between 2.75 and 3.25 million per month. There is a significant spread in the data, with some months having as few as about 1.9 million passengers and others as many as about 4.1 million.
  • Logan Intl Flights: The median number of international flights at Logan International Airport is between 3,250 and 3,750 per month. The spread is similar to that of the passenger counts, with some months having as few as about 2,600 flights and others as many as about 5,260.
  • The numbers of passengers and international flights at Logan International Airport are relatively stable throughout the year, with only slight variations from month to month.
  • The data suggests that the tourism industry in the Boston area is relatively stable and predictable.
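
For reference, here is a minimal sketch of how the numbers behind a box plot (median, quartiles, whiskers, outliers) are usually computed. It assumes Tukey's convention of whiskers reaching at most 1.5 × IQR beyond the quartiles, which is also matplotlib's default. The helper name box_plot_stats and the toy input are illustrative only, not the Logan data.

```python
import numpy as np

def box_plot_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers using Tukey's 1.5 x IQR rule."""
    v = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(v, [25, 50, 75])  # linear interpolation by default
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = v[(v >= low_fence) & (v <= high_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # whiskers stop at the most extreme observations still inside the fences
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": v[(v < low_fence) | (v > high_fence)].tolist(),
    }

stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats["median"], stats["iqr"], stats["outliers"])
```

Because whiskers end at actual observations inside the fences, the whisker ends on the Logan box plots should lie within the observed monthly minimum and maximum.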

November 27, 2023

Histograms:

Histogram of Hotel Occupancy Rate:

  • The most common hotel occupancy rate in the Boston area is between 70% and 80%.
  • Hotel occupancy rates in the Boston area have been declining slightly in recent years, but remain relatively high.

Histogram of Hotel Avg Daily Rate:

  • The most common hotel average daily rate in the Boston area is between $225 and $275 per night.
  • Hotel average daily rates in the Boston area have been increasing steadily over time.
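
The "most common" range read off a histogram depends on where the bin edges fall. A minimal sketch, assuming NumPy and an illustrative helper named modal_bin, that reports the fullest bin for a chosen set of edges (the toy values are made up, not the Boston hotel data):

```python
import numpy as np

def modal_bin(values, bins):
    """Return (left_edge, right_edge, count) of the most populated histogram bin."""
    counts, edges = np.histogram(values, bins=bins)
    k = int(np.argmax(counts))  # ties resolve to the first (leftmost) bin
    return float(edges[k]), float(edges[k + 1]), int(counts[k])

# Toy occupancy rates (%) with 10-point bins
print(modal_bin([72, 75, 78, 65, 85, 79, 71], bins=[60, 70, 80, 90]))
```

Re-running with shifted edges is a quick way to check that a reading such as "70% to 80%" is not an artifact of the default binning.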

November 24, 2023

Histograms:

Histogram of Logan Passengers:

  • The most common number of passengers at Logan International Airport is between 2.5 and 3.5 million per month.
  • The number of passengers at Logan International Airport has been increasing steadily over time; the busiest month in the data has about 4.12 million passengers.

Histogram of Logan Intl Flights:

  • The most common number of international flights at Logan International Airport is between 3,000 and 4,000 per month.
  • The number of international flights at Logan International Airport has also been increasing steadily over time; the busiest month in the data has 5,260 international flights.
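
A histogram discards the order of the months, so a claim such as "increasing steadily over time" is easier to check on the monthly series itself. A minimal sketch, assuming the values are in chronological order with no missing months; the helper name monthly_trend and the toy series are hypothetical:

```python
import numpy as np

def monthly_trend(values):
    """Least-squares slope of a monthly series, in units per month."""
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y))  # 0, 1, 2, ... one step per month (assumes no gaps)
    slope, _intercept = np.polyfit(t, y, deg=1)
    return slope

print(monthly_trend([1, 3, 5, 7]))
```

A positive slope supports the upward-trend reading; a line plot of the same series shows whether the rise is steady or seasonal.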

November 22, 2023

Pair Plots for Relationship between Variables:

1: There is a positive correlation between logan_intl_flights and logan_passengers. This means that as the number of international flights at Logan International Airport increases, the number of passengers at the airport also tends to increase. This is likely because Logan is a major hub for both domestic and international flights.

2: There is a positive correlation between logan_passengers and hotel_avg_daily_rate. This means that as the number of passengers at Logan International Airport increases, the average daily rate of hotels in the area also tends to increase. This is likely because an increase in demand for hotel rooms drives up prices.

3: There is a positive correlation between logan_passengers and hotel_occup_rate. This means that as the number of passengers at Logan International Airport increases, the occupancy rate of hotels in the area also tends to increase. This likely has the same cause as the previous point: more arriving travelers increase the demand for hotel rooms, which fills more rooms as well as pushing up prices.

In addition to the above inferences, the scatter plots also reveal some interesting trends:

1: The relationship between logan_passengers and hotel_avg_daily_rate is stronger than the relationship between logan_passengers and hotel_occup_rate. This suggests that passenger volume is more closely associated with room prices than with occupancy: busier periods show up more clearly in what hotels charge than in how full they are.

2: The relationships among all three variables appear to be roughly linear, meaning that a change in one variable is associated with a roughly proportional change in the others.
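
"Stronger" in a scatter plot usually corresponds to a larger absolute Pearson correlation coefficient, which specifically measures linear association. A minimal NumPy sketch of the formula (covariance divided by the product of the standard deviations), with toy inputs rather than the Boston data:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Toy example: a perfectly linear pair and a noisier pair
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```

In pandas, DataFrame.corr() computes the same coefficient (Pearson is the default method) for every pair of columns at once.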

November 20, 2023

Correlation Matrix:

1: Logan International Airport (BOS) has the highest number of passengers and hotel occupancy rates. This is likely because BOS is a major hub for both domestic and international flights.

2: Hotel occupancy rates are generally higher than the number of passengers. This suggests that many people who travel to the United States are staying in hotels, even if they are not arriving or departing through BOS.

3: There is a positive correlation between the number of passengers and hotel occupancy rates. This means that as the number of passengers increases, hotel occupancy rates also tend to increase.

Trends:

1: The number of passengers at BOS has been increasing steadily over time. This suggests that the airport is becoming more popular with travelers.

2: Hotel occupancy rates in Boston have been declining slightly in recent years. This could be due to several factors, such as the rise of Airbnb and other home-sharing platforms.

3: The correlation between the number of passengers and hotel occupancy rates has been weakening in recent years. This suggests that other factors, such as the economy and the availability of alternative accommodations, are becoming more influential in determining hotel occupancy rates.
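
One way to check whether the passenger/occupancy correlation is weakening is to compute it separately for each year. A minimal pandas sketch; the column names logan_passengers and hotel_occup_rate come from these notes, while the year column name and the toy frame are hypothetical. With monthly data, each yearly value rests on only 12 points, so year-to-year changes will be noisy.

```python
import pandas as pd

def yearly_correlation(df, x="logan_passengers", y="hotel_occup_rate", by="year"):
    """Pearson correlation between x and y, computed separately within each year."""
    return df.groupby(by)[[x, y]].apply(lambda g: g[x].corr(g[y]))

toy = pd.DataFrame({
    "year": [2013] * 3 + [2014] * 3,
    "logan_passengers": [1, 2, 3, 1, 2, 3],
    "hotel_occup_rate": [10, 20, 30, 30, 20, 10],
})
print(yearly_correlation(toy))
```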

November 17, 2023

Decided to work on Hotel Market effects on Tourism:         

1: logan_passengers:

  • The mean number of Logan airport passengers is approximately 3,015,647 per month.
  • The standard deviation is around 549,276, indicating some variability in the number of passengers.
  • The minimum and maximum values are roughly 1,878,731 and 4,120,937, respectively.

2: logan_intl_flights:

  • The mean number of international flights is approximately 3,940.51 per month.
  • The standard deviation is approximately 694.48.
  • The minimum and maximum values are 2,587 and 5,260, respectively.

3: hotel_occup_rate:

  • The mean hotel occupancy rate is approximately 81.77%.
  • The standard deviation is about 10.86%.
  • The minimum and maximum values are 57.2% and 93.1%, respectively.

4: hotel_avg_daily_rate:

  • The mean hotel average daily rate is approximately $244.42.
  • The standard deviation is around $49.76.
  • The minimum and maximum values are $157.89 and $337.92, respectively.
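
Summaries like the four above are what pandas reports via describe() or agg(). A minimal sketch, assuming the data is loaded into a DataFrame whose column names match the ones in these notes (the toy frame is illustrative). Note that pandas uses the sample standard deviation (ddof=1), so NumPy's np.std with its default ddof=0 would give slightly smaller values.

```python
import pandas as pd

COLUMNS = ["logan_passengers", "logan_intl_flights",
           "hotel_occup_rate", "hotel_avg_daily_rate"]

def summarize(df, columns=COLUMNS):
    """Mean, sample standard deviation (ddof=1), min and max per column."""
    return df[columns].agg(["mean", "std", "min", "max"]).T

toy = pd.DataFrame({c: [1, 2, 3, 4] for c in COLUMNS})
print(summarize(toy))
```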

Interpretations:

1: Logan Passengers and International Flights:

  • The mean values provide a central tendency for the number of Logan airport passengers and international flights.
  • The standard deviations indicate the variability around these means.

2: Hotel Occupancy Rate:

  • The mean hotel occupancy rate of approximately 81.77% suggests a relatively high average occupancy.
  • The variability (standard deviation) of around 10.86% indicates some fluctuations in hotel occupancy.

3: Hotel Average Daily Rate:

  • The mean hotel average daily rate of approximately $244.42 provides an average pricing benchmark.
  • The standard deviation of $49.76 suggests some variability in hotel pricing.
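
Because these four variables are measured in different units (passengers, flights, percent, dollars), their standard deviations cannot be compared directly. The coefficient of variation (standard deviation divided by mean) puts them on a common, unitless scale. This sketch only reuses the means and standard deviations listed above:

```python
# Means and standard deviations copied from the summary statistics above
stats = {
    "logan_passengers": (3_015_647, 549_276),
    "logan_intl_flights": (3_940.51, 694.48),
    "hotel_occup_rate": (81.77, 10.86),
    "hotel_avg_daily_rate": (244.42, 49.76),
}

# Coefficient of variation = standard deviation / mean (unitless)
cv = {name: sd / mean for name, (mean, sd) in stats.items()}
for name, value in sorted(cv.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

By this measure, the average daily rate varies most relative to its mean (about 0.20) and the occupancy rate least (about 0.13), with passengers and international flights in between (about 0.18 each).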

November 15, 2023

Real Estate: Board Approved Development Projects (Pipeline):

  • pipeline_unit (Units): Approximately 468.95 units are approved on average for development projects.
  • The minimum is negative, which is probably a data anomaly, since a count of approved units cannot be below zero.
  • The average total development cost, or pipeline_total_dev_cost, is approximately $480,481,700.
  • The cost ranges from $0 at the minimum to $2,755,500,000 at the maximum.
  • sqft (pipeline, in square feet): The approved projects have an average square footage of about 992,537.
  • The range of the square footage is 0 to 4,714,445.
  • pipeline_const_jobs (Construction Jobs): For projects that are approved, the average number of construction jobs is roughly 801.73. 3,976 is the maximum, and 0 is the minimum.
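
To track down the negative minimum noted above, it helps to list the offending rows. A minimal pandas sketch; pipeline_unit and pipeline_const_jobs are column names from these notes, while the helper name and the toy values are hypothetical:

```python
import pandas as pd

def rows_with_negatives(df, columns):
    """Return rows where any of the given count/amount columns is below zero."""
    return df[(df[columns] < 0).any(axis=1)]

toy = pd.DataFrame({"pipeline_unit": [120, -15, 300],
                    "pipeline_const_jobs": [200, 50, 0]})
print(rows_with_negatives(toy, ["pipeline_unit", "pipeline_const_jobs"]))
```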

Real Estate Market: Housing:

  • foreclosure_pet (Foreclosure Petitions): On average, 13.23 foreclosure petitions are filed per month. The range is 0 to 69.
  • foreclosure_deeds: The mean quantity of foreclosure deeds is approximately 3.77.
  • Med_housing_price (Median Housing Sales Price): The range is 0 to 17. On average, the median sale price of a home is about $167,327.85. In some cases, the median price is reported as 0.
  • housing_sales_vol (Volume of Housing Sales): On average, approximately 269.61 houses are sold per month. The range is 0 to 2,508.
  • New Housing Construction Permits: The mean quantity of permits issued for new housing construction is approximately 132.89. The range is 0 to 897.
  • new-affordable_housing_permits (New Affordable Housing Unit Permits): The average number of permits for new affordable housing construction is approximately 23.13. The range is from 0 to 232.
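
Since some months report a median sale price of 0, the average of about $167,327.85 may be pulled down if those zeros actually mean "no data" rather than a real price (an assumption, not something the dataset states). A minimal pandas sketch that recomputes the mean with zeros treated as missing; the toy prices are illustrative:

```python
import numpy as np
import pandas as pd

def mean_excluding_zeros(series):
    """Treat 0 as 'not reported' and average only the remaining values."""
    return series.replace(0, np.nan).mean()  # pandas skips NaN by default

prices = pd.Series([0, 200_000, 100_000], name="Med_housing_price")
print(prices.mean(), mean_excluding_zeros(prices))
```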

November 13, 2023

The dataset appears to cover several topics related to Boston’s real estate, labor market, hotel industry, and tourism sector.
It offers a broad view of economic indicators in Boston, making it possible to examine patterns and connections across industries.

The data provided summarizes key statistics for each variable over 84 monthly observations in Boston.

Month and Year:

  • The information is available from 2013 to 2019.
  • The month variable averages 6.5, the midpoint of months 1 to 12, as expected when every month appears equally often.
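
The 84 observations correspond to 7 years × 12 months, and a month column that takes each value from 1 to 12 equally often averages (1 + 12) / 2 = 6.5. A minimal pure-Python sketch for checking that no (year, month) pair is missing; the helper name is hypothetical and the series below is constructed to be complete:

```python
def check_monthly_coverage(years, months, first_year=2013, last_year=2019):
    """Return the expected number of (year, month) pairs and any that are missing."""
    observed = set(zip(years, months))
    expected = {(y, m) for y in range(first_year, last_year + 1) for m in range(1, 13)}
    return len(expected), sorted(expected - observed)

years = [y for y in range(2013, 2020) for _ in range(12)]
months = [m for _ in range(2013, 2020) for m in range(1, 13)]
print(check_monthly_coverage(years, months))  # (84, []) for a complete series
print(sum(months) / len(months))              # mean month
```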

Travel:

  • logan_passengers (Passenger Traffic at Logan): On average, 3.02 million passengers travel through Logan Airport each month.
  • There is a minimum of roughly 1.88 million and a maximum of roughly 4.12 million.
  • logan_intl_flights (Logan International Flights): There are roughly 3,940.51 international flights per month on average.
  • 2,587 is the minimum and 5,260 is the maximum.

Hotel Market:

  • hotel_occup_rate (Occupancy Rate): 81.77% is the average hotel occupancy rate.
  • 93.1% is the highest rate, and 57.2% is the lowest.
  • hotel_avg_daily_rate (Average Daily Rate): The average cost of a hotel room is $244.42 per night.
  • $157.89 is the minimum rate and $337.92 is the maximum.

November 10, 2023

Topics Learnt Today:
Clustering methods for the project:
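Silhouette Scores, which serve as indicators of clustering quality, have been calculated for three clustering algorithms. KMedoids and KMeans were each run with five clusters, while DBSCAN determines the number of clusters itself from its eps and min_samples parameters. A detailed explanation of each result is below: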

1: KMedoids Clustering (n_clusters=5):

  • Silhouette Score: 0.37
  • Interpretation: The score of 0.37 suggests moderate cohesion and separation between clusters. On average, points are somewhat closer to the other members of their own cluster than to the members of the nearest neighboring cluster. There is some distinguishability between the clusters, but the separation is not exceptionally strong.

2: KMeans Clustering (n_clusters=5):

  • Silhouette Score: 0.44
  • Interpretation: The higher score of 0.44 indicates better cohesion and separation than KMedoids. On average, points are more clearly closer to their own cluster than to the nearest neighboring cluster, signifying a more distinct and well-defined clustering. The clusters are relatively well separated.

3: DBSCAN Clustering (eps=0.5, min_samples=5):

  • Silhouette Score: -1
  • Interpretation: The score of -1 is concerning. It suggests that DBSCAN may not be suitable for the given data and parameter settings: a strongly negative silhouette score means that points are, on average, closer to another cluster than to their own, so the algorithm struggles to define meaningful clusters with the specified parameters. A value of exactly -1 is also unusual in practice; it may indicate that DBSCAN labeled every point as noise or found only one cluster, in which case the silhouette score is undefined and -1 may be a placeholder rather than a computed score.

In summary, the Silhouette Scores provide insights into the performance of different clustering algorithms. KMeans exhibits the highest score (0.44), indicating more distinct and well-separated clusters compared to KMedoids and DBSCAN. The negative score for DBSCAN suggests challenges in forming meaningful clusters with the specified parameters, highlighting potential issues in the clustering process for this algorithm in the given context.
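
To make the interpretation above concrete, here is a from-scratch NumPy sketch of the silhouette coefficient. It is an illustration only, not the code that produced the scores above; in practice sklearn.metrics.silhouette_score is the usual tool, and it also rejects a labeling with only one cluster. For each point, a(i) is the mean distance to the other members of its own cluster and b(i) is the mean distance to the nearest other cluster.

```python
import numpy as np

def silhouette_values(X, labels):
    """Per-point silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        # Undefined with a single cluster (e.g. DBSCAN marking everything as noise, -1)
        raise ValueError("silhouette needs at least 2 clusters")
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: s(i) is defined as 0
        a = dist[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster (self excluded)
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        s[i] = (b - a) / max(a, b)
    return s

def silhouette_score(X, labels):
    return float(silhouette_values(X, labels).mean())

X = [[0.0], [1.0], [10.0], [11.0]]
print(silhouette_score(X, [0, 0, 1, 1]))  # two well-separated groups: close to 1
print(silhouette_score(X, [0, 1, 0, 1]))  # points grouped with far-away partners: negative
```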

November 8, 2023

Topics Learnt Today:

The provided boxplot illustrates the age distribution of individuals who were killed, categorized by their race, denoted by letters (A, W, H, B, O, N) likely corresponding to Asian, White, Hispanic, Black, Other, and Native American. Here’s a descriptive analysis of the boxplot:

Asian (A): The median age is approximately in the mid-30s, and there is a relatively symmetrical spread of ages within the interquartile range (IQR) from the mid-20s to mid-40s. Numerous outliers suggest a significant number of cases with ages deviating from the central tendency, spanning from young adults to those in their late 60s or early 70s.

White (W): The median age is similar to that of the Asian category, in the mid-30s, but the IQR has a broader spread from the early 20s to late 40s. Outliers indicate individuals outside the typical age range, both younger and notably older, with a cluster of older-age outliers.

Hispanic (H): The median age is slightly lower than that of Asian and White categories, potentially in the early 30s. The age distribution is compact, with an IQR similar to the Asian category. There are outliers on the higher age end, but fewer than in the White category.

Black (B): The median age for this group is also in the early 30s, with a tight IQR, indicating less variability in age within the quartiles compared to the White category. Outliers are present, indicating ages both much younger and older than the median.

Other (O): The median age in this category seems to be in the early 30s, with an IQR comparable to that of the Hispanic and Black categories. There are a few outliers, suggesting the presence of individuals significantly older than the median.

Native American (N): The median age for Native Americans is similar to that of the Other category, with an IQR slightly wider but comparable to other minority groups. Outliers indicate ages higher than the typical range.

Overall, the median ages across the races do not vary significantly, with most medians lying in the 30s. White individuals exhibit a broader age range with older-age outliers, whereas other racial categories have tighter age distributions with fewer outliers.
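
The comparisons above (similar medians, a broader IQR for one group) can be checked numerically rather than read off the plot. A minimal pandas sketch, assuming a DataFrame with hypothetical columns named race and age; the toy values are illustrative:

```python
import pandas as pd

def age_quartiles_by_group(df, group="race", value="age"):
    """Q1, median, Q3 and IQR of `value` within each group, widest IQR first."""
    q = df.groupby(group)[value].quantile([0.25, 0.5, 0.75]).unstack()
    q.columns = ["q1", "median", "q3"]
    q["iqr"] = q["q3"] - q["q1"]
    return q.sort_values("iqr", ascending=False)

toy = pd.DataFrame({"race": ["A"] * 5 + ["B"] * 5,
                    "age": [20, 30, 40, 50, 60, 30, 31, 32, 33, 34]})
print(age_quartiles_by_group(toy))
```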

November 6, 2023

Topics Learnt Today:
1: White People

The age distribution of White individuals in the dataset displays a moderately right-skewed pattern, featuring a median of 38.0 and a mean of 40.09. The data indicates a relatively widespread distribution, as illustrated by a standard deviation of 13.24 and a variance of 175.26, suggesting a considerable degree of variability in the ages of White individuals.

The positive skewness value of 0.52 provides further insight into the distribution’s characteristics. Skewness measures the asymmetry of a distribution, and in this context, a positive skewness of 0.52 indicates a tail on the right side. This means that a smaller number of comparatively old individuals stretch the distribution toward higher ages, pulling the mean (40.09) above the median (38.0).

Furthermore, the negative kurtosis value of -0.13 sheds light on the tails and overall shape of the distribution. Kurtosis measures the tail heaviness of a distribution, and a negative kurtosis of -0.13 suggests slightly lighter tails compared to a normal distribution. This suggests that the age distribution among White individuals has tails that are less pronounced, and the overall shape of the distribution is somewhat flatter at the peak compared to a normal distribution.
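
The kurtosis figures in these notes must be excess kurtosis (raw kurtosis is never negative), for which a normal distribution scores 0. A minimal sketch of the moment-based formulas; pandas' Series.skew() and Series.kurt() apply small-sample bias corrections, so they can differ slightly from these values, while scipy.stats.skew and scipy.stats.kurtosis use the uncorrected versions by default. The inputs are toy data:

```python
import numpy as np

def skew_and_excess_kurtosis(values):
    """Moment-based (biased) skewness and excess kurtosis; a normal distribution gives ~0 for both."""
    x = np.asarray(values, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)  # population variance
    m3 = np.mean(d ** 3)
    m4 = np.mean(d ** 4)
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # subtract 3 so the normal distribution scores 0
    return skew, excess_kurt

print(skew_and_excess_kurtosis([1, 2, 3, 4, 5]))  # symmetric and flat-topped
print(skew_and_excess_kurtosis([1, 1, 1, 10]))    # one large value: right tail
```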

2: Other Age Groups

The age distribution of individuals categorized as “Other” in the dataset is characterized by a median of 31.0 and a mean of 33.47. The standard deviation (11.48) and variance (131.83) suggest a moderate degree of variability in the dataset.

The positive skewness value of 0.63 provides additional information about the distribution’s shape. Skewness measures the asymmetry of a distribution, and in this case, a positive skewness of 0.63 indicates a right-skewed distribution with a tail on the right side. This means that a smaller number of comparatively old individuals in the “Other” category stretch the distribution toward higher ages, pulling the mean (33.47) above the median (31.0).

The negative kurtosis value of -0.23 gives insight into the tails and overall shape of the distribution. Kurtosis measures the tail heaviness of a distribution, and a negative kurtosis of -0.23 implies slightly lighter tails compared to a normal distribution. Additionally, the negative kurtosis suggests a flatter peak, indicating that the distribution among individuals categorized as “Other” is less concentrated around the mean compared to a normal distribution.
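
The per-group figures in these entries (median, mean, standard deviation, variance, skewness, kurtosis) can all be produced in one table. A minimal pandas sketch, assuming a DataFrame with hypothetical columns race and age; pandas' skew and kurt are bias-corrected sample estimates, which may differ slightly from uncorrected formulas. The toy values are illustrative:

```python
import pandas as pd

def age_profile_by_group(df, group="race", value="age"):
    """Median, mean, std, variance, skewness and excess kurtosis per group."""
    g = df.groupby(group)[value]
    return pd.DataFrame({
        "median": g.median(),
        "mean": g.mean(),
        "std": g.std(),   # sample standard deviation (ddof=1)
        "var": g.var(),
        "skew": g.skew(),  # bias-corrected sample skewness
        "kurt": g.apply(lambda s: s.kurt()),  # bias-corrected excess kurtosis
    })

toy = pd.DataFrame({"race": ["A"] * 5 + ["B"] * 5,
                    "age": [20, 30, 40, 50, 60, 30, 31, 32, 33, 100]})
print(age_profile_by_group(toy))
```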

November 3, 2023

Topics Learnt Today:
1: Hispanics

The age distribution of Hispanic individuals in the dataset demonstrates a moderately right-skewed pattern, as indicated by a median of 33.0 and a mean of 33.73. The data exhibits a relatively low level of dispersion, evident through a standard deviation of 10.59 and a variance of 112.13, suggesting that there is less variability in the distribution of ages among Hispanic individuals.

The positive skewness value of 0.77 adds more detail to the distribution. Skewness measures the asymmetry of a distribution, and in this context, a positive skewness of 0.77 indicates a tail on the right side. This means that a smaller number of comparatively old Hispanic individuals stretch the distribution toward higher ages, pulling the mean (33.73) above the median (33.0).

Furthermore, the positive kurtosis value of 0.69 provides insight into the tails and overall shape of the distribution. Kurtosis measures the tail heaviness of a distribution, and a positive kurtosis of 0.69 suggests slightly heavier tails compared to a normal distribution. This implies that the age distribution among Hispanic individuals has tails that are more pronounced, and the overall shape of the distribution is somewhat more concentrated around the mean.

2: Native Americans:

The age distribution of Native American individuals in the dataset is characterized by a moderately right-skewed pattern, with a median of 32.0 and a mean of 32.92. The data exhibits a relatively narrow spread, as evidenced by a standard deviation of 9.38 and a variance of 87.92, indicating a lesser degree of variability in the distribution of ages among Native American individuals.

The positive skewness value of 0.50 provides additional insight into the distribution. Skewness measures the asymmetry of a distribution, and in this context, a positive skewness of 0.50 suggests a tail on the right side. This means that a smaller number of comparatively old Native American individuals stretch the distribution toward higher ages, pulling the mean (32.92) above the median (32.0).

Moreover, the negative kurtosis value of -0.17 offers information about the tails and overall shape of the distribution. Kurtosis measures the tail heaviness of a distribution, and a negative kurtosis of -0.17 suggests slightly lighter tails compared to a normal distribution. This indicates that the age distribution among Native American individuals has tails that are less pronounced, and the overall shape of the distribution is somewhat flatter at the peak compared to a normal distribution.

November 1, 2023

Topics Learnt Today:

1: Asians:

The distribution of ages among Asians in the dataset is roughly symmetrical, as evidenced by a median of 35.0 and a mean of 36.48. The dataset displays a moderate level of dispersion, as indicated by a standard deviation of 12.21. The variance, calculated as 149, signifies a notable degree of variability in the age distribution. A skewness of 0.26 indicates a slight rightward tail, suggesting a minor asymmetry towards higher age values. Additionally, the kurtosis value of -0.79 implies that the distribution has lighter tails than a normal distribution, indicating a flatter, less peaked shape.

2: Blacks

The age distribution of Black individuals in the dataset exhibits a right-skewed pattern, as reflected by a median of 31.0 and a mean of 32.74. The dataset’s dispersion is of moderate extent, as evidenced by a standard deviation of 11.34 and a variance of 128.62, indicating variability in the distribution of ages among Black individuals.

The positive skewness value of 1.01 further characterizes the distribution. Skewness measures the asymmetry of a distribution. In this context, a positive skewness of 1.01 suggests a tail extending towards higher age values. This means that a smaller number of comparatively old Black individuals stretch the distribution toward higher ages, pulling the mean (32.74) above the median (31.0).

Moreover, the positive kurtosis value of 0.99 is indicative of the distribution having heavier tails and a more peaked shape compared to a normal distribution. Kurtosis measures the tail heaviness of a distribution. In this case, a positive kurtosis suggests that the tails of the age distribution among Black individuals are more pronounced than those in a normal distribution, and the overall shape of the distribution is more concentrated around the mean.