"
+ ]
+ },
+ "metadata": {
+ "tags": [],
+ "needs_background": "light"
+ }
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9b_rRq84ZwDR"
+ },
+ "source": [
+ "The distribution is imbalanced: a much higher volume of bond scores was received on days 4 and 5 than on any other day.\n",
+ "\n",
+ "This may be due to how the data was collected; perhaps there was a push to get participants to log a score on those days."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "FjGC1c3D7gO2",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 297
+ },
+ "outputId": "71f2876d-bba0-4db7-c9bb-3a70c64cfb8e"
+ },
+ "source": [
+ "survey_responses.describe()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " tenureDay waiBondSubscore\n",
+ "count 5311.000000 5311.000000\n",
+ "mean 6.460930 3.836613\n",
+ "std 3.892763 0.768943\n",
+ "min 4.000000 1.000000\n",
+ "25% 4.000000 3.250000\n",
+ "50% 5.000000 4.000000\n",
+ "75% 6.000000 4.500000\n",
+ "max 16.000000 5.000000"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 111
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xOd1U1jdZU9S"
+ },
+ "source": [
+ "The highest bond score is 5 and the lowest is 1.\n",
+ "\n",
+ "Let's check the mean bond score by tenure day:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "01if-0TI7gUT",
+ "outputId": "41b81f65-18ec-45fa-bc4f-83eb7e94aa45"
+ },
+ "source": [
+ "survey_responses.groupby(\"tenureDay\")[\"waiBondSubscore\"].mean()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tenureDay\n",
+ "4 3.812401\n",
+ "5 3.791287\n",
+ "6 3.755371\n",
+ "14 4.037109\n",
+ "15 4.046875\n",
+ "16 3.925725\n",
+ "Name: waiBondSubscore, dtype: float64"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 246
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uhYziXFdZlMu"
+ },
+ "source": [
+ "Initial analysis of the mean score by tenure day suggests that scores are higher later in the tenure period (about 3.93 to 4.05 on days 14 to 16) than on the earlier days (about 3.76 to 3.81 on days 4 to 6). Note, however, that the sample sizes for each day are very different, as seen above."
+ ]
+ },
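Since the daily means rest on very different numbers of responses, it can help to show each mean next to its sample size. The sketch below uses a small hypothetical `toy_responses` frame (invented values; only the column names mirror `survey_responses`). On the real data the same `groupby(...).agg(["mean", "count"])` call would apply.

```python
import pandas as pd

# Hypothetical toy data in the same long format as survey_responses:
# one row per bond score, keyed by user and tenure day.
toy_responses = pd.DataFrame({
    "userid": ["a", "b", "c", "a", "d"],
    "tenureDay": [4, 4, 5, 14, 15],
    "waiBondSubscore": [4.0, 3.0, 3.5, 4.5, 5.0],
})

# Mean bond score per tenure day, alongside the number of responses behind each mean
by_day = toy_responses.groupby("tenureDay")["waiBondSubscore"].agg(["mean", "count"])
print(by_day)
```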
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qHs0tsaSb-e3"
+ },
+ "source": [
+ "# Task 1 - Does the reported measure of bond change over time?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9R3oqKy3s2LJ"
+ },
+ "source": [
+ "To measure whether bond changes over time, I will do the following:\n",
+ "\n",
+ "\n",
+ "1. Pivot the survey_responses data set to see which participants submitted two bond scores\n",
+ "2. Add a column at the end of this pivot table holding the change in bond score, calculated by subtracting the first score from the second (where there are two)\n",
+ "3. Calculate the mean of the 'Change' metric to understand how bond score changes over time\n",
+ "4. Examine the distribution of users who submitted one bond score and those who submitted two\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "q10f77SilPtn"
+ },
+ "source": [
+ "The survey data is in long format (one row per bond score), so it can be pivoted into wide format, with one row per user and one column per tenure day."
+ ]
+ },
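A minimal sketch of the long-to-wide pivot on hypothetical toy data (the `toy_responses` values are made up; only the column names mirror `survey_responses`):

```python
import pandas as pd

# Hypothetical toy long-format data: one row per bond score
toy_responses = pd.DataFrame({
    "userid": ["a", "b", "a", "c"],
    "tenureDay": [4, 5, 14, 6],
    "waiBondSubscore": [4.25, 3.75, 4.0, 3.5],
})

# Wide format: one row per user, one column per tenure day; days without a score become NaN
wide = toy_responses.pivot_table(values="waiBondSubscore", index="userid", columns="tenureDay")
print(wide)

# Number of scores per user, read from the wide table
print(wide.notna().sum(axis=1))
```

`pivot_table` aggregates with the mean by default, so if a user ever logged two scores on the same tenure day they would be silently averaged into one cell. `DataFrame.pivot` raises an error on such duplicates instead, which can serve as a data check.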
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "l52ib8amAQGl"
+ },
+ "source": [
+ "# Pivot the dataframe to look at both scores for a given user, where applicable:\n",
+ "pivot_survey = survey_responses.pivot_table(values='waiBondSubscore', index='userid', columns='tenureDay')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 235
+ },
+ "id": "ehPjtcZJBpks",
+ "outputId": "341e0731-777d-44e9-c7f0-c861671c9e5a"
+ },
+ "source": [
+ "pivot_survey.head(5)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tenureDay 4 5 6 14 15 16\n",
+ "userid \n",
+ "+/wAfc2I0c831C21wvh2Kcr4DZk= 4.25 NaN NaN 4.0 NaN NaN\n",
+ "+0NdvBGRsXuoa20PHou4K3FMlBA= NaN 3.75 NaN NaN NaN NaN\n",
+ "+0eFEPPuFJm9U5lXwlAKw/I+Clo= NaN NaN 3.75 NaN NaN NaN\n",
+ "+0pfU/ormz318pPBTZ6cWtrgHkI= NaN 5.00 NaN NaN 5.0 NaN\n",
+ "+11s2fkg+oFKje/WvOYnzxbYgtY= 4.50 NaN NaN NaN NaN NaN"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 179
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9HEW3PQVma1V"
+ },
+ "source": [
+ "Create a new column, 'Change', to track the change in bond score between each user's two tenure days."
+ ]
+ },
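The loop that follows subtracts the second score from the first and relies on a later sign flip. As an alternative sketch (not the notebook's own method), the change can be computed directly as second minus first with `DataFrame.apply`, which keeps `userid` as the index and avoids re-attaching it afterwards. The `wide` frame is hypothetical toy data shaped like `pivot_survey`, and users without exactly two scores get NaN, an assumption in line with the loop's comment that there are only two timepoints.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table shaped like pivot_survey (users x tenure days, days in ascending order)
wide = pd.DataFrame(
    {4: [4.25, np.nan, 4.5], 5: [np.nan, 5.0, np.nan], 14: [4.0, np.nan, np.nan], 15: [np.nan, 5.0, np.nan]},
    index=pd.Index(["u1", "u2", "u3"], name="userid"),
)

def score_change(row: pd.Series) -> float:
    """Second score minus first score; NaN unless the user has exactly two scores."""
    scores = row.dropna()  # remaining values stay in tenure-day order
    if len(scores) != 2:
        return np.nan
    return scores.iloc[1] - scores.iloc[0]

# Row-wise apply keeps the userid index, so no reset/re-attach step is needed
wide["Change"] = wide.apply(score_change, axis=1)
print(wide)
```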
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "3AU28DfALIZi"
+ },
+ "source": [
+ "df = pivot_survey.reset_index(drop=True)\n",
+ "for index in df.index:\n",
+ "    # Non-missing bond scores for this user, in tenure-day order\n",
+ "    scores = df.iloc[index].dropna()\n",
+ "    if len(scores) > 1:\n",
+ "        # Assumes at most two timepoints: store first minus second (the sign is flipped in the next cell)\n",
+ "        df.at[index, 'Change'] = scores.iloc[0] - scores.iloc[1]\n",
+ "    else:\n",
+ "        df.at[index, 'Change'] = np.nan  # only one bond score, so 'Change' is NaN"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "dtRUs_bPWy_X"
+ },
+ "source": [
+ "df['Change'] = -df['Change']  # flip the sign so Change = second score minus first: negative means bond decreased, positive means it increased\n",
+ "pivot_survey = pivot_survey.reset_index()  # move 'userid' out of the index into a regular column\n",
+ "newdf = df.assign(UserId=pivot_survey['userid']).set_index('UserId')  # re-attach user ids as the index"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "nPCXbRZWkl0D",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 235
+ },
+ "outputId": "dcb636fd-de78-4ea7-8424-7841fdc00158"
+ },
+ "source": [
+ "newdf.head(5)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tenureDay 4 5 6 14 15 16 Change\n",
+ "UserId \n",
+ "+/wAfc2I0c831C21wvh2Kcr4DZk= 4.25 NaN NaN 4.0 NaN NaN -0.25\n",
+ "+0NdvBGRsXuoa20PHou4K3FMlBA= NaN 3.75 NaN NaN NaN NaN NaN\n",
+ "+0eFEPPuFJm9U5lXwlAKw/I+Clo= NaN NaN 3.75 NaN NaN NaN NaN\n",
+ "+0pfU/ormz318pPBTZ6cWtrgHkI= NaN 5.00 NaN NaN 5.0 NaN -0.00\n",
+ "+11s2fkg+oFKje/WvOYnzxbYgtY= 4.50 NaN NaN NaN NaN NaN NaN"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 14
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kLp-tekCxJ4d"
+ },
+ "source": [
+ "The distribution of bond score change across participants who logged two scores:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 295
+ },
+ "id": "mhLK_U7ftw7E",
+ "outputId": "e296dd61-b3e3-408a-c051-ff5875eb1e78"
+ },
+ "source": [
+ "sns.histplot(df[\"Change\"], color='red', alpha =0.5).set(title=\"Change metric distribution\")\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAazklEQVR4nO3de5QdZZ3u8e9DuCnoIKZPG4EYQPAMgwpMgxfILBREYFTEpVHiKHiLjDBHD44OiApnvM4oKMcLrCAMogRBgZHx4AgqSHQADTFCuKiAZJIYmgQQERBJeM4fVV1U2r3TO929d3Wnn89ae3XV+1bV/lUnXc+u65ZtIiIiADZruoCIiJg4EgoREVFJKERERCWhEBERlYRCRERUEgoREVFJKMSoSTpV0tebrmMikvQHSbuMw3Is6bnl8FmSPjL26kDSzLLGaeX4NZLeOR7LLpf3XUlHj9fyoncSCrFBkuZKWlRuQFaVf+wHNF1XUzrdeNre1vZd4/neto+1/bGRppN0t6SDR1jWf5c1rhtrXa0+HNg+zPZXx7rs6L2EQrQl6QTg88AngX5gJvBl4Igm65rIJG3edA0jmQw1RoNs55XXn72AvwD+ALxhA9OcClwMnA88BNwCDNT6TwTuLPtuBY6s9R0D/Bj4LPAA8BvgsFr/zsC15bzfB74EfL3W/2Lgv4DfAb8ADtxAnXcDHwBuAh4GzqEIue/Wlv+MkZYNfAJYB/yx/N18sWw3cBzwa+A3tbbnlsNPAU4DlgEPluv9lDa1fgBYBfwWePuw5ZwHfLwcng58p6zxfmAhxYe8rwFPAI+WNX4QmFUu5x3Af5e/16G2zcvlXQN8Cvgp8Hvg28D2Zd+BwIoWv9ODgUOBPwGPl+/3i9ry3lkObwZ8uFz/eyn+v/xF2TdUx9FlbWuAk5v+/z+VX40XkNfEfJV/7GuHNhptpjm13EAeDkwrNyrX1/rfADy73Ci8sdwgzyj7jik3JO8q5/37ckOosv86isDYEjig3FB9vezbAbivfN/NgFeU431t6rwbuJ4iCHYoN0yLgb2BrYEfAqd0suz6xq62fANXAdtTbuxZf2P+pXK+Hcp1fSmwVZvf+SCwJ7ANsID2ofAp4Cxgi/I1u/a7uxs4uLbcoQ3v+eVyn0LrUFhZe+9Lar/vA2kTCrX/B18f1l/9nijC7Q5gF2Bb4FLga8NqO7us64XAY8BfNv03MFVfOXwU7TwTWGN77QjT/dj2FS6OTX+N4o8aANvftP1b20/Yvojik/R+tXmX2T67nPerwAygX9JMYF/go7b/ZPvHwOW1+f4OuKJ83ydsXwUsotiQt/MF24O2V1J8qr7B9s9t/xG4jCIgRrtsgE/Zvt/2o/VGSZtRbBTfa3ul7XW2/8v2Yy2WMQf4N9tLbT9MsbFt53GK39dzbD9ue6HtkR5kdqrth4fXWPO12nt/BJgzdCJ6jN4MnG77Ltt/AE4C3jTsMNb/sf2o7V9Q7J29sNWCovsSCtHOfcD0Do4/31MbfgTYemgeSW+VtETS7yT9juJT6PRW89p+pBzclmLv4v5aG8Dy2vBzgDcMLbdc9gEUG8l2BmvDj7YY33YMyx5eX910ir2RO0eYH4r1ri9n2Qam/QzFp+8rJd0l6cQOlt+uxlb9yyj2QKa3mXZjPJv112UZsDnFntuQ4f+PtiUakVCIdq6j2I1/7WhmlvQcikMCxwPPtL0dsBRQB7OvAraX9NRa20614eUUn2q3q722sf3p0dQ6zEjLbvdpvF37GopDbLt28N6rWH89Z7ab0PZDtt9vexfgNcAJkg4aZY1Dhr/34xT1PwxU/xbl3kPfRiz3txRhW1/2WtYP5pggEgrRku0HgY8CX5L0WklPlbSFpMMk/WsHi9iGYmOxGkDS2yj2FDp572UUh2xOlbSlpJcAr65N8nXg1ZJeKWmapK0lHShpx41YxXZGWvYgxbHxjth+AjgX
OF3Ss8tlvkTSVi0mvxg4RtIeZSCe0m65kl4l6bmSRHHyeh3FCeaNrrHm72rv/c/At8pDe7+i2AP8W0lbUJw0rtc/CMwqD5W1ciHwvyXtLGlbiqvZLurg0GQ0IKEQbdk+DTiBYiOwmuJT9PHAv3cw760UV9xcR7HReD7wk414+zcDL6E4jPVx4CKKPRdsL6e4LPZDtbo+wDj8f+5g2WcAr5f0gKT/2+Fi/xG4GfgZxZVC/9KqVtvfpbgE+IcUh4Z+uIFl7kZx1dQfKH7HX7Z9ddn3KeDD5eGvf+ywRijOCZ1HcShna+B/lXU9CLwH+ArFyeiHgRW1+b5Z/rxP0uIWyz23XPa1FFeZ/RH4h42oK3po6GqFiAlN0kXA7bbbfnqOiLHLnkJMSJL2lbSrpM0kHUrx6X3EPZSIGJvc2RgT1bMormd/JsWhir+3/fNmS4rY9HXt8JGknShulumnOOE43/YZkranOD48i+IGmDm2HyhPmJ1BcT34I8Axtlsdn4yIiC7p5uGjtcD7be9B8diA4yTtQfHogx/Y3g34QTkOcBjFybPdgHnAmV2sLSIiWuja4SPbqyiuu8b2Q5Juo7jN/wiK2+ahuIv1GuCfyvbzy7syr5e0naQZ5XJamj59umfNmtWtVYiI2CTdeOONa2z3terryTkFSbMoHiNwA9Bf29Dfw5N3Ne7A+ndUrijb1gsFSfMo9iSYOXMmixYt6lrdERGbIklt75bv+tVH5c0qlwDvs/37el+5V7BRJzVsz7c9YHugr69l0EVExCh1NRTKux8vAS6wfWnZPChpRtk/g+KJlVDcFFO/zX7Hsi0iInqka6FQXk10DnCb7dNrXZdTPDud8ue3a+1vVeHFwIMbOp8QERHjr5vnFPYH3gLcLGlJ2fYh4NPAxZLeQfG0xDll3xUUl6PeQXFJ6tu6WFtERLTQzauPfkz7J2IeNLyhPL9wXLfqiYiIkeUxFxERUUkoREREJaEQERGVhEJERFTylNSICeqQ2bNZM9j+Gyun9/dz5cKFPawopoKEQsQEtWZwkMVz57bt32fBgh5WE1NFDh9FREQloRAREZWEQkREVBIKERFRSShEREQloRAREZWEQkREVBIKERFRSShEREQloRAREZWEQkREVBIKERFR6VooSDpX0r2SltbaLpK0pHzdPfTdzZJmSXq01ndWt+qKiIj2uvmU1POALwLnDzXYfuPQsKTTgAdr099pe68u1hMRESPoWijYvlbSrFZ9kgTMAV7erfePiIiN19Q5hdnAoO1f19p2lvRzST+SNLvdjJLmSVokadHq1au7X2lExBTSVCgcBVxYG18FzLS9N3ACsEDS01vNaHu+7QHbA319fT0oNSJi6uh5KEjaHHgdcNFQm+3HbN9XDt8I3Ans3uvaIiKmuib2FA4Gbre9YqhBUp+kaeXwLsBuwF0N1BYRMaV185LUC4HrgOdJWiHpHWXXm1j/0BHA3wA3lZeofgs41vb93aotIiJa6+bVR0e1aT+mRdslwCXdqiUiIjqTO5ojIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEo3v6P5XEn3SlpaaztV0kpJS8rX4bW+kyTdIemXkl7ZrboiIqK9bu4pnAcc2qL9c7b3Kl9XAEjaA3gT8FflPF+WNK2LtUVERAtdCwXb1wL3dzj5EcA3bD9m+zfAHcB+3aotIiJaa+KcwvGSbioPLz2jbNsBWF6bZkXZ9mckzZO0SNKi1atXd7vWiIgppdehcCawK7AXsAo4bWMXYHu+7QHbA319feNdX0TElLZ5L9/M9uDQsKSzge+UoyuBnWqT7li2RUxah8yezZrBwbb90/v7uXLhwh5WFDGynoaCpBm2V5WjRwJDVyZdDiyQdDrwbGA34Ke9rC1ivK0ZHGTx3Llt+/dZ
sKCH1UR0pmuhIOlC4EBguqQVwCnAgZL2AgzcDbwbwPYtki4GbgXWAsfZXtet2iIiorWuhYLto1o0n7OB6T8BfKJb9URExMhyR3NERFQSChERUUkoREREJaEQERGVhEJERFQSChERUUkoREREJaEQERGVhEJERFQSChERUUkoREREJaEQERGVhEJERFQSChERUUkoREREJaEQERGVhEJERFQSChERUelaKEg6V9K9kpbW2j4j6XZJN0m6TNJ2ZfssSY9KWlK+zupWXRER0V439xTOAw4d1nYVsKftFwC/Ak6q9d1pe6/ydWwX64qIiDa6Fgq2rwXuH9Z2pe215ej1wI7dev+IiNh4TZ5TeDvw3dr4zpJ+LulHkma3m0nSPEmLJC1avXp196uMiJhCGgkFSScDa4ELyqZVwEzbewMnAAskPb3VvLbn2x6wPdDX19ebgiMipoieh4KkY4BXAW+2bQDbj9m+rxy+EbgT2L3XtUVETHU9DQVJhwIfBF5j+5Fae5+kaeXwLsBuwF29rC0iImDzbi1Y0oXAgcB0SSuAUyiuNtoKuEoSwPXllUZ/A/yzpMeBJ4Bjbd/fcsEREdE1XQsF20e1aD6nzbSXAJd0q5aIqeiQ2bNZMzjYtn96fz9XLlzYw4piMuhaKEREs9YMDrJ47ty2/fssWNDDamKyyGMuIiKiklCIiIhKQiEiIioJhYiIqCQUIiKiklCIiIhKQiEiIioJhYiIqHQUCpL276QtIiImt073FL7QYVtERExiG3zMhaSXAC8F+iSdUOt6OjCtm4VFRETvjfTsoy2BbcvpnlZr/z3w+m4VFRERzdhgKNj+EfAjSefZXtajmiIioiGdPiV1K0nzgVn1eWy/vBtFRUREMzoNhW8CZwFfAdZ1r5yIiGhSp6Gw1vaZXa0kIiIa1+klqf8h6T2SZkjafujV1coiIqLnOt1TOLr8+YFam4FdNjSTpHOBVwH32t6zbNseuIji/MTdwBzbD6j40uYzgMOBR4BjbC/usL6Inhvp6y5XLl/ew2oixkdHoWB751Eu/zzgi8D5tbYTgR/Y/rSkE8vxfwIOA3YrXy8Czix/RkxII33d5fRPfrKH1USMj45CQdJbW7XbPr9Ve63/WkmzhjUfARxYDn8VuIYiFI4Azrdt4HpJ20maYXtVJzVGRMTYdXr4aN/a8NbAQcBi1t8D6FR/bUN/D9BfDu8A1Pe3V5Rt64WCpHnAPICZM2eO4u0jIqKdTg8f/UN9XNJ2wDfG+ua2LckbOc98YD7AwMDARs0bEREbNtpHZz8MjPY8w6CkGQDlz3vL9pXATrXpdizbIiKiRzo9p/AfFFcbQfEgvL8ELh7le15OcTXTp8uf3661Hy/pGxQnmB/M+YSIiN7q9JzCZ2vDa4FltleMNJOkCylOKk+XtAI4hSIMLpb0DmAZMKec/AqKy1HvoLgk9W0d1hYREeOk03MKP5LUz5MnnH/d4XxHtek6qMW0Bo7rZLkREdEdnX7z2hzgp8AbKD7Z3yApj86OiNjEdHr46GRgX9v3AkjqA74PfKtbhUVERO91evXRZkOBULpvI+aNiIhJotM9hf+U9D3gwnL8jRQnhiMiYhMy0nc0P5fiDuQPSHodcEDZdR1wQbeLi4iI3hppT+HzwEkAti8FLgWQ9Pyy79VdrS4iInpqpPMC/bZvHt5Yts3qSkUREdGYkUJhuw30PWU8C4mIiOaNFAqLJL1reKOkdwI3dqekiIhoykjnFN4HXCbpzTwZAgPAlsCR3SwsIiJ6b4OhYHsQeKmklwF7ls3/z/YPu15ZRET0XKfPProauLrLtURERMNyV3JERFQSChERUUkoREREJaEQERGVhEJERFQSChERUen00dnjRtLzgItqTbsAH6V4pMa7gNVl+4ds5/HcERE91PNQsP1LYC8ASdOAlcBlwNuAz9n+bK9rioiIQtOHjw4C7rS9rOE6IiKC5kPhTTz5bW4Ax0u6SdK5kp7RagZJ8yQt
krRo9erVrSaJiIhRaiwUJG0JvAb4Ztl0JrArxaGlVcBpreazPd/2gO2Bvr6+ntQaETFVNLmncBiwuHzoHrYHba+z/QRwNrBfg7VFRExJPT/RXHMUtUNHkmbYXlWOHgksbaSqiCli+fLl7LP77m37p/f3c+XChT2sKCaCRkJB0jbAK4B315r/VdJegIG7h/VFxDjzunUsnju3bf8+Cxb0sJqYKBoJBdsPA88c1vaWJmqJiIgnNX31UURETCBNnlOIaNQhs2ezZnCwbX+OqcdUlFCIKWvN4GCOqUcMk8NHERFRSShEREQloRAREZWEQkREVBIKERFRSShEREQloRAREZWEQkREVHLzWkSMSu4I3zQlFCJiVHJH+KYph48iIqKSUIiIiEpCISIiKgmFiIioJBQiIqLS2NVHku4GHgLWAWttD0jaHrgImEXxPc1zbD/QVI0REVNN03sKL7O9l+2BcvxE4Ae2dwN+UI5HRESPNB0Kwx0BfLUc/irw2gZriYiYcpoMBQNXSrpR0ryyrd/2qnL4HqC/mdIiIqamJu9oPsD2Skn/A7hK0u31TtuW5OEzlQEyD2DmzJm9qTQiYopobE/B9sry573AZcB+wKCkGQDlz3tbzDff9oDtgb6+vl6WHBGxyWskFCRtI+lpQ8PAIcBS4HLg6HKyo4FvN1FfRMRU1dTho37gMklDNSyw/Z+SfgZcLOkdwDJgTkP1xSSQp3RGjL9GQsH2XcALW7TfBxzU+4piMspTOiPG30S7JDUiIhqUUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKj0PBUk7Sbpa0q2SbpH03rL9VEkrJS0pX4f3uraIiKlu8wbecy3wftuLJT0NuFHSVWXf52x/toGaIiKCBkLB9ipgVTn8kKTbgB16XUdERPy5JvYUKpJmAXsDNwD7A8dLeiuwiGJv4oEW88wD5gHMnDmzZ7VG7x0yezZrBgfb9q9cvryH1cR4G+nfd3p/P1cuXNjDigIaDAVJ2wKXAO+z/XtJZwIfA1z+PA14+/D5bM8H5gMMDAy4dxVHr60ZHGTx3Llt+6d/8pM9rCbG20j/vvssWNDDamJII1cfSdqCIhAusH0pgO1B2+tsPwGcDezXRG0REVNZE1cfCTgHuM326bX2GbXJjgSW9rq2iIipronDR/sDbwFulrSkbPsQcJSkvSgOH90NvLuB2iIiprQmrj76MaAWXVf0upborpxIjJh8Gr36KDZtOZEYMfnkMRcREVFJKERERCWhEBERlYRCRERUEgoREVFJKERERCWXpEbEhLR8+XL22X33tv25z6U7EgoRMSF53brc59KAHD6KiIhKQiEiIioJhYiIqOScwhSWB9ZFtDdV/z4SClNYHlgXm7KxbtSn6t9HQiHayiWBMZlN1Y36WCUUoq1cEhgx9eREc0REVBIKERFRmXCHjyQdCpwBTAO+YvvTDZc0YU3VqyMiJoJN9ZzbhAoFSdOALwGvAFYAP5N0ue1bm61sYsqJtIjmbKrn3CZUKAD7AXfYvgtA0jeAI4CuhEK3P2mPtPx7Bgd5Vn9/194/Iiavpo4EyPa4L3S0JL0eONT2O8vxtwAvsn18bZp5wLxy9HnAL3tc5nRgTY/fs9uyTpND1mnimyzr8xzbfa06JtqewohszwfmN/X+khbZHmjq/bsh6zQ5ZJ0mvk1hfSba1UcrgZ1q4zuWbRER0QMTLRR+BuwmaWdJWwJvAi5vuKaIiCljQh0+sr1W0vHA9yguST3X9i0NlzVcY4euuijrNDlknSa+Sb8+E+pEc0RENGuiHT6KiIgGJRQiIqKSUBgFSR+TdJOkJZKulPTspmsaC0mfkXR7uU6XSdqu6ZrGStIbJN0i6QlJk/oSQUmHSvqlpDskndh0PWMl6VxJ90pa2nQt
40XSTpKulnRr+f/uvU3XNFoJhdH5jO0X2N4L+A7w0aYLGqOrgD1tvwD4FXBSw/WMh6XA64Brmy5kLGqPfjkM2AM4StIezVY1ZucBhzZdxDhbC7zf9h7Ai4HjJuu/U0JhFGz/vja6DTCpz9bbvtL22nL0eor7QyY127fZ7vXd7t1QPfrF9p+AoUe/TFq2rwXub7qO8WR7le3F5fBDwG3ADs1WNToT6pLUyUTSJ4C3Ag8CL2u4nPH0duCipouIyg7A8tr4CuBFDdUSHZA0C9gbuKHZSkYnodCGpO8Dz2rRdbLtb9s+GThZ0knA8cApPS1wI420PuU0J1PsBl/Qy9pGq5N1iuglSdsClwDvG3ZEYdJIKLRh++AOJ70AuIIJHgojrY+kY4BXAQd5kty8shH/RpNZHv0ySUjagiIQLrB9adP1jFbOKYyCpN1qo0cAtzdVy3gov9jog8BrbD/SdD2xnjz6ZRKQJOAc4Dbbpzddz1jkjuZRkHQJxWO7nwCWAcfanrSf3iTdAWwF3Fc2XW/72AZLGjNJRwJfAPqA3wFLbL+y2apGR9LhwOd58tEvn2i4pDGRdCFwIMVjpgeBU2yf02hRYyTpAGAhcDPFdgHgQ7avaK6q0UkoREREJYePIiKiklCIiIhKQiEiIioJhYiIqCQUIiKikpvXIoaR9CyKS0D3pbicdRD4d4r7OF7VZG0R3ZY9hYia8iaky4BrbO9q+68pnhrb32xlEb2RUIhY38uAx22fNdRg+xcUNyZtK+lb5XdPXFAGCJI+KulnkpZKml9rv0bSv0j6qaRfSZpdtj9V0sXls/cvk3TD0Hc+SDpE0nWSFkv6ZvksnYieSShErG9P4MY2fXsD76P4XoNdgP3L9i/a3tf2nsBTKJ4hNWRz2/uV8w09H+s9wAPls/c/Avw1gKTpwIeBg23vAywCThivFYvoREIhonM/tb3C9hPAEmBW2f6y8tP+zcDLgb+qzTP0YLQba9MfQPG9CNheCtxUtr+YInB+ImkJcDTwnO6sSkRrOdEcsb5bgNe36XusNrwO2FzS1sCXgQHbyyWdCmzdYp51jPz3JuAq20dtdNUR4yR7ChHr+yGwlaR5Qw2SXgDMbjP9UACsKY//twuUup8Ac8pl7wE8v2y/Hthf0nPLvm0k7b7xqxAxegmFiJryuySOBA6WdKekW4BPAfe0mf53wNkU3wn9PYpHXY/ky0CfpFuBj1PsnTxoezVwDHChpJuA64D/ObY1itg4eUpqRI9JmgZsYfuPknYFvg88r/wO5ohG5ZxCRO89Fbi6/KYuAe9JIMREkT2FiIio5JxCRERUEgoREVFJKERERCWhEBERlYRCRERU/j9ZHcpv38bZbQAAAABJRU5ErkJggg==\n",
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {
+ "tags": [],
+ "needs_background": "light"
+ }
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "aeEGEs1_Wy4s",
+ "outputId": "a048d0ba-3a66-47fc-8aa1-64043fa622b5"
+ },
+ "source": [
+ "df[\"Change\"].mean()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "-0.11161670235546038"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 186
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "D7A_IYtyugHR",
+ "outputId": "fb287ea4-edd2-4616-a7e5-83abf524842e"
+ },
+ "source": [
+ "# Number of users with one bond score and no change tracked\n",
+ "len(newdf[newdf['Change'].isna()])"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "3443"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 192
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "f1TD_lC5ugN6",
+ "outputId": "cef23484-62ac-4bf3-b763-37f89870b01c"
+ },
+ "source": [
+ "# Number of users with two bond scores and where change could be tracked\n",
+ "len(newdf[newdf['Change'].notna()])"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "934"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 193
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "e4YzH6nwW1jy"
+ },
+ "source": [
+ "# Results\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "D80qjpZO-KTE"
+ },
+ "source": [
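+ "On average, where a user submitted two bond scores, the bond score increases by **0.11** between the first and second score (the mean `Change` for the two-score group, printed further down, is 0.111617). The `df[\"Change\"].mean()` cell above prints -0.1116 instead; the opposite sign suggests that the sign-flip cell had been run more than once when that cell executed, since each run negates `Change` again.\n",
+ "\n",
+ "Note that in 3443 cases the participant recorded an initial bond score but no subsequent one: only 934 of the 4377 users recorded two bond scores."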
+ ]
+ },
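The one-score and two-score counts can also be read straight from the long-format data by counting rows per user. A sketch on hypothetical toy data (invented values, same column names as `survey_responses`):

```python
import pandas as pd

# Hypothetical toy long-format responses (one row per bond score)
toy_responses = pd.DataFrame({
    "userid": ["a", "a", "b", "c", "c", "d"],
    "tenureDay": [4, 14, 5, 6, 16, 4],
    "waiBondSubscore": [4.0, 4.5, 3.0, 3.5, 3.25, 5.0],
})

# Number of bond scores submitted by each user
scores_per_user = toy_responses.groupby("userid").size()

# How many users submitted one score, and how many submitted two
users_by_n_scores = scores_per_user.value_counts().sort_index()
print(users_by_n_scores)

# Consistency check: sum of (scores per user) x (number of such users) should equal the row count
total_scores = sum(n_scores * n_users for n_scores, n_users in users_by_n_scores.items())
print(total_scores == len(toy_responses))
```

Applied to the real counts, the same check holds: 3443 × 1 + 934 × 2 = 5311, which matches the number of rows reported by `survey_responses.describe()` and is consistent with no user having submitted more than two scores.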
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ES6GqMvoTEFG"
+ },
+ "source": [
+ "Is there a pattern distinguishing those who submitted only one bond score from those who submitted two?\n",
+ "\n",
+ "Let's compare the mean scores of these two groups, by tenure day:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "bI4UJbtNeYLf"
+ },
+ "source": [
+ "OneScore = newdf[newdf['Change'].isna()]"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "dtw_79eaWUbg"
+ },
+ "source": [
+ "TwoScores = newdf[newdf['Change'].notna()]"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "-hYq2yGgYGG2",
+ "outputId": "a00ec1c7-84c1-438b-a71a-875fe45d1e31"
+ },
+ "source": [
+ "print(\"Mean bond scores for users who submitted one score:\")\n",
+ "print(OneScore[[4, 5, 6, 14, 15, 16, 'Change']].mean())\n",
+ "print(\"Mean bond scores for users who submitted two scores:\")\n",
+ "print(TwoScores[[4, 5, 6, 14, 15, 16, 'Change']].mean())"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Mean bond scores for users who submitted one score:\n",
+ "tenureDay\n",
+ "4 3.780131\n",
+ "5 3.756394\n",
+ "6 3.742317\n",
+ "14 NaN\n",
+ "15 4.312500\n",
+ "16 NaN\n",
+ "Change NaN\n",
+ "dtype: float64\n",
+ "Mean bond scores for users who submitted two scores:\n",
+ "tenureDay\n",
+ "4 3.920115\n",
+ "5 3.924390\n",
+ "6 3.817416\n",
+ "14 4.037109\n",
+ "15 4.044296\n",
+ "16 3.925725\n",
+ "Change 0.111617\n",
+ "dtype: float64\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "eyLQWa2kIIxE"
+ },
+ "source": [
+ "Participants who submitted a second score had, on average, slightly higher early bond scores (about 3.82 to 3.92 on days 4 to 6) than those who did not (about 3.74 to 3.78).\n",
+ "\n",
+ "There are 4 outlier participants who submitted their only bond score on day 15, without an earlier one on days 4, 5 or 6.\n",
+ "\n",
+ "Although the group sizes are very different (3443 vs 934 users), willingness to submit a second score may itself be a sign of engagement."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "U2lAx_XKb_u-"
+ },
+ "source": [
+ "# Task 2 - Is bond dependent on engagement / activity?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "A7frHyEMvqHP"
+ },
+ "source": [
+ "To attempt to answer this question I will do the following:\n",
+ "\n",
+ "\n",
+ "1. Pivot the user_activity data so that tenure days are columns, filling the NaN cells with 0s for ease of analysis.\n",
+ "2. Add a column to the pivoted table with the mean number of user messages over the 28-day period for each user\n",
+ "3. Develop a visual to understand the trend in user engagement\n",
+ "4. Test the correlation between early bond score (days 4-6) and messages sent on days 0-6, to look for a relationship between message volumes and bond levels\n",
+ "5. Test the correlation between change in bond score and messages sent on days 0-28, to look for a relationship between message volumes and change in bond\n",
+ "6. Use a statistical test to gauge whether engagement levels (messages sent) differ between users who recorded low bond scores and those who recorded high bond scores\n",
+ "\n"
+ ]
+ },
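A sketch of steps 1 and 2 on hypothetical toy data. The long-format layout and the `messages` column name are assumptions for illustration, since the raw structure of the activity data is not shown in this section; the output column names mirror those that appear in this notebook (`Total_Messages_per_user`, `Average_Messages_per_user`).

```python
import pandas as pd

# Hypothetical long-format activity log; the column names are assumptions,
# not necessarily those of the real activity data.
toy_activity = pd.DataFrame({
    "userid": ["a", "a", "b", "b", "b"],
    "tenureDay": [0, 2, 0, 1, 2],
    "messages": [10, 4, 3, 7, 5],
})

# Users as rows, tenure days as columns; days with no activity are filled with 0
activity_wide = toy_activity.pivot_table(
    values="messages", index="userid", columns="tenureDay", aggfunc="sum", fill_value=0
)

# Per-user total and mean over the day columns (computed before adding the new columns)
day_columns = list(activity_wide.columns)
activity_wide["Total_Messages_per_user"] = activity_wide[day_columns].sum(axis=1)
activity_wide["Average_Messages_per_user"] = activity_wide[day_columns].mean(axis=1)
print(activity_wide)
```

`pivot_table` only creates columns for days that occur in the data. If some day of the period had no activity from any user, calling `activity_wide.reindex(columns=range(28), fill_value=0)` before computing the totals would add it as a column of zeros (assuming days are numbered 0 to 27; adjust the range to the real numbering).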
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 204
+ },
+ "id": "YiCK17wnfLKA",
+ "outputId": "0bf0dafb-1df4-4dfd-d2e5-e146c89a07cc"
+ },
+ "source": [
+ "user_activty.head(5)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " 0 ... Average_Messages_per_user\n",
+ "userid ... \n",
+ "+/wAfc2I0c831C21wvh2Kcr4DZk= 33.0 ... 22.703704\n",
+ "+0pfU/ormz318pPBTZ6cWtrgHkI= 103.0 ... 27.185185\n",
+ "+4XFQAS/fIojw07hLfZk6PZiGzA= 145.0 ... 54.888889\n",
+ "+6hscmdASu/PfR0HuiO9AzKlNdQ= 52.0 ... 61.592593\n",
+ "+9kMKboj6nxl0kWR3t90grCXt5k= 56.0 ... 7.407407\n",
+ "\n",
+ "[5 rows x 32 columns]"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 227
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7l-DuJq65uAm"
+ },
+ "source": [
+ "## Result 2. Pearson's r correlation - change in bond score and user messages"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "3OvA46OnZAXW",
+ "outputId": "7cc6e837-a8f1-485d-b258-cf3ecc93de53"
+ },
+ "source": [
+ "corr2, _ = pearsonr(activityAndChange['Change'], activityAndChange['Total_Messages_per_user'])\n",
+ "print('Pearsons correlation: %.3f' % corr2)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Pearsons correlation: 0.022\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QOFpN_aI5qxA"
+ },
+ "source": [
+ "A Pearson's correlation coefficient of 0.022 indicates effectively no linear correlation between the change in bond score and user engagement (total messages sent)."
+ ]
+ },
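`scipy.stats.pearsonr` returns a p-value alongside the coefficient, which helps judge whether a value like 0.022 could plausibly arise by chance. The sketch below uses hypothetical toy data shaped like `activityAndChange` and drops rows with a missing `Change` first, because by default a NaN in either input makes the result NaN; whether `activityAndChange` was already filtered this way is not shown in this section.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical toy frame shaped like activityAndChange; NaN marks a user with a single bond score
toy = pd.DataFrame({
    "Change": [0.25, -0.5, np.nan, 0.0, 0.75, -0.25],
    "Total_Messages_per_user": [120, 80, 300, 95, 150, 60],
})

# Keep only users for whom both values are present
paired = toy.dropna(subset=["Change", "Total_Messages_per_user"])

# Coefficient and two-sided p-value
r, p_value = pearsonr(paired["Change"], paired["Total_Messages_per_user"])
print(f"Pearson's r = {r:.3f}, p = {p_value:.3f}, n = {len(paired)}")
```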
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "E0mvUH-J6kk0"
+ },
+ "source": [
+ "## Test 3. Statistical comparison of two groups: low bond score and high bond score"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "FvpFHmi-7B0A"
+ },
+ "source": [
+ "In this final test: \n",
+ "\n",
+ "1. The data is ordered by early bond score, from lowest to highest\n",
+ "2. The messages column is split in two: the message totals for the lower half of bond scores go in one list, and those for the upper half go in the other.\n",
+ "3. Each list is tested for normality using a Shapiro-Wilk test; neither is normally distributed, so a non-parametric test is needed\n",
+ "4. A Mann-Whitney U test is used to compare the two groups, to understand whether there is a significant difference in the total message volumes (and so the engagement) of users with low bond scores and those with high bond scores\n",
+ "5. The null hypothesis, or baseline assumption, is that total messages sent per user have the same distribution in the low bond-score and high bond-score groups.\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "YzjZ7qXdjz9F"
+ },
+ "source": [
+ "sortedByBond = bondAndEngagement.copy()  # copy, so the in-place sort below does not also reorder bondAndEngagement"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "w4eb_E0Pl73H"
+ },
+ "source": [
+ "sortedByBond.sort_values(by=['early_bond'], inplace=True)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "B3c-4FIsnIkc"
+ },
+ "source": [
+ "Now that the dataframe is ordered by bond score, let's split the total messages data into two lists: the total messages for the lower scores, and the total messages for the higher scores.\n",
+ "\n"
+ ]
+ },
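A sketch of the ordering-and-halving step on hypothetical toy data (the column names `early_bond` and `total_messages` match those used in this notebook; the values are invented). Computing the midpoint from the length avoids hard-coding the split position.

```python
import pandas as pd

# Hypothetical toy frame standing in for the per-user bond and engagement data
toy = pd.DataFrame({
    "early_bond": [3.0, 4.5, 2.5, 5.0, 4.0],
    "total_messages": [120, 300, 80, 250, 200],
})

# Order users by early bond score; a stable sort keeps tied users in a reproducible order
ordered = toy.sort_values("early_bond", kind="stable").reset_index(drop=True)
half = len(ordered) // 2  # with an odd count, the extra user goes to the higher half

lower = ordered["total_messages"].iloc[:half].tolist()
higher = ordered["total_messages"].iloc[half:].tolist()
print(lower, higher)
```

Bond scores appear to move in steps of 0.25, so many users share the same score and users tied at the midpoint can fall into either half. Splitting on the median score instead (`early_bond <= median`) would keep ties together, at the cost of unequal group sizes.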
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "G03XikqMna1f"
+ },
+ "source": [
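+ "half = len(sortedMerged) // 2  # midpoint of the rows, which are ordered by early bond score\n",
+ "lower = sortedMerged[\"total_messages\"].iloc[:half]   # message totals for the lower-bond half\n",
+ "higher = sortedMerged[\"total_messages\"].iloc[half:]  # message totals for the higher-bond half"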
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "RHqlhOfYoTYR"
+ },
+ "source": [
+ "lower = list(lower)\n",
+ "higher = list(higher)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kxRXlYsxoywh"
+ },
+ "source": [
+ "The Shapiro-Wilk tests below show that the distributions of both lists are non-Gaussian, so we will use non-parametric testing.\n",
+ "\n",
+ "Alpha is set to 0.05.\n",
+ "\n",
+ "Instead of a Student's t-test, we can therefore use a Mann-Whitney U test.\n",
+ "\n",
+ "The null hypothesis is that the number of messages sent has the same distribution in the low-score and high-score groups."
+ ]
+ },
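The two Shapiro-Wilk cells that follow repeat the same logic for each list. A small helper would remove the duplication; it is sketched here with the same alpha of 0.05 and a hypothetical right-skewed sample standing in for message counts.

```python
import numpy as np
from scipy.stats import shapiro

def looks_gaussian(sample, alpha=0.05):
    """Shapiro-Wilk test; returns True if normality is not rejected at level alpha."""
    stat, p = shapiro(sample)
    print(f"Statistics={stat:.3f}, p={p:.3f}")
    return bool(p > alpha)

# Hypothetical right-skewed sample, loosely resembling per-user message counts
rng = np.random.default_rng(0)
skewed = rng.exponential(scale=100, size=500)
print(looks_gaussian(skewed))
```

With the notebook's lists, the calls would be `looks_gaussian(lower)` and `looks_gaussian(higher)`.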
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "kNdl8R9Vodt2",
+ "outputId": "6c51cd82-7472-4315-9d43-9aef5a02d677"
+ },
+ "source": [
+ "from scipy.stats import shapiro\n",
+ "stat, p = shapiro(lower)\n",
+ "print('Statistics=%.3f, p=%.3f' % (stat, p))\n",
+ "# interpret\n",
+ "alpha = 0.05\n",
+ "if p > alpha:\n",
+ "\tprint('Sample looks Gaussian (fail to reject H0)')\n",
+ "else:\n",
+ "\tprint('Sample does not look Gaussian (reject H0)')"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Statistics=0.792, p=0.000\n",
+ "Sample does not look Gaussian (reject H0)\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Q6gG3LCuouKl",
+ "outputId": "b9e9fcd7-0f53-4904-b689-a4df1d3f1f14"
+ },
+ "source": [
+ "stat, p = shapiro(higher)\n",
+ "print('Statistics=%.3f, p=%.3f' % (stat, p))\n",
+ "# interpret\n",
+ "alpha = 0.05\n",
+ "if p > alpha:\n",
+ "\tprint('Sample looks Gaussian (fail to reject H0)')\n",
+ "else:\n",
+ "\tprint('Sample does not look Gaussian (reject H0)')"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Statistics=0.809, p=0.000\n",
+ "Sample does not look Gaussian (reject H0)\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "nIfuvpRlpCYU",
+ "outputId": "cff78bac-6e0b-47de-e65f-98fd00bf4502"
+ },
+ "source": [
+ "print(np.mean(lower))\n",
+ "print(np.mean(higher))"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "286.3841936957515"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 175
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mNgnD4sIpHyn"
+ },
+ "source": [
+ "To test whether there is a significant difference between the volumes of messages (engagement) sent by participants who submitted low scores and those who submitted high scores, we can perform a Mann-Whitney U test on the two lists."
+ ]
+ },
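A sketch of a two-sided Mann-Whitney U test on hypothetical toy message totals. With `alternative="two-sided"` the null hypothesis is that both samples come from the same distribution, matching the hypothesis stated above; the returned result exposes `statistic` (the U statistic) and `pvalue`.

```python
from scipy.stats import mannwhitneyu

# Hypothetical toy message totals for the lower-bond and higher-bond halves
lower = [80, 120, 95, 60, 150, 110]
higher = [200, 130, 90, 170, 210, 140]

# Two-sided test: H0 is that both samples come from the same distribution
result = mannwhitneyu(lower, higher, alternative="two-sided")
print(f"U = {result.statistic}, p = {result.pvalue:.4f}")
```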
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "fV9qJ9Z0p1Iq"
+ },
+ "source": [
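+ "import scipy.stats as stats\n",
+ "# Two-sided test of H0: both groups' message totals come from the same distribution.\n",
+ "# (alternative=None was a deprecated option in older SciPy releases and returned half the two-sided p-value.)\n",
+ "# Note: t holds the Mann-Whitney U statistic, not a t statistic.\n",
+ "t, pvalue = stats.mannwhitneyu(lower, higher, alternative='two-sided')"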
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "yNQPby3yp5JV",
+ "outputId": "961ceee1-5c6b-4251-ddf1-6d260d23502f"
+ },
+ "source": [
+ "print(t)\n",
+ "print(pvalue)"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "2352817.0\n",
+ "0.15780557009578583\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mWUVTIGxq8zG"
+ },
+ "source": [
+ "## Results 3. Comparison of two groups, low bond score and high bond score - do their engagement levels differ?\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "D07kejG4847B"
+ },
+ "source": [
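+ "The p-value is well above the significance level of 0.05, so we fail to reject the null hypothesis: \n",
+ "\n",
+ "We find no significant difference in engagement levels (message volume) between users who gave low bond scores and users who gave high bond scores. This does not show that the groups are identical, only that the data provide no evidence of a difference."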
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "RCEW4Bob9tDS"
+ },
+ "source": [
+ "# Task 3 - Additional Analysis\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "igd-l5O595e6"
+ },
+ "source": [
+ "1. If further data were available (assuming data protection regulation allowed for the collection of the data) on the user profiles like Geography, Occupation, Age, Gender etc, we could potentially derive further insights from the data at a more granular level. We could see whether certain profile groups had a tendency to allocate higher or lower bond scores, or were more engaged than others. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "aj9qA7p6_D8E"
+ },
+ "source": [
+ "2. Further information on the content of each message would be useful. The data do not show how long each message is (e.g. in characters) or what type of language was used. It would also be useful to know why the majority of users did not record a second bond score."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "1AaBGE7C_XjG"
+ },
+ "source": [
+ "3. Analysing the language used in the chat logs would allow sentiment analysis. This could help gauge the user's 'mood', alongside the existing 'moodCategory' variable. We could also analyse the terms used in the logs and check whether the use of certain terms correlates with engagement or bond score."
+ ]
+ }
+ ]
+}
\ No newline at end of file
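The Mann-Whitney U test earlier in this notebook compares message volumes for users with low and high bond scores. The sketch below is minimal and makes two changes: it states the alternative hypothesis explicitly, and it reports an effect size (probability of superiority and rank-biserial correlation) next to the p-value. It assumes `lower` and `higher` are 1-D sequences of per-user message counts. The function name `mann_whitney_summary` is hypothetical, and the demo values are synthetic, not the study data.

```python
import numpy as np
from scipy import stats


def mann_whitney_summary(lower, higher, alternative="two-sided"):
    """Mann-Whitney U test plus a rank-biserial effect size.

    `lower` and `higher` are hypothetical 1-D samples (e.g. per-user
    message counts for low and high bond-score groups).
    """
    x = np.asarray(lower, dtype=float)
    y = np.asarray(higher, dtype=float)

    # State the alternative explicitly rather than relying on a version-dependent default.
    result = stats.mannwhitneyu(x, y, alternative=alternative)

    # Probability of superiority: P(X > Y) + 0.5 * P(X == Y) over all (x, y) pairs.
    greater = (x[:, None] > y[None, :]).mean()
    ties = (x[:, None] == y[None, :]).mean()
    prob_superiority = greater + 0.5 * ties

    # Rank-biserial correlation in [-1, 1]; 0 means neither group tends to be larger.
    rank_biserial = 2.0 * prob_superiority - 1.0

    return {
        "U": float(result.statistic),
        "p_value": float(result.pvalue),
        "prob_superiority": float(prob_superiority),
        "rank_biserial": float(rank_biserial),
    }


if __name__ == "__main__":
    # Synthetic message counts, for illustration only.
    rng = np.random.default_rng(0)
    lower = rng.poisson(280, size=200)
    higher = rng.poisson(290, size=200)
    summary = mann_whitney_summary(lower, higher)
    alpha = 0.05
    print(summary)
    print("reject H0" if summary["p_value"] < alpha else "fail to reject H0")
```

The p-value alone cannot show how large a difference is. A non-significant result alongside a rank-biserial value near 0 is a stronger sign that engagement does not differ between the groups.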
diff --git a/Copy_of_colab_github_demo.ipynb b/Copy_of_colab_github_demo.ipynb
new file mode 100644
index 00000000..34359db8
--- /dev/null
+++ b/Copy_of_colab_github_demo.ipynb
@@ -0,0 +1,161 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "Copy of colab-github-demo.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "-pVhOfzLx9us"
+ },
+ "source": [
+ "# Using Google Colab with GitHub\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "wKJ4bd5rt1wy"
+ },
+ "source": [
+ "\n",
+ "[Google Colaboratory](http://colab.research.google.com) is designed to integrate cleanly with GitHub, allowing both loading notebooks from github and saving notebooks to github."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "K-NVg7RjyeTk"
+ },
+ "source": [
+ "## Loading Public Notebooks Directly from GitHub\n",
+ "\n",
+ "Colab can load public github notebooks directly, with no required authorization step.\n",
+ "\n",
+ "For example, consider the notebook at this address: https://github.com/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb.\n",
+ "\n",
+ "The direct colab link to this notebook is: https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb.\n",
+ "\n",
+ "To generate such links in one click, you can use the [Open in Colab](https://chrome.google.com/webstore/detail/open-in-colab/iogfkhleblhcpcekbiedikdehleodpjo) Chrome extension."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "WzIRIt9d2huC"
+ },
+ "source": [
+ "## Browsing GitHub Repositories from Colab\n",
+ "\n",
+ "Colab also supports special URLs that link directly to a GitHub browser for any user/organization, repository, or branch. For example:\n",
+ "\n",
+ "- http://colab.research.google.com/github will give you a general github browser, where you can search for any github organization or username.\n",
+ "- http://colab.research.google.com/github/googlecolab/ will open the repository browser for the ``googlecolab`` organization. Replace ``googlecolab`` with any other github org or user to see their repositories.\n",
+ "- http://colab.research.google.com/github/googlecolab/colabtools/ will let you browse the main branch of the ``colabtools`` repository within the ``googlecolab`` organization. Substitute any user/org and repository to see its contents.\n",
+ "- http://colab.research.google.com/github/googlecolab/colabtools/blob/master will let you browse ``master`` branch of the ``colabtools`` repository within the ``googlecolab`` organization. (don't forget the ``blob`` here!) You can specify any valid branch for any valid repository."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "Rmai0dD30XzL"
+ },
+ "source": [
+ "## Loading Private Notebooks\n",
+ "\n",
+ "Loading a notebook from a private GitHub repository is possible, but requires an additional step to allow Colab to access your files.\n",
+ "Do the following:\n",
+ "\n",
+ "1. Navigate to http://colab.research.google.com/github.\n",
+ "2. Click the \"Include Private Repos\" checkbox.\n",
+ "3. In the popup window, sign in to your GitHub account and authorize Colab to read the private files.\n",
+ "4. Your private repositories and notebooks will now be available via the github navigation pane."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "8J3NBxtZpPcK"
+ },
+ "source": [
+ "## Saving Notebooks To GitHub or Drive\n",
+ "\n",
+ "Any time you open a GitHub hosted notebook in Colab, it opens a new editable view of the notebook. You can run and modify the notebook without worrying about overwriting the source.\n",
+ "\n",
+ "If you would like to save your changes from within Colab, you can use the File menu to save the modified notebook either to Google Drive or back to GitHub. Choose **File→Save a copy in Drive** or **File→Save a copy to GitHub** and follow the resulting prompts. Saving a Colab notebook to GitHub requires giving Colab permission to push the commit to your repository."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "colab_type": "text",
+ "id": "8QAWNjizy_3O"
+ },
+ "source": [
+ "## Open In Colab Badge\n",
+ "\n",
+ "Anybody can open a copy of any github-hosted notebook within Colab. To make it easier to give people access to live views of GitHub-hosted notebooks,\n",
+ "colab provides a [shields.io](http://shields.io/)-style badge, which appears as follows:\n",
+ "\n",
+ "[](https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb)\n",
+ "\n",
+ "The markdown for the above badge is the following:\n",
+ "\n",
+ "```markdown\n",
+ "[](https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb)\n",
+ "```\n",
+ "\n",
+ "The HTML equivalent is:\n",
+ "\n",
+ "```HTML\n",
+ "\n",
+ " \n",
+ "\n",
+ "```\n",
+ "\n",
+ "Remember to replace the notebook URL in this template with the notebook you want to link to."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab_type": "code",
+ "id": "3VQqVi-3ScBC",
+ "colab": {}
+ },
+ "source": [
+ ""
+ ],
+ "execution_count": 0,
+ "outputs": []
+ }
+ ]
+}
\ No newline at end of file
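The WoeBot notebook added in the next file computes each user's change in bond score with a row-by-row loop over a pivot table. As a rough sketch only, the same quantity can be computed with a pandas `groupby`. This assumes a long-format frame with the notebook's column names `userid`, `tenureDay` and `waiBondSubscore`, and at most two responses per user, which is the same assumption the loop makes. The function name `bond_change` is hypothetical, and the example rows are invented.

```python
import pandas as pd


def bond_change(survey: pd.DataFrame) -> pd.Series:
    """Per-user change in bond score: later score minus earlier score.

    Users with only one recorded score get NaN. Assumes at most two
    responses per user; with more, only the first and last are compared.
    """
    ordered = survey.sort_values(["userid", "tenureDay"])
    scores = ordered.groupby("userid")["waiBondSubscore"]
    change = scores.last() - scores.first()
    # Keep a value only for users with at least two recorded scores.
    return change.where(scores.count() >= 2)


# Invented example rows (not study data).
survey = pd.DataFrame(
    {
        "userid": ["a", "a", "b", "c", "c"],
        "tenureDay": [4, 14, 5, 5, 15],
        "waiBondSubscore": [4.25, 4.0, 3.75, 3.5, 4.5],
    }
)
change = bond_change(survey)
print(change)                 # a: -0.25, b: NaN, c: 1.0
print(change.notna().mean())  # share of users with two scores
print(change.mean())          # mean change (NaN skipped): 0.375
```

These two summaries answer different questions. `change.notna().mean()` gives the share of users who logged two scores, while `change.mean()` gives the average change in bond. Sorting by `tenureDay` before taking `first()`/`last()` makes the direction of the difference explicit instead of relying on pivot column order.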
diff --git a/WoeBot.ipynb b/WoeBot.ipynb
new file mode 100644
index 00000000..2f011b9f
--- /dev/null
+++ b/WoeBot.ipynb
@@ -0,0 +1,2756 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "WoeBot.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true,
+ "authorship_tag": "ABX9TyOsDMyLzDwWdqCMDu7//XGI",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mUa00C6lRb83"
+ },
+ "source": [
+ "# Woebot Analysis"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Ax0sOCbIrHm4"
+ },
+ "source": [
+ "## Housekeeping"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "GF-u0Uh4Aw2o"
+ },
+ "source": [
+ "Import the relevant libraries and load in the data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "nSNRXOvj6aPX"
+ },
+ "source": [
+ "# Import required libraries\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import os\n",
+ "from os import path\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns\n",
+ "from scipy.stats import pearsonr\n",
+ "from scipy.stats import shapiro\n",
+ "import scipy.stats as stats"
+ ],
+ "execution_count": 2,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "n6OjJyib7C_9",
+ "outputId": "df81f735-d642-4bb1-d3f9-536fc1bd6884"
+ },
+ "source": [
+ "# 1. Mount Google Drive to access data\n",
+ "from google.colab import drive\n",
+ "drive.mount('/content/drive')"
+ ],
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Mounted at /content/drive\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 35
+ },
+ "id": "5c8-QALt7J3Z",
+ "outputId": "bc53dda6-c4bc-4a47-eb7f-305ecff2cb27"
+ },
+ "source": [
+ "# 2. Check the current working directory\n",
+ "os.getcwd()"
+ ],
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ },
+ "text/plain": [
+ "'/content'"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 3
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "iQdmtbmq6qMF"
+ },
+ "source": [
+ "# 3. Change to the folder where the datasets are stored\n",
+ "os.chdir('/content/drive/My Drive/Colab Notebooks/')\n"
+ ],
+ "execution_count": 4,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "db9LdXTR6aUz"
+ },
+ "source": [
+ "# 4. Read in both csv files\n",
+ "survey_responses = pd.read_csv(\"survey_responses.csv\")\n",
+ "user_activty = pd.read_csv(\"user_activity.csv\")"
+ ],
+ "execution_count": 6,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IJ70QYHyk9-b"
+ },
+ "source": [
+ "# Data Exploration"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "C0ZtaBu-rncB"
+ },
+ "source": [
+ "Before starting the three assigned tasks, some basic data exploration is done to understand the structure of the datasets."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 204
+ },
+ "id": "-v2t0T2YS1z5",
+ "outputId": "98e636e8-0256-47a5-93ad-e5ecd7cc7711"
+ },
+ "source": [
+ "survey_responses.head(5)"
+ ],
+ "execution_count": 176,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "
"
+ ]
+ },
+ "metadata": {
+ "tags": [],
+ "needs_background": "light"
+ }
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9b_rRq84ZwDR"
+ },
+ "source": [
+ "The number of bond scores recorded per tenure day is imbalanced: far more scores were received on days 4 and 5 than on any other day.\n",
+ "\n",
+ "This may reflect how the data was collected, or perhaps there was a push to get participants to log a score on those days."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "FjGC1c3D7gO2",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 297
+ },
+ "outputId": "71f2876d-bba0-4db7-c9bb-3a70c64cfb8e"
+ },
+ "source": [
+ "survey_responses.describe()"
+ ],
+ "execution_count": 111,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " tenureDay waiBondSubscore\n",
+ "count 5311.000000 5311.000000\n",
+ "mean 6.460930 3.836613\n",
+ "std 3.892763 0.768943\n",
+ "min 4.000000 1.000000\n",
+ "25% 4.000000 3.250000\n",
+ "50% 5.000000 4.000000\n",
+ "75% 6.000000 4.500000\n",
+ "max 16.000000 5.000000"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 111
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xOd1U1jdZU9S"
+ },
+ "source": [
+ "The highest bond score is 5 and the lowest is 1.\n",
+ "\n",
+ "Let's check the mean bond score by tenure day:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "01if-0TI7gUT",
+ "outputId": "41b81f65-18ec-45fa-bc4f-83eb7e94aa45"
+ },
+ "source": [
+ "survey_responses.groupby(\"tenureDay\")[\"waiBondSubscore\"].mean()"
+ ],
+ "execution_count": 246,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tenureDay\n",
+ "4 3.812401\n",
+ "5 3.791287\n",
+ "6 3.755371\n",
+ "14 4.037109\n",
+ "15 4.046875\n",
+ "16 3.925725\n",
+ "Name: waiBondSubscore, dtype: float64"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 246
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uhYziXFdZlMu"
+ },
+ "source": [
+ "An initial look at the mean score by tenure day suggests that scores are higher later in the tenure period (days 14 to 16, means of about 3.93 to 4.05) than on the earlier days (days 4 to 6, means of about 3.76 to 3.81). Note, however, that the sample sizes for each day differ considerably, as seen earlier."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qHs0tsaSb-e3"
+ },
+ "source": [
+ "# Task 1 - Does the reported measure of bond change over time?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9R3oqKy3s2LJ"
+ },
+ "source": [
+ "To measure whether bond changes over time, I will do the following:\n",
+ "\n",
+ "\n",
+ "1. Pivot the survey_responses dataset to see which participants submitted two bond scores\n",
+ "2. Add a column at the end of this pivot table to calculate the change in bond score, i.e. the second score minus the first (where there are two)\n",
+ "3. Calculate the mean of the 'Change' metric to understand how bond score changes over time\n",
+ "4. Examine the distribution of scores for users who submitted one bond score and those who submitted two\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "q10f77SilPtn"
+ },
+ "source": [
+ "The survey data is in long format (one row per response), so it can be pivoted to wide format, with one row per user and one column per tenure day."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "l52ib8amAQGl"
+ },
+ "source": [
+ "# Pivot to wide format: one row per user, one column per tenure day (pivot_table averages any duplicate user/day entries)\n",
+ "pivot_survey = survey_responses.pivot_table(values='waiBondSubscore', index='userid', columns='tenureDay')"
+ ],
+ "execution_count": 178,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 235
+ },
+ "id": "ehPjtcZJBpks",
+ "outputId": "341e0731-777d-44e9-c7f0-c861671c9e5a"
+ },
+ "source": [
+ "pivot_survey.head(5)"
+ ],
+ "execution_count": 179,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tenureDay 4 5 6 14 15 16\n",
+ "userid \n",
+ "+/wAfc2I0c831C21wvh2Kcr4DZk= 4.25 NaN NaN 4.0 NaN NaN\n",
+ "+0NdvBGRsXuoa20PHou4K3FMlBA= NaN 3.75 NaN NaN NaN NaN\n",
+ "+0eFEPPuFJm9U5lXwlAKw/I+Clo= NaN NaN 3.75 NaN NaN NaN\n",
+ "+0pfU/ormz318pPBTZ6cWtrgHkI= NaN 5.00 NaN NaN 5.0 NaN\n",
+ "+11s2fkg+oFKje/WvOYnzxbYgtY= 4.50 NaN NaN NaN NaN NaN"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 179
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9HEW3PQVma1V"
+ },
+ "source": [
+ "Create a new column 'Change' to track the change in bond score between a user's two tenure days."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "3AU28DfALIZi"
+ },
+ "source": [
+ "df = pivot_survey.reset_index(drop=True)\n",
+ "for index in df.index:\n",
+ "    nonNaValuesInRow = df.iloc[index].dropna()  # this user's recorded scores, in tenure-day order\n",
+ "    accumulated = np.nan\n",
+ "    for value in nonNaValuesInRow:  # assumes at most two values (two timepoints)\n",
+ "        if len(nonNaValuesInRow) > 1:\n",
+ "            if np.isnan(accumulated):  # first score\n",
+ "                accumulated = value\n",
+ "            else:\n",
+ "                accumulated -= value  # first score minus second score\n",
+ "                df.at[index, 'Change'] = accumulated\n",
+ "        else:\n",
+ "            df.at[index, 'Change'] = np.nan  # only one bond score, so 'Change' is NaN"
+ ],
+ "execution_count": 181,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "dtRUs_bPWy_X"
+ },
+ "source": [
+ "df[\"Change\"] = -df[\"Change\"]  # flip the sign so 'Change' is second score minus first (negative = bond decreased, positive = bond increased)\n",
+ "pivot_survey = pivot_survey.reset_index()  # move 'userid' from the index back into a column\n",
+ "newcolumn = pivot_survey[\"userid\"]\n",
+ "newdf = df.assign(UserId = newcolumn)\n",
+ "newdf = newdf.set_index('UserId')"
+ ],
+ "execution_count": 10,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "nPCXbRZWkl0D",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 235
+ },
+ "outputId": "4d568634-ea13-4581-ceaf-df710a246e66"
+ },
+ "source": [
+ "newdf.head(5)"
+ ],
+ "execution_count": 257,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tenureDay 4 5 6 14 15 16 Change\n",
+ "UserId \n",
+ "+/wAfc2I0c831C21wvh2Kcr4DZk= 4.25 NaN NaN 4.0 NaN NaN -0.25\n",
+ "+0NdvBGRsXuoa20PHou4K3FMlBA= NaN 3.75 NaN NaN NaN NaN NaN\n",
+ "+0eFEPPuFJm9U5lXwlAKw/I+Clo= NaN NaN 3.75 NaN NaN NaN NaN\n",
+ "+0pfU/ormz318pPBTZ6cWtrgHkI= NaN 5.00 NaN NaN 5.0 NaN -0.00\n",
+ "+11s2fkg+oFKje/WvOYnzxbYgtY= 4.50 NaN NaN NaN NaN NaN NaN"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 257
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kLp-tekCxJ4d"
+ },
+ "source": [
+ "The distribution of bond score 'change' across participants who logged two scores:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 295
+ },
+ "id": "mhLK_U7ftw7E",
+ "outputId": "f6b7856f-4a88-43a4-c0c7-da6937dbc032"
+ },
+ "source": [
+ "sns.histplot(df[\"Change\"], color='red', alpha =0.5).set(title=\"Change metric distribution\")\n",
+ "plt.show()"
+ ],
+ "execution_count": 252,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAazklEQVR4nO3de5QdZZ3u8e9DuCnoIKZPG4EYQPAMgwpMgxfILBREYFTEpVHiKHiLjDBHD44OiApnvM4oKMcLrCAMogRBgZHx4AgqSHQADTFCuKiAZJIYmgQQERBJeM4fVV1U2r3TO929d3Wnn89ae3XV+1bV/lUnXc+u65ZtIiIiADZruoCIiJg4EgoREVFJKERERCWhEBERlYRCRERUEgoREVFJKMSoSTpV0tebrmMikvQHSbuMw3Is6bnl8FmSPjL26kDSzLLGaeX4NZLeOR7LLpf3XUlHj9fyoncSCrFBkuZKWlRuQFaVf+wHNF1XUzrdeNre1vZd4/neto+1/bGRppN0t6SDR1jWf5c1rhtrXa0+HNg+zPZXx7rs6L2EQrQl6QTg88AngX5gJvBl4Igm65rIJG3edA0jmQw1RoNs55XXn72AvwD+ALxhA9OcClwMnA88BNwCDNT6TwTuLPtuBY6s9R0D/Bj4LPAA8BvgsFr/zsC15bzfB74EfL3W/2Lgv4DfAb8ADtxAnXcDHwBuAh4GzqEIue/Wlv+MkZYNfAJYB/yx/N18sWw3cBzwa+A3tbbnlsNPAU4DlgEPluv9lDa1fgBYBfwWePuw5ZwHfLwcng58p6zxfmAhxYe8rwFPAI+WNX4QmFUu5x3Af5e/16G2zcvlXQN8Cvgp8Hvg28D2Zd+BwIoWv9ODgUOBPwGPl+/3i9ry3lkObwZ8uFz/eyn+v/xF2TdUx9FlbWuAk5v+/z+VX40XkNfEfJV/7GuHNhptpjm13EAeDkwrNyrX1/rfADy73Ci8sdwgzyj7jik3JO8q5/37ckOosv86isDYEjig3FB9vezbAbivfN/NgFeU431t6rwbuJ4iCHYoN0yLgb2BrYEfAqd0suz6xq62fANXAdtTbuxZf2P+pXK+Hcp1fSmwVZvf+SCwJ7ANsID2ofAp4Cxgi/I1u/a7uxs4uLbcoQ3v+eVyn0LrUFhZe+9Lar/vA2kTCrX/B18f1l/9nijC7Q5gF2Bb4FLga8NqO7us64XAY8BfNv03MFVfOXwU7TwTWGN77QjT/dj2FS6OTX+N4o8aANvftP1b20/Yvojik/R+tXmX2T67nPerwAygX9JMYF/go7b/ZPvHwOW1+f4OuKJ83ydsXwUsotiQt/MF24O2V1J8qr7B9s9t/xG4jCIgRrtsgE/Zvt/2o/VGSZtRbBTfa3ul7XW2/8v2Yy2WMQf4N9tLbT9MsbFt53GK39dzbD9ue6HtkR5kdqrth4fXWPO12nt/BJgzdCJ6jN4MnG77Ltt/AE4C3jTsMNb/sf2o7V9Q7J29sNWCovsSCtHOfcD0Do4/31MbfgTYemgeSW+VtETS7yT9juJT6PRW89p+pBzclmLv4v5aG8Dy2vBzgDcMLbdc9gEUG8l2BmvDj7YY33YMyx5eX910ir2RO0eYH4r1ri9n2Qam/QzFp+8rJd0l6cQOlt+uxlb9yyj2QKa3mXZjPJv112UZsDnFntuQ4f+PtiUakVCIdq6j2I1/7WhmlvQcikMCxwPPtL0dsBRQB7OvAraX9NRa20614eUUn2q3q722sf3p0dQ6zEjLbvdpvF37GopDbLt28N6rWH89Z7ab0PZDtt9vexfgNcAJkg4aZY1Dhr/34xT1PwxU/xbl3kPfRiz3txRhW1/2WtYP5pggEgrRku0HgY8CX5L0WklPlbSFpMMk/WsHi9iGYmOxGkDS2yj2FDp572UUh2xOlbSlpJcAr65N8nXg1ZJeKWmapK0lHShpx41YxXZGWvYgxbHxjth+AjgX
OF3Ss8tlvkTSVi0mvxg4RtIeZSCe0m65kl4l6bmSRHHyeh3FCeaNrrHm72rv/c/At8pDe7+i2AP8W0lbUJw0rtc/CMwqD5W1ciHwvyXtLGlbiqvZLurg0GQ0IKEQbdk+DTiBYiOwmuJT9PHAv3cw760UV9xcR7HReD7wk414+zcDL6E4jPVx4CKKPRdsL6e4LPZDtbo+wDj8f+5g2WcAr5f0gKT/2+Fi/xG4GfgZxZVC/9KqVtvfpbgE+IcUh4Z+uIFl7kZx1dQfKH7HX7Z9ddn3KeDD5eGvf+ywRijOCZ1HcShna+B/lXU9CLwH+ArFyeiHgRW1+b5Z/rxP0uIWyz23XPa1FFeZ/RH4h42oK3po6GqFiAlN0kXA7bbbfnqOiLHLnkJMSJL2lbSrpM0kHUrx6X3EPZSIGJvc2RgT1bMormd/JsWhir+3/fNmS4rY9HXt8JGknShulumnOOE43/YZkranOD48i+IGmDm2HyhPmJ1BcT34I8Axtlsdn4yIiC7p5uGjtcD7be9B8diA4yTtQfHogx/Y3g34QTkOcBjFybPdgHnAmV2sLSIiWuja4SPbqyiuu8b2Q5Juo7jN/wiK2+ahuIv1GuCfyvbzy7syr5e0naQZ5XJamj59umfNmtWtVYiI2CTdeOONa2z3terryTkFSbMoHiNwA9Bf29Dfw5N3Ne7A+ndUrijb1gsFSfMo9iSYOXMmixYt6lrdERGbIklt75bv+tVH5c0qlwDvs/37el+5V7BRJzVsz7c9YHugr69l0EVExCh1NRTKux8vAS6wfWnZPChpRtk/g+KJlVDcFFO/zX7Hsi0iInqka6FQXk10DnCb7dNrXZdTPDud8ue3a+1vVeHFwIMbOp8QERHjr5vnFPYH3gLcLGlJ2fYh4NPAxZLeQfG0xDll3xUUl6PeQXFJ6tu6WFtERLTQzauPfkz7J2IeNLyhPL9wXLfqiYiIkeUxFxERUUkoREREJaEQERGVhEJERFTylNSICeqQ2bNZM9j+Gyun9/dz5cKFPawopoKEQsQEtWZwkMVz57bt32fBgh5WE1NFDh9FREQloRAREZWEQkREVBIKERFRSShEREQloRAREZWEQkREVBIKERFRSShEREQloRAREZWEQkREVBIKERFR6VooSDpX0r2SltbaLpK0pHzdPfTdzZJmSXq01ndWt+qKiIj2uvmU1POALwLnDzXYfuPQsKTTgAdr099pe68u1hMRESPoWijYvlbSrFZ9kgTMAV7erfePiIiN19Q5hdnAoO1f19p2lvRzST+SNLvdjJLmSVokadHq1au7X2lExBTSVCgcBVxYG18FzLS9N3ACsEDS01vNaHu+7QHbA319fT0oNSJi6uh5KEjaHHgdcNFQm+3HbN9XDt8I3Ans3uvaIiKmuib2FA4Gbre9YqhBUp+kaeXwLsBuwF0N1BYRMaV185LUC4HrgOdJWiHpHWXXm1j/0BHA3wA3lZeofgs41vb93aotIiJa6+bVR0e1aT+mRdslwCXdqiUiIjqTO5ojIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEo3v6P5XEn3SlpaaztV0kpJS8rX4bW+kyTdIemXkl7ZrboiIqK9bu4pnAcc2qL9c7b3Kl9XAEjaA3gT8FflPF+WNK2LtUVERAtdCwXb1wL3dzj5EcA3bD9m+zfAHcB+3aotIiJaa+KcwvGSbioPLz2jbNsBWF6bZkXZ9mckzZO0SNKi1atXd7vWiIgppdehcCawK7AXsAo4bWMXYHu+7QHbA319feNdX0TElLZ5L9/M9uDQsKSzge+UoyuBnWqT7li2RUxah8yezZrBwbb90/v7uXLhwh5WFDGynoaCpBm2V5WjRwJDVyZdDiyQdDrwbGA34Ke9rC1ivK0ZHGTx3Llt+/dZ
sKCH1UR0pmuhIOlC4EBguqQVwCnAgZL2AgzcDbwbwPYtki4GbgXWAsfZXtet2iIiorWuhYLto1o0n7OB6T8BfKJb9URExMhyR3NERFQSChERUUkoREREJaEQERGVhEJERFQSChERUUkoREREJaEQERGVhEJERFQSChERUUkoREREJaEQERGVhEJERFQSChERUUkoREREJaEQERGVhEJERFQSChERUelaKEg6V9K9kpbW2j4j6XZJN0m6TNJ2ZfssSY9KWlK+zupWXRER0V439xTOAw4d1nYVsKftFwC/Ak6q9d1pe6/ydWwX64qIiDa6Fgq2rwXuH9Z2pe215ej1wI7dev+IiNh4TZ5TeDvw3dr4zpJ+LulHkma3m0nSPEmLJC1avXp196uMiJhCGgkFSScDa4ELyqZVwEzbewMnAAskPb3VvLbn2x6wPdDX19ebgiMipoieh4KkY4BXAW+2bQDbj9m+rxy+EbgT2L3XtUVETHU9DQVJhwIfBF5j+5Fae5+kaeXwLsBuwF29rC0iImDzbi1Y0oXAgcB0SSuAUyiuNtoKuEoSwPXllUZ/A/yzpMeBJ4Bjbd/fcsEREdE1XQsF20e1aD6nzbSXAJd0q5aIqeiQ2bNZMzjYtn96fz9XLlzYw4piMuhaKEREs9YMDrJ47ty2/fssWNDDamKyyGMuIiKiklCIiIhKQiEiIioJhYiIqCQUIiKiklCIiIhKQiEiIioJhYiIqHQUCpL276QtIiImt073FL7QYVtERExiG3zMhaSXAC8F+iSdUOt6OjCtm4VFRETvjfTsoy2BbcvpnlZr/z3w+m4VFRERzdhgKNj+EfAjSefZXtajmiIioiGdPiV1K0nzgVn1eWy/vBtFRUREMzoNhW8CZwFfAdZ1r5yIiGhSp6Gw1vaZXa0kIiIa1+klqf8h6T2SZkjafujV1coiIqLnOt1TOLr8+YFam4FdNjSTpHOBVwH32t6zbNseuIji/MTdwBzbD6j40uYzgMOBR4BjbC/usL6Inhvp6y5XLl/ew2oixkdHoWB751Eu/zzgi8D5tbYTgR/Y/rSkE8vxfwIOA3YrXy8Czix/RkxII33d5fRPfrKH1USMj45CQdJbW7XbPr9Ve63/WkmzhjUfARxYDn8VuIYiFI4Azrdt4HpJ20maYXtVJzVGRMTYdXr4aN/a8NbAQcBi1t8D6FR/bUN/D9BfDu8A1Pe3V5Rt64WCpHnAPICZM2eO4u0jIqKdTg8f/UN9XNJ2wDfG+ua2LckbOc98YD7AwMDARs0bEREbNtpHZz8MjPY8w6CkGQDlz3vL9pXATrXpdizbIiKiRzo9p/AfFFcbQfEgvL8ELh7le15OcTXTp8uf3661Hy/pGxQnmB/M+YSIiN7q9JzCZ2vDa4FltleMNJOkCylOKk+XtAI4hSIMLpb0DmAZMKec/AqKy1HvoLgk9W0d1hYREeOk03MKP5LUz5MnnH/d4XxHtek6qMW0Bo7rZLkREdEdnX7z2hzgp8AbKD7Z3yApj86OiNjEdHr46GRgX9v3AkjqA74PfKtbhUVERO91evXRZkOBULpvI+aNiIhJotM9hf+U9D3gwnL8jRQnhiMiYhMy0nc0P5fiDuQPSHodcEDZdR1wQbeLi4iI3hppT+HzwEkAti8FLgWQ9Pyy79VdrS4iInpqpPMC/bZvHt5Yts3qSkUREdGYkUJhuw30PWU8C4mIiOaNFAqLJL1reKOkdwI3dqekiIhoykjnFN4HXCbpzTwZAgPAlsCR3SwsIiJ6b4OhYHsQeKmklwF7ls3/z/YPu15ZRET0XKfPProauLrLtURERMNyV3JERFQSChERUUkoREREJaEQERGVhEJERFQSChERUen00dnjRtLzgItqTbsAH6V4pMa7gNVl+4ds5/HcERE91PNQsP1LYC8ASdOAlcBlwNuAz9n+bK9rioiIQtOHjw4C7rS9rOE6IiKC5kPhTTz5bW4Ax0u6SdK5kp7RagZJ8yQt
krRo9erVrSaJiIhRaiwUJG0JvAb4Ztl0JrArxaGlVcBpreazPd/2gO2Bvr6+ntQaETFVNLmncBiwuHzoHrYHba+z/QRwNrBfg7VFRExJPT/RXHMUtUNHkmbYXlWOHgksbaSqiCli+fLl7LP77m37p/f3c+XChT2sKCaCRkJB0jbAK4B315r/VdJegIG7h/VFxDjzunUsnju3bf8+Cxb0sJqYKBoJBdsPA88c1vaWJmqJiIgnNX31UURETCBNnlOIaNQhs2ezZnCwbX+OqcdUlFCIKWvN4GCOqUcMk8NHERFRSShEREQloRAREZWEQkREVBIKERFRSShEREQloRAREZWEQkREVHLzWkSMSu4I3zQlFCJiVHJH+KYph48iIqKSUIiIiEpCISIiKgmFiIioJBQiIqLS2NVHku4GHgLWAWttD0jaHrgImEXxPc1zbD/QVI0REVNN03sKL7O9l+2BcvxE4Ae2dwN+UI5HRESPNB0Kwx0BfLUc/irw2gZriYiYcpoMBQNXSrpR0ryyrd/2qnL4HqC/mdIiIqamJu9oPsD2Skn/A7hK0u31TtuW5OEzlQEyD2DmzJm9qTQiYopobE/B9sry573AZcB+wKCkGQDlz3tbzDff9oDtgb6+vl6WHBGxyWskFCRtI+lpQ8PAIcBS4HLg6HKyo4FvN1FfRMRU1dTho37gMklDNSyw/Z+SfgZcLOkdwDJgTkP1xSSQp3RGjL9GQsH2XcALW7TfBxzU+4piMspTOiPG30S7JDUiIhqUUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKgmFiIioJBQiIqKSUIiIiEpCISIiKj0PBUk7Sbpa0q2SbpH03rL9VEkrJS0pX4f3uraIiKlu8wbecy3wftuLJT0NuFHSVWXf52x/toGaIiKCBkLB9ipgVTn8kKTbgB16XUdERPy5JvYUKpJmAXsDNwD7A8dLeiuwiGJv4oEW88wD5gHMnDmzZ7VG7x0yezZrBgfb9q9cvryH1cR4G+nfd3p/P1cuXNjDigIaDAVJ2wKXAO+z/XtJZwIfA1z+PA14+/D5bM8H5gMMDAy4dxVHr60ZHGTx3Llt+6d/8pM9rCbG20j/vvssWNDDamJII1cfSdqCIhAusH0pgO1B2+tsPwGcDezXRG0REVNZE1cfCTgHuM326bX2GbXJjgSW9rq2iIipronDR/sDbwFulrSkbPsQcJSkvSgOH90NvLuB2iIiprQmrj76MaAWXVf0upborpxIjJh8Gr36KDZtOZEYMfnkMRcREVFJKERERCWhEBERlYRCRERUEgoREVFJKERERCWXpEbEhLR8+XL22X33tv25z6U7EgoRMSF53brc59KAHD6KiIhKQiEiIioJhYiIqOScwhSWB9ZFtDdV/z4SClNYHlgXm7KxbtSn6t9HQiHayiWBMZlN1Y36WCUUoq1cEhgx9eREc0REVBIKERFRmXCHjyQdCpwBTAO+YvvTDZc0YU3VqyMiJoJN9ZzbhAoFSdOALwGvAFYAP5N0ue1bm61sYsqJtIjmbKrn3CZUKAD7AXfYvgtA0jeAI4CuhEK3P2mPtPx7Bgd5Vn9/194/Iiavpo4EyPa4L3S0JL0eONT2O8vxtwAvsn18bZp5wLxy9HnAL3tc5nRgTY/fs9uyTpND1mnimyzr8xzbfa06JtqewohszwfmN/X+khbZHmjq/bsh6zQ5ZJ0mvk1hfSba1UcrgZ1q4zuWbRER0QMTLRR+BuwmaWdJWwJvAi5vuKaIiCljQh0+sr1W0vHA9yguST3X9i0NlzVcY4euuijrNDlknSa+Sb8+E+pEc0RENGuiHT6KiIgGJRQiIqKSUBgFSR+TdJOkJZKulPTspmsaC0mfkXR7uU6XSdqu6ZrGStIbJN0i6QlJk/oSQUmHSvqlpDskndh0PWMl6VxJ90pa2nQt
40XSTpKulnRr+f/uvU3XNFoJhdH5jO0X2N4L+A7w0aYLGqOrgD1tvwD4FXBSw/WMh6XA64Brmy5kLGqPfjkM2AM4StIezVY1ZucBhzZdxDhbC7zf9h7Ai4HjJuu/U0JhFGz/vja6DTCpz9bbvtL22nL0eor7QyY127fZ7vXd7t1QPfrF9p+AoUe/TFq2rwXub7qO8WR7le3F5fBDwG3ADs1WNToT6pLUyUTSJ4C3Ag8CL2u4nPH0duCipouIyg7A8tr4CuBFDdUSHZA0C9gbuKHZSkYnodCGpO8Dz2rRdbLtb9s+GThZ0knA8cApPS1wI420PuU0J1PsBl/Qy9pGq5N1iuglSdsClwDvG3ZEYdJIKLRh++AOJ70AuIIJHgojrY+kY4BXAQd5kty8shH/RpNZHv0ySUjagiIQLrB9adP1jFbOKYyCpN1qo0cAtzdVy3gov9jog8BrbD/SdD2xnjz6ZRKQJOAc4Dbbpzddz1jkjuZRkHQJxWO7nwCWAcfanrSf3iTdAWwF3Fc2XW/72AZLGjNJRwJfAPqA3wFLbL+y2apGR9LhwOd58tEvn2i4pDGRdCFwIMVjpgeBU2yf02hRYyTpAGAhcDPFdgHgQ7avaK6q0UkoREREJYePIiKiklCIiIhKQiEiIioJhYiIqCQUIiKikpvXIoaR9CyKS0D3pbicdRD4d4r7OF7VZG0R3ZY9hYia8iaky4BrbO9q+68pnhrb32xlEb2RUIhY38uAx22fNdRg+xcUNyZtK+lb5XdPXFAGCJI+KulnkpZKml9rv0bSv0j6qaRfSZpdtj9V0sXls/cvk3TD0Hc+SDpE0nWSFkv6ZvksnYieSShErG9P4MY2fXsD76P4XoNdgP3L9i/a3tf2nsBTKJ4hNWRz2/uV8w09H+s9wAPls/c/Avw1gKTpwIeBg23vAywCThivFYvoREIhonM/tb3C9hPAEmBW2f6y8tP+zcDLgb+qzTP0YLQba9MfQPG9CNheCtxUtr+YInB+ImkJcDTwnO6sSkRrOdEcsb5bgNe36XusNrwO2FzS1sCXgQHbyyWdCmzdYp51jPz3JuAq20dtdNUR4yR7ChHr+yGwlaR5Qw2SXgDMbjP9UACsKY//twuUup8Ac8pl7wE8v2y/Hthf0nPLvm0k7b7xqxAxegmFiJryuySOBA6WdKekW4BPAfe0mf53wNkU3wn9PYpHXY/ky0CfpFuBj1PsnTxoezVwDHChpJuA64D/ObY1itg4eUpqRI9JmgZsYfuPknYFvg88r/wO5ohG5ZxCRO89Fbi6/KYuAe9JIMREkT2FiIio5JxCRERUEgoREVFJKERERCWhEBERlYRCRERU/j9ZHcpv38bZbQAAAABJRU5ErkJggg==\n",
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {
+ "tags": [],
+ "needs_background": "light"
+ }
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "aeEGEs1_Wy4s",
+ "outputId": "eb661a2c-e190-4cbe-b656-246a4ffd474c"
+ },
+ "source": [
+ "# Share of users with a 'Change' value, i.e. users who submitted two bond scores\n",
+ "df[\"Change\"].notna().mean()"
+ ],
+ "execution_count": 259,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0.21338816541009825"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 259
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "D7A_IYtyugHR",
+ "outputId": "fb287ea4-edd2-4616-a7e5-83abf524842e"
+ },
+ "source": [
+ "# Number of users with one bond score and no change tracked\n",
+ "len(newdf[newdf['Change'].isna()])"
+ ],
+ "execution_count": 192,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "3443"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 192
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "f1TD_lC5ugN6",
+ "outputId": "cef23484-62ac-4bf3-b763-37f89870b01c"
+ },
+ "source": [
+ "# Number of users with two bond scores and where change could be tracked\n",
+ "len(newdf[newdf['Change'].notna()])"
+ ],
+ "execution_count": 193,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "934"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 193
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "e4YzH6nwW1jy"
+ },
+ "source": [
+ "# Results\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "D80qjpZO-KTE"
+ },
+ "source": [
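+ "On average, where a user submitted two bond scores, the bond score increases by **0.11** between the first and second score (the mean of `Change` for these users). The figure of **0.21** returned by `df['Change'].notna().mean()` is not a score change: it is the share of users (about 21%) for whom a change could be tracked.\n",
+ "\n",
+ "It must be noted that in **3443** cases the participant recorded an initial bond score but no subsequent one. Only **934** of the **4377** users recorded two bond scores."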
+ ]
+ },
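+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The share of users with a tracked change and the average change in bond score are different quantities: `notna().mean()` on `Change` gives the former, while the mean of the non-missing `Change` values gives the latter. A minimal sketch on a small hypothetical frame (the values are invented for illustration, not taken from the study data):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical toy data: one row per user, NaN where no second bond score was logged\n",
+ "toy = pd.DataFrame({'Change': [0.5, np.nan, np.nan, -0.25, np.nan]})\n",
+ "\n",
+ "share_with_two_scores = toy['Change'].notna().mean()  # fraction of users, not a score change\n",
+ "mean_change = toy['Change'].mean()  # pandas skips NaN by default\n",
+ "\n",
+ "print(f'Share of users with two scores: {share_with_two_scores:.2f}')\n",
+ "print(f'Mean change in bond score: {mean_change:.3f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },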
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ES6GqMvoTEFG"
+ },
+ "source": [
+ "It is worth checking whether there are differences in mean scores between those who submitted only one bond score and those who submitted two.\n",
+ "\n",
+ "Let's compare the mean bond scores of these two groups, by tenure day:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "bI4UJbtNeYLf"
+ },
+ "source": [
+ "OneScore = newdf[newdf['Change'].isna()]"
+ ],
+ "execution_count": 205,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "dtw_79eaWUbg"
+ },
+ "source": [
+ "TwoScores = newdf[newdf['Change'].notna()]"
+ ],
+ "execution_count": 196,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "-hYq2yGgYGG2",
+ "outputId": "a00ec1c7-84c1-438b-a71a-875fe45d1e31"
+ },
+ "source": [
+ "print(\"Mean bond scores for users who submitted one score:\")\n",
+ "print(OneScore[[4, 5, 6, 14, 15, 16, 'Change']].mean())\n",
+ "print(\"Mean bond scores for users who submitted two scores:\")\n",
+ "print(TwoScores[[4, 5, 6, 14, 15, 16, 'Change']].mean())"
+ ],
+ "execution_count": 198,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Mean bond scores for users who submitted one score:\n",
+ "tenureDay\n",
+ "4 3.780131\n",
+ "5 3.756394\n",
+ "6 3.742317\n",
+ "14 NaN\n",
+ "15 4.312500\n",
+ "16 NaN\n",
+ "Change NaN\n",
+ "dtype: float64\n",
+ "Mean bond scores for users who submitted two scores:\n",
+ "tenureDay\n",
+ "4 3.920115\n",
+ "5 3.924390\n",
+ "6 3.817416\n",
+ "14 4.037109\n",
+ "15 4.044296\n",
+ "16 3.925725\n",
+ "Change 0.111617\n",
+ "dtype: float64\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "eyLQWa2kIIxE"
+ },
+ "source": [
+ "Participants who submitted scores at both timepoints had, on average, slightly higher early bond scores at timepoint one than those who submitted only one score (e.g. 3.92 vs 3.78 on day 4).\n",
+ "\n",
+ "There are 4 outlier participants who submitted their only bond score on day 15, without an early score on days 4, 5 or 6.\n",
+ "\n",
+ "Although the two groups differ greatly in size (3443 vs 934 users), a willingness to submit a second score may indicate a higher level of engagement."
+ ]
+ },
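+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Per-day means like those above are easier to interpret when reported together with the number of users behind each one, since the two groups differ greatly in size. A minimal sketch of a grouped summary, assuming a hypothetical wide frame with one row per user, bond scores in tenure-day columns and NaN where no score was submitted (toy values, not the study data):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical wide frame: rows are users, columns are tenure days, NaN = no score\n",
+ "toy = pd.DataFrame({\n",
+ "    4: [3.5, 4.0, np.nan, 3.0],\n",
+ "    5: [np.nan, np.nan, 4.5, np.nan],\n",
+ "    15: [np.nan, 4.25, 4.75, np.nan],\n",
+ "})\n",
+ "\n",
+ "# Early score: mean over the early days (each toy user has at most one)\n",
+ "toy['early_bond'] = toy[[4, 5]].mean(axis=1)\n",
+ "toy['group'] = np.where(toy[15].notna(), 'two scores', 'one score')\n",
+ "\n",
+ "# Number of users and mean early score per group\n",
+ "summary = toy.groupby('group')['early_bond'].agg(['count', 'mean'])\n",
+ "print(summary)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },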
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "U2lAx_XKb_u-"
+ },
+ "source": [
+ "# Task 2 - Is Bond dependent on engagement / activity?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "A7frHyEMvqHP"
+ },
+ "source": [
+ "To attempt to answer this question I will implement the following steps:\n",
+ "\n",
+ "\n",
+ "1. Pivot the user_activity data so that tenure days become columns. Fill the NaN cells with 0s for ease of analysis.\n",
+ "2. Add a column to the pivoted table with the mean number of messages per day for each user over the 28-day period.\n",
+ "3. Develop a visual to understand the trend in average user engagement over the 28 days.\n",
+ "4. Test the correlation between early bond score (days 4-6) and messages sent on days 0-6, to see whether there is a relationship between user message volumes and bond scores.\n",
+ "5. Test the correlation between change in bond score and messages sent on days 0-28, to see whether message volumes relate to the changes that occurred in bond scores.\n",
+ "6. Apply a statistical test to gauge whether engagement levels (messages sent) differ between users who recorded low bond scores and those who recorded high bond scores.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 204
+ },
+ "id": "YiCK17wnfLKA",
+ "outputId": "0bf0dafb-1df4-4dfd-d2e5-e146c89a07cc"
+ },
+ "source": [
+ "pivot_activity.head(5)"
+ ],
+ "execution_count": 221,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "
"
+ ],
+ "text/plain": [
+ "tenureDay 0 1 2 3 ... 25 26 27 28\n",
+ "userid ... \n",
+ "+/wAfc2I0c831C21wvh2Kcr4DZk= 33.0 0.0 0.0 30.0 ... 25.0 0.0 25.0 33.0\n",
+ "+0NdvBGRsXuoa20PHou4K3FMlBA= 54.0 30.0 22.0 7.0 ... 0.0 0.0 0.0 0.0\n",
+ "+0eFEPPuFJm9U5lXwlAKw/I+Clo= 51.0 150.0 126.0 0.0 ... 0.0 0.0 0.0 0.0\n",
+ "+0pfU/ormz318pPBTZ6cWtrgHkI= 103.0 74.0 11.0 37.0 ... 0.0 9.0 0.0 56.0\n",
+ "+11s2fkg+oFKje/WvOYnzxbYgtY= 65.0 32.0 11.0 0.0 ... 0.0 0.0 0.0 0.0\n",
+ "\n",
+ "[5 rows x 29 columns]"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 117
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SxtVDVE_rOhf"
+ },
+ "source": [
+ "Next, we can append a row containing the mean number of messages per user on each tenure day (the mean of each column):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "3JNkfz7ZMKHm"
+ },
+ "source": [
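+ "# Append a final row holding the mean number of messages per user on each tenure day\n",
+ "# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)\n",
+ "pivot_averages = pd.concat([pivot_activity, pivot_activity.mean(numeric_only=True).to_frame().T], ignore_index=True)"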
+ ],
+ "execution_count": 118,
+ "outputs": []
+ },
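+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A row of column means and a column of row means hold different averages: the first gives the mean messages per user on each tenure day, the second the mean messages per day for each user. A minimal sketch on a hypothetical three-user pivot (toy counts, not the study data), using `pd.concat` to add the row since `DataFrame.append` is no longer available in pandas 2.0:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "# Hypothetical pivot: rows are users, columns are tenure days, values are message counts\n",
+ "toy_pivot = pd.DataFrame(\n",
+ "    {0: [80, 60, 100], 1: [40, 30, 60], 2: [20, 10, 30]},\n",
+ "    index=['user_a', 'user_b', 'user_c'],\n",
+ ")\n",
+ "day_columns = [0, 1, 2]\n",
+ "\n",
+ "# Mean over users for each tenure day (one value per column)\n",
+ "per_day_mean = toy_pivot[day_columns].mean(axis=0)\n",
+ "\n",
+ "# Mean over tenure days for each user (one value per row)\n",
+ "per_user_mean = toy_pivot[day_columns].mean(axis=1)\n",
+ "\n",
+ "# Append the per-day means as a final row\n",
+ "with_mean_row = pd.concat([toy_pivot[day_columns], per_day_mean.to_frame().T], ignore_index=True)\n",
+ "\n",
+ "print(per_day_mean)\n",
+ "print(per_user_mean)\n",
+ "print(with_mean_row)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },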
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "foBPqhJ8raQ9"
+ },
+ "source": [
+ "We can see in the visual below that the average number of messages sent per user drops sharply after the first day, from 80.5 messages per user on tenure day 0 to 43.5 messages per user on tenure day 1. The decline continues throughout the period but levels off."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "HSy-bKGrMRqE"
+ },
+ "source": [
+ "plotdfavg = pivot_averages.iloc[-1]"
+ ],
+ "execution_count": 119,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "STBQ4PONRG5r"
+ },
+ "source": [
+ "# Turn the row of per-day averages into a frame for plotting\n",
+ "plotdfavg = plotdfavg.to_frame(name='Average_Messages')\n",
+ "plotdfavg['Tenure_Days'] = plotdfavg.index\n",
+ "plotdfavg.head()"
+ ],
+ "execution_count": 123,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 385
+ },
+ "id": "M7sz40--Muj7",
+ "outputId": "9a54785c-413f-42ac-9308-1b0291e5a972"
+ },
+ "source": [
+ "sns.displot(plotdfavg, x = plotdfavg[\"Tenure_Days\"], y=plotdfavg[\"Average_Messages\"], bins = 29, color='blue', alpha = .5, legend=False).set(title='Average Messages Received by Tenure day')\n",
+ "plt.show()"
+ ],
+ "execution_count": 125,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {
+ "tags": [],
+ "needs_background": "light"
+ }
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jLrBKO5PT3zm"
+ },
+ "source": [
+ "Most message activity happens in the earliest tenure days.\n",
+ "\n",
+ "This aligns with the distribution of bond-score counts seen earlier, where a much higher volume of bond scores was submitted in the earlier tenure days.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dhiLZVdHrsZu"
+ },
+ "source": [
+ "## Test 1. Engagement and bond score in the first week (tenure days 0-6)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "TNlouw_Trzfa"
+ },
+ "source": [
+ "The aim in this section is to examine whether there is a correlation between the bond score submitted at timepoint one (days 4-6) and the number of messages the user sent on tenure days 0-6."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 235
+ },
+ "id": "UmV62muduL_F",
+ "outputId": "9717ad7f-c8b7-4443-eba3-56e84d50e2dd"
+ },
+ "source": [
+ "early_engagement = pd.DataFrame(pivot_activity, columns=[0,1,2,3,4,5,6])\n",
+ "early_engagement = early_engagement.fillna(0)\n",
+ "early_engagement[\"total_messages\"] = early_engagement.iloc[:, 0:7].sum(axis=1)\n",
+ "early_engagement.head(5)"
+ ],
+ "execution_count": 208,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "
"
+ ],
+ "text/plain": [
+ " 0 ... Average_Messages_per_user\n",
+ "userid ... \n",
+ "+/wAfc2I0c831C21wvh2Kcr4DZk= 33.0 ... 22.703704\n",
+ "+0pfU/ormz318pPBTZ6cWtrgHkI= 103.0 ... 27.185185\n",
+ "+4XFQAS/fIojw07hLfZk6PZiGzA= 145.0 ... 54.888889\n",
+ "+6hscmdASu/PfR0HuiO9AzKlNdQ= 52.0 ... 61.592593\n",
+ "+9kMKboj6nxl0kWR3t90grCXt5k= 56.0 ... 7.407407\n",
+ "\n",
+ "[5 rows x 32 columns]"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 227
+ }
+ ]
+ },
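+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The correlation for this test can be sketched as follows, assuming a hypothetical per-user frame with an `early_bond` column (the timepoint-one score) and a `total_messages` column for days 0-6; the data below is synthetic. `scipy.stats.pearsonr` returns a p-value alongside r, and Spearman's rank correlation is included as an alternative that is less sensitive to outliers in message counts:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "from scipy.stats import pearsonr, spearmanr\n",
+ "\n",
+ "rng = np.random.default_rng(0)\n",
+ "\n",
+ "# Hypothetical per-user data (synthetic): bond scores 1-5 in steps of 0.25, message counts\n",
+ "n_users = 200\n",
+ "toy = pd.DataFrame({\n",
+ "    'early_bond': rng.integers(4, 21, size=n_users) / 4,\n",
+ "    'total_messages': rng.poisson(250, size=n_users),\n",
+ "})\n",
+ "\n",
+ "r, p_r = pearsonr(toy['early_bond'], toy['total_messages'])\n",
+ "rho, p_rho = spearmanr(toy['early_bond'], toy['total_messages'])\n",
+ "print(f'Pearson r = {r:.3f} (p = {p_r:.3f})')\n",
+ "print(f'Spearman rho = {rho:.3f} (p = {p_rho:.3f})')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },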
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7l-DuJq65uAm"
+ },
+ "source": [
+ "## Result 2. Pearson's r correlation: change in bond score and user messages"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "3OvA46OnZAXW",
+ "outputId": "7cc6e837-a8f1-485d-b258-cf3ecc93de53"
+ },
+ "source": [
+ "corr2, _ = pearsonr(activityAndChange['Change'], activityAndChange['Total_Messages_per_user'])\n",
+ "print('Pearsons correlation: %.3f' % corr2)"
+ ],
+ "execution_count": 228,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Pearsons correlation: 0.022\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QOFpN_aI5qxA"
+ },
+ "source": [
+ "The Pearson correlation coefficient of 0.022 indicates no meaningful linear correlation between the change in bond score and user engagement (total messages sent)."
+ ]
+ },
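+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Whether an r of 0.022 could differ from zero by chance can be checked with the t-statistic for a correlation, t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom. A minimal sketch, assuming n = 934 (the number of users with two bond scores; the exact row count of `activityAndChange` is not shown here) and the rounded r reported above. r^2 gives the share of variance in the change that message volume accounts for:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import math\n",
+ "from scipy import stats\n",
+ "\n",
+ "r = 0.022  # reported Pearson r (rounded to 3 d.p.)\n",
+ "n = 934  # assumption: one row per user with a tracked change\n",
+ "\n",
+ "# t-statistic for testing r against zero, with n - 2 degrees of freedom\n",
+ "t_stat = r * math.sqrt((n - 2) / (1 - r**2))\n",
+ "p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 2)\n",
+ "\n",
+ "# Share of variance explained\n",
+ "r_squared = r**2\n",
+ "print(f't = {t_stat:.3f}, p = {p_two_sided:.3f}, r^2 = {r_squared:.6f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },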
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "E0mvUH-J6kk0"
+ },
+ "source": [
+ "## Test 3. Comparing message volumes between two groups: low bond score and high bond score"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "FvpFHmi-7B0A"
+ },
+ "source": [
+ "In this final test:\n",
+ "\n",
+ "1. The data is sorted by bond score from timepoint one, from lowest to highest.\n",
+ "2. The total-messages column is split at the midpoint: the message totals for the lower half of bond scores go into one list, and those for the upper half into the other.\n",
+ "3. Both lists are tested for normality using a Shapiro-Wilk test; neither looks normally distributed, so a non-parametric test is used.\n",
+ "4. A Mann-Whitney U test is applied to compare the two groups and understand whether there is a significant difference in the total message volumes (and hence the engagement) of users with low bond scores and users with high bond scores.\n",
+ "5. The null hypothesis, or baseline assumption, is that the total number of messages sent per user has the same distribution in the low bond-score group and the high bond-score group.\n",
+ "\n",
+ "\n"
+ ]
+ },
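+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The steps above can be sketched end to end on synthetic data, assuming a hypothetical per-user frame with `early_bond` and `total_messages` columns (the names mirror those used in this notebook, but the values are invented). Because the split is by row position, users sharing the bond score at the midpoint can land in either group; splitting on a score threshold instead would avoid that."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "from scipy.stats import mannwhitneyu, shapiro\n",
+ "\n",
+ "rng = np.random.default_rng(42)\n",
+ "\n",
+ "# Hypothetical per-user frame (synthetic): early bond score and right-skewed message totals\n",
+ "n_users = 400\n",
+ "toy = pd.DataFrame({\n",
+ "    'early_bond': rng.integers(4, 21, size=n_users) / 4,\n",
+ "    'total_messages': rng.negative_binomial(2, 0.01, size=n_users),\n",
+ "})\n",
+ "\n",
+ "# Steps 1-2: sort by bond score (stable sort), then split the rows at the midpoint\n",
+ "toy_sorted = toy.sort_values('early_bond', kind='mergesort')\n",
+ "half = len(toy_sorted) // 2\n",
+ "toy_lower = toy_sorted['total_messages'].iloc[:half].to_numpy()\n",
+ "toy_higher = toy_sorted['total_messages'].iloc[half:].to_numpy()\n",
+ "\n",
+ "# Step 3: Shapiro-Wilk normality check on each group\n",
+ "alpha = 0.05\n",
+ "for name, group in [('lower', toy_lower), ('higher', toy_higher)]:\n",
+ "    w_stat, p_norm = shapiro(group)\n",
+ "    verdict = 'looks Gaussian' if p_norm > alpha else 'does not look Gaussian'\n",
+ "    print(f'{name}: W={w_stat:.3f}, p={p_norm:.3f} -> {verdict}')\n",
+ "\n",
+ "# Step 4: two-sided Mann-Whitney U test\n",
+ "u_stat, p_value = mannwhitneyu(toy_lower, toy_higher, alternative='two-sided')\n",
+ "print(f'U={u_stat:.1f}, p={p_value:.3f}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },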
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "YzjZ7qXdjz9F"
+ },
+ "source": [
+ "sortedByBond = bondAndEngagement"
+ ],
+ "execution_count": 154,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "w4eb_E0Pl73H"
+ },
+ "source": [
+ "sortedByBond.sort_values(by=['early_bond'], inplace=True)"
+ ],
+ "execution_count": 156,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "B3c-4FIsnIkc"
+ },
+ "source": [
+ "Now that the dataframe is ordered by bond score, let's split the total messages data into two lists: the message totals for the lower scores, and the message totals for the higher scores.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "G03XikqMna1f"
+ },
+ "source": [
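+ "# Split the bond-sorted frame at the midpoint: lower half vs upper half of early bond scores\n",
+ "half = len(sortedByBond) // 2\n",
+ "lower = sortedByBond['total_messages'].iloc[:half]\n",
+ "higher = sortedByBond['total_messages'].iloc[half:]"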
+ ],
+ "execution_count": 164,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "RHqlhOfYoTYR"
+ },
+ "source": [
+ "lower = list(lower)\n",
+ "higher = list(higher)"
+ ],
+ "execution_count": 168,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kxRXlYsxoywh"
+ },
+ "source": [
+ "The Shapiro-Wilk tests below indicate that neither list is normally distributed, so we will use non-parametric testing.\n",
+ "\n",
+ "Alpha is set to 0.05.\n",
+ "\n",
+ "Instead of a parametric Student's t-test, we can use a Mann-Whitney U test, which compares the ranks of the values rather than their means.\n",
+ "\n",
+ "The null hypothesis is that the number of messages sent per user has the same distribution in the low-score and high-score groups."
+ ]
+ },
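+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Because the Mann-Whitney U test works on ranks, U converts directly into an interpretable effect size. A minimal sketch on small hypothetical groups: U is computed from the pooled ranks, U_low + U_high = n_low * n_high, and U_low / (n_low * n_high) is the probability that a randomly chosen low-group user sent more messages than a randomly chosen high-group user (the common-language effect size). `alternative='two-sided'` is passed explicitly, since recent SciPy releases only accept 'two-sided', 'less' or 'greater'."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "import numpy as np\n",
+ "from scipy.stats import mannwhitneyu, rankdata\n",
+ "\n",
+ "\n",
+ "def u_statistics(x, y):\n",
+ "    # U for each sample from the pooled ranks (tied values get average ranks)\n",
+ "    x, y = np.asarray(x), np.asarray(y)\n",
+ "    ranks = rankdata(np.concatenate([x, y]))\n",
+ "    u_x = ranks[:len(x)].sum() - len(x) * (len(x) + 1) / 2\n",
+ "    u_y = len(x) * len(y) - u_x\n",
+ "    return u_x, u_y\n",
+ "\n",
+ "\n",
+ "# Hypothetical message totals for two small groups\n",
+ "low_group = [3, 10, 12, 40]\n",
+ "high_group = [8, 15, 30, 55, 70]\n",
+ "\n",
+ "u_low, u_high = u_statistics(low_group, high_group)\n",
+ "\n",
+ "# Chance that a random low-group user sent more messages than a random\n",
+ "# high-group user (ties count as half)\n",
+ "prob_low_sends_more = u_low / (len(low_group) * len(high_group))\n",
+ "\n",
+ "result = mannwhitneyu(low_group, high_group, alternative='two-sided')\n",
+ "print(u_low, u_high, prob_low_sends_more, result.pvalue)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },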
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "kNdl8R9Vodt2",
+ "outputId": "6c51cd82-7472-4315-9d43-9aef5a02d677"
+ },
+ "source": [
+ "stat, p = shapiro(lower)\n",
+ "print('Statistics=%.3f, p=%.3f' % (stat, p))\n",
+ "# interpret\n",
+ "alpha = 0.05\n",
+ "if p > alpha:\n",
+ "\tprint('Sample looks Gaussian (fail to reject H0)')\n",
+ "else:\n",
+ "\tprint('Sample does not look Gaussian (reject H0)')"
+ ],
+ "execution_count": 169,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Statistics=0.792, p=0.000\n",
+ "Sample does not look Gaussian (reject H0)\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Q6gG3LCuouKl",
+ "outputId": "b9e9fcd7-0f53-4904-b689-a4df1d3f1f14"
+ },
+ "source": [
+ "stat, p = shapiro(higher)\n",
+ "print('Statistics=%.3f, p=%.3f' % (stat, p))\n",
+ "# interpret\n",
+ "alpha = 0.05\n",
+ "if p > alpha:\n",
+ "\tprint('Sample looks Gaussian (fail to reject H0)')\n",
+ "else:\n",
+ "\tprint('Sample does not look Gaussian (reject H0)')"
+ ],
+ "execution_count": 170,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Statistics=0.809, p=0.000\n",
+ "Sample does not look Gaussian (reject H0)\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "nIfuvpRlpCYU",
+ "outputId": "cff78bac-6e0b-47de-e65f-98fd00bf4502"
+ },
+ "source": [
+ "print(np.mean(lower))\n",
+ "print(np.mean(higher))"
+ ],
+ "execution_count": 175,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "286.3841936957515"
+ ]
+ },
+ "metadata": {
+ "tags": []
+ },
+ "execution_count": 175
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mNgnD4sIpHyn"
+ },
+ "source": [
+ "To test whether there is a significant difference in the volume of messages (engagement) sent by participants who submitted low scores and those who submitted high scores, we can perform a Mann-Whitney U test on the two lists."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "fV9qJ9Z0p1Iq"
+ },
+ "source": [
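+ "# Two-sided test; older SciPy accepted alternative=None, which reported half the two-sided p-value\n",
+ "t, pvalue = stats.mannwhitneyu(lower, higher, alternative='two-sided')"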
+ ],
+ "execution_count": 171,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "yNQPby3yp5JV",
+ "outputId": "961ceee1-5c6b-4251-ddf1-6d260d23502f"
+ },
+ "source": [
+ "print(t)\n",
+ "print(pvalue)"
+ ],
+ "execution_count": 172,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "2352817.0\n",
+ "0.15780557009578583\n"
+ ],
+ "name": "stdout"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mWUVTIGxq8zG"
+ },
+ "source": [
+ "## Result 3. Comparison of low and high bond-score groups: do their engagement levels differ?\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "D07kejG4847B"
+ },
+ "source": [
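+ "The p-value of **0.158** is well above the alpha of 0.05, so we do not reject the null hypothesis above. This p-value was computed with `alternative=None`, which in the SciPy versions that accepted it reported half the two-sided p-value; the two-sided p-value is therefore approximately 0.32, which leads to the same conclusion.\n",
+ "\n",
+ "Based on this test, there is no significant evidence that users' engagement levels (total messages sent) differ between the low and high bond-score groups."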
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "RCEW4Bob9tDS"
+ },
+ "source": [
+ "# Task 3 - Additional Analysis\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "igd-l5O595e6"
+ },
+ "source": [
+ "1. If further data on user profiles (such as geography, occupation, age and gender) were available, and data protection regulation allowed its collection, we could potentially derive insights at a more granular level. For example, we could see whether certain profile groups tended to give higher or lower bond scores, or were more engaged than others."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "aj9qA7p6_D8E"
+ },
+ "source": [
+ "2. Further information on the content of each message would be interesting to see. It is unclear from the data how long each message is in characters, or what type of language was used. It would also be useful to know why a second bond score was not recorded for the majority of users."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "1AaBGE7C_XjG"
+ },
+ "source": [
+ "3. Analysing the chat logs to look at the language used would be very interesting for sentiment analysis. This could help gauge the 'mood' of the user, in conjunction with the 'moodCategory' variable. We could also analyse the terms used in the logs and see whether the use of certain terms correlates with user engagement or bond scores."
+ ]
+ }
+ ]
+}
\ No newline at end of file