Project: Logs Analysis

Project I: FSWD Udacity

Introduction

This is a project that is part of the Full Stack Web Developer Nanodegree (FSWD) by Udacity in partnership with Google, AWS, Github, and AT&T.
This project is for stretching the SQL database skills, by getting practice interacting with a live database both from the command line and form the code. The database that has been used contains millions of rows, and I should build and refine complex queries and use them to draw business conclusions from data.

Report Generation

Report generation Building an informative summary from logs is a real task that comes up very often in software engineering. For instance, at Udacity we collect logs to help us measure student progress and the success of our courses. The reporting tools we use to analyze those logs involve hundreds of lines of SQL

Database as shared resource

In this project, you'll work with data that could have come from a real-world web application, with fields representing information that a web server would record, such as HTTP status codes and URL paths. The web server and the reporting tool both connect to the same database, allowing information to flow from the web server into the report.

This shows one of the valuable roles of a database server in a real-world application: it's a point where different pieces of software (a web app and a reporting tool, for instance) can share data.

Installation

To start on this project, you'll need database software (provided by a Linux virtual machine) and the data to analyze.

Prerequisites

You need to install psycopg2 to run the code.

$ pip install psycopg2

The virtual machine

This project makes use a Linux-based virtual machine (VM), and we used tools called Vagrant and VirtualBox to install and manage the VM.

Install VirtualBox

VirtualBox is the software that actually runs the virtual machine. You can download it from virtualbox.org. Install the platform package for your operating system. You do not need the extension pack or the SDK. You do not need to launch VirtualBox after installing it; Vagrant will do that.

Currently (October 2017), the supported version of VirtualBox to install is version 5.1. Newer versions do not work with the current release of Vagrant.

Ubuntu users: If you are running Ubuntu 14.04, install VirtualBox using the Ubuntu Software Center instead. Due to a reported bug, installing VirtualBox from the site may uninstall other software you need.

Install Vagrant

Vagrant is the software that configures the VM and lets you share files between your host computer and the VM's filesystem. Download it from vagrantup.com. Install the version for your operating system.

Windows users: The Installer may ask you to grant network permissions to Vagrant or make a firewall exception. Be sure to allow this.

Download the VM configuration

There are a couple of different ways you can download the VM configuration.

You can download and unzip this file: FSND-Virtual-Machine.zip This will give you a directory called FSND-Virtual-Machine. It may be located inside your Downloads folder.

Note: If you are using Windows OS you will find a Time Out error, to fix it use the new Vagrant file configuration to replace you current Vagrant file.

Alternately, you can use Github to fork and clone the repository https://github.com/udacity/fullstack-nanodegree-vm.

Either way, you will end up with a new directory containing the VM files. Change to this directory in your terminal with cd. Inside, you will find another directory called vagrant. Change directory to the vagrant directory

Start the virtual machine

From your terminal, inside the vagrant subdirectory, run the command vagrant up. This will cause Vagrant to download the Linux operating system and install it. This may take quite a while (many minutes) depending on how fast your Internet connection is.

When vagrant up is finished running, you will get your shell prompt back. At this point, you can run vagrant ssh to log in to your newly installed Linux VM!

Download the data

Next, download the data here. You will need to unzip this file after downloading it. The file inside is called newsdata.sql. Put this file into the vagrant directory, which is shared with your virtual machine.

To build the reporting tool, you'll need to load the site's data into your local database.

To load the data, cd into the vagrant directory and use the command psql -d news -f newsdata.sql Here's what this command does:

psql — the PostgreSQL command line program -d news — connect to the database named news which has been set up for you -f newsdata.sql — run the SQL statements in the file newsdata.sql Running this command will connect to your installed database server and execute the SQL commands in the downloaded file, creating tables and populating them with data.

This will give you the PostgreSQL database and support software needed for this project. If you have used an older version of this VM, you may need to install it into a new directory.

If you need to bring the virtual machine back online (with vagrant up), do so now. Then log into it with vagrant ssh.

Database

The database contains three tables, and the psql command to see all tables is \dt:

Schema	Name	Type	Owner
public	articles	table	vagrant
public	authors	table	vagrant
public	log	table	vagrant

A description of each table:

Table	Description
articles	stores article information such as the author and title
authors	stores information about the authors such as their name and their author id
logs	sotres information on when articles were accessed, such as date and request response

The psql command to see each table structure is \d.

articles Table

\d articles

column	Type	Modifiers
author	integer	not null
title	text	not null
slug	text	not null
lead	text
body	text
time	timestamp with time zone	default now()
id	integer	not null default nextval('articles_id_seq'::regclass)

Indexes: "articles_pkey" PRIMARY KEY, btree (id) "articles_slug_key" UNIQUE CONSTRAINT, btree (slug)

Foreign-key constraints: "articles_author_fkey" FOREIGN KEY (author) REFERENCES authors(id)

authors Table

\d authors

Column	Type	Modifiers
name	text	not null
bio	text
id	integer	not null default nextval('authors_id_seq'::regclass)

Indexes: "authors_pkey" PRIMARY KEY, btree (id)

Referenced by: TABLE "articles" CONSTRAINT "articles_author_fkey" FOREIGN KEY (author) REFERENCES authors(id)

log Table

\d log

Column	Type	Modifiers
path	text
ip	inet
method	text
status	text
time	timestamp with time zone	default now()
id	integer	not null default nextval('log_id_seq'::regclass)

Indexes: "log_pkey" PRIMARY KEY, btree (id)

The database that you're working with in this project is running PostgreSQL

Usage

Running the Program

Download the file log_main.py into the vagrant folder, and run the application with the following command: python3 log_main.py.

You can compare the results with the data contained in the outputExample.txt file. The data should be the same.

Questions to Solve

What are the most popular three articles of all time? Which articles have been accessed the most? Present this information as a sorted list with the most popular article at the top.
Who are the most popular article authors of all time? That is, when you sum up all of the articles each author has written, which authors get the most page views? Present this as a sorted list with the most popular author at the top.
On which days did more than 1% of requests lead to errors? The log table includes a column status that indicates the HTTP status code that the news site sent to the user's browser. (Refer to this lesson for more information about the idea of HTTP status codes.)

Authors

Oscar Reyes

License

This project is licensed under MIT license

Acknowledgements

Thanks to Udacity

Name	Name	Last commit message	Last commit date
Latest commit History 6 Commits 6 Commits
README.md	README.md
_config.yml	_config.yml
log_main.py	log_main.py
outputExample.txt	outputExample.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Project: Logs Analysis

Project I: FSWD Udacity

Introduction

Report Generation

Database as shared resource

Installation

Prerequisites

The virtual machine

Install VirtualBox

Install Vagrant

Download the VM configuration

Start the virtual machine

Download the data

Database

articles Table

authors Table

log Table

Usage

Running the Program

Questions to Solve

Authors

License

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Search code, repositories, users, issues, pull requests...

Folders and files

Latest commit

History

Repository files navigation

Project: Logs Analysis

Project I: FSWD Udacity

Introduction

Report Generation

Database as shared resource

Installation

Prerequisites

The virtual machine

Install VirtualBox

Install Vagrant

Download the VM configuration

Start the virtual machine

Download the data

Database

articles Table

authors Table

log Table

Usage

Running the Program

Questions to Solve

Authors

License

Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages