The Hourly Count Paradox: Why Daily Totals Tell a Different Story / Part 2: Using Laravel to Explore Inaccurate User Visit Metrics

Jul 14, 2024

mysqlanalytics dashboard

The Hourly Count Paradox: Why Daily Totals Tell a Different Story / Part 2: Using Laravel to Explore Inaccurate User Visit Metrics

To effectively mitigate overcounting in hourly metrics, we need to refine our SQL queries and data structure. We'll start by creating a user_visits table and seeding it with realistic data to illustrate the solution.

Step 1: Creating the user_visits Table

First, generate the migration for the user_visits table:

php artisan make:migration create_user_visits_table

Then, update the generated migration file in database/migrations/yyyy_mm_dd_hhmmss_create_user_visits_table.php:

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
/**
* Run the migrations.
*/
public function up(): void
{
Schema::create('user_visits', function (Blueprint $table) {
$table->id();
$table->unsignedBigInteger('user_id');
$table->timestamps();
});
}

/**
* Reverse the migrations.
*/
public function down(): void
{
Schema::dropIfExists('user_visits');
}
};

Run the migration to create the table:

php artisan migrate

Step 2: Seeding the user_visits Table

We use a seeder to populate the user_visits table with sample data:

php artisan make:seeder UserVisitSeeder

Then, update the generated seeder file in database/seeders/UserVisitSeeder.php:

<?php

namespace Database\Seeders;

use Illuminate\Database\Seeder;
use Carbon\Carbon;
use App\Models\User;
use App\Models\UserVisit;

class UserVisitSeeder extends Seeder
{
public function run()
{
$users = User::all();

foreach ($users as $user) {
// Generate visit patterns for each user
$visitStart = Carbon::createFromTime(
rand(6, 20), rand(0, 59), 0, 'Asia/Bangkok'
)->setTimezone('UTC'); // Set timezone to UTC for
// initial visit start

// First visit
UserVisit::create([
'user_id' => $user->id,
'created_at' => $visitStart,
]);

// Determine if there will be another visit on
// the same day
if (rand(0, 3) < 3) {
// Ensure revisits are within the same hour
// or next hour
$visitTime = $visitStart->copy()
->addMinutes(rand(0, 60))
->setTimezone('Asia/Bangkok');

$visitTimeUTC = $visitTime->copy()
->setTimezone('UTC');

// Check if the revisit time is within
// open hours (6:00 AM to 9:00 PM)
if (
$visitTime->hour >= 6 &&
$visitTime->hour < 21
) {
UserVisit::create([
'user_id' => $user->id,
'created_at' => $visitTimeUTC,
]);
}
}
}
}
}

Run the seeder to populate the table:

php artisan db:seed --class=UserVisitSeeder

Explanation of the Seeder Code:

  • Generating Visit Patterns: For each user, we generate a random visit time within gym hours (6:00 AM to 8:59 PM) in the Asia/Bangkok timezone, then convert it to UTC for storage.
  • First Visit: Each user has at least one visit recorded.
  • Random Revisit: There's a 75% chance (rand(0, 3) < 3) for each user to have an additional visit within the same or the next hour. This revisit is also converted from Asia/Bangkok to UTC.
  • Open Hours Check: The revisit is only recorded if it falls within gym hours (6:00 AM to 9:00 PM).

Step 3: Retrieving Unique Visits by Hour

Now, we need to retrieve the unique visits by hour while addressing overcounting. Here's how we do it in a Laravel controller:

Why This Approach Is Incorrect

Using the flawed method to count member visits leads to overcounting. Here’s an example of such an approach:

<?php

namespace App\Http\Controllers\Api;

use App\Http\Controllers\Controller;
use App\Models\UserVisit;
use Illuminate\Support\Facades\DB;
use Carbon\Carbon;
use Illuminate\Http\Request;

class UserVisitController extends Controller
{
public function getUniqueVisitsByHour(Request $request)
{
$selectedDate = $request->input('selectedDate');

$startDateTime = Carbon::parse(
"{$selectedDate} 00:00:00",
'Asia/Bangkok'
)->setTimezone('UTC');

$endDateTime = Carbon::parse(
"{$selectedDate} 23:59:59",
'Asia/Bangkok'
)->setTimezone('UTC');

$uniqueVisitsByHour = DB::table('user_visits')
->select(
DB::raw(
'DATE_FORMAT(
CONVERT_TZ(created_at, "+00:00", "+07:00"),
"%Y-%m-%d %H:00:00"
) AS hour_slot'
),
DB::raw('COUNT(DISTINCT user_id) AS unique_visits')
)
->whereBetween('created_at', [
$startDateTime,
$endDateTime
])
->groupBy('hour_slot')
->orderBy('hour_slot')
->get();

return $uniqueVisitsByHour;
}
}

Explanation of the getUniqueVisitsByHour Method:

  • Input Date: We get the selectedDate from the request input.
  • Date Range: We set the start and end times for the selected date, converting them from Asia/Bangkok to UTC.
  • SQL Query: We use raw SQL to:Convert created_at to the Asia/Bangkok timezone.Format the time to the nearest hour (%Y-%m-%d %H:00:00).Count the distinct user IDs for each hour slot.
  • Grouping and Ordering: We group the results by hour_slot and order them by the same column to ensure chronological order.

This method does not effectively differentiate between multiple visits by the same member within different hour slots, leading to overcounting. As a result, the total unique visits for each hour may inaccurately represent the actual unique visits for the day.

Example of Incorrect Results

Here’s an example of the incorrect results generated by the above method:

Sum of member visit count by Hour Vs Member visit count by Day

In this image, the unique visits per hour appear inflated due to repeated counts from multiple visits by the same user.

To Be Continued in Part 3 ...