Anyways, without further ado, let’s talk databases…
As it currently stands today, the MySQL database for DanielMenjivar.com is comprised of 27 relational InnoDB tables. If you look at the website, that may seem like a lot, (it is), but like I’ve said many times before: the public-facing website is only a very small part of that site. The backend part is more than just a simple CMS – it’s a complete system that includes things like income tracking for gigs/events; a CRM for chart orders; income, expense and time tracking for chart orders; statistics and analytics for viewed content, events and chart orders; engagement tracking; etc. So there’s a lot more than meets the eye for sure. To most people, it would probably seem like overkill to do all that for such a "small" site, but that’s the analytical side of me taking over. The nice thing is that I only have to enter information into one system and it all works together to make my life much easier. When I’m zoned into writing music and fulfilling orders for charts, I don’t have to worry about anything other than writing good music – everything else just falls into place…
I can’t stress enough the importance of planning/designing your database correctly from the start and making sure it’s efficient and scalable. If you spend the time to do it right in the beginning, you’ll save so much time later on, plus, writing your models will be much easier and they’ll make more sense to you too. Also, learn some SQL!
For this article, I’m not going to get into how to plan and design your database, when or why to use InnoDB over MyISAM, the advantages of a normalized relational database, how and when to use triggers, etc., nor am I going to get into optimizing your MySQL server settings – all of which are very important things to learn – and all things which I spent a lot of time researching and learning. There are many really good articles online that can help you with that, (I may cover these topics in a future article), but for this article, I’m only going to get into the various date and time types in MySQL and I’ll discuss when and why to use each one.
According to the MySQL reference manual, the date and time types in MySQL are
YEAR. In my database for DanielMenjivar.com, I use all of these types, except for
Whenever you need just the date portion of a date (no time), this is the appropriate place to use the
DATE type. If you need both the date and time though, then use
DATETIME instead (or
TIMESTAMP, but I’ll get into that further below).
An example of where I’ve used this is in the table that contains my media (audio/video) samples – I want to keep track of the date when the media sample was recorded, but not the exact time.
Before redesigning the database, I used to use the
DATE type in the table that holds my events, in conjunction with a
TIME column, so I had something like
start_date, start_time, end_date, end_time but in this case it’s much better to use two
DATETIME columns instead. My logic was that some events are all-day events, so for those there was a
NULL value stored in the
TIME columns, but now a MySQL trigger (before insert & before update) checks the value of an
all_day column and if it’s
TRUE, it updates the time portion on
end_time (which are now
DATETIME columns) like this:
IF NEW.all_day = 1 THEN BEGIN SET NEW.start_time = Date(NEW.start_time); SET NEW.end_time = TIMESTAMP(Date(NEW.end_time),'23:59:59'); END; END IF
It makes a lot more sense to do it this way – if you think about it, an all-day event really is from
23:59:59 anyways. And incase you’re wondering, there’s also a
repetitions column that let’s me have recurring events (for example, an event that runs from 5pm to 10pm on a Thursday and repeats on Friday and Saturday).
If you want to specify a specific time on a specific day, it’s best to use a
DATETIME type for that. An example of when you’d want to use only a
TIME type is if you want to specify a specific time, but when the day is irrelevant. For example, if you were creating a table that held a list of everyday reminders, say to feed your dog every day at 6pm, that’s when using
TIME would make sense.
TIME type is not just to represent a time of day though, it can also be used to represent an elapsed time or time interval. For this reason, values can be positive or negative and can be far greater than
23:59:59. I used the
TIME type in the table that contains my media samples to represent the recorded length of those samples.
Like I mentioned above when discussing how to use the
TIME type, if you want to specify a specific time on a specific day, it’s best to use a
DATETIME type for that. Also, read below on when to use the
TIMESTAMP type instead.
TIMESTAMP is like a
DATETIME field, but with a couple characteristics that make it unique.
First of all, the
TIMESTAMP type has a much smaller range than
DATETIME. You can store values from
'1970-01-01 00:00:01 UTC' to
'2038-01-19 03:14:07 UTC' as opposed to a range of
'1000-01-01 00:00:00' to
'9999-12-31 23:59:59' with
DATETIME. This is why
TIMESTAMP columns only take up 4 bytes vs. 8 bytes for
You may have noticed that the above ranges also specified UTC for
TIMESTAMP columns, whereas no timezone was specified for the ranges of
DATETIME columns. Here’s what the reference manual says about that:
TIMESTAMP values are converted from the current time zone to UTC for storage, and converted back from UTC to the current time zone for retrieval. (This occurs only for the TIMESTAMP data type, not for other types such as DATETIME.) By default, the current time zone for each connection is the server’s time. The time zone can be set on a per-connection basis, as described in Section 9.6, “MySQL Server Time Zone Support”. As long as the time zone setting remains constant, you get back the same value you store. If you store a TIMESTAMP value, and then change the time zone and retrieve the value, the retrieved value is different from the value you stored. This occurs because the same time zone was not used for conversion in both directions. The current time zone is available as the value of the time_zone system variable.
TIMESTAMP values are stored in UTC, but you’ll always see them displayed in the connection’s current timezone. For example, let’s say I save the value
'2011-06-09 12:00:00' in a
TIMESTAMP column while my server is currently set to the America/Toronto timezone. MySQL will save that value as
'2011-06-09 16:00:00 UTC' in the database, but whenever I retrieve that column, I’ll get back
'2011-06-09 12:00:00'. But, if I change my server’s timezone (or my connection’s timezone) to America/Edmonton and then retrieve that column, I’ll get
'2011-06-09 10:00:00' and not
For this reason,
TIMESTAMP columns are perfect when you want to specify an exact point in time when something happened. In my database for DanielMenjivar.com, I use
TIMESTAMP columns in almost every table to track when the row was added and when it was modified. I know what you’re thinking – that you can only have one column with the default value set to
CURRENT_TIMESTAMP and you’re right. So my
date_added columns have a default value of
'0000-00-00 00:00:00' and my
last_modified columns have a default value of
CURRENT_TIMESTAMP with it set to update to
CURRENT_TIMESTAMP "on update". Then, I create a trigger that fires before inserting new records:
BEGIN IF NEW.date_added < '0000-00-00 00:00:01' THEN BEGIN SET NEW.date_added = CURRENT_TIMESTAMP; END; END IF; END
If you don’t need to store values that happened at an exact point in time, or if you need values earlier than 1970 or later than , then use the
DATETIME type instead. Remember that even though the values of
TIMESTAMP columns are stored in UTC and never change, the values that are displayed/returned to you do indeed fluctuate depending on your server’s timezone (or if you change the timezone of your connection). So if you happen to move your database from one server to another server in a different timezone, it will appear to you as though your
TIMESTAMP values have changed if you don’t understand how this works.
DATETIME columns, on the other hand, will never fluctuate and are not affected by the server’s timezone (nor the connection’s timezone). Make sense?
For the most part,
DATETIME is the correct type to use for most dates, unless you want to "stamp" a column with a precise moment in time that’s within the range that
You can’t talk about date and time types without mentioning timezones – and doing so is really opening up a big can of worms. The thing is, timezones can be very complicated and they’re not static – they change all the time.
Way back, (about a year-and-a-half ago), when I decided I should re-do my website from the ground up, I was doing some research into different databases and considered using PostgreSQL instead of MySQL due to their support for TimeZones in
DATETIME fields. I decided to stay with MySQL instead, for various reasons (which I still contemplate once in a while), but the point is, whenever you’re planning/designing a database, it’s very important to consider timezones from the very beginning.
A couple weeks ago I came across an article by Ian Mercer titled "DateTime values should always be stored in UTC" and he makes some very intelligent points about storing
DATETIME values in UTC. it’s a good read and I definitely recommend reading through it, but I don’t, however, agree that
DATETIME values should always be stored in UTC. Usually? Yes. But always? No. To his credit, at the end of his article he does mention three "exceptions" where this might not work, but those three points are, in my opinion, big exceptions that affect a lot of cases – big enough exceptions that I wouldn’t recommend always storing
DATETIME values in UTC. Instead, I think it’s wise to advise that you should always consider how you will deal with timezones when designing your database, and make sure you completely understand the advantages and disadvantages of your implementation.
For example, for each of the events in the table that holds my events, my
end_time columns are always stored in local time. Not local to me, not local to the server, but local to the event itself. Every event references a row from the venues table, which references a row from the cities table, which references a row from the timezones table… The important thing is that I’ve taken the time to consider timezones and how I will implement this in my application.
I actually stumbled across that article because I was considering changing all my events’
DATETIME values to America/Toronto timezone (which is the timezone the server is set to as well). It didn’t occur to me to change them all to UTC, and it wasn’t until reading that article that I was sure that doing this would be a bad idea.
Why? Well there are many reasons. (Note – I’m not saying that doing this is a bad idea for everyone, just that doing this was a bad idea in my case.)
The first reason is that my trigger that I mentioned above for all-day events wouldn’t work anymore. If I was storing all values as UTC and wanted to store an all-day event for today, I’d have to store my
start_date value as
'2011-06-09 06:00:00' if the event was in Calgary, or as
'2011-06-09 04:00:00' if the event was in Toronto. Just storing it as
'2011-06-09 00:00:00' would be incorrect. So changing my values to UTC would also involve me re-writing my trigger to first run a subquery to find out what timezone this event is in and then convert the time to UTC. That would make inserting new records much slower. Of course, I could just handle this conversion on the PHP side of things before inserting the values, but that would mean that I couldn’t enforce data-integrity at the database level…
Given that most people don’t try to enforce data-integrity at the database level anyways, (and try to accomplish it all via PHP), that first reason may not seem like a significant "disadvantage" to some. But it was definitely something for me to consider… (It means that if you ever need to access your database from another application, you’ll have re-write your data-integrity rules each time, and hope that over time, they stay consistent across all apps… But that’s a whole other discussion.)
Here’s a much more significant consideration though: When I display these dates, because they’re events that happen in a specific place, at a specific time, it doesn’t make sense to display them all in UTC – I display all dates and times local to the event. Anyone who wants to come to one of my gigs in Calgary doesn’t care what time that event starts in UTC time or even in America/Toronto time – they only care about what time the event starts in America/Edmonton time – the timezone the actual event is in. To be extra clear, I display all times for my events on the website with an appended timezone anyways (for example, this event with Eduardo Mendonça from 2008). If these
DATETIME values were stored in UTC, it means I’d have to convert every single one back to the local timezone when displaying it. It may not seem like a lot when we’re dealing with just one event, but how about when you’re dealing with hundreds or thousands of events at a time?
This brings me to my next reason, which is one that Ian mentioned in his article, and that’s grouping results. I currently group all my events by month according to the local start time of the event. If I play a gig on March 1 in Japan, for example, that might actually still be February 28 if we’re talking UTC or America/Toronto time… To illustrate this further, if go to DanielMenjivar.com/events/2006/07, I get all events for which is a pretty simple query. But if all
DATETIME values were being stored in UTC, I’d have to convert them all to local time first and it would get much more complicated to retrieve these grouped results – much less efficient…
And lastly, somewhat related to my previous reason for not switching times to UTC is searching. If a user wants to search for all events on a Wednesday, they can currently do that and it’s a relatively simple query that gets executed. But if I stored an event from Japan (for example) as UTC, that might actually be stored in the database as occurring on a Tuesday (UTC) even though it actually occurred on a Wednesday in Japan. Again, it makes the queries much more complicated and therefore, much less efficient.
Once again, I’m not saying that you should never store values in
DATETIME columns as UTC, nor that you should always store them in local time. What I am saying is that you should always consider how you will deal with timezones when designing your database and that sometimes that will mean using only UTC values, and other times it means storing only local values with a reference to the timezone it represents. In my case, it’s actually more advantageous to use local time values than UTC – not because I’m too lazy to re-write things (because I was fully prepared to do so) but because doing so makes things run much more efficiently.
So why was I even considering converting
DATETIME values to the server’s local timezone anyways? To make it more efficient when trying to retrieve only upcoming events. Since all
DATETIME values are currently stored local to the event, it means I have to convert every event to America/Toronto first before I can tell if the event has already passed or not. (I’m not pulling all records though, I’m just using MySQL’s
CONVERT_TZ function within a
WHERE statement.) So, like I said before, you really need to completely understand the advantages and disadvantages of your implementation… In my case, this is not a significant enough disadvantage to merit a change to UTC values, especially given the huge advantages to keeping them local. On top of this, the
WHERE statement I’m using to determine if an event has passed or not isn’t simply comparing my
end_time values to
CURRENT_TIMESTAMP anyways – remember some events are recurring events with variable amounts of repetitions. An event isn’t considered to have passed until the
end_time has passed on the last repetition… it’s a small trade-off, but in my case, keeping
DATETIME values as local to the events is much more efficient than trying to use UTC or even America/Toronto values everywhere. In all my rigorous testing, using
CONVERT_TZ on my
DATETIME values isn’t very slow anyways – in fact, it’s much quicker than using PHP to convert them…)
So again, always consider how you will deal with timezones when designing your database. If you’re thinking it through enough and designing something that will scale, I don’t think the answer is as simple as always using UTC values for
DATETIME columns, and it will usually involve some sort of compromise, somewhere.
I’ve definitely learned a lot from this, and I hope you’ve learned something too. Thanks for reading, and I hope this helps!