MySQL: What's better for speed, one table with millions of rows or managing multiple tables?

August 6, 2015

I'm reworking an existing PHP/MySQL/JS/Ajax web app that processes a LARGE number of table rows for users. Here's how the page currently works.

  1. A user uploads a LARGE .csv file. The test one I'm working with has 400,000 rows (each row has 5 columns).
  2. PHP creates a brand new table for this data and inserts the hundreds of thousands of rows.
  3. The page then sorts / processes / displays this data back to the user in a useful way. Processing includes searching, sorting by date and other columns, and redisplaying them without a huge load time (that's where the JS/Ajax comes in).
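For reference, step 2 in its current one-table-per-upload form might look like the following sketch (the table and column names are hypothetical, not taken from the actual app):

```sql
-- Hypothetical per-upload table; the PHP side would generate a
-- unique name (here "upload_12345") for each CSV file.
CREATE TABLE upload_12345 (
  row_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  col1 VARCHAR(255),
  col2 VARCHAR(255),
  col3 VARCHAR(255),
  col4 VARCHAR(255),
  col5 VARCHAR(255)
) ENGINE=InnoDB;

-- One bulk load is far faster than hundreds of thousands of
-- individual INSERT statements.
LOAD DATA LOCAL INFILE '/tmp/upload_12345.csv'
INTO TABLE upload_12345
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col1, col2, col3, col4, col5);
```

Whichever table design wins, `LOAD DATA INFILE` beats looping over `INSERT` statements for files this size.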

My question is: should this app be placing the data into a new table for each upload, or into one large table with an id for each file? I think the original developer was adding separate tables for speed purposes. Speed is very important for this.

Is there a faster way? Is there a better mousetrap? Has anyone ever dealt with this?

Remember, every .csv can contain hundreds of thousands of rows, and hundreds of .csv files can be uploaded daily. They can be deleted about 24 hours after they were last used (I'm thinking a cron job; any opinions?).
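For what it's worth, the 24-hour cleanup could be a cron job, or it could live inside MySQL itself via the event scheduler. A rough sketch for the single-big-table variant (the `uploads` metadata table and all column names here are assumptions, not from the app):

```sql
-- Requires the scheduler to be on: SET GLOBAL event_scheduler = ON;
-- Assumes a hypothetical "uploads" table recording when each file
-- was last used, and a "csv_rows" data table keyed by file_id.
CREATE EVENT purge_stale_uploads
ON SCHEDULE EVERY 1 HOUR
DO
  DELETE r FROM csv_rows AS r
  JOIN uploads AS u ON u.id = r.file_id
  WHERE u.last_used < NOW() - INTERVAL 24 HOUR;
```

With the one-table-per-upload design, a cron job that `DROP`s stale tables works just as well, and dropping a table is cheaper than deleting hundreds of thousands of rows.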

Thank you all!

A few notes based on comments:

  • All data is unique to each user and changes, so the user won't be re-accessing this data after a couple of hours. Only if they accidentally close the window and then come right back would they really revisit the same .csv.
  • No foreign keys required; all CSVs are private to each user and don't need to be cross-referenced.

2 Responses to “MySQL: What's better for speed, one table with millions of rows or managing multiple tables?”

  1. Given 10^5 rows per file and 10^2 CSVs per day, you're looking at 10^7 (10 million) rows per day (and you say you'll clear that data down regularly). That doesn't look like a scary figure for a decent DB (especially given that you can index within tables, and not across multiple tables).
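The "index within tables" point translates to something like this single-table layout, where every row carries the id of the file it came from (all names hypothetical):

```sql
CREATE TABLE csv_rows (
  file_id INT UNSIGNED NOT NULL,   -- which upload this row belongs to
  row_id  INT UNSIGNED NOT NULL,   -- position within that upload
  col1 VARCHAR(255), col2 VARCHAR(255), col3 VARCHAR(255),
  col4 VARCHAR(255), col5 VARCHAR(255),
  PRIMARY KEY (file_id, row_id)    -- InnoDB clusters each file's rows together
) ENGINE=InnoDB;

-- Every query filters on file_id, so it only ever touches one
-- upload's rows, e.g.:
SELECT * FROM csv_rows
WHERE file_id = 42
ORDER BY col2
LIMIT 100;
```

Because the primary key leads with `file_id`, each file's rows sit contiguously on disk, which is much of what the separate-table design was buying.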

    Obviously the most regularly used CSVs could very easily be held in memory for speed of access – perhaps even all of them (a very simple calculation based on next to no data gives me a figure of 1 GB if you flush everything over 24 hours old; 1 GB is not an unreasonable amount of memory these days).

  2. I would shy away from putting all the data into a single table, for the simple reason that you then cannot change the data structure for one load without changing it for every load.

    Since the data is being deleted anyway and you don’t have a requirement to combine data from different loads, there isn’t an obvious reason for putting the data into a single table. The other argument is that the application now works. Do you really want to discover some requirement down the road that implies separate tables after you’ve done the work?

    If you do decide on a single table, then use table partitioning. Since each user is working with their own data, you can use partitions to separate each user's load into its own partition. Although partitioning has limits (such as no foreign keys), it will make accessing the data in a single table about as fast as accessing the original separate tables.
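A minimal sketch of that partitioned single table, assuming rows are tagged with a `file_id` (names hypothetical): KEY partitioning hashes on `file_id`, so each upload's rows land in one partition, and any query with `WHERE file_id = N` is pruned to that single partition.

```sql
CREATE TABLE csv_rows (
  file_id INT UNSIGNED NOT NULL,
  row_id  INT UNSIGNED NOT NULL,
  col1 VARCHAR(255), col2 VARCHAR(255), col3 VARCHAR(255),
  col4 VARCHAR(255), col5 VARCHAR(255),
  -- the partitioning column must be part of every unique key
  PRIMARY KEY (file_id, row_id)
) ENGINE=InnoDB
PARTITION BY KEY (file_id)
PARTITIONS 64;
```

For file-scoped queries, MySQL skips every other partition entirely, which is roughly the speed benefit the one-table-per-upload scheme was providing.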
