Josh Berkus
One of the fastest-growing types of high-load application today is the "high-volume data collector". Such applications collect thousands of facts per second and store them in one or more database systems for later summarization and analysis. Examples include fault reporting systems, hardware telemetry, and security and web monitoring. These systems pose several challenges: coping with billions of inserts, storing terabytes of data, and integrating disparate processes, databases, and data processing tools. The biggest challenge, though, is allowing for component upgrade, replacement, and failure while continuing to process data 24/7, because the firehose never, ever shuts off. PostgreSQL Core Team member Josh Berkus has worked on several of these systems in the last year, including Mozilla's Socorro crash reporting system, monitoring of power generation systems, and high-volume financial transaction reporting. This talk will explore some of the lessons he has learned and the open-source tools he has employed in dealing with these applications.