QUAC
1. Installing QUAC
1.1. PyPI and virtualenv (recommended)
1.2. Self-compile
1.2.1. Prerequisites
1.2.2. Install pip2pi
1.2.3. Prepare the dependency package
1.2.4. Compile and install
1.2.5. Test
2. Collecting data
2.1. Wikipedia
2.2. Twitter
2.2.1. Set up authentication
2.2.2. Run the collector
2.2.3. Build the TSV files
2.2.4. Doing it seriously
3. Preprocessing data
3.1. Time series files
3.1.1. Motivation
3.1.2. File format
3.2. Twitter
3.2.1. Overview
3.2.2. File organization
3.2.3. File formats
3.2.3.1. Raw JSON tweets
3.2.3.2. TSV files
3.2.3.3. Preprocessing metadata file
3.2.3.4. Geo-located tweets
3.2.3.5. Alternatives that were considered and rejected
3.3. Wikimedia pageview logs
3.3.1. Overview
3.3.2. File organization
3.3.3. Article filtering
3.3.4. Pagecount file format
3.3.5. Time series storage
4. Map-Reduce with QUACreduce
4.1. Introduction
4.2. Summary of API
4.3. Example
4.3.1. Create sample input
4.3.2. Define the map operator
4.3.3. Define the reduce operator
4.3.4. Test the operators together
4.3.5. Prepare the job
4.3.6. Run the job with make
4.3.7. Add more input data
4.3.8. What’s next?
4.4. Distributed QUACreduce
4.4.1. Example
4.5. Drawbacks
4.6. FIXME
4.7. Footnotes
5. Time series analysis
5.1. Input file format
6. Configuration
7. Frequently asked questions (FAQ)
7.1. A caution about tweet IDs
7.2. My python script barfs with UnicodeDecodeError when printing tweets!
7.3. Unicode works OK in the terminal, but Python barfs when redirecting stdout
7.4. collect --daemon fails to start and there’s no error message
7.5. How do I quickly see a tweet as it’s “supposed” to look?
8. Limitations
9. Citing QUAC
10. How to contribute
10.1. Basic workflow
10.1.1. Branching model
10.1.2. Doing actual work
10.1.3. Merging to master
10.1.4. Cutting a release
10.2. Simplifying cmdtest updates with meld
10.3. Code style
10.3.1. Docstrings
10.4. Documentation
10.4.1. Building the docs
10.4.2. Conventions
10.4.3. Publishing to the web
10.4.3.1. Prerequisites
10.4.3.2. Publishing
11. Release notes
11.1. The future
11.2. List of releases
11.2.1. v0.6 (October 20, 2014)
11.2.2. v0.5 (November 13, 2013)
11.2.3. v0.4 (October 10, 2013)
11.2.4. v0.3 (August 5, 2013)
11.2.5. v0.2 (May 30, 2013)
11.2.6. v0.1 (April 26, 2013)
12. Credits
12. Credits
Reid Priedhorsky, Los Alamos National Laboratory
Aron Culotta, Illinois Institute of Technology