Modeling

From GrassrootsPedia


What is modeling?

Level 1: Precinct targeting

Traditionally, campaigns have targeted voters by geography. That is, they have been forced to assume that everyone in a particular precinct behaves similarly. Using data on past election results (like NCEC data), campaigns have been able to target some entire precincts as ‘persuadable’ and some as worthy of being GOTV’d.

With the advent of good person-level information we can refine our targeting further …

Level 2: Basic person-level targeting

For many years, basic person-level data has been available and has been used to add a finer level of detail to precinct targeting. This enables campaigns to subdivide precincts. For example, a campaign can look within a given ‘persuadable’ / GOTV precinct at just registered Democrats, or at just older female Democrats. It can even combine many pieces of basic information to find groups like younger African American Democrats.

With good person-level commercial information and advanced computing, we can go further …

Level 3: Modeling

Modeling allows us to use computers to find patterns in very complex data sets that we couldn’t find by hand. On a basic level, modeling is the process of taking a small set of data -- typically survey results or IDs -- and then using those results to predict attitudes of an entire universe. In a campaign context, this generally plays out as follows:

1. You have a set of data on people from a voter file. This includes information like gender, vote history, census information, and hopefully some commercial data, such as marital status, length of residence, etc. It's important to distinguish between different levels of data: individual, household, census, county, and state. Having more individual- or household-level data points (as opposed to census-level data points) will greatly improve the quality of your model.

2. You have a subset of info -- support IDs, or a scientific poll taken on a good number of people in your universe. Something like 5,000 IDs would be ideal. The more unbiased the sample of people on whom the IDs are based, the better. It's best for the sample to be as representative as possible of the complete set of people you're interested in. Try to balance it by region, gender, age and other characteristics. Likewise, the more unbiased the question they were asked, the better.

3. You want to take the lessons of the subset and apply them to the rest of the file. Typically you would want to give everyone on the voter file a score from 0-100 for a yes/no question (e.g., supports your candidate, will turn out to vote, etc.) indicating the percentage likelihood that a given individual will come up positive for that question.

To generate model scores, you follow three steps:

1. Isolate the data that you want to predict (the dependent variable). This is typically the data you've collected, like support scores or survey results. If you are using IDs that are stored in your voter file, consider taking a subset of the IDs that you really trust. You might want to exclude IDs collected by canvassers that you don't trust, or IDs entered from an event where everyone at the event was a supporter: they aren't a balanced set of IDs and will skew the model.

2. Figure out what might correlate with the dependent variable. This is a fancy way of describing the vote history, party affiliation, and census and consumer information that you have for almost everybody on the file. You don't need to decide yourself what DOES correlate. The program will do this. But gather together the data that MIGHT.

3. The program that you use to create the model will tell you which of those variables are important, and will then create a "combination of coefficients" which it will use to create your score.
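The three steps above can be sketched in a few lines of code. This is only an illustration, not the method any particular vendor uses: it assumes logistic regression (one common way to get a "combination of coefficients" for a yes/no question), fits it with plain gradient descent, and uses a made-up ID'ed subset with two hypothetical variables, registered_democrat and voted_last_primary.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, features):
    """Probability of a 'yes' for one voter; weights[0] is the intercept."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)

def fit_logistic(rows, labels, steps=5000, learning_rate=0.5):
    """Fit the coefficients by plain gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            error = predict(weights, features) - label
            gradient[0] += error
            for j, x in enumerate(features):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights

def score(weights, features):
    """Convert the predicted probability to a 0-100 score."""
    return round(100 * predict(weights, features))

# Hypothetical ID'ed subset (invented for illustration).
# Key: (registered_democrat, voted_last_primary)
# Value: (number ID'ed as supporters, number ID'ed as non-supporters)
cells = {(1, 1): (3, 1), (1, 0): (2, 2), (0, 1): (2, 2), (0, 0): (1, 3)}
rows, labels = [], []
for features, (yes, no) in cells.items():
    rows += [features] * (yes + no)
    labels += [1] * yes + [0] * no

# Steps 1-3: dependent variable = labels, candidate variables = rows,
# and the fitted weights are the "combination of coefficients".
weights = fit_logistic(rows, labels)
scores = {features: score(weights, features) for features in cells}
```

With this toy data the scores come out near 75 for voters with both traits, near 50 for voters with one, and near 25 for voters with neither. A real model would use far more IDs and variables, and in practice you would use a statistics package rather than hand-written fitting code.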

At the end of this process, you'll have a model. The next step -- which is very important -- is to verify your results. Call through a number of people who had not been previously ID'ed, and see if the model seems like a good predictor of support. The more people you call, the better. You should probably aim to call about a hundred people whom the model scored as 30% likely to support your candidate and see if around 30 of them do. Then call around 100 people whom the model scored as 70% likely to support your candidate. If around 70 of them do, then it looks like your model is a helpful predictor of actual behavior.
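The verification check can be tallied with a short script like the one below. It is a sketch under assumptions: the call results are invented, and the function names and the ten-point bucket width are arbitrary choices for illustration.

```python
import math

def calibration_table(scored_calls, bucket_width=10):
    """Group verification calls by model score and compare with observed support.

    scored_calls: iterable of (model_score_0_to_100, supported) pairs.
    Returns rows of (bucket_low, bucket_high, calls, observed_support_pct).
    """
    buckets = {}
    for model_score, supported in scored_calls:
        low = min(int(model_score) // bucket_width * bucket_width,
                  100 - bucket_width)
        calls, yes = buckets.get(low, (0, 0))
        buckets[low] = (calls + 1, yes + (1 if supported else 0))
    table = []
    for low in sorted(buckets):
        calls, yes = buckets[low]
        table.append((low, low + bucket_width, calls, round(100 * yes / calls, 1)))
    return table

def standard_error_pct(model_score, calls):
    """Sampling noise (one standard error, in points) expected from `calls` calls."""
    p = model_score / 100
    return 100 * math.sqrt(p * (1 - p) / calls)

# Hypothetical results: 100 calls to voters scored 30 (31 supporters found)
# and 100 calls to voters scored 70 (68 supporters found).
calls = [(30, i < 31) for i in range(100)] + [(70, i < 68) for i in range(100)]
table = calibration_table(calls)
noise_at_30 = standard_error_pct(30, 100)
```

With only 100 calls at a score of 30, ordinary sampling noise is about 4.6 points, so finding 26 or 34 supporters instead of exactly 30 is not by itself a sign of a bad model. That is one reason "the more people you call, the better".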

Once your model has been verified, append it to your original voter file and start cutting better universes. Organizers and volunteers will be able to pull lists of voters who are, for example, 30%-70% likely to support your candidate for persuasion programs, and 70%+ likely to support your candidate to GOTV them.
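As a rough illustration of cutting universes from scores, the sketch below splits a scored file into persuasion and GOTV lists. The field names voter_id and support_score are hypothetical, and putting a score of exactly 70 in the GOTV list is an assumption, since the ranges above overlap at that point.

```python
def cut_universes(voters, persuasion_floor=30, gotv_floor=70):
    """Split voters into persuasion and GOTV lists by support score."""
    universes = {"persuasion": [], "gotv": []}
    for voter in voters:
        support = voter["support_score"]
        if support >= gotv_floor:
            universes["gotv"].append(voter["voter_id"])
        elif support >= persuasion_floor:
            universes["persuasion"].append(voter["voter_id"])
        # Voters below persuasion_floor are left out of both lists.
    return universes

# Hypothetical scored voter file.
voter_file = [
    {"voter_id": "A1", "support_score": 12},
    {"voter_id": "B2", "support_score": 45},
    {"voter_id": "C3", "support_score": 70},
    {"voter_id": "D4", "support_score": 88},
]
universes = cut_universes(voter_file)
```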

Examples of models

Turnout – Turnout scores are typically on a scale of 0-100. This score represents the likelihood that someone will vote in a given election. It is based on past vote history as well as demographic data to help predict a particular voter’s turnout. In general, scores around the middle (e.g. 30-70 or 20-80) represent the best targets for a GOTV program. Voters with a high score (i.e. 80+) are almost certain to vote and are not great targets for a GOTV program. Voters with a low score (0-20) are often voters who are no longer at their address (students, moved voters), where a GOTV program would not be that productive. Turnout scores can also be used in conjunction with traditional vote history. For example, one could select all dropoff voters (voted in '04 but not '02) and then narrow that screen to all voters with a turnout score between 20 and 80.
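The dropoff example can be written as a simple filter. This is a sketch only: the field names voted_2004, voted_2002 and turnout_score are hypothetical stand-ins for whatever your voter file calls them, and the 20-80 band is taken from the example above.

```python
def gotv_dropoff_targets(voters, low=20, high=80):
    """Return IDs of dropoff voters whose turnout score is in the middle band."""
    targets = []
    for voter in voters:
        # Dropoff voter: voted in 2004 but not in 2002.
        is_dropoff = voter["voted_2004"] and not voter["voted_2002"]
        if is_dropoff and low <= voter["turnout_score"] <= high:
            targets.append(voter["voter_id"])
    return targets

# Hypothetical voter file rows.
voter_file = [
    {"voter_id": "A1", "voted_2004": True, "voted_2002": False, "turnout_score": 55},
    {"voter_id": "B2", "voted_2004": True, "voted_2002": True, "turnout_score": 92},
    {"voter_id": "C3", "voted_2004": True, "voted_2002": False, "turnout_score": 8},
    {"voter_id": "D4", "voted_2004": False, "voted_2002": False, "turnout_score": 40},
]
targets = gotv_dropoff_targets(voter_file)
```

Here only A1 is selected: B2 is not a dropoff voter, C3's score is too low, and D4 did not vote in 2004.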

Candidate Scores – You may see several candidate or party ID scores. They probably mean different things, so a 70% Granholm score does not equate to 70% Granholm support. The percentiles try to account for the differences among the candidates. Selecting scores of 50%+ for either candidate should get you the best scoring half of the state.
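How the percentiles are computed is not spelled out here. One plausible reading, shown below purely as an assumption, is that each voter's raw model score is replaced by the percentage of voters on the file who score lower, so that selecting 50%+ returns the better-scoring half regardless of how the raw scores are spread. The voter IDs and raw scores are invented.

```python
def percentile_ranks(raw_scores):
    """Map each voter ID to the percentage of voters with a lower-ranked raw score.

    raw_scores: dict of voter_id -> raw model score (any scale).
    Ties are broken by the order in which voters appear in the dict.
    """
    ordered = sorted(raw_scores, key=raw_scores.get)
    total = len(ordered)
    return {voter_id: 100 * rank / total for rank, voter_id in enumerate(ordered)}

# Hypothetical raw candidate scores.
raw = {"A1": 0.91, "B2": 0.40, "C3": 0.75, "D4": 0.62}
percentiles = percentile_ranks(raw)
top_half = sorted(v for v, pct in percentiles.items() if pct >= 50)
```

Under this reading, a percentile says where a voter ranks relative to everyone else on the file, not how likely that voter is to support the candidate.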

FAQ

Q: If your original survey covered only a certain demographic group or geographic location, can you still apply your model to the entire voter file?

A: Your results are only valid for the universe of your original survey. If you only called women, your model would only be predictive of female behavior. If your IDs are only for people in Baltimore City, beware of applying a model created from them to people who live in rural areas.

Q: Given limited resources and statistical knowledge, how do candidates get through this?

A: Catalist provides lots of data points that can help in building models. If you're thinking about modeling, contact Copernicus Analytics, Ken Strasma or other modeling firms. They will talk to you about doing it properly. If there isn't budget for that, some modeling companies will work with you on your existing data, including finding gaps to fill in, rather than conducting an entirely new poll. If, for instance, all of your IDs focus on certain counties, they'll help you by telling you which counties you need to call into to create a valid model for your entire universe.

With that said, more expensive models generally produce better results, and building a home-grown model, or a model on the cheap, shouldn't be seen as an equal substitute for a full vendor model. But don't think that the $100,000 option is the only one out there.
