Monday, June 6, 2011

Day I Don't Know but I'm Pretty Much Done!

Sooo I wanted to try a confidence interval of just one proportion to see another way of testing the hypothesis. So I did a confidence interval for the national data, which would look like

.7059+/- (1.96)*sqrt(.7059*.2941/20562119)

This gave me the interval (.7057, .7061), which is really ridiculously small, but unsurprisingly so, considering the size of the n. If we do the same for my sample data, we get (.9133, .9933), which doesn't overlap the national interval at all. Since the two intervals are nowhere near each other, we can reject the null hypothesis that p1=p2.
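For anyone who wants to redo the arithmetic, here is a rough Python sketch of the two one-proportion intervals. The function name one_prop_ci is made up for illustration, and it assumes the usual normal-approximation formula with a critical value of 1.96.

```python
import math

def one_prop_ci(p_hat, n, z_star=1.96):
    """Normal-approximation interval: p_hat +/- z_star * sqrt(p_hat*(1-p_hat)/n)."""
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# National figure (70.59% of about 20.56 million) and my sample (102 of 107).
national = one_prop_ci(0.7059, 20562119)
sample = one_prop_ci(102 / 107, 107)

print("national:", tuple(round(v, 4) for v in national))  # roughly (0.7057, 0.7061)
print("sample:  ", tuple(round(v, 4) for v in sample))    # roughly (0.9133, 0.9933)

# Two intervals overlap only if each one starts before the other one ends.
overlap = sample[0] <= national[1] and national[0] <= sample[1]
print("overlap?", overlap)
```

Comparing two separate intervals like this is a quick eyeball check rather than a formal test, but with a gap this wide it points the same way as the z-tests.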

Yet another way to do this, and probably the easier way, is to do the 1-proportion z-test. We put in the assumed value, the national proportion, as p0 (I was unsure about this at first, considering that this data was also gathered by another amateur, but since we are assuming, I'll accept it), and then put x in as 102 and n in as 107, with the alternative that the proportion is not equal to p0. When I calculated, I got the same z-value and p-value as I did with the 2-proportion z-test, confirming that this null hypothesis should be rejected.
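The 1-proportion z-test can be sketched in a few lines of Python. This is only a sketch of the standard formula, not what the calculator does internally, and one_prop_z_test is a made-up name.

```python
import math
from statistics import NormalDist

def one_prop_z_test(x, n, p0):
    """Two-sided 1-proportion z-test using the null value p0 in the standard error."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value: area in both tails beyond |z| under the standard normal.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_prop_z_test(x=102, n=107, p0=0.7059)
print("z =", round(z, 2))      # about 5.6
print("p-value =", p_value)    # far below 0.05
```

The match with the 2-proportion test is not a coincidence: with about 20.5 million people in the national group, the pooled proportion is almost exactly p0 and the 1/n2 term in the standard error is practically zero, so the two formulas collapse to the same number.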

I think that's it. I've tried pretty much everything I can think of to test this hypothesis. While there are a few flaws in how I collected the data, and the normal model should be used cautiously here, I believe that even with perfect data we would still have to reject the null hypothesis, just because of the sheer difference in proportions. I believe this also suggests that social networking is growing in importance and that the number of kids using Facebook is in fact much larger than originally thought. It looks like Facebook is here to stay. Now, about Twitter...

Wednesday, May 25, 2011

Day 8

OKAY, so let's do this shindig:

Statistic: difference of sample proportions
p1= 102/107 (observed) p2= 14514800/20562119 (expected)
p1= 95.3% p2= 70.59%
p1-p2= .2473710184

Standard Deviation of the Statistic (standard error): sqrt[(.9533*.0467/107)+(.7059*.2941/20562119)]
= .0204

95% confidence interval, two-tail; critical value= 1.96

Confidence Interval: .2474 +/- 1.96*(.0204)
(.207416, .287384)
We can be 95% confident that the true difference between proportion 1 and proportion 2 is between .207 and .287. Unfortunately, this interval does not include 0, so we would reject the hypothesis that p1 is equal to p2.
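The interval for the difference can be reproduced with a short Python sketch. It assumes the unpooled standard error written out above, and two_prop_ci is a made-up name.

```python
import math

def two_prop_ci(x1, n1, x2, n2, z_star=1.96):
    """Interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

low, high = two_prop_ci(102, 107, 14514800, 20562119)
print(round(low, 3), round(high, 3))  # roughly 0.207 and 0.287

# If 0 were inside the interval, "no difference" would still be plausible.
contains_zero = low <= 0 <= high
print("contains 0?", contains_zero)
```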

Tuesday, May 24, 2011

Day 7

Finally have all of my data collected! I have answers on whether one has facebook or not for 107 people, which I think will be enough to assume a normal model.
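One rough way to sanity-check the normal model is the success/failure rule of thumb: both the expected number of "yes" and "no" answers should be at least 10. The cutoff of 10 is the usual textbook value and is an assumption here, as is the helper name success_failure_counts; the counts and the 70.59% national figure are the ones worked out further down in this post.

```python
def success_failure_counts(n, p):
    """Expected successes and failures for sample size n and proportion p."""
    return n * p, n * (1 - p)

n = 107
threshold = 10  # textbook rule of thumb, assumed here

# Under the null hypothesis we use the national proportion (what the z-test relies on).
under_null = success_failure_counts(n, 0.7059)
# For an interval built around the sample, we use the observed proportion.
observed = success_failure_counts(n, 102 / 107)

for label, counts in (("under null", under_null), ("observed", observed)):
    ok = min(counts) >= threshold
    print(label, [round(c, 1) for c in counts], "OK" if ok else "shaky")
```

Under the null both counts clear the cutoff, but the observed sample has only 5 "no" answers, which is one reason to treat the sample-based interval with some caution.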

I crossed off the one proportion thing I did in the last post because it was wrong. I'll try to do the correct one today, as well as getting all the data into the 2 proportion z-test.

After that, I only have the two-page paper detailing my analysis of the data collected! yayyyyyyy~

Onto the actual stuff:
Proportion of people on Facebook: 102/107
5 people NOT on Facebook
All replied YES to going on the Internet/going online for leisure (not surprising at all)

84.5% of people 12-24 go online
87% of people 12-17 go online
27.4% of the total US population is under 20 (according to census projections from 1996)
24.3% of the US population is under 18 (2009 estimate)
9.3% of people on Facebook are 17 or under: 14,514,800 people (as of today)

The problem at this point is figuring out which of these proportions I should use to build the proportion of people under 18 nationally who are on Facebook. The issue is that checkfacebook.com, which is the best way to see the percentage and the people, only has a group that goes from less than 13 to 17, and then a separate 18-24 group. What to do?

Yeah, I think I'll use the under 18 years stats with the 13-17 stats, since it doesn't specifically say that the under-18 group includes 18? So I think that will work.

24.3/100 = x1/307006550 (2009 estimate)
x1 = 74602591.65 kids under 18 years
87/100 = x2/74602591.65
x2 = 64904254.74
14514800/x2 = 22.4%, which I'm fairly sure is totally inaccurate.

Now I'm wondering whether it would be more accurate to use the statistic on checkfacebook.com that 70.59% of the online population uses Facebook, and use that to find the total number of kids under 18 who are online. It would make a lot more sense than the confused stuff I did up there >.>

70.59/100= 14514800/x
x=20562119.28, which says that there are about 21 million kids under 18 who go online. That makes more sense, and it keeps the data from the same place and not from a census from 2009 or a census from 1995 or whatever. I think I'll use that as my proportion of national online population of kids under 18 who use Facebook.
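Both ways of building the denominator are just proportion arithmetic, so they are easy to lay side by side in Python. The variable names are made up for illustration; the numbers are the ones quoted in this post.

```python
fb_17_and_under = 14_514_800  # Facebook users 17 or under (checkfacebook.com)

# Route 1: census share of kids under 18, then the 87% who go online.
us_population_2009 = 307_006_550
kids_under_18 = 0.243 * us_population_2009
kids_online_census = 0.87 * kids_under_18
share_census = fb_17_and_under / kids_online_census

# Route 2: work backwards from the 70.59% figure on checkfacebook.com.
kids_online_implied = fb_17_and_under / 0.7059

print(round(kids_under_18, 2))        # about 74.6 million
print(round(kids_online_census, 2))   # about 64.9 million
print(round(share_census, 3))         # about 0.224, i.e. 22.4%
print(round(kids_online_implied, 2))  # about 20.56 million
```

The two routes disagree by a factor of about three on how many kids are online, which is a good reason to keep every number from one source instead of mixing census and Facebook figures.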

2-prop z-test tells me it's totally off and that p1 ≠ p2. I'll do more of that tomorrow.
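Here is a minimal Python sketch of a 2-proportion z-test, assuming the usual pooled standard error under the null hypothesis p1 = p2. The function name two_prop_z_test is made up, and this is not necessarily the exact routine the calculator runs.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided 2-proportion z-test with a pooled proportion under H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = two_prop_z_test(102, 107, 14514800, 20562119)
print("z =", round(z, 2))            # about 5.6
print("reject at 5%?", p_value < 0.05)
```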

Monday, May 23, 2011

Day 6

Information:
87% of internet users 12-17
82% of internet users 18-24
Average % of internet users 12-24 is 84.5

84.5/100 = 61525080/x
x = 72810745.56 (incorrect assumption, trying to fix it now)

Tests: 2 proportion z-test
Confidence interval: unsure?
95% confidence- two-tail, so 2.5% on each side
z-score of conf interval: 1.96 and -1.96
mean and std dev: done later after data collected
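The 1.96 comes straight from the standard normal curve, and Python's standard library can confirm it. A small sketch, with made-up variable names:

```python
from statistics import NormalDist

confidence = 0.95
tail = (1 - confidence) / 2  # 2.5% in each tail for a two-tailed interval

# inv_cdf returns the z-score with the given area to its left.
z_star = NormalDist().inv_cdf(1 - tail)
print(round(z_star, 2))   # 1.96
print(round(-z_star, 2))  # -1.96
```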

Sources: Statistics textbook
              US Internet Users by Age Group (probably not using this)
              Facebook Stats

At this point, I just need my data done, and I'll have everything I need to get finished.

Sunday, May 22, 2011

Day 5

Short post today-- finally found the site I was looking for in terms of data collection. It's not a perfect fit, as this is from 12-24 and I was going to do 14-24, but I can include the little slice of 13 years and less and make it okay. As for which percentage to use, since there are two percentages for the 12-17 and 18-24 range, I think I'll just take the average and then do some nifty proportion work (84.5/100 = the number of users on facebook/x) and find that number so I can use it for my n.

As it is the weekend, and a Sunday, I will leave my heavy lifting for the weekdays. I forgot to collect data on Friday, and since Monday is Special Olympics, I'll finish up data on Tuesday. Then it's a matter of doing the right test and analyzing the data and writing the 2-page paper. Should be finished by this time next week~! unless I procrastinate

I have 13 page views! And not all of them are mine because 2 are from Germany! Go, German statisticians~

Friday, May 20, 2011

Day 3

So, I have the majority of my data at this point--just need to ask the kids in my chem class, and if I can get 9 more data sets, I'll have a nice rounded 100 pieces of data. From the data so far, I have a feeling that the proportions will not be equal to each other, considering that I only have 2 'nos' for being on facebook.

However, I think I'll be changing the proportion I'm using. Initially I said I would use the overall online population in America as my proportion. I realized that asking only high school students would introduce a huge error in this experiment, since I wasn't accounting for the older generation, who would definitely be more likely not to have a facebook than high school kids. I just found a little pie chart thing on checkfacebook.com that shows the age groups of the people using it, so I can add together the 14-17 and 18-24 age ranges and divide that sum by the total number of people that age who are online, so I can have a more accurate number. However, there is no specific data on the website that states the number of people from 14-24 who are online, so I may have to look that up.

This is actually pretty fun, if I don't think of it overmuch!

ETA: some cool websites I found while trying to find out online population. Of course they give us the online population of 18-24 year olds in the UK and Ireland but not in the US. Obstacles, obstacles.
ETA2: another interesting one. I have no idea if it's relevant, but I'm just linking sites on here so I can look them up later~

Thursday, May 19, 2011

Day 2

I find it amusing that this blog has a section that says Stats, which shows how many people clicked on my blog. So far there are 3 views, which is exactly how many times I checked this place. Hmm, is there a correlation?

So hypothesis testing-- I think it's a fairly easy null and alternative hypothesis to test. Null hypothesis would be that probability 1 = probability 2, in which the first probability is from the collected data and the second probability is from the national data. The alternative hypothesis will be that probability 1 is not equal to probability 2. I'm not positive about a confidence interval and if I need one, but if I do I'll use a 95% confidence interval.

As for data sampling, I'll do cluster sampling based on my classes, mostly because of ease. Of course, I am making a big assumption that the percentage of kids who go on facebook/the internet and are in 6.0 or AP is the same as those in 4.0 and 5.0, and to be honest the better idea would be a stratified sample, but as I am a bit short on time I'll make the assumption and have that as a possible error in the experiment. I am also assuming that the distribution is approximately normal even though I'll have only about 100 people as data, so that's another shaky assumption. Though from my experience the internet is blind to school level/grades, so I wouldn't expect too large of an issue.

I'll make a nice little sheet for people to write yes or no for having facebook, and I'll start today! Hopefully I can finish by tomorrow~