Traffic analyzed

Posted on Tuesday 26 February 2008
Categories: Uncategorized

My blog has been up and running for a couple of months now, and the traffic has been quite good. I’m using Google Analytics to monitor my traffic, and some interesting patterns have emerged in the data. It is perhaps not that surprising that the majority of visitors come through a search engine (about 60 %) and that the search engine is Google. The interesting thing is that about 50 % of that traffic comes from search queries containing the words “facebook” and “down”. Entering this query in Google finds my previous entry about Facebook being down for a while, at least in Finland. But the question is, why are so many people searching for information about Facebook being down?

 

The obvious explanation is that Facebook has been down and people are looking for information about it. Of course, this doesn’t mean that Facebook has actually been down for all of these people. Some of them may have had their cable disconnected or some other computer problem, so that Facebook seemed to be down when in fact that computer had lost its whole Internet connection. Still, I think the traffic may indicate that Facebook has occasionally been down in Finland, the US, the UK, Canada and Australia, where most of my traffic comes from.

Workshop: Second Life in Education, 16 April 2008

Posted on Monday 18 February 2008
Categories: Uncategorized

The workshop uses examples to explore the opportunities Second Life offers for teaching and training, and participants consider together new teaching practices from the perspectives of experiential learning, presence, visualization and dramatization. During the day, drawing on the experiences of the workshop leaders and the participants, we will work out different ways of using Second Life as an effective learning environment in different kinds of training situations and as part of teaching in different fields. The workshop does not cover the basic use of Second Life. Participants are therefore expected to have basic knowledge of Second Life (what Second Life is), but nobody is required to have any broader experience of using virtual worlds.

INTERACTIVE TECHNOLOGY IN EDUCATION (ITK) CONFERENCE

WORKSHOP DAY, 16 APRIL 2008

Registration

Join the workshop and get to grips with Second Life!

 

 

 

Programme

10.00 Introductory lecture on Second Life and the opportunities that a virtual three-dimensional world brings to training and learning.

10.30 Theme 1: Virtual spaces and experiential learning. The theme of the session is presented, after which the opportunities it offers are discussed in groups.

12.00 Lunch (at your own expense)

13.00 Theme 2: The sense of presence in Second Life. The theme of the session is presented, after which the opportunities it offers are discussed in groups.

14.30 Break

14.50 Theme 3: Opportunities for visualization and dramatization. The theme of the session is presented, after which the opportunities it offers are discussed in groups.

16.20 to 17.00 Discussion and summary

Registration

 

 

Second Life is a trademark of Linden Research, Inc. This research is not affiliated with or sponsored by Linden Research.

World of Warcraft and Coke ad in China

Posted on Friday 15 February 2008
Categories: Uncategorized

Last autumn I was using Second Life on a course in Interactive media. The choice of Second Life as a platform for delivering lectures was questioned by one of my students in her blog. She would have preferred that we used World of Warcraft. Well, the reasons for choosing Second Life are simple. First of all, Second Life gives us the freedom and the tools to create anything from futuristic lecture halls to social groups and structures to experiences, things that are not possible in World of Warcraft. And the second reason… the students would probably have kicked my ass during a lecture in WoW.

 

Here’s something for all WoW fans (and also something for me to test my new YouTube plugin for Wordpress).

 

[youtube:http://www.youtube.com/watch?v=Vc8rWbplKhg]

Dark Web Terrorism Research at the University of Arizona

Posted on Friday 15 February 2008
Categories: Uncategorized

I first found this story through a recent entry on ReadWriteWeb, which was basically a summary of the research project description from the Artificial Intelligence Lab at the University of Arizona. Here’s how the AI Lab summarizes its research goal:

 

“The AI Lab Dark Web project is a long-term scientific research program that aims to study and understand the international terrorism (Jihadist) phenomena via a computational, data-centric approach. We aim to collect “ALL” web content generated by international terrorist groups, including web sites, forums, chat rooms, blogs, social networking sites, videos, virtual world, etc.”

 

The goal is clear: collect all available data from the web (even from password-protected pages) and analyse it with various methods such as social network analysis and content analysis. Social network analysis is very interesting and has become more and more popular in various fields of research. It has also been used before to study terrorist cells and the networks they form (see Uncloaking terrorist networks and Mapping Networks of Terrorist Cells by Valdis E. Krebs, Social Network Analysis of Terrorist Organizations in India by Aparna Basu, Analyzing the Terrorist Social Networks with Visualization Tools by Yang, Liu and Sageman, and Social Network Analysis as an Approach to Combat Terrorism: Past, Present, and Future Research by Ressler, to name a few).

 

The amount of data collected for this project is enormous: two terabytes of data, 500 million pages, files and postings from over 10 000 sites. A collection of that size emphasizes the need for good automated content analysis. When the web is used as a source of information, whether it’s link data or content from web pages, the amount of data can be huge, and we need reliable tools to separate the nuggets of gold from the junk in that data.

 

There is one thing about the project that will hopefully raise some discussion, and that is the collection of content from password protected pages. It’s an ethical dilemma. Is it ethical to crawl password protected pages for research purposes? Is it even ethical to ignore the robots.txt? And can there be any exceptions? Can we crawl password protected pages if the data is collected for a “good cause”? Who should decide whether a cause is good enough?

Web Impact Factor of webometrics sites

Posted on Wednesday 13 February 2008
Categories: Uncategorized

The “traditional” formula for calculating Web Impact Factors (WIFs) is the total number of link pages (both internal and external inlinks) divided by the total number of web pages indexed by the search engine. This gives a figure for the impact the studied sites have on the web. It’s a simple measure of how often a single page has been linked to on average, or of the impact a single web page has created on average. Web sites with a higher WIF could be considered more prestigious, or to have a higher impact on the web, than sites with a lower WIF.

 

The size of my site is 648 pages and the total number of inlinks is 787, according to AltaVista. AltaVista couldn’t search for pages in my blog alone, or for link pages pointing directly to it, so I counted these for the whole site. This gives us a web impact factor of 1.21. Two other webometricians also have a presence on the web. Webometric Thoughts has 863 inlinks and 350 pages, which gives it a web impact factor of 2.47. The “original” webometrics blog has 146 inlinks and 100 pages, which gives the site a web impact factor of 1.46. Third place for me then.

 

We are all using a content management system (Joomla) or a blogging system (Blogger and WordPress MU) or both. All of these create navigational links automatically, which may have an effect on the web impact factors. We might therefore want to repeat the queries but leave the internal inlinks out. This gives my site a total of 461 inlinks and a web impact factor of 0.71. Webometric Thoughts has 526 external inlinks and a new web impact factor of 1.50. And finally the first webometrics blog has 57 external inlinks and a web impact factor of 0.57. A second place for me this time. At least I have the biggest site :-) . But is it possible to count web impact factors for blogs?
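For anyone who wants to check the arithmetic, the figures above can be reproduced with a few lines of Python. This is only a minimal sketch: the function names and the dictionary layout are my own invention for illustration, and the numbers are the AltaVista counts quoted in this entry.

```python
def web_impact_factor(inlinks: int, pages: int) -> float:
    """Return the WIF: number of link pages divided by number of indexed pages."""
    if pages <= 0:
        raise ValueError("pages must be a positive number")
    return inlinks / pages


# Counts quoted in this entry; the keys are illustrative labels only.
sites = {
    "my site": {"pages": 648, "all_inlinks": 787, "external_inlinks": 461},
    "Webometric Thoughts": {"pages": 350, "all_inlinks": 863, "external_inlinks": 526},
    "original webometrics blog": {"pages": 100, "all_inlinks": 146, "external_inlinks": 57},
}


def rank_by_wif(sites: dict, link_key: str) -> list:
    """Rank sites by WIF (rounded to two decimals), highest first."""
    scores = {
        name: round(web_impact_factor(counts[link_key], counts["pages"]), 2)
        for name, counts in sites.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for link_key in ("all_inlinks", "external_inlinks"):
        print(link_key, rank_by_wif(sites, link_key))
```

Ranking by all inlinks puts my site last of the three, and ranking by external inlinks only moves it up to second, which matches the placings above.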

 

When looking at the inlinks all three of us receive, it quickly becomes clear that perhaps even the majority of the link pages are feed aggregators collecting the content from our blogs. Surely these can’t indicate the impact of our sites? We have just used words in our entries that these feed aggregators have been looking for. It may not be possible to count Web Impact Factors for blogs, at least not with search engines, until feed aggregators can be excluded from the results. Until then, it is best to use Technorati or Alexa to rank blogs.

 

Look at http://cybermetrics.wlv.ac.uk/QueriesForWebometrics.htm for advanced queries on search engines for webometrics.

Just when I had posted a tweet…

Posted on Tuesday 12 February 2008
Categories: Uncategorized

Just when I had posted a tweet saying that you may see an increase in shorter entries, I posted an 11-page manual for BibExcel.

Manual: Co-word analysis on BibExcel

Posted on Tuesday 12 February 2008
Categories: Uncategorized

BibExcel is a bibliometric toolbox developed by Professor Olle Persson, and this manual is based on a manual written by him. Download this manual as a PDF file from here.

 

****************************************************

 

Preparing the data 

 

Data that is collected from, for example, Web of Science, WinSpirs/Silverplatter or Endnote must be converted before it can be analyzed. Picture 1 below shows a bibliographic description in a format that BibExcel can read. Every field must end with a “spike” (|) and the last field must end with a “double spike” (||). Field tags are given with two letters, followed by a hyphen and a space.

 

FN- DIALOG(R)File   7:Social SciSearch(R)| CZ- (c) 1999 Inst. for Sci Info. All rts. reserv.|
AZ- 03282328|
GA- 160PZ|
TI- Collaboration and author productivity: A paper with a new variable inLotka’s law|
AV- ABSTRACT AVAILABLE|
LA- English|
AU- Gupta BM (REPRINT); Karisiddippa CR|
CS- NATL INST SCI TECHNOL & DEV STUDIES,SCIENTOMETR & INFORMETR GRP/NEW DELHI 110012//INDIA/ (REPRINT); KARNATAK UNIV,DEPT LIB & INFORMAT SCI/DHARWAD 580003/KARNATAKA/INDIA/|
GL- INDIA|
JN- SCIENTOMETRICS, 1999, V44, N1, P129-134|
PU- ELSEVIER SCIENCE BV, PO BOX 211, 1000 AE AMSTERDAM, NETHERLANDS|
SN- 0138-9130|
PY- 1999|
DT- Article|
NR- 8|
SF- CC SOCS–Current Contents, Social & Behavioral Sciences|
SC- INFORMATION SCIENCE & LIBRARY SCIENCE|
AB- The paper explores the possibility of using a new variable represented  by the number of collaborators per author as a substitute for the number of papers in Lotka’s distribution to predict the productivity strata.On the basis of a case paper in theoretical population genetics it is concluded|

CR- FELSENSTEIN J, 1981, BIBLIO THEORETICAL P
    LINDSEY D, 1980, V21, P111, SOC STUD SCI
    NICHOLLS PT, 1988, V39, P287, J AM SOC INFORM SCI
    NICHOLLS PT, 1989, V40, P379, J AM SOC INFORM SCI
    PRICE DJD, 1966, V21, P1011, AM PSYCHOL
    QIN J, 1995, P445, 5 INT C INT SOC SCIE
    ROUSSEAU R, 1988, V39, P287, J AM SOC INFORM SCI
    ROUSSEAU R, 1993, V49, P409, J DOC||
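To make the format rules concrete, here is a minimal Python sketch that reads records written in this way. It is not BibExcel’s own code: the function name is hypothetical, and it assumes that a spike never occurs inside a field value and that every tag consists of two capital letters followed by a hyphen and a space.

```python
import re

# Two capital letters, a hyphen and a space, then the field value.
FIELD_PATTERN = re.compile(r"([A-Z]{2})- (.*)", flags=re.DOTALL)


def parse_dialog_records(text: str) -> list:
    """Split text into records (ended by "||") and fields (ended by "|").

    Each record becomes a dict that maps a field tag to a list of value
    lines, so a multi-line field such as CR keeps one entry per line.
    """
    records = []
    for raw_record in text.split("||"):
        if not raw_record.strip():
            continue  # skip whatever follows the last double spike
        record = {}
        for raw_field in raw_record.split("|"):
            match = FIELD_PATTERN.match(raw_field.strip())
            if match is None:
                continue  # ignore empty pieces or text without a tag
            tag, value = match.groups()
            record[tag] = [line.strip() for line in value.splitlines() if line.strip()]
        records.append(record)
    return records


if __name__ == "__main__":
    # A shortened, made-up record in the same layout as the example above.
    sample = "TI- A short title|\nPY- 1999|\nCR- AUTHOR A, 1981, SOURCE\n    AUTHOR B, 1980, SOURCE||\n"
    print(parse_dialog_records(sample))
```

Splitting on the double spike first and on the single spike second mirrors the rule that the last field of a record closes the record as well.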

 

When you save a file from WoS it is automatically named Savedrex and given the type txt. In other words, the file is saved as a text file. You then need to re-save the file. Open it in Microsoft Word or some similar program and save it as a plain text file. In the small window that opens next, mark the box “Insert line breaks” and make sure CR/LF is chosen. This is a good time to change the name, so that the original file does not get messed up, even by accident.
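If you would rather not go through Word, the same re-saving step can probably be done with a short script. The following is a hedged Python sketch, not part of BibExcel: the function names are hypothetical, and the UTF-8 default encoding is an assumption that you may need to change for your own export.

```python
from pathlib import Path


def to_crlf(text: str) -> str:
    """Normalise every line ending (CR, LF or CR/LF) to CR/LF."""
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified.replace("\n", "\r\n")


def resave_with_crlf(source: Path, target: Path, encoding: str = "utf-8") -> None:
    """Write a CR/LF copy of source to target and leave source untouched."""
    if source.resolve() == target.resolve():
        raise ValueError("choose a new file name so the original is not overwritten")
    text = source.read_bytes().decode(encoding)
    target.write_bytes(to_crlf(text).encode(encoding))
```

A call could look like `resave_with_crlf(Path("export.txt"), Path("export_crlf.txt"))`, where both file names are placeholders. Writing to a new name keeps the original file safe, just as recommended above.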

 

Next you need to open this new file in BibExcel and convert it to Dialog format.

 

1. Mark directory, in this case drive C:

 

2. In Select file here, open the folder the file is located in and choose the file from the right.

 

3. Go to Miscellaneous (Misc), select Convert to Dialog-format and mark Convert from Web of Science. Click OK, and the file is converted to Dialog format 4 and saved as a .doc file. The original file is left untouched.

 

 

You can view these two files by selecting them in the box above the “View file” button and clicking on “View file”. The file is then shown in “The list”. You can compare the files before and after the conversion and see how the delimiters, the spikes and double spikes, have been inserted.

 

 
 

Co-word analysis

 

1. Open your data file in BibExcel (.txt). Choose “View file” to open it in The List.

 

 

 

2. In the “Frequency distribution” box, choose “Whole string” from the drop-down menu, check the checkbox labeled “Make new out-file” and type DE (Descriptors, or whatever label you use) in the Old tag field. This will create a new file, a .oux file.

 

 

 

3. Select the .oux file and open it in The List by clicking on “View file”. Then, in the “Select field to be analysed…” box, choose “Any; separated field” from the drop-down menu and press the Prep button. This will create a new .out file, in which all the keywords are listed by case.

 

 

 

4. Open the .out file in The List. From the Analyze menu, choose “Frequencies, using outfile-type”. This will create a .frg file in which the keywords are listed with their frequencies (how many times each keyword has appeared in all the cases).

 

 

 

5. Copy these by clicking on Copy and move them to Excel. In Excel, sort the keywords by frequency in descending order. Then copy the most frequent keywords and their frequencies back to BibExcel.

 

If you choose too many keywords, the maps will be too big to draw or to interpret properly. If you choose too few keywords, you will probably not get any interesting results. So choose wisely. Also make sure that the cut-off falls at a meaningful frequency. If, for example, the ten most frequent keywords all have a frequency over 50 and the rest of the keywords have only appeared a couple of times each, then it probably makes sense to take only the ten most frequent keywords for further analysis. Around 40-50 keywords still makes for quite nice maps. The maximum is perhaps around 80-90 keywords.

 

Clear The List by clicking on Clear. Then paste the keywords into The List in BibExcel by clicking on the Paste button.

 

 

 

6. Now select the .out file in the file list and choose Analyze –> Co-occurrence –> Make pairs via listbox. Answer NO to the first question and OK to the second. This will create a .coc file.

 

 

 

 

7. Continue with the .coc file and choose Analyze –> List units in pairs. This will result in a .ccc file.

 

 

 

8. Open the .ccc file in The List and select the .coc file in the file list.
Choose Analyze –> “Make a matrix” and answer the questions in a way that suits your research. Probably: OK to the first question, YES to Lower left matrix, YES to sorting the columns, NO to sorting them numerically and finally OK.

 

 

 

9. Then choose Analyze –> Make a map/Systat… and answer OK to all the questions.

 

 

 

10. In the window that opens, enter “Submit map”. Then, back in BibExcel, choose Analyze –> Show map.

 

 

 

11. Choose colors as you please and return to BibExcel by double-clicking on the map.

 

 

 

12. Open the .lab file in The List by clicking on “View file”. Then click on “Show map”.

 

 

 

13. In the map click on Labels.

 

 

 

14. Return to BibExcel by double-clicking on the map. Clear The List and Paste the frequencies that you copied earlier from Excel.

 

 

 

 

 15. Return to the map, click Show map, check the checkbox Zoom+ and click on Circle size. This will change the circle sizes to match their frequencies.

 

 

 

And you’re done. Congratulations!
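For readers who would like to see the logic behind the menus, the counting done in steps 2 to 8 can be approximated in a few lines of Python. This is only a sketch of the general co-word idea and not BibExcel’s actual implementation. The function names and the sample data are made up, and I assume that descriptors are separated by semicolons and that each keyword is counted once per publication.

```python
from collections import Counter
from itertools import combinations


def split_descriptors(field: str) -> list:
    """Split a semicolon-separated descriptor field into clean keywords."""
    return [part.strip() for part in field.split(";") if part.strip()]


def keyword_frequencies(documents: list) -> Counter:
    """Count in how many documents each keyword appears."""
    counts = Counter()
    for keywords in documents:
        counts.update(set(keywords))
    return counts


def cooccurrence_counts(documents: list, selected) -> Counter:
    """Count how often two selected keywords describe the same document."""
    selected = set(selected)
    pairs = Counter()
    for keywords in documents:
        present = sorted(set(keywords) & selected)
        pairs.update(combinations(present, 2))  # pairs are alphabetically ordered
    return pairs


def lower_left_matrix(pairs: Counter, labels) -> tuple:
    """Return the sorted labels and the lower-left triangle of the matrix."""
    labels = sorted(labels)
    rows = [
        [pairs[(column, row)] for column in labels[:index]]
        for index, row in enumerate(labels)
    ]
    return labels, rows


if __name__ == "__main__":
    # Three made-up descriptor fields, one per publication.
    fields = ["web; links; blogs", "links; web", "blogs; web"]
    documents = [split_descriptors(field) for field in fields]
    frequencies = keyword_frequencies(documents)
    pairs = cooccurrence_counts(documents, frequencies.keys())
    print(frequencies)
    print(lower_left_matrix(pairs, frequencies.keys()))
```

To mimic step 5, pass only the most frequent keywords as the selected set instead of all of them. Drawing the map itself is left to BibExcel or to any graph tool you prefer.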

Content analysis of publications from communication research

Posted on Tuesday 12 February 2008
Categories: Uncategorized

Some time ago I did a content analysis of publications from communication research in Finland. This was a small project commissioned by the University Network for Communication Sciences in Finland. I explored the possibilities of comparing research within the field of communication and information research. The goal was to look at what 15 different departments in the network might have in common and also to develop some methods for doing this.

I collected references from the 2964 publications that I found from the last ten years from 13 departments. I collected the data partly from university libraries and Nordicom’s database and partly by contacting the departments and researchers and asking them to send me lists of their publications. The publications were then indexed with Nordicom’s thesaurus and converted to a format that BibExcel could use. BibExcel was then used to do a co-word analysis of the material.

 

There were some challenges and problems during the data collection. There were great differences in the number of publications between departments. Some departments are more practically oriented and do not have that many publications. It was also difficult to find all available publications, and some publications may have been left outside the analysis. Another challenge was indexing. I indexed all the publications alone, and some of them dealt with topics unfamiliar to me. In these cases I was forced to use the titles for indexing, and in some cases where the titles were not descriptive enough I just had to exclude the publication from the analysis. This may have affected the proportion of general and specific keywords. Another problem was that Nordicom has used three different thesauruses, and libraries also use different thesauruses. So indexing almost 3000 publications was a somewhat creative task. It is also unclear how the included master’s theses might have influenced the results. It can be argued that master’s theses represent the research in the departments, but there might be some differences. The last problem was time. I had a month to do this.

The data was then analyzed with the bibliometric toolbox BibExcel. The first graph was the graph of the whole network. I then used the most frequent words from each department to draw the frequencies on the graph based on the whole network. All the departments gave slightly different graphs, or frequencies for different words, but the underlying graph was stable and didn’t change with the department. This way I could compare the patterns each department gave on the graph of the whole university network. Similar patterns indicated similar research, because the same words had been used to describe the research field(s) in the departments.

 

Below are the frequencies of appearance of co-words in the publications from six departments. The patterns are quite similar, indicating similar publications, or topics of publications, hence, similar research. The combining factor for these departments is communication.

There are three departments of information studies in Finland. These can be seen from the image below. These also have a quite similar pattern, indicating similar research.

And finally four more departments which could be called outliers, as they do not clearly share patterns with any other departments.

As a small final exercise I asked the professors from each department to send me key terms describing the research at their department. This exercise was not in any way scientific, because the words were chosen by just one person, but I thought it would be interesting to compare what the professors thought of the research at their department with the actual research there. I had two lists of words: the list given by the professors and a list of the most frequent key words from the publications. Then I compared these lists, as shown in the figure below.

This exercise was more of a test of the method and the results. Did I get the right key words? Had I made any major errors while indexing the publications? The exercise showed that most of the words chosen by the professors were among the most frequent words, which supports the reliability of the results.
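The comparison of the two word lists boils down to a simple overlap count. Here is a minimal Python sketch of that idea. The function name and the example terms are invented for illustration and are not the real lists from the study, and I assume that terms match when they are identical after lower-casing.

```python
def compare_term_lists(professor_terms, frequent_keywords):
    """Return the shared terms and the share of professor terms that were found."""
    professor = {term.strip().lower() for term in professor_terms}
    frequent = {term.strip().lower() for term in frequent_keywords}
    shared = sorted(professor & frequent)
    share = len(shared) / len(professor) if professor else 0.0
    return shared, share


if __name__ == "__main__":
    # Made-up example terms, not data from the study.
    shared, share = compare_term_lists(
        ["Media", "journalism", "semiotics"],
        ["journalism", "media", "internet"],
    )
    print(shared, share)
```

Exact matching is a strict criterion. Terms from different thesauruses would need to be mapped to each other first, which is the same indexing problem described above.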

 

And because I had the data, I used it to draw the co-word networks for each department. Below is the pattern for my department. For comparing departments, these do not have as much value as the earlier graphs, which were based on a single graph of the whole network, but these single graphs were of more interest to the departments themselves.

 

The results showed that the departments in the university network did indeed have something in common, but that there were some outliers as well. The most interesting part of this study was the method development. The method used here could be useful in studies where one might, for instance, want to compare how smaller networks or clusters within a larger network relate to each other. The data doesn’t have to be publications and the connections don’t have to be co-words. The data could be social ties or information flow. My last piece of advice for anyone considering this kind of study: do not index 3000 publications by yourself. Get some student to do it or do some other study.

You may see more of short entr…

Posted on Monday 11 February 2008
Categories: Uncategorized

You may see more short entries in the future (140 characters), as I’ve now embedded my Twitter stream (www.twitter.com) on my blog.

I’m trying to embed my tweets …

Posted on Monday 11 February 2008
Categories: Uncategorized

I’m trying to embed my tweets directly on my blog.
