# Web of Data / Web of Documents: Experiments and results
### - Dataset for Evaluating Query-biased Ranking of LOD entities
* Data Collection
* Queries:
* We randomly selected 30 queries from the “Yahoo! Search Query Tiny Sample” provided by [Yahoo! Webscope](http://webscope.sandbox.yahoo.com/catalog.php?datatype=l)
* Documents:
* We submitted the queries to the Google search engine and kept the top-5 Web pages for each query, yielding 150 HTML Web pages.
* We extracted the text of each page.
* Finally, we annotated each text using [DBpedia Spotlight](https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki).
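For illustration, the annotation step can be scripted against the DBpedia Spotlight REST API. The sketch below is a minimal Python version assuming the public Spotlight endpoint and a confidence threshold of 0.5; both are assumptions, not necessarily the settings used in this work.

```python
# Minimal sketch of the annotation step.
# Assumptions: the public DBpedia Spotlight endpoint below, and an
# illustrative confidence threshold of 0.5.
import requests

SPOTLIGHT_URL = "http://api.dbpedia-spotlight.org/en/annotate"  # assumed endpoint

def annotate(text, confidence=0.5):
    """Return (DBpedia URI, surface form) pairs spotted in `text`."""
    response = requests.get(
        SPOTLIGHT_URL,
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    # "Resources" is absent from the JSON answer when no entity is found.
    return [(r["@URI"], r["@surfaceForm"])
            for r in response.json().get("Resources", [])]

print(annotate("Barack Obama was born in Hawaii."))
```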
* CrowdSourcing
* MicroTasks
* Considering the length of our texts, evaluating all the annotations of a Web page would be too demanding. We therefore divided this task into smaller “microtasks”: a microtask consists in scoring the relevance of the annotations of a single sentence (see the sketch after this list).
* We used the [CrowdFlower](http://www.crowdflower.com/) crowdsourcing platform.
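A minimal sketch of the splitting into microtasks, assuming NLTK's Punkt sentence tokenizer and a simple `{"offset": ...}` structure for the Spotlight annotations (both assumptions; the actual splitter and data structures used in this work are not specified here):

```python
# Minimal sketch: one microtask per sentence, carrying the annotations that
# fall inside that sentence. Assumes NLTK's Punkt tokenizer and annotations
# given as dicts with a character "offset" into the page text.
import nltk

nltk.download("punkt", quiet=True)  # Punkt sentence-boundary models

def to_microtasks(page_text, annotations):
    tasks, offset = [], 0
    for sentence in nltk.sent_tokenize(page_text):
        start = page_text.index(sentence, offset)
        end = start + len(sentence)
        offset = end
        # Keep only the annotations whose offset falls inside this sentence.
        inside = [a for a in annotations if start <= a["offset"] < end]
        tasks.append({"sentence": sentence, "annotations": inside})
    return tasks
```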
* Quality Control
* Workers had a maximum of 30 minutes to provide an answer.
* Workers had to spend at least 10 seconds on the job before giving an answer.
* We measured inter-worker agreement with Krippendorff’s alpha coefficient (a sketch follows this list).
* To improve the quality of our dataset, we removed the workers who often disagreed with the majority.
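A minimal sketch of the agreement computation, assuming the `krippendorff` PyPI package and an ordinal treatment of the relevance scores (both assumptions); `nan` marks a judgment a worker did not provide:

```python
# Minimal sketch of computing Krippendorff's alpha over worker judgments.
# Assumptions: the `krippendorff` PyPI package, ordinal relevance scores.
import numpy as np
import krippendorff

# Rows are workers, columns are annotations; values are relevance scores.
ratings = np.array([
    [1, 2, 3, np.nan],
    [1, 2, 2, 3],
    [np.nan, 2, 3, 3],
])
alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="ordinal")
print(f"Krippendorff's alpha: {alpha:.3f}")
```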
* Aggregation of the Results
* We used majority voting to aggregate the judgments within each sentence.
* We used the same majority-voting strategy to aggregate the results at the level of a Web page (a sketch follows this list).
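A minimal sketch of this majority-voting aggregation; how ties were actually broken in this work is not specified, so the arbitrary tie-breaking below is an assumption:

```python
# Minimal sketch of majority voting, first per sentence, then per Web page.
# Ties are broken arbitrarily here (an assumption).
from collections import Counter

def majority_vote(judgments):
    """Most frequent relevance score among the workers' judgments."""
    return Counter(judgments).most_common(1)[0][0]

sentence_scores = [majority_vote(j) for j in [[2, 2, 3], [1, 1, 2], [3, 3, 3]]]
page_score = majority_vote(sentence_scores)
print(sentence_scores, page_score)
```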
* Data
* [Used Queries (TXT)](./data/ranking_dataset/queries.txt)
* [Extracted & Annotated Text (ZIP)](./data/ranking_dataset/html_text.zip)
* [Sentences (ZIP)](./data/ranking_dataset/sentences.zip)
* [CrowdSourcing raw data (CSV)](./data/ranking_dataset/raw_data.csv)
* [Aggregated dataset (CSV)](./data/ranking_dataset/aggr_data.csv)
### - Resource Ranking
* Algorithms
* [LDRANK v0.9](./data/ldrank_09.zip) source code under the GPL licence; it contains the four algorithms compared in this work: (i) LDRANK (labeled "HIT+SVD+EQUI" in the graphs below), (ii) the algorithm proposed by [Fafalios and Tzitzikas](http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6881999) (labeled "HIT"), (iii) a new algorithm based on the SVD decomposition (labeled "SVD"), and (iv) a basic PageRank (labeled "EQUI"; a sketch follows this list)
* [Dataset](./data/inputs_20150117_eval_trust.zip) obtained from crowdsourcing and used to compare the algorithms
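For illustration, here is a minimal sketch of the plain PageRank baseline labeled "EQUI" above, using power iteration; the damping factor of 0.85 is the conventional choice, not necessarily the one used in LDRANK v0.9 (see the source code above for the actual implementations):

```python
# Minimal sketch of plain PageRank via power iteration (the "EQUI" baseline).
# The damping factor 0.85 is an assumed, conventional value.
import numpy as np

def pagerank(adjacency, damping=0.85, iterations=100):
    n = adjacency.shape[0]
    out_degree = adjacency.sum(axis=1, keepdims=True)
    # Row-stochastic transition matrix; dangling nodes jump uniformly.
    transition = np.where(out_degree > 0,
                          adjacency / np.maximum(out_degree, 1),
                          1.0 / n)
    rank = np.full(n, 1.0 / n)
    for _ in range(iterations):
        rank = (1 - damping) / n + damping * rank @ transition
    return rank

graph = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [0, 1, 0]], dtype=float)
print(pagerank(graph))
```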
* Results:
* You can download the [scripts](./data/ldndcg.zip) used to generate the effectiveness (NDCG) and efficiency results from the algorithms and the dataset (start by looking at the script named "go.sh"); an NDCG sketch follows the figures below.
![NDCG](./img/ndcg.png)
![Performance](./img/clock.png)
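For reference, a minimal sketch of how NDCG@k can be computed from graded relevance judgments, assuming the standard log2 discount; the exact variant implemented by the scripts above may differ:

```python
# Minimal sketch of NDCG@k with the standard log2 discount (an assumption;
# the evaluation scripts above may implement a different variant).
import math

def dcg(relevances, k):
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    """`relevances`: graded judgments in the order the system ranked the entities."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 3, 0, 1], k=5))
```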
### - Sentence Selection
* [Full technical report](./pdf/ml_technical_report.pdf)
* Download datasets:
* Original datasets
* [concept-sentence dataset](./data/concept.csv)
* [main-sentence dataset](./data/main.csv)
* Datasets after resolving the unbalanced classes problem (using the SMOTE filter)
* [concept-sentence dataset](./data/concept.smote.arff)
* [main-sentence dataset](./data/main.smote.arff)
* Software used:
* [Weka](http://www.cs.waikato.ac.nz/ml/weka/).
* [SMOTE Filter](http://weka.sourceforge.net/doc.packages/SMOTE/weka/filters/supervised/instance/SMOTE.html).
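A minimal sketch of driving the SMOTE filter from Python through Weka's command-line interface; the classpath, file names, and the -P/-K values are illustrative assumptions, not the settings used to produce the ARFF files above:

```python
# Minimal sketch: apply Weka's SMOTE filter via its command-line interface.
# The classpath, file names, and option values below are assumptions.
import subprocess

subprocess.run([
    "java", "-cp", "weka.jar:smote.jar",  # assumed Unix-style classpath
    "weka.filters.supervised.instance.SMOTE",
    "-i", "main.arff",        # hypothetical ARFF export of main.csv
    "-o", "main.smote.arff",
    "-c", "last",             # class attribute is the last one
    "-P", "100",              # amount of synthetic instances, in percent
    "-K", "5",                # nearest neighbours used to synthesize instances
], check=True)
```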
* Results:
* results using all the features
![results using all the features](./img/ml_results.png)
* machine learning performance results over the main-sentence dataset (using feature selection)
We apply an **InfoGain** filter to select the *start subset* (first line), then we apply a forward **Naive Bayes** wrapper (second line);
finally, we measure the prediction performance using 10-fold cross-validation and the F-measure. A command-line sketch of this two-stage selection follows the figures below.
![machine learning performance results over the main-sentence dataset (using feature selection)](./img/mainfsfilterwrapper.png)
* machine learning performance results over the concept-sentence dataset (using feature selection)
We apply an **InfoGain** filter to select the *start subset* (first line), then we apply a forward **J48** wrapper (second line);
finally, we measure the prediction performance using 10-fold cross-validation and the F-measure.
![machine learning performance results over the concept-sentence dataset (using feature selection)](./img/conceptfsfilterwrapper.png)
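A minimal sketch of the two-stage feature selection described above, invoking Weka's attribute-selection classes from Python; the class names are real Weka classes, but the command-line invocation details and option values are assumptions:

```python
# Minimal sketch of the two-stage feature selection (filter, then wrapper)
# via Weka's command-line interface. Class names are real Weka classes; the
# invocation details and option values are assumptions.
import subprocess

def weka(args):
    subprocess.run(["java", "-cp", "weka.jar"] + args, check=True)

# Stage 1: rank attributes by information gain to pick the start subset.
weka(["weka.attributeSelection.InfoGainAttributeEval",
      "-i", "main.smote.arff",
      "-s", "weka.attributeSelection.Ranker -N 20"])  # assumed cut-off of 20

# Stage 2: forward wrapper search with Naive Bayes (use
# weka.classifiers.trees.J48 for the concept-sentence dataset).
weka(["weka.attributeSelection.WrapperSubsetEval",
      "-B", "weka.classifiers.bayes.NaiveBayes",
      "-i", "main.smote.arff",
      "-s", "weka.attributeSelection.GreedyStepwise"])  # forward by default
```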
### - ENsEN: Crowdsourcing-based User Evaluation
* We randomly selected 10 tasks from the [“Yahoo! Answers Query To Questions”](http://webscope.sandbox.yahoo.com/catalog.php?datatype=l) dataset.
* Each task was made of three questions on a common topic.
* Each task corresponds to a job on the CrowdFlower platform.
* We collected 20 judgments for each task.
* Half of the workers were asked to use our system, and the other half used Google.
* To verify that a worker actually used our system to answer a task, we generated a code that the worker had to copy and paste into her answer (a sketch follows this list).
* Only complete answers were considered correct.
* We also monitored the time spent answering each task. The results below show that ENsEN is beneficial to its users in terms of both correctness and time spent.
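A minimal sketch of the verification-code mechanism; the 8-hex-character token format is an assumption:

```python
# Minimal sketch: a short random token shown with each task; the worker must
# paste it into the answer, proving the task was answered through ENsEN.
# The 8-hex-character format is an assumption.
import secrets

def verification_code():
    return secrets.token_hex(4)

print(verification_code())  # e.g. '9f2c41ab'
```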
* Data:
* [Used Tasks with answers (ZIP)](./data/tasks_text.zip)
* [Aggregated data - ENsEN (XLS)](./data/ensen_tasks.xls)
* [Aggregated data - Google (XLS)](./data/google_tasks.xls)
* Results:
![Time Spent over each task](./img/spent_time.png)
![Correctness](./img/correctness.png)