BIP! Services

@BipServices

Amplifying valuable research

Suite of tools & resources to facilitate literature exploration.

Website: https://bip.imsi.athenarc.gr/
Twitter: https://twitter.com/BipFinder
📢 v1.7 of our BIP! NDR dataset, containing ~3.3M citations made by ~183K #openAccess Computer Science conference & workshop articles without DOI, is now out: https://zenodo.org/records/13693321
BIP! NDR (NoDoiRefs): a dataset of citations from papers without DOIs in computer science conferences and workshops

In Computer Science, conference and workshop papers are important contributions and, compared to other disciplines, carry substantial weight in research assessment. However, many of these papers are not assigned a Digital Object Identifier (DOI); hence, their citations are not included in widely used citation datasets such as OpenCitations and Crossref, which limits citation analysis. The Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, but its discontinuation has left a gap in the available data. BIP! NDR aims to fill this gap and improve research assessment in Computer Science. To do so, it uses a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP corpus and, through text analysis, extracts citation information directly from their full text. The current version of the dataset contains ~3.3M citations made by approximately 183K open access Computer Science conference and workshop papers that, according to DBLP, do not have a DOI.

File structure: The dataset is a JSON Lines (JSONL) file (one JSON object per line), which facilitates file splitting and streaming. Each JSON object has three main fields:
- “_id”: a unique identifier;
- “citing_paper”: the “dblp_id” of the citing paper;
- “cited_papers”: an array with one object per reference found in the text of the “citing_paper”. Each object may contain the following fields:
  - “dblp_id”: the “dblp_id” of the cited paper (optional, but required if “doi” is not present);
  - “doi”: the DOI of the cited paper (optional, but required if “dblp_id” is not present);
  - “bibliographic_reference”: the raw citation string as it appears in the citing paper.

Changes from previous version: Added more papers from DBLP.
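Because each line is a self-contained JSON object, the file can be processed as a stream. The following is a minimal Python sketch, assuming only the fields listed above, that turns each record into (citing paper, identifier type, cited identifier) triples without loading the whole file into memory. The sample record and its IDs are made up, and preferring “dblp_id” over “doi” when both are present is a choice of this sketch, not a rule of the dataset.

```python
import json
from typing import Iterable, Iterator


def iter_citations(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (citing dblp_id, identifier type, cited identifier) per reference."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        citing = record["citing_paper"]
        for ref in record.get("cited_papers", []):
            # Prefer the DBLP key when both identifiers exist (a choice of this sketch).
            if "dblp_id" in ref:
                yield citing, "dblp", ref["dblp_id"]
            elif "doi" in ref:
                yield citing, "doi", ref["doi"]
            # References carrying neither identifier are skipped, not treated as errors.


# Hypothetical record in the documented layout; all IDs are made up.
SAMPLE = [
    '{"_id": "1", "citing_paper": "conf/example/Doe24", "cited_papers": ['
    '{"dblp_id": "conf/example/Roe20", "bibliographic_reference": "J. Roe: Example paper (2020)"}, '
    '{"doi": "10.1234/example", "bibliographic_reference": "A. Poe: Another paper (2019)"}]}'
]

if __name__ == "__main__":
    for edge in iter_citations(SAMPLE):
        print(edge)
```

For a real release, an open text-mode file handle can be passed instead of SAMPLE, since iterating over a file yields one line at a time; if the download is compressed, it has to be decompressed first.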

Zenodo
📢 Version 15 of BIP! DB is now available, featuring citation-based indicators calculated for >209M research products: https://zenodo.org/records/13642957 #opendata
BIP! DB: A Dataset of Impact Measures for Research Products

This dataset contains citation-based impact indicators (a.k.a. "measures") for ~209M distinct PIDs (persistent identifiers) that correspond to research products (scientific publications, datasets, etc.). For each PID, we have calculated the following indicators, organized in categories according to the aspect of impact they best capture:

Influence indicators (the "total" impact of each research product, i.e., how established it is in general):
- Citation Count: the total number of citations of the product; the best-known influence indicator.
- PageRank score: an influence indicator based on PageRank [1], a popular network analysis method. PageRank estimates the influence of each product based on its centrality in the whole citation network. It alleviates some issues of the Citation Count indicator: two products with the same number of citations can have significantly different PageRank scores if the aggregated influence of the products citing them differs, since the product cited by more influential products receives a larger score.

Popularity indicators (the "current" impact of each research product, i.e., how popular it is currently):
- RAM score: a popularity indicator based on the RAM [2] method. It is essentially a Citation Count in which recent citations are considered more important. This kind of "time awareness" alleviates a problem of methods like PageRank, which are biased against recently published products (new products need time to receive enough citations to be indicative of their impact).
- AttRank score: a popularity indicator based on the AttRank [3] method.
  AttRank alleviates PageRank's bias against recently published products by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture researchers' preference for examining products that received a lot of attention recently.

Impulse indicators (the initial momentum a research product received right after its publication):
- Incubation Citation Count (3-year CC): a time-restricted version of the Citation Count in which the window length is fixed for all products and the window's position depends on each product's publication date; i.e., only citations made within the first 3 years after a product's publication are counted.

More details about these impact indicators, the way they are calculated, and their interpretation can be found here and in the respective references (e.g., in [5]).

From version 5.1 onward, the impact indicators are calculated at two levels:
- the PID level (assuming that each PID corresponds to a distinct research product);
- the OpenAIRE-id level (leveraging PID synonyms based on OpenAIRE's deduplication algorithm [4]; each distinct article has its own OpenAIRE id).
Previous versions of the dataset only provided scores at the PID level.

From version 12 onward, two types of PIDs are included in the dataset: DOIs and PMIDs (before that version, only DOIs were included).

Also, from version 7 onward, for each product in our files we offer an impact class, which indicates the percentile into which the product's score falls compared to the scores of the rest of the products in the database. The impact classes are: C1 (top 0.01%), C2 (top 0.1%), C3 (top 1%), C4 (top 10%), and C5 (bottom 90%).

Finally, before version 10, the impact scores (and classes) were calculated on a citation network with one node for each product with a distinct PID found in our input data sources.
However, from version 10 onward, the nodes are deduplicated using the most recent version of the OpenAIRE article deduplication algorithm. This enabled a correction of the scores: we avoid counting citation links multiple times when they are made by multiple versions of the same product. As a result, each node in the citation network we build is a deduplicated product with a distinct OpenAIRE id. We still report scores at the PID level (i.e., we assign a score to each version/instance of a product), but these PID-level scores are simply the scores of the respective deduplicated nodes, propagated accordingly (i.e., all versions of the same deduplicated product receive the same scores). We have removed a small number of instances (having a PID) that were erroneously assigned to multiple deduplicated records in the OpenAIRE Graph.

For each calculation level (PID / OpenAIRE-id) we provide five (5) compressed CSV files (one for each measure/score), where each line follows the format "identifier <tab> score <tab> class". The parameter setting of each measure is encoded in the corresponding filename. For more details on the different measures/scores, see our extensive experimental study [5] and, for the configuration of AttRank, the original paper [3]. Files for the OpenAIRE-id level contain the keyword "openaire_ids" in their filenames.

From version 9 onward, we also provide topic-specific impact classes for PID-identified products. In particular, we associated those products with 2nd-level concepts from OpenAlex, keeping at most the three most dominant concepts for each product (based on their confidence score) and only those whose confidence score is greater than 0.3. Then, for each product and impact measure, we compute its class within each of its concepts.
We also provide the "topic_based_impact_classes.txt" file, where each line follows the format "identifier <tab> concept <tab> pagerank_class <tab> attrank_class <tab> 3-cc_class <tab> cc_class".

The data used to build the citation network on which we calculated the provided measures were gathered from the OpenAIRE Graph v8.0.0, including data from (a) OpenCitations' COCI & POCI datasets, (b) MAG [6,7], and (c) Crossref. The union of all distinct citations found in these sources has been considered. In addition, versions later than v.10 apply the filtering rules described here to remove PIDs with problematic metadata from the dataset.

References:
[1] L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
[2] R. Ghosh, T.-T. Kuo, C.-N. Hsu, S.-D. Lin, and K. Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380.
[3] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, and Y. Vassiliou. 2020. Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951.
[4] P. Manghi, C. Atzori, M. De Bonis, and A. Bardi. 2020. Entity deduplication in big data graphs for scholarly communication. Data Technologies and Applications.
[5] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, and Y. Vassiliou. 2019. Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE (early access).
[6] A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B.-J. (P.) Hsu, and K. Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243–246. DOI: 10.1145/2740908.2742839
[7] K. Wang et al. 2019. A Review of Microsoft Academic Services for Science of Science Studies. Frontiers in Big Data. DOI: 10.3389/fdata.2019.00045

Find our academic search engine built on top of these data here. Note also that we provide all calculated scores through BIP! Finder's API.

Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the CC0 license.

More details about BIP! DB can be found in our peer-reviewed publication: Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi: BIP! DB: A Dataset of Impact Measures for Scientific Publications. WWW (Companion Volume) 2021: 456–460. We kindly request that any published research that makes use of BIP! DB cite the above article.
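The impact classes and their topic-specific variant can be illustrated with a short Python sketch. It assumes the class boundaries stated above (C1 top 0.01%, C2 top 0.1%, C3 top 1%, C4 top 10%, C5 the rest) applied to a plain rank ordering, parses one "identifier <tab> score <tab> class" line, and mimics the topic-based classes: keep at most three concepts with confidence greater than 0.3, then classify each product among the products sharing a concept. Tie handling, the rank-based cutoffs, and all function names are assumptions of this sketch rather than BIP! DB's documented procedure. The description does not name the compression format of the CSV files; if they are gzip files, lines read through gzip.open(path, "rt") could be fed to parse_score_line.

```python
from collections import defaultdict

# Size of C1..C4 in parts per 10,000 products; everything else is C5.
CLASS_CUTOFFS = [("C1", 1), ("C2", 10), ("C3", 100), ("C4", 1000)]


def parse_score_line(line: str) -> tuple[str, float, str]:
    """Split one 'identifier <tab> score <tab> class' line."""
    identifier, score, impact_class = line.rstrip("\n").split("\t")
    return identifier, float(score), impact_class


def impact_classes(scores: dict[str, float]) -> dict[str, str]:
    """Rank products by score and assign C1 (top 0.01%) ... C5 (bottom 90%).

    Integer arithmetic keeps the cutoffs exact; ties are broken by sort
    order, which is an assumption of this sketch.
    """
    n = len(scores)
    ranked = sorted(scores, key=scores.get, reverse=True)
    result = {}
    for rank, pid in enumerate(ranked):  # rank 0 is the highest score
        result[pid] = "C5"
        for label, per_10k in CLASS_CUTOFFS:
            if rank < n * per_10k // 10_000:
                result[pid] = label
                break
    return result


def select_concepts(concepts: dict[str, float], k: int = 3, min_conf: float = 0.3) -> list[str]:
    """Keep at most the k most confident concepts whose confidence exceeds min_conf."""
    kept = [c for c, conf in concepts.items() if conf > min_conf]
    return sorted(kept, key=concepts.get, reverse=True)[:k]


def topic_impact_classes(scores: dict[str, float],
                         product_concepts: dict[str, dict[str, float]]) -> dict[tuple[str, str], str]:
    """Classify each product among the products that share each of its kept concepts."""
    by_concept: dict[str, dict[str, float]] = defaultdict(dict)
    for pid, concepts in product_concepts.items():
        if pid not in scores:
            continue
        for concept in select_concepts(concepts):
            by_concept[concept][pid] = scores[pid]
    return {
        (pid, concept): cls
        for concept, members in by_concept.items()
        for pid, cls in impact_classes(members).items()
    }
```

With 10,000 products, exactly one lands in C1, nine in C2, 90 in C3, and 900 in C4. Within a small topic group, the integer cutoffs leave the upper classes empty, which is one reason a real implementation might treat small groups differently.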

📢 v1.5 of our BIP! NDR dataset, containing >3.1M citations made by ~176K #openAccess Computer Science conference & workshop articles without DOI, is now out: https://zenodo.org/records/10888704 🤓
BIP! NDR (NoDoiRefs): a dataset of citations from papers without DOIs in computer science conferences and workshops

In Computer Science, conference and workshop papers are important contributions and, compared to other disciplines, carry substantial weight in research assessment. However, many of these papers are not assigned a Digital Object Identifier (DOI); hence, their citations are not included in widely used citation datasets such as OpenCitations and Crossref, which limits citation analysis. The Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, but its discontinuation has left a gap in the available data. BIP! NDR aims to fill this gap and improve research assessment in Computer Science. To do so, it uses a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP corpus and, through text analysis, extracts citation information directly from their full text. The current version of the dataset contains ~3.1M citations made by approximately 176K open access Computer Science conference and workshop papers that, according to DBLP, do not have a DOI.

File structure: The dataset is a JSON Lines (JSONL) file (one JSON object per line), which facilitates file splitting and streaming. Each JSON object has three main fields:
- “_id”: a unique identifier;
- “citing_paper”: the “dblp_id” of the citing paper;
- “cited_papers”: an array with one object per reference found in the text of the “citing_paper”. Each object may contain the following fields:
  - “dblp_id”: the “dblp_id” of the cited paper (optional, but required if “doi” is not present);
  - “doi”: the DOI of the cited paper (optional, but required if “dblp_id” is not present);
  - “bibliographic_reference”: the raw citation string as it appears in the citing paper.

Changes from previous version: Added more papers from DBLP.
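The note that JSONL "facilitates file splitting" follows from every line being a complete JSON object: cutting only at line boundaries always yields valid chunks, with no parsing required. A minimal Python sketch, assuming a local copy of the file; the chunk size, the output naming scheme, and the demo file name are all hypothetical.

```python
import itertools
import tempfile
from pathlib import Path


def split_jsonl(src: Path, out_dir: Path, lines_per_chunk: int = 100_000) -> list[Path]:
    """Split a JSONL file into chunks of at most `lines_per_chunk` lines.

    Each chunk is itself valid JSONL because records never span lines.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with src.open(encoding="utf-8") as f:
        for index in itertools.count():
            batch = list(itertools.islice(f, lines_per_chunk))  # next N lines, lazily
            if not batch:
                break
            part = out_dir / f"{src.stem}.part{index:04d}.jsonl"
            part.write_text("".join(batch), encoding="utf-8")
            chunks.append(part)
    return chunks


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "ndr.jsonl"  # hypothetical file name
        src.write_text("".join(f'{{"_id": "{i}"}}\n' for i in range(5)), encoding="utf-8")
        parts = split_jsonl(src, Path(tmp) / "parts", lines_per_chunk=2)
        print([p.name for p in parts])
```

Each chunk can then be handed to a separate worker process; concatenating the chunks in order reproduces the original file.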

📢 News Alert! The new version of BIP! DB is now available, featuring citation-based indicators calculated for >187M research products. As a bonus, it's the first version available under CC0, making the data more accessible for everyone to use.
https://zenodo.org/records/10804822
#opendata
BIP! DB: A Dataset of Impact Measures for Research Products

This dataset contains citation-based impact indicators (a.k.a. "measures") for ~187.8M distinct PIDs (persistent identifiers) that correspond to research products (scientific publications, datasets, etc.). For each PID, we have calculated the following indicators, organized in categories according to the aspect of impact they best capture:

Influence indicators (the "total" impact of each research product, i.e., how established it is in general):
- Citation Count: the total number of citations of the product; the best-known influence indicator.
- PageRank score: an influence indicator based on PageRank [1], a popular network analysis method. PageRank estimates the influence of each product based on its centrality in the whole citation network. It alleviates some issues of the Citation Count indicator: two products with the same number of citations can have significantly different PageRank scores if the aggregated influence of the products citing them differs, since the product cited by more influential products receives a larger score.

Popularity indicators (the "current" impact of each research product, i.e., how popular it is currently):
- RAM score: a popularity indicator based on the RAM [2] method. It is essentially a Citation Count in which recent citations are considered more important. This kind of "time awareness" alleviates a problem of methods like PageRank, which are biased against recently published products (new products need time to receive enough citations to be indicative of their impact).
- AttRank score: a popularity indicator based on the AttRank [3] method.
  AttRank alleviates PageRank's bias against recently published products by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture researchers' preference for examining products that received a lot of attention recently.

Impulse indicators (the initial momentum a research product received right after its publication):
- Incubation Citation Count (3-year CC): a time-restricted version of the Citation Count in which the window length is fixed for all products and the window's position depends on each product's publication date; i.e., only citations made within the first 3 years after a product's publication are counted.

More details about these impact indicators, the way they are calculated, and their interpretation can be found here and in the respective references (e.g., in [5]).

From version 5.1 onward, the impact indicators are calculated at two levels:
- the PID level (assuming that each PID corresponds to a distinct research product);
- the OpenAIRE-id level (leveraging PID synonyms based on OpenAIRE's deduplication algorithm [4]; each distinct article has its own OpenAIRE id).
Previous versions of the dataset only provided scores at the PID level.

From version 12 onward, two types of PIDs are included in the dataset: DOIs and PMIDs (before that version, only DOIs were included).

Also, from version 7 onward, for each product in our files we offer an impact class, which indicates the percentile into which the product's score falls compared to the scores of the rest of the products in the database. The impact classes are: C1 (top 0.01%), C2 (top 0.1%), C3 (top 1%), C4 (top 10%), and C5 (bottom 90%).

Finally, before version 10, the impact scores (and classes) were calculated on a citation network with one node for each product with a distinct PID found in our input data sources.
However, from version 10 onward, the nodes are deduplicated using the most recent version of the OpenAIRE article deduplication algorithm. This enabled a correction of the scores: we avoid counting citation links multiple times when they are made by multiple versions of the same product. As a result, each node in the citation network we build is a deduplicated product with a distinct OpenAIRE id. We still report scores at the PID level (i.e., we assign a score to each version/instance of a product), but these PID-level scores are simply the scores of the respective deduplicated nodes, propagated accordingly (i.e., all versions of the same deduplicated product receive the same scores). We have removed a small number of instances (having a PID) that were erroneously assigned to multiple deduplicated records in the OpenAIRE Graph.

For each calculation level (PID / OpenAIRE-id) we provide five (5) compressed CSV files (one for each measure/score), where each line follows the format "identifier <tab> score <tab> class". The parameter setting of each measure is encoded in the corresponding filename. For more details on the different measures/scores, see our extensive experimental study [5] and, for the configuration of AttRank, the original paper [3]. Files for the OpenAIRE-id level contain the keyword "openaire_ids" in their filenames.

From version 9 onward, we also provide topic-specific impact classes for PID-identified products. In particular, we associated those products with 2nd-level concepts from OpenAlex, keeping at most the three most dominant concepts for each product (based on their confidence score) and only those whose confidence score is greater than 0.3. Then, for each product and impact measure, we compute its class within each of its concepts.
We also provide the "topic_based_impact_classes.txt" file, where each line follows the format "identifier <tab> concept <tab> pagerank_class <tab> attrank_class <tab> 3-cc_class <tab> cc_class".

The data used to build the citation network on which we calculated the provided measures were gathered from the OpenAIRE Graph v7.1.0, including data from (a) OpenCitations' COCI & POCI datasets, (b) MAG [6,7], and (c) Crossref. The union of all distinct citations found in these sources has been considered. In addition, versions later than v.10 apply the filtering rules described here to remove PIDs with problematic metadata from the dataset.

References:
[1] L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
[2] R. Ghosh, T.-T. Kuo, C.-N. Hsu, S.-D. Lin, and K. Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380.
[3] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, and Y. Vassiliou. 2020. Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951.
[4] P. Manghi, C. Atzori, M. De Bonis, and A. Bardi. 2020. Entity deduplication in big data graphs for scholarly communication. Data Technologies and Applications.
[5] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, and Y. Vassiliou. 2019. Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE (early access).
[6] A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B.-J. (P.) Hsu, and K. Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243–246. DOI: 10.1145/2740908.2742839
[7] K. Wang et al. 2019. A Review of Microsoft Academic Services for Science of Science Studies. Frontiers in Big Data. DOI: 10.3389/fdata.2019.00045

Find our academic search engine built on top of these data here. Note also that we provide all calculated scores through BIP! Finder's API.

Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.

More details about BIP! DB can be found in our peer-reviewed publication: Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi: BIP! DB: A Dataset of Impact Measures for Scientific Publications. WWW (Companion Volume) 2021: 456–460. We kindly request that any published research that makes use of BIP! DB cite the above article.
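The indicator definitions above can be tried out on a toy network. This Python sketch implements the textbook forms on a made-up four-paper graph: Citation Count, a 3-year CC, a RAM-style score that weights each citation by gamma raised to the citation's age [2], and PageRank by power iteration [1]. It is not BIP! DB's code. Its assumptions: a citation is dated by the citing paper's publication year; the 3-year window covers publication years 0, 1 and 2; and the parameter values alpha=0.5 and gamma=0.5 are placeholders (the settings actually used are encoded in each file's name). AttRank is omitted.

```python
from collections import defaultdict

# Hypothetical toy network: publication year of each product and (citing, cited) pairs.
PUB_YEAR = {"A": 2018, "B": 2019, "C": 2020, "D": 2022}
CITATIONS = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "A"), ("D", "C")]


def citation_count(citations):
    """Influence: total number of citations received."""
    counts = defaultdict(int)
    for _, cited in citations:
        counts[cited] += 1
    return dict(counts)


def three_year_cc(citations, pub_year, window=3):
    """Impulse: citations made within `window` years of the cited product's publication."""
    counts = defaultdict(int)
    for citing, cited in citations:
        if 0 <= pub_year[citing] - pub_year[cited] < window:  # window boundary is an assumption
            counts[cited] += 1
    return dict(counts)


def ram_score(citations, pub_year, current_year, gamma=0.5):
    """Popularity: each citation weighted by gamma ** (its age in years)."""
    scores = defaultdict(float)
    for citing, cited in citations:
        scores[cited] += gamma ** (current_year - pub_year[citing])
    return dict(scores)


def pagerank(nodes, citations, alpha=0.5, max_iter=100, tol=1e-12):
    """Influence: PageRank by power iteration; scores of uncited-out (dangling) nodes are spread uniformly."""
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for citing, cited in citations:
        out_links[citing].append(cited)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(score[v] for v in nodes if not out_links[v])
        new = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new[w] += alpha * score[v] / len(out_links[v])
        converged = sum(abs(new[v] - score[v]) for v in nodes) < tol
        score = new
        if converged:
            break
    return score


if __name__ == "__main__":
    print(citation_count(CITATIONS))
    print(three_year_cc(CITATIONS, PUB_YEAR))
    print(ram_score(CITATIONS, PUB_YEAR, current_year=2022))
    print(pagerank(list(PUB_YEAR), CITATIONS))
```

On this toy graph, A has 3 citations but a 3-year CC of 2, because D cites it four years after publication. A's RAM score is 1.375: its 2022 citation counts fully, while its 2019 citation counts only 0.125.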

@BipServices @opencitations And don't miss our special gift for #LoveDataWeek: beginning this week, we've transitioned our datasets' default license from CC-BY to CC0. Version 1.4 of the BIP! NDR dataset was the first such release on Monday. 🤓 #LoveData24
If you are using our datasets & services, let's celebrate #LoveDataWeek together by acknowledging the invaluable contributions of the datasets we rely on the most:
- the OpenAIRE Graph
- COCI & POCI by @opencitations
🙏🙏🙏 #LoveData24 #OpenData
📢 v1.4 of our BIP! NDR dataset, containing >2.9M citations made by ~171K #openAccess Computer Science conference & workshop articles without DOI, is now out: https://zenodo.org/records/10651965 🤓
BIP! NDR (NoDoiRefs): a dataset of citations from papers without DOIs in computer science conferences and workshops

In Computer Science, conference and workshop papers are important contributions and, compared to other disciplines, carry substantial weight in research assessment. However, many of these papers are not assigned a Digital Object Identifier (DOI); hence, their citations are not included in widely used citation datasets such as OpenCitations and Crossref, which limits citation analysis. The Microsoft Academic Graph (MAG) previously addressed this issue by providing substantial coverage, but its discontinuation has left a gap in the available data. BIP! NDR aims to fill this gap and improve research assessment in Computer Science. To do so, it uses a workflow that identifies and retrieves Open Science papers lacking DOIs from the DBLP corpus and, through text analysis, extracts citation information directly from their full text. The current version of the dataset contains ~2.9M citations made by approximately 171K open access Computer Science conference and workshop papers that, according to DBLP, do not have a DOI.

File structure: The dataset is a JSON Lines (JSONL) file (one JSON object per line), which facilitates file splitting and streaming. Each JSON object has three main fields:
- “_id”: a unique identifier;
- “citing_paper”: the “dblp_id” of the citing paper;
- “cited_papers”: an array with one object per reference found in the text of the “citing_paper”. Each object may contain the following fields:
  - “dblp_id”: the “dblp_id” of the cited paper (optional, but required if “doi” is not present);
  - “doi”: the DOI of the cited paper (optional, but required if “dblp_id” is not present);
  - “bibliographic_reference”: the raw citation string as it appears in the citing paper.

Changes from previous version: Added more papers from DBLP.
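A common use of these records is to count how many citations each cited work receives from DOI-less papers, for instance to complement DOI-based counts. This Python sketch assumes only the layout described above. Keying on the DOI when one is present (so the counts can be matched against DOI-based datasets) and counting each citing/cited pair once are choices of this sketch. A work referenced by its DOI in one record and only by its DBLP key in another would still be counted under two separate keys. The sample records are made up and trimmed to the identifier fields.

```python
import json
from collections import Counter


def received_citations(lines) -> Counter:
    """Count, per cited identifier, how many distinct citing papers reference it."""
    pairs = set()  # (citing, cited) pairs, so a repeated reference counts once
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        citing = record["citing_paper"]
        for ref in record.get("cited_papers", []):
            key = ref.get("doi") or ref.get("dblp_id")  # prefer the DOI if present
            if key is not None:
                pairs.add((citing, key))
    return Counter(cited for _, cited in pairs)


# Hypothetical records; IDs are made up and "bibliographic_reference" is omitted.
SAMPLE = [
    '{"_id": "1", "citing_paper": "conf/example/Doe24", "cited_papers": '
    '[{"doi": "10.1234/example"}, {"doi": "10.1234/example"}]}',
    '{"_id": "2", "citing_paper": "conf/example/Poe23", "cited_papers": '
    '[{"doi": "10.1234/example"}, {"dblp_id": "conf/example/Roe20"}]}',
]

if __name__ == "__main__":
    print(received_citations(SAMPLE).most_common())
```

Here 10.1234/example receives 2 citations rather than 3, because the duplicate reference within the first paper is counted once.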

📢 BIP! DB v.12 is out!
⬇️ https://zenodo.org/records/10478958
It has been >3 years since we published v0.1. From ~105M records to 168.8M, a lot has changed. This version also includes scores for research products with PubMed IDs (for cases without a DOI).
BIP! DB: A Dataset of Impact Measures for Research Products

This dataset contains citation-based impact indicators (a.k.a. "measures") for ~168.8M distinct PIDs (persistent identifiers) that correspond to research products (scientific publications, datasets, etc.). For each PID, we have calculated the following indicators, organized in categories according to the aspect of impact they best capture:

Influence indicators (the "total" impact of each research product, i.e., how established it is in general):
- Citation Count: the total number of citations of the product; the best-known influence indicator.
- PageRank score: an influence indicator based on PageRank [1], a popular network analysis method. PageRank estimates the influence of each product based on its centrality in the whole citation network. It alleviates some issues of the Citation Count indicator: two products with the same number of citations can have significantly different PageRank scores if the aggregated influence of the products citing them differs, since the product cited by more influential products receives a larger score.

Popularity indicators (the "current" impact of each research product, i.e., how popular it is currently):
- RAM score: a popularity indicator based on the RAM [2] method. It is essentially a Citation Count in which recent citations are considered more important. This kind of "time awareness" alleviates a problem of methods like PageRank, which are biased against recently published products (new products need time to receive enough citations to be indicative of their impact).
- AttRank score: a popularity indicator based on the AttRank [3] method.
  AttRank alleviates PageRank's bias against recently published products by incorporating an attention-based mechanism, akin to a time-restricted version of preferential attachment, to explicitly capture researchers' preference for examining products that received a lot of attention recently.

Impulse indicators (the initial momentum a research product received right after its publication):
- Incubation Citation Count (3-year CC): a time-restricted version of the Citation Count in which the window length is fixed for all products and the window's position depends on each product's publication date; i.e., only citations made within the first 3 years after a product's publication are counted.

More details about these impact indicators, the way they are calculated, and their interpretation can be found here and in the respective references (e.g., in [5]).

From version 5.1 onward, the impact indicators are calculated at two levels:
- the PID level (assuming that each PID corresponds to a distinct research product);
- the OpenAIRE-id level (leveraging PID synonyms based on OpenAIRE's deduplication algorithm [4]; each distinct article has its own OpenAIRE id).
Previous versions of the dataset only provided scores at the PID level.

From version 12 onward, two types of PIDs are included in the dataset: DOIs and PMIDs (before that version, only DOIs were included).

Also, from version 7 onward, for each product in our files we offer an impact class, which indicates the percentile into which the product's score falls compared to the scores of the rest of the products in the database. The impact classes are: C1 (top 0.01%), C2 (top 0.1%), C3 (top 1%), C4 (top 10%), and C5 (bottom 90%).

Finally, before version 10, the impact scores (and classes) were calculated on a citation network with one node for each product with a distinct PID found in our input data sources.
However, from version 10 onward, the nodes are deduplicated using the most recent version of the OpenAIRE article deduplication algorithm. This enabled a correction of the scores: we avoid counting citation links multiple times when they are made by multiple versions of the same product. As a result, each node in the citation network we build is a deduplicated product with a distinct OpenAIRE id. We still report scores at the PID level (i.e., we assign a score to each version/instance of a product), but these PID-level scores are simply the scores of the respective deduplicated nodes, propagated accordingly (i.e., all versions of the same deduplicated product receive the same scores). We have removed a small number of instances (having a PID) that were erroneously assigned to multiple deduplicated records in the OpenAIRE Graph.

For each calculation level (PID / OpenAIRE-id) we provide five (5) compressed CSV files (one for each measure/score), where each line follows the format "identifier <tab> score <tab> class". The parameter setting of each measure is encoded in the corresponding filename. For more details on the different measures/scores, see our extensive experimental study [5] and, for the configuration of AttRank, the original paper [3]. Files for the OpenAIRE-id level contain the keyword "openaire_ids" in their filenames.

From version 9 onward, we also provide topic-specific impact classes for PID-identified products. In particular, we associated those products with 2nd-level concepts from OpenAlex, keeping at most the three most dominant concepts for each product (based on their confidence score) and only those whose confidence score is greater than 0.3. Then, for each product and impact measure, we compute its class within each of its concepts.
We also provide the "topic_based_impact_classes.txt" file, where each line follows the format "identifier <tab> concept <tab> pagerank_class <tab> attrank_class <tab> 3-cc_class <tab> cc_class".

The data used to build the citation network on which we calculated the provided measures were gathered from the OpenAIRE Graph v7.0.0, including data from (a) OpenCitations' COCI & POCI datasets, (b) MAG [6,7], and (c) Crossref. The union of all distinct citations found in these sources has been considered. In addition, versions later than v.10 apply the filtering rules described here to remove PIDs with problematic metadata from the dataset.

References:
[1] L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
[2] R. Ghosh, T.-T. Kuo, C.-N. Hsu, S.-D. Lin, and K. Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380.
[3] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, and Y. Vassiliou. 2020. Ranking Papers by their Short-Term Scientific Impact. CoRR abs/2006.00951.
[4] P. Manghi, C. Atzori, M. De Bonis, and A. Bardi. 2020. Entity deduplication in big data graphs for scholarly communication. Data Technologies and Applications.
[5] I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, and Y. Vassiliou. 2019. Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE (early access).
[6] A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B.-J. (P.) Hsu, and K. Wang. 2015. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the 24th International Conference on World Wide Web (WWW '15 Companion). ACM, New York, NY, USA, 243–246. DOI: 10.1145/2740908.2742839
[7] K. Wang et al. 2019. A Review of Microsoft Academic Services for Science of Science Studies. Frontiers in Big Data. DOI: 10.3389/fdata.2019.00045

Find our academic search engine built on top of these data here. Note also that we provide all calculated scores through BIP! Finder's API.

Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.

More details about BIP! DB can be found in our peer-reviewed publication: Thanasis Vergoulis, Ilias Kanellos, Claudio Atzori, Andrea Mannocci, Serafeim Chatzopoulos, Sandro La Bruzzo, Natalia Manola, Paolo Manghi: BIP! DB: A Dataset of Impact Measures for Scientific Publications. WWW (Companion Volume) 2021: 456–460. We kindly request that any published research that makes use of BIP! DB cite the above article.
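The version-10 deduplication step can be sketched in a few lines. PIDs are first mapped to their deduplicated node, and each distinct node-to-node citation link is counted once, so two versions of the same product citing the same work contribute a single citation. The node scores are then propagated back to every PID of that node. This is a minimal Python illustration of the idea described above, not OpenAIRE's deduplication algorithm. The mapping, the node names, and the PIDs are hypothetical. Dropping links that fall inside one merged product, and keeping unmapped PIDs as their own nodes, are assumptions of the sketch.

```python
from collections import Counter

# Hypothetical PID -> deduplicated node mapping (e.g. a preprint and its journal version).
PID_TO_NODE = {
    "10.1234/preprint": "node-1",
    "10.1234/journal": "node-1",
    "10.1234/cited": "node-2",
    "pmid:999": "node-2",
    "10.1234/other": "node-3",
}
# Citations as (citing PID, cited PID) pairs.
CITATIONS = [
    ("10.1234/preprint", "10.1234/cited"),
    ("10.1234/journal", "10.1234/cited"),
    ("10.1234/other", "pmid:999"),
]


def deduplicated_citation_counts(citations, pid_to_node) -> Counter:
    """Count each distinct citation link between deduplicated nodes once."""
    links = set()
    for citing_pid, cited_pid in citations:
        citing = pid_to_node.get(citing_pid, citing_pid)  # unmapped PIDs stay their own node
        cited = pid_to_node.get(cited_pid, cited_pid)
        if citing != cited:  # assumption: ignore links inside one merged product
            links.add((citing, cited))
    return Counter(cited for _, cited in links)


def propagate_to_pids(node_scores, pid_to_node) -> dict:
    """Give every PID the score of the deduplicated node it belongs to."""
    return {pid: node_scores.get(node, 0) for pid, node in pid_to_node.items()}


if __name__ == "__main__":
    counts = deduplicated_citation_counts(CITATIONS, PID_TO_NODE)
    print(propagate_to_pids(counts, PID_TO_NODE))
```

A naive PID-level count would give 10.1234/cited 2 citations and pmid:999 1. After deduplication, both PIDs belong to node-2 and report the same count of 2: the two citing versions of node-1 count once, plus one citation from node-3.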

SCHEMA is now on Mastodon! We offer the wider research community an open-source platform for executing computational analyses on heterogeneous clusters, leveraging containerization and workflow management. Stay tuned for our latest updates!
Are you interested in seeing how we are experimenting with new ideas for academic profiles in the @graspos_project? Our training event on BIP! Scholar (https://bip.imsi.athenarc.gr/scholar) is a nice start.
👉 Read more: https://graspos.eu/bip-scholar-redefining-academic-profiles
👉 Slides: https://zenodo.org/records/10067215
@BipServices
BIP! Services - Scholar