wos_processing_pipeline.ipynb minor update, addresses are now properly exploded, updated query keywords + searchresult analysis demo

utku_keyword_suggestion
radvanyimome 2 years ago
parent c1e72fb904
commit 904710e47d

Binary file not shown.

@@ -9,7 +9,7 @@ image classification,
reinforcement learning, reinforcement learning,
support vector machine*, support vector machine*,
recommender system*, recommender system*,
random forest, random forest*,
ensemble model*, ensemble model*,
image processing, image processing,
generative network*, generative network*,
@@ -29,7 +29,7 @@ convolutional network*,
convolutional neural, convolutional neural,
adversarial network*, adversarial network*,
adversarial neural, adversarial neural,
adversarial machine, adversarial machine*,
autoencoder*, autoencoder*,
gated recurrent unit*, gated recurrent unit*,
perceptron*, perceptron*,
@@ -42,7 +42,7 @@ gradient descent,
k-nearest neighbor*, k-nearest neighbor*,
naive bayes, naive bayes,
transfer learning, transfer learning,
fuzzy logic, fuzzy logic*,
backpropagation, backpropagation,
computational modeling, computational modeling,
computational statistic*, computational statistic*,
@@ -79,8 +79,8 @@ deep belief network*,
quantum machine learning, quantum machine learning,
artificial immune system*, artificial immune system*,
swarm robotics, swarm robotics,
autonomous agents, autonomous agent*,
machine ethics, machine ethic*,
collaborative filtering, collaborative filtering,
content based filtering, content based filtering,
pervasive computing, pervasive computing,
@@ -142,9 +142,31 @@ KNN,
singular value decomposition, singular value decomposition,
regularization, regularization,
turing test, turing test,
turing-test,
computational learning theory, computational learning theory,
backward chaining, backward chaining,
forward chaining, forward chaining,
entity annotation, entity annotation,
entity extraction entity extraction,
scalable computing,
expectation maximization algorithm*,
markov chain,
markov process,
markov decision process,
monte carlo method,
bayesian interference,
kernel method,
eigendecomposition,
eigen decomposition,
kernel method,
radial basis function,
QR decomposition,
LU decomposition,
Cholesky decomposition,
spectral theorem,
model selection,
lagrange multiplier,
convex optimization,
nonlinear optimization,
L? regulari*,
ridge regression,
gaussian process
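
Review note on the keyword hunk above: the list as committed contains `kernel method,` twice, and `bayesian interference` may be a typo for "bayesian inference". The trailing `*` and the `?` in `L? regulari*` read like search wildcards and are kept verbatim here. A minimal sketch of how such a list could be cleaned and turned into a topic query, assuming one comma-terminated term per line and a `TS=(...)` query shape inferred from the notebook's `TS=(` parsing; `load_terms` and `build_ts_query` are hypothetical helper names, not functions from the repository.

```python
def load_terms(text: str) -> list[str]:
    """Return the terms in file order, without trailing commas or duplicates."""
    seen = set()
    terms = []
    for line in text.splitlines():
        term = line.strip().rstrip(",").strip()
        key = term.lower()  # compare case-insensitively, keep original spelling
        if term and key not in seen:
            seen.add(key)
            terms.append(term)
    return terms


def build_ts_query(terms: list[str]) -> str:
    """Join quoted terms with OR inside a single TS=(...) clause (assumed shape)."""
    quoted = " OR ".join(f'"{t}"' for t in terms)
    return f"TS=({quoted})"


# Sample taken from the added lines of the hunk, including the repeated term.
sample = "kernel method,\neigendecomposition,\nkernel method,\nL? regulari*,\ngaussian process"
terms = load_terms(sample)
print(terms)
print(build_ts_query(terms[:2]))
```

Deduplicating before building queries avoids running the same search twice; whether near-duplicates such as `eigendecomposition` and `eigen decomposition` should both stay is a judgment call for the author.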

File diff suppressed because it is too large

@@ -0,0 +1,13 @@
Publication Years Record Count % of 45 355
2022 9081 20.022
2021 8630 19.028
2020 6800 14.993
2019 5502 12.131
2018 4087 9.011
2017 2816 6.209
2016 2338 5.155
2015 1818 4.008
2014 1571 3.464
2013 1135 2.502
2012 863 1.903
2011 714 1.574
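
Review note on the table above: the third column reads as each year's share of 45 355 records. A small sanity check, assuming the three whitespace-separated fields are year, record count, and percentage; the variable names are illustrative only.

```python
# Rows copied from the added file; fields are year, record count, reported percentage.
rows = """2022 9081 20.022
2021 8630 19.028
2020 6800 14.993
2019 5502 12.131
2018 4087 9.011
2017 2816 6.209
2016 2338 5.155
2015 1818 4.008
2014 1571 3.464
2013 1135 2.502
2012 863 1.903
2011 714 1.574"""

TOTAL = 45355  # denominator named in the header ("% of 45 355")

records = []
for line in rows.splitlines():
    year, count, pct = line.split()
    records.append((int(year), int(count), float(pct)))

# Sum of the yearly counts, to compare against the stated total.
count_sum = sum(count for _, count, _ in records)

# Rows whose reported percentage differs from the recomputed one by 0.001 or more.
mismatches = [
    (year, pct, round(100 * count / TOTAL, 3))
    for year, count, pct in records
    if abs(100 * count / TOTAL - pct) >= 0.001
]

print(count_sum, mismatches)
```

The yearly counts add up to the stated total and every reported percentage matches the recomputed value to three decimals, so the table is internally consistent.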

File diff suppressed because one or more lines are too long

@@ -2,7 +2,7 @@
"cells": [ "cells": [
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 59, "execution_count": 72,
"metadata": { "metadata": {
"collapsed": true "collapsed": true
}, },
@@ -16,7 +16,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 60, "execution_count": 73,
"outputs": [], "outputs": [],
"source": [ "source": [
"agg_df = pd.DataFrame()\n", "agg_df = pd.DataFrame()\n",
@@ -39,11 +39,12 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 61, "execution_count": 74,
"outputs": [], "outputs": [],
"source": [ "source": [
"agg_df[\"region\"] = agg_df[\"query\"].apply(lambda x: \"EU+China\" if \"CU\" in x else \"Global\")\n", "agg_df[\"region\"] = agg_df[\"query\"].apply(lambda x: \"EU+China\" if \"CU\" in x else \"Global\")\n",
"agg_df[\"kw_token\"] = agg_df[\"query\"].apply(lambda x: x.split(\"TS=(\")[-1].split(\")\")[0])" "agg_df[\"kw_token\"] = agg_df[\"query\"].apply(lambda x: x.split(\"TS=(\")[-1].split(\")\")[0])\n",
"agg_df[\"kw_token\"] = agg_df[\"kw_token\"].apply(lambda x: \"OR COMPOSITE\" if \" OR \" in x else x)"
], ],
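
Review note on the cell above: the two `apply` calls derive a region label and a keyword token from each query string, and the added line collapses multi-term queries into `OR COMPOSITE`. A plain-Python sketch of the same logic, assuming query strings shaped like the hypothetical examples below (the real query format is not shown in this diff); `region_of` and `kw_token_of` are illustrative names.

```python
def region_of(query: str) -> str:
    # Same test as the notebook: any occurrence of "CU" marks the query as regional.
    return "EU+China" if "CU" in query else "Global"


def kw_token_of(query: str) -> str:
    # Text between the last "TS=(" and the first ")" that follows it.
    token = query.split("TS=(")[-1].split(")")[0]
    # Queries that OR several terms together are collapsed into one label.
    return "OR COMPOSITE" if " OR " in token else token


# Hypothetical query strings, for illustration only.
examples = [
    'TS=("random forest*")',
    'TS=("random forest*") AND CU=(PEOPLES R CHINA)',
    'TS=("markov chain" OR "markov process")',
]
for q in examples:
    print(region_of(q), "|", kw_token_of(q))
```

Two caveats with this approach: the substring test `"CU" in x` would also match a keyword that happens to contain those letters (for example `TS=("CUDA")`), so checking for `"CU=("` may be safer if the queries use that field tag; and a keyword containing `)` would be truncated by the `split(")")` step.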
"metadata": { "metadata": {
"collapsed": false "collapsed": false
@@ -51,9 +52,11 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 61, "execution_count": 83,
"outputs": [], "outputs": [],
"source": [], "source": [
"agg_df = agg_df[~agg_df[\"Record Count\"].isna()]"
],
"metadata": { "metadata": {
"collapsed": false "collapsed": false
} }
@@ -94,13 +97,13 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 64, "execution_count": 84,
"outputs": [ "outputs": [
{ {
"data": { "data": {
"text/plain": "Publication Years\n2022 268\n2021 260\n2019 258\n2020 258\n2018 250\n2017 243\n2016 237\n2015 227\n2014 215\n2013 208\n2012 193\n2011 184\n2023 44\n2014 4\n2019 4\n2017 4\n2018 4\n2020 4\n2022 4\n2021 4\n2016 3\n2015 3\n2013 3\n2012 3\n2011 3\n2023 2\nShowing 25 out of 29 entries 1\nShowing 25 out of 205 entries 1\n8 record(s) (0.025%) do not contain data in the field being analyzed 1\nShowing 25 out of 85 entries 1\nShowing 25 out of 189 entries 1\n1 record(s) (0.011%) do not contain data in the field being analyzed 1\nName: count, dtype: int64" "text/plain": "Publication Years\n2022 314\n2019 305\n2021 305\n2020 302\n2018 296\n2017 287\n2016 281\n2015 271\n2014 258\n2013 251\n2012 233\n2011 224\n2023 52\n2017 4\n2014 4\n2019 4\n2021 4\n2018 4\n2020 4\n2022 4\n2016 3\n2015 3\n2013 3\n2012 3\n2011 3\n2023 2\nName: count, dtype: int64"
}, },
"execution_count": 64, "execution_count": 84,
"metadata": {}, "metadata": {},
"output_type": "execute_result" "output_type": "execute_result"
} }
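
Review note on the output above: after the new `isna()` filter, the footer rows ("Showing 25 out of ... entries", "... do not contain data in the field being analyzed") no longer appear in the counts. Each year is still listed twice, though, which would be consistent with the column holding a mix of integer and string values; that is an inference from the output, not something the diff confirms. A minimal pandas sketch of both points, using made-up data rather than the real export.

```python
import io

import pandas as pd

# Hypothetical tab-separated export: two data rows plus a footer line with no count.
raw = (
    "Publication Years\tRecord Count\n"
    "2022\t4\n"
    "2021\t4\n"
    "Showing 25 out of 29 entries\t\n"
)
df = pd.read_csv(io.StringIO(raw), sep="\t")

# Same filter as the notebook cell: keep rows that have a record count.
clean = df[~df["Record Count"].isna()]
# Equivalent spelling with dropna.
alt = df.dropna(subset=["Record Count"])
print(clean["Publication Years"].tolist(), clean.equals(alt))

# value_counts treats the integer 2022 and the string "2022" as different keys.
mixed = pd.Series([2022, "2022", 2022])
print(len(mixed.value_counts()), len(mixed.astype(int).value_counts()))
```

If the duplicated years do come from mixed types, casting `Publication Years` to `int` after the filter would merge them into one row per year.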
@@ -123,7 +126,7 @@
}, },
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": 65, "execution_count": 85,
"outputs": [], "outputs": [],
"source": [ "source": [
"agg_df.to_excel(r'C:\\Users\\radvanyi\\PycharmProjects\\ZSI_analytics\\WOS\\wos_processed_data\\query_yearly_agg.xlsx', index=False)" "agg_df.to_excel(r'C:\\Users\\radvanyi\\PycharmProjects\\ZSI_analytics\\WOS\\wos_processed_data\\query_yearly_agg.xlsx', index=False)"

File diff suppressed because one or more lines are too long
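
Review note on the `agg_df.to_excel(...)` cell in the notebook diff above: the output path is an absolute path on one user's machine, so the cell will fail elsewhere. One possible alternative, sketched under the assumption that the notebook runs from the project root and that the `WOS/wos_processed_data` layout seen in the committed path is kept; `output_path` is a hypothetical helper.

```python
from pathlib import Path


def output_path(project_root: Path, filename: str = "query_yearly_agg.xlsx") -> Path:
    """Build the export path relative to a project root instead of a fixed drive path."""
    return project_root / "WOS" / "wos_processed_data" / filename


target = output_path(Path.cwd())
print(target.name)

# In the notebook this could then be used as (not executed here):
# target.parent.mkdir(parents=True, exist_ok=True)
# agg_df.to_excel(target, index=False)
```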